Let us say you are a researcher doing a gene expression study on some tissue. Today, chances are you will run some microarrays, look at the expression profiles, and then try to correlate the profiles of a number of samples with associated data.
Fast forward a few years. I am convinced that much of this data will be available via search engines or data portals. Already you are beginning to see a number of commercial and public engines come to life (NextBio, Oncomine, etc.). Earlier this week I read an announcement (subscription required) that the NCI will create a Cancer Molecular Analysis Portal, which will integrate data sets from the Cancer Genome Atlas project and other cancer genomics studies.
The key here is that we already have a body of work using microarrays and other molecular profiling systems, and in many cases people are simply repeating experiments that someone, somewhere has already carried out. Unless there is something inherently proprietary in those studies (e.g., specific dose-response studies), there is no reason to repeat the experiment, especially with technologies that are relatively stable and do not suffer from much cross-platform or cross-lab variation (understanding these variations has been one of the goals of the MAQC projects). The second point, perhaps even more important, is how these data are made available. Personally, I really like the NextBio interface. Will the business model work? I am not sure, but the idea and concept certainly make a lot of sense.
A lot of work is being repeated over and over again simply because the data are not easily accessible. This is one of the big changes that will take place over the next few years, as the same principles that make PubMed or GenBank so useful start permeating all databases.
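As a concrete illustration of that PubMed/GenBank principle: NCBI's E-utilities already expose GEO, its gene expression repository, to programmatic search. Here is a minimal sketch that builds an `esearch` query URL against the GEO DataSets database; the search term and result count are illustrative, not taken from any specific study.

```python
# Sketch: building an NCBI E-utilities esearch URL for GEO DataSets (db=gds),
# the same style of programmatic access that makes GenBank and PubMed so useful.
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def geo_search_url(term, retmax=20):
    """Return an esearch URL querying the GEO DataSets database.

    term   -- an Entrez query string (illustrative example below)
    retmax -- maximum number of record IDs to return
    """
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "?" + urlencode(params)

# Hypothetical example query: published array-based expression studies
# of breast cancer, findable without running a single new chip.
url = geo_search_url('breast cancer AND "expression profiling by array"')
print(url)
```

Fetching that URL returns a JSON list of GEO accession IDs, which can then be retrieved in full via `efetch` or the GEO web interface; the point is simply that the experiment someone else already ran is a query away.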