How things have changed

It is often hard to really see how things change when you are in the middle of it. We take for granted so much that was simply unattainable just a short while ago.

Web 2.0 tools allow the rapid prototyping of an idea for low cost. We can then work towards perfection by easily making modifications. An example.

YouTube allows us to easily access video created by other people. Great video can be made by almost anyone with a great idea and a strong vision. Matt Harding is a great example. He took some video he made while traveling and created something special. It has been watched about a million times on YouTube.

Simple idea. Dance the same dance around the world. The familiar mixed with the exotic. All in less than 3 minutes.

He expanded this, using some corporate sponsorship, into a 6-month trip through 39 countries, resulting in this video with an incredible opening. It was better.

It was not a big marketing agency that created this but a guy with a camera and an idea. The prototype demonstrated what would work, permitting another effort to enlarge the scope. This video has had over 10 million views since it came out 2 years ago.

Now he has a new one. It came from another idea. In the previous videos, he was the only one dancing. He wanted to include other people. So he went back to the corporate sponsors, pitched the idea to them and here is the result, in high definition.

Where the Hell is Matt? (2008) from Matthew Harding on Vimeo.

It has had almost 5 million views on YouTube since it was uploaded June 20. It does not have the spectacular opening of the second video but it has so much more humanity, from the opening, similar to the first video, to the people from around the world.

Seeing children and adults from every continent dancing ‘together’ is incredible. Simple yet so evocative.

Creativity can come from anywhere. The tools of innovation are so simple and cheap today that a much larger pool of talent can be accessed. Smart companies will access them.

Successful companies will help foster an environment where ideas can be seen, examined and modified as we work towards perfection.

Scientific community building

sand by …†∆†¡∆µ∆
Building scientific communities:
[Via business|bytes|genes|molecules]
Here is an interesting point that should be discussed more, especially with scientific community building (my bolding).

I will start with something I have quoted all too often

Data finds data, then people find people

That quote by Jon Udell, channeling Jeff Jonas is one that, to me at least, defines what the modern web is all about. Too many people tend to put the people first, but in the end without common data to commune around, there can be no communities.

A community needs a purpose to exist, a reason to come together. Some communities arise because of similar political or gardening interests. Most research communities come together for one major reason – to deal with data.

Now data simply exists, like grains of sand. It requires human interaction to gain context and become information. In social settings, this information can be transformed into the knowledge that allows a decision to be made, decisions such as ‘I need to redo the experiment’ or ‘I can now publish.’

It used to be possible for a single researcher, or a small number, to examine a single handful of sand in order to generate information needed to answer scientific questions. Now we have to examine an entire beach or even an entire coastline. A much larger group of people must now be brought together to provide context for this data in any reasonable timeframe.

However, standard approaches are too slow and cumbersome. When one group can add 45 billion bases of DNA sequence to the databases a week, the solution cycle has to be shortened.

Science is an intellectual pursuit, whether it is formal academic science or just casual common interest. That’s where all the tools available today come into the picture. The data has always been there. Whether at the backend, or at the front end, we can think about how to get everything together, but being able to discover and find some utility is very important. One of the reasons the informatics community seems to thrive online, apart from inherent curiosity and interest in such matters, is that we have a general set of interests to talk about, from programming languages, to tools, to methods, to just whining about the fact that we spend too much time data munging. Successful life science communities need that common ground. In a blog post, Egon talks about JMOL and CDK. Why would I participate in the CDK community, or the JMOL one? Cause I have some interest in using or modifying JMOL, or finding out more about the CDK toolkit and perhaps using it. Successful communities are the ones that can take this mutual interest around the data and bring together the people.

Part of what is being discussed here is a common language and interest that allows rapid interactions amongst a group. In some ways, this is not different than a bunch of people coalescing around a cult TV show and forming a community. A difference is that the latter is a way to transform information that has purely entertainment value.

The researchers are actually trying to get their work done. What Web 2.0 approaches do is permit scientists to come together in virtual ad hoc communities to examine large amounts of data and help transform that into knowledge. Instead of one handful at a time, buckets and truckloads of sand can be examined at one time, with a degree of intensity impossible for a small group.

The size and depth of these ad hoc communities, as well as their longevity, will depend on the size of the beach, just how much data must be examined. But I guarantee that there will always be more data to examine, even after publication.

So my advice to anyone building a scientific community (the one that jumped out at me during the workshop was the EcoliHub) is to think first about what the underlying data is that could bring people together. Data here is used in a general sense. Not just scientific raw data, but information and interests as well. Then try to figure out what the goals are that will make these people come together around the data and then figure out what the best mechanism for that might be. Don’t put the cart before the horse. In most such cases, you need a critical mass to make a community successful, to truly benefit from the wealth of networks. In science that’s often hard, so any misstep in step 1 will usually end up in a community that has little or no traction.

EcoliHub is a great example of a website in the wild that is supported almost entirely in an Open Source fashion. This is a nice way to create a very strong community focussed on a single, rich topic. On the wide open Internet, though, it may be harder for smaller communities to come into existence, simply because of how hard it might be for the individual members of the community to find one another.

But there are other processes allowing other communities to come together with smaller goals and more focussed needs. The decoupling of time and space seen with Web 2.0 approaches frees these groups from having to wait until the participants can occupy the same space at the same time. These groups can examine a large amount of data rapidly and move on. There is no need to assure the community that it will be around for a long time.

This is the sort of community that may be more likely to come into existence inside an organization. There are other pressures that drive the creation of these types of groups than simply a desire to talk with people of similar interests about some data.

A grant deadline for example.

Two a day

hard drive platters by oskay
15 human genomes each week:
[Via Eureka! Science News – Popular science news]

The Wellcome Trust Sanger Institute has sequenced the equivalent of 300 human genomes in just over six months. The Institute has just reached the staggering total of 1,000,000,000,000 letters of genetic code that will be read by researchers worldwide, helping them to understand the role of genes in health and disease. Scientists will be able to answer questions unthinkable even a few years ago and human medical genetics will be transformed.

Some of this is part of the 1000 Genomes Project, an effort to sequence that many human genomes. This will allow us to gain a tremendous amount of insight into just what it is that makes each of us different or the same.

All this PR really states is that they are now capable of sequencing about 45 billion base pairs of DNA a week, the equivalent of 15 human genomes. They are not directly applying all of that capability to the human genome. While they, or someone, possibly could, the groups involved with 1000 genomes will take a more statistical approach to speed things up and lower costs.
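The headline figure of 15 human genomes a week can be checked with a few lines of arithmetic. This is only a rough sketch: the genome size of 3 billion base pairs is a round approximation that I am assuming, not a number from the press release, and the function name is made up for illustration.

```python
# Rough throughput check. The genome size is an assumed round figure
# (about 3 billion base pairs), not a number from the press release.
HUMAN_GENOME_BASES = 3_000_000_000


def genomes_per_week(bases_per_week: int,
                     genome_bases: int = HUMAN_GENOME_BASES) -> float:
    """Convert a weekly base count into human-genome equivalents."""
    return bases_per_week / genome_bases


weekly = genomes_per_week(45_000_000_000)  # 45 billion bases in a week
daily = weekly / 7

print(f"{weekly:.0f} genome-equivalents per week, about {daily:.1f} per day")
```

Under that assumption, 45 billion bases a week is 15 genome-equivalents a week, or roughly two a day.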

It starts with in-depth sequencing of a couple of nuclear families (about 6 people). This will be high resolution sequencing equivalent to 20 passes of the entire genome of each. This level of redundancy will help edit out any sequencing errors from the techniques themselves. All these approaches will help the researchers get a better handle on the best processes to use.

The second step will look at 180 genomes but with only 2 sequencing passes. The high level sequence from the first step will serve as a template for the next 180. The goal here is to be able to rapidly identify sequence variation, not necessarily to make sure every nucleotide is sequenced. It is hoped that the detail learned from step 1 will allow them to be able to infer similar detail here without having to essentially re-sequence the same DNA another 18 times.
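To see how much work the two-pass approach saves, here is a back-of-the-envelope sketch. It counts work in "genome passes" (one pass over one genome), which is my own simplification, and it treats "about 6 people" as exactly 6.

```python
def genome_passes(num_genomes: int, passes_each: int) -> int:
    """Total sequencing effort, counted as passes over one whole genome."""
    return num_genomes * passes_each


# Step 1: about 6 people (assumed to be exactly 6 here) at 20 passes each.
step1 = genome_passes(6, 20)

# Step 2: 180 genomes at only 2 passes each.
step2 = genome_passes(180, 2)

# Hypothetical alternative: the same 180 genomes at the full 20 passes.
step2_deep = genome_passes(180, 20)

saved = step2_deep - step2

print(f"step 1: {step1} passes, step 2: {step2} passes")
print(f"passes avoided in step 2: {saved} (18 per genome)")
```

By this count, step 2 costs only three times as much as step 1 even though it covers 30 times as many people.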

Once they have these approaches worked out, and have an idea of the level of genetic variation expected to be seen, they will examine just the gene coding regions of about 1000 people. This will inform them of how best to proceed to get a more detailed map of an individual’s genome.

This is because the actual differences between any two humans’ DNA sequences are expected to be quite low. So they want to identify processes that will highlight these differences as rapidly and effectively as possible.

They were hoping to be sequencing the equivalent of 2 human genomes a day and they are not too far off of that mark. At the end of this study, they will have sequenced and deposited into databases 6 trillion bases (a 6 followed by 12 zeroes). In December 2007, GenBank, the largest American database had a total of 84 billion bases (84 followed by 9 zeroes) that took 25 years to produce.

So this effort will add about 70 times as much DNA sequence to databases as has already been deposited! It plans to do this in only 2 years. The databases, and the tools to examine them, will have to adapt to this huge influx of data.
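The comparison with GenBank is easy to work through. The sketch below uses only the figures quoted above; averaging GenBank's growth evenly over 25 years is my own crude simplification, since the database actually grew much faster in its later years.

```python
PROJECT_BASES = 6_000_000_000_000  # 6 trillion bases planned for the project
GENBANK_BASES = 84_000_000_000     # 84 billion bases in GenBank, December 2007
PROJECT_YEARS = 2
GENBANK_YEARS = 25

# How many GenBanks' worth of sequence the project will add.
size_ratio = PROJECT_BASES / GENBANK_BASES

# Average deposit rates in bases per year (a flat average, which is an
# assumption: real database growth was far from even).
project_rate = PROJECT_BASES / PROJECT_YEARS
genbank_rate = GENBANK_BASES / GENBANK_YEARS
rate_ratio = project_rate / genbank_rate

print(f"size ratio: {size_ratio:.1f}x")
print(f"average yearly rate ratio: {rate_ratio:.0f}x")
```

The project's output works out to roughly 71 times the size of GenBank at the end of 2007, produced at an average yearly rate nearly 900 times higher.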

And, more importantly, the scientists doing the examining will have to appreciate the sheer size of this. It took 13 years to complete the Human Genome Project. Now, 5 years after that project was completed, we can potentially sequence a single human genome in half a day.

The NIH had projected that technology will support sequencing a single human genome in 1 day for under $1000 in 4 years or so. The members of 1000 genomes are hoping to be able to accomplish their work for $30,000-50,000 per genome. So, the NIH projection may not be too far off.

But what will the databases look like that store and manipulate this huge amount of data? The Sanger Institute is generating 50 Terabytes of data a week, according to the PR.

Maybe I should invest in data storage companies.
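To put the storage figure in perspective, here is a minimal sketch. It assumes decimal units (1 terabyte = 10^12 bytes), a steady 50 terabytes every week, and the same round 15 genomes of 3 billion bases per week from the headline; none of those assumptions come from the press release itself.

```python
TB = 10**12  # decimal terabyte (assumed; the PR does not say which unit)
PB = 10**15  # decimal petabyte

WEEKLY_BYTES = 50 * TB
WEEKS_PER_YEAR = 52

# Headline throughput: 15 genomes a week at an assumed 3 billion bases each.
BASES_PER_WEEK = 15 * 3_000_000_000

yearly_pb = WEEKLY_BYTES * WEEKS_PER_YEAR / PB
two_year_pb = 2 * yearly_pb

# Raw data generated per finished base, if both figures cover the same week.
bytes_per_base = WEEKLY_BYTES / BASES_PER_WEEK

print(f"{yearly_pb:.1f} PB per year, {two_year_pb:.1f} PB over two years")
print(f"about {bytes_per_base:.0f} bytes of data per sequenced base")
```

If those assumptions hold, that is more than a kilobyte of stored data for every base, which suggests most of the volume is raw instrument output and not the finished sequence.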

Using other scientific disciplines

fractal fern by SantaRosa OLD SKOOL
Three Thoughts on Interdisciplinary Research:
[Via Michael Jubb’s blog]
Comments on Michael’s three thoughts following some meetings he has attended recently:

The first was a suggestion, perhaps a hypothesis, that interdisciplinary research will lead (has led?) to an increase in researchers’ interest in open access. The thought here is that researchers in some disciplines (notably some areas of the biosciences) are more inclined to adopt some form of open access in publishing their work; and that as researchers from other disciplines less inclined to open access join with, say, bioscientists in their research, they will be introduced to open access ways of thought. It seems a plausible hypothesis, and one that could fairly easily be tested. Does interdisciplinary research feature particularly prominently in OA journals, or in the contents of repositories?

I think part of this is that working in an interdisciplinary fashion fosters openness. That is, such researchers are often working in and relying on access to scientific disciplines other than the one they were trained in. If they cannot access research from a discipline, they will not really be able to work in that discipline.

It would seem likely that collaborative efforts would most easily flow to those areas that foster open communication with collaborators. It is hard to be multidisciplinary if there is not open collaboration with others. Thus open access becomes part of the culture of multidisciplinary research.

The second thought comes from a presentation by Carol Tenopir of the findings of the latest Tenopir and King reader surveys. One of the interesting findings is that interdisciplinary researchers are more likely than other researchers to follow citation links as their means of getting access to journal articles; and that the latest article they have read is more likely to be in digital, as distinct from print, format. Why that should be is perhaps worth some investigation.

Online is all about finding information quickly, incorporating it into the local community and then using it to create knowledge to make decisions. Rapid analysis followed by community synthesis. The collaborative cycle cranks much faster when online tools and Web 2.0 approaches are used. This allows multidisciplinary efforts to be launched that would be virtually impossible without these tools. This pace of collaboration cannot be sustained using paper.

The third thought comes from a presentation by Mayur Amin of Elsevier about surveys of usage of journals in Science Direct. One of the interesting findings here is that while for researchers in physics and maths, 70% or more of usage is of journals within the discipline, for researchers in other disciplines, including chemistry and environmental sciences, usage of journals within the discipline is at less than half that level. This may of course be an effect of the way in which Elsevier classify the journals. But it is at least open to the suggestion that researchers in some disciplines are more inclined to read beyond their own discipline. Is this evidence that some disciplines are more interdisciplinary than others? Is this something worth investigating?

One hypothesis is based on the hierarchy of science and the natural world. Math as a discipline is the most abstract; it can exist without any real need to be part of any other discipline but almost every other discipline needs math. Physics then comes next. It needs math to describe itself but little else other than physics.

Then come chemistry and biology. Each level down involves less abstraction and closer dealings with the natural world. Each requires more and more simple experimentation and observation. Physics has gedanken experiments, which come close to the Greek ideal of not needing to do any experimentation. Math needs no experiments at all and can be done simply in one’s head.

I’m stretching a point but to really understand biology, you need to at least be familiar with chemistry, with physics and with math (not necessarily comfortable since I often think some people go into biology because the math requirements in college are easier than for physics). Physics, though, does not really require a knowledge of chemistry or biology. So, perhaps, this need to understand other fields in order to be trained in biology instills a little more attraction to interdisciplinary approaches, as reflected in the journal usage reported by Elsevier.

Or maybe it is just sampling error.
