Scientific community building

Building scientific communities:
[Via business|bytes|genes|molecules]
Here is an interesting point that deserves more discussion, especially in the context of scientific community building (my bolding).

I will start with something I have quoted all too often:

Data finds data, then people find people

That quote by Jon Udell, channeling Jeff Jonas, is one that, to me at least, defines what the modern web is all about. Too many people put the people first, but in the end, without common data to commune around, there can be no communities.

A community needs a purpose to exist, a reason to come together. Some communities arise from shared political or gardening interests. Most research communities come together for one major reason: to deal with data.

Now data simply exists, like grains of sand. It requires human interaction to gain context and become information. In social settings, this information can be transformed into the knowledge that allows a decision to be made, decisions such as ‘I need to redo the experiment’ or ‘I can now publish.’

It used to be possible for a single researcher, or a small group, to examine a single handful of sand in order to generate the information needed to answer scientific questions. Now we have to examine an entire beach or even an entire coastline. A much larger group of people must be brought together to provide context for this data in any reasonable timeframe.

However, the standard ways of bringing researchers together are too slow and cumbersome. When one group can add 45 billion bases of DNA sequence to the databases each week, the solution cycle has to be shortened.

Science is an intellectual pursuit, whether it is formal academic science or just casual common interest. That's where all the tools available today come into the picture. The data has always been there. Whether at the back end or the front end, we can think about how to get everything together, but being able to discover it and find some utility in it is very important. One of the reasons the informatics community seems to thrive online, apart from inherent curiosity and interest in such matters, is that we have a general set of interests to talk about, from programming languages to tools to methods, to just whining about how much time we spend data munging. Successful life science communities need that common ground. In a blog post, Egon talks about JMOL and CDK. Why would I participate in the CDK community, or the JMOL one? Because I have some interest in using or modifying JMOL, or in finding out more about the CDK toolkit and perhaps using it. Successful communities are the ones that can take this mutual interest around the data and bring the people together.

Part of what is being discussed here is a common language and interest that allows rapid interactions within a group. In some ways, this is no different from a bunch of people coalescing around a cult TV show and forming a community. One difference is that the latter transforms information that has purely entertainment value.

The researchers are actually trying to get their work done. What Web 2.0 approaches do is permit scientists to come together in virtual ad hoc communities to examine large amounts of data and help transform it into knowledge. Instead of one handful at a time, buckets and truckloads of sand can be examined at once, with a degree of intensity impossible for a small group.

The size and depth of these ad hoc communities, as well as their longevity, will depend on the size of the beach, that is, on just how much data must be examined. But I guarantee that there will always be more data to examine, even after publication.

So my advice to anyone building a scientific community (the one that jumped out at me during the workshop was EcoliHub) is to first think about the underlying data that could bring people together. Data here is used in a general sense: not just raw scientific data, but information and interests as well. Then try to figure out the goals that will make these people come together around the data, and then figure out the best mechanism for that. Don't put the cart before the horse. In most such cases, you need a critical mass to make a community successful, to truly benefit from the wealth of networks. In science that's often hard, so any misstep in step 1 will usually result in a community with little or no traction.

EcoliHub is a great example of a website in the wild that is supported almost entirely in an Open Source fashion. This is a nice way to create a very strong community focussed on a single, rich topic. On the wide open Internet, though, it may be harder for smaller communities to come into existence, simply because of how hard it might be for the individual members of the community to find one another.

But there are other processes that allow other communities to come together with smaller goals and more focussed needs. The decoupling of time and space seen with Web 2.0 approaches frees these groups from having to wait until the participants can occupy the same space at the same time. These groups can examine a large amount of data rapidly and move on. There is no need to assure the community that it will be around for a long time.

This is the sort of community that may be more likely to come into existence inside an organization. Pressures other than a simple desire to talk with people of similar interests about some data drive the creation of these types of groups.

A grant deadline, for example.
