Next Generation Sequencing: Using Hadoop for Transcriptomics

Monday, August 8, 2011

Using Hadoop for Transcriptomics

Those of us trying to analyze next-gen sequencing data often feel constrained by the availability of computing power. Buying a very large computer (‘large’ measured by RAM size, not body mass index) is the most conventional solution, but it comes with a hefty price tag. Many institutions have already invested heavily in distributed computing centers, and they encourage users to take full advantage of those existing resources.

Typically, distributed systems in computing clusters and supercomputing centers implement an MPI-based architecture for parallel computing. Another type of distributed architecture, Hadoop/MapReduce, has become popular among internet companies that process terabytes of data. Hadoop is accessible to bioinformaticians through the Amazon cloud (Elastic MapReduce), but many researchers do not understand what advantage Hadoop would provide over a conventional parallel architecture. Here we explain the difference with simple examples.

