Those of us trying to analyze next-gen sequencing data often feel constrained by the availability of computing power. Buying a very large computer (‘large’ measured by RAM size, not body mass index) is the most conventional solution, but that solution comes with a hefty price tag. Many institutions already invested heavily into distributed computing centers, and they encourage users to take full advantage of the existing resources.
Typically, distributed systems in computing clusters and supercomputing centers implement MPI-based architecture for parallel computing. Another type of distributed architecture named Hadoop/MapReduce has become popular among the internet companies processing terabytes of data. Hadoop is accessible to bioinformaticians through Amazon cloud (Elastic MapReduce), but many researchers do not understand what advantage Hadoop would provide over conventional parallel architecture. Here we explain the difference with simple examples.
Continue here
Monday, August 8, 2011
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.