Thursday, July 28, 2011

De Bruijn graphs - I

New algorithms for short read assembly (categories B and D) often use de Bruijn graphs to store and represent sequence data. What is a de Bruijn graph and why is it so popular for analyzing short read sequences? We will explain the concept here.

A de Bruijn graph is an efficient way to represent a sequence in terms of its k-mer components. Although de Bruijn graphs can be used for a broad range of problems, our discussion will be limited to nucleotide sequences. Most papers talk about constructing de Bruijn graphs from short reads and then deriving the genome sequence from the graph. For simplicity, we will first introduce the de Bruijn graph of a genome, and then explain how short reads fit into the picture.

A de Bruijn graph can be constructed for any sequence, short or long. Here is a simple example -
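To make the idea concrete, here is a minimal Python sketch of one common way to build such a graph from a single sequence. It is only an illustration under stated assumptions: each distinct k-mer is treated as a node, and an edge joins two k-mers that appear next to each other in the sequence (so they overlap by k-1 letters). The function names `kmers` and `de_bruijn_graph` and the toy sequence are ours, chosen for this example; other write-ups use (k-1)-mers as nodes and k-mers as edges instead.

```python
def kmers(sequence, k):
    """Return every k-mer of the sequence, in order of appearance."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def de_bruijn_graph(sequence, k):
    """Map each distinct k-mer (node) to the set of k-mers that follow it.

    Assumed convention: nodes are k-mers, and an edge links two k-mers
    that are adjacent in the sequence, i.e. that overlap by k-1 letters.
    """
    words = kmers(sequence, k)
    # Every k-mer becomes a node, including the last one (no outgoing edge).
    graph = {word: set() for word in words}
    for left, right in zip(words, words[1:]):
        graph[left].add(right)
    return graph


# Toy sequence (hypothetical) in which the 3-mer ATG occurs twice.
graph = de_bruijn_graph("ATGATGC", 3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Because ATG occurs twice in the toy sequence, it is stored as a single node with two outgoing edges. Repeated k-mers collapsing into one node is what keeps the representation compact, and it is also where the ambiguity in reconstructing the original sequence comes from.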

