Wednesday, August 17, 2011

A Bird's Eye View of NCBI GEO Database - II

We presented the structure of the NCBI GEO database in our earlier commentary, "A Bird's Eye View of NCBI GEO Database". Today we will inspect the contents of the GEO database more closely.

As we explained earlier, GEO data sets are organized in terms of both GPLs (platforms, i.e. array designs) and GSEs (series, i.e. collections of many measurements on one or more array designs). As an example, GPL570 is the human gene array designed by Affymetrix. At the NCBI GEO database, all experiments using that array can be downloaded together from the GPL570 link. A GSE ID, on the other hand, typically represents all data from a researcher related to one publication. A GSE may include any number of platforms (GPLs), depending on how the experiment was designed.

In the following chart, we show the most popular GPLs, i.e. the ones used by the highest number of GSEs. GPL570 is clearly the winner, closely followed by GPL1261 (Affymetrix mouse array). Each of those arrays was used by over 1,000 publications. GEO also assigned a single GPL ID per organism to all Illumina short read submissions. Those sets (GPL9052, GPL9058, etc.) are catching up fast given their limited history.
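For readers who want to reproduce this kind of ranking, here is a minimal sketch of the counting step. It assumes you have already built a table that maps each GSE to the GPLs it uses. The series accessions below (GSE_A, GSE_B, ...) are made-up placeholders, and the counts are illustrative only, not actual GEO numbers. The function name rank_platforms is ours.

```python
from collections import Counter

# Hypothetical GSE -> GPL mapping. The series IDs are placeholders and the
# platform assignments are invented for illustration, not real GEO records.
series_platforms = {
    "GSE_A": ["GPL570"],
    "GSE_B": ["GPL570", "GPL1261"],  # one series can span several platforms
    "GSE_C": ["GPL1261"],
    "GSE_D": ["GPL570"],
    "GSE_E": ["GPL9052"],
}


def rank_platforms(series_to_platforms):
    """Return (GPL, number of GSEs using it) pairs, most used first."""
    counts = Counter()
    for platforms in series_to_platforms.values():
        # Use a set so a platform listed twice in one series counts once.
        counts.update(set(platforms))
    return counts.most_common()


ranking = rank_platforms(series_platforms)
for gpl, n_series in ranking:
    print(gpl, n_series)
```

Because a GSE can include several GPLs, a single series adds one count to each platform it uses, so the totals across platforms can exceed the number of series.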

