Genome Biology - Most accessed articles http://genomebiology.com The most accessed research articles published by Genome Biology 2010-06-02T00:00:00Z This is an RSS newsfeed from BioMed Central. It is intended to be used with an RSS reader. For more information about RSS newsfeeds from BioMed Central, visit http://www.biomedcentral.com/info/about/rss/ Screening the human exome: a comparison of whole genome and whole transcriptome sequencing Background: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three approaches possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost-effective alternatives are important. Results: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high coverage whole-genome sequencing to those identified by high coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well-expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage.
Conclusions: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels. http://genomebiology.com/2010/11/5/R57 Elizabeth Cirulli Abanish Singh Kevin Shianna Dongliang Ge Jason Smith Jessica Maia Erin Heinzen James Goedert David Goldstein The Center for HIV/AIDS Vaccine Immunology (CHAVI) Genome Biology 2010, 11:R57 2010-05-28T00:00:00Z doi:10.1186/gb-2010-11-5-r57 Genome Biology 1465-6906 11 R57 2010-05-28T00:00:00Z XML The case for cloud computing in genome informatics With DNA sequencing now getting cheaper more quickly than data storage or computation, the time may have come for genome informatics to migrate to the cloud. http://genomebiology.com/2010/11/5/207 Genome Biology 2010, 11:207 2010-05-05T00:00:00Z doi:10.1186/gb-2010-11-5-207 Genome Biology 1465-6906 11 207 2010-05-05T00:00:00Z XML A human functional protein interaction network and its application to cancer data analysis Background: One challenge facing biologists is to tease out useful information from massive data sets for further analysis. A pathway-based analysis may shed light by projecting candidate genes into protein functional relationship networks. We are building such a pathway-based analysis system. Results: We have constructed a protein functional interaction network by extending curated pathways with non-curated sources of information including protein-protein interactions, gene coexpression, protein domain interaction, gene ontology annotations and text mined protein interactions, which covers close to 50% of the human proteome. 
By applying this network to two glioblastoma multiforme (GBM) data sets and projecting cancer candidate genes onto the network, we found that the majority of GBM candidate genes form a cluster and are closer than expected by chance, and the majority of GBM samples have sequence-altered genes in two network modules, one mainly comprising genes whose products are localized in the cytoplasm and plasma membrane, and another comprising gene products in the nucleus. Both modules are highly enriched in known oncogenes, tumor suppressors and genes involved in signal transduction. Similar network patterns were also found in breast, colorectal and pancreatic cancers. Conclusions: We have built a highly reliable functional interaction network upon expert-curated pathways and applied this network to the analysis of two genome-wide GBM and several other cancer data sets. The network patterns revealed from our results suggest common mechanisms in cancer biology. Our system should provide a foundation for a network or pathway-based analysis platform for cancer and other diseases. http://genomebiology.com/2010/11/5/R53 Guanming Wu Xin Feng Lincoln Stein Genome Biology 2010, 11:R53 2010-05-19T00:00:00Z doi:10.1186/gb-2010-11-5-r53 Genome Biology 1465-6906 11 R53 2010-05-19T00:00:00Z PDF Evidence for natural antisense transcript-mediated inhibition of microRNA function Background: MicroRNAs (miRNAs) have the potential to regulate diverse sets of protein targets. In addition, mammalian genomes contain numerous natural antisense transcripts, most of which also appear to be non-protein-coding RNAs (ncRNAs). We have recently identified and characterized a highly conserved non-coding antisense transcript for beta-secretase-1 (BACE1), a critical enzyme in Alzheimer's disease pathophysiology. The BACE1-antisense transcript is markedly up-regulated in brain samples from Alzheimer's disease patients and promotes the stability of the (sense) BACE1 transcript.
Results: We report here that BACE1-antisense prevents miRNA-induced translational repression of BACE1 mRNA by masking the binding site for miR-485-5p. Indeed, miR-485-5p and BACE1-antisense compete for binding within the same region in the open reading frame of the BACE1 mRNA. We observed opposing effects of BACE1-antisense and miR-485-5p on BACE1 protein in vitro and showed that Locked Nucleic Acid-antimiR-mediated knockdown of miR-485-5p as well as BACE1-antisense over-expression can prevent the miRNA-induced translational suppression. The expression of BACE1-antisense as well as miR-485-5p was shown to be dysregulated in RNA samples from Alzheimer's disease subjects as compared to control individuals. Conclusion: Our data demonstrate an interface between two distinct groups of regulatory RNAs in the computation of BACE1 gene expression. Moreover, bioinformatics analyses revealed a theoretical basis for many other potential interactions between natural antisense transcripts and miRNAs at the binding sites of the latter. http://genomebiology.com/2010/11/5/R56 Mohammad Faghihi Ming Zhang Jia Huang Farzaneh Modarresi Marcel Van der Brug Michael Nalls Mark Cookson Georges St-Laurent Claes Wahlestedt Genome Biology 2010, 11:R56 2010-05-27T00:00:00Z doi:10.1186/gb-2010-11-5-r56 Genome Biology 1465-6906 11 R56 2010-05-27T00:00:00Z PDF Towards a comprehensive structural variation map of an individual human genome Background: Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (<10 bp) insertion/deletions (indels), the annotation of larger structural variants has been less comprehensive. It is still unclear to what extent a typical genome differs from the reference assembly, and analyses of the genomes sequenced to date have shown varying results for copy number variation (CNV) and inversions.
Results: We have combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis, and detect 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by approximately 1.2% when considering indels/CNVs, 0.1% by SNPs and approximately 0.3% by inversions. The structural variants impact 4,867 genes, and >24% of structural variants would not be imputed by SNP-association. Conclusions: Our results indicate that a large number of structural variants have been unreported in the individual genomes published to date. This significant extent and complexity of structural variants, as well as the growing recognition of their medical relevance, necessitate they be actively studied in health-related analyses of personal genomes. The new catalogue of structural variants generated for this genome provides a crucial resource for future comparison studies. http://genomebiology.com/2010/11/5/R52 Andy Pang Jeffrey MacDonald Dalila Pinto John Wei Muhammad Rafiq Donald Conrad Hansoo Park Matthew Hurles Charles Lee J Craig Venter Ewen Kirkness Samuel Levy Lars Feuk Stephen Scherer Genome Biology 2010, 11:R52 2010-05-19T00:00:00Z doi:10.1186/gb-2010-11-5-r52 Genome Biology 1465-6906 11 R52 2010-05-19T00:00:00Z XML The role of transposable elements in the evolution of non-mammalian vertebrates and invertebrates Background: Transposable elements (TEs) have played an important role in the diversification and enrichment of mammalian transcriptomes through various mechanisms such as exonization and intronization (the birth of new exons/introns from previously intronic/exonic sequences, respectively), and insertion into first and last exons. 
However, no extensive analysis has compared the effects of TEs on the transcriptomes of mammals, non-mammalian vertebrates and invertebrates. Results: We analyzed the influence of TEs on the transcriptomes of five species, three invertebrates and two non-mammalian vertebrates. Compared to previously analyzed mammals, there were lower levels of TE introduction into introns, significantly lower numbers of exonizations originating from TEs and a lower percentage of TE insertion within the first and last exons. Although the transcriptomes of vertebrates exhibit significant levels of exonization of TEs, only anecdotal cases were found in invertebrates. In vertebrates, as in mammals, the exonized TEs are mostly alternatively spliced, indicating that selective pressure maintains the original mRNA product generated from such genes. Conclusions: Exonization of TEs is widespread in mammals, less so in non-mammalian vertebrates, and very low in invertebrates. We assume that the exonization process depends on the length of introns. Vertebrates, unlike invertebrates, are characterized by long introns and short internal exons. Our results suggest that there is a direct link between the length of introns and exonization of TEs and that this process became more prevalent following the appearance of mammals. http://genomebiology.com/2010/11/6/R59 Noa Sela Eddo Kim Gil Ast Genome Biology 2010, 11:R59 2010-06-02T00:00:00Z doi:10.1186/gb-2010-11-6-r59 Genome Biology 1465-6906 11 R59 2010-06-02T00:00:00Z XML Ultrafast and memory-efficient alignment of short DNA sequences to the human genome Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. 
Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source (http://bowtie.cbcb.umd.edu). http://genomebiology.com/2009/10/3/R25 Ben Langmead Cole Trapnell Mihai Pop Steven Salzberg Genome Biology 2009, 10:R25 2009-03-04T00:00:00Z doi:10.1186/gb-2009-10-3-r25 Genome Biology 1465-6906 10 R25 2009-03-04T00:00:00Z XML Between a chicken and a grape: estimating the number of human genes Many people expected the question 'How many genes in the human genome?' to be resolved with the publication of the genome sequence in 2001, but estimates continue to fluctuate. http://genomebiology.com/2010/11/5/206 Genome Biology 2010, 11:206 2010-05-05T00:00:00Z doi:10.1186/gb-2010-11-5-206 Genome Biology 1465-6906 11 206 2010-05-05T00:00:00Z XML Modeling non-uniformity in short-read rates in RNA-Seq data After mapping, RNA-Seq data can be summarized by a sequence of read counts commonly modeled as Poisson variables with constant rates along each transcript, an assumption that actually fits the data poorly. We suggest using variable rates for different positions, and propose two models to predict these rates based on local sequences. These models explain more than 50% of the variations and can lead to improved estimates of gene and isoform expressions for both Illumina and Applied Biosystems data.
http://genomebiology.com/2010/11/5/R50 Jun Li Hui Jiang Wing Hung Wong Genome Biology 2010, 11:R50 2010-05-11T00:00:00Z doi:10.1186/gb-2010-11-5-r50 Genome Biology 1465-6906 11 R50 2010-05-11T00:00:00Z XML Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes Background: Gene-expression analysis is increasingly important in biological research, with real-time reverse transcription PCR (RT-PCR) becoming the method of choice for high-throughput and accurate expression profiling of selected genes. Given the increased sensitivity, reproducibility and large dynamic range of this methodology, the requirements for a proper internal control gene for normalization have become increasingly stringent. Although housekeeping gene expression has been reported to vary considerably, no systematic survey has properly determined the errors related to the common practice of using only one control gene, nor presented an adequate way of working around this problem. Results: We outline a robust and innovative strategy to identify the most stably expressed control genes in a given set of tissues, and to determine the minimum number of genes required to calculate a reliable normalization factor. We have evaluated ten housekeeping genes from different abundance and functional classes in various human tissues, and demonstrated that the conventional use of a single gene for normalization leads to relatively large errors in a significant proportion of samples tested. The geometric mean of multiple carefully selected housekeeping genes was validated as an accurate normalization factor by analyzing publicly available microarray data. Conclusions: The normalization strategy presented here is a prerequisite for accurate RT-PCR expression profiling, which, among other things, opens up the possibility of studying the biological relevance of small expression differences. 
http://genomebiology.com/2002/3/7/research/0034 Jo Vandesompele Katleen De Preter Filip Pattyn Bruce Poppe Nadine Van Roy Anne De Paepe Frank Speleman Genome Biology 2002, 3:research0034 2002-06-18T00:00:00Z doi:10.1186/gb-2002-3-7-research0034 Genome Biology 1465-6906 3 research0034 2002-06-18T00:00:00Z XML
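The normalization idea in the last abstract above (dividing a target gene's expression by the geometric mean of several control genes) can be illustrated with a short sketch. This is not the authors' implementation: it omits their procedure for selecting the most stable control genes, and the gene names and quantities below are hypothetical values chosen only for illustration.

```python
import math


def normalization_factor(control_quantities):
    """Return the geometric mean of the control genes' relative quantities.

    All quantities must be strictly positive, because the geometric mean
    is computed here through logarithms.
    """
    if not control_quantities:
        raise ValueError("at least one control gene quantity is required")
    if any(q <= 0 for q in control_quantities):
        raise ValueError("relative quantities must be strictly positive")
    log_sum = sum(math.log(q) for q in control_quantities)
    return math.exp(log_sum / len(control_quantities))


def normalize(target_quantity, control_quantities):
    """Divide a target gene's quantity by the normalization factor."""
    return target_quantity / normalization_factor(control_quantities)


# Hypothetical relative quantities for two control genes in one sample.
controls = {"control_gene_1": 2.0, "control_gene_2": 8.0}

# Geometric mean of 2.0 and 8.0 is 4.0 (up to floating-point rounding).
nf = normalization_factor(list(controls.values()))

# A hypothetical target gene measured at 12.0 normalizes to about 3.0.
normalized = normalize(12.0, list(controls.values()))
```

A geometric mean is less sensitive than an arithmetic mean to a single control gene with an outlying value, which is one plausible reason to prefer it when combining control genes of very different abundance.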