Our work focuses on a number of theoretical and applied problems that are of relevance to genomics, genetics, molecular biology and medicine. We have been a) developing algorithms that are generic in nature and, consequently, applicable to a very large number of the problems one encounters in the field of Life Sciences; and, b) using these algorithms in novel ways in order to answer many of the questions faced by basic researchers and clinicians. Our ultimate goal is to improve the understanding of the regulation of cellular processes.
Through the work of many people in the worldwide community, the last decade has witnessed an avalanche of discoveries that led to a renewed appreciation of the complex regulatory circuits governing cellular processes. Among others, these discoveries included unexpected interactions and novel regulatory elements (both active molecules and targets) whose sequences can be either phylogenetically conserved or organism-specific.
Several years ago, we published a novel method, rna22, for computationally predicting the targets of microRNAs and for predicting novel microRNA precursors and their mature microRNAs (Cell 2006). As we described in that manuscript, extensive experimentation and application of the rna22 algorithm to the full-length mRNAs from several genomes provided strong support for a paradigm whereby a) a microRNA can target thousands of different genes, b) a microRNA can target outside the 3’UTR, and c) a microRNA target site need not be conserved across genomes. In follow-up work, we provided additional evidence in support of the last two statements by showing that several microRNAs target the Nanog, Oct4 and Sox2 transcription factors within their amino acid coding regions and at locations that are not conserved in the human orthologues of these factors (Nature 2008).
In parallel work, we linked genic space from nearly every single known gene in the human and mouse genomes to numerous non-coding regions within the rest of the genome’s vast ‘uncharted real estate’ (PNAS 2006). The key insight was the discovery of specific sequence motifs, the “pyknons”, which are shared between the exons of protein-coding genes and intergenic/intronic regions, and the recognition that these motifs participate in unanticipated RNA-driven regulatory interactions.
In follow-up work, we showed that very large portions of intronic space in the human and mouse genomes that are covered by the pyknons participate in the same functional associations in the two genomes in the absence of sequence conservation (Nucl Acids Res 2008). This line of research correctly presaged the existence of novel classes of short RNAs, including the subsequently discovered class of piRNAs, the recently discovered dual role of messenger RNA as both a source of short RNAs with regulatory roles and a target of short RNAs, and the targeting of non-coding ‘intronic’ sequences by coding ‘exonic’ short RNAs with approximately antisense sequences. More recently, we also showed that the Alu (primates) and B1 (rodents) categories of repeat elements have been selectively retained in the regions immediately surrounding the TSS of the same gene groups in the human and mouse genomes, suggesting possible regulatory roles (PLoS Comp Biology 2009).
Our own findings, and those of colleagues, paint a picture of cell process regulation that is far more complex than one might have anticipated only a few years ago. With that in mind, we will continue to develop new approaches for discovering previously unidentified regulatory elements, for understanding the rules that underlie the interactions in which they are involved, and for shedding light on the identity and the specifics of the mechanisms that employ these elements and interactions.
In recent years, there has been widespread use of the term “Systems Biology” in the open literature. Our working definition of the term is as follows: “systems biology is an integrated approach that brings together and leverages theoretical, experimental and computational approaches in order to establish connections among important molecules or groups of molecules so as to aid the eventual mechanistic explanation of cellular processes and systems.” We view systems biology as an endeavor that aims to uncover concrete molecular relationships for targeted analysis through the interpretation of cellular phenotype in terms of integrated biomolecular networks. The fidelity and breadth of a given network and its state characterization are intimately related to the degree of our understanding of the system under study.
Systems biology is expressly cross-disciplinary in nature and its domain of study spans a hierarchy of organismal organization levels, with each level comprising units diverse in nature (e.g. genes, proteins, pathways, organelles, etc.). Assuming a comprehensive list of these units, systems biology seeks to characterize their static and dynamic behavior as well as the complex inter- and intra-level relationships in which they participate. The eventual reward is the building of a “holistic view” of the organism under study that is expected in turn to enhance our knowledge of the organism’s static and dynamic behavior.
Next generation sequencing is a generic term used to refer to novel methodologies for carrying out sequencing in a high-throughput manner. Platforms implementing these methodologies have been generating increasingly large outputs, which now range in the hundreds of millions of reads per sequencing run. Our involvement with the new technology began in 2005 and culminated in the development of PhyloPythia (Nature Methods 2007), an algorithm for the automated classification of variable-length sequence fragments assembled from the high-throughput sequencing of microbial metagenomes. PhyloPythia was used to analyze the microbial community of the hindgut of a wood-feeding termite (Nature 2007) as well as in several other projects (Nature Biotech 2006, Nature Biotech 2008). More recently, we have been using next generation sequencing as an enabling technology to help us address increasingly specific questions, such as characterizing the temporal changes of genome-wide methylation in differentiating human embryonic stem cells (Genome Research 2010), profiling the expression of short and long RNAs in different tissues and in normal and disease states, and investigating the binding patterns of important transcription factors and of RNA-binding proteins.