Our lab develops novel computational methods to study cellular biological systems from a global and data-driven perspective. We seek to exploit diverse, high-throughput functional and genomic data to understand the molecular networks underlying fundamental cellular processes, including regulation of transcription, pre-mRNA processing, signaling, and post-transcriptional gene silencing. Our algorithmic methods draw on machine learning, a computational field concerned with learning accurate, predictive models from noisy and high-dimensional data.
Modeling cell-type specific transcriptional programs
Gene regulatory programs in distinct cell types are maintained in large part through the cell-type-specific binding of transcription factors (TFs). Next-generation sequencing technology now allows us to generate unprecedented data on the cell-dependent binding sites of TFs and their chromatin context. We have recently introduced new, discriminatively trained models for capturing the subtle DNA sequence preferences of TFs, and we have shown that some TFs recognize cell-type-specific sequence signals, due for example to differences in the composition of the TF binding complex.
We are now using DNase-seq data in human and mouse primary cells to model the establishment, maintenance, and loss of regulatory regions during lineage development. Our analysis of DNase-seq data in the T cell lineage and of Foxp3 ChIP-seq in regulatory T cells (Tregs) revealed that many Foxp3-bound loci are DNase accessible not only in Tregs but also in CD4+ T cells, even though Foxp3 is expressed only in Tregs. These observations suggest a model in which the regulatory regions of some genes are made accessible and occupied by “place-holder TFs” at an earlier stage in differentiation.
Dissecting co- and post-transcriptional regulation
A strong focus of our lab is the computational analysis of post-transcriptional gene regulation, in particular microRNA-mediated gene regulation. We introduced the mirSVR target prediction method, computationally identified key targets of oncogenic and tumor suppressor microRNAs, and studied the system-level effects of competition of microRNAs for RISC and of targets for microRNAs. We are now exploiting recent next-generation RNA sequencing technologies to map co- and post-transcriptional regulatory events and to begin to dissect the layers of regulation that shape the transcriptome.
CLIP-seq combines cross-linking immunoprecipitation with high-throughput sequencing to identify the transcriptome-wide binding sites of RNA-binding proteins. We analyzed AGO CLIP-seq experiments in activated T cells from wild-type and miR-155 knockout mice and computationally identified miR-155-dependent AGO binding sites, providing a comprehensive target list for a key microRNA involved in immune system homeostasis. This study also demonstrated the prevalence of non-canonical target site patterns.
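To make the distinction between canonical and non-canonical sites concrete, the following minimal sketch classifies canonical seed matches (6mer, 7mer-A1, 7mer-m8, 8mer) in an RNA sequence. It is an illustration only, not the analysis used in the study: the function names are hypothetical, the miRNA 5' end `UUAAUGCU` is assumed to be that of miR-155-5p, and the example UTR is invented. A CLIP-supported binding site in which this scan finds nothing would be a candidate non-canonical site.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed_sites(mirna, utr):
    """Return (position, site_type) for each canonical seed match in utr.

    position is the 0-based start of the 6mer match (pairing with miRNA
    nucleotides 2-7). Both sequences are RNA, written 5'->3'.
    """
    seed_6mer = reverse_complement(mirna[1:7])
    match_to_nt8 = COMPLEMENT[mirna[7]]
    sites = []
    start = utr.find(seed_6mer)
    while start != -1:
        # Pairing with miRNA nucleotide 8 lies just upstream of the 6mer.
        m8 = start > 0 and utr[start - 1] == match_to_nt8
        # An adenosine opposite miRNA nucleotide 1 lies just downstream.
        a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if m8 and a1:
            site_type = "8mer"
        elif m8:
            site_type = "7mer-m8"
        elif a1:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"
        sites.append((start, site_type))
        start = utr.find(seed_6mer, start + 1)
    return sites


# Assumed 5' end of miR-155-5p; the UTR below is a made-up toy sequence.
MIR155_5P_START = "UUAAUGCU"
toy_utr = "CCAGCAUUAAGGGCAUUACC"
sites = classify_seed_sites(MIR155_5P_START, toy_utr)
print(sites)
```

In this toy sequence the first match carries both flanking features and the second carries neither, so the two sites fall at opposite ends of the canonical classification.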
In addition, we are carrying out a large-scale computational effort to map and characterize alternative cleavage and polyadenylation (ApA) events and aberrant ApA in cancer cells. To map 3’ ends transcriptome-wide at single-base resolution and to quantify the differential usage of 3’UTR isoforms between cell types, we have developed a computational and statistical analysis framework for 3’-seq, a new tag-based next-generation sequencing protocol. We have compiled an atlas of human 3’ cleavage events from 3’-seq experiments across numerous tissues and cell lines. Our analysis reveals coordinated and tissue-specific changes in relative 3’UTR isoform expression.
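As a rough illustration of what quantifying differential 3’UTR isoform usage can involve, the sketch below computes the relative usage of each cleavage site of one gene in two samples and a Pearson chi-square statistic for the resulting 2 x K table of read counts. This is a generic textbook test chosen for illustration, not the statistical framework described above; the function names and read counts are hypothetical.

```python
def relative_usage(counts):
    """Fraction of a gene's 3' end reads assigned to each cleavage site.

    Assumes at least one read in total.
    """
    total = sum(counts)
    return [count / total for count in counts]


def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x K table of per-site read counts.

    counts_a and counts_b list reads per cleavage site (same site order) in
    two samples. Assumes both samples have at least one read; sites with no
    reads in either sample are skipped.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in zip(counts_a, counts_b):
        site_total = a + b
        if site_total == 0:
            continue
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * site_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts for one gene with a proximal and a distal cleavage site.
tissue_a = [80, 20]
tissue_b = [30, 70]

usage_a = relative_usage(tissue_a)
usage_b = relative_usage(tissue_b)
statistic = chi_square_statistic(tissue_a, tissue_b)
print(usage_a, usage_b, round(statistic, 2))
```

With K sites the statistic has K - 1 degrees of freedom. A real analysis would also need to account for variability between replicates and for testing many genes at once, which this sketch ignores.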
Cancer systems biology
Cancer genomics projects are generating rich multimodal tumor-profiling data sets, but these data remain underused. To move forward, we need novel and statistically sound approaches that integrate multiple data types in a mechanistically informed way in order to dissect the molecular pathology of cancer. We have developed an integrative strategy to combine mRNA, copy number, and miRNA profiles together with regulatory elements to decipher transcriptional and miRNA-mediated regulatory programs in cancer. We use a statistical framework based on lasso regression to explain tumor versus normal expression changes in terms of direct regulators (microRNAs and TFs) of gene expression. Our study on a large glioblastoma data set identified novel potential regulators for the proneural subtype of glioblastoma.
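The idea of explaining expression changes through a sparse set of regulators can be sketched with a small lasso solver. The example below is a generic coordinate-descent implementation run on simulated data, not our actual pipeline: the feature matrix (a binary indicator of whether each hypothetical regulator targets each gene), the simulated expression changes, the penalty `alpha`, and all function names are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by threshold, setting small values to 0."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic updates."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    column_scale = (X ** 2).sum(axis=0) / n_samples
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(n_features):
            if column_scale[j] == 0.0:
                continue
            # Remove feature j's current contribution from the fit.
            residual += X[:, j] * w[j]
            rho = X[:, j] @ residual / n_samples
            w[j] = soft_threshold(rho, alpha) / column_scale[j]
            # Put the updated contribution back.
            residual -= X[:, j] * w[j]
    return w


# Simulated data: rows are genes, columns are hypothetical regulators.
rng = np.random.default_rng(0)
n_genes, n_regulators = 200, 10
X = rng.integers(0, 2, size=(n_genes, n_regulators)).astype(float)
true_w = np.zeros(n_regulators)
true_w[2] = 1.5    # simulated activator
true_w[7] = -2.0   # simulated repressor
y = X @ true_w + 0.1 * rng.standard_normal(n_genes)

# Center the data so that no intercept term is needed.
X_centered = X - X.mean(axis=0)
y_centered = y - y.mean()

weights = lasso_coordinate_descent(X_centered, y_centered, alpha=0.05)
selected = [j for j in range(n_regulators) if abs(weights[j]) > 1e-8]
print(selected, np.round(weights[selected], 2))
```

The L1 penalty drives the weights of uninformative regulators to exactly zero, which is what makes the fitted model readable as a short list of candidate regulators. The surviving weights are shrunk toward zero relative to the simulated values, a known property of the lasso.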
We are also developing computational methods to dissect the complex interplay between cancer cells and cells of the host microenvironment and to model how these diverse interactions contribute to cancer progression and invasion. Our data come from mouse models of primary tumors and metastasis as well as cell-based co-culture assays developed from these models. Our lab has been developing statistical models of tumor-stromal interactions using both standard expression profiling techniques and technologies tailored to the tumor microenvironment.