The main focus of our Computational and Structural Biology research group is the integration of two fields of research: genomics and structural biology. Our primary goal is to reveal as-yet-unknown molecular mechanisms underlying gene regulation by combining bioinformatics analyses of whole-genome high-throughput sequencing and DNA methylation data with experimental molecular biology approaches.
Professor Rohs can accept graduate students from the following Ph.D. Programs as primary thesis adviser: Computational Biology and Bioinformatics, Molecular Biology, Chemistry, Physics, and Computer Science.
Yang et al. Transcription factor family-specific DNA shape readout revealed by quantitative specificity models.
Mol. Syst. Biol. In press (2017)
We resequenced data from HT-SELEX experiments for 410 different transcription factors (TFs) from mouse and human, the most extensive mammalian TF–DNA binding data available to date, and demonstrated the contributions of DNA shape readout across diverse TF families and its importance in core motif flanking regions. Statistical machine-learning models combined with feature-selection techniques helped to reveal the nucleotide position-dependent DNA shape readout in TF-binding sites and the TF family-specific position dependence. Based on these results, we proposed novel DNA shape logos (Figure) to visualize the DNA shape preferences of TFs. This work suggests a way of obtaining mechanistic insights into TF–DNA binding without relying on experimentally solved all-atom structures.
Mathelier et al. DNA shape features improve transcription factor binding site predictions in vivo.
Cell Syst. 3, 278-286 (2016)
Interactions of transcription factors (TFs) with DNA comprise a complex interplay between base-specific amino acid contacts and readout of DNA structure. Recent studies have highlighted the complementarity of DNA sequence and shape in modeling TF binding in vitro. Here, we have provided a comprehensive evaluation of in vivo datasets to assess the predictive power obtained by augmenting various DNA sequence-based models of TF binding sites (TFBSs) with DNA shape features. Results from 400 human ChIP-seq datasets for 76 TFs show that combining DNA shape features with position-specific scoring matrix (PSSM) scores improves TFBS predictions. Improvement has also been observed using TF flexible models and a machine-learning approach using a binary encoding of nucleotides in lieu of PSSMs.
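For readers unfamiliar with PSSM scores, the following is a minimal Python sketch of how a log-odds PSSM can be built from aligned sites and used to scan a sequence. It is an illustration only, not the pipeline used in the paper: the function names, the pseudocount, the uniform background frequency, and the example sites are all assumptions made for this sketch. In a shape-augmented model, the PSSM score of the best hit would be one feature, with DNA shape values at the hit appended as further features.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM from equal-length aligned sites (hypothetical helper)."""
    length = len(sites[0])
    pssm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log2 ratio of observed frequency to an assumed uniform background.
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm

def score(pssm, window):
    """Sum the per-position log-odds for one window of the same width as the PSSM."""
    return sum(column[base] for column, base in zip(pssm, window))

def best_hit(pssm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pssm)
    hits = [(score(pssm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Toy E-box-like sites, invented for illustration only.
toy_sites = ["CACGTG", "CACGTG", "CACGTT"]
toy_pssm = build_pssm(toy_sites)
hit_score, hit_start = best_hit(toy_pssm, "TTCACGTGAA")
```

Here `hit_start` points at the window that best matches the toy sites; real applications would use curated matrices and a measured background distribution.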
Chiu et al. DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding.
Bioinformatics 32, 1211-1213 (2016)
DNAshapeR is a software package implemented in the statistical programming language R that predicts DNA shape features in an ultra-fast, high-throughput manner from genomic sequencing data. The package takes either nucleotide sequences or genomic coordinates as input and generates various graphical representations for visualization and further analysis. DNAshapeR further encodes DNA sequence and shape features as user-defined combinations of k-mer and DNA shape features. The resulting feature matrices can be readily used as input to various machine-learning software packages for further modeling studies.
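The idea of encoding a sequence as a combination of k-mer and shape features can be sketched in a few lines of Python. This is not the DNAshapeR API (the package itself is written in R); the function names `encode_kmers` and `combine_features` and the shape values are hypothetical and serve only to show what one row of such a feature matrix could look like.

```python
from itertools import product

BASES = "ACGT"

def encode_kmers(sequence, k=1):
    """One-hot encode every overlapping k-mer of the sequence (hypothetical helper)."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    features = []
    for start in range(len(sequence) - k + 1):
        block = [0] * len(kmers)           # one slot per possible k-mer
        block[index[sequence[start:start + k]]] = 1
        features.extend(block)
    return features

def combine_features(sequence, shape_values, k=1):
    """Concatenate the k-mer encoding with precomputed shape values."""
    return encode_kmers(sequence, k) + list(shape_values)

# The shape values below are placeholders, not real predictions.
row = combine_features("AC", [5.0, 4.2])
```

Stacking one such row per binding site gives a matrix that a regression or classification library can consume directly.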
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2015/16
Dror et al. A widespread role of the motif environment on transcription factor binding across diverse protein families.
Genome Res. 25, 1268-1280 (2015)
TFs bind to only a very small fraction of all potential DNA binding sites in the genome. Here, using in vitro HT-SELEX binding assays and in vivo ChIP-seq data, we revealed that the surroundings of cognate binding sites have unique characteristics that distinguish them from other sequences containing a similar motif that is not bound by the TF. Comparing the nucleotide content and DNA shape in the regions around TF-bound sites with those around unbound sites containing the same consensus motifs revealed significant differences, which extend far beyond the core binding site (Figure). These unique features appear to be similar for TFs from the same protein family and likely assist in guiding TFs to their cognate binding sites.
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2014/15
Abe et al. Deconvolving the recognition of DNA sequence from shape.
Cell 161, 307-318 (2015)
Protein-DNA binding is mediated by the recognition of the chemical signatures of the DNA bases and the three-dimensional shape of the DNA molecule. Because DNA shape is a consequence of sequence, it is difficult to dissociate these modes of recognition. Here, we teased them apart in the context of Hox-DNA binding by mutating residues that only recognize DNA shape. Complexes made with these mutants lose the preference to bind sequences with specific DNA shape features (Figure). Introducing residues that recognize DNA shape from one Hox protein into another swapped binding specificity in vitro and gene regulation in vivo. Statistical machine learning revealed that the accuracy of binding specificity predictions improves when shape features are added, and feature selection identified shape features important for recognition. Thus, shape readout is a direct and critical component of binding site selection by Hox proteins.
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2014/15
Zhou et al. Quantitative modeling of transcription factor binding specificities using DNA shape.
Proc. Natl. Acad. Sci. USA 112, 4654-4659 (2015)
Genomes provide an abundance of putative binding sites for each TF. However, only small subsets of these potential targets are functional. TFs of the same protein family bind to target sites that are very similar but not identical. This distinction allows closely related TFs to regulate different genes and thus execute distinct functions. Since the nucleotide sequence of the core motif is often not sufficient for identifying a genomic target, we refined the description of TF binding sites by introducing a combination of DNA sequence and shape features (Figure), which consistently improves the modeling of in vitro TF-DNA binding specificities. In addition, shape-augmented models reveal binding specificity mechanisms that are not apparent from sequence alone.
Chiu et al. GBshape: a genome browser database for DNA shape annotations.
Nucleic Acids Res. 43, D103-109 (2015)
GBshape provides DNA shape annotations of entire genomes. The database currently contains annotations for minor groove width, roll, propeller twist, helix twist and hydroxyl radical cleavage for 94 different organisms. Additional genomes can easily be added in the provided framework. GBshape contains two major tools, a genome browser and a table browser. The genome browser (Figure) provides a graphical representation of DNA shape annotations alongside standard genome browser annotations.
Dantas Machado et al. Evolving insights on how cytosine methylation affects protein–DNA binding.
Brief. Funct. Genomics 14(1), 61-73 (2014)
Many anecdotal observations exist of a regulatory effect of DNA methylation on gene expression. However, the underlying mechanisms of this effect are poorly understood. In this review, we summarize what is currently known about how this important epigenetic mark impacts cellular function. DNA methylation can abrogate or enhance interactions with DNA-binding proteins, or it may have no effect, depending on the context. The presence of cytosine methyl groups (Figure) can affect direct interactions between the protein and its DNA binding site, cause an indirect effect on DNA structure, and alter nucleosome stability.
Slattery et al. Absence of a simple code: how transcription factors read the genome.
Trends Biochem. Sci. 39(9), 381-399 (2014)
Transcription factors (TFs) play a key role in the central dogma of molecular biology by interpreting the language of DNA to control transcription. However, it has become clear that the “code” they read does not comprise DNA sequence alone. We discuss in this Feature Review the recent work that has used structural, computational, in vitro and in vivo approaches to move toward understanding the transcription factor code. We highlight the many variables that influence TF-DNA binding, including cofactors, cooperativity, and chromatin. The cover shows the IFN-β enhanceosome (Figure), an example of cooperativity through TF-TF interactions.
NAR Breakthrough Article
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2013/14
Yang et al. TFBSshape: a motif database for DNA shape features of transcription factor binding sites.
Nucleic Acids Res. 42, D148-155 (2014)
Our new TFBSshape database disentangles the complex relationships between DNA sequence, its 3D structure, and protein-DNA binding specificity. This task is like solving a Rubik's cube (Figure; top face: DNA sequences with transcription factor binding sites (TFBS); left face: 3D structure of a protein-DNA complex; front face: heat map representing minor groove width patterns selected by a transcription factor (TF) in a high-throughput experiment). The TFBSshape database augments nucleotide sequence motifs with heat maps and quantitative predictions of DNA shape features for 739 TF datasets from 23 different species.
Dror et al. Covariation between homeodomain transcription factors and the shape of their DNA binding sites.
Nucleic Acids Res. 42, 430-441 (2014)
Using our new method for high-throughput prediction of DNA shape, we analyzed DNA binding sites of 168 mouse and 84 Drosophila homeodomains to determine a general DNA shape recognition code (Figure) for this family of transcription factors. We predicted DNA shape features for almost 25,000 DNA targets derived from protein binding microarray (PBM) and bacterial one-hybrid (B1H) experiments and found distinct homeodomain regions that were more correlated with either the nucleotide sequence or the DNA shape of their preferred binding sites.
Zhou et al. DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale.
Nucleic Acids Res. 41, W56-62 (2013)
We developed a new method for predicting DNA shape in a high-throughput manner on a genome-wide scale. This approach predicts structural features (several helical parameters and minor groove width) for the entire yeast genome in less than one minute on a regular laptop. The prediction can be visualized as genome browser tracks and compared to other properties of the genome such as sequence conservation.
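One way such speed can be achieved is by replacing structural simulation at prediction time with a table lookup keyed on a short sliding window. The Python sketch below assumes a pentamer window whose value is assigned to the central nucleotide. The function name `predict_feature`, the table `TOY_TABLE`, and its numbers are hypothetical placeholders for illustration, not values from the published method.

```python
def predict_feature(sequence, table, k=5):
    """Assign a per-nucleotide value by looking up each k-mer centered on it.

    Positions too close to either end have no full window and stay None.
    """
    half = k // 2
    values = [None] * len(sequence)
    for center in range(half, len(sequence) - half):
        window = sequence[center - half:center + half + 1]
        values[center] = table.get(window)   # None if the k-mer is not in the table
    return values

# Placeholder numbers chosen only to make the lookup visible; not real shape data.
TOY_TABLE = {"AAATT": 1.0, "AATTT": 2.0}

profile = predict_feature("AAATTT", TOY_TABLE)
```

Because each position needs a single dictionary lookup, the running time grows linearly with sequence length, which is consistent with genome-scale prediction in under a minute.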
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2012/13
Gordân et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape.
Cell Rep. 3, 1093-1104 (2013)
How transcription factors (TFs) with highly similar DNA binding-site motifs recognize distinct targets in vivo is poorly understood. In this study, we show in collaboration with Martha Bulyk's lab that the paralogous Saccharomyces cerevisiae TFs Cbf1 and Tye7 exhibit different DNA binding preferences both in vitro and in vivo, depending on the genomic context of the sites. Results of computational analyses suggest that nucleotides outside of their core binding sites contribute to specificity by influencing the three-dimensional structure of the DNA targets.
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2012/13
Lazarovici et al. Probing DNA shape and methylation state on a genomic scale with DNase I.
Proc. Natl. Acad. Sci. USA 110, 6376-6381 (2013)
To address the relationship between DNase I cleavage rate and minor groove geometry, we predicted DNA shape parameters for sequences covering the entire range from highly to poorly cleavable. The variation in these shape parameters turned out to be highly predictive of the variation in cleavage rate. Other insights obtained from this project in collaboration with Harmen Bussemaker's and John Stamatoyannopoulos' labs were related to DNA methylation. We found that even though cytosine methylation happens in the major groove, one of its key effects is to narrow the minor groove. Thus, varying the base sequence of genomic DNA is not the only way in which the cell can modulate the landscape of minor groove shape along its genome.
Chang et al. Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen.
Cell Rep. 3, 1117-1127 (2013)
The first essential step in activating genomic DNA replication is the site-specific assembly of initiator proteins on origin (ori) DNA, a process that is not well characterized. In collaboration with Xiaojiang Chen's lab, we report a major step toward understanding this process by determining the long-sought cocrystal structure of the SV40 initiator/helicase, large tumor antigen (LTag), in complex with its ori DNA. The structure shows that multidomain LTag assembles on ori DNA differently from what one would expect from previous studies. The structure also reveals an intrinsic DNA shape readout mechanism using histidines.
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2011
Slattery et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins.
Cell 147, 1270-1282 (2011)
In vivo transcription factor-DNA recognition is much more specific than in vitro binding. The eight Drosophila Hox proteins bind to very similar target sites but execute distinct in vivo functions. The figure illustrates that the cofactor Exd (yellow) unlocks a wide range in specificity of Hox proteins (cyan) for recognizing DNA target sites (metallic). Based on SELEX-seq experiments, we present specificity fingerprints of Hox proteins and reveal that DNA shape is a determining factor in achieving specificity. This is the first study in which a preliminary version of our new approach for high-throughput DNA shape prediction was applied to thousands of sequences, showing that anterior and posterior Hox proteins recognize different DNA shapes. Moreover, DNA shape indicates how Hox genes have differentiated in evolution.
Rohs et al. The role of DNA shape in protein-DNA recognition.
Nature 461, 1248-1253 (2009)
The figure illustrates the molecular shape of nucleosomal DNA when wrapped around the histone core. The narrow minor groove is color-coded in dark grey. The red mesh shows an isopotential surface with negative electrostatic potential. The shape of narrow minor groove regions induces an enhanced negative electrostatic potential, which attracts histone arginines. Such interactions between the protein and DNA contribute to the stabilization of the nucleosome core particle.
Rohs et al. Origin of specificity in protein-DNA recognition.
Annu. Rev. Biochem. 79, 233-269 (2010)
To carry out their unique biological functions, proteins need to recognize their DNA binding sites in a highly specific manner. Specificity in protein-DNA binding is achieved through the recognition of both linear sequence and three-dimensional structure. Therefore, the nucleotide sequence of a binding site is only one part of the story, and the three-dimensional structures of both the DNA and the protein must be taken into account to fully understand recognition on a molecular basis. DNA shape is specifically recognized by a variety of protein families, and we have identified different ways of modulating DNA shape. The figure shows the shape of the molecular surface (top) of ideal A-DNA (left), B-DNA (center), and Z-DNA (right), and the resulting specific variations in electrostatic potential (bottom).