Welcome to the Rohs Lab


The main focus of our Computational and Structural Biology research group is the integration of two fields of research: genomics and structural biology. Our primary goal is to reveal as-yet-unknown molecular mechanisms underlying gene regulation using bioinformatics analyses of high-throughput sequencing and DNA methylation data of whole genomes, integrated with experimental molecular biology approaches.

Professor Rohs can accept graduate students from the following Ph.D. Programs as primary thesis adviser: Computational Biology and Bioinformatics, Molecular Biology, Chemistry, Physics, and Computer Science.

Selected Publications

Mathelier et al. DNA shape features improve transcription factor binding site predictions in vivo.
Cell Syst. In press (2016)

Interactions of transcription factors (TFs) with DNA comprise a complex interplay between base-specific amino acid contacts and readout of DNA structure. Recent studies have highlighted the complementarity of DNA sequence and shape in modeling TF binding in vitro. Here, we have provided a comprehensive evaluation of in vivo datasets to assess the predictive power obtained by augmenting various DNA sequence-based models of TF binding sites (TFBSs) with DNA shape features. Results from 400 human ChIP-seq datasets for 76 TFs show that combining DNA shape features with position-specific scoring matrix (PSSM) scores improves TFBS predictions. Improvement has also been observed using TF flexible models and a machine-learning approach using a binary encoding of nucleotides in lieu of PSSMs.

Chiu et al. DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding.
Bioinformatics 32, 1211-1213 (2016)

DNAshapeR is a software package implemented in the statistical programming language R that predicts DNA shape features in an ultra-fast, high-throughput manner from genomic sequencing data. The package takes either nucleotide sequence or genomic coordinates as input, and generates various graphical representations for visualization and further analysis. DNAshapeR further encodes DNA sequence and shape features as user-defined combinations of k-mer and DNA shape features. The resulting feature matrices can be readily used as input of various machine learning software packages for further modeling studies.
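
As a rough illustration of the feature encoding described above, the following Python sketch builds a feature vector from binary-encoded k-mers followed by per-position shape values. It is not the DNAshapeR API (DNAshapeR is an R package). The function names encode_kmers and encode_sequence_and_shape are hypothetical, and the shape values are assumed to be supplied by the caller from a separate shape prediction step.

```python
from itertools import product

ALPHABET = "ACGT"


def encode_kmers(seq, k=1):
    """One-hot encode every overlapping k-mer of seq (4**k bits per position).

    Hypothetical helper for illustration only; not part of DNAshapeR.
    """
    seq = seq.upper()
    # Enumerate all possible k-mers in a fixed (lexicographic) order.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    features = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if window not in index:
            raise ValueError(f"unsupported character in k-mer {window!r}")
        bits = [0] * len(kmers)
        bits[index[window]] = 1
        features.extend(bits)
    return features


def encode_sequence_and_shape(seq, shape_values, k=1):
    """Concatenate k-mer bits with caller-supplied shape values.

    shape_values is assumed to hold precomputed per-position shape
    features (for example, minor groove width); this sketch does not
    predict them.
    """
    return encode_kmers(seq, k) + [float(v) for v in shape_values]
```

Each row of a feature matrix would then correspond to one sequence encoded this way. In practice, shape features are usually normalized before they are combined with the binary sequence features.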

Dror et al. A widespread role of the motif environment on transcription factor binding across diverse protein families.
Genome Res. 25, 1268-1280 (2015)

TFs bind to only a very small fraction of all potential DNA binding sites in the genome. Here, using in vitro HT-SELEX binding assays and in vivo ChIP-seq data, we revealed that the surroundings of cognate binding sites have unique characteristics that distinguish them from other sequences containing a similar motif that is not bound by the TF. Comparing the nucleotide content and DNA shape in the regions around the TF-bound sites to those around unbound sites containing the same consensus motifs revealed significant differences, which extend far beyond the core binding site (Figure). These unique features appear to be similar for TFs from the same protein family and likely assist in guiding TFs to their cognate binding sites.

Abe et al. Deconvolving the recognition of DNA sequence from shape.
Cell 161, 307-318 (2015)

Protein-DNA binding is mediated by the recognition of the chemical signatures of the DNA bases and the three-dimensional shape of the DNA molecule. Because DNA shape is a consequence of sequence, it is difficult to dissociate these modes of recognition. Here, we teased them apart in the context of Hox-DNA binding by mutating residues that only recognize DNA shape. Complexes made with these mutants lose the preference to bind sequences with specific DNA shape features (Figure). Introducing residues that recognize DNA shape from one Hox protein to another swapped binding specificity in vitro and gene regulation in vivo. Statistical machine learning revealed that the accuracy of binding specificity predictions improves by adding shape features and feature selection identified shape features important for recognition. Thus, shape readout is a direct and critical component of binding site selection by Hox proteins.

Zhou et al. Quantitative modeling of transcription factor binding specificities using DNA shape.
Proc. Natl. Acad. Sci. USA 112, 4654-4659 (2015)

Genomes provide an abundance of putative binding sites for each TF. However, only small subsets of these potential targets are functional. TFs of the same protein family bind to target sites that are very similar but not identical. This distinction allows closely related TFs to regulate different genes and thus execute distinct functions. Since the nucleotide sequence of the core motif is often not sufficient for identifying a genomic target, we refined the description of TF binding sites by introducing a combination of DNA sequence and shape features (Figure), which consistently improves the modeling of in vitro TF-DNA binding specificities. In addition, shape-augmented models reveal binding specificity mechanisms that are not apparent from sequence alone. 

Chiu et al. GBshape: a genome browser database for DNA shape annotations.
Nucleic Acids Res. 43, D103-109 (2015)

GBshape provides DNA shape annotations of entire genomes. The database currently contains annotations for minor groove width, roll, propeller twist, helix twist and hydroxyl radical cleavage for 94 different organisms. Additional genomes can easily be added in the provided framework. GBshape contains two major tools, a genome browser and a table browser. The genome browser (Figure) provides a graphical representation of DNA shape annotations alongside standard genome browser annotations.

Dantas Machado et al. Evolving insights on how cytosine methylation affects protein–DNA binding.
Brief. Funct. Genomics 14(1), 61-73 (2014)

Many anecdotal observations exist of a regulatory effect of DNA methylation on gene expression. However, the underlying mechanisms of this effect are poorly understood. In this review, we summarize what is currently known about how this important epigenetic mark impacts cellular function. DNA methylation can abrogate or enhance interactions with DNA-binding proteins, or it may have no effect, depending on the context. The presence of cytosine methyl groups (Figure) can affect direct interactions between the protein and its DNA binding site, cause an indirect effect on DNA structure, and alter nucleosome stability.

Feature Review
Slattery et al. Absence of a simple code: how transcription factors read the genome.
Trends Biochem. Sci. 39(9), 381-399 (2014)

Transcription factors (TFs) play a key role in the central dogma of molecular biology by interpreting the language of DNA to control transcription. However, it has become clear that the “code” they read does not comprise DNA sequence alone. We discuss in this Feature Review the recent work that has used structural, computational, in vitro and in vivo approaches to move toward understanding the transcription factor code. We highlight the many variables that influence TF-DNA binding, including cofactors, cooperativity, and chromatin. The cover shows the IFN-β enhanceosome (Figure), an example of cooperativity through TF-TF interactions.

NAR Breakthrough Article
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2013/14
Yang et al. TFBSshape: a motif database for DNA shape features of transcription factor binding sites.
Nucleic Acids Res. 42, D148-155 (2014)

Our new TFBSshape database disentangles the complex relationships between DNA sequence, its 3D structure, and protein-DNA binding specificity. This task is like solving a Rubik's cube (Figure; top face: DNA sequences with transcription factor binding sites (TFBS); left face: 3D structure of a protein-DNA complex; front face: heat map representing minor groove width patterns selected by a transcription factor (TF) in a high-throughput experiment). The TFBSshape database augments nucleotide sequence motifs with heat maps and quantitative predictions of DNA shape features for 739 TF datasets from 23 different species.

Dror et al. Covariation between homeodomain transcription factors and the shape of their DNA binding sites.
Nucleic Acids Res. 42, 430-441 (2014)

Using our new method for high-throughput prediction of DNA shape, we analyzed DNA binding sites of 168 mouse and 84 Drosophila homeodomains to determine a general DNA shape recognition code (Figure) for this family of transcription factors. We predicted DNA shape features for almost 25,000 DNA targets derived from protein binding microarray (PBM) and bacterial one-hybrid (B1H) experiments and found distinct homeodomain regions that were more correlated with either the nucleotide sequence or the DNA shape of their preferred binding sites.

Zhou et al. DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale.
Nucleic Acids Res. 41, W56-62 (2013)

We developed a new method for predicting DNA shape in a high-throughput manner on a genome-wide scale. This approach predicts structural features (several helical parameters and minor groove width) for the entire yeast genome in less than one minute on a regular laptop. The prediction can be visualized as genome browser tracks and compared to other properties of the genome such as sequence conservation.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2012/13
Gordân et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape.
Cell Rep. 3, 1093-1104 (2013)

How transcription factors (TFs) with highly similar DNA binding-site motifs recognize distinct targets in vivo is poorly understood. In this study, we show in collaboration with Martha Bulyk's lab that the paralogous Saccharomyces cerevisiae TFs Cbf1 and Tye7 exhibit different DNA binding preferences both in vitro and in vivo, depending on the genomic context of the sites. Results of computational analyses suggest that nucleotides outside of their core binding sites contribute to specificity by influencing the three-dimensional structure of the DNA targets.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2012/13
Lazarovici et al. Probing DNA shape and methylation state on a genomic scale with DNase I.
Proc. Natl. Acad. Sci. USA 110, 6376-6381 (2013)

To address the relationship between DNase I cleavage rate and minor groove geometry, we predicted DNA shape parameters for sequences covering the entire range from highly to poorly cleavable. The variation in these shape parameters turned out to be highly predictive of the variation in cleavage rate. Other insights obtained from this project in collaboration with Harmen Bussemaker's and John Stamatoyannopoulos' labs were related to DNA methylation. We found that even though cytosine methylation happens in the major groove, one of its key effects is to narrow the minor groove. Thus, varying the base sequence of genomic DNA is not the only way in which the cell can modulate the landscape of minor groove shape along its genome.

Chang et al. Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen.
Cell Rep. 3, 1117-1127 (2013)

The first essential step in activating genomic DNA replication is the site-specific assembly of initiator proteins on origin (ori) DNA, a process that is not well characterized. In collaboration with Xiaojiang Chen's lab, we report a major step toward understanding this process by determining the long-sought cocrystal structure of the SV40 initiator/helicase, large tumor antigen (LTag), in complex with its ori DNA. The structure shows that multidomain LTag assembles on ori DNA differently from what one would expect from previous studies. The structure also reveals an intrinsic DNA shape readout mechanism using histidines.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2011
Slattery et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins.
Cell 147, 1270-1282 (2011)

In vivo transcription factor-DNA recognition is much more specific than in vitro binding. The eight Drosophila Hox proteins bind to very similar target sites but execute distinct in vivo functions. The figure illustrates that the cofactor Exd (yellow) unlocks a wide range in specificity of Hox proteins (cyan) for recognizing DNA target sites (metallic). Based on SELEX-seq experiments, we present specificity fingerprints of Hox proteins and reveal that DNA shape is a determining factor in achieving specificity. This is the first study in which a preliminary version of our new approach for high-throughput DNA shape prediction was applied to thousands of sequences, showing that anterior and posterior Hox proteins recognize different DNA shapes. Moreover, DNA shape indicates how Hox genes have differentiated in evolution.

Rohs et al. The role of DNA shape in protein-DNA recognition.
Nature 461, 1248-1253 (2009)

The figure illustrates the molecular shape of nucleosomal DNA when wrapped around the histone core. The narrow minor groove is color-coded in dark grey. The red mesh shows an isopotential surface with negative electrostatic potential. The shape of narrow minor groove regions induces an enhanced negative electrostatic potential, which attracts histone arginines. Such interactions between the protein and DNA contribute to the stabilization of the nucleosome core particle.

Rohs et al. Origin of specificity in protein-DNA recognition.
Annu. Rev. Biochem. 79, 233-269 (2010)

To carry out their unique biological functions, proteins need to recognize their DNA binding sites in a highly specific manner. Specificity in protein-DNA binding is achieved through the recognition of both linear sequence and three-dimensional structure. Therefore, the nucleotide sequence of a binding site is only one part of the story, and the three-dimensional structures of both the DNA and the protein must be taken into account to fully understand recognition on a molecular basis. DNA shape is specifically recognized by a variety of protein families, and we have identified different ways of modulating DNA shape. The figure shows the shape of the molecular surface (top) of ideal A-DNA (left), B-DNA (center), and Z-DNA (right), and the resulting specific variations in electrostatic potential (bottom).

August 31, 2016
Carolina defended her Ph.D. thesis with flying colors. Congratulations, Carolina!

August 18, 2016
Our new paper demonstrates the impact of DNA shape on in vivo TF binding based on 400 human ChIP-seq datasets.

August 16, 2016
Remo was promoted to Full Professor of Biological Sciences at USC. Fight on!

July 14, 2016
Remo was elected Head of Computational Biology and Bioinformatics at USC. Fight on!

June 6, 2016
Lin defended his Ph.D. thesis with flying colors. Congratulations, Lin!

May 4, 2016
Tsu-Pei received a competitive Enhancement Fellowship from the USC Graduate School. Congratulations, Tsu-Pei!

May 3, 2016
Lin received the highest honor for a USC graduate student, the PhD Achievement Award. Congratulations, Lin!

May 3, 2016
Remo was introduced as the incoming Vice Chair of the Department of Biological Sciences and Director of Biological Sciences Studies.

April 20, 2016
Remo presented our recent Zhou et al. PNAS paper as one of the few selected Highlights at the RECOMB conference.

April 19, 2016
Carolina received the Harrison M. Kurtz Award and Tsu-Pei the William E. Trusten Award. Congrats, Carolina and Tsu-Pei!

April 6, 2016
Remo received the USC Mentoring Award in the category of graduate student mentoring. Best award ever!

March 16, 2016
Remo received the ACS OpenEye Outstanding Junior Faculty Award in Computational Chemistry at the American Chemical Society National Meeting.

January 28, 2016
Remo received tenure at USC and was promoted to Associate Professor. Fight on!

November 18, 2015
Our recent Abe et al. Cell and Zhou et al. PNAS papers were voted as RECOMB/ISCB Top Papers in regulatory and systems genomics in 2014/15.

Recent news

August 21-25, 2016
Symposium on Modeling Water and Solvation in Biochemistry: Developments and Applications, American Chemical Society National Meeting, Philadelphia, PA

July 5-8, 2016
Meeting on Measuring and Modeling Quantitative Sequence-Function Relationships, Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY

April 17-21, 2016
RECOMB 2016 Conference, Santa Monica, CA

March 23, 2016
Leibniz Institute on Aging - Fritz Lipmann Institute, Jena, Germany

March 15-19, 2016
CSHL Meeting on Systems Biology: Global Regulation of Gene Expression, Cold Spring Harbor Laboratory, NY

March 7-10, 2016
Workshop on Regulatory Genomics and Epigenomics, Simons Institute for the Theory of Computing, UC Berkeley, Berkeley, CA

February 5-7, 2016
Bridge@USC and Michelson Center for Convergent Biosciences Retreat, Catalina Island, CA

January 31 - February 5, 2016
Epigenomics 2016 Meeting, Rio Mar, Puerto Rico

January 19, 2016
Bioinformatics and Computational Biology Research Center, Cedars-Sinai Medical Center, Los Angeles, CA

Recent presentations

BISC 481 syllabus
Structural Bioinformatics from Atoms to Cells