Welcome to the Rohs Lab


The main focus of our Computational Structural Biology research group is the integration of two fields of research: genomics and structural biology. Our primary goal is to reveal as yet unknown molecular mechanisms underlying gene regulation using bioinformatics analyses of high-throughput sequencing and DNA methylation data of whole genomes.

Professor Rohs can accept graduate students from the following Ph.D. Programs as primary thesis adviser: Computational Biology and Bioinformatics, Molecular Biology, Chemistry, Physics, and Computer Science.

Selected Publications

Dror et al. A widespread role of the motif environment on transcription factor binding across diverse protein families.
Genome Res. In press (2015)

Transcription factors (TFs) bind to only a very small fraction of all potential DNA binding sites in the genome. Using in vitro HT-SELEX binding assays and in vivo ChIP-seq data, we show that the surroundings of cognate binding sites have unique characteristics that distinguish them from other sequences containing a similar motif that is not bound by the TF. Comparing the nucleotide content of the regions around TF-bound sites with that of unbound sites containing the same consensus motifs revealed significant differences, which extend far beyond the core binding site (Figure). These unique features appear to be similar for TFs from the same protein family and likely assist in guiding TFs to their cognate binding sites.

Abe et al. Deconvolving the recognition of DNA sequence from shape.
Cell 161, 307-318 (2015)

Protein-DNA binding is mediated by the recognition of the chemical signatures of the DNA bases and the three-dimensional shape of the DNA molecule. Because DNA shape is a consequence of sequence, it is difficult to dissociate these modes of recognition. Here, we teased them apart in the context of Hox-DNA binding by mutating residues that only recognize DNA shape. Complexes made with these mutants lose the preference to bind sequences with specific DNA shape features (Figure). Introducing residues that recognize DNA shape from one Hox protein to another swapped binding specificity in vitro and gene regulation in vivo. Statistical machine learning revealed that the accuracy of binding specificity predictions improves by adding shape features, and feature selection identified shape features important for recognition. Thus, shape readout is a direct and critical component of binding site selection by Hox proteins.

Zhou et al. Quantitative modeling of transcription factor binding specificities using DNA shape.
Proc. Natl. Acad. Sci. USA 112, 4654-4659 (2015)

Genomes provide an abundance of putative binding sites for each TF. However, only small subsets of these potential targets are functional. TFs of the same protein family bind to target sites that are very similar but not identical. This distinction allows closely related TFs to regulate different genes and thus execute distinct functions. Since the nucleotide sequence of the core motif is often not sufficient for identifying a genomic target, we refined the description of TF binding sites by introducing a combination of DNA sequence and shape features (Figure), which consistently improves the modeling of in vitro TF-DNA binding specificities. In addition, shape-augmented models reveal binding specificity mechanisms that are not apparent from sequence alone. 
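The idea of describing a binding site by both its sequence and its shape can be sketched as a simple feature vector. The snippet below is a minimal illustration only, not the model from the paper: the function names (`one_hot`, `sequence_plus_shape`) are hypothetical, and the minor groove width numbers are made-up placeholders rather than real predictions.

```python
# Illustrative sketch: combine sequence and shape features for one binding site.
# All shape values used here are placeholders, not actual DNA shape predictions.
BASES = "ACGT"


def one_hot(seq):
    """Encode each nucleotide as four 0/1 indicators in the order A, C, G, T."""
    features = []
    for base in seq.upper():
        features.extend(1.0 if base == b else 0.0 for b in BASES)
    return features


def sequence_plus_shape(seq, shape_tracks):
    """Concatenate one-hot sequence features with per-position shape values.

    shape_tracks maps a feature name (for example "MGW") to a list holding
    one value per nucleotide position; tracks are appended in sorted name order.
    """
    features = one_hot(seq)
    for name in sorted(shape_tracks):
        values = shape_tracks[name]
        if len(values) != len(seq):
            raise ValueError(f"{name}: expected {len(seq)} values, got {len(values)}")
        features.extend(float(v) for v in values)
    return features


# Hypothetical 4-bp site with invented minor groove width (MGW) values.
vector = sequence_plus_shape("ACGT", {"MGW": [5.0, 4.5, 4.8, 5.2]})
```

A vector of this kind could then be passed to any regression or classification method; which learning method to use is left open here.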

Chiu et al. GBshape: a genome browser database for DNA shape annotations.
Nucleic Acids Res. 43, D103-109 (2015)

GBshape provides DNA shape annotations of entire genomes. The database currently contains annotations for minor groove width, roll, propeller twist, helix twist and hydroxyl radical cleavage for 94 different organisms. Additional genomes can easily be added in the provided framework. GBshape contains two major tools, a genome browser and a table browser. The genome browser (Figure) provides a graphical representation of DNA shape annotations alongside standard genome browser annotations.

Dantas Machado et al. Evolving insights on how cytosine methylation affects protein–DNA binding.
Brief. Funct. Genomics 14(1), 61-73 (2014)

Many anecdotal observations exist of a regulatory effect of DNA methylation on gene expression. However, the underlying mechanisms of this effect are poorly understood. In this review, we summarize what is currently known about how this important epigenetic mark impacts cellular function. DNA methylation can abrogate or enhance interactions with DNA-binding proteins, or it may have no effect, depending on the context. The presence of cytosine methyl groups (Figure) can affect direct interactions between the protein and its DNA binding site, cause an indirect effect on DNA structure, and alter nucleosome stability.

Feature Review
Slattery et al. Absence of a simple code: how transcription factors read the genome.
Trends Biochem. Sci. 39(9), 381-399 (2014)

Transcription factors (TFs) play a key role in the central dogma of molecular biology by interpreting the language of DNA to control transcription. However, it has become clear that the “code” they read does not comprise DNA sequence alone. We discuss in this Feature Review the recent work that has used structural, computational, in vitro and in vivo approaches to move toward understanding the transcription factor code. We highlight the many variables that influence TF-DNA binding, including cofactors, cooperativity, and chromatin. The cover shows the IFN-β enhanceosome (Figure), an example of cooperativity through TF-TF interactions.

NAR Breakthrough Article
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2013/14
Yang et al. TFBSshape: a motif database for DNA shape features of transcription factor binding sites.
Nucleic Acids Res. 42, D148-155 (2014)

Our new TFBSshape database disentangles the complex relationships between DNA sequence, its 3D structure, and protein-DNA binding specificity. This task is like solving a Rubik's cube (Figure; top face: DNA sequences with transcription factor binding sites (TFBS); left face: 3D structure of a protein-DNA complex; front face: heat map representing minor groove width patterns selected by a transcription factor (TF) in a high-throughput experiment). The TFBSshape database augments nucleotide sequence motifs with heat maps and quantitative predictions of DNA shape features for 739 TF datasets from 23 different species.

Dror et al. Covariation between homeodomain transcription factors and the shape of their DNA binding sites.
Nucleic Acids Res. 42, 430-441 (2014)

Using our new method for high-throughput prediction of DNA shape, we analyzed DNA binding sites of 168 mouse and 84 Drosophila homeodomains to determine a general DNA shape recognition code (Figure) for this family of transcription factors. We predicted DNA shape features for almost 25,000 DNA targets derived from protein binding microarray (PBM) and bacterial one-hybrid (B1H) experiments and found distinct homeodomain regions that were more correlated with either the nucleotide sequence or the DNA shape of their preferred binding sites.

Zhou et al. DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale.
Nucleic Acids Res. 41, W56-62 (2013)

We developed a new method for predicting DNA shape in a high-throughput manner on a genome-wide scale. This approach predicts structural features (several helical parameters and minor groove width) for the entire yeast genome in less than one minute on a regular laptop. The prediction can be visualized as genome browser tracks and compared to other properties of the genome such as sequence conservation.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2012/13
Gordân et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape.
Cell Rep. 3, 1093-1104 (2013)

How transcription factors (TFs) with highly similar DNA binding-site motifs recognize distinct targets in vivo is poorly understood. In this study, we show in collaboration with Martha Bulyk's lab that the paralogous Saccharomyces cerevisiae TFs Cbf1 and Tye7 exhibit different DNA binding preferences both in vitro and in vivo, depending on the genomic context of the sites. Results of computational analyses suggest that nucleotides outside of their core binding sites contribute to specificity by influencing the three-dimensional structure of the DNA targets.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2012/13
Lazarovici et al. Probing DNA shape and methylation state on a genomic scale with DNase I.
Proc. Natl. Acad. Sci. USA 110, 6376-6381 (2013)

To address the relationship between DNase I cleavage rate and minor groove geometry, we predicted DNA shape parameters for sequences covering the entire range from highly to poorly cleavable. The variation in these shape parameters turned out to be highly predictive of the variation in cleavage rate. Other insights obtained from this project in collaboration with Harmen Bussemaker's and John Stamatoyannopoulos' labs were related to DNA methylation. We found that even though cytosine methylation happens in the major groove, one of its key effects is to narrow the minor groove. Thus, varying the base sequence of genomic DNA is not the only way in which the cell can modulate the landscape of minor groove shape along its genome.

Chang et al. Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen.
Cell Rep. 3, 1117-1127 (2013)

The first essential step in activating genomic DNA replication is the site-specific assembly of initiator proteins on origin (ori) DNA, a process that is not well characterized. In collaboration with Xiaojiang Chen's lab, we report a major step toward understanding this process by determining the long-sought cocrystal structure of the SV40 initiator/helicase, large tumor antigen (LTag), in complex with its ori DNA. The structure shows that multidomain LTag assembles on ori DNA differently from what one would expect from previous studies. The structure also reveals an intrinsic DNA shape readout mechanism using histidines.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2011
Slattery et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins
Cell 147, 1270-1282 (2011)

In vivo transcription factor-DNA recognition is much more specific than in vitro binding. The eight Drosophila Hox proteins bind to very similar target sites but execute distinct in vivo functions. The figure illustrates that the cofactor Exd (yellow) unlocks a wide range in specificity of Hox proteins (cyan) for recognizing DNA target sites (metallic). Based on SELEX-seq experiments, we present specificity fingerprints of Hox proteins and reveal that DNA shape is a determining factor in achieving specificity. This is the first study in which a preliminary version of our new approach for high-throughput DNA shape prediction was applied to thousands of sequences, showing that anterior and posterior Hox proteins recognize different DNA shapes. Moreover, DNA shape indicates how Hox genes have differentiated in evolution.

Bishop et al. A map of minor groove shape and electrostatic potential from hydroxyl radical cleavage patterns of DNA
ACS Chem. Biol. 6(12), 1314-1320 (2011)


The figure shows a nucleosome, the fundamental repeating unit of chromatin, bound to a compatible site in a vast sea of genomic DNA. We used the hydroxyl radical as a chemical probe to map the electrostatic potential along the DNA minor groove, and show how DNA shape underlies nucleosome formation. The nucleotide sequences in the background are occupied by nucleosomes in yeast.


Rohs et al. The role of DNA shape in protein-DNA recognition
Nature 461, 1248-1253 (2009)

The figure illustrates the molecular shape of nucleosomal DNA when wrapped around the histone core. The narrow minor groove is color-coded in dark grey. The red mesh shows an isopotential surface with negative electrostatic potential. The shape of narrow minor groove regions induces an enhanced negative electrostatic potential, which attracts histone arginines. Such interactions between the protein and DNA contribute to the stabilization of the nucleosome core particle.

Rohs et al. Origin of specificity in protein-DNA recognition
Annu. Rev. Biochem. 79, 233-269 (2010)

In order to carry out their unique biological functions, proteins need to recognize their DNA binding sites in a highly specific manner. Specificity in protein-DNA binding is achieved through the recognition of both linear sequence and three-dimensional structure. Therefore, the nucleotide sequence of a binding site is only one part of the story, and the three-dimensional structures of both the DNA and the protein must be taken into account to fully understand recognition on a molecular basis. DNA shape is specifically recognized by a variety of protein families, and we have identified different ways of modulating DNA shape. The figure shows the shape of the molecular surface (top) of ideal A-DNA (left), B-DNA (center), and Z-DNA (right), and the resulting specific variations in electrostatic potential (bottom).

July 9, 2015
Our latest Genome Research paper relates TF binding to the DNA motif environment. Congratulations, Iris!

July 2, 2015
Arg and His read DNA shape - Lys can also do the trick. Our new paper appeared in Nat. Commun.

June 26, 2015
Remo co-organized a successful and fully funded BIRS workshop on protein-DNA recognition in Oaxaca, Mexico.

June 3, 2015
We published a News & Views article in Nat. Struct. Mol. Biol. on the secrets of sex determination.

May 18, 2015
Lin receives a prestigious international award, the Dan David Prize Scholarship, at Tel Aviv University. Congratulations, Lin!

May 5, 2015
A collaborative paper with the Paro group published in Cell Rep. shows that DNA shape is predictive of replication origins.

April 28, 2015
Our Genome Res. paper with the Segal lab is highlighted in Nature Genetics.

April 21, 2015
Carolina receives the William Trusten Award and Lin receives the Harrison Kurtz Award from the Dept. of Biological Science. Congratulations, Carolina and Lin!

April 17, 2015
Our latest Cell paper that deconvolves DNA shape and sequence readout is highlighted in Nature Reviews Genetics.

April 6, 2015
Our recent PNAS paper on the quantitative modeling of TF binding specificities is discussed in a commentary in PNAS.

April 2, 2015
A USC press release highlights our recent papers in Cell, PNAS, and Genome Research.

April 2, 2015
Paper that deconvolves the recognition of DNA shape from sequence is published in Cell. Congratulations, Iris and Lin!

March 11, 2015
Paper on TF specificity determinants outside the core motif in collaboration with the Segal lab is published in Genome Research.

March 9, 2015
Paper on the quantitative modeling of TF binding specificities using DNA shape is published in PNAS. Congratulations, Tianyin!

January 1, 2015
Paper on genome browser database for DNA shape annotation published in NAR. Congratulations, Tsu-Pei!

Recent news

June 22-26, 2015
Banff International Research Station Workshop "Rules of Protein-DNA Recognition: Computational & Experimental Advances", Oaxaca, Mexico

June 9-13, 2015
The 19th Conversation on Biomolecular Structure & Dynamics, Albany, NY

April 9, 2015
Program in Bioinformatics, Boston University, Boston, MA

March 25, 2015
Bioinformatics & Computational Biology Program, Iowa State University, Ames, IA

February 27, 2015
Program in Computational Biology, Carnegie Mellon University & University of Pittsburgh, Pittsburgh, PA

February 9, 2015
Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany

Recent presentations

BISC 481 syllabus
Structural Bioinformatics from Atoms to Cells