Additional Data

N. Abe et al. Deconvolving the recognition of DNA sequence from shape.
Cell 161, 307-318 (2015)
Supplementary Information


SYNOPSIS
./encode.py <input_path> <output_path> <path_to_encode.R> <path_to_feature_list_file>

DESCRIPTION
encode.py is a Python script that invokes the R script encode.R to encode features for the data files in the input_path directory and saves the encoded output files to the output_path directory. The paths to encode.R and to the feature_list file must be specified. The feature_list file lets users select the desired features. You may need to compile the DNAshape program first by going to the folder DNAshape_v2.6 and running "make install".

The data files are expected to contain aligned sequences of equal length in the first column and the corresponding measured binding affinity signal in the second column, as follows:

<Sequence 1> <Affinity 1>
<Sequence 2> <Affinity 2>
......
<Sequence N> <Affinity N>

Here is one example (extracted from the file "sample_input.txt.s" in the folder "Sample_input"):

TTGTCAATTATATGCTAAG  0.8
GCTGAGGTTACACTTGACT  0.6
...
TGCAGAGTTACGACATTAG  0.9
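
The input format above can be checked with a few lines of Python before running the encoder. This is only a minimal sketch, assuming whitespace-separated columns; the function name read_affinity_lines is hypothetical and is not part of encode.py or encode.R.

```python
def read_affinity_lines(lines):
    """Parse (sequence, affinity) pairs and check that all sequences share one length.

    Hypothetical helper for illustration; not part of encode.py.
    """
    records = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(fields)}")
        sequence = fields[0].upper()
        affinity = float(fields[1])
        # Aligned sequences must all have the same length as the first record.
        if records and len(sequence) != len(records[0][0]):
            raise ValueError(f"line {line_number}: sequence length differs from the first record")
        records.append((sequence, affinity))
    return records


example = ["TTGTCAATTATATGCTAAG  0.8", "GCTGAGGTTACACTTGACT  0.6"]
for sequence, affinity in read_affinity_lines(example):
    print(sequence, affinity)
```

To read a file instead of a list, pass the open file object, since iterating over it yields one line at a time.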

For a given input data file, the following features can be generated:
Sequence features: A mapped to [1,0,0,0], C mapped to [0,1,0,0], G mapped to [0,0,1,0], T mapped to [0,0,0,1]
MGW features: DNA minor groove width values normalized
Roll features: Roll angle between adjacent base-pairs normalized
ProT features: Propeller twist between paired bases normalized
HelT features: Helix twist between adjacent base-pairs normalized
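
The sequence mapping listed above can be sketched as follows. This is an illustration only, not the code in encode.R: concatenating the four-element vectors into one flat list and rejecting characters other than A, C, G, and T are assumptions, and the name one_hot_encode is hypothetical.

```python
# Mapping taken from the feature description above.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(sequence):
    """Return the concatenated one-hot vectors for a DNA sequence (4 values per base)."""
    encoded = []
    for position, base in enumerate(sequence.upper()):
        if base not in ONE_HOT:
            # Assumption: ambiguous bases such as N are treated as errors in this sketch.
            raise ValueError(f"unsupported base {base!r} at position {position}")
        encoded.extend(ONE_HOT[base])
    return encoded


print(one_hot_encode("ACGT"))
```

A sequence of length L yields 4 x L sequence features under this layout.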

Users can select the desired features by modifying the file feature_list. The first column of feature_list gives the name of each feature combination, which can be any string without spaces. The second column must be a 5-bit binary string that toggles the output of sequence features, MGW features, Roll features, ProT features, and HelT features, in that order. For example, "11111" enables the encoding of all five features, and "10000" enables only sequence features. The option in encode.R that selects whether the final affinity is the original value or its logarithm is easy to toggle.
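
The way a feature_list line maps onto the five feature types can be sketched as follows. This is a hedged illustration assuming whitespace-separated columns; the function name parse_feature_list_line and the combination names seq_only and seq_shape are hypothetical examples, not names shipped with the script.

```python
# Bit order as described above: sequence, MGW, Roll, ProT, HelT.
FEATURE_ORDER = ("sequence", "MGW", "Roll", "ProT", "HelT")


def parse_feature_list_line(line):
    """Split one feature_list line into (name, {feature: enabled})."""
    fields = line.split()
    if len(fields) != 2:
        raise ValueError(f"expected 2 columns, got {len(fields)}")
    name, bits = fields
    if len(bits) != len(FEATURE_ORDER) or set(bits) - {"0", "1"}:
        raise ValueError(f"second column must be a 5-bit binary string, got {bits!r}")
    enabled = {feature: bit == "1" for feature, bit in zip(FEATURE_ORDER, bits)}
    return name, enabled


# Hypothetical combination names, chosen only for this example.
for line in ["seq_only 10000", "seq_shape 11111"]:
    name, enabled = parse_feature_list_line(line)
    print(name, [feature for feature in FEATURE_ORDER if enabled[feature]])
```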

The encoded output files contain the affinity values, or responses, in the first column, followed by a constant column of 1's, followed by sequence features (if enabled), MGW features (if enabled), Roll features (if enabled), ProT features (if enabled), and HelT features (if enabled).
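
The row layout can be sketched as follows, assuming the enabled feature blocks appear in the same order as the bits in feature_list (sequence, MGW, Roll, ProT, HelT). The function name assemble_row is hypothetical, and the numeric shape values in the example are placeholders chosen for illustration, not DNAshape output.

```python
# Assumed block order, matching the bit order in feature_list.
FEATURE_ORDER = ("sequence", "MGW", "Roll", "ProT", "HelT")


def assemble_row(affinity, feature_blocks, enabled):
    """Build one output row: response, constant 1, then each enabled feature block."""
    row = [affinity, 1]
    for feature in FEATURE_ORDER:
        if enabled.get(feature, False):
            row.extend(feature_blocks[feature])
    return row


# Placeholder values for a single base; real shape features come from DNAshape.
blocks = {
    "sequence": [1, 0, 0, 0],
    "MGW": [0.5],
    "Roll": [0.1],
    "ProT": [0.2],
    "HelT": [0.3],
}
print(assemble_row(0.8, blocks, {"sequence": True, "MGW": True}))
```

Keeping the constant column of 1's in every row means the encoded file can be passed to a linear regression as a design matrix with an intercept term already included.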

Feel free to contact us if you have any questions. (yang23@usc.edu)

February 6, 2017
Our new Mol. Syst. Biol. paper provides systematic analysis of DNA shape readout for many protein families. Congrats, Lin!

November 30, 2016
Our new Nature paper with the Leibniz Institute on Aging reveals role of Hoxa9 in muscle stem cell aging.

November 8, 2016
Our recent Dror et al. Genome Res. paper received a RECOMB/ISCB Top-10 Paper Award in regulatory and systems genomics in 2015/16.

August 31, 2016
Carolina defended her Ph.D. thesis with flying colors. Congratulations, Carolina!

August 18, 2016
Our new paper proves the impact of DNA shape on in vivo TF binding based on 400 human ChIP-seq datasets.

August 16, 2016
Remo was promoted to Full Professor of Biological Sciences at USC. Fight on!

July 14, 2016
Remo was elected Head of Computational Biology and Bioinformatics at USC. Fight on!

June 6, 2016
Lin defended his Ph.D. thesis with flying colors. Congratulations, Lin!

May 4, 2016
Tsu-Pei received a competitive Enhancement Fellowship from the USC Graduate School. Congratulations, Tsu-Pei!

May 3, 2016
Lin received the highest honor for a USC graduate student, the PhD Achievement Award. Congratulations, Lin!

May 3, 2016
Remo was introduced as the incoming Vice Chair of the Department of Biological Sciences and Director of Biological Sciences Studies.

April 20, 2016
Remo presented our recent Zhou et al. PNAS paper as one of the few selected Highlights at the recent RECOMB conference.

April 19, 2016
Carolina received the Harrison M. Kurtz Award and Tsu-Pei the William E. Trusten Award. Congrats, Carolina and Tsu-Pei!

April 6, 2016
Remo received the USC Mentoring award in the category mentoring of graduate students. Best award ever!

March 16, 2016
Remo received the ACS OpenEye Outstanding Junior Faculty Award in Computational Chemistry at the American Chemical Society National Meeting.

January 28, 2016
Remo received Tenure at USC and was promoted to Associate Professor. Fight on!

November 18, 2015
Our recent Abe et al. Cell and Zhou et al. PNAS papers were voted as RECOMB/ISCB Top Papers in regulatory and systems genomics in 2014/15.

Recent news

August 21-25, 2016
Symposium on Modeling Water and Solvation in Biochemistry: Developments and Applications, American Chemical Society National Meeting, Philadelphia, PA

July 5-8, 2016
Meeting on Measuring and Modeling Quantitative Sequence-Function Relationships, Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY

April 17-21, 2016
RECOMB 2016 Conference, Santa Monica, CA

March 23, 2016
Leibniz Institute on Aging - Fritz Lipmann Institute, Jena, Germany

March 15-19, 2016
CSHL Meeting on Systems Biology: Global Regulation of Gene Expression, Cold Spring Harbor Laboratory, NY

March 7-10, 2016
Workshop on Regulatory Genomics and Epigenomics, Simons Institute for the Theory of Computing, UC Berkeley, Berkeley, CA

February 5-7, 2016
Bridge@USC and Michelson Center for Convergent Biosciences Retreat, Catalina Island, CA

January 31-February 5, 2016
Epigenomics 2016 Meeting, Rio Mar, Puerto Rico

January 19, 2016
Bioinformatics and Computational Biology Research Center, Cedars-Sinai Medical Center, Los Angeles, CA

Recent presentations

BISC 321 syllabus
Multidisciplinary Seminar: Science, Technology, and Society

BISC 577a syllabus
Computational Molecular Biology Laboratory