HitTracy/DAPPL

DAPPL

Script repository for DAPPL. The R packages used include Biostrings_2.50.2, seqinr_3.4-5, GenomicRanges_1.34.0, ggseqlogo_0.1. The known motif database is CIS-BP 2.00.

Introduction

This is not a complete, executable software package. This repository contains the key code for the DAPPL analysis.

Analysis process

Step 1. Cut_seq_N20.R or Cut_seq_N16.R

The raw data generated by sequencing are usually large and should first be split into smaller files with shell commands (about 100k reads per file is recommended). Then use the function in "Cut_seq_N20.R" to extract the binding sequences and barcodes from the raw data set. The sequences are assigned to each protein according to the annotation file (a demo is provided at annotation/annotation_sample.csv). The annotation file should be provided by the designer of the biological experiment.
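The extraction and assignment described above can be sketched in base R. This is only an illustration, not the code in "Cut_seq_N20.R": the read layout (barcode, left flank, 20-nt random region, right flank), the flank sequences, the function name `extract_n20`, and the annotation columns `barcode` and `protein` are all assumptions made for the example.

```r
# Hypothetical read layout, assumed for illustration only:
#   [barcode][left flank][N20 random region][right flank]
extract_n20 <- function(reads, left_flank, right_flank,
                        barcode_len = 6, n_random = 20) {
  pattern <- paste0("^([ACGT]{", barcode_len, "})", left_flank,
                    "([ACGT]{", n_random, "})", right_flank)
  hits <- regmatches(reads, regexec(pattern, reads))
  hits <- hits[lengths(hits) == 3]  # keep reads that match the full layout
  data.frame(
    barcode     = vapply(hits, `[`, character(1), 2),
    binding_seq = vapply(hits, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

# Toy annotation table; the column names are assumptions, not the
# actual header of annotation/annotation_sample.csv.
annotation <- data.frame(barcode = c("AACCGG", "TTGGCC"),
                         protein = c("TF_A", "TF_B"),
                         stringsAsFactors = FALSE)

# Three toy reads: two follow the assumed layout, one does not.
reads <- c(
  paste0("AACCGG", "GATC", "ACGTACGTACGTACGTACGT", "TTAG"),
  paste0("TTGGCC", "GATC", "TTTTCGCGAAAATTTTCGCG", "TTAG"),
  "GGGGGGGGGG"
)

extracted  <- extract_n20(reads, left_flank = "GATC", right_flank = "TTAG")
assigned   <- merge(extracted, annotation, by = "barcode")
by_protein <- split(assigned$binding_seq, assigned$protein)
```

Reads that do not match the assumed layout are silently dropped here; a real pipeline would probably want to count and report them.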

Step 2. Call_motif_GST.R

Use the function in "Call_motif_GST.R" to get the enriched sequences. Then call the function named make_sh to generate the shell script. You may need to adjust the code here to match your Homer installation. Finally, run the shell script to call the motifs.
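As a rough idea of what generating such a shell script can look like, the sketch below writes sequences to FASTA and emits one Homer command per protein. It is not the repository's make_sh: the helper names `write_fasta` and `make_homer_script`, the file naming scheme, the use of a GST FASTA file as background, and the choice of `findMotifs.pl` in FASTA mode with `-len` are assumptions that should be checked against your Homer installation.

```r
# Write sequences to a FASTA file with generated headers.
write_fasta <- function(seqs, path, prefix = "seq") {
  headers <- paste0(">", prefix, "_", seq_along(seqs))
  # rbind + as.vector interleaves header and sequence lines
  writeLines(as.vector(rbind(headers, seqs)), path)
}

# Hypothetical stand-in for a script generator: one Homer call per protein.
# Assumes <fasta_dir>/<protein>.fa exists and that a background FASTA
# (for example, sequences from the GST control) is available.
make_homer_script <- function(proteins, fasta_dir, background_fa,
                              out_dir, script_path, motif_len = "6,8,10") {
  cmds <- sprintf("findMotifs.pl %s fasta %s -fasta %s -len %s",
                  file.path(fasta_dir, paste0(proteins, ".fa")),
                  file.path(out_dir, proteins),
                  background_fa,
                  motif_len)
  writeLines(c("#!/bin/bash", cmds), script_path)
  invisible(cmds)
}

# Toy usage in a temporary directory.
out <- tempdir()
write_fasta(c("ACGTACGT", "TTTTCGCG"), file.path(out, "TF_A.fa"))
script <- file.path(out, "run_homer.sh")
cmds <- make_homer_script(proteins      = "TF_A",
                          fasta_dir     = out,
                          background_fa = file.path(out, "GST.fa"),
                          out_dir       = file.path(out, "homer"),
                          script_path   = script)
```

The generated script would then be run from the shell (for example `bash run_homer.sh`) on a machine where Homer is on the `PATH`.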

Step 3. Compare_modified.R

Use the function in "Compare_modified.R" to identify epigenetic-modification-dependent TF-DNA interactions. To correct for possible library-specific bias, we first performed Lowess normalization of the GST data from the different modified libraries. We then compared the adjusted 6-mer frequencies obtained from the modified library and the unmodified library.
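One plausible reading of this normalization, sketched in base R: estimate the library-specific bias from the GST control as a Lowess trend of the log2 ratio (modified over unmodified) against mean log2 abundance, then subtract that trend before comparing a TF's 6-mer frequencies. The function names, the pseudocount, and the simulated data below are assumptions for illustration; "Compare_modified.R" may define the adjustment differently.

```r
# All k-mers over A/C/G/T.
all_kmers <- function(k) {
  do.call(paste0, expand.grid(rep(list(c("A", "C", "G", "T")), k),
                              stringsAsFactors = FALSE))
}

# k-mer frequencies over a set of sequences, with a pseudocount.
kmer_freq <- function(seqs, k = 6, pseudo = 1) {
  kmers <- unlist(lapply(seqs, function(s) {
    n <- nchar(s) - k + 1
    if (n < 1) return(character(0))
    substring(s, seq_len(n), seq_len(n) + k - 1)
  }))
  counts <- table(factor(kmers, levels = all_kmers(k)))
  freq <- (as.numeric(counts) + pseudo) / sum(as.numeric(counts) + pseudo)
  setNames(freq, names(counts))
}

# Per-k-mer bias (log2 scale) estimated from the GST control.
lowess_bias <- function(gst_mod, gst_unmod, f = 2 / 3) {
  a <- (log2(gst_mod) + log2(gst_unmod)) / 2  # mean log2 abundance
  m <- log2(gst_mod) - log2(gst_unmod)        # log2 ratio between libraries
  fit <- lowess(a, m, f = f)
  approx(fit$x, fit$y, xout = a, rule = 2, ties = mean)$y
}

# Bias-adjusted log2 preference of a TF for the modified library.
compare_libraries <- function(tf_mod, tf_unmod, bias) {
  log2(tf_mod) - bias - log2(tf_unmod)
}

# Simulated frequencies: GST shows an abundance-dependent library bias,
# and the TF additionally prefers k-mer number 10 in the modified library.
set.seed(1)
n <- 4096
gst_unmod <- rgamma(n, shape = 2)
gst_unmod <- gst_unmod / sum(gst_unmod)
a0 <- log2(gst_unmod)
gst_mod <- 2^(a0 + 0.3 * (a0 - mean(a0)) + rnorm(n, sd = 0.05))
gst_mod <- gst_mod / sum(gst_mod)

bias     <- lowess_bias(gst_mod, gst_unmod)
raw_gst  <- log2(gst_mod) - log2(gst_unmod)
adj_gst  <- raw_gst - bias

tf_unmod <- gst_unmod
tf_mod   <- gst_mod
tf_mod[10] <- tf_mod[10] * 4
tf_mod   <- tf_mod / sum(tf_mod)
score    <- compare_libraries(tf_mod, tf_unmod, bias)
```

In this toy setup the adjusted GST ratios scatter around zero, so a large positive `score` points to a k-mer that the TF prefers in the modified library beyond what the library bias explains.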

Step 4. Re_draw_match_logo.R

To obtain the modified motif, the motif calculated by Homer needs to be matched again in the modified library. The binding sequences containing the modified site (the modified CG in the random sequence) are extracted, and the modified motif is reconstructed from these sequences.
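The re-matching and reconstruction can be sketched as follows. For simplicity the motif is represented here as a regular expression rather than a Homer matrix, which is a deliberate simplification; the helper names `match_sites` and `build_pfm`, the example motif, and the toy sequences are assumptions for illustration only.

```r
# Return the first motif match in each sequence (sequences without a match are dropped).
match_sites <- function(seqs, motif_regex) {
  regmatches(seqs, regexpr(motif_regex, seqs))
}

# Position frequency matrix from equal-length sites (rows A, C, G, T).
build_pfm <- function(sites) {
  stopifnot(length(sites) > 0, length(unique(nchar(sites))) == 1)
  bases <- c("A", "C", "G", "T")
  chars <- do.call(rbind, strsplit(sites, ""))
  counts <- vapply(seq_len(ncol(chars)), function(j) {
    as.numeric(table(factor(chars[, j], levels = bases)))
  }, numeric(4))
  rownames(counts) <- bases
  sweep(counts, 2, colSums(counts), "/")
}

# Toy binding sequences from a modified library.
seqs <- c("TTCACGTGAA", "GGCATGTGCC", "AACACGTGTT", "TTTTTTTTTT")

# Hypothetical motif allowing C or T at the third position.
sites <- match_sites(seqs, "CA[CT]GTG")

# Keep only matched sites that contain a CG, i.e. a potentially modified site.
cg_sites <- sites[grepl("CG", sites, fixed = TRUE)]

modified_pfm <- build_pfm(cg_sites)
```

The resulting matrix has one column per motif position and could be passed to a logo-drawing function such as `ggseqlogo::ggseqlogo()` if that package is installed.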