Vladimir A. Kuznetsov[*] (email@example.com)
Yuriy L. Orlov (firstname.lastname@example.org)
Chia Lin Wei (email@example.com)
Yijun Ruan (firstname.lastname@example.org)
[*]Corresponding author: V.A.K.
Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672
Advances in high-throughput technologies such as ChIP-chip and ChIP-PET (Chromatin Immunoprecipitation Paired-End diTag), together with the availability of the human and mouse genome sequences, now make it possible to identify transcription factor binding sites (TFBSs) and to analyze mechanisms of gene regulation at the level of the entire genome. Here, we develop a computational approach that uses ChIP-PET data and statistical modeling to assess experimental noise and to identify reliable TFBSs for the c-Myc, STAT1 and p53 transcription factors in the human genome. We propose a probabilistic mixture model and develop programs for Monte Carlo simulation of ChIP-PET data in order to define the background noise of sequence clustering and to determine the probability function of specific DNA-protein binding in the eukaryotic genome. The method is highly reproducible. It not only distinguishes bona fide TFBSs from non-specific binding sites with high specificity, but also provides an algorithmic and computational basis for further optimization of the experimental parameters of the ChIP-PET method.
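The idea of using Monte Carlo simulation to characterize the background noise of sequence clustering can be illustrated with a minimal sketch. It is not the authors' program or their mixture model. It assumes, purely for illustration, that non-specific fragments have a fixed length and fall uniformly at random on a single linear genome, and that a cluster is a maximal run of overlapping fragments. The function names and parameters (`simulate_cluster_sizes`, `background_tail_probability`, `fragment_length`, `n_runs`) are hypothetical.

```python
import random
from collections import Counter


def simulate_cluster_sizes(n_fragments, genome_length, fragment_length, rng):
    """Place fragments uniformly at random; return a Counter {cluster size: count}.

    Assumption (not from the paper): fixed fragment length, uniform placement,
    and a cluster defined as a maximal run of overlapping fragments.
    """
    starts = sorted(
        rng.randrange(genome_length - fragment_length + 1)
        for _ in range(n_fragments)
    )
    sizes = Counter()
    if not starts:
        return sizes
    cluster_size = 1
    cluster_end = starts[0] + fragment_length
    for start in starts[1:]:
        if start < cluster_end:
            # Fragment overlaps the current cluster: extend it.
            cluster_size += 1
            cluster_end = max(cluster_end, start + fragment_length)
        else:
            # Gap found: close the current cluster and open a new one.
            sizes[cluster_size] += 1
            cluster_size = 1
            cluster_end = start + fragment_length
    sizes[cluster_size] += 1
    return sizes


def background_tail_probability(n_fragments, genome_length, fragment_length,
                                n_runs=200, seed=0):
    """Estimate P(cluster size >= k) under the random-placement null model."""
    rng = random.Random(seed)
    total = Counter()
    for _ in range(n_runs):
        total.update(
            simulate_cluster_sizes(n_fragments, genome_length,
                                   fragment_length, rng)
        )
    n_clusters = sum(total.values())
    tail = {}
    cumulative = 0
    # Accumulate counts from the largest observed size downwards.
    for k in range(max(total), 0, -1):
        cumulative += total.get(k, 0)
        tail[k] = cumulative / n_clusters
    return tail


if __name__ == "__main__":
    # Toy parameters chosen for speed, not taken from any experiment.
    tail = background_tail_probability(
        n_fragments=5000, genome_length=10_000_000, fragment_length=600,
        n_runs=20, seed=42,
    )
    for k in sorted(tail):
        print(f"P(size >= {k}) = {tail[k]:.5f}")
```

Under these assumptions, an observed cluster of size k for which the simulated tail probability is very small would be unlikely to arise from background alone. A realistic analysis would replace uniform placement and fixed fragment length with the empirical properties of the library.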