T-PIC (Tree shape Peak Identification for ChIP-Seq) is free
software for determining DNA/protein binding sites from a
ChIP-Seq experiment. The method is described in our paper:
V. Hower, S.N. Evans, L. Pachter, Shape-based peak identification for ChIP-Seq, BMC Bioinformatics, 12:15 (2011). DOI:10.1186/1471-2105-12-15
The identification of binding targets for proteins using
ChIP-Seq has gained popularity as an alternative to
ChIP-chip. Sequencing can, in principle, eliminate artifacts
associated with microarrays, and cheap sequencing offers the
ability to sequence deeply and obtain a comprehensive survey
of binding. A number of algorithms have been developed to
call "peaks" representing bound regions from mapped reads.
Most current algorithms incorporate multiple heuristics, and
despite much work it remains difficult to accurately
determine individual peaks corresponding to distinct binding
events.
Our method for identifying statistically significant peaks
from read coverage is inspired by the notion of persistence
in topological data analysis and provides a non-parametric
approach that is statistically sound and robust to noise in
experiments. Specifically, our method reduces the peak
calling problem to the study of tree-based statistics
derived from the data. We validate our approach using
previously published data and show that it can discover
previously missed regions.
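
To give a feel for the notion of persistence mentioned above, here is a
minimal sketch in Python. It is not the T-PIC implementation and does not
compute the tree-based statistics from the paper; it only illustrates the
general idea of scoring each local maximum of a coverage profile by how far
the coverage must drop before that peak merges into a taller one. The
function name peak_persistence and the toy coverage vector are hypothetical
and chosen for illustration only.

```python
def peak_persistence(coverage):
    """Return (peak_index, persistence) pairs, most persistent first.

    Illustrative only: sweeps a threshold from high to low coverage and
    tracks connected components of positions above the threshold with a
    union-find structure. A component is born at a local maximum and dies
    when it merges into a component with a taller peak.
    """
    n = len(coverage)
    parent = [None] * n  # None means "position not yet above the threshold"
    peak = {}            # component root -> index of its highest position

    def find(i):
        # Find the component root, shortening paths along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    order = sorted(range(n), key=lambda i: -coverage[i])
    for i in order:
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                a, b = find(i), find(j)
                if a == b:
                    continue
                # Keep the component with the taller peak as the survivor.
                if coverage[peak[a]] < coverage[peak[b]]:
                    a, b = b, a
                persistence = coverage[peak[b]] - coverage[i]
                if persistence > 0:
                    pairs.append((peak[b], persistence))
                parent[b] = a

    if n:
        # The tallest peak never merges; score it against the minimum.
        root = find(order[0])
        pairs.append((peak[root], coverage[peak[root]] - min(coverage)))
    return sorted(pairs, key=lambda p: -p[1])


# Toy coverage profile with three local maxima (made-up numbers).
toy_coverage = [0, 2, 5, 2, 1, 3, 1, 0, 4, 0]
ranked_peaks = peak_persistence(toy_coverage)
print(ranked_peaks)
```

In this toy profile the small bump at position 5 gets a low score because it
is separated from the taller peak at position 2 by only a shallow dip, while
the peaks at positions 2 and 8 are separated by a drop to zero coverage and
score highly. A real peak caller would additionally need a statistical test
to decide which scores are significant.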
The difficulty in accurately calling peaks for ChIP-Seq
data is partly due to the difficulty in defining peaks, and
we demonstrate a novel method that improves on the accuracy
of previous methods in resolving peaks. Our introduction of
a robust statistical test based on ideas from topological
data analysis is also novel. Our methods are implemented in
a program called T-PIC (Tree shape Peak Identification for
ChIP-Seq), which is available at http://www.math.miami.edu/~vhower/tpic.html.