Phys. Rev. E 72, 041917 (2005) [6 pages]

Segmentation algorithm for DNA sequences


Chun-Ting Zhang (1,*), Feng Gao (1), and Ren Zhang (2)
(1) Department of Physics, Tianjin University, Tianjin 300072, China
(2) Department of Epidemiology and Biostatistics, Tianjin Cancer Institute and Hospital, Tianjin 300060, China

Received 7 March 2005; published 17 October 2005

A new measure of the difference between two probability distributions, called the quadratic divergence, is proposed. Based on the quadratic divergence, a new segmentation algorithm is put forward to partition a given genome or DNA sequence into compositionally distinct domains. The new algorithm has been applied to segment the 24 human chromosome sequences, and the boundaries of isochores for each chromosome were obtained. Compared with the entropic segmentation algorithm based on the Jensen-Shannon divergence, the new algorithm yields identical coordinates for all segmentation points. An explanation of the equivalence of the two segmentation algorithms is presented. The new algorithm has several advantages; in particular, it is much simpler and faster than the entropy-based method. The new algorithm is therefore more suitable for analyzing long genome sequences, such as human and other newly sequenced eukaryotic genome sequences.
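
To make the idea of a divergence-based split concrete, the following is a minimal sketch of a single segmentation step. The abstract does not state the formula for the quadratic divergence, so the form used here is an assumption: the Jensen-Shannon construction with the entropy replaced by S(P) = sum of p_i^2, that is, D = w1 S(P1) + w2 S(P2) - S(w1 P1 + w2 P2), where w1 and w2 are the relative lengths of the two subsequences. The function names `quadratic_divergence` and `best_split` are hypothetical and are not taken from the paper.

```python
ALPHABET = "ACGT"  # the sketch assumes the sequence contains only these four bases


def quadratic_divergence(left_counts, right_counts):
    """Assumed form: w1*S(P1) + w2*S(P2) - S(w1*P1 + w2*P2), with S(P) = sum(p_i**2)."""
    n1, n2 = sum(left_counts), sum(right_counts)
    w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)
    p1 = [c / n1 for c in left_counts]
    p2 = [c / n2 for c in right_counts]
    mix = [w1 * a + w2 * b for a, b in zip(p1, p2)]

    def s(p):
        return sum(x * x for x in p)

    return w1 * s(p1) + w2 * s(p2) - s(mix)


def best_split(seq):
    """Return (position, divergence) of the split that maximizes the divergence.

    The position i means the left part is seq[:i] and the right part is seq[i:].
    Requires len(seq) >= 2 so that both parts are non-empty.
    """
    total = [seq.count(base) for base in ALPHABET]
    left = [0, 0, 0, 0]
    best_pos, best_div = None, -1.0
    # Scan every candidate boundary once, updating the left-hand counts in place.
    for i, base in enumerate(seq[:-1], start=1):
        left[ALPHABET.index(base)] += 1
        right = [t - l for t, l in zip(total, left)]
        d = quadratic_divergence(left, right)
        if d > best_div:
            best_pos, best_div = i, d
    return best_pos, best_div


if __name__ == "__main__":
    print(best_split("A" * 50 + "G" * 50))
```

A full segmentation would apply such a step recursively to each resulting subsequence until some stopping criterion is met; the criterion used in the paper is not given in the abstract and is left out of this sketch. Under the assumed form, each evaluation needs only sums of squares rather than logarithms, which is consistent with the claim that the method is simpler and faster than the entropy-based one.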


©2005 The American Physical Society

URL: http://link.aps.org/doi/10.1103/PhysRevE.72.041917
DOI: 10.1103/PhysRevE.72.041917
PACS: 87.15.Cc

* Email: ctzhang@tju.edu.cn
