A versatile statistical analysis algorithm to detect genome copy number variation

November 8, 2004
101 (46) 16292-16297

Abstract

We have developed a versatile statistical analysis algorithm for the detection of genomic aberrations in human cancer cell lines. The algorithm analyzes genomic data obtained from a variety of array technologies, such as oligonucleotide array, bacterial artificial chromosome array, or array-based comparative genomic hybridization, that operate by hybridizing with genomic material obtained from cancer and normal cells and allow detection of regions of the genome with altered copy number. The number of probes (i.e., resolution), the amount of uncharacterized noise per probe, and the severity of chromosomal aberrations per chromosomal region may vary with the underlying technology, biological sample, and sample preparation. Constrained by these uncertainties, our algorithm aims at robustness by using a priorless maximum a posteriori estimator and at efficiency by a dynamic programming implementation. We illustrate these characteristics of our algorithm by applying it to data obtained from representational oligonucleotide microarray analysis and array-based comparative genomic hybridization technology as well as to synthetic data obtained from an artificial model whose properties can be varied computationally. The algorithm can combine data from multiple sources and thus facilitate the discovery of genes and markers important in cancer, as well as the discovery of loci important in inherited genetic disease.
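
The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea it names: recovering piecewise-constant copy-number levels from noisy probe values with dynamic programming. It assumes a generic penalized least-squares objective (squared residuals plus a fixed cost per segment) as a stand-in for the authors' priorless maximum a posteriori criterion. The function name `segment` and the parameter `penalty` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def segment(values: List[float], penalty: float) -> List[Tuple[int, int, float]]:
    """Split `values` into contiguous segments, each fitted by its mean.

    Minimizes (sum of squared residuals) + penalty * (number of segments)
    with an O(n^2) dynamic program. This is an illustrative objective,
    not the estimator used in the paper.
    Returns a list of (start, end_exclusive, segment_mean).
    """
    n = len(values)
    # Prefix sums let us compute any segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i: int, j: int) -> float:
        # Squared error of fitting values[i:j] with a single mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[j] = minimal objective for the first j values.
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j] = c
                back[j] = i

    # Walk the back-pointers to recover the segment boundaries.
    segments = []
    j = n
    while j > 0:
        i = back[j]
        segments.append((i, j, (s1[j] - s1[i]) / (j - i)))
        j = i
    segments.reverse()
    return segments


# Synthetic example: a gained region flanked by two normal regions.
probes = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
print(segment(probes, penalty=0.5))
```

In this sketch a larger `penalty` favors fewer, longer segments, while a smaller one lets the fit follow the data more closely; how that trade-off is set in the actual algorithm is described in the full paper, not here.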

Acknowledgments

We thank Yi Zhou (New York University Bioinformatics Group) and two anonymous reviewers for many helpful discussions, suggestions, and relevant references to statistical literature. We also thank Lakshmi Muthuswami (Cold Spring Harbor Laboratory), and Eric Schoenmakers and Joris Veltman (Nijmegen University Medical Center, Nijmegen, The Netherlands) for providing the data used here and for explaining their biological significance. The work reported in this paper was supported by grants from the National Science Foundation (NSF) Qubic Program, the NSF Information Technology Research Program, the Defense Advanced Research Projects Agency, a Howard Hughes Medical Institute Biomedical Support Research Grant, the U.S. Department of Energy, the U.S. Air Force (Air Force Research Laboratory), the National Institutes of Health, and the New York State Office of Science, Technology and Academic Research.

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 101 | No. 46
November 16, 2004
PubMed: 15534219

Submission history

Received: April 10, 2004
Published online: November 8, 2004
Published in issue: November 16, 2004

Keywords

  1. array-based comparative genomic hybridization
  2. copy-number fluctuations
  3. maximum a posteriori estimator

Authors

Affiliations

Raoul-Sam Daruwala
Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012; Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724; and Human Genetics Program, New York University School of Medicine, New York, NY 10012
Archisman Rudra
Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012; Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724; and Human Genetics Program, New York University School of Medicine, New York, NY 10012
Harry Ostrer
Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012; Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724; and Human Genetics Program, New York University School of Medicine, New York, NY 10012
Robert Lucito
Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012; Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724; and Human Genetics Program, New York University School of Medicine, New York, NY 10012
Michael Wigler
Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012; Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724; and Human Genetics Program, New York University School of Medicine, New York, NY 10012
Bud Mishra
Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012; Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724; and Human Genetics Program, New York University School of Medicine, New York, NY 10012

Notes

To whom correspondence should be addressed. E-mail: [email protected].
R.-S.D. and A.R. contributed equally to this work.
Communicated by Jacob T. Schwartz, New York University, New York, NY, September 30, 2004
