Scoring diverse cellular morphologies in image-based screens with iterative feedback and machine learning

  1. Thouis R. Jones (a,b,c)
  2. Anne E. Carpenter (a,b,1,2)
  3. Michael R. Lamprecht (b)
  4. Jason Moffat (b,3)
  5. Serena J. Silver (a)
  6. Jennifer K. Grenier (a)
  7. Adam B. Castoreno (d)
  8. Ulrike S. Eggert (d)
  9. David E. Root (a)
  10. Polina Golland (c)
  11. David M. Sabatini (a,b,e)

  a. The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142
  b. Whitehead Institute for Biomedical Research, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, MA 02142
  c. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139
  d. Dana-Farber Cancer Institute and Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, 250 Longwood Avenue, Boston, MA 02115
  e. Department of Biology, Massachusetts Institute of Technology, 31 Ames Street, Cambridge, MA 02139

Edited by Edward M. Scolnick, The Broad Institute, Cambridge, MA, and approved December 12, 2008 (received for review September 8, 2008)

  1 T.R.J. and A.E.C. contributed equally to this work.

Abstract

Many biological pathways were first uncovered by identifying mutants with visible phenotypes and by scoring every sample in a screen via tedious and subjective visual inspection. Now, automated image analysis can effectively score many phenotypes. In practical application, customizing an image-analysis algorithm or finding a sufficient number of example cells to train a machine learning algorithm can be infeasible, particularly when positive control samples are not available and the phenotype of interest is rare. Here we present a supervised machine learning approach that uses iterative feedback to readily score multiple subtle and complex morphological phenotypes in high-throughput, image-based screens. First, automated cytological profiling extracts hundreds of numerical descriptors for every cell in every image. Next, the researcher generates a rule (i.e., classifier) to recognize cells with a phenotype of interest during a short, interactive training session using iterative feedback. Finally, all of the cells in the experiment are automatically classified and each sample is scored based on the presence of cells displaying the phenotype. By using this approach, we successfully scored images in RNA interference screens in 2 organisms for the prevalence of 15 diverse cellular morphologies, some of which were previously intractable.
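The three steps described in the abstract (per-cell descriptors, an interactively trained rule, and per-sample scoring) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the cells are synthetic 3-feature vectors rather than real cytological profiles, the researcher's feedback is simulated by looking up a hidden true label, and a nearest-centroid rule stands in for whatever classifier the authors actually used. The function names (`make_cells`, `train`, `margin`, `run_screen`) are hypothetical.

```python
import random


def centroid(rows):
    """Mean feature vector of a non-empty list of feature rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train(labeled):
    """Fit a nearest-centroid rule from {cell_index: (features, is_positive)}."""
    pos = [f for f, y in labeled.values() if y]
    neg = [f for f, y in labeled.values() if not y]
    return centroid(pos), centroid(neg)


def margin(rule, feats):
    """Greater than zero when the cell lies closer to the positive centroid."""
    pos_c, neg_c = rule
    d_pos = sum((a - b) ** 2 for a, b in zip(feats, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(feats, neg_c))
    return d_neg - d_pos


def make_cells(rng, n_samples=6, cells_per_sample=200):
    """Synthetic stand-in for cytological profiling: (sample, features, truth)."""
    cells = []
    for sample in range(n_samples):
        frac = 0.4 if sample == 0 else 0.05  # sample 0 is the enriched "hit"
        for _ in range(cells_per_sample):
            has_phenotype = rng.random() < frac
            shift = 2.0 if has_phenotype else 0.0
            feats = [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)]
            cells.append((sample, feats, has_phenotype))
    return cells


def run_screen(seed=0, rounds=5, batch=20, n_samples=6):
    rng = random.Random(seed)
    cells = make_cells(rng, n_samples=n_samples)

    # Start from one positive and one negative example cell.
    first_pos = next(i for i, c in enumerate(cells) if c[2])
    first_neg = next(i for i, c in enumerate(cells) if not c[2])
    labeled = {first_pos: (cells[first_pos][1], True),
               first_neg: (cells[first_neg][1], False)}

    # Iterative feedback: show cells the current rule calls positive and
    # negative, let the (simulated) researcher correct them, then retrain.
    for _ in range(rounds):
        rule = train(labeled)
        unlabeled = [i for i in range(len(cells)) if i not in labeled]
        rng.shuffle(unlabeled)
        called_pos = [i for i in unlabeled if margin(rule, cells[i][1]) > 0]
        called_neg = [i for i in unlabeled if margin(rule, cells[i][1]) <= 0]
        for i in called_pos[:batch // 2] + called_neg[:batch // 2]:
            labeled[i] = (cells[i][1], cells[i][2])  # hidden truth = feedback

    # Classify every cell, then score each sample by its positive fraction.
    rule = train(labeled)
    hits, totals = [0] * n_samples, [0] * n_samples
    for sample, feats, _ in cells:
        totals[sample] += 1
        hits[sample] += margin(rule, feats) > 0
    return [h / t for h, t in zip(hits, totals)]


if __name__ == "__main__":
    for sample, score in enumerate(run_screen()):
        print(f"sample {sample}: fraction of positive cells = {score:.2f}")
```

In this toy setup the rule is retrained after each batch of corrections, and the sample seeded with more phenotype-positive cells should receive the highest score. Fetching cells from both predicted classes is one plausible way to find examples of a rare phenotype quickly, and is a design choice of this sketch rather than a detail stated in the abstract.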

Footnotes

  • 2 To whom correspondence should be addressed. E-mail: anne@broad.mit.edu
  • Author contributions: T.R.J., A.E.C., D.E.R., P.G., and D.M.S. designed research; T.R.J., A.E.C., M.R.L., J.M., S.J.S., J.K.G., A.B.C., and U.S.E. performed research; T.R.J., A.E.C., and P.G. analyzed data; and T.R.J. and A.E.C. wrote the paper.

  • 3 Present address: Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Room 802, Toronto, Ontario, Canada M5S 3E1.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.

  • This article contains supporting information online at www.pnas.org/cgi/content/full/0808843106/DCSupplemental.

  • Freely available online through the PNAS open access option.
