Characterizing treatment pathways at scale using the OHDSI network
- (a) Department of Biomedical Informatics, Columbia University Medical Center, New York, NY 10032;
- (b) Medical Informatics Services, NewYork-Presbyterian Hospital, New York, NY 10032;
- (c) Observational Health Data Sciences and Informatics, New York, NY 10032;
- (d) Epidemiology Analytics, Janssen Research and Development, Titusville, NJ 08560;
- (e) Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN 46205;
- (f) Center for Biomedical Informatics Research, Stanford University, CA 94305;
- (g) Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea, 443-380;
- (h) Lister Hill National Center for Biomedical Communications (National Library of Medicine), National Institutes of Health, Bethesda, MD 20894;
- (i) Department of Biomathematics, University of California, Los Angeles, CA 90095;
- (j) Department of Biostatistics, University of California, Los Angeles, CA 90095;
- (k) Department of Human Genetics, University of California, Los Angeles, CA 90095;
- (l) Real World Evidence Solutions, IMS Health, Burlington, MA 01809;
- (m) Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045;
- (n) Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212;
- (o) Geriatric Research, Education and Clinical Center, VA Tennessee Valley Healthcare System, Nashville, TN 37212;
- (p) Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089;
- (q) Department of Pediatrics, University of Southern California, Los Angeles, CA 90089;
- (r) Division of Health Sciences, University of South Australia, Adelaide, SA, Australia 5001;
- (s) Department of Statistics, Columbia University, New York, NY 10027
Edited by Richard M. Shiffrin, Indiana University, Bloomington, IN, and approved April 5, 2016 (received for review June 14, 2015)

Abstract
Observational research promises to complement experimental research by providing large, diverse populations that would be infeasible for an experiment. Observational research can test its own clinical hypotheses, and observational studies can also contribute to the design of experiments and inform the generalizability of experimental research. Understanding the diversity of populations and the variance in care is one component of this effort. In this study, the Observational Health Data Sciences and Informatics (OHDSI) collaboration created an international data network with 11 data sources from four countries, including electronic health records and administrative claims data on 250 million patients. All data were mapped to common data standards, patient privacy was maintained by using a distributed model, and results were aggregated centrally. Treatment pathways were elucidated for type 2 diabetes mellitus, hypertension, and depression. The pathways revealed that the world is moving toward more consistent therapy over time across diseases and across locations, but significant heterogeneity remains among sources, pointing to challenges in generalizing clinical trial results. Diabetes favored a single first-line medication, metformin, to a much greater extent than hypertension or depression. About 10% of diabetes and depression patients and almost 25% of hypertension patients followed a treatment pathway that was unique within the cohort. Aside from factors such as sample size and underlying population (academic medical center versus general population), electronic health records data and administrative claims data revealed similar results. Large-scale international observational research is feasible.
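As an illustration of the summary statistics quoted in the abstract, the following minimal Python sketch computes, for a toy cohort, the share of patients whose treatment pathway is unique within the cohort and the share who start on the most common first-line medication. It is not the OHDSI analysis code: the function name `pathway_summary`, the representation of a pathway as an ordered tuple of medication names, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def pathway_summary(pathways):
    """Summarize a cohort of treatment pathways (hypothetical helper).

    pathways: dict mapping a patient id to a non-empty tuple of medication
    names, ordered by first use. This representation is an assumption of
    the sketch, not the format used in the study.
    """
    n = len(pathways)
    # Count how many patients share each exact sequence of medications.
    sequence_counts = Counter(pathways.values())
    # A pathway is "unique" if exactly one patient in the cohort follows it.
    unique_patients = sum(1 for seq in pathways.values() if sequence_counts[seq] == 1)
    # Tally the first medication of each pathway to find the dominant first line.
    first_line = Counter(seq[0] for seq in pathways.values())
    top_drug, top_count = first_line.most_common(1)[0]
    return {
        "unique_fraction": unique_patients / n,
        "top_first_line": top_drug,
        "top_first_line_fraction": top_count / n,
    }


# Toy data, invented for illustration only.
toy_cohort = {
    1: ("metformin",),
    2: ("metformin",),
    3: ("metformin", "glipizide"),
    4: ("metformin", "glipizide"),
    5: ("glipizide", "metformin", "insulin"),
}

summary = pathway_summary(toy_cohort)
print(summary)
```

In a distributed setting like the one described above, each data source could run such a summary locally and share only the aggregate counts, so that patient-level records never leave the site; how the study actually implemented this is not specified here.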
Footnotes
- 1 To whom correspondence should be addressed. Email: hripcsak@columbia.edu.
Author contributions: G.H., P.B.R., J.D.D., N.H.S., C.G.R., and D. Madigan designed research; G.H., P.B.R., J.D.D., N.H.S., R.W.P., V.H., M.A.S., A.P., J.M.B., and D. Madigan performed research; G.H., P.B.R., J.D.D., M.A.S., M.J.S., F.J.D., and D. Madigan contributed new reagents/analytic tools; G.H., P.B.R., J.D.D., N.H.S., R.W.P., V.H., M.A.S., F.J.D., A.P., J.M.B., and D. Madigan analyzed data; and G.H., P.B.R., J.D.D., N.H.S., R.W.P., V.H., M.A.S., M.J.S., F.J.D., A.P., J.M.B., C.G.R., L.M.S., M.E.M., D. Meeker, N.P., and D. Madigan wrote the paper.
The authors declare no conflict of interest.
This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, “Drawing Causal Inference from Big Data,” held March 26–27, 2015, at the National Academies of Sciences in Washington, DC. The complete program and video recordings of most presentations are available on the NAS website at www.nasonline.org/Big-data.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1510502113/-/DCSupplemental.
Article Classifications
- Biological Sciences
- Medical Sciences