Science and data science
Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved June 16, 2017 (received for review March 15, 2017)
Abstract
Data science has attracted a lot of attention, promising to turn vast amounts of data into useful predictions and insights. In this article, we ask why scientists should care about data science. To answer, we discuss data science from three perspectives: statistical, computational, and human. Although each of the three is a critical component of data science, we argue that the effective combination of all three components is the essence of what data science is about.
Submission history
Published online: August 7, 2017
Published in issue: August 15, 2017
Notes
This article is a PNAS Direct Submission.
Competing Interests
The authors declare no conflict of interest.
Cite this article
Science and data science, Proc. Natl. Acad. Sci. U.S.A. 114 (33) 8689-8692, https://doi.org/10.1073/pnas.1702076114 (2017).