Science and data science

Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved June 16, 2017 (received for review March 15, 2017)
August 7, 2017
114 (33) 8689-8692

Abstract

Data science has attracted a lot of attention, promising to turn vast amounts of data into useful predictions and insights. In this article, we ask why scientists should care about data science. To answer, we discuss data science from three perspectives: statistical, computational, and human. Although each of the three is a critical component of data science, we argue that the effective combination of all three components is the essence of what data science is about.

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 114 | No. 33
August 15, 2017
PubMed: 28784795

Submission history

Published online: August 7, 2017
Published in issue: August 15, 2017

Keywords

  1. data science
  2. statistics
  3. machine learning

Notes

This article is a PNAS Direct Submission.

Authors

Affiliations

David M. Blei1 [email protected]
Department of Computer Science, Columbia University, New York, NY 10027;
Department of Statistics, Columbia University, New York, NY 10027;
Data Science Institute, Columbia University, New York, NY 10027;
Padhraic Smyth
Department of Computer Science, University of California, Irvine, CA 92697;
Department of Statistics, University of California, Irvine, CA 92697

Notes

1
To whom correspondence should be addressed. Email: [email protected].
Author contributions: D.M.B. and P.S. wrote the paper.

Competing Interests

The authors declare no conflict of interest.
