Predicting Social Security numbers from public data

July 7, 2009
106 (27) 10975-10980

Abstract

Information about an individual's place and date of birth can be exploited to predict his or her Social Security number (SSN). Using only publicly available information, we observed a correlation between individuals' SSNs and their birth data and found that for younger cohorts the correlation allows statistical inference of private SSNs. The inferences are made possible by the public availability of the Social Security Administration's Death Master File and the widespread accessibility of personal information from multiple sources, such as data brokers or profiles on social networking sites. Our results highlight the unexpected privacy consequences of the complex interactions among multiple data sources in modern information economies and quantify privacy risks associated with information revelation in public forums.

Acknowledgments.

We thank Jimin Lee, Ihn Aee Choi, Dhruv Deepan Mohindra, and, in particular, Ioanis Alexander Biternas Wischnienski for outstanding research assistantship, and Mike Cook, Stephen Fienberg, John Miller, Mel Stephens, several colleagues and workshop participants, and 2 anonymous referees for insightful comments and criticisms (see SI Appendix for an extended list). We gratefully acknowledge research support from the National Science Foundation under Grant 0713361, from the U.S. Army Research Office under Contract DAAD190210389, from the Carnegie Mellon Berkman Fund, and from the Pittsburgh Supercomputing Center.

Supporting Information

Supporting Appendix (PDF)

References

2. W Long, Social Security numbers issued: A 20-year review. Social Security Bulletin 56, 83–86 (1993).
3. Social Security Administration, Identity theft and your Social Security number. www.ssa.gov/pubs/10064.html (2007).
4. GAO, Social Security numbers: Use is widespread and protections vary. www.gao.gov/new.items/d04768t.pdf (2004).
5. The President's Identity Theft Task Force, Combating identity theft: A strategic plan. www.idtheft.gov/reports/StrategicPlan.pdf (2007).
6. L Sweeney, Protecting job seekers from identity theft. IEEE Internet Comput 10, 74–78 (2006).
7. C Hoofnagle, Security breach notification laws: Views from Chief Security Officers. http://groups.ischool.berkeley.edu/samuelsonclinic/files/cso_study.pdf (2007).
8. GAO, Internet resellers provide few full SSNs, but Congress should consider enacting standards for truncating SSNs. www.gao.gov/new.items/d06495.pdf (2006).
9. J Franklin, V Paxson, A Perrig, S Savage, An inquiry into the nature and causes of the wealth of Internet miscreants. Computer and Communications Security Conference (Association for Computing Machinery, New York), pp. 375–388 (2007).
10. R Gross, A Acquisti, Information revelation and privacy in online social networks. ACM Workshop on Privacy in the Electronic Society (Association for Computing Machinery, New York), pp. 71–80 (2005).
11. L Sweeney, Weaving technology and policy together to maintain confidentiality. J Law Medicine Ethics 25, 98–110 (1997).
12. Social Security Administration, Report to Congress on options for enhancing the social security card. www.ssa.gov/history/reports/ssnreport.html (1997).
13. Social Security Administration, Social Security numbers: The SSN numbering scheme. www.ssa.gov/history/ssn/geocard.html.
14. G Block, G Matanoski, R Seltser, A method for estimating year of birth using Social Security number. Am J Epidemiol 118, 377–395 (1983).
15. L Sweeney, SOS Social Security number watch. http://privacy.cs.cmu.edu/dataprivacy/projects/ssnwatch/index.html (2004).
16. J Crow, B Bennett, Structure of Social Security numbers. http://w2.eff.org/PrivacyID_SSN_fingerprinting/ssn_structure.article.
17. Federal Trade Commission, Report to Congress under sections 318 and 319 of the Fair and Accurate Credit Transactions Act of 2003. www.ftc.gov/reports/facta/041209factarpt.pdf (2004).
18. R Anderson, Method for constructing complete annual U.S. life tables. Vital and Health Statistics, Ser 2, No 129 (National Center for Health Statistics, Hyattsville, MD, 1999).
19. C Papadimitriou, Computational Complexity (Addison–Wesley, Reading, MA, 1994).
20. National Data Breach Analysis (ID Analytics, San Diego, 2006).
21. C Hoofnagle, Identity theft: Making the known unknowns known. Harvard J Law Technol 21, 98–122 (2007).
22. National Fraud Ring Analysis: Understanding Behavioral Patterns (ID Analytics, San Diego, 2005).
23. M Jakobsson, S Myers, Phishing and Counter-Measures (Wiley, New York, 2006).
24. T Jagatic, N Johnson, M Jakobsson, F Menczer, Social phishing. Commun Assoc Comput Machinery 50, 94–100 (2007).
25. D Florêncio, C Herley, B Coskun, Do strong web passwords accomplish anything? USENIX HotSec, pp. 1–6, www.usenix.org/event/hotsec07/tech/full_papers/florencio/florencio.pdf (2007).
26. E Cooke, F Jahanian, D McPherson, The zombie roundup: Understanding, detecting, and disrupting botnets. USENIX SRUTI, pp. 39–44, www.usenix.org/event/sruti05/tech/full_papers/cooke/cooke.pdf (2005).
27. National Report on Identity Fraud (ID Analytics, San Diego, 2003).
28. Social Security Administration, Identity theft and your Social Security number. www.ssa.gov/pubs/10064.pdf (2007).
29. AM Matwyshyn, Penetrating the zombie collective: Spam as an international security issue. SCRIPTed 3 (2006).
30. U.S. Department of Justice, Retail hacking ring charged for stealing and distributing credit and debit card numbers from major U.S. retailers. www.usdoj.gov/opa/pr/2008/August/08-ag-689.html (2008).
32. M Lesk, The new front line: Estonia under cyberassault. IEEE Security Privacy 5, 76–79 (2007).
33. GAO, Social Security numbers are widely available in bulk and online records, but changes to enhance security are occurring. GAO-08-1009R, www.gao.gov/new.items/d081009r.pdf (2008).
34. A Acquisti, R Gross, Social insecurity: The unintended consequences of identity fraud prevention policies. Tech rep (Carnegie Mellon Univ, Pittsburgh, 2009).
36. G Duncan, SA Keller-McNulty, SL Stokes, Disclosure risk vs. data utility: The R–U confidentiality map. Tech rep no. 121 (National Institute of Statistical Sciences, Research Triangle Park, NC, 2001).
37. HR Varian, Economic aspects of personal privacy. Privacy and Self-Regulation in the Information Age (National Telecommunications and Information Administration, Washington, DC, 1996).
38. Federal Trade Commission, Security in numbers: Social Security numbers and identity theft. www.ftc.gov/os/2008/12/P075414ssnreport.pdf (2008).
39. D Solove, Identity theft, privacy, and the architecture of vulnerability. Hastings Law J 54, 1227–1252 (2003).
40. Protecting the integrity of Social Security numbers. Federal Register 72, 36540 (2007).

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 106 | No. 27
July 7, 2009
PubMed: 19581585

Submission history

Received: January 18, 2009
Published online: July 7, 2009
Published in issue: July 7, 2009

Keywords

  1. identity theft
  2. online social networks
  3. privacy
  4. statistical reidentification

Notes

See Commentary on page 10877.
* SSNs have been found in public records of federal agencies, states, counties, courts, hospitals, and so forth (5), as well as in personal documents, such as online résumés (6).
Companies exchange SSNs in personal information markets, and individuals obtain “credit reports,” containing their SSNs, from credit bureaus. However, the GAO recently found that only a few brokers offering SSNs for sale to the general public are actually able to sell whole SSNs (8). Stolen SSNs are lucratively exchanged in underground cybermarkets (9).
This article contains supporting information online at www.pnas.org/cgi/content/full/0904891106/DCSupplemental.
Recent legislative initiatives have focused on restricting public use of only the first 5 digits of SSNs, while allowing the last 4 to remain associated with names in public documents (see www.ncsl.org/programs/lis/privacy/SSN2007.htm).
§ In the practice known as “pretexting” (5), criminals contact financial services and use information already available to them, such as names and partial SSNs, to learn the remaining SSN digits.

Authors

Affiliations

Alessandro Acquisti¹
Carnegie Mellon University, Pittsburgh, PA 15213
Ralph Gross
Carnegie Mellon University, Pittsburgh, PA 15213

Notes

¹ To whom correspondence should be addressed. E-mail: [email protected]
Communicated by Stephen E. Fienberg, Carnegie Mellon University, Pittsburgh, PA, May 5, 2009

Competing Interests

The authors declare no conflict of interest.