As recently reported, researchers from Carnegie Mellon University have announced a method for accurately predicting the Social Security Numbers (SSNs) of individuals from two of the most basic and widely available facts about them: their dates of birth and their States of birth. In their paper, “Predicting Social Security Numbers from Public Data” (.pdf), researchers Alessandro Acquisti and Ralph Gross warn that they have uncovered a distinct and identifiable statistical pattern across the SSNs of deceased persons, records that, ironically, are made publicly available by the Social Security Administration (SSA or Agency) itself, and that they have used that pattern to predict the SSNs of living Americans from nothing more than their birthdays and States of birth. In other words: “[A]ny third party with internet access and some statistical knowledge . . . [can deduce the pattern of SSN assignment] by analyzing publicly available records in the [Social Security Administration] Death Master File [and] interpolating an alive person’s state and date of birth with the patterns detected across deceased individuals.”
What has received considerably less media attention, however, is the SSA’s muted response to this fiasco and, by contrast, the alarmingly broad set of explanatory guides and almost-complete SSNs that the Agency makes available to the public on its website.
While SSNs are the most ubiquitous personal identifiers today, and while their disclosure is a gateway to identity theft and other potentially disastrous mischief, the SSA’s response to the report has been astoundingly nonchalant. Rather than provide any assurances about the integrity of its SSN assignment system, the SSA instead appears amused that the researchers are taking credit for “crack[ing] a code” that, in the SSA’s words, has been “a matter of public record for years.”
The SSA is right: the Agency’s website contains user-friendly guides that explain, in sometimes surprising detail, the SSN assignment system. The excerpt below, for example, is from the section helpfully titled “Structure of the Social Security Number (SSN)”:
The SSN consists of nine digits separated into three parts by hyphens (i.e., 000-00-0000) representing the area, group, and serial numbers.
1. Area Number The first three digits of the SSN are the area number. The area number reflects the State as derived from the ZIP Code in the mailing address the number holder provided on his/her application for an original SSN card.
2. Group Number The middle two digits of the SSN are the group number. The group number ranges from 01 to 99, but group numbers are not released for SSN assignment in consecutive order. Instead, for administrative reasons, group numbers are released in the following sequence:
Odd numbers from 01 through 09; then even numbers from 10 through 98; then even numbers from 02 through 08; and finally, odd numbers 11 through 99.
3. Serial Number The last four digits of the SSN are the serial number. The serial number represents a straight numerical series of numbers from 0001-9999 within each group.
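To make the excerpt concrete, here is a minimal Python sketch of the structure it describes. The function names `split_ssn` and `group_release_order` are hypothetical and exist only for illustration; the logic is simply a reading of the SSA text quoted above, not anything the Agency publishes as code.

```python
import re


def split_ssn(ssn: str) -> tuple[str, str, str]:
    """Split a string of the form 000-00-0000 into (area, group, serial)."""
    match = re.fullmatch(r"(\d{3})-(\d{2})-(\d{4})", ssn)
    if match is None:
        raise ValueError("expected the form 000-00-0000")
    return match.group(1), match.group(2), match.group(3)


def group_release_order() -> list[int]:
    """Return group numbers 01-99 in the release sequence quoted above."""
    odd_low = list(range(1, 10, 2))      # odd numbers 01 through 09
    even_high = list(range(10, 99, 2))   # even numbers 10 through 98
    even_low = list(range(2, 9, 2))      # even numbers 02 through 08
    odd_high = list(range(11, 100, 2))   # odd numbers 11 through 99
    return odd_low + even_high + even_low + odd_high


# The placeholder from the SSA excerpt, split into its three parts.
area, group, serial = split_ssn("000-00-0000")

# The first few groups released: odd 01-09, then the even numbers from 10.
first_groups = group_release_order()[:7]
```

The sequence covers all 99 group numbers exactly once, so the position of a group number in this list shows roughly how far along an area was in issuing numbers when that group was in use.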
The SSA website also provides a chart that enables any layperson to make a very reasonable guess at the first three digits of a person’s SSN if that person’s State of birth is known, and with extreme accuracy if the individual was born in a smaller State, such as Hawaii or Rhode Island. Other resources include:
- a list, updated monthly, of the Area Number and Group Number combinations (that is, the first five digits of the SSN) that have been assigned each month beginning in December 2003;
- instructions on how to access the Death Master File, which the researchers used to deduce the statistical patterns by which SSNs are assigned and which, presumably, any statistically savvy third party could use to do the same; and
- FAQs directed to employers that, among other things, let us know which SSNs are invalid: “no SSNs with an area number in the 800 or 900 series, or ‘000’ area number, have been assigned. No SSNs with an area number above 772 have been assigned in the 700 series.”
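The employer FAQ quoted in the last bullet amounts to a simple range check on the area number. The sketch below is an assumption-laden illustration: the function name `area_may_have_been_assigned` is hypothetical, and the rule reflects only the FAQ language quoted above as of this writing. It does not account for any other exclusions the SSA may apply.

```python
def area_may_have_been_assigned(area: int) -> bool:
    """Apply only the area-number rule quoted from the SSA's employer FAQ."""
    if not 0 <= area <= 999:
        raise ValueError("area number must be three digits (000-999)")
    if area == 0:
        return False   # the '000' area number has not been assigned
    if area >= 800:
        return False   # nothing in the 800 or 900 series has been assigned
    if area > 772:
        return False   # nothing above 772 in the 700 series has been assigned
    return True        # areas 001 through 772 pass this particular check


# Check the boundaries of the quoted rule.
boundary_results = {a: area_may_have_been_assigned(a) for a in (0, 1, 772, 773, 800)}
```

Passing this check says nothing about whether a particular number was actually issued; it only mirrors the ranges the FAQ rules out.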
To its credit, the SSA at least acknowledges the obvious reality that “the use of the SSN as a general identifier has grown to the point where it is the most commonly used and convenient identifier for all types of record-keeping systems and data exchanges in the U.S.,” and that identity theft associated with SSNs is a pressing concern. While the SSA has for decades refused to adopt the most obvious safeguard of completely randomizing SSN assignment, the Agency has finally announced that it is currently developing a system to randomize all SSNs beginning next year. That system, however, would only apply to the assignment of new SSNs – and would in no way help the hundreds of millions of Americans alive today whose SSNs remain vulnerable.
- The report “Predicting Social Security Numbers from Public Data” (.pdf).
- Coverage of the report by the NYTimes, CNN, and PCWorld.
- The SSA’s website, and table of contents for general SSN information.