The origin of the data is nearly impossible to determine.
FTC Chair Edith Ramirez said in 2014 that her agency didn’t even know how many data brokers existed, so monitoring what all of these groups are actually doing is out of the question.
Brokers are so secretive about their transactions that they defied requests for specifics on their clients and data sources, not only from the FTC but also from Senator (then-Representative) Edward Markey and a Senate committee investigation.
The pathological secretiveness of the industry means that once data is collected, it can end up in nearly anyone’s hands, and no one has a means of tracking it.
Aside from privacy concerns, the espionage implications are the most obvious threat this poses.
With widely distributed and redundant databases of personal data sitting in so many different unmonitored and unsecured hands, hostile groups could buy or steal huge amounts of data on individuals with relative ease and (ironically) anonymity.
Consider a few examples of how China could use its apparently growing database of U.S. citizens to recruit espionage assets in the U.S. The Social Security numbers (SSNs), home addresses, and other details of 4.2 million federal employees, plus the SSNs of 21.5 million individuals involved in background checks, all stolen in two separate breaches of Office of Personnel Management databases, would be a fine place to start. Supplementing this with other information available from data brokers, China could identify those susceptible to all the classic MICE motivations for becoming spies (Money, Ideology, Coercion, and Ego).
Money is the simplest. Data brokers keep careful track of individuals’ financial status, as it’s a critical part of their segmentation. After all, their marketer clients need to know whether they should be sending someone ads for payday loan services or luxury sports cars. Records of gambling addictions, like those sold by infoUSA and MEDbase, alongside court and financial records, could provide insight into sudden needs for financial support. Information stolen in the many bulk healthcare data breaches (many of which are also thought to originate in China) could provide insights into ongoing, expensive medical issues and whether insurance will cover them. If household purchase data can be obtained from credit card databases or other sources, it could also be used to identify red flags in purchase behavior that indicate financial distress.
Ideological motives might be the most complicated to discern from the available data. Still, brokers’ data segments do record individuals’ political and religious inclinations (which is one of the reasons that they are so valuable in political campaigns). Online tracking cookies and browser fingerprinting could provide further insight into someone’s more secretive ideological orientation. With sufficiently detailed ideological maps of the U.S., one could also purchase phone GPS data from apps to determine whether someone meets regularly with known ideological radicals. China’s experience with identifying its own ideological dissidents could potentially help it develop models that identify behaviors associated with concealing one’s beliefs.
The opportunities for coercion from these data are many. China could acquire information from online tracking, purchase records, and GPS records, and perhaps even use the flight manifests it stole from United Airlines, to identify sexual proclivities or liaisons that one might not want a spouse or the public to know about. Acquiring the data from dating sites like AshleyMadison could, of course, serve the same purpose. Experiments in the creation of “shadow profiles” demonstrate that the structure of social media networks can be used to infer sexual orientation with only a fraction of the user base disclosing that information. The renowned Cambridge-Microsoft 2013 study, which used Facebook Likes to reliably identify personality and other traits, found not only that sexual orientation could be predicted accurately but also that traits like drug use could be. Intriguingly, the Likes correlated with drug use often seem entirely unrelated, like “I Like Lyrics that Actually Mean Something” and “No You Ask,” suggesting a strong potential to identify such behavior through seemingly innocuous information.
Ego could be gauged through a number of factors. A follow-up to the Facebook Likes study concluded that the algorithmic analysis its authors had developed was more accurate at predicting a user’s personality than that user’s friends and family were. One study in particular showed progress in determining one’s degree of narcissism from Twitter behavior (though it admitted that its forecasts were imperfect). Temporary ego fragility could also be assessed through events like messy relationship break-ups, as determined by social media activity or divorce court records. Dissatisfaction with one’s current employment, stemming from a feeling of being underutilized, might also be inferred from job searching or application behavior on professional networking sites or job boards.
Beyond helping to recruit subjects with direct access to sensitive information, personal data can assist with identifying, recruiting, or leveraging their family members. Data brokers keep records of family relationships, including, as one horrified father found out by mistake, the recent deaths of one’s family members.