Big Data: No Free Lunch for Protecting Privacy

July 18, 2014

Ahh, for a simpler time, when privacy protections were actually much less robust than they are now!

The past and future collided last year for the International Year of Statistics, when six professional organizations celebrated the multifaceted role of statistics in contemporary society to raise public awareness of statistics, and to promote thinking about the future of the discipline. The past came in the form of the 300th anniversary of Jacob Bernoulli’s Ars conjectandi (Art of Conjecturing) and the 250th anniversary of Thomas Bayes’ “An Essay Towards Solving a Problem in the Doctrine of Chances.” The future came in the form of an identity crisis for the discipline brought on by the rise of Big Data.

To cap the year, in November 100 prominent statisticians attended an invitation-only event in London to grapple with the challenges and possible pathways that future presents. Earlier this month, Statistics and Science: A Report of the London Workshop on the Future of the Statistical Sciences, the product of that high-level meeting, was released by the six societies: the American Statistical Association, the Royal Statistical Society, the Bernoulli Society, the Institute of Mathematical Statistics, the International Biometric Society, and the International Statistical Institute.

In the following weeks, Social Science Space will excerpt portions of that report highlighting case studies on the current use of statistics and the challenges the discipline faces, such as the reproducibility crisis. (For a PDF of the full report, click here.)

***

Without a doubt, the most-discussed current trend in statistics at the Future of Statistics Workshop was Big Data. The ubiquity of this phrase perhaps conceals the fact that different people think of different things when they hear it. For the average citizen, Big Data brings up questions of privacy and confidentiality: What information of mine is out there, and how do I keep people from accessing it? For computer scientists, Big Data poses problems of data storage and management, communication, and computation. And for statisticians, Big Data introduces a whole different set of issues: How can we get usable information out of databases that are so huge and complex that many of our traditional methods can’t handle them? …

The London report examines a number of challenges and opportunities for the use of statistics, both within the milieu of Big Data (problems of scale, different kinds of data, privacy and confidentiality) and in the statistics world in general (reproducibility of scientific research, the role of statistics in controversial arenas like climate change, visualizing results, and the rewards for professionals).

The year 2013 was the year when many Americans woke up to the volumes of data that are being gathered about them, thanks to the highly publicized revelation of the National Security Agency’s data-mining program called PRISM. In this climate, public concerns about privacy and confidentiality of individual data have become more acute. It would be easy for statisticians to say, “Not our problem,” but, in fact, they can be part of the solution.

Two talks at the London workshop, given by Stephen Fienberg and Cynthia Dwork, focused on privacy and confidentiality issues. Fienberg surveyed the history of confidentiality and pointed out a simple, but not obvious, fact: As far as government records are concerned, the past was much worse than the present. U.S. Census Bureau records had no guarantee of confidentiality at all until 1910. Legal guarantees were gradually introduced over the next two decades, first to protect businesses and then individuals. However, the Second War Powers Act of 1942 rescinded those guarantees. Block-by-block data were used to identify areas in which Japanese-Americans were living, and individual census records were provided to legal authorities such as the Secret Service and Federal Bureau of Investigation on more than one occasion. The act was repealed in 1947, but the damage to public trust could not be repaired so easily.

There are many ways to anonymize records after they are collected without jeopardizing the population-level information that the census is designed for. These methods include adding random noise (Person A reports earning $50,000 per year and the computer adds a random number to it, say –$10,000, drawn from a distribution of random values); swapping data (Person A’s number of dependents is swapped with Person B’s); or matrix masking (an entire array of data, p variables about n people, is transformed by a known mathematical operation—in essence, “smearing” everybody’s data around at once). Statisticians, including many at the U.S. Census Bureau, have been instrumental in working out the mechanics and properties of these methods, which make individual-level information very difficult to retrieve.
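The three methods above can be sketched in a few lines of Python. This is an illustration only, not the Census Bureau's procedure: the function names, the sample incomes and dependents, the Gaussian noise distribution, and the particular masking matrix `A` are all assumptions chosen for the example.

```python
import random

def add_noise(values, scale, rng):
    """Noise addition: perturb each value with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, scale) for v in values]

def swap_values(values, rng):
    """Data swapping: exchange values between randomly paired records.

    The multiset of values, and so any one-variable summary, is unchanged.
    """
    order = list(range(len(values)))
    rng.shuffle(order)
    swapped = list(values)
    for a, b in zip(order[0::2], order[1::2]):
        swapped[a], swapped[b] = swapped[b], swapped[a]
    return swapped

def mask_records(X, A):
    """Matrix masking (one simple form): return the matrix product A X.

    X has n rows (people) and p columns (variables); A has n columns.
    """
    n, p = len(X), len(X[0])
    return [[sum(row[k] * X[k][j] for k in range(n)) for j in range(p)]
            for row in A]

rng = random.Random(2014)  # fixed seed so the sketch is repeatable
incomes = [50_000, 62_000, 48_000, 75_000]   # hypothetical data
dependents = [0, 2, 1, 3]                    # hypothetical data

noisy_incomes = add_noise(incomes, scale=10_000, rng=rng)
swapped_dependents = swap_values(dependents, rng)

# Assumed masking matrix: each released row averages one person with the next.
# Every column of A sums to 1, so column totals survive the masking.
X = [[float(i), float(d)] for i, d in zip(incomes, dependents)]
n = len(X)
A = [[0.5 if k in (i, (i + 1) % n) else 0.0 for k in range(n)]
     for i in range(n)]
masked = mask_records(X, A)

print(noisy_incomes)
print(swapped_dependents)
print(masked)
```

In each case the individual rows no longer match what any one person reported, while population-level quantities (the mean income in expectation, the distribution of dependents, the column totals) are preserved.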

Cryptography is another discipline that applies mathematical transformations to data; these transformations are either irreversible, reversible only with a password, or reversible only at such great cost that an adversary could not afford to pay it. Cryptography has been through its own sea change since the 1970s. Once it was a science of concealment, which could be afforded by only a few: governments, spies, armies. Now it has more to do with protection, and it is available to everyone. Anybody who uses a bank card at an ATM is using modern cryptography.

One of the most exciting trends in Big Data is the growth of collaboration between the statistics and cryptography communities over the last decade. Dwork, a cryptographer, spoke at the workshop about differential privacy, a new approach that offers strong probabilistic privacy assurances while at the same time acknowledging that perfect security is impossible. Differential privacy provides a way to measure security so that it becomes a commodity: A user can purchase just as much security for her data as she needs.
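To make the idea of "purchasing" privacy concrete, here is a minimal sketch of the Laplace mechanism, a standard differentially private technique. The report excerpt does not say which mechanism Dwork discussed, so this choice is an assumption, as are the function names and the sample data. The privacy parameter `epsilon` plays the role of the price: a smaller value buys a stronger guarantee and costs more noise.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), which is why the scale is 1/epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
incomes = [50_000, 62_000, 48_000, 75_000, 91_000, 39_000]  # hypothetical data

for epsilon in (0.1, 1.0, 10.0):
    released = private_count(incomes, lambda x: x > 60_000, epsilon, rng)
    print(f"epsilon={epsilon}: released count = {released:.2f} (true count is 3)")
```

With `epsilon` at 0.1 the released count is typically off by about 10, while at 10.0 it is typically off by about 0.1, which is the sense in which a user chooses how much protection to pay for.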

Still, there are many privacy challenges ahead, and the problems have by no means been solved. Most methods of anonymizing do not scale well as p or n get large. Either they add so much noise that new analyses become nearly impossible or they weaken the privacy guarantee. Network-like data pose a special challenge for privacy because so much of the information has to do with relationships between individuals. In summary, there appears to be “no free lunch” in the tradeoff between privacy and information.
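One way to see the tradeoff in numbers, under assumptions the report does not spell out: suppose each released statistic is a count protected with Laplace noise, and a fixed total privacy budget is split evenly across all the statistics released (basic sequential composition in differential privacy). The function name and parameters below are hypothetical.

```python
def per_query_noise_scale(total_epsilon, num_queries, sensitivity=1.0):
    """Laplace noise scale per query when a total budget is split evenly.

    Each query gets total_epsilon / num_queries, so its noise scale is
    sensitivity * num_queries / total_epsilon.
    """
    return sensitivity * num_queries / total_epsilon

for k in (1, 10, 100, 1000):
    scale = per_query_noise_scale(total_epsilon=1.0, num_queries=k)
    print(f"{k} queries -> noise scale {scale} per answer")
```

The noise per answer grows in proportion to the number of questions asked. Holding the noise down instead means raising the total budget, which weakens the guarantee.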

