Analyzing Data From Facebook

The URLs Light dataset, which we have released, and the much larger URLs Full dataset, which will follow soon, are protected by the principles of "differential privacy". We implemented differential privacy by adding specially calibrated noise to each dataset. The noise guarantees that individuals who may be represented in the data cannot be reidentified, and that any clicks, shares, or other actions cannot be associated with any one person.

Despite the noise, differential privacy makes it possible for statistical analysts to learn social science patterns from the same data. However, unless special statistical procedures are used, the noise (which analysts would experience as measurement error) can cause statistical results to be attenuated, exaggerated, or of the wrong sign, or to have incorrect standard errors.

To ensure that researchers can easily avoid these problems, we have developed a new set of statistical methods and written a paper that describes how they work. Facebook is working on implementing these methods so that they run quickly at the scale these enormous datasets require.

For a copy of the paper, please see “Statistically Valid Inferences from Differentially Private Data Releases” (