February 2020

Unprecedented Facebook URLs Dataset now Available for Academic Research through Social Science One

Gary King and Nathaniel Persily

We are excited to announce that Social Science One and Facebook have completed, and are now making available to academic researchers, one of the largest social science datasets ever constructed. We processed approximately an exabyte (a quintillion bytes, or a billion gigabytes) of raw data from the platform. The dataset itself contains more than 10 trillion numbers that summarize information about 38 million URLs that were shared publicly on Facebook more than 100 times worldwide (between 1/1/2017 and 7/...


Analyzing Data From Facebook

The URLs Light dataset, which we have released, and the much larger URLs Full Dataset, which will be on its way soon, are protected by the principles of "differential privacy". We implemented differential privacy by adding specially calibrated noise to each dataset. The noise guarantees that individuals who may be represented in the data cannot be reidentified, and that any clicks, shares, or other actions cannot be associated with any one person. Despite the noise, differential privacy makes it possible for statistical analysts to learn social science patterns from the...
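
To make the idea of calibrated noise concrete, here is a minimal sketch in Python. It is an illustration only, not the procedure used to build the URLs datasets: the excerpt above does not say which noise distribution or scale was used, so the zero-mean Gaussian noise, the `sigma` value, the helper name `add_gaussian_noise`, and the click counts are all assumptions made for this example. The sketch shows why any single noisy value reveals little, while an average over many values stays close to the truth.

```python
import random
import statistics


def add_gaussian_noise(counts, sigma, seed=None):
    """Return a copy of `counts` with independent zero-mean Gaussian noise added.

    Hypothetical helper for illustration; `sigma` is an assumed noise scale,
    not a value taken from the actual dataset release.
    """
    rng = random.Random(seed)
    return [c + rng.gauss(0.0, sigma) for c in counts]


# Made-up click counts for 10,000 table cells (not real data).
true_counts = [120, 95, 143, 88, 101] * 2000

# Each cell gets its own noise draw, so individual values are obscured.
noisy_counts = add_gaussian_noise(true_counts, sigma=10.0, seed=0)

# Because the noise has mean zero, it largely averages out in aggregates.
true_mean = statistics.fmean(true_counts)
noisy_mean = statistics.fmean(noisy_counts)

print(f"true mean:  {true_mean:.2f}")
print(f"noisy mean: {noisy_mean:.2f}")
```

With 10,000 cells and an assumed noise standard deviation of 10, the standard error of the noisy mean is about 0.1, which is why the aggregate pattern survives even though each individual number is perturbed.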