Building the privacy-preserving tools for our project, which facilitates the study of Facebook data to better understand the role of social media in elections and democracy, has taken longer than expected, but we are now making steady progress.
We are pleased to share that Facebook has made a new (“differentially private”) dataset available to researchers through Social Science One. The dataset contains all URLs shared publicly more than 100 times on Facebook between January 1, 2017 and February 19, 2019: about 32 million URLs and 544 million cell values.
The dataset includes only links shared publicly and variables related to how many times these links were shared publicly. For each URL, the dataset provides information about whether it was fact-checked; the number of users who labeled it fake news, spam, or hate speech; how many times it was shared publicly; how many times it was shared publicly without being clicked on; and the country in which it was most shared. It does not yet include, for example, the number of times users shared public links privately with their friends; when proper privacy-preserving technology is in place, we expect to make those data available as well.
Moreover, because the dataset meets the standard of differential privacy, which provides mathematical assurances of privacy for individuals, researchers will be unable to identify any individual, or any individual’s information or behavior on Facebook. Yet even with this high level of privacy protection, scholars will be able to answer important questions about the breadth and penetration of misinformation on Facebook.
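For readers new to the idea, here is a minimal sketch of how a differentially private count can be produced. It uses the textbook Laplace mechanism purely as an illustration. This announcement does not say which mechanism or privacy parameters were applied to the Facebook dataset, so the function names, the `epsilon` value, and the assumption that one person can change a count by at most 1 (`sensitivity=1.0`) are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy version of `true_count` (illustrative Laplace mechanism).

    epsilon     -- privacy parameter (assumed value; smaller means more noise)
    sensitivity -- the most one person can change the count (assumed to be 1)
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    if rng is None:
        rng = random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical example: a URL that was shared publicly 1,250 times.
    print(private_count(1250, epsilon=0.5))
```

Each released number is off by a small random amount, so no single person's presence or absence can be inferred from it. Because the noise averages to zero, aggregate patterns across millions of URLs remain measurable.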
These data offer researchers an opportunity to learn a great deal about communication on social media and how information travels on Facebook. The release is another step in our long path toward providing industry data for legitimate academic research. In the meantime, we will continue building the infrastructure and working with the community to provide even more valuable data to researchers around the world.
For more information on the dataset, see here. For information on how to access this data, see our RFP.