Update from Gary King and Nate Persily

When we created Social Science One to give the world’s social scientific community access to social media data, we promised to release periodic updates noting our progress and describing the challenges we confront. In this post, we describe the substantial work accomplished over the past several months and highlight the remaining obstacles we face. We describe additional datasets to be made available in the coming months, and our plans to announce the first group of researchers who will be granted (privacy-preserving) access to Facebook data and foundation funding through our partnership with the Social Science Research Council. We also detail the important legal, technical, organizational, computational, privacy, and security challenges that have occupied our work to date. Despite these challenges, we believe we are building a firm foundation for a multi-year effort to investigate fundamental questions of social media’s impact on democracy around the world, which we hope can be expanded to other critical areas of research.

 

I.  An Update on the Progress of Social Science One

On July 11, 2018, we launched Social Science One as an unprecedented partnership between social scientists and private industry to advance the goals of understanding and solving society’s greatest challenges. We embarked on a project with Facebook first because that social media platform, the largest in history, has been at the center of so many fundamental questions concerning the way the new communication ecosystem profoundly affects elections and democracy. Of course, we recognized that working with Facebook would invite heavy scrutiny, given the maelstrom of controversy on many fronts that has engulfed the company since the 2016 election, not least the Cambridge Analytica scandal, which was an academic scandal as well. We hoped, however, that rigorous and careful scientific analysis of Facebook data, without funding from or pre-publication approval by Facebook, would provide valuable independent assessment of the conventional wisdom as to the platform’s varied effects on elections and democracy around the world. We also hoped that we could prove the model we had developed for industry-academic partnerships and show how company data could be made accessible in a legal, trusted, privacy-preserving, and secure fashion that benefits everyone. The potential benefits for the social sciences, and for society at large, are so great that getting this right is critical.

We have made important progress on all relevant fronts:

  • We initially secured funding from seven ideologically diverse nonprofit foundations and are happy to announce that we have recently added an eighth.  

  • We have appointed over 80 academics from around the world as advisors for this project, and continue to expand our contacts.

  • We have held meetings with academics and others in Johannesburg, Taipei, Singapore, Amsterdam, and Boston to gather feedback on the project.

  • We have worked tirelessly with a new team of Facebook employees assigned to this project to investigate the kinds of datasets in the company that can be made accessible in a secure, privacy-protected fashion. Our agreements with Facebook ensure that Social Science One can itself decide which datasets will be useful for research and ethically appropriate, while Facebook’s legal team can decide what the company is permitted to do from a privacy perspective.

  • We have hired two teams of lawyers, enlisted a half dozen other teams of lawyers from several universities, and negotiated (and continue to negotiate) the many legal agreements necessary for the project’s long-term success.

  • In partnership with the Social Science Research Council, we released a Request for Proposals (RFP) for research on a dataset that would contain information about “URLs” shared publicly on Facebook -- one of the largest datasets ever made available to researchers in the social sciences. We believe this will be the first of many. The SSRC then organized a peer review process to help us at Social Science One determine which researchers would receive access and grants, based on separate merit and ethical reviews.

  • We are close to being able to announce the first set of research teams approved for financial awards and data access. The merit, ethical, and other forms of review have been mostly completed, but we need to withhold a public announcement until we complete the remaining legal negotiations with Facebook. For example, these researchers and their universities must sign a data use agreement that we are presently negotiating with Facebook.  Grants cannot be awarded, of course, until we are prepared to explain the conditions of the awards. Our current forecast would be at least a month from now, but as with all of our predictions to this point, it is subject to change based on events outside of our control.

 

We are also able to announce today that the approved researchers will receive (1) funding, (2) immediate access to an initial, simplified version of the promised URLs dataset (described in greater detail below), (3) eventual access to the full URLs dataset they applied to analyze (or potentially an even richer version), and (4) immediate access to two data-rich APIs. These researchers did not apply for access to the APIs or other datasets, but we have secured access to them and believe these researchers will be able to make good use of them in learning about the effects of social media on elections and democracy. The two APIs are:

  • CrowdTangle API. CrowdTangle is a platform used by many media companies around the world, allowing analysts to track the popularity of news items and other public postings across multiple platforms. The CrowdTangle API will allow researchers to access both Facebook and Instagram data.
     

  • Ad Archive API.  Facebook has recently developed an archive of all political and issue ads run on Facebook since May 2018.  

Facebook will provision API access to grant winners and other researchers selected by Social Science One. Complete documentation for both APIs, and associated RFPs for interested researchers, will be available at SocialScience.one.  

 

II.  Ongoing Challenges

We knew when we undertook this project that our work would be difficult, sensitive, complicated, and sometimes even controversial. Both before and during our work, Facebook has been on the receiving end of criticism from the press, civil society, and governments around the world. It has not helped our work that in the six months during which we have been trying to get our project off the ground, stories in the press -- concerning privacy violations, security breaches, election interference, internal decision-making scandals, and social media-inspired violence in parts of the world, among others -- have appeared on the front pages almost weekly. Complicating matters is the obvious fact that almost no part of our project has ever been attempted before.

We appreciate the support and commitment of the Facebook team throughout the foundational stages of establishing this program. They have been working as hard as we have to try to find ways to aid the academic community in investigating critical questions about social media’s impact on democracy around the world. In order for this project to get off the ground, a variety of teams, including legal, privacy, security, and many others, must all agree to move forward at each critical stage, and many issues involve cross-functional decisions requiring multiple members of the senior leadership team to weigh in. We are charting a path forward and making progress, but, despite the hard work of all those involved, we continue to confront difficult obstacles posed by the legal environment in which we are operating, especially as it relates to privacy and security.

The legal and regulatory issues surrounding our goals are considerable. Multiple teams of lawyers have worked tirelessly over many months to craft unprecedented agreements. These agreements establish bilateral legal relationships among Facebook, Social Science One, SSRC, the eight foundations supporting our work, and the Institute for Quantitative Social Science at Harvard University, which is incubating Social Science One and housing its staff. Additionally, after many months, we have crafted a data use agreement for the researchers’ universities to provide a contractual layer of privacy protection and security. Given the numerous national legal regimes that govern our project, as well as the various areas of law in all the countries implicated, these agreements have taken months to complete.

The area of law most central to our project is, of course, privacy. Facebook must comply not only with the new General Data Protection Regulation of the European Union and similar privacy laws in jurisdictions around the world, but also with its consent decree with the U.S. Federal Trade Commission. Because of the unprecedented nature of our project, Facebook is moving slowly and cautiously to ensure that our project complies with all relevant legal guidelines. Given the controversies over user data that have plagued the company and the related strict oversight this has engendered from regulatory authorities around the world, this approach is understandable. However, for those of us hoping for fast answers to the great and pressing questions concerning the effect of social media on contemporary democracy, the slow pace at which the company is willing to provide data access to researchers is a source of frustration.

The immediate issue is that Facebook has determined that it cannot, at present, deliver the complete URLs dataset that it promised as part of the RFP announced over the summer. We are obligated to report to the public if Facebook reneges on its original agreement with Social Science One, such as by trying to keep data from researchers that would embarrass the company, but we do not believe that is the issue here. Instead, privacy experts at Social Science One, along with Facebook staff, discovered more privacy-preserving and scalable ways of making data access available to outside researchers than had existed before. We agreed with Facebook that making this considerable investment was necessary for the long-term success of our project. It took Facebook longer than we would have liked, and longer than it had originally indicated, to make this investment and implement these solutions by building the necessary systems and tools. We are happy to report that Facebook is now in the midst of a major project to construct an entirely new set of security and privacy systems, including cutting-edge differential privacy tools, for data access by researchers. Social Science One includes some of the world’s foremost experts on these issues, who are advising Facebook on aspects of this project. Once this system is built and tested by security experts, it will be useful not only for the full URLs dataset, but also for a long sequence of datasets we have queued up for release after that.

Therefore, we plan to release access to data for approved researchers in two stages instead of all at once. First, a “URLs-light” version will include all URLs with a certain number of public shares but will not have exposure data -- that is, information about the types of people who saw the URL. Rather, it will only include information about the URL itself, such as whether it was fact-checked. Then, when the full computational systems are available, we hope to release a full URLs dataset, which is the data as originally promised in the RFP; our prediction as of now is that this will occur this summer (a prediction we will update along the way). As of this moment, it appears that the full URLs dataset we will eventually be able to make available will enable researchers to conduct much richer analyses than originally intended.

Even before Facebook finishes this new privacy and security infrastructure, approved researchers can begin their research by accessing the APIs described above. Researchers will be able to work with the URLs-light dataset, as well as a synthetic dataset that has the same structure as the full URLs dataset without individual data so that they can prepare their code ahead of time. We hope that working with these additional datasets will ensure that researchers will be fully prepared to run their analyses on the new system when it is ready.  In the interim, much of the research contained in the submitted proposals can be conducted on the APIs and modified datasets now made accessible.

This infrastructure is also critical for another important part of our project: combining survey results with Facebook data. A continuing goal of this project is to facilitate the analysis of Facebook data alongside the data from a wide range of major academic surveys. Once the data are joined, researchers would be able to examine the relationships between information exposure on Facebook and certain survey items. We have begun to prepare consent language for our first experiments in the field, which we hope to implement for many surveys conducted in the run-up to various elections around the world. (We especially appreciate the numerous survey organizations around the world with whom we have negotiated about joining Facebook data with their survey data.) However, whether we can provide such data as part of our project is contingent both on the development of the infrastructure detailed above and, of course, on participation from respondents.

We hope we have conveyed the enthusiasm we have for this first project of Social Science One and the work we have accomplished to make our vision a reality. We deeply appreciate the overwhelming support we have received from researchers around the world. We would like to speed the process, but everyone involved recognizes the priority of getting this done right. Our hope is that the problems solved in these early stages will not only inure to the long-term benefit of our project, but also chart a path for similar research efforts going forward. We continue to believe in the critical importance of opening up access for researchers to the most important information private companies possess on the nature of modern society and social interaction. To do so, however, we need to continue to build our new research model from scratch. We have only just begun, but we think we have the building blocks in place to meet the goals we set for ourselves when we envisioned this ambitious project only half a year ago.