The security and privacy of user data are extremely important to everyone involved in this project. Social scientists cannot accomplish their goals of learning about and ameliorating the challenges of human society, to the benefit of everyone, unless they are trusted with information about human characteristics, behaviors, and opinions. The opportunities afforded by this project, and the valuable information it will make available, make these stakes even more substantial.
Social Science One institutes the following procedures to ensure security and privacy.
- All proposals receiving access to data must be pre-approved by university Institutional Review Boards or their international equivalents. In addition, any researcher who wishes to analyze the data must belong to a university that is a party to the data sharing agreement, which further ensures accountability for the actions of the individual researcher. The peer review process for awarding data access provides an additional check to ensure that only responsible researchers are granted access.
- Our data access procedures are calibrated to the sensitivity of the dataset to be analyzed. Datasets that raise no legitimate privacy concerns, such as the highly aggregated data commonly released by technology firms, may be released publicly or with minimal safeguards. For more sensitive datasets, researchers may need to develop analysis code against a synthetic data set and submit that code for automated (or manual) execution in an environment where all data analysis (and literally every keystroke) is subject to audit by us. These procedures effectively shift data access from a regime of individual responsibility, in which scholars legally agree to follow the rules and the rest of the community hopes they comply, to one of collective responsibility, in which multiple people are always checking and the risk of improper action by any one individual is greatly limited.
- Social science insights are normally about population averages and broad patterns, for which facts about any one individual are unnecessary and not of interest. As such, we are exploring the use, for certain types of data, of cutting-edge work in computer science and statistics, such as "differential privacy". This will enable us to make available, in certain situations, modified data sets or statistical analysis procedures that (as can be shown via mathematical proof) allow social scientists to discover aggregate patterns with essentially no chance of learning or revealing anything about any one individual. This emerging field holds great promise both to protect privacy for individuals and to advance scientific knowledge for all of society. Because these techniques have not yet been developed for some types of data we are making available, we will also enlist researchers in this area to adapt their techniques to facilitate social media data sharing.
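To make the differential-privacy idea above concrete, here is a minimal sketch (not the project's actual implementation) of the classic Laplace mechanism: calibrated random noise is added to an aggregate query so that adding or removing any one person's record changes the distribution of outputs by only a provably bounded amount, governed by the privacy parameter epsilon. The function names and the example data are illustrative assumptions, not part of Social Science One's tooling.

```python
import math
import random


def laplace_noise(scale):
    # Sample from a Laplace(0, scale) distribution via inverse-CDF.
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))


def dp_count(records, predicate, epsilon):
    # Differentially private counting query. A count has sensitivity 1
    # (one person changes it by at most 1), so Laplace noise with
    # scale 1/epsilon suffices; smaller epsilon means stronger privacy
    # but noisier answers.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)


# Illustrative use: a noisy count over hypothetical records.
if __name__ == "__main__":
    records = [{"age": a} for a in range(100)]
    print(dp_count(records, lambda r: r["age"] >= 50, epsilon=1.0))
```

The aggregate pattern (roughly half the records match) remains recoverable, while no single query output reliably reveals whether any particular individual is in the data.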