Data Security & Privacy

The security and privacy of sensitive data is extremely important to everyone involved in Social Science One. Social scientists cannot accomplish their goals of learning about and addressing societal challenges unless they can be trusted with important information about human characteristics, behaviors, and opinions. The opportunities afforded by our partnership with companies, and the valuable information our approach makes possible, make these stakes even more substantial.

As such, Social Science One institutes the following procedures to ensure security and privacy:

  1. All proposals receiving access to data must be pre-approved by university Institutional Review Boards or their international equivalent. Researchers who wish to access and analyze data must have their university be a part of the data-sharing agreement so as to further ensure accountability for the actions of the individual researcher. The peer review process for awarding data access will provide an additional check to ensure that only responsible researchers are granted access.
  2. Our data access procedures are calibrated to the sensitivity of the dataset to be analyzed. For example, datasets that raise no legitimate privacy concerns, such as highly aggregated data commonly released by technology firms, may be released publicly or with minimal safeguards. For more sensitive datasets, researchers may need to develop analysis code against a synthetic data set and submit that code for automated (or manual) execution, with all data analysis (literally every keystroke) subject to audit by us. These procedures effectively shift from a regime of individual responsibility, where scholars legally agree to follow the rules and the rest of the community hopes they comply, to one of collective responsibility, where multiple people are always checking and the risk of improper action by any one individual is greatly limited.
  3. Social science insights normally concern population averages and broad patterns, for which facts about any one individual are unnecessary and of no interest. As such, we are exploring the use, for certain types of data, of cutting-edge work in computer science and statistics, such as "differential privacy." This will enable us to make available, in certain situations, modified data sets or statistical analysis procedures that (as one can show via mathematical proof) enable social scientists to discover aggregate patterns with essentially no chance of learning or revealing anything about any one individual. This emerging field holds great promise both to protect privacy for individuals and to advance scientific knowledge for all of society. As these techniques have not yet been developed for some types of data we are making available, we have enlisted researchers in this area to modify their techniques to facilitate social media data sharing.
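To make the differential privacy idea concrete, here is a minimal illustrative sketch of its best-known building block, the Laplace mechanism, applied to a counting query. This is not Social Science One's implementation; the data, function names, and privacy parameter are hypothetical, chosen only to show how calibrated random noise lets an analyst recover an aggregate pattern while masking any single individual's contribution.

```python
import numpy as np

def laplace_count(records, predicate, epsilon, rng):
    """Release an epsilon-differentially private count.

    A counting query has sensitivity 1: adding or removing one
    person changes the true count by at most 1. Adding Laplace
    noise with scale 1/epsilon therefore satisfies
    epsilon-differential privacy for this query.
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical micro-data: one record (an age) per person.
ages = [23, 35, 41, 29, 52, 47, 38, 61, 19, 44]

rng = np.random.default_rng(0)
# Smaller epsilon -> more noise -> stronger privacy guarantee.
noisy = laplace_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
```

Over many hypothetical releases the noisy counts center on the true value (here, 5 people aged 40 or over), so population-level conclusions survive, while the noise prevents anyone from inferring whether a particular person is in the data.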