A Reviewer Asked You to Make Your Materials Open. Now What?
by Steve Haroz, Lewis Chuang, Matthew Kay, and Chat Wacharamanotham
“Research transparency is of utmost importance” - CHI guide to a successful submission
“Lack of transparency in the way research results are reported can be a ground to doubt the contribution” - CHI guide to reviewing
You’ve just gotten reviews back for your paper. Some are positive. But a reviewer wants you to “make all your empirical replication and computational reproducibility materials public”. What does that even mean? Why are they asking for it? How can you respond in a rebuttal???
THE CHECKLIST: What should you provide?
Data collection procedures: Did you run a study? Did you scrape data?
- Experiment presentation code
- Data scraping code
- Experiment protocol
The Data: Are your conclusions based on collected or analyzed data?
- Raw data, not aggregated: It is important to separate the data from the analysis. Aggregation and transformation are part of the analysis, so share the data in as raw a form as is feasible.
- (optional) Also share the aggregated data if it can help clarify the analysis.
- A data dictionary, which describes the columns or variables in the data.
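As an illustration of the data dictionary item above, here is a minimal sketch of a dictionary kept alongside a raw, trial-level file, with a check that every column is described. The column names, descriptions, and example rows are hypothetical and not taken from any real study.

```python
import csv
import io

# Hypothetical raw data: one row per trial, nothing averaged or filtered.
RAW_CSV = """participant,condition,trial,response_time_ms,correct
p01,bar,1,812,1
p01,pie,2,1040,0
p02,bar,1,765,1
"""

# Hypothetical data dictionary: one entry per column in the raw file.
DATA_DICTIONARY = {
    "participant": "Anonymized participant ID",
    "condition": "Chart type shown: 'bar' or 'pie'",
    "trial": "Trial index within the session, starting at 1",
    "response_time_ms": "Time from stimulus onset to response, in milliseconds",
    "correct": "1 if the response was correct, 0 otherwise",
}


def undocumented_columns(raw_csv, dictionary):
    """Return the column names in the data that the dictionary does not describe."""
    header = next(csv.reader(io.StringIO(raw_csv)))
    return [name for name in header if name not in dictionary]


def dictionary_as_csv(dictionary):
    """Render the dictionary as CSV text that can be saved next to the data."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["column", "description"])
    for name, description in dictionary.items():
        writer.writerow([name, description])
    return out.getvalue()


if __name__ == "__main__":
    print("Undocumented columns:", undocumented_columns(RAW_CSV, DATA_DICTIONARY))
    print(dictionary_as_csv(DATA_DICTIONARY))
```

A plain-text or spreadsheet dictionary serves the same purpose. The point is only that every column a reader encounters has a stated meaning and unit.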
Analysis and computation: Did you perform any analysis on data?
- Every part of the computational analysis, so that the numbers and figures in the paper can be reproduced exactly from your data.
- A clear description of any code books.
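To make the idea of a reproducible analysis concrete, the sketch below keeps aggregation in the shared script rather than in the shared data, so a reader can rerun it and obtain the reported numbers. The data, column names, and the choice of a per-condition mean are hypothetical placeholders for whatever your paper actually reports.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data; in practice this would be read from the shared file.
RAW_CSV = """participant,condition,response_time_ms
p01,bar,800
p01,pie,1000
p02,bar,700
p02,pie,1100
"""


def load_raw(text):
    """Parse the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def mean_rt_by_condition(rows):
    """Aggregate raw trials into the per-condition means a paper might report."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["condition"]].append(float(row["response_time_ms"]))
    return {cond: mean(values) for cond, values in sorted(grouped.items())}


if __name__ == "__main__":
    for cond, value in mean_rt_by_condition(load_raw(RAW_CSV)).items():
        print(f"{cond}: {value:.1f} ms")
```

Because the script starts from the raw rows, any exclusion or transformation step would also have to appear in the code, where a reviewer can see it.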
A prototype or system:
- Any prototype or system code and instructions for running them.
- Any details or plans needed to recreate hardware prototypes.
How to share it
1) Here’s a short walkthrough to deposit your material on the Open Science Framework (OSF), an open and persistent repository. Alternatively, you might prefer zenodo.org, a university repository, or one of the FAIR-compliant repositories at re3data.org.
2) For the rebuttal, provide the URL or DOI. We could not find any written rule on the CHI website suggesting that URLs in the rebuttal are forbidden. For the revised paper, put the URL or DOI in the abstract to ensure that readers and reviewers won’t miss it. Abstracts are outside the paywall, so this also broadens the availability of your materials.
3) Provide an index or readme to explain what is where in the repository. Some repositories can become large and messy, and a simple readme makes the repository accessible. The wiki on OSF projects is a good place for this.
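One possible way to start the index described in step 3 is to generate a file listing and then fill in a one-line description per file. The sketch below assumes a local copy of the repository. The folder layout, file names, and descriptions are hypothetical, and the output is plain text that could be pasted into a readme or a project wiki.

```python
import os
import tempfile


def build_index(root, descriptions):
    """List every file under root with its description, or a TODO if none is given."""
    lines = ["Repository contents", ""]
    for folder, subfolders, files in os.walk(root):
        subfolders.sort()  # visit subfolders in a stable, alphabetical order
        for name in sorted(files):
            full_path = os.path.join(folder, name)
            rel = os.path.relpath(full_path, root).replace(os.sep, "/")
            note = descriptions.get(rel, "TODO: describe this file")
            lines.append(f"- {rel}: {note}")
    return "\n".join(lines) + "\n"


def demo():
    """Build an index for a small, made-up repository layout."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("data/raw.csv", "analysis/analyze.py"):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as f:
                f.write("")
        return build_index(root, {"data/raw.csv": "Raw trial-level data"})


if __name__ == "__main__":
    print(demo())
```

Files without a description show up as TODO entries, which makes gaps in the index easy to spot before you share the link.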
What about PCS or GitHub?
Sure, it’s intuitive to use PCS’s supplemental materials or the GitHub repo you may already have, but these options do not comply with the FAIR openness principles. Here are the important points:
Free and accessible: Many publishers lock supplemental materials behind a paywall, and while ACM’s supplemental materials are open, the flat file hierarchy is not discoverable or accessible.
Immutability: It is possible to delete a GitHub repository and upload a new one with the same name. This is not possible with OSF and Zenodo, where the provided URL or DOI will always point to that repository and no other. Don’t worry, you can still upload versioned updates afterwards, and your readers will always be able to see when you did.
Persistence: Both OSF and Zenodo have long-term plans to ensure that the material will be available for decades, even if they run out of funding.
Why are reviewers asking for these materials?
1) To assess “technical accuracy”. If a reviewer has some questions about the analysis calculations or experiment implementation, they may need to look at the material to answer those questions. Were the degrees of freedom calculated correctly? Was any of the data not analyzed? Did the experiment code have bugs in it?
A reviewer may want to check one particular aspect where mistakes are common or where they have a particular concern. Providing comprehensive replicability and reproducibility materials can ensure that reviewers are able to check aspects of concern and help give reviewers more confidence in the correctness and reliability of the work.
2) To assess “clarity of exposition”. Written methods are rarely sufficient to ensure that future readers can perform a thoroughly accurate replication of all procedures. How exactly were the questions in a questionnaire phrased? What did initial training entail? What exactly was the presentation timing? What exactly were the stimuli? How did the response options look?
Clarity is judged not just for the reviewer, but also for future readers. Including replication and reproducibility materials provides a more complete description of your methods and results, making it easier for future readers to understand your work, and potentially even adopt your methods (increasing the likelihood they might cite you, too!).
3) A pledge to improve norms. When trying to replicate, extend, or apply past publications, many people have found authors to be frustratingly unresponsive or unforthcoming [1, 2, 3] with critical details not specified in the original paper. Unwilling to review submissions that inhibit future research, some have pledged not to accept submissions that do not share data and material on an accessible persistent repository, unless the submission explicitly states why the material cannot be shared.
4) Credibility. Some fields have taken serious hits to their credibility due to calculation errors or fabricated data that could have been easily confirmed if the data and code were available for scrutiny. Undetected invalidity affects the entire field, including the reviewer’s publications, so there is incentive to ensure that all publications can be scrutinized and replicated in the future.
But there are reasons why I can’t!
The general goal is to be as transparent as possible.
- Share what you can.
- If anything can’t be shared, explain why in the paper.
I cannot reveal my identity. CHI is double-blind.
OSF allows you to create an anonymous view of a project that removes names from the project metadata. For instructions, see Creating a View-only Link for a Project. In the rebuttal, authors can submit an OSF link.
When creating the view-only link, check the “Anonymize” checkbox.
My data includes identifiable information about subjects. Or I collected video recordings or transcripts of participants.
Share as much as you can without the identifiable information. For the rest, a Protected-Access Repository allows you to make your data available only to those who go through an IRB approval process to access it. Such repositories ensure that only approved researchers can get access, and that the data is not lost if something unfortunate happens to the authors.
Here is a list of Protected-Access Repositories. We recommend Databrary, which stores all kinds of data and is particularly adept at storing audio or video recordings of subjects.
My code and materials are messy
Conforming with a specific style guide or quality standard is ideal but not necessary. What is important is that the materials are technically accurate and clear enough for a reviewer or reader to understand them.
Any other questions?
Slack: You are welcome to join the Transparent Research Slack to discuss transparent research practices and to ask or answer questions. Just send an email to email@example.com
Twitter: @transprnt_rsrch
Endorsed by: Lonni Besançon (Monash University), Alan Dix (Swansea), Pierre Dragicevic (INRIA), Cody Dunne (Northeastern University), Florian Echtler (Aalborg University), Shion Guha (Marquette University), Jessica Hullman (Northwestern University), Luiz Morais (INRIA), Lace Padilla (UC Merced), Briane Paul V. Samson (De La Salle University), Theophanis Tsandilas (INRIA, Université Paris-Saclay), Jan B. Vornhagen (Aalto University)