GitHub - PERC-Lab/GDPR_discussion_data

This repository contains the datasets used in the survey analysis and the Reddit post analysis.

The survey/ folder contains the files for the survey portion of the paper:

survey_questions.pdf contains a complete list of survey questions.
survey_dataset.csv contains the full dataset of responses.
survey_analysis.ipynb contains the scripts used for all of the statistical analyses.
annotated_job_titles.csv contains the manual annotations of the job titles that respondents provided in the survey.
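
As a quick sanity check before opening survey_analysis.ipynb, the survey CSV can be inspected with the Python standard library. This is a minimal sketch, not part of the repository: the helper name summarize_csv is hypothetical, and it assumes that survey_dataset.csv is UTF-8 encoded with a header row, which this README does not state.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (row_count, column_names) for a CSV file.

    Assumption: the first row of the file is a header row.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)  # reading the rows also populates fieldnames
        columns = list(reader.fieldnames or [])
    return len(rows), columns
```

Calling summarize_csv("survey/survey_dataset.csv") from the repository root would then return the number of response rows and the column names, which can be compared against survey_questions.pdf.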

The post_analysis/ folder contains several folders related to the Reddit post analysis:

/final_data/ contains the final post data from each step. raw_dataset_2986.json contains the full set of extracted posts; privacy_dev_dataset_2248.json contains all posts checked for their relation to privacy and development, both manually and via GPT-5; and final_posts_666.json contains our fully labeled dataset of 666 posts. The two subfolders, /GDPR_challenges/ and /privacy_dev/, contain the ground-truth files, the final LLM prediction files, and the LLM predictions after manual correction.
/prompts/ contains our prompts for each step (zero-shot, few-shot, and Chain-of-Thought). The CoT prompt in each subfolder is the final prompt we used.
/results/ contains the raw results of each automated labeling step for each model. For each step, the training-data results cover every prompt type (zero-shot, few-shot, and Chain-of-Thought) for each of the 8 models, while the test-data results cover the final CoT prompt.
/samples/ contains the individual sample files used to train and test the models in each stage.
/scripts/ contains all relevant scripts, including those used to run each model as well as those used for statistics.
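
The dataset file names in /final_data/ embed their post counts (666, 2986, 2248), so one way to verify a download is to compare each count with the number of records in the file. The sketch below is illustrative and not part of the repository: the function names are hypothetical, and it assumes each file holds a top-level JSON list or object with one entry per post, which this README does not document.

```python
import json
import re
from pathlib import Path


def expected_count(filename):
    """Read the trailing number in a name like 'final_posts_666.json'."""
    match = re.search(r"_(\d+)\.json$", filename)
    return int(match.group(1)) if match else None


def count_records(path):
    """Count top-level entries.

    Assumption: the file holds a JSON list or object, one entry per post.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, (list, dict)):
        return len(data)
    raise TypeError(f"unexpected top-level JSON type: {type(data).__name__}")


def check_counts(folder):
    """Map each *.json file name in folder to (expected, actual) counts."""
    report = {}
    for path in sorted(Path(folder).glob("*.json")):
        report[path.name] = (expected_count(path.name), count_records(path))
    return report
```

Running check_counts("post_analysis/final_data") from the repository root would return a dictionary in which matching pairs indicate an intact file. A mismatch may simply mean the file uses a different layout than assumed here.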

