PERC-Lab/GDPR_discussion_data

This repository contains the datasets used in the survey analysis and the Reddit post analysis.

The survey/ folder contains several files related to the survey portion of the paper.

  1. survey_questions.pdf contains a complete list of survey questions.

  2. survey_dataset.csv contains the full dataset of responses.

  3. survey_analysis.ipynb contains the scripts used for all of the statistical analyses.

  4. annotated_job_titles.csv contains the manual annotations of the job titles provided by survey respondents.

The post_analysis/ folder contains several folders related to the Reddit post analysis:

  1. /final_data/ contains our final post results from each step. final_posts_666.json is our fully labeled dataset of 666 posts, raw_dataset_2986.json is the full set of extracted posts, and privacy_dev_dataset_2248.json contains all posts checked, both manually and via GPT-5, for their relation to privacy and development. The two subfolders, /GDPR_challenges/ and /privacy_dev/, contain the ground truth files, the final LLM prediction files, and the LLM predictions after manual correction.

  2. /prompts/ contains our prompts for each step (zero-shot, few-shot, and Chain-of-Thought); the CoT prompt in each subfolder is the final prompt we used.

  3. /results/ contains the raw results of each automated labeling step for each model. On the training data, each step was run with multiple prompts (zero-shot, few-shot, and Chain-of-Thought), and the results for each of the 8 models are included. On the test data, only the results for the final CoT prompt are included.

  4. /samples/ contains the individual sample files used to train and test the models in each stage.

  5. /scripts/ contains all relevant scripts, including those used to run each model and those used for the statistics.
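
As a quick sanity check after cloning, the record counts in the /final_data/ file names can be compared against the files themselves. The following is a minimal sketch, not part of the repository's scripts. It assumes that the three JSON files sit directly under post_analysis/final_data/ and that each file's top level is either a list of posts or an object keyed by post; neither is documented above, and the helper names `count_records` and `check_expected_counts` are hypothetical.

```python
import json
from pathlib import Path


def count_records(path):
    """Return the number of top-level records in a JSON file.

    Assumption: the file holds either a list of posts or an object
    with one key per post. The real layout is not documented here.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, (list, dict)):
        return len(data)
    raise TypeError(f"unexpected top-level JSON type: {type(data).__name__}")


def check_expected_counts(data_dir, expected):
    """Map each file that exists to (records found, records expected)."""
    report = {}
    for name, wanted in expected.items():
        path = Path(data_dir) / name
        if path.exists():  # silently skip files that are not present
            report[name] = (count_records(path), wanted)
    return report


# File names and counts as listed in this README.
EXPECTED = {
    "final_posts_666.json": 666,
    "raw_dataset_2986.json": 2986,
    "privacy_dev_dataset_2248.json": 2248,
}

if __name__ == "__main__":
    # Assumed location, relative to the repository root.
    results = check_expected_counts("post_analysis/final_data", EXPECTED)
    for name, (found, wanted) in results.items():
        status = "ok" if found == wanted else "mismatch"
        print(f"{name}: {found} records (expected {wanted}) -> {status}")
```

Run it from the repository root. A mismatch may simply mean that a file uses a different top-level structure than assumed here.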
