Project Scratchpad
I’ve already started on the work in a blog post: https://crumplab.com/blog/771_GPT_Stroop/
Need to briefly list what I’ve done already.
Then, come up with some goals for moving the project forward.
Blog post summary
Briefly described the Stroop task
Developed some motivation for why I care whether or not LLMs can simulate performance in a Stroop task
- data spoofing, such as MTurk workers using LLMs to exploit online tasks.
Described methods
- use the OpenAI API and R to send instructions for a Stroop task, and then individual trials (in text format)
- have the model simulate trial-by-trial responses and reaction times, and return them in JSON
- analyse the data and see what happens
The post shows some draft code to simulate a single subject, and to simulate multiple subjects
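For reference, roughly the shape of a single-trial call (a minimal sketch rather than the blog post's actual code; the prompt wording, trial format, and JSON fields are placeholders, and it assumes the httr and jsonlite packages plus an OPENAI_API_KEY environment variable):

```r
library(httr)
library(jsonlite)

# Placeholder instructions: ask the model to act as a participant and return JSON
task_instructions <- paste(
  "You are simulating a human participant in a Stroop task.",
  "On each trial you will see a word and an ink color; respond with the ink color.",
  "Return JSON with fields: response, rt (in milliseconds)."
)

one_trial <- "Word: RED, Ink color: blue"

resp <- POST(
  url = "https://api.openai.com/v1/chat/completions",
  add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))),
  content_type_json(),
  body = toJSON(list(
    model = "gpt-3.5-turbo",
    messages = list(
      list(role = "system", content = task_instructions),
      list(role = "user", content = one_trial)
    )
  ), auto_unbox = TRUE)
)

# The model's reply (ideally valid JSON) sits in the first choice's message content
reply <- content(resp)$choices[[1]]$message$content
trial_data <- fromJSON(reply)
trial_data$response
trial_data$rt
```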
Got data from 10 simulated subjects
Results showed:
- gpt-3.5-turbo generated data files that showed Stroop effects
- Simulated RTs were different across simulated subjects
- Accuracy was 100%
- RTs looked almost credible, but many individual RTs ended in 0 and looked too round.
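One quick way to quantify that roundness (a sketch; `rt` stands for a hypothetical vector of simulated RTs in milliseconds):

```r
# If last digits were roughly uniform, about 10% of millisecond RTs would end in 0;
# the simulated data looked much rounder than that. rt is a hypothetical RT vector.
mean(rt %% 10 == 0)   # proportion of RTs ending in 0
mean(rt %% 50 == 0)   # proportion ending in 00 or 50
```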
Modelling to do
Not an exhaustive list, something to get me started.
- [ ] Settle on one script that can be extended across the examples.
- [ ] Run multiple subjects, say batches of 20-30, which should be enough for the kinds of questions I want to ask
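A sketch of what a batch could look like, assuming a hypothetical run_subject() wrapper around the API calls for one simulated subject (the draft scripts in the blog post are organized differently):

```r
# run_subject() (hypothetical) returns one simulated subject's trial-by-trial data frame
n_subjects <- 20
all_data <- do.call(rbind, lapply(seq_len(n_subjects), function(s) {
  subject_data <- run_subject()   # all API calls for one simulated subject
  subject_data$subject <- s       # tag rows with a subject id
  subject_data
}))
```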
Answer the following basic questions (see the analysis sketch below):
Does the model produce Stroop effects in RT and accuracy?
Does the model produce different answers for each run of simulated subjects?
Does the model produce RTs that look like human subject RTs?
Does the model produce additional Stroop phenomena without further prompting?
- Congruency sequence effect?
- Proportion Congruent effect?
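Once a batch like the one sketched above exists, the first three questions reduce to simple summaries. A sketch, assuming all_data has hypothetical columns subject, congruency ("congruent"/"incongruent"), rt, and accuracy (0/1), and that dplyr is available:

```r
library(dplyr)

# Mean RT and accuracy per simulated subject and congruency condition
stroop_summary <- all_data %>%
  group_by(subject, congruency) %>%
  summarise(mean_rt = mean(rt), mean_acc = mean(accuracy), .groups = "drop")

# Per-subject Stroop effect in RT (incongruent minus congruent)
stroop_effects <- stroop_summary %>%
  group_by(subject) %>%
  summarise(rt_effect = mean_rt[congruency == "incongruent"] -
              mean_rt[congruency == "congruent"])

summary(stroop_effects$rt_effect)  # do simulated subjects differ from one another?
hist(all_data$rt, breaks = 50)     # do the RT distributions look human-like?
```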
Can the instruction prompt be used to control how the model simulates performance? (see the sketch after this list)
- [ ] simulate 75% correct
- [ ] simulate 50% correct
- [ ] simulate some long reaction times
- [ ] simulate a reverse Stroop effect
- [ ] simulate proportion congruent effects
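One way to approach this would be to append a condition-specific line to the instruction prompt before running a batch; a sketch with hypothetical wording:

```r
# Hypothetical instruction add-ons, one per manipulation in the list above
prompt_variants <- list(
  accuracy_75    = "Respond correctly on about 75% of trials.",
  accuracy_50    = "Respond correctly on about 50% of trials.",
  long_rts       = "Occasionally produce very long reaction times (e.g., over 2000 ms).",
  reverse_stroop = "Respond faster on incongruent trials than on congruent trials."
)

# Paste a variant onto the base task instructions (as in the single-trial sketch above)
base_instructions <- "You are simulating a human participant in a Stroop task."
system_prompt <- paste(base_instructions, prompt_variants$accuracy_75)
```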
- [ ] simulate some other tasks
  - [ ] Flanker task
  - [ ] Simon task
  - [ ] SRT (Nissen & Bullemer)
  - [ ] Negative priming
  - [ ] Inhibition of return
  - [ ] SNARC effect
Writing to do
- [ ] interim overview
- [ ] consider in more detail what I’m trying to accomplish here and whether this project turns into something more formal than a shared repo on GitHub
- [ ] Things that make me go hmmm.
- the modeling work is not reproducible
- OpenAI says the new default is that data sent through the API will not be used to train models; allowing it to be used is an opt-in option
- I did not opt in, but if someone else did and re-ran this kind of code with feedback that altered the model responses, then the results from this kind of work would change
- Unclear what happens with newer models like GPT-4