Project Scratchpad

Author
Affiliation

Matthew J. C. Crump

Brooklyn College of CUNY

Published

June 27, 2023

Abstract
Scratchpad for thinking through the project.

I’ve already started on the work in a blog post: https://crumplab.com/blog/771_GPT_Stroop/

Need to briefly list what I’ve done already.

Then, come up with some goals for moving the project forward.

Blog post summary

  • Briefly described the Stroop task

  • Developed some motivation for why I care whether or not LLMs can simulate performance in a Stroop task

    • data-spoofing, such as MTurk workers using LLMs to exploit online tasks
  • Described methods

    • used the OpenAI API and R to send instructions for a Stroop task, and then individual trials (in text format)
    • had the model simulate trial-by-trial responses and reaction times, returning them in JSON
    • analysed the data to see what happens
  • The post shows some draft code to simulate a single subject, and to simulate multiple subjects

  • Got data from 10 simulated subjects

  • Results showed:

    • gpt-3.5-turbo generated data files that showed Stroop effects
    • Simulated RTs were different across simulated subjects
    • Accuracy was 100%
    • RTs looked almost credible, but many individual RTs ended in 0 and looked too round
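
The blog post implements the trial-sending and JSON-parsing steps in R against the OpenAI API; as a language-agnostic mock-up of that format, here is a minimal Python sketch. The function name, field names, and prompt wording are my assumptions, not the post's actual code, and no live API call is made.

```python
import json
import random

# hypothetical color set; the actual stimuli may differ
WORDS = ["red", "green", "blue", "yellow"]

def make_trial(trial_num):
    """Generate one Stroop trial as a text prompt plus metadata."""
    word = random.choice(WORDS)
    # 50/50 congruent vs incongruent
    color = word if random.random() < 0.5 else random.choice(
        [c for c in WORDS if c != word])
    congruency = "congruent" if word == color else "incongruent"
    return {"trial": trial_num, "word": word, "color": color,
            "congruency": congruency,
            "prompt": f'Trial {trial_num}: the word "{word}" in {color} ink.'}

# the model is instructed to return each simulated response as JSON,
# something like this (example reply, not real model output):
simulated_reply = '{"trial": 1, "response": "blue", "rt": 650}'
data = json.loads(simulated_reply)
```

Parsing the model's reply with a JSON library rather than scraping free text is what makes the trial-by-trial data analyzable downstream.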

Modelling To do

Not an exhaustive list, something to get me started.

  • Settle on one script that can be extended across the examples.
  • Run multiple subjects, say batches of 20-30, which should be enough for the kinds of questions I want to ask
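
The batching step above is just a loop over a per-subject simulator. A sketch, with `simulate_subject` as a stand-in for the actual API-calling function (here it fabricates rows locally so the sketch runs offline; all names and parameters are assumptions):

```python
import random

def simulate_subject(subject_id, n_trials=48):
    """Stand-in simulator: returns one row (dict) per trial."""
    rows = []
    for t in range(n_trials):
        congruency = "congruent" if t % 2 == 0 else "incongruent"
        # toy RTs: incongruent trials drawn from a slower distribution
        rt = random.gauss(600 if congruency == "congruent" else 700, 50)
        rows.append({"subject": subject_id, "trial": t,
                     "congruency": congruency, "rt": rt, "accuracy": 1})
    return rows

# batches of 20-30 subjects, per the note above
batch = [row for s in range(20) for row in simulate_subject(s)]
```

Keeping each subject's output in long (one-row-per-trial) format means batches concatenate trivially into one table for analysis.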

Answer the following basic questions

  • Does the model produce Stroop effects in RT and accuracy?

  • Does the model produce different answers for each run of simulated subjects?

  • Does the model produce RTs that look like human subject RTs?

  • Does the model produce additional Stroop phenomena without further prompting?

    • Congruency sequence effect?
    • Proportion Congruent effect?
  • Can the instruction prompt be used to control how the model simulates performance?

    • simulate 75% correct
    • [] simulate 50% correct
    • simulate some long reaction times
    • simulate a reverse Stroop effect
    • [] simulate proportion congruent effects
  • [] simulate some other tasks

  • Flanker task

  • Simon Task

  • [] SRT (Nissen & Bullemer)

  • [] Negative Priming

  • [] Inhibition of Return

  • [] SNARC effect
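
The first questions above reduce to simple summaries: the Stroop effect is mean incongruent RT minus mean congruent RT, and the "too round" RT issue from the blog post can be checked as the proportion of RTs ending in 0 (roughly 0.1 would be expected if last digits were uniform). A sketch; column names are assumptions to be adapted to the actual data files:

```python
from statistics import mean

def stroop_effect(rows):
    """Mean incongruent RT minus mean congruent RT."""
    con = [r["rt"] for r in rows if r["congruency"] == "congruent"]
    inc = [r["rt"] for r in rows if r["congruency"] == "incongruent"]
    return mean(inc) - mean(con)

def prop_round_rts(rows):
    """Share of RTs divisible by 10; ~0.1 expected for human-like RTs."""
    return mean(1 if int(r["rt"]) % 10 == 0 else 0 for r in rows)

# tiny illustrative data set (made up, not simulated model output)
rows = [
    {"congruency": "congruent", "rt": 612},
    {"congruency": "congruent", "rt": 598},
    {"congruency": "incongruent", "rt": 700},
    {"congruency": "incongruent", "rt": 690},
]
```

Running both summaries per simulated subject would answer whether effects appear, whether they vary across runs, and whether the RT distributions look human.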

Writing to do

  • interim overview

  • [] consider in more detail what I’m trying to accomplish here and whether this project turns into something more formal than a shared repo on GitHub

  • [] Things that make me go hmmm.

    • the modeling work is not reproducible
    • OpenAI says the new default is that data sent through the API will not be used to train models; allowing training on your data is now opt-in
    • I did not opt in, but if someone else did and re-ran this kind of code with feedback to alter the model responses, then the results from this kind of work would change
    • unclear what happens with newer models like GPT-4