Using GPT for writing in RStudio

LLMs
OpenAI
Rstats
RStudio
I’m spending the morning writing addins for RStudio that interface with the OpenAI API. This should make it possible to integrate GPT as a writing aid inside RStudio.
Author

Matt Crump

Published

June 7, 2023

Preamble

I do most of my writing inside RStudio using Quarto documents. This means most of my documents are plain text using a combo of markdown and R code chunks that I export to multiple formats like this blog.

I’ve been curious about using LLMs as an everyday writing assistant for a while. I’ve occasionally used the OpenAI playground and ChatGPT to give editorial suggestions about my writing, but I found the experience of bouncing in and out of my main writing environment too clunky.

This semester notion.so became another writing environment that I used heavily (also markdown-ish), and I tried their in-app AI writing assist for a hot second. But, they wanted $10 a month, and I didn’t want to pay them.

However, the in-app AI assist experience with notion was not clunky. Just highlight some text, right-click, and then select from various useful AI editor prompts to do things like: line edit for clarity, expand bullet points into a paragraph, convert a paragraph to bullet points, generative writing, etc.

After a bit of messing around yesterday, it became clear that it wouldn’t be too hard to write addins for RStudio to interface with the OpenAI API, and achieve some kind of in-app writing assistant situation while I’m writing in Quarto.

This post is me doing that along with laying down some breadcrumbs so my future self can remember what I did here.

What you’ll need

  1. An OpenAI account that allows you to create a secret key to access the API service.

  2. RStudio with Quarto (Quarto comes bundled with the latest version of RStudio)

  3. I’m using the openai R library by Iegor Rudnytskyi, which wraps the API for R and makes sending inputs and receiving outputs very easy.
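
If you don't have the package yet, it should be installable from CRAN (or the development version from GitHub; the repo name here is my assumption about where it lives):

# install the openai package from CRAN
install.packages("openai")

# or the development version from GitHub
# remotes::install_github("irudnyts/openai")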

Pricing

I haven’t crunched the numbers too much yet. You can check out the pricing model here https://openai.com/pricing.

My napkin calculator says this should be cheaper than $10/month on notion. Currently, the gpt-3.5-turbo model is $0.002 per 1,000 tokens (about 750 words). It would take 5,000 interactions that use up 1,000 tokens each to get to $10 a month. That’s about 750 words times 5,000, or 3,750,000 words… way more words than I type monthly. Plus, it’s all pay as you go, and it is possible to set spending limits if you don’t want to go over some amount. Bottom line, cheaper than notion and more customizable.
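
For the record, the napkin math in R:

# gpt-3.5-turbo pricing: $0.002 per 1,000 tokens (roughly 750 words)
price_per_1k_tokens <- 0.002
interactions_for_10_dollars <- 10 / price_per_1k_tokens  # 5,000 interactions of 1,000 tokens
interactions_for_10_dollars * 750                        # 3,750,000 words for $10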

NOTE: The day is done. Between this post and another one where I sent a whole bunch of requests and used many a token, the cost for all of this heavy personal activity was $0.05.

There are other costs to consider, such as whether or not you want to send your words to OpenAI.

Basic workflow

Setting up the API keys

  1. Create a secret key in your OpenAI account, copy it, and put it somewhere safe. Don’t share it.
  2. The openai R library looks for an environment variable called OPENAI_API_KEY, which can be set in a number of ways.

My goal is to create addins for RStudio to use while I’m writing. I don’t want to input an API key every time I use the addin, or disclose it in a public repository. There are different strategies for managing environment variables, and Posit has a helpful guide on the topic.
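
For a quick test, one option is to set the variable for the current R session only (it won’t persist after a restart):

# set the key for this session only
Sys.setenv(OPENAI_API_KEY = "XXXXXXXXXXXXX")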

My strategy was to add a key-value pair directly to the .Renviron file at the user level.

# open the .Renviron file
usethis::edit_r_environ()

Then edit the file to add the new key-value pair, and restart R.

# add this line to the .Renviron file and save
OPENAI_API_KEY=XXXXXXXXXXXXX

After you restart R, you can check if R recognizes the new environment variable:

Sys.getenv("OPENAI_API_KEY")

Test the API

If everything is working, it should be possible to send prompts to OpenAI and get outputs back.

library(openai)

gpt <- create_completion(
    model = "ada",
    prompt = "Explain what is larger a pumpkin or a dinosaur?"
)

The result is an R list object with the output and other info like the number of tokens used.

$id
[1] "cmpl-7OnnSJ0QgpVF4a86qKYmSWgSvAiym"

$object
[1] "text_completion"

$created
[1] 1686145646

$model
[1] "ada"

$choices
                                                    text index logprobs finish_reason
1 \n\nAnswer: I don't know of an easy way to express it.     0       NA        length

$usage
$usage$prompt_tokens
[1] 11

$usage$completion_tokens
[1] 16

$usage$total_tokens
[1] 27
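
The pieces I usually want can be pulled out of the list directly:

gpt$choices$text         # the completion text
gpt$usage$total_tokens   # tokens used by the request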

Writing addins for RStudio

The next step is to write custom addins for RStudio, which is documented here. RStudio addins are available from a dropdown menu and/or triggered by hot-key macros.

The steps for writing addins are (a code sketch follows the list):

  1. Make a new R package project
  2. Write functions in the R folder as you normally would
  3. Register the functions as addins using a .dcf file placed in inst/rstudio/addins.dcf
  4. Install the R package
  5. The addins should now be available for use.
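
A minimal sketch of steps 1, 3, and 4, using usethis and devtools helpers (my choice here, not something this post depends on); the contents of the .dcf file are covered below:

# 1. make a new R package project
usethis::create_package("gptaddin")

# 3. make the folder for the registration file,
#    then create inst/rstudio/addins.dcf by hand
usethis::use_directory("inst/rstudio")

# 4. install the package so RStudio picks up the addins
devtools::install()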

My goal is to start by writing an addin that can take highlighted text in the RStudio editor, send it to OpenAI, and return something useful, like the text edited for clarity, flow, and length.

gptaddin

I went ahead and created an R package called gptaddin. It has one working function (as of right now).

Repo: https://github.com/CrumpLab/gptaddin

pkgdown website: https://crumplab.com/gptaddin/

I’m sharing this in case the development process is useful for other people to make something similar for themselves.

line_editor()

The first function I wrote is a basic line editor. I wanted to be able to highlight text within RStudio and send it to an LLM for clarity, flow, and length suggestions.

Here’s the function:

line_editor <- function(){

  # get selected text
  selected_text <- rstudioapi::selectionGet()

  # run the api call to openai
  gpt <- openai::create_chat_completion(
    model = "gpt-3.5-turbo",
    messages = list(
      list(
        "role" = "system",
        "content" = "You are an editorial writing assistant. Your task is to edit the content you receive for improved clarity, flow, and reduced word count."
      ),
      list(
        "role" = "user",
        "content" = selected_text$value
      )
    )
  )

  # print the results to the console
  print(gpt$choices$message.content)
}

This function is located in R/addins.R, and it is registered as an addin in inst/rstudio/addins.dcf. The registration text looks like:


Name: line_editor
Description: openai edits text for clarity, flow, and brevity
Binding: line_editor
Interactive: false

With the package installed, I can now see line_editor() in the dropdown menu for Addins in RStudio. When I select text and run the addin, I get edits suggested by gpt-3.5-turbo printed to the console. For example, I will take the text in this paragraph and run the addin.


“Upon installing the package, line_editor() now appears in the Addins dropdown menu in Rstudio. Selecting text and running the addin produces suggested edits from gpt-3.5-turbo, which are then printed to the console. For instance, I can utilize this addin to edit the text within this paragraph.”


This seems to work. I guess I should add a message in the console so that it looks like something is happening…Added.

points_to_prose()

The purpose of this writing aid is to take a series of point form statements and transform them into prose.

points_to_prose <- function(){


  # get selected text
  selected_text <- rstudioapi::selectionGet()

  message("Submitting query...")

  # run the api call to openai
  gpt <- openai::create_chat_completion(
    model = "gpt-3.5-turbo",
    messages = list(
      list(
        "role" = "system",
        "content" = "You are a writing assistant. Your task is to take point form notes and convert them into longform prose. Please write in a straightforward and clear manner. The user prompt will contain point form notes."
      ),
      list(
        "role" = "user",
        "content" = selected_text$value
      )
    )
  )

  # print the results to the console
  print(gpt$choices$message.content)
}

Let’s test this out. Here are some point form notes:

  • Introduce the Computational cognition lab at Brooklyn College

  • Describe lab research interests, which include learning, memory, and attention

  • Elaborate with a few example questions about cognition that relate to real world issues.

The longform prose that was generated is below… It’s, well, it said a bunch of stuff I wouldn’t say. I'm not sure how much I would use this; it could also depend on the prompt I’m using. In any case, the basic tool is there as an addin.


The Computational Cognition Lab at Brooklyn College is dedicated to conducting research on learning, memory, and attention, among other cognitive processes. To achieve its goals, the lab uses a combination of behavioral experiments, computational modeling, and brain imaging techniques. In terms of research interests, the lab focuses on the following questions:

  1. How does learning occur in real-world settings, and how can we optimize it? The lab uses virtual environments to simulate complex learning scenarios, such as learning how to navigate a new city or operate complex machinery.

  2. How do we consolidate memories over time, and how can we improve this process? The lab investigates how information is stored in long-term memory and how we can enhance memory retention through various interventions, such as sleep and mindfulness practices.

  3. How do we selectively attend to certain stimuli while filtering out others, and how can we improve this process? The lab examines how attentional processing occurs in the brain, including its neural mechanisms and its impact on real-world decision-making.

Overall, the Computational Cognition Lab at Brooklyn College seeks to contribute to our understanding of how cognitive processes work and how they can be optimized to improve everyday life. By using cutting-edge techniques and exploring real-world questions, the lab hopes to make meaningful contributions to the field of cognitive psychology and beyond.


summarize_main_points()

This addin should take a paragraph as input and return a series of point form notes. I’m going to stop printing out each function that I add to the package (they can all be found here https://github.com/CrumpLab/gptaddin/blob/master/R/addins.R), but I will copy out the prompts I’m using.

The prompt for this is:

“You are a writing assistant. Your task is to summarize input text into clear pointform notes.”

Let’s try it out on the following paragraph (taken from my lab website):

The Computational Cognition Lab at Brooklyn College of CUNY investigates learning, memory, attention, performance, and semantic cognition. We ask questions like how do people learn new skills such as playing an instrument or typing on a keyboard? How does memory work? How do people learn the meaning of words? How do people learn patterns and regularities in their environment? These kinds of questions are investigated using behavioral experiments and computational modeling of cognitive processes. The lab also uses a variety of computational methods and tools to conduct and communicate research.


  • Computational Cognition Lab at Brooklyn College of CUNY studies learning, memory, attention, performance, and semantic cognition.

  • Lab investigates how people learn new skills, how memory works, how people learn the meaning of words, and how people learn patterns and regularities in their environment.

  • Research conducted using behavioral experiments and computational modeling of cognitive processes.

  • Lab uses various computational methods and tools to conduct and communicate research.


completion_ada()

This addin uses the openai::create_completion() function. The model choice is text-ada-001, and the max number of tokens is 256. Highlighted text in the RStudio editor is taken as the prompt, and the output is sent to the console.
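
The body of the function follows the same pattern as the others. Roughly this (a sketch based on the description above; the actual version is in the repo):

completion_ada <- function(){

  # get selected text to use as the prompt
  selected_text <- rstudioapi::selectionGet()

  message("Submitting query...")

  # run the api call to openai
  gpt <- openai::create_completion(
    model = "text-ada-001",
    prompt = selected_text$value,
    max_tokens = 256
  )

  # print the results to the console
  print(gpt$choices$text)
}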

I highlighted the following text and used it as a prompt.


Prompt:

Generate a description of a five legged cat in 100 words


This five legged cat is a Reflex cat. He or she isotonically ungrateful and always wants things from others that are unavailable to them. However, the individual is oftenmanent and resourceful when necessary.


Hmmmm, not great. This was using the text-ada-001 model.


Prompt:

Describe a three legged cat in 50 words


or less

One legged cat is a cat that perceptions are two-legged and one half legged cat is a cat that perceptions are three legs long.


My conclusion is that this model isn’t great for general text completion (or my prompts sucked).

completion_davinci()

Same as above but using a slightly more expensive and apparently better text model, text-davinci-003.


Tell an engaging story in 100 words or less.

I had a five legged cat that


I named Logan. I found him outside my childhood home and decided to take care of him. He was always loved and well taken care of but due to his extra leg, he wouldn’t be able to do what other cats could. Despite this, Logan would still try his hardest when chasing down a laser pointer, climbing the curtains, or running around the house. He may have been a bit clumsy but nonetheless, he brought joy to everyone. My parents used to joke that he rubbed that extra leg for luck and he proved them right time and again. He was truly a special cat.


Decent.

completion_chat()

This is like the above, except it uses the gpt-3.5-turbo model. This model works a little bit differently in that it handles both a “system” prompt defining the role of the chatbot, and a user prompt. This function defines the system as “you are helpful assistant”. Highlighted text in RStudio is then sent to the chatbot, and the output is returned in the console.


Can you help with some travel plans?


Absolutely! Where are you planning to travel to and when? Do you need help with transportation, accommodations, activities, or all of the above?


all of the above


I’m sorry, I don’t understand what you mean by “all of the above”. Could you please provide more context or a specific question so I can assist you better?


This implementation does not include the message history, so the model didn’t know how to interpret the second prompt “all of the above”. It would be possible to add the message context, but I’m not going to do that right now.
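
For reference, adding the context would just mean passing the earlier turns back in as part of the messages list. Something like this (a sketch using the exchange above, not something I added to the package):

# sketch: include the earlier turns so the model can interpret follow-ups
gpt <- openai::create_chat_completion(
  model = "gpt-3.5-turbo",
  messages = list(
    list("role" = "system", "content" = "you are helpful assistant"),
    list("role" = "user", "content" = "Can you help with some travel plans?"),
    list("role" = "assistant", "content" = "Absolutely! Where are you planning to travel to and when?"),
    list("role" = "user", "content" = "all of the above")
  )
)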

Closing thoughts

…Describe some benefits of using Rstudio addins to access the openAI API and use it as a writing assistant…

I won’t bother printing the output from gpt.

My actual closing thoughts are that this approach is on par with or better than what notion.so was offering, especially in terms of the ability to customize everything (and price). The prompts I’m using in the examples could be improved, and I wouldn’t be surprised if better prompts made the outputs more usable. It would be easy enough to make lots of other addins to try out different calls and responses. I’m probably interested enough to continue fiddling and see what happens.