Software
0.2 Data
Data files used for the labs are all taken from open data sources. Links are provided for each lab. For convenience, all of the data files are also available as single files in the GitHub repository for this lab manual.
0.2.1 Data Repository
0.2.2 CSV format
All of the data files in .csv format are also available to download as a .zip file.
0.2.3 SPSS format
All of the data files in SPSS format are also available to download as a .zip file.
0.3 R
In this course we will be using R as a tool to analyze data, and as a tool to help us gain a better understanding of what our analyses are doing. Throughout each lab we will show you how to use R to solve specific problems, and then you will use the examples to solve homework and lab assignments. R is a very deep programming language, and in many ways we will only be skimming the surface of what R can do. Along the way, there will be many pointers to more advanced techniques that interested students can follow to become experts in using R for data-analysis, and computer programming in general.
R is primarily a computer programming language for statistical analysis. It is free and open-source (many people contribute to developing it), and runs on most operating systems. It is a powerful language that can be used for all sorts of mathematical operations, data-processing, analysis, and graphical display of data. I even used R to write this lab manual. I also use R all the time for my own research, because it makes data-analysis fast, efficient, transparent, reproducible, and exciting.
0.3.1 Why R?
There are lots of different options for using computers to analyze data, so why use R? The options all have pros and cons, and can be used in different ways to solve a range of different problems. Some software allows you to load in data, and then analyze the data by clicking different options in a menu. This can sometimes be fast and convenient. For example, once the data is loaded, all you have to do is click a couple of buttons to analyze the data! However, many aspects of data-analysis are not so easy. For example, particular analyses often require that the data be formatted in a particular way so that the program can analyze it properly. Oftentimes, when a researcher wants to ask a new question of an existing data set, they have to spend time re-formatting the data. If the data set is large, then reformatting by hand is very slow, and can lead to errors. Another option is to use a scripting language to instruct the computer how to reformat the data. This is very fast and efficient. R provides the ability to do everything all in one place. You can load in data, reformat it any way you like, then analyze it any way you like, and create beautiful graphs and tables (publication quality) to display your findings. Once you get the hang of R, it becomes very fast and efficient.
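As a small illustration of reformatting with a script, the sketch below reshapes a made-up data set from a "wide" layout (one column per condition) into a "long" layout (one row per measurement) using only base R. The data frame, its column names, and the scores are hypothetical, invented for this example.

```r
# Hypothetical wide-format data: one row per subject, one column per condition
wide <- data.frame(
  subject     = c(1, 2, 3),
  condition_A = c(10, 12, 9),
  condition_B = c(14, 15, 11)
)

# Reformat to long format: one row per subject-by-condition measurement
long <- data.frame(
  subject   = rep(wide$subject, times = 2),
  condition = rep(c("A", "B"), each = nrow(wide)),
  score     = c(wide$condition_A, wide$condition_B)
)

print(long)

# With the data in long format, the mean score per condition takes one line
aggregate(score ~ condition, data = long, FUN = mean)
```

The same few lines work whether the data set has three subjects or three thousand, which is where scripting pays off compared to reformatting by hand.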
0.3.2 Installing R and R Studio
Download and install R onto your computer. The R website is: http://www.r-project.org
Find the download link for R on that page. This will take you to a page with many different mirror links. You can click any of these links to download a version of R that will work on your computer. After you have installed R you can continue.
After you have installed R on your computer, you should install another program called R studio. This program provides a user-friendly interface for using R. You must have installed R before you perform this step. The R-studio website is: http://www.rstudio.com
Find the download link on the front page, and then download the R studio desktop version for your computer. After you have installed R studio you will be ready to start using R.
The website R-fiddle allows you to run R scripts in the cloud, so you can practice R from your web-browser!
0.3.3 R studio notes and tips
0.3.3.1 Console
When you open up R studio you will see three or four main windows (the placement of each is configurable). In the above example, the bottom left window is the command line (terminal or console) for R. This is used to directly enter commands into R. Once you have entered a command here, press enter to execute the command. The console is useful for entering single lines of code and running them. Oftentimes this occurs when you are learning how to correctly execute a line of code in R. Your first few attempts may be incorrect, resulting in errors, but trying out different variations on your code in the command line can help you produce the correct code. Pressing the up arrow while in the console will scroll through the most recently executed lines of code.
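For a first try at the console, lines like the following can be typed one at a time, pressing enter after each. This is only a sketch; the variable name scores and its values are made up for illustration.

```r
# Use R as a calculator
2 + 3                  # prints [1] 5

# Store three made-up numbers in a variable called scores
scores <- c(2, 4, 6)

# Compute their mean
mean(scores)           # prints [1] 4
```

If a line produces an error, press the up arrow to bring the line back, fix it, and press enter again.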
0.3.3.2 Script Editor
The top left corner contains the script editor. This is a simple text editor for writing and saving R scripts with many lines. Several tabs can be opened at once, with each tab representing a different R script. R scripts can be saved from the editor (resulting in a .r file). Whole scripts can be run by copying and pasting them into the console and pressing enter. Alternatively, you can highlight portions of the script that you want to run (in the script editor) and press command-enter (control-enter on Windows) to automatically run that portion in the console (or press the button for running the current line/section: green arrow pointing right).
0.3.3.3 Workspace and History
The top right panel contains two tabs, one for the workspace and another for history. The workspace lists out all of the variables and functions that are currently loaded in R’s memory. You can inspect each of the variables by clicking on them. This is generally only useful for variables that do not contain large amounts of information. The history tab provides a record of the recent commands executed in the console.
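What the workspace tab shows can also be inspected from the console. The sketch below uses made-up variable names (scores and average) to show how variables are listed, inspected, and removed with base R functions.

```r
# Create two made-up variables; both will appear in the workspace tab
scores  <- c(2, 4, 6)
average <- mean(scores)

# List the names of everything currently in the workspace
ls()

# Show a compact summary of one variable (its type and first values)
str(scores)

# Remove a variable from the workspace
rm(average)

# Check whether a variable with that name still exists
exists("average")      # prints [1] FALSE
```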
0.3.3.4 File, Plot, Packages, Help
The bottom-right window has four tabs for files, plots, packages, and help. The files tab allows browsing of the computer's file directory. An important concept in R is the current working directory. This is the file folder that R points to by default. Many functions in R will save things directly to this directory, or attempt to read files from this directory. The current working directory can be changed by navigating to the desired folder in the file menu, and then clicking on the more option to set that folder to the current working directory. This is especially important when reading data into R. The current working directory should be set to the folder containing the data to be read into R. The plots tab will show recent plots and figures made in R. The packages tab lists the current R libraries loaded into memory, and provides the ability to download and enable new R packages. The help menu is an invaluable tool. Here, you can search for individual R commands to see examples of how they are used. Sometimes the help files for individual commands are opaque and difficult to understand, so it is necessary to do a Google search to find better examples of using these commands.
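The working directory can also be checked and changed from the console with the base R functions getwd() and setwd(). In the sketch below, a temporary folder stands in for your data folder; on your own computer you would put the path to the folder that holds your data files there instead.

```r
# Remember where R is currently pointing
original_dir <- getwd()

# A temporary folder is used here as a stand-in for your data folder
data_dir <- tempdir()

# Point R at the data folder
setwd(data_dir)

# Confirm the change and list the files R can now see by default
getwd()
list.files()

# Switch back to the original folder when finished
setwd(original_dir)
```

Help pages can be opened from the console as well: typing ?mean or help("mean") shows the help file for the mean function in the help tab.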
0.3.4 How to complete the R Labs
Each of the labs focuses on particular data-analysis problems, from graphing data, computing descriptive statistics, to running inferential tests in R. All of the labs come in three parts, a training part, a generalization part, and a writing part. The training part includes step-by-step examples of R code that solves particular problems. The R code is always highlighted in grey. The generalization part gives short assignments to change parts of the provided code to solve a new problem. The writing part tasks you with answering questions about statistical concepts.
The way to complete each lab is to open a new R Markdown document in R-studio, and then document your progression through each of the parts. By doing this, you will become familiar with how R and R-studio work, and how to create documents that preserve both the code and your notes all in one place. There are a few tricks to getting started that are outlined below.
- Open R-studio
0.3.4.1 R projects
- Create a new R project
- Go to the file menu and select new project, or go to the top right-hand corner of R-studio, where you should see a blue cube with an R in it, and select New project from the dropdown menu
- Save the new R project somewhere that you can find it. If you are working on a lab computer, then save the new R project to the desktop.
What is an R project? When you create a new R project you are creating two things, 1) a new folder on your computer, and 2) a “.Rproj” file. For example, if you gave your R project the name “Lab1”, then you will have created a folder titled “Lab1”, and inside the folder you will find an R project file called “Lab1.Rproj”.
As you work inside R-studio you will be creating text documents, and you will be doing things like loading data, and saving the results of your analyses. As your work grows and becomes more complex, you can often find yourself creating many different files. The R project folder is a very useful way of organizing your files all in one place so you can find them later. If you double-click an R project file, R-studio will automatically load and restore your last session. In the labs, you will be using your R project folder to:
- save data files into this folder
- save R-markdown files that you will use to write your R-code and lab notes
- save the results of your analysis
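As a rough sketch of saving results into a project folder and reading them back, the example below writes a small table to a .csv file with base R. The folder, the file name lab1_results.csv, and the numbers are all made up; a temporary folder stands in for the project folder so the example does not touch your real files.

```r
# Hypothetical stand-in for an R project folder
project_dir <- file.path(tempdir(), "Lab1")
dir.create(project_dir, showWarnings = FALSE)

# Made-up results to save
results <- data.frame(
  condition  = c("A", "B"),
  mean_score = c(10.5, 13.25)
)

# Save the results as a .csv file inside the project folder
results_file <- file.path(project_dir, "lab1_results.csv")
write.csv(results, results_file, row.names = FALSE)

# Check that the file is there, then read it back in
list.files(project_dir)
reloaded <- read.csv(results_file)
print(reloaded)
```

When you open a project by double-clicking its .Rproj file, R-studio sets the working directory to the project folder, so a short file name such as "lab1_results.csv" is usually enough on its own.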
0.3.4.2 Installing libraries
When you install R and R-studio, you get what is called Base R. Base R contains many libraries that allow you to conduct statistical analyses. Because R is free and open-source, many other developers have created add-on libraries that extend the functionality of R. We use some of these libraries, and you need to install them before you can do the labs.
For example, in any of the labs, whenever you see a line of code that uses the word library, like this: library(libraryname), this line of code is telling R to load up that library so it can be used. The libraryname would be replaced with the actual name of the library. For example, you will see code like this in the labs:
library(data.table)
This line of code is saying that the data.table library needs to be loaded. You can check to see if any library is already loaded by clicking on the “packages” tab in the bottom right hand panel. You will see many packages listed in alphabetical order. Packages that are currently loaded and available have a checkmark. If you scroll down and find that you do not have data.table installed, then you need to install it. To install any package follow these steps:
- Click on the packages tab
- Find the “install” button in the top left hand corner of the packages tab.
- Click the install button
- Make sure “install from:” is set to CRAN repository
- Make sure “dependencies” is clicked on (with a checkmark)
- Type the name of the library into the search bar.
- As you type, you should see the names of different packages you can install pop up in a drop-down menu. You must be connected to the internet to install packages from CRAN
- Once you find the package (e.g., data.table), click it, or just make sure the full, correctly spelled name is in the search bar
- Press the install button
You should see some text appear in the console while R installs the package.
- After you have installed the package, you should now see that it is listed in the packages tab.
- You can turn the package on by clicking it in the package tab.
- OR, you can turn the package on by running the command library(data.table) in the console. To do this, type library(data.table) into the console, and press enter.
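The point-and-click steps above can also be done entirely from the console. The sketch below installs data.table only if it is not already installed, and then loads it. The repos argument is an addition for this example: it names the CRAN cloud mirror explicitly, and inside R-studio it can usually be left out because a mirror is already configured.

```r
# Check whether data.table is already installed on this computer
already_installed <- requireNamespace("data.table", quietly = TRUE)

# Install it from CRAN if it is missing (requires an internet connection)
if (!already_installed) {
  install.packages("data.table", repos = "https://cloud.r-project.org")
}

# Load the package so its functions can be used in this session
library(data.table)
```

Note that install.packages() takes the package name in quotes, whereas library() accepts the name with or without quotes.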
0.3.4.3 Quick install
If you are using R on one of the lab computers, you may find that some of the packages are not installed. The lab computers get wiped every night, so it may be necessary to install packages each time you come back to the lab. Fortunately, we can tell R to install all of the packages we need in one go. Copy the following lines of code into the console, and press enter. Note you can select all of the lines at once, then copy them, then paste all of them into the console, and press enter to run them all. After each of the packages is installed, you will then be able to load them using library().
install.packages("ggplot2")
install.packages("dplyr")
install.packages("data.table")
install.packages("summarytools")
install.packages("gapminder")
install.packages("ggpubr")
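If some of the packages are already installed, reinstalling them wastes time. One possible alternative, sketched below, checks which of the lab packages are missing and installs only those. The helper name install_missing is made up for this example and is not part of R or of any package.

```r
# The packages used in the labs
lab_packages <- c("ggplot2", "dplyr", "data.table",
                  "summarytools", "gapminder", "ggpubr")

# install_missing() is a hypothetical helper written for this example.
# It installs only the packages that are not yet on this computer and
# invisibly returns the names of the packages that were missing.
install_missing <- function(packages) {
  installed <- rownames(installed.packages())
  missing_packages <- setdiff(packages, installed)
  if (length(missing_packages) > 0) {
    install.packages(missing_packages)
  }
  invisible(missing_packages)
}

# To install whatever is missing, run this line in the console
# (requires an internet connection):
# install_missing(lab_packages)
```

Because install.packages() accepts a vector of package names, all of the missing packages are installed with a single call.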
0.3.4.4 R markdown
Once you have the necessary packages installed you can begin creating R markdown documents for each lab. We admit that at the beginning, R markdown documents might seem a little bit confusing, but you will find they are extremely useful and flexible. Basically, what R markdown allows you to do is combine two kinds of writing, 1) writing R code to conduct analyses, and 2) writing normal text, with headers, sub-headers, and paragraphs. You can think of this like a lab journal, that contains both your writing about what you are doing (e.g., notes to self), and the code that you use for analysis. Additionally, when your code does something like make a graph, or run a statistical test, you can ask R markdown to print the results.
The R markdown website has an excellent tutorial that is well worth your time to check out: https://rmarkdown.rstudio.com/lesson-1.html
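To give a rough idea of what an R markdown document looks like inside, the sketch below uses R to write a tiny example file and print its contents. The title, notes, and file name are made up, and the file goes into a temporary folder. The three backticks that open and close a code chunk are built with strrep() only so that they display correctly here; in a real .Rmd file you simply type them.

```r
# Three backticks mark the start and end of a code chunk in R markdown
fence <- strrep("`", 3)

# The lines of a minimal, made-up R markdown document
rmd_lines <- c(
  "---",
  "title: \"Lab 1 notes\"",
  "output: html_document",
  "---",
  "",
  "# Training part",
  "",
  "Notes to self: compute the mean of three scores.",
  "",
  paste0(fence, "{r}"),
  "scores <- c(2, 4, 6)",
  "mean(scores)",
  fence
)

# Write the document to a temporary file and print what it contains
rmd_file <- file.path(tempdir(), "lab1_notes.Rmd")
writeLines(rmd_lines, rmd_file)
cat(readLines(rmd_file), sep = "\n")
```

The block between the two --- lines sets the title and output format, the line starting with # is a header, plain lines are normal text, and the lines between the backticks are R code. Opening such a file in R-studio and pressing the Knit button produces a document that shows the text, the code, and the output of the code together.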
0.3.4.5 R markdown lab templates
We have created a set of template documents for each lab that can be downloaded here: download lab templates.
When you unzip the file you should find the following:
- A new folder titled “RMarkdownsLab”
- Inside the folder you will see the “RMarkdownsLab.Rproj” file
- A data folder containing data files for the labs
- A “LabTemplates” folder containing the R markdown templates for each lab.
To get started with Lab 1, follow these steps:
- copy the template file for lab 1, “Lab 01 Graphing_Student Name.Rmd”, and place it into the “RMarkdownsLab” folder (copy it out of the template folder, and into the RMarkdownsLab folder).
- Rename the file to add your own name, e.g., “Lab1GraphingMattCrump.Rmd”
- double-click the “RMarkdownsLab.Rproj” file
- R-studio will now load up.
- If you click the files tab, you will see all of the files and folders inside the “RMarkdownsLab” folder
- Click on your lab1 .rmd file; it will now load into the editor window.
Each lab template .rmd file contains three main sections, one for each part of the lab. You will write things inside each section to complete the lab.
0.3.5 Screencast tutorial
Follow this guide to get up and running for Lab 1.
0.3.6 R-studio Cloud
R-studio is also in the cloud. This means that if you want to use R and R-studio through your web-browser you can do that without even installing R or R-studio on your computer. It’s also free!
Sign up for an R-studio cloud account here: https://rstudio.cloud
You can make new R projects, work inside them, and everything is saved in the cloud!
To see how everything would work, follow the steps in this video. You will need to download this .zip file to your computer to get started.
The link to the video is https://www.youtube.com/watch?v=WsbnV0t7FE4