R is a language for statistical computing as well as a general-purpose programming language. Increasingly, it has become one of the primary languages used in data science and for data analysis across many of the natural sciences.
Thank you for taking the time out of your busy schedule to join us for this training. We hope you will find this workshop both fun and helpful, and we appreciate your patience as we encounter any bumps in our new virtual format! Also, we would like to thank Allison Horst for allowing us to use her incredible monsteRs and R illustrations that you will see included in this tutorial.
The goals of this training are to expose you to fundamentals and to develop an appreciation of what’s possible with this software. We also provide resources that you can use for follow-up learning on your own. You should be able to answer these questions at the end of this section:
There are many programming languages available and each has its specific benefits. R was originally created as a statistical programming language but now it is largely viewed as a ‘data science’ language.
R is also an open-source programming language - not only is it free, but anybody can contribute to its development.
In the old days, the only way to use R was directly from the Console - this is a bare-bones way of running R only with direct input of commands. Now, RStudio is the go-to Integrated Development Environment (IDE) for R. Think of it like a car that is built around an engine. It is integrated with the console (engine) and includes many other features to improve the user’s experience.
Let’s get familiar with RStudio before we go on.
If you haven’t done so already, download and install RStudio from the link above (version 1.3.1073). After it’s installed, find the RStudio shortcut and fire it up. You should see something like this:
There are four panes in RStudio (starting from the bottom left and moving clockwise):
The first part of RStudio that we will be working in is called the Console. This is where R runs the commands you type and shows their results, and it can do many of the same calculations you would do with a calculator or in Microsoft Excel.
Type the following text into the Console, press Enter after each line, and you should see the following results:
4
## [1] 4
4+8
## [1] 12
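Other calculator-style operations work the same way. The lines below are a small sketch of a few more arithmetic operators built into R; the numbers are arbitrary examples and are not part of the workshop material.

```r
10 - 3        # subtraction: 7
6 * 7         # multiplication: 42
20 / 4        # division: 5
2^3           # exponentiation: 8
(4 + 8) / 2   # parentheses control the order of operations: 6
sqrt(16)      # built-in functions work too: 4
```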
You can also create variables with custom values. In this case the first part of the code is a name of your choosing. Ideally it should have some meaning, but the main rules are that it can’t start with a number and must not contain any spaces. The second bit, <-, is the assignment operator. This tells R to take the result of whatever is on the right side of the <- and store it in a new object.
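To make the naming rules concrete, here is a small sketch. The names used (stream_depth and streamDepth2) are made up for illustration and are not used elsewhere in this tutorial.

```r
# Valid names: they start with a letter and contain no spaces
stream_depth <- 4
streamDepth2 <- 4

# Invalid names (left commented out, because R would stop with a syntax error):
# 2stream <- 4        # starts with a number
# stream depth <- 4   # contains a space
```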
Type the following into your Console and press Enter after each one to see their output:
stream <- 4
pebble <- 8
You might notice there is no output in the Console for the lines of code you have just run. Instead, the values have been stored in your Environment, in the top right pane of your RStudio window. Because they have been stored, you can print these variables by typing their name in the Console and pressing Enter.
There are two possible outcomes when you run code. First, the code may simply print output directly in the Console, as it did with the calculations you entered above. Second, there may be no output because you have stored the result as a variable in the Environment. The Environment is the collection of named objects that are stored in memory for your current R session. Anything stored in memory will be accessible by its name without rerunning the original code that was used to create it.
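If you prefer typing to clicking, you can also inspect the Environment from the Console. This is a minimal sketch that recreates stream and pebble so it runs on its own; ls() and exists() are base R functions, and stream_found is a made-up name for this example.

```r
stream <- 4
pebble <- 8

# List the names of the objects stored in your current Environment
ls()

# Check whether an object with a given name exists
stream_found <- exists("stream")
stream_found   # TRUE
```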
Add the variables you just created together, and examine the output:
stream + pebble
## [1] 12
You can also create new variables using existing variables like so:
habitat <- stream + pebble
Try to call the variable by typing:
habitatt
## Error: object 'habitatt' not found
This line of code will give you an error, because you must type your variable names exactly as they appear. Please keep in mind that R is also case-sensitive.
Try to call the variable once more in the Console using the correct spelling:
habitat
## [1] 12
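Case sensitivity works the same way as spelling. In this small sketch, habitat is recreated so the example runs on its own, and exists() is used to check names without triggering an error. The names lower_found and upper_found are made up for illustration.

```r
habitat <- 12

# TRUE: the name matches exactly
lower_found <- exists("habitat")

# FALSE: a capital H makes it a different name
# (assuming nothing else in your session is called Habitat)
upper_found <- exists("Habitat")

lower_found
upper_found
```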
Please note: Clicking on the broom button in the Environment will permanently clear out your existing variables. Only do this if you are certain you want to remove/reset all saved variables and datasets.
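If you only want to remove one object rather than everything, you can do that from the Console with rm(). Here is a minimal sketch, using a throwaway variable named scratch (a made-up name) so that nothing from the tutorial is deleted:

```r
scratch <- 1
before <- exists("scratch")   # TRUE: the object is in the Environment

# Remove just this one object from the Environment
rm(scratch)
after <- exists("scratch")    # FALSE: the object is gone
```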
In this same pane of the RStudio window you will also find the History tab, which records all the code you’ve run, and the Connections tab, which shows connections to databases and other data sources.
Immediately below the Environment is the third section of RStudio, where all of your Packages are stored. The base install of R is quite powerful, but you will soon have a need to go beyond this. Packages provide this ability. They are a standardized way of extending R with new methods, techniques, and programming functionality.
One of the reasons for R’s popularity is CRAN, The Comprehensive R Archive Network. This is where you download R and also where you can gain access to additional packages. All of the packages we’ll be using during this tutorial will be downloaded from CRAN. As of 2020-10-15, there are 16415 packages on CRAN!
When a package gets installed, its code is downloaded and put into your library. Let’s give it a shot using the tidyverse, a set of packages assembled for data tidying and visualization purposes.
We’re going to use our very first function - install.packages() - to install this package. Type the following into your Console and press Enter:
install.packages("tidyverse")
You should see it appear in the Packages tab. To find it, you can either scroll through the list, or type the package name into the search bar at the top of the pane.
In order to use a package, you must attach it to your current R session. To attach the tidyverse, type the following into your Console and press Enter:
library(tidyverse)
Now your package is loaded, and ready to use. You can be certain your package is attached if there is a check mark next to the package name in the Packages tab.
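You can also check on a package from the Console instead of the Packages tab. This is a sketch using base R functions only; it assumes nothing about whether the tidyverse is actually installed on your machine, so either result may be TRUE or FALSE. The names is_installed and is_attached are made up for this example.

```r
# Is the package installed?
# system.file() returns an empty string ("") when a package cannot be found
is_installed <- nzchar(system.file(package = "tidyverse"))

# Is it attached to the current session?
# Attached packages appear on the search path as "package:<name>"
is_attached <- "package:tidyverse" %in% search()

is_installed
is_attached
```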
The remaining tabs in this pane allow you to see the Files that are associated with your code, the Plots you’ve created, Help documents for when you get stuck, and view additional HTML content you might create with the Viewer tab.
Great job! You’ve opened up RStudio, learned about some basic functionalities, and now you’re ready to get going on your first R project!
Below we have included some excellent resources for R learning, troubleshooting, and communities.
Being able to find help and interpret that help is probably one of the most important skills for learning a new language. R is no different. Help on functions and packages can be accessed directly from R, found on CRAN and other official R resources, searched for on Google or StackOverflow, or drawn from any number of fantastic online resources.
Getting help from the console is straightforward and can be done in several ways.
# Using the help command/shortcut
# When you know the name of a function
help("print") # Help on the print command
?print # Help on the print command using the `?` shortcut
# When you know the name of the package
help(package = "dplyr") # Help on the package `dplyr`
# When you don't know the exact name, or only know part of it
apropos("print") # Returns the names of all available objects with "print" in the name
??print # Shortcut for help.search("print"); also searches demos and vignettes and shows a formatted page
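The results of these helpers can also be used directly in code. Here is a small sketch using only base R functions: args() shows a function's arguments, and apropos() returns a character vector that you can store and inspect. The name print_matches is made up for this example.

```r
# Show the arguments of a function without opening its help page
args(round)

# Store the matching names, then look at the first few and count them
print_matches <- apropos("print")
head(print_matches)
length(print_matches)
```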
In addition to help from within R itself, CRAN and the R-Project have many resources available for support. Two of the most notable are the mailing lists and the task views.
There are also a number of platforms created to support communities using R and RStudio including RStudio Community, RLadies and RStudioEDU. Many of our workshop helpers today are members of RLadies! Following #rstats on Twitter is also an excellent way to stay informed of RStudio updates and new package developments.
While the resources already mentioned are useful, often the quickest way is to just turn to Google. Google works great if you search for a given package or function name. You can also type in the error you are experiencing verbatim - chances are, someone before you has encountered the exact same problem.
Blind googling can require a bit of strategy to get the info you want. Some pointers:
One specific resource that I use quite a bit is StackOverflow with the ‘r’ tag. StackOverflow is a question-and-answer forum for all things related to programming. You can use this tag along with StackOverflow’s search functions to find answers to almost anything you can think of. However, the forum is also quite strict about how questions are asked, so I typically use it to find answers rather than to ask questions.
Below are just a few more resources I like: