1 Basics of R
1.1 Introduction to R and RStudio
R is a computer programming language that is particularly well-suited for statistical analysis and data visualisation, but can also handle text processing, among other functions. For data science, R is typically considered one of the best programming languages alongside Python. A recent index lists R as the 8th most popular programming language behind languages like C, C++, Python, and JavaScript.
RStudio is a user-friendly program for running the R language. If you open just the R application, it displays what’s called a “console”: a very simple interface that allows you to enter code interactively. You write a line, and it prints out the response, very much like a calculator. RStudio also has a console, in the bottom lefthand corner of your screen. You’ll see that each line begins with a ‘>’ and a blinking cursor; this ‘>’ means you should start entering your code there.
In addition to having a console, RStudio also has a display window in the upper lefthand corner for viewing things like your datasets and scripts (which contain saved code). This is one of the most useful aspects of RStudio.
In the upper righthand corner is your environment window. This lists all of the objects you have saved in your workspace. These objects might be things like datasets, general variables, etc.
Finally, in the lower righthand corner is another display window with a few tabs: Files, Plots, Packages, Help, and Viewer. We won’t cover all of these, but as the module progresses, you’ll see that any plot you create will be displayed in this window. If you need to look up how a specific R function works, the manual will also be displayed here, in the Help tab.
Note that you can customize how this all looks! Is the font size too small, or does the bright white screen bother your eyes? Feel free to customize RStudio to work best for you. Under Tools –> Global Options…, you can find the Appearance option to change aspects of the editor window.
Now that you’ve had a very brief orientation, let’s get started!
1.2 R as a calculator
1.2.1 Mathematical operators
- + addition, - subtraction, * multiplication, / division, ^ power, () parentheses / bracketing
- these characters are generally reserved for these mathematical operations
- spacing generally doesn’t matter, but please keep it clean
R follows the mathematical order of operations. It will evaluate, in this order:
- elements in parentheses, brackets
- exponents, powers, indices (chained powers are evaluated from right to left, so 2^3^2 is 2^(3^2))
- multiplication/division (from left to right)
- addition/subtraction (from left to right)
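A short sketch of the order of operations in action. The numbers are arbitrary illustrations rather than examples from the original text:

```r
2 + 3 * 4      # 14: multiplication happens before addition
(2 + 3) * 4    # 20: parentheses are evaluated first
2 * 3^2        # 18: the power is evaluated before the multiplication
10 - 4 - 3     # 3: subtraction runs from left to right
-2^2           # -4: the power is applied before the minus sign
```

When in doubt, adding parentheses makes the intended order explicit.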
1.2.2 Mathematical functions
R has several built-in functions that you can use. Try not to overwrite built-in functions when you start to use variables. Most functions take at least one argument, and many can take more than one. The following functions demonstrate the use of one argument in a function (the thing inside the parentheses):
Square root function:
Square root expressed with an operator instead of the function. Remember that the square root is the same as raising a number to the power of 1/2:
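As a minimal sketch of both forms, using 16 as an arbitrary example value:

```r
sqrt(16)     # 4: the built-in square root function
16^(1/2)     # 4: the same calculation with the power operator
16^0.5       # 4: 0.5 is just another way of writing 1/2
```

Note the parentheses around 1/2: without them, 16^1/2 would be evaluated as (16^1)/2.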
The logarithmic function: Recall that log functions can have different bases. The default log(x) uses base e, the natural log (aka ln); you can also specify log10(x) which is log base 10. Related built-in functions include:
- sine function: sin(x)
- cosine function: cos(x)
- tangent function: tan(x)
- exponential function (e to the power of x; inverse of log(x)): exp(x)
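A brief sketch of these functions, each called with a single argument. The input values are chosen only because their results are easy to check by hand:

```r
log(1)         # 0: natural log (base e)
log10(1000)    # 3: log base 10
exp(0)         # 1: e to the power of 0
log(exp(2))    # 2: log() undoes exp()
sin(0)         # 0
cos(0)         # 1
tan(0)         # 0
```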
Right now, you do not need to remember what all of these functions mean yourself. The goal is to simply see how R works, how functions work in R, and how operators work in R. One of the main uses of R is as a very powerful calculator – exactly what one needs for statistical analysis.
1.2.3 Comparing numbers and strings
Why might we want to compare numbers against other numbers or strings against other strings? When you start processing large amounts of data, you might want a truth value on two whole columns of data all at once. For example, for each row in a spreadsheet, is the number in column 1 less than the corresponding number in column 2? One practical example is calculating accuracy in a dataset: did the participant’s response (indicated in column 1) match the actual answer (indicated in column 2)?
- is equal to: ==
- is not equal to: != (the exclamation point typically means NOT)
- less than: <
- greater than: >
- less than or equal to: <=
- greater than or equal to: >=
These statements will return a value of TRUE or FALSE (also known as a “logical”).
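A small sketch of each comparison operator. The values are arbitrary illustrations:

```r
5 == 5            # TRUE
5 != 5            # FALSE
3 < 4             # TRUE
3 > 4             # FALSE
4 <= 4            # TRUE
3 >= 4            # FALSE
"cat" == "cat"    # TRUE: strings can be compared too
"cat" == "Cat"    # FALSE: string comparison is case-sensitive
```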
1.3 Variables
<- is called the assignment operator; it’s the most commonly used one in the R community. It is directional, so you can say 5 + 3 -> x or x <- 5 + 3
= is another assignment operator. It is not directional. The variable name must be on the left.
The statement below won’t work:
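The sketch below shows the three working forms of assignment, followed by one form that fails. The failing line is an assumed illustration (the value placed on the left of =), and it is commented out so that the rest of the block still runs:

```r
x <- 5 + 3     # leftward arrow: the name is on the left
5 + 3 -> y     # rightward arrow: the name is on the right
z = 5 + 3      # equals sign: the name must be on the left

# 5 + 3 = w    # uncommenting this line produces an error
```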
Note that you can reuse the variable to update the variable (as in x <- x+2):
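A minimal sketch of updating a variable in place, assuming x starts out as 5 + 3:

```r
x <- 5 + 3    # x is 8
x <- x + 2    # the right-hand side is evaluated first, then stored back in x
x             # prints 10
```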
1.3.1 Constraints on variable names
- Variable names must begin with a letter and only contain letters, numbers, _ (underscore), or . (period)
- Spaces are not allowed
- To make your life easier, do not name your columns in spreadsheets with spaces
- To make your life easier, do not use spaces in any variable or filename on the computer
Bad examples below:
Good examples, but note that variables are case-sensitive:
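A sketch of both kinds of name. All of the names here are hypothetical illustrations, and the bad examples are commented out so that the block still runs:

```r
# Bad: each of these lines produces an error if uncommented
# my variable <- 1    # contains a space
# 2cats <- 1          # begins with a number

# Good: letters, numbers, underscores, and periods only
my_variable <- 1
my.variable <- 2
cats2 <- 3
myVariable <- 4
myvariable <- 5       # a different variable from myVariable

myVariable == myvariable   # FALSE: the two names hold different values
```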
1.4 Some useful functions and notations
1.5 Vectors
A vector is a one-dimensional list of items of the same type. Think of it like a column in a spreadsheet.
As an example, you can create a range of integers with the colon :
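For instance, a minimal sketch with arbitrary start and end values:

```r
x <- 1:10    # the integers 1, 2, 3, ..., 10
x            # prints 1 2 3 4 5 6 7 8 9 10

y <- 5:1     # the colon also counts down
y            # prints 5 4 3 2 1
```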
A sequence of numbers can be created with seq().
The function seq() is typically called with at least two arguments: a start value and an end value. You can also specify a third argument indicating the interval between subsequent numbers.
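A short sketch of seq() with two and three arguments. The values are arbitrary, and the last line names the interval argument explicitly with by, which is optional:

```r
seq(1, 10)              # 1 2 3 4 5 6 7 8 9 10
seq(0, 20, 5)           # 0 5 10 15 20
seq(0, 1, by = 0.25)    # 0.00 0.25 0.50 0.75 1.00
```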
The following line of code will generate an error:
You can get the length of a vector with length():
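For example, assuming a hypothetical vector x built with seq():

```r
x <- seq(0, 20, 5)    # 0 5 10 15 20
length(x)             # 5
```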
1.5.1 Accessing parts of a vector
You can use an index (numerical position) within square brackets to get the value of the cell at that position. A vector is one-dimensional, so there’s just one index inside the brackets. When we start trying to access cells within data frames, which have two dimensions (rows and columns), we’ll start to use two numbers inside the brackets.
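A minimal sketch of indexing, assuming a hypothetical vector x that holds the integers 11 to 20:

```r
x <- 11:20
x[1]     # 11: R starts counting positions at 1, not 0
x[3]     # 13
x[10]    # 20: the last element
```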
You can also access parts of the vector with comparison operators or ranges. Here we’re starting to combine functions:
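A sketch of both approaches, again assuming a hypothetical vector x that holds the integers 11 to 20:

```r
x <- 11:20
x[2:4]       # 12 13 14: a range of positions
x > 17       # a TRUE/FALSE value for every element of x
x[x > 17]    # 18 19 20: only the elements where the comparison is TRUE
x[x == 15]   # 15
```

The comparison inside the brackets is evaluated first, and the resulting TRUE/FALSE values decide which elements are kept.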
Before, we created vectors by specifying a range of numbers. We can also create a vector that is an arbitrary list of elements with the c() function. In this case, c stands for combine, concatenate, collect or whatever word makes most sense to you.
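A minimal sketch of c() with arbitrary numbers. The variable names are hypothetical:

```r
odds <- c(1, 3, 5, 7)
odds             # prints 1 3 5 7

scores <- c(42, 7, 19)
length(scores)   # 3
scores[2]        # 7
```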
We can also create a list of strings; strings must be in quotes!
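For example, with a hypothetical vector of animal names:

```r
animals <- c("cat", "dog", "bird")
animals[2]         # "dog"
length(animals)    # 3
```

Without the quotes, R would look for variables named cat, dog, and bird instead of treating them as text.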
We can then use this c() function to identify items 1, 20 and 40 in a vector (an arbitrary set of indices).
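A sketch of this, assuming a hypothetical vector x that holds the integers 101 to 150 so that positions and values are easy to tell apart:

```r
x <- 101:150
x[c(1, 20, 40)]    # 101 120 140
```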
1.6 Practice
Before starting this practice, we’re going to set up a way to save the code we write. To do this, we’re going to create an “R script”. This is a simple text document that contains R code and has a .R extension (like .txt or .docx, etc.).
An easy way to create an R script is through RStudio itself. Select File –> New File –> R Script, and a new R script should pop up in your RStudio View window (upper left). You can now paste your code in this script, and re-run it at a later date. To include non-code comments, use the # symbol.
For example, you might copy out some of the questions and put the answer in code below it, like the following:
# Session 1 Practice
# Eleanor Chodroff
# 6 July 2023
# 1. Create a variable called "cats". Assign the value 200 to it.
cats <- 200
# 2. Create a second variable called "dog". Assign the value 100 to it.
There’s the example for setting your R script up, and the answer to the first question. :)
1.6.1 Practice with unit variables
- Create a variable called “cats”. Assign the value 200 to it.
- Create a second variable called “dog”. Assign the value 100 to it.
- Update the variable “cats” by dividing it by 2. Before looking at the value of cats, what do you think it should be?
- Hopefully you know whether “cats” is greater than, equal to, or less than “dog”. Write a line of code that tests each of these comparisons. (There should be three lines of code for your answer.)
- Update the variable “dog” by adding 5 to it.
- Write a line of code that tests whether “dog” is greater than or equal to “cats”, greater than “cats”, equal to “cats”, less than “cats”, or less than or equal to “cats”. (There should be five lines of code for your answer.)
1.6.2 Practice with vectors: numerical
- Create a vector called “myNumbers” that contains a sequence of numbers from 0 to 49, increasing by 1 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, …)
- Update the vector “myNumbers” by adding 1 to it.
- Update the vector “myNumbers” by multiplying it by 4.
- What is the 21st element of the vector?
- What is the 43rd element of the vector?
- Create a new vector from “myNumbers” called “mySmallNumbers” that contains only the elements that have values less than 20.
1.6.3 Practice with vectors: strings
- Create a vector called “colors” with the values “green”, “blue”, “red”, and “yellow” in it.
- What is the second element in the vector “colors”?
- Get the length of the vector “colors”. (Before you run the line of code, what do you think the answer should be?)
1.4.2 Comments
Comments are made with a hashmark (#). The hashmark tells R not to evaluate anything written from the # to the end of the line.
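A minimal sketch of comments in practice. The values are arbitrary:

```r
# This whole line is a comment and is not evaluated
x <- 5 + 3    # a comment can also follow code on the same line
# x <- 100    # this assignment is commented out, so x stays 8
x             # prints 8
```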