These practicals use cloud compute resources provided by The University in the form of a virtual machine (VM). A VM is essentially a program that runs on a server but behaves like a separate computer: you can log in to it and interact with the programs installed on it. Each VM has 2 CPU cores, 16 GB of system memory, and 80 GB of hard disk space. It is yours to use for the semester, but you are also responsible for looking after it. The University hosts these VMs on AWS RONIN (Amazon Web Services) and pays by the minute for CPU time and storage. For this reason, VMs that stay idle for too long are shut down automatically (idle means no jobs running or no user input).
Please go here for instructions on connecting to your VM.
R is derived from the S language (Bell Labs, 1976).
R offers considerable advantages over Excel, GraphPad Prism & the like:
Importantly: We have code to record and repeat our analysis. This enables highly reproducible (and reputable) research!
R can be installed on any laptop, and the most common way to interact with it is through an IDE called RStudio. If R is the engine of a car, RStudio is the cabin: it offers many extra features that are not part of R itself but make working with R much more enjoyable.
For these practicals, we have given you a Virtual Machine (VM), which should keep technical issues to a minimum across the entire practical series. RStudio is how we will primarily interact with our VMs.
Let’s have a quick look at R in its simplest form. To perform calculations or run commands directly, we can use the `Console`. In the top left corner, just below the blue ‘R’ logo, you can see the word `Console`. Click this and you will see the following text (or similar depending on your version):
R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
Directly below this, you’ll see a flashing cursor next to the following symbol.
>
(If the cursor is not flashing, click on this pane and it should start flashing.)
This is the R `Console`, and this symbol is the prompt within an R session. We enter our code here, and can even use this as a simple calculator. Try typing a few simple sums at the R prompt, pressing `Enter` after each line to evaluate these ‘difficult’ calculations.
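The calculations used in the original exercise are not preserved in this copy, so the lines below are only illustrative stand-ins. Any simple arithmetic will behave the same way.

```r
# Type each line at the prompt and press Enter to see the result
1 + 1    # addition, gives 2
2 * 2    # multiplication, gives 4
2 ^ 3    # 2 to the power of 3, gives 8
1 / 5    # division, gives 0.2
```

R prints each result in the Console straight after the line that produced it.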
Mess around with a few more simple calculations, then move on to the next section.
In the simple exercise above, we interacted directly with the R Console; however, most of the time we will use the RStudio Integrated Development Environment (IDE). As you’re probably aware, this is how we’ve logged into our VM, and this will be our main environment for the practical series.
Both R and RStudio can also be installed on your own computer, and you may need to do this outside of this course. However, it is not required for this course.
First we will set up an `R Project` for today’s practical. R Projects are a simple way to manage your code for multiple analyses or datasets, and are always named after the directory they are in. They are not strictly essential, but they are very useful and are considered good practice. If you skip this step, you will be making your life considerably harder and will not be following ‘best practice’ for R.
In the middle of the screen you will see the home icon. Click the new folder icon directly above this and call the folder `Practical_1`. Please ensure you spell this correctly, including the underscore and capitalised first letter. When using scripting languages it is important never to use spaces in the names of files or folders, so please get into the practice of following this rule from now on!
This is where we’ll run everything for today’s session. To make an R Project for this folder, use the following commands in the menu bar at the top of your RStudio session:
1. `File` > `New Project`
2. When asked if you'd like to `Save Current Workspace`, choose `Don't Save`.
3. Choose `Existing Directory` >
+ Browse to `Practical_1`
+ `Choose`
+ `Create Project`
You will now find yourself in the `Practical_1` folder we have created. If you look at the top right of your RStudio session you will also see `Practical_1` and an arrow for the drop-down menu. We only have one project at this point, but this is where you can switch between projects. For this practical series, we’ll start a new project for every session. This will help you keep everything organised so you can find it later. Managing your data and code intelligently is a considerable challenge in bioinformatics, so it is important to be organised from the beginning!
The final step is to copy the data we have provided for this practical into your `Practical_1` folder (it is best practice when using `R Projects` to keep all data, scripts, outputs, and reports within the R Project folder for maximum portability and reproducibility).
The provided data is in the folder `~/data/intro_r` within your Home folder on the VM (the `~/` part here expands to your system `Home` folder, which on the VM is `/shared//aXXXXXXX`). We want to preserve the original folder structure, so we will copy it rather than move it. We can do this using the Terminal (command line), or we can use the Files pane in RStudio. We will keep things simple for now and use the Files pane as follows:
1. Click on the `Files` tab in the bottom right pane of RStudio.
- Ensure you are in the `~/Practical_1` folder
2. Click on the new folder icon (looks like a folder with a plus sign) to create a new folder.
- Name this folder `data`
3. Click on the `Home` icon (looks like a house) to go to your home folder.
4. Navigate to the `~/data` folder within `Home`.
5. Click on the box next to the `intro_r` folder to select it.
6. Click on the `More` button in the top right of the Files pane.
7. Select `Copy To` from the drop-down menu.
8. Click `Home`, then double-click `Practical_1`, and then click `data`.
9. In the `File name` box type `intro_r` and select `Save`
- This will copy the `intro_r` folder into your `Practical_1/data` folder.
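For reference, the same copy can also be scripted from R rather than clicked through. The sketch below is only an illustration: `copy_to_project()` is a hypothetical helper written for this example (it is not part of the course material or of any package), built on the base R functions `dir.create()` and `file.copy()`.

```r
# Hypothetical helper: copy a folder into the data/ sub-folder of a project,
# leaving the original folder where it is.
copy_to_project <- function(source_dir, project_dir) {
  target <- file.path(project_dir, "data")
  # Create <project>/data if it does not exist yet
  dir.create(target, recursive = TRUE, showWarnings = FALSE)
  # recursive = TRUE copies the folder together with everything inside it
  file.copy(source_dir, target, recursive = TRUE)
}

# On the VM, the call matching the steps above would be (assumed paths):
# copy_to_project("~/data/intro_r", "~/Practical_1")
```

The Files pane does exactly the same job, so there is no need to run this if you have already followed the steps above.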
Great, the data is now in your `Practical_1/data/intro_r` folder! Now we are ready to start using R and RStudio.
Instead of just entering commands into the console, we can enter our commands into a text file, known as a script, and execute them from this file. This enables us to keep a complete record of everything we have done in complex analyses. Sometimes, we even publish these as part of a research paper.
Let’s create a blank `R Script`. As you may expect, these are just plain text files we use to save our code. Using the RStudio menu:
`File` > `New File` > `R Script`
Save this as `Introduction.R`.
Your screen should now look like the following (without the blue text):
Figure: The common RStudio layout
Let’s take a tour through the various panes and tools within RStudio.
- In a script, comments are marked with the `#` symbol.
- Alongside the Console is the `Terminal`, which we have already used to change our password. We’ll also spend a fair bit of time here later in the course.

As well as performing simple calculations in the Console, as we’ve already done:
- `R` has what we call an `Environment` (i.e. a Workspace)
- The objects held there are similar to variables in many other programming languages
- `R` performs calculations & runs processes on these objects.

To create an `R` object:
- In `R`, the process looks like `nameOfObject <- data`
- The `<-` symbol is like an arrow, and is the assignment operator
- It tells `R` to put the data in the object

In the Script Window, use this pattern to create an object called `x`.
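A minimal sketch of such an assignment is given here. The value 5 is an arbitrary assumption, since the value used in the original example is not preserved in this copy.

```r
x <- 5     # put the value 5 into an object called x
x          # entering the name prints the contents
print(x)   # print() does the same from within a script
```

Nothing is printed by the first line, because assignment only stores the value.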
We can then view the contents of `x` by entering its name directly in the `Console`, or by calling `print()` and running it in the script. It is good practice to use the script window rather than interact directly with the Console, as a script can be saved for future use and easily edited.

Where have we created the object `x`?
We have placed `x` in our `R Workspace`, which is more formally known as your `Global Environment`. The `Environment` pane shows us which objects are currently in our R Environment. The `Environment` is very much like the desk in your study (or bedroom), where we can have things spread and piled up everywhere. In the R Environment, we can create objects of multiple types. Once an object is in our `Environment` we can perform calculations on it, for example on the object `x` we created.
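The calculations below are an illustrative sketch only. They assume `x` holds the value 5, which is re-created on the first line so the block runs on its own.

```r
x <- 5    # assumed value, for illustration only
1 + x     # add 1 to the value stored in x
x^2       # square it
x > 1     # logical test: is x greater than 1?
ls()      # list the objects currently in the Global Environment
```

None of these lines change `x` itself; the stored value only changes when we assign to it again with `<-`.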
NB: R is case sensitive
`R` has a series of in-built functions, e.g. `sqrt()`, `log()`, `max()`, `min()` etc., which are available as soon as you start a session. We place an object or value inside the `()` after the name of a function; these are the arguments to the function.
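As a short illustration of calling functions with arguments (the numbers are arbitrary examples, not taken from the course material):

```r
sqrt(16)             # one argument: the square root of 16 is 4
log(100, base = 10)  # a second, named argument: log base 10 of 100 is 2
max(3, 7, 5)         # several values: the largest is 7
min(3, 7, 5)         # the smallest is 3
```

Arguments can be given by position, as in `sqrt(16)`, or by name, as in `base = 10`.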
Many in-built functions are organised into a package called `base`. In the Console type `?base` + Enter. This will take you to the `Help` pane. Click on the underlined word `Index` at the bottom for a list of functions in the `base` package. We can also search for help with any function by starting with a question mark in the `Console`. For another example, we could try `?sqrt`.
Also in this section of RStudio are the `Files` and `Plots` panes. We’ve already used the `Files` pane to create our `Practical_1` folder. Whenever we plot anything in a script, or in the R Console, the figures will appear in the `Plots` pane in this section of RStudio.
If we generate any HTML output, this will appear in the `Viewer` pane. However, the `Packages` pane is quite useless and slightly dangerous for the unwary. In R, we avoid point-and-click operations because they leave no record in our code, and this pane will tempt you into bad habits. We usually have it disabled, which you can do via `Tools` > `Global Options` > `Packages`, then unchecking the box next to `Enable packages pane`.
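Best practice for all analysis is to enter every line of code in the `Script Window`.

`RStudio` will help as we type, for example with syntax highlighting and auto-completion.

When writing code:
- A line starting with `#` is a comment
- We can also put `#` at the end of a line of code with a comment following

Try adding a comment to your own script to see how this works.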
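A minimal sketch of both comment styles follows. The object name `y` and the sum are arbitrary choices for illustration.

```r
# This whole line is a comment and is never executed
y <- 2 + 3   # a comment can also follow code on the same line
y            # prints 5
```

R ignores everything from the `#` to the end of the line, so comments are a safe place for notes to yourself and your collaborators.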
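To send a line of code to the Console:
- Place the cursor on the line and press `Ctrl+Enter` (`Cmd+Enter` on OSX), or
- Select the line and press `Ctrl+Enter` (`Cmd+Enter` on OSX), or
- Select the line and click the `Run` button

As well as creating objects, we can use the script to write general code. Try entering a short series of numbers in your script, then send it to the R Console.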
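The original line of code is not preserved in this copy. As an assumed stand-in, either of the following produces a series of numbers:

```r
1:5             # the whole numbers from 1 to 5
c(2, 4, 6, 8)   # c() combines individual values into one object
```

Neither line stores anything, because there is no `<-`; the values are simply printed in the Console.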
In `R`, an object holding a series of values like this is known as a vector, and it is quite similar to a single column in Excel. We can perform operations on an entire vector at once, such as a logical test. Again, try this in your script window, then send it to the `Console`.
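A sketch of a logical test applied to a whole vector is shown here. The vector and the threshold are assumptions chosen for illustration.

```r
x <- 1:5   # a vector holding the values 1, 2, 3, 4, 5
x > 2      # tests every element: FALSE FALSE TRUE TRUE TRUE
```

The result is itself a vector, with one `TRUE` or `FALSE` for each element of `x`.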
As well as a logical test, we can perform mathematical operations on a vector.
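As an illustration, using the same assumed vector as before:

```r
x <- 1:5   # assumed example vector
x + 1      # adds 1 to every element: 2 3 4 5 6
x * 2      # doubles every element: 2 4 6 8 10
x^2        # squares every element: 1 4 9 16 25
sum(x)     # adds the elements together: 15
```

Most arithmetic in R works element by element like this, whereas functions such as `sum()` collapse the vector to a single value.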
We’ll learn more about vectors later.
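`RStudio` will give us suggestions when we ask it to.
- Type `?bas` then wait for about half a second

Next to the `Environment` Tab is the `History` Tab.
- This lists the commands that have been run in the `Console`
- Lines selected here can be sent to the `Script Window` and executed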