The practicals will make use of cloud compute resources that are provided by The University as a virtual machine (VM). A VM is essentially a program running on a server that behaves as though it were a separate computer. You will be able to log in to the VM and interact with the programs installed on it. Each VM has 2 CPU cores, 16 GB of system memory and 80 GB of hard disk space. They are yours to use for the semester, but they are also yours to look after. The University runs these VMs on AWS (Amazon Web Services) through RONIN, and pays by the minute for CPU time and for storage. Because of this, we have implemented automatic shutdown of VMs if they remain idle for too long (idle means no jobs running or no user input).
Please go here for instructions on connecting to your VM.
R is derived from the S language (Bell Labs, 1976).
R offers considerable advantages over Excel, GraphPad Prism & the like:
Importantly: We have code to record and repeat our analysis. This enables highly reproducible (and reputable) research!
R can be installed on any laptop, and the most common way of interacting with R is through the IDE known as RStudio. We can think of RStudio as being like the cabin of a car, whilst R itself is the engine. RStudio has a whole host of extra features that are not directly related to R, but that make working with R much more enjoyable.
For these practicals we have given each of you a Virtual Machine (VM), which should keep technical issues to a minimum across the entire practical series. RStudio is how we will primarily interact with our VMs.
Let’s have a quick look at R in its simplest form. To make a calculation or run a command directly, we can use the Console. In the top left corner, directly below the blue ‘R’ logo, you can see the word Console. Click this and you should see the following text (or similar, depending on your version):
R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
Directly below this you’ll see a flashing cursor next to the following symbol.
>
(If the cursor is not flashing, click on this pane and it should start flashing.)
This is the R Console, and this symbol is the prompt within an R session. We enter our code here, and can even use this as a simple calculator. Type each of the following lines at the R prompt, pressing Enter after each one to evaluate our ‘difficult’ calculations.
1 + 1
2 * 2
1 / 4
3 ^ 2
Mess around with a few more simple calculations, then move on to the next section.
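If you would like some ideas, the lines below are a small sketch of further calculations. Brackets and the %% (remainder) and %/% (integer division) operators are not covered above and are included here only as illustrations.

```r
# Brackets control the order of operations
(1 + 2) * 3   # 9
1 + 2 * 3     # 7, multiplication happens before addition

# Remainder and integer division
7 %% 2        # 1
7 %/% 2       # 3

# Powers are evaluated before division
2 ^ 3 / 4     # 2
```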
In the simple exercise above, we interacted directly with the R Console, but most of the time we will use the RStudio Integrated Development Environment (IDE). As you’re probably aware, this is how we’ve logged into our VM and this will be our main environment for the practical series.
Both R and RStudio can also be installed on your own computer, and you may need to do this outside of this course. However, it will not be required for this course.
R Project
First we will set up an R Project for today’s practical. R Projects are a simple way of managing your code for multiple analyses or datasets, and are always named after the directory they are in. They are not essential, but are very useful and make good practice. If you don’t follow this step, you will be making your life immeasurably harder and will not be following ‘best practice’ for R.
In the middle of the screen you will see the home icon. Click on the button for creating a new folder directly above this, and call the folder Practical_1. Please ensure you spell this correctly, including the underscore and capitalised first letter. It is important when using scripting languages to never use spaces in the names of files or folders, so please get into the practice of following this rule from now on!
This is where we’ll run everything for today’s session. To make an R Project for this folder use the following commands in the menu bar at the top of your RStudio session:
File > New Project
If asked to Save Current Workspace, choose Don't Save.
Existing Directory > Practical_1
Create Project
You will now find yourself in the folder we have created. If you look at the top right of your RStudio session you will also see Practical_1 and an arrow for the drop-down menu. We only have one project at this point, but this is where you can switch between projects. For this practical series, we’ll start a new project for every session. This will help you keep everything organised so you can find everything later. Managing your data and code intelligently is a considerable challenge in bioinformatics, so it is important to be organised from the beginning!
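One way to confirm that the project is active is to check the working directory from the Console. This is a small sketch that assumes the project was created in a folder called Practical_1 as described above; the exact path printed will depend on your VM.

```r
# Show the full path of the current working directory
getwd()

# Show only the final folder name
# (inside the project described above this is expected to be "Practical_1")
basename(getwd())

# List the files R can see in this folder
list.files()
```

With default settings, RStudio sets the working directory to the project folder whenever a project is opened, so file names used in a script can be written relative to that folder.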
R Script
Instead of just entering commands into the console, we can enter our commands into a text file, known as a script, and execute them from this file. This enables us to keep a complete record of everything we have done in complex analyses. Sometimes, we even publish these as part of a research paper.
Let’s create a blank R Script. As you may expect, these are just plain text files we use to save our code. Using the RStudio menu:
File > New File > R Script
Save this as Introduction.R.
Your screen should now look like the following (without the blue text):
Let’s take a tour through the various panes and tools within RStudio.
# symbol
The Terminal, which we have already used to change our password. We’ll also spend a fair bit of time here later in the course.
As well as performing simple calculations in the Console, as we’ve already done:
R has what we call an Environment (i.e. a Workspace)
R objects are also similar to variables in many other programming languages
R performs calculations & runs processes on these objects.
To create an R object:
In R, the process looks like nameOfObject <- data
The <- symbol is like an arrow, and is the assignment operator
This tells R to put the data in the object
In the Script Window type:
x <- 5
This creates an object called x. We can view the contents of x by entering its name directly in the Console, or by calling print() and running it in the script. It is good practice to use the script window rather than interact directly with the Console, as the script can be saved for future use and easily edited.
x
print(x)
Where have we created the object x?
We have placed x in our R Workspace, which is more formally known as your Global Environment.
The Environment pane shows us which objects are currently in our R Environment. The Environment is very much like the desk in your study (or bedroom), where we can have things spread and piled up everywhere. In the R Environment, we can create objects of multiple types. Once an object is in our Environment we can perform calculations on it. Using the object x we created as an example:
1 + x
x^2
NB: R is case sensitive
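The Environment can also be inspected from code. The sketch below recreates the object x from the steps above so that it runs on its own, and uses the base function ls(), which is not otherwise covered in this section, to show what case sensitivity means in practice.

```r
# Recreate the object so this example is self-contained
x <- 5

# List the names of the objects in the current Environment
ls()

# The name x is present
"x" %in% ls()   # TRUE

# R is case sensitive, so x and X are different names.
# This is FALSE unless an object called X has also been created.
"X" %in% ls()
```

Typing X at the prompt without having created it would give an "object 'X' not found" error, even though x exists.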
R has a series of in-built functions, e.g. sqrt(), log(), max(), min() etc. loaded into your Environment. We place an object or value inside the () after the name of a function; these are the arguments to the function.
sqrt(x)
log(x)
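Functions can take more than one argument, and arguments can be given by name. As an illustration only, log() has a second argument called base and round() has one called digits; the object x is recreated here so the block runs on its own.

```r
x <- 5

# Natural logarithm (the default)
log(x)

# Logarithm to base 10, naming the second argument
log(x, base = 10)

# Functions can be nested: round the square root to 2 decimal places
round(sqrt(x), digits = 2)   # 2.24

# max() and min() accept several values
max(1, x, 3)   # 5
min(1, x, 3)   # 1
```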
Many in-built functions are organised into a package called base.
In the Console type ?base followed by Enter. This will take you to the Help pane. Click on the underlined word Index at the bottom for a list of functions in the base package. We can also search for help with any function by starting with a question mark in the Console.
For another example, we could try ?sqrt
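The help system can also be reached through functions rather than the ? shortcut. The calls below are a brief sketch using standard R functions; what appears in the Help pane will depend on your version of R.

```r
# Equivalent to typing ?sqrt
help("sqrt")

# Show the arguments a function accepts, without opening its help page
args(log)

# Open the index of a whole package, here the base package
help(package = "base")
```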
Also in this section of RStudio are the Files and Plots panes. We’ve already used the Files pane to create our Practical_1 folder. Whenever we plot anything in a script, or in the R Console, the figures will appear in the Plots pane in this section of RStudio.
If we generate any HTML output, this will appear in the Viewer pane. However, the Packages pane is quite useless and slightly dangerous for the unwary. In R, we never click on anything, and this pane will tempt you into bad habits. We usually have this disabled as it serves literally no purpose. To disable it, go to Tools > Global Options > Packages, then uncheck the box next to Enable packages pane.
Best practice for all analysis is to enter every line of code in the Script Window.
RStudio will:
When writing code:
We can write a comment by starting a line with the # symbol
We can also place a # at the end of a line of code with a comment following
To see how this works:
# Create an object called x
x <- 1:5
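To send this to the Console:
Place the cursor on the line and press Ctrl+Enter (Cmd+Enter on OSX), or
Select the lines with the mouse and press Ctrl+Enter (Cmd+Enter on OSX), or
Select the lines and click the Run button
As well as creating objects, we can use this to write general code. Enter the following in your script, then send it to the R Console.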
# What do we have in the object `x`
print(x)
In R, this type of object is known as a vector, and this is quite similar to a single column in Excel. We can perform operations on an entire vector. Again, type this into your script window, then send it to the Console.
# I'm not sure. Which values are greater than one?
x > 1
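The result of a logical test is itself a vector, which can be stored and summarised. This is a brief sketch: the base functions sum(), any() and all() are used as illustrations, and the name isBig is made up for this example.

```r
x <- 1:5

# Store the result of the test as a new object
isBig <- x > 1
isBig        # FALSE TRUE TRUE TRUE TRUE

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(isBig)   # 4

# Is at least one value greater than one? Are all of them?
any(isBig)   # TRUE
all(isBig)   # FALSE
```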
As well as a logical test, we can perform mathematical operations on a vector.
# Let's square every value in `x`
x^2
# And we can find the square root of every value
sqrt(x)
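Operations like these are applied to each element of the vector in turn, whereas some functions collapse a vector to a single value. A short sketch, with sum(), mean() and length() as illustrative functions not used elsewhere in this section:

```r
x <- 1:5

# Arithmetic is applied to every element
x * 2       # 2 4 6 8 10
x + x       # 2 4 6 8 10

# Some functions return one value for the whole vector
sum(x)      # 15
mean(x)     # 3
length(x)   # 5
```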
We’ll learn more about vectors later.
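RStudio will give us suggestions when we ask it to.
Type ?bas then wait for about half a second
Next to the Environment Tab is the History Tab
This keeps a record of the commands we have run in the Console
Commands can be copied from here to the Script Window, or sent back to the Console, and executed again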