ggplot2
graphics
R has numerous plotting functions in the base package graphics. Whilst they can be useful for quickly drawing up a plot, customising them can prove highly challenging.
?plot
?boxplot
?hist
Go to the Examples at the bottom of each help page and copy a few lines.
The package ggplot2 gives much more flexibility and power. However, it has a unique syntax and approach where we plot by adding layers of relevant information, based on the concepts outlined in The Grammar of Graphics. We won’t explore the base graphics capabilities of R any further, as ggplot2 is generally far superior. ggplot2 is also one of the core tidyverse packages.
ggplot2
Start a new script for this topic called DataVisualisation.R and add the usual first line.
library(tidyverse)
We’ll also use the transport dataset from our previous session. This time, we’ll overwrite the height column with the same values in m instead of cm. We’ll also remove X1 and add the BMI information directly as we’re loading the data. Hopefully you can see the usefulness of the %>% from this example. This way we never need to edit our raw data files, and instead perform all manipulation in R in a completely reproducible manner. This strategy also allows for additional data to be provided as more experiments or data points are added.
Remember to load all packages at the top of your R script for easier reference in the future!
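# Build the path to the raw data and confirm the file is where we expect
csvFile <- file.path("~", "data", "intro_r", "transport.csv")
file.exists(csvFile)
# Convert height from cm to m, calculate BMI, then drop the X1 column
data <- read_csv(csvFile) %>%
  mutate(height = height / 100, BMI = weight / height^2) %>%
  select(-X1)
data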
The main ggplot2 function is ggplot(). In this first stage we define the aesthetic mappings using aes(). These simply define what is plotted on which axis, and what defines the colour, shape etc. When calling this function alone no data will be plotted, and we get a blank plot area only.
ggplot(data, aes(x = weight, y = height))
After defining the aesthetic mappings, we add a + symbol at the end of the line and then add layers using geom_...() functions. Let’s start by plotting points, for which we need geom_point().
ggplot(data, aes(x = weight, y = height)) +
  geom_point()
There are numerous aesthetics specifically available for geom_point(), so let’s check the help page.
?geom_point
We can colour points, so let’s do this based on the transport column. Note that if we add this during the call to ggplot() this will be passed down to every subsequent geom_* that we call.
ggplot(data, aes(x = weight, y = height, colour = transport)) +
  geom_point()
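As a rough sketch of what "passed down" means, the example below uses a small made-up data frame (toy, not the real transport data) and adds geom_line(), a geom that is not otherwise used in this session. Because colour is mapped inside ggplot(), both layers inherit it, so the lines are split and coloured by transport as well.

```r
library(ggplot2)

# Made-up values for illustration only; this is not the transport dataset
toy <- data.frame(
  weight = c(55, 62, 70, 78, 58, 66, 74, 82, 60, 68, 76, 84),
  height = c(1.58, 1.65, 1.72, 1.80, 1.60, 1.68, 1.75, 1.83, 1.62, 1.70, 1.77, 1.85),
  transport = rep(c("bike", "bus", "car"), each = 4)
)

# colour is set in ggplot(), so geom_point() and geom_line() both inherit it
p_inherit <- ggplot(toy, aes(x = weight, y = height, colour = transport)) +
  geom_point() +
  geom_line()
p_inherit

# Count how many groups the second layer (the lines) has been split into
length(unique(layer_data(p_inherit, 2)$group))
```

If the colour mapping were moved into geom_point(aes(colour = transport)) instead, only the points would be coloured and a single line would be drawn through all of them.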
We can also specify the shape to represent another variable, such as gender.
ggplot(data, aes(x = weight, y = height, colour = transport, shape = gender)) +
  geom_point()
Let’s leave the general aesthetics in ggplot(), with the colour and the shape aesthetics being specifically assigned to the points. Whilst this looks the same on the plot, this will help as we build up the complexity of the plot.
ggplot(data, aes(x = weight, y = height)) +
  geom_point(aes(colour = transport, shape = gender))
Anything set outside of the aes() function applies to all points. Here we can make all points larger. Size is a relative scale and doesn’t specifically represent a value in pixels, mm or anything similar.
ggplot(data, aes(x = weight, y = height)) +
  geom_point(aes(colour = transport, shape = gender), size = 3)
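A common stumbling block is putting a fixed value inside aes() by mistake. The sketch below, again using a made-up data frame called toy, contrasts the two cases. The object names p_const and p_mapped are only for illustration.

```r
library(ggplot2)

# Made-up values for illustration only
toy <- data.frame(
  weight = c(60, 72, 85, 58),
  height = c(1.62, 1.75, 1.83, 1.60)
)

# Set outside aes(): every point is drawn in blue and no legend is added
p_const <- ggplot(toy, aes(x = weight, y = height)) +
  geom_point(colour = "blue", size = 3)

# Set inside aes(): "blue" is treated as data (a category with one level),
# so ggplot2 picks a colour from its default palette and adds a legend
p_mapped <- ggplot(toy, aes(x = weight, y = height)) +
  geom_point(aes(colour = "blue"), size = 3)

p_const
p_mapped
```

The rule of thumb is that columns of the data go inside aes(), whilst fixed values go outside.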
In addition to the points, we can easily add a smoothed line. For datasets with fewer than 1,000 observations this defaults to a loess fit, so let’s select a regression line straight away by setting the method argument to "lm".
ggplot(data, aes(x = weight, y = height)) +
  geom_point(aes(colour = transport, shape = gender)) +
  geom_smooth(method = "lm")
By default, a 95% confidence interval around the line is shown as a shaded band, and we can remove this band by setting se = FALSE.
ggplot(data, aes(x = weight, y = height)) +
  geom_point(aes(colour = transport, shape = gender)) +
  geom_smooth(method = "lm", se = FALSE)
Axis and legend labels can be added using labs(). The default values are the column names, so we can tidy those up fairly easily.
ggplot(data, aes(x = weight, y = height)) +
  geom_point(aes(colour = transport, shape = gender)) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Weight (kg)", y = "Height (m)", shape = "Gender", colour = "Transport")
One of the most useful features of ggplot2 is the ability to create multiple sub-plots using the function facet_wrap(). Let’s create separate plots based on the gender column. This will also fit a separate regression line within each facet.
ggplot(data, aes(x = weight, y = height)) +
  geom_point(aes(colour = transport, shape = gender)) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Weight (kg)", y = "Height (m)", shape = "Gender", colour = "Transport") +
  facet_wrap(~gender)
An important piece of syntax we have just seen for the first time is the use of the tilde (~). In R this is how we often define statistical models, and it can often be interpreted as depends on, so in this call to facet_wrap() we are asking ggplot2 to draw facets that depend on the values in the gender column.
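To make the "depends on" reading a little more concrete, here is a minimal sketch using base R only. The data frame toy is made up for illustration, and lm() is simply one familiar place where the tilde appears.

```r
# Made-up values for illustration only
toy <- data.frame(
  weight = c(55, 62, 70, 78, 85),
  height = c(1.58, 1.65, 1.72, 1.80, 1.86)
)

# "height depends on weight": the response goes on the left of the tilde
fit <- lm(height ~ weight, data = toy)
coef(fit)

# A one-sided formula, as used in facet_wrap(~gender), has no left-hand side
f <- ~gender
class(f)     # "formula"
all.vars(f)  # "gender"
```

In facet_wrap() only the right-hand side is needed, because there is no response variable, just the column that defines the panels.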
Enter geom_ in the Console followed by the tab key. Numerous options will appear such as geom_density() and geom_boxplot(). You may need to think carefully about what is plotted on which axis for some of these, so pay careful attention.
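If tab completion isn’t available, for example when running a script outside RStudio, one possible alternative is to list the geoms programmatically. This is only a sketch, and the object name geoms is arbitrary.

```r
library(ggplot2)

# List everything exported by ggplot2 whose name starts with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
head(geoms)
length(geoms)
```

Each entry can then be looked up with ? in the usual way, e.g. ?geom_density.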
Here’s geom_density(), using alpha to set a degree of transparency so that overlapping densities remain visible.
ggplot(data, aes(x = height, fill = gender)) +
  geom_density(alpha = 0.5)
We can easily make boxplots.
ggplot(data, aes(x = gender, y = height, fill = gender)) +
  geom_boxplot()
We could plot separate facets based on the transport method as well.
ggplot(data, aes(x = gender, y = height, fill = gender)) +
  geom_boxplot() +
  facet_wrap(~transport)
Histograms use geom_histogram().
ggplot(data, aes(x = height)) +
  geom_histogram(bins = 20)
These tend to look terrible out of the box and need a little tweaking though.
ggplot(data, aes(x = height)) +
  geom_histogram(bins = 20, fill = "grey50", colour = "black")
dplyr
We can use dplyr (or any other) functions to summarise our data before plotting, then pipe (%>%) into ggplot() without modifying our original data object. As we saw last time, dplyr is loaded when we call library(tidyverse).
Let’s start working towards a barplot with error bars, which will use geom_bar(). By default geom_bar() counts the number of rows in each group, so we will need to provide the argument stat = "identity". This simply tells geom_bar() to use the values provided and not to try counting anything before plotting.
data %>%
  group_by(transport, gender) %>%
  summarise(mn_height = mean(height), sd_height = sd(height)) %>%
  ggplot(aes(x = transport, y = mn_height, fill = transport)) +
  geom_bar(stat = "identity") +
  facet_wrap(~gender) +
  guides(fill = "none")
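ggplot2 also provides geom_col(), which is equivalent to geom_bar(stat = "identity"). The sketch below uses a made-up summary table (summ) rather than the real data to show that the two give the same bar heights, whilst the default geom_bar() counts rows instead.

```r
library(ggplot2)

# Made-up summary values for illustration only
summ <- data.frame(
  transport = c("bike", "bus", "car"),
  mn_height = c(1.70, 1.68, 1.74)
)

# Bars drawn at the values we supply
p_bar <- ggplot(summ, aes(x = transport, y = mn_height)) +
  geom_bar(stat = "identity")

# The same plot using geom_col()
p_col <- ggplot(summ, aes(x = transport, y = mn_height)) +
  geom_col()

# Default geom_bar(): no y aesthetic, and bar height is the number of rows
p_count <- ggplot(summ, aes(x = transport)) +
  geom_bar()

p_col
```

Which of the two spellings to use is largely a matter of taste. The rest of this section keeps stat = "identity".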
To add error bars we now just use geom_errorbar().
data %>%
  group_by(transport, gender) %>%
  summarise(mn_height = mean(height), sd_height = sd(height)) %>%
  ggplot(aes(x = transport, y = mn_height, fill = transport)) +
  geom_bar(stat = "identity") +
  geom_errorbar(
    aes(ymin = mn_height - sd_height, ymax = mn_height + sd_height),
    width = 0.6
  ) +
  facet_wrap(~gender) +
  guides(fill = "none")
Notice that geom_errorbar() needed a lot of information, so we spread our code out across multiple lines. This is a fairly useful trick for making your code more readable at a later point in time.
Please only attempt this section if you have more than an hour left of the practical session. Otherwise, this is simply here as a helpful guide for future reference.
ggplot2 uses the idea of themes to control the overall appearance. The default has white grid-lines and a grey background, with a good alternative being theme_bw().
ggplot(data, aes(x = gender, y = height, fill = gender)) +
  geom_boxplot() +
  theme_bw()
Try some of the alternatives like theme_void() and theme_classic().
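One convenient way to compare themes is to save the plot as an object and add a different theme each time. The sketch below assumes a small made-up data frame (toy), and the object name p is arbitrary.

```r
library(ggplot2)

# Made-up values for illustration only
toy <- data.frame(
  gender = rep(c("female", "male"), each = 4),
  height = c(1.58, 1.62, 1.66, 1.70, 1.70, 1.76, 1.80, 1.86)
)

# Save the plot without a theme, then add themes one at a time
p <- ggplot(toy, aes(x = gender, y = height, fill = gender)) +
  geom_boxplot()

p + theme_classic()  # axis lines with no grid-lines
p + theme_void()     # a completely empty theme with no axes or grid-lines
```

The saved object p itself is never changed, so any number of themes can be tried this way.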
The theme() function is also where you set axis.text, legend and a whole raft of other attributes. The syntax often uses elements to set an attribute, which can take a while to come to grips with. Let’s check the help page.
?theme
As an example, changing text within themes uses element_text().
?element_text
Let’s change the text on the x-axis of our last plot. Hopefully this gives you enough ideas on how to customise your own plots.
ggplot(data, aes(x = gender, y = height, fill = gender)) +
  geom_boxplot() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
hjust = 1 means right alignment of the text in the horizontal direction, whilst vjust = 0.5 means centre alignment in the vertical direction. Both of these take values between 0 and 1.
We can also move the legend to other positions, such as below the plot:
ggplot(data, aes(x = gender, y = height, fill = gender)) +
  geom_boxplot() +
  theme_bw() +
  theme(legend.position = "bottom")
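Or we can use co-ordinates to place it inside the plotting region. This method treats the plot area as a grid of width = height = 1. Note that from ggplot2 3.5.0 the numeric form used here is deprecated in favour of legend.position = "inside" together with legend.position.inside = c(0.85, 0.15).
ggplot(data, aes(x = gender, y = height, fill = gender)) +
  geom_boxplot() +
  theme_bw() +
  theme(legend.position = c(0.85, 0.15))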
Axis scales and fill colours can also be set manually. Here we restrict the limits of the y-axis and choose the fill colours ourselves. Axes can also be reversed or placed on the log10 scale.
ggplot(data, aes(x = gender, y = height, fill = gender)) +
  geom_boxplot() +
  theme_bw() +
  theme(legend.position = c(0.85, 0.15)) +
  scale_y_continuous(limits = c(1.5, 2)) +
  scale_fill_manual(values = c("grey70", "lightblue"))
In the last line of the above, we’ve also set the fill colours manually. Once you get used to this plotting approach, it’s very easy to customise and make great looking plots.
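The reversed and log10 axes mentioned above are not shown in the example, so here is a minimal sketch of both using scale_y_reverse() and scale_y_log10(). The data frame toy is made up for illustration, and the object names are arbitrary.

```r
library(ggplot2)

# Made-up values for illustration only
toy <- data.frame(
  gender = rep(c("female", "male"), each = 4),
  height = c(1.58, 1.62, 1.66, 1.70, 1.70, 1.76, 1.80, 1.86)
)

p <- ggplot(toy, aes(x = gender, y = height, fill = gender)) +
  geom_boxplot() +
  theme_bw()

# y-axis on the log10 scale: values are transformed before the boxes are computed
p_log <- p + scale_y_log10()

# y-axis reversed, so larger values appear lower on the plot
p_rev <- p + scale_y_reverse()

p_log
p_rev
```

For heights spanning such a narrow range the log10 axis makes little visual difference. It is far more useful for variables covering several orders of magnitude.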