rmarkdown
rmarkdown is a cohesive way to combine normal text, R code and its output in a single document, and knitr is the engine behind this.
All practical sessions were written this way.
We can output our analysis directly as HTML, MS Word documents, PDF documents or ioslides presentations.
We never need to use MS Word, Excel or PowerPoint again!
The file suffix is .Rmd, and these files allow us to include normal text alongside embedded R code.
All output from an rmarkdown file will be generated as a well-formatted, complete file, which can be simply re-generated at any time.
Let’s create our first rmarkdown document using the File drop-down menu in RStudio (File > New File > R Markdown...), and save it as RMarkdownTutorial.Rmd
There are quite a few features of note here. Starting at the top, a header section is contained between the two sets of --- lines on line 1 and line 6.
Editing the YAML header is beyond the scope of this course; however, using it we can set custom .css files, load LaTeX packages and control numerous output parameters.
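To see what the header holds, here is a minimal sketch that writes a small .Rmd file to a temporary location and reads the header back with rmarkdown::yaml_front_matter(). The title, author and date values are placeholders (assumptions, not taken from this tutorial); only the six-line layout mirrors the header described above.

```r
library(rmarkdown)

# A hypothetical header similar to the one RStudio generates (placeholder values)
rmd_lines <- c(
  "---",
  "title: \"My Report\"",
  "author: \"Your Name\"",
  "date: \"2024-01-01\"",
  "output: html_document",
  "---",
  "",
  "Some normal text."
)

# Write to a temporary file so nothing in the project is touched
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# Parse the YAML header into a named list
header <- yaml_front_matter(rmd_file)
names(header)
header$output
```

Each entry in the header becomes one element of the list, which is a convenient way to check which output format a document will use.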
Immediately following the header (lines 8 to 10) is a code chunk. A chunk can be given a name, which directly follows the letter r in the chunk header (before any commas). These are highly advisable & can make it far easier to navigate through your document.
Line 12 is a Section Heading, starting with ##
(For chunk names to appear in the document outline, there is an option under Tools > Global Options > RMarkdown that needs to be set.)
Check the help for a guide to the syntax: Help > Markdown Quick Reference
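Returning to the code chunk described above, the following sketch may help show how a chunk is delimited and named. It writes a tiny document containing one named chunk to a temporary file, then uses knitr::purl() to pull out just the R code. The chunk name cars-summary and the heading text are made up for illustration.

```r
library(knitr)

# A section heading followed by one named chunk (the name is hypothetical)
rmd_lines <- c(
  "## A Section Heading",
  "",
  "```{r cars-summary}",
  "summary(cars)",
  "```"
)

rmd_file <- tempfile(fileext = ".Rmd")
r_file <- tempfile(fileext = ".R")
writeLines(rmd_lines, rmd_file)

# purl() extracts only the code found inside chunks into a plain R script
purl(rmd_file, output = r_file, quiet = TRUE)
extracted <- readLines(r_file)
extracted
```

The extracted script keeps the chunk name as a comment above the code, which is one reason naming chunks makes a document easier to navigate.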
The default format is an html_document & we can change this later. Generate the default document by clicking Knit HTML
A preview window will appear with the compiled report. The R code and results are shown for summary(cars), and the plot of temperature Vs. pressure has been embedded. The code for the plot was hidden using echo = FALSE
We could also export this as an MS Word document by clicking the small ‘down’ arrow next to the word Knit. By default, this will be Read-Only, but can be helpful for sharing with collaborators.
Saving as a .PDF usually requires an installation of LaTeX, so we’ll ignore that for now.
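The same compilation can also be run from the R console. The sketch below is roughly what the Knit button does, assuming RMarkdownTutorial.Rmd sits in the current working directory; it is guarded so that nothing happens if the file or pandoc is missing.

```r
library(rmarkdown)

# Assumed location: the tutorial file saved in the working directory
rmd_file <- "RMarkdownTutorial.Rmd"

# Only attempt to compile when the file exists and pandoc can be found
can_render <- file.exists(rmd_file) && pandoc_available()

if (can_render) {
  # The default HTML report
  render(rmd_file, output_format = "html_document")
  # The same analysis as an MS Word document
  render(rmd_file, output_format = "word_document")
}
```

Each call writes its output file next to the .Rmd file, so the HTML and Word versions can sit side by side.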
Now we can modify the code to create our own analysis, using the PlantGrowth dataset which comes with R. To read a description of the data, enter ?PlantGrowth
First we should change the title of the report to something suitable, e.g. The Effects of Two Herbicide Treatments on Plant Growth
Now let’s add a section header for our analysis to start the report. Type # Data Description after the header and after leaving a blank line, then describe the data in your own words. For example:
Plants were treated with two different herbicides and the effects on growth were compared using the dried weight of the plants after one month. Both treatments were compared to a control group of plants which were not treated with any herbicide. Each group contained 10 plants, giving a total of 30 plants.
Hopefully you mentioned that there were 10 plants in each group,
with a total of 30.
Can we get that information from the data itself?
We know that the code nrow(PlantGrowth) would give the total number of samples. We can embed this in our data description! In place of the number 30, enter `r nrow(PlantGrowth)`
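To check what embedded code like this produces without compiling the whole report, one option is to pass a single sentence to knitr::knit() as text. This is only a sketch: the sentence is copied from the example description above, and knitting one sentence is not part of the tutorial's workflow.

```r
library(knitr)

# A sentence written exactly as it would appear in the .Rmd file
rmd_text <- "Each group contained 10 plants, giving a total of `r nrow(PlantGrowth)` plants."

# knit() evaluates the inline R code and returns the resulting text
knitted <- knit(text = rmd_text, quiet = TRUE)
knitted
```

Because the number is calculated from the data, the sentence stays correct if the dataset changes.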
Required Packages
Add a code chunk containing library(tidyverse).
Hint: You can create an empty code chunk using Ctrl+Alt+I
This has loaded all of the core tidyverse
packages for
all code chunks within the whole document. All subsequent code chunks
can now use any functions in these packages.
Notice that loading the tidyverse
packages gave us an
overly informative message. We can turn this off by editing the chunk
header.
After the r at the start of the code chunk, add a comma. Then start typing message and use the auto-complete feature to set message = FALSE
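The steps above change a single chunk. As an alternative that is not part of this tutorial, the same option can be set as the default for every chunk with knitr::opts_chunk$set(), usually placed in a chunk near the top of the document (that placement is an assumption here).

```r
library(knitr)

# Make message = FALSE the default for all chunks that follow
opts_chunk$set(message = FALSE)

# Confirm the current default
message_default <- opts_chunk$get("message")
message_default
```

Individual chunks can still override the default by setting message = TRUE in their own header.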
After our description, we could also have a look at the data in a summary. Add the following in a code chunk.
# Count the plants and average the weights within each group
PlantGrowth %>%
  group_by(group) %>%
  summarise(n = n(), Mean = mean(weight))
(Recompile…)
To change this table into a nicely formatted one, we can use the package pander. Load pander into the workspace by placing library(pander) below library(tidyverse)
Then head back to the code chunk and add the following.
# The same summary, passed to pander() for formatting
PlantGrowth %>%
  group_by(group) %>%
  summarise(n = n(), Mean = mean(weight)) %>%
  pander(caption = "Sample Sizes and average weights for each group")
(Recompile…)
The package pander is great for formatting R output as well, including vectors. Add the following line to your data description:
“The three groups are classified as `r pander(levels(PlantGrowth$group))`”
Here, group is understood by R to be a categorical variable. R stores these as a data-type called a factor, and the different values it can take are known as levels.
Here we have three different categories (or treatments) which represent
the control and two treatments.
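A quick way to inspect this in the console is sketched below. It uses only base R and the PlantGrowth data that ships with R; the object name group_levels is just for illustration.

```r
# PlantGrowth comes with R, so no packages are needed
class(PlantGrowth$group)    # the data-type of the column
table(PlantGrowth$group)    # how many plants fall in each category

# The categories the factor can take
group_levels <- levels(PlantGrowth$group)
group_levels
```

The order of the levels matters later on, as R treats the first level as the reference category by default.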
To visualise the data we can use ggplot2, which was loaded by library(tidyverse). Create a boxplot using geom_boxplot(), filling the boxes by the group variable.
ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
  geom_boxplot() +
  theme_bw() +
  labs(x = "Treatment Group", y = "Dried Weight (g)")
Here we can fit a simple linear regression using weight as the response variable and group as the predictor variable.
To fit a linear model in R, we can use the function lm()
In the following line of code, we are saying that the weight measurements are dependent on the group each plant belongs to (weight ~ group). In R we can usually read the ~ symbol as depends on (or is a function of).
model_fit <- lm(weight ~ group, data = PlantGrowth)
We can view the summary() or anova() for a given model using these functions on the saved model.
summary(model_fit)
anova(model_fit)
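To connect the formula weight ~ group with the numbers in the summary, the sketch below compares the fitted coefficients with group means calculated directly from the data. It assumes R's default treatment contrasts, where the first factor level (the control group) is the reference; the object name group_means is only for illustration.

```r
model_fit <- lm(weight ~ group, data = PlantGrowth)

# Mean weight in each group, calculated directly from the data
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

# The intercept is the mean of the reference level (the control group)
coef(model_fit)[["(Intercept)"]]
group_means[["ctrl"]]

# The remaining coefficients are differences from the control mean
coef(model_fit)[["grouptrt1"]]
group_means[["trt1"]] - group_means[["ctrl"]]
```

Each pair of values should agree, which is a useful check when reading the coefficient table.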
To place these as formatted tables in the text we can also use pander(), which has a default format setting for these types of objects.
model_fit %>% summary() %>% pander()
model_fit %>% anova() %>% pander()
You can change the default captions if you like using the argument caption = "my caption" inside the pander() function.
After you’re happy with the way your analysis looks, finish with a section called Session Info and add a code chunk which calls the R command sessionInfo(). We can place sessionInfo() inside the pander() function to provide yet another nicely formatted section in our report.
So far we’ve been compiling everything as HTML, but let’s switch to an MS Word document. We could email this to our supervisors, or upload to Google docs for collaborators…
This basic process is incredibly useful: nothing needs to be copied between R and other documents, and everything in the report comes directly from our R analysis.