rmarkdown is a cohesive way to combine text, R code and its output in a single document.
knitr is the engine behind this. All practical sessions were written this way.
We can output our analysis directly as HTML, MS Word or PDF documents, or as ioslides presentations.
We never need to use MS Word, Excel or PowerPoint again!
The file suffix is .Rmd, and these files allow us to
include normal text alongside embedded R code.
All output will be generated as a well-formatted, complete file, which can simply be re-generated at any time.
Let’s create our first rmarkdown document
Go to the File drop-down menu in RStudio, create a new R Markdown file and save it as RMarkdownTutorial.Rmd.
There are quite a few features of note here. Starting at the top, a
header section is contained between the two sets of
--- lines on line 1 and line 6.
Editing the YAML header is beyond the scope of this course. However,
using it we can set custom .css files, load LaTeX
packages and control numerous output parameters.
Immediately following the header (lines 8 to 10) is a code
chunk.
Chunk names are optional and are placed directly after the letter r in the chunk header
(before any commas). These are highly advisable & can make it far
easier to navigate through your document.
Line 12 is a Section Heading, starting with ##.
(There may be an option under Tools >
Global Options > RMarkdown that needs to be
set.)
Check the help for a guide to the syntax:
Help > Markdown Quick Reference.
The default format is an html_document & we can
change this later. Generate the default document by clicking
Knit HTML.
A preview window will appear with the compiled report.
In it, the code and output of summary(cars) are shown, while the plot of temperature Vs. pressure has
been embedded without its code, because that chunk was set with echo = FALSE.
We could also export this as an MS Word document by clicking the
small ‘down’ arrow next to the word Knit. By default, this
will be Read-Only, but can be helpful for sharing with
collaborators.
Saving as a .PDF usually requires an installation of
\(\LaTeX\), so we’ll ignore that for
now.
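The Knit button also has a scripted counterpart. As a minimal sketch, assuming the rmarkdown package and pandoc are available (RStudio bundles pandoc) and that RMarkdownTutorial.Rmd sits in the current working directory, the same document could be compiled from the console with rmarkdown::render():

```r
library(rmarkdown)

# Assumption: the tutorial file has been saved in the current working directory
rmd_file <- "RMarkdownTutorial.Rmd"

if (file.exists(rmd_file)) {
  # HTML output, the default format used so far
  render(rmd_file, output_format = "html_document")
  # MS Word output, comparable to choosing Word from the Knit drop-down
  render(rmd_file, output_format = "word_document")
} else {
  message("Could not find ", rmd_file, " in ", getwd())
}
```

Each call writes the compiled file next to the .Rmd file, so the report can be rebuilt without opening the RStudio menus.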
Now we can modify the code to create our own analysis.
We'll use the PlantGrowth dataset which comes with
R.
First we should change the title of the report to something suitable, e.g. The Effects of Two Herbicide Treatments on Plant Growth.
Now let’s add a section header for our analysis to start the report.
Type # Data Description after the header and after
leaving a blank line, then describe the data underneath, e.g.:
Plants were treated with two different herbicides and the effects on growth were compared using the dried weight of the plants after one month. Both treatments were compared to a control group of plants which were not treated with any herbicide. Each group contained 10 plants, giving a total of 30 plants.
Hopefully you mentioned that there were 10 plants in each group,
with a total of 30.
Can we get that information from the data itself?
We know that the code nrow(PlantGrowth) would give the
total number of samples. We can embed this in our data
description!
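Before embedding anything, it can help to check the values in the console. A small sketch using only base R (the object names total_plants and plants_per_group are illustrative and not part of the tutorial):

```r
# PlantGrowth ships with base R (datasets package), so nothing needs loading
total_plants <- nrow(PlantGrowth)             # total number of samples
plants_per_group <- table(PlantGrowth$group)  # samples in each treatment group

total_plants       # 30
plants_per_group   # ctrl, trt1 and trt2 each contain 10 plants
```

Any expression like these that returns a single value can be embedded in the text as inline code.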
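In your description, replace the number 30 with `r nrow(PlantGrowth)`.
Next, add a section called Required Packages containing a code chunk with the line library(tidyverse).
Hint: You can create an empty code chunk using
Ctrl+Alt+I.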
This has loaded all of the core tidyverse packages for
all code chunks within the whole document. All subsequent code chunks
can now use any functions in these packages.
Notice that loading the tidyverse packages gave us an
overly informative message. We can turn this off by editing the chunk
header.
After the r at the start of the code chunk, add a
comma.
Start typing message and use the auto-complete
feature to set message = FALSE.
After our description, we could also have a look at the data in a summary. Add the following in a code chunk.
(Recompile…)
To change this table into a nicely formatted one, we can use the
package pander.
Load pander into the workspace by placing
library(pander) below library(tidyverse).
Then head back to the code chunk and add the following.
PlantGrowth %>%
  group_by(group) %>%
  summarise(n = n(), Mean = mean(weight)) %>%
  pander(caption = "Sample Sizes and average weights for each group")
(Recompile…)
The package pander is great for formatting other
R output as well, including vectors. Add the following line
to your data description:
“The three groups are classified as `r pander(levels(PlantGrowth$group))`”
The group column is considered by R to be a categorical
variable. R stores these as a data type called a factor,
and the different values it can take are known as levels.
Here we have three different levels, which represent
the control and the two treatments.
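This can be confirmed directly from the data. A short base R check:

```r
# group is stored as a factor, R's data type for categorical variables
class(PlantGrowth$group)     # "factor"
levels(PlantGrowth$group)    # "ctrl" "trt1" "trt2"
nlevels(PlantGrowth$group)   # 3
```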
Next we can draw a boxplot of the weights. We can use ggplot2 for this,
which has already been loaded by library(tidyverse): use geom_boxplot() with the group variable defining the boxes.
Here we can fit a simple linear regression using
weight as the response variable and group as the predictor variable.
To fit a linear model in R, we use the function lm().
In the model formula, we are saying that the
weight measurements are dependent on the group
each plant belongs to (weight ~ group). In R
we can usually read the ~ symbol as depends on (or
is a function of).
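The boxplot and the model described above can be sketched as follows. The object names weight_boxplot and model_fit are placeholders chosen for this example, and filling the boxes by group is an assumption about the intended plot:

```r
library(ggplot2)  # loaded automatically by library(tidyverse)

# Boxplot of dried weight for each treatment group, filled by group
weight_boxplot <- ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
  geom_boxplot()
weight_boxplot

# Linear model: weight depends on group
model_fit <- lm(weight ~ group, data = PlantGrowth)
model_fit
```

Saving the fitted model as an object means it can be passed on to other functions later in the report.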
We can view the summary() or anova() for a
given model by using these functions on the saved model.
To place these as formatted tables in the text we can also use
pander(), which has a default format setting for these
types of objects.
You can change the default captions if you like using the argument
caption = "my caption" inside the pander()
function.
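As a rough sketch of how this might look in a code chunk, assuming the model is saved under the placeholder name model_fit (the caption text is also just an example):

```r
library(pander)

# Refit the model so this block runs on its own
model_fit <- lm(weight ~ group, data = PlantGrowth)

# Coefficient table from summary(), with a custom caption
pander(summary(model_fit), caption = "Model coefficients for weight by group")

# ANOVA table, using pander's default caption
pander(anova(model_fit))
```

Leaving out the caption argument, as in the second call, keeps whatever caption pander chooses for that type of object.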
After you’re happy with the way your analysis looks, add a final section called Session Info.
The R command sessionInfo() reports the version of R and the packages in use.
We can place sessionInfo() inside the
pander() function to provide yet another nicely formatted
section in our report.
So far we’ve been compiling everything as HTML, but let’s switch to an MS Word document. We could email this to our supervisors, or upload to Google docs for collaborators…
This basic process is incredibly useful: we never need to copy and paste between R and
other documents, and everything in the report comes directly from our R analysis.