Introduction to R Markdown

BDSI 2022; University of Michigan

About me

Associate Professor in Biostatistics
Erstwhile PhD student at Michigan Biostatistics
R user and aficionado

The basics

When to use

Reports
Slides
Manuscripts / books
Simple websites

Why to use

R code and interpretations integrated into a single document
Separates task of reporting the results from formatting the results:
- decreases risk of copy-paste errors
- decreases workload
Quickly create the same document in different formats, e.g. slides to show and handouts for the audience

source: rstudio.com

whatever format you want to create: html, pdf, docx, …

pandoc: “an open-source document converter” (wikipedia). Translates markup from one type of format, e.g. markdown, to another

md: a document written in markdown, “a lightweight markup language with plain text formatting syntax” (wikipedia). GitHub also uses markdown.

knitr: an R package for creating reports directly in R. It translates your R Markdown document (Rmd), including the embedded R code, into a plain markdown document

Rmd: file type recognized by RStudio. This is where everything goes: your header, R code chunks, and your content written in markdown

From RStudio, go to File > New File > R Markdown...

Choose your document type

Get a template

“YAML” Header

Write R code in chunks

Write plain text

Knit your document to see the final product
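The same steps can be run from the R console. This is a minimal sketch, not part of the workshop materials: the file name and contents are made up for illustration, and it assumes the rmarkdown package and pandoc are installed. It writes a tiny Rmd file (YAML header, plain text, one R chunk) and knits it with rmarkdown::render().

```r
# Build the three ingredients of an Rmd file: YAML header, plain text, R chunk.
fence <- strrep("`", 3)  # three backticks, built here so the example stays readable
rmd_lines <- c(
  "---",
  "title: \"A minimal example\"",
  "output: html_document",
  "---",
  "",
  "Some plain text written in markdown.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Hypothetical file name, written to a temporary folder
rmd_path <- file.path(tempdir(), "minimal-example.Rmd")
writeLines(rmd_lines, rmd_path)

# Knit only if rmarkdown and pandoc are available on this machine
can_knit <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_knit) {
  out_path <- rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
}
```

Changing output_format (for example to "word_document") should produce a different format from the same source file.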

This is what my code chunk looks like for the previous slide:

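```{r, out.width = paste0(image_scaler * 400, "px"), echo = FALSE}
# image_scaler is a numeric scaling factor, assumed to be defined in an earlier chunk
knitr::include_graphics("images/knit_menu.png")
```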

Try it out: Option 1

Download R (https://cran.r-project.org/)
Download RStudio to interface with R (https://www.rstudio.com/)
Go to https://github.com/psboonstra/markdown-workshop, then ‘Code’, then ‘Download ZIP’

Unzip the folder, then open the .Rproj file
In RStudio, click on ‘Files’ at the bottom, and pull up 01-exercise.Rmd

Try it out: Option 2

Go to https://rstudio.cloud/ > Get Started
Create an account
Click the dropdown menu next to the New Project button, and enter the URL of the workshop repository: https://github.com/psboonstra/markdown-workshop
Click on ‘Files’ at the bottom, and pull up 01-exercise.Rmd

Your turn

Complete the tasks in 01-exercise.Rmd. Answer the following question when you are done

Time allotted: 8 minutes

Takeaways

Markdown tries to focus on simplicity
Chunk options control how the chunk is evaluated and used
You can knit the same document to different formats (sometimes easy to do, sometimes requires a bit of finagling)
Consider using in-line chunks instead of hard-coding results
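To illustrate the last takeaway, here is a small sketch with made-up numbers. A value is computed once in a chunk and then referenced in the text, so the sentence updates itself whenever the data change.

```r
# Values computed once in a chunk...
x <- c(2, 4, 9)
mean_x <- mean(x)

# ...can be dropped into a sentence. In the body of the Rmd you would write:
#   The mean of x is `r mean_x`.
# which is roughly equivalent to building this string by hand:
sentence <- paste0("The mean of x is ", mean_x, ".")
sentence
```

If x changes, re-knitting updates the sentence, which a hard-coded "5" would not do.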

Use Markdown to tell your story

Early code chunk

If you name a variable in an earlier code chunk, you can refer to it again in a later chunk.

library(tibble)
x <- rnorm(20)
y <- 3 * x + rnorm(length(x))
foo <- tibble(x = x, y = y)

Later code chunk

library(ggplot2)
ggplot(data = foo) + 
  geom_point(aes(x, y))

Tables

foo

## # A tibble: 20 × 2
##          x      y
##      <dbl>  <dbl>
##  1  0.967   2.73 
##  2  0.596   0.734
##  3  0.271   1.71 
##  4 -2.56   -7.36 
##  5  1.21    2.15 
##  6  1.33    4.43 
##  7  1.56    4.38 
##  8  0.427   2.52 
##  9 -1.43   -5.18 
## 10  0.641   2.55 
## 11 -0.257  -1.76 
## 12 -0.521  -1.84 
## 13  1.40    5.24 
## 14  1.01    3.73 
## 15 -0.513  -1.82 
## 16  0.0642  0.551
## 17  0.168  -0.462
## 18  0.0676  0.248
## 19  2.08    6.66 
## 20  2.37    8.81

Tables using ‘kable’

kable(foo)

x	y
0.96679	2.73453
0.59581	0.73431
0.27123	1.70752
-2.56319	-7.36126
1.21407	2.15357
1.33225	4.43215
1.55682	4.38447
0.42744	2.52272
-1.43141	-5.18110
0.64062	2.54757
-0.25691	-1.76152
-0.52123	-1.83766
1.39877	5.24033
1.00760	3.73099
-0.51299	-1.82300
0.06421	0.55076
0.16755	-0.46174
0.06756	0.24757
2.08213	6.66245
2.37490	8.81108
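kable has arguments for tidying up the table. The sketch below assumes knitr is installed and reuses the first two rows shown above; the column labels and caption are made up for illustration.

```r
library(knitr)

# First two rows of the table above, stored in a plain data frame
df <- data.frame(x = c(0.96679, 0.59581), y = c(2.73453, 0.73431))

# Round to two digits, relabel the columns, and add a caption
tab <- kable(
  df,
  digits = 2,
  col.names = c("Predictor (x)", "Outcome (y)"),
  caption = "First two rows of foo"
)
tab
```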

Random lessons I’ve learned

Markdown can be really, really finicky about horizontal and vertical spacing

If something (a new header option, a code chunk, etc.) is not working as you expect, try adding an extra line break

If experimenting with a new feature, re-knit frequently

Caching

If, like me, you become a compulsive re-knitter, the code chunk option cache = TRUE is both useful and dangerous.

```{r, cache = TRUE}
# some intensive task
```

As long as you don’t change anything in the chunk, you won’t need to re-run the intensive task upon re-knitting. However, things can go awry…

Open the file caching_mishap.Rmd and make sure you understand the intended behavior (should be trivial!)
Knit the document
Now edit your first chunk, changing to x <- rnorm(n = 1, mean = 0) and leaving the second chunk alone
Re-knit your document

That’s how we get results like this:

x <- rnorm(n = 1, mean = 0)

## [1] 100.71

What happened

We invalidated the cache in the first chunk (triggering it to run again) without invalidating the cache in the second chunk (so it was left alone)

Possible solutions

Consider if the chunks should be combined
You can invalidate a cache by adding a comment character (#) at the end of a line, or making some other innocuous change to your chunk. Even extra white space will invalidate the cache
Go to Knit > Clear Knitr Cache..., or directly delete the folder named [filename]_cache in your working directory
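Deleting the cache folder can also be done from R. This is a minimal sketch using base R only. It builds a stand-in cache folder in a temporary location (the folder and file names are hypothetical, following the [filename]_cache pattern) and then removes it.

```r
# Simulate a knitr cache folder in a temporary working area.
work_dir <- file.path(tempdir(), "cache-demo")
cache_dir <- file.path(work_dir, "caching_mishap_cache")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(cache_dir, "placeholder.rdb"))  # stand-in for cached results

# Delete the whole cache folder; the next knit then re-runs every chunk.
unlink(cache_dir, recursive = TRUE)
cache_removed <- !dir.exists(cache_dir)
cache_removed
```

In a real project you would point unlink() at the cache folder sitting next to your Rmd file.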

#RMarkdown question: If I cache the chunk that loads the R packages, then I sometimes (not always) get random errors from downstream chunks that can't find a loaded function, as if it wasn't loaded. Un-cache-ing the chunk seems to fix the problem. Anyone else encounter this?
Philip Boonstra (@psboonstra), February 9, 2021

The cache mechanism is not to be used with code chunk that have side effect used by other chunks - loading package that are used by other chunk is one of this. See more in :https://t.co/bpKPzkFMad
If you activate cache globally you need to se cache=FALSE
Christophe Dervieux (@chrisderv), February 10, 2021

The most appropriate use case of caching is to save and reload R objects that take too long to compute in a code chunk, and the code does not have any side effects, such as changing global R options via options() (such changes will not be cached). If a code chunk has side effects, we recommend that you do not cache it.

Section 11.4, R Markdown Cookbook

`knitr` can run code in other languages

Including

Python
SQL
Julia
Stan
JavaScript

Use ```{python} to start a Python code chunk, ```{julia} to start a Julia code chunk, ```{bash} to start a shell script chunk, etc.

You may need external language engines to successfully call other languages. I have not used this functionality before.

see Chapter 2.7, R Markdown: The Definitive Guide
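To see which engines your installed version of knitr knows about, something like the following sketch should work (it assumes knitr is installed). Whether the external program behind an engine is actually present on your machine is a separate question, checked here for bash as one example.

```r
library(knitr)

# Names of all language engines registered in knitr
engines <- sort(names(knit_engines$get()))
engines

# An engine being listed does not mean the external program is installed;
# Sys.which() returns an empty string when the program cannot be found
bash_installed <- nzchar(Sys.which("bash"))
bash_installed
```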

More practice

You can knit R scripts!

You are not limited to using Markdown in Rmd files – you can knit R scripts using the same shortcut: Cmd+Shift+K / Ctrl+Shift+K

Use #' to indicate a switch to markdown
Use #+ to start a new chunk
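A short sketch of what such a script might look like (the chunk labels and values are made up for illustration). It is still a valid R script, since R itself treats both markers as ordinary comments; to my understanding, knitr::spin() does the conversion behind the scenes.

```r
#' ## A heading written in markdown
#'
#' Lines starting with `#'` are treated as markdown text.

#+ simulate-data, echo = TRUE
x <- c(1, 2, 3, 4)
mean_x <- mean(x)

#' Text can resume after the chunk.

#+ show-result
mean_x
```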

Your turn again

Open 02-exercise.R and complete the tasks. Answer the following question when you are done

Time allotted: 8 minutes

Embedding html tags into your markdown

<iframe src="https://isitchristmas.com/"></iframe>

yields

Data analyses in R

`readr` package

Part of the tidyverse (along with dplyr and ggplot2):

https://www.tidyverse.org/

readr gives you tools to read data into R from external files, so that it can be wrangled and manipulated, and then to write the results back out to external files.

The workhorse of the readr package is read_csv, which reads a comma-separated value (csv) file into R as a tibble (a type of data.frame). From the help page:

read_csv(file, col_names = TRUE, col_types = NULL, locale = default_locale(), 
na = c("", "NA"), quoted_na = TRUE, quote = "\"", comment = "", trim_ws = TRUE, 
skip = 0, n_max = Inf, guess_max = min(1000, n_max), progress = show_progress(), 
skip_empty_rows = TRUE)

Typical use is my_data <- read_csv("my_files_path.csv")
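A self-contained sketch of that round trip, assuming readr is installed. The tiny data set is made up purely for illustration and written to a temporary file so that there is something to read.

```r
library(readr)

# Write a small made-up data set to a temporary csv file
csv_path <- file.path(tempdir(), "my_files_path.csv")
write_csv(data.frame(id = c(1, 2, 3), size = c(41.8, 85, 114)), csv_path)

# Read it back; col_types is spelled out so nothing depends on type guessing
my_data <- read_csv(
  csv_path,
  col_types = cols(id = col_double(), size = col_double())
)
my_data
```

Leaving col_types at its default also works; read_csv then guesses the column types and prints what it guessed.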

Digression: How does `read_csv` know where to look?

Get your current working directory in R:

getwd()

## [1] "/Users/philb/Desktop/Work/BDSI2021/markdown-workshop"

How did R know this was my desired working directory? Why did I not need to do this:

setwd("/Users/philb/Desktop/Work/BDSI2021/markdown-workshop")

R Projects and working directories

Your working directory is automatically set when you work inside an R project.

To create an R project, go to File > New Project... > Existing Directory and choose your folder

To open an existing R project, find the .Rproj file or go to File > Open Project...
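A relative file name is looked up inside the working directory. The base R sketch below makes that explicit; it only builds and checks the path, so it runs whether or not tumor_growth.csv is present.

```r
# A relative path such as "tumor_growth.csv" is resolved against the working directory
wd <- getwd()
full_path <- file.path(wd, "tumor_growth.csv")
full_path

# Check whether the file is actually there before trying to read it
file_found <- file.exists("tumor_growth.csv")
file_found
```

Inside an R project, wd is the project folder, so the same relative path works on anyone's machine without setwd().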

Mouse xenograft study

\(n=37\) mice implanted with human tumor
Randomized to one of three treatment groups (radiation only; drug only; or both drug and radiation) or no treatment
Each tumor on each mouse measured daily for up to 4 weeks
Available at American Statistical Association’s Section on Teaching of Statistics in the Health Sciences (TSHS) data portal
File is called tumor_growth.csv

Varna, Bertheau, and Legrès (2014)

(tumor_growth <- read_csv("tumor_growth.csv"))

## # A tibble: 574 × 5
##    Grp   Group    ID   Day   Size
##    <chr> <dbl> <dbl> <dbl>  <dbl>
##  1 1.CTR     1   101     0   41.8
##  2 1.CTR     1   101     3   85  
##  3 1.CTR     1   101     4  114  
##  4 1.CTR     1   101     5  162. 
##  5 1.CTR     1   101     6  178. 
##  6 1.CTR     1   101     7  325  
##  7 1.CTR     1   101    10  624. 
##  8 1.CTR     1   101    11  648. 
##  9 1.CTR     1   101    12  836. 
## 10 1.CTR     1   101    13 1030. 
## # … with 564 more rows

One more time

Open 03-exercise.Rmd and complete the tasks. Answer the following question if desired

Time allotted: 15 minutes

What to do next

https://rmarkdown.rstudio.com/

R Markdown on RStudio.com

R Markdown: The Definitive Guide

Free, online version of a book written by the RStudio experts

R Markdown cheatsheet

Helpful quick reference

Mastering markdown

Reference site for the markdown language

Project-oriented workflow

The benefits of working in self-contained projects

References

Varna, Mariana, Philippe Bertheau, and Luc G Legrès. 2014. “Tumor Microenvironment in Human Tumor Xenografted Mouse Models.” Journal of Analytical Oncology 3 (3): 159–66.
