The basics

When to use

  • Reports

  • Slides

  • Manuscripts / books

  • Simple websites

Why to use

  • R code and interpretations integrated into a single document

  • Separates task of reporting the results from formatting the results:

    • decreases risk of copy-paste errors

    • decreases workload

  • Quickly create the same document in different formats, e.g. slides to show and handouts for the audience

whatever format you want to create: html, pdf, docx, …


pandoc: “an open-source document converter” (Wikipedia). Translates markup from one format, e.g. markdown, to another.


md: a document written in markdown, “a lightweight markup language with plain text formatting syntax” (Wikipedia). GitHub also uses markdown.


knitr: an R package for creating reports directly in R. It translates your R Markdown document (Rmd), including embedded R code, to a plain markdown document.


Rmd: file type recognized by RStudio. This is where everything goes: your header, R code chunks, and your content written in markdown


From RStudio, go to File > New File > R Markdown...

Choose your document type

Get a template

“YAML” Header

Write R code in chunks

Write plain text

Knit your document to see the final product
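The Knit button is a front end to rmarkdown::render(), so the same step can be run from the console or a script. The sketch below is only an illustration: the helper name render_formats and the file name report.Rmd are hypothetical, and rendering needs the rmarkdown package and pandoc to be installed.

```r
# Hypothetical helper (not from the workshop): render one Rmd file to
# several output formats with a single call to rmarkdown::render().
render_formats <- function(input,
                           formats = c("html_document", "word_document")) {
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("The rmarkdown package is needed to render documents.")
  }
  # output_format accepts a vector of format names
  rmarkdown::render(input, output_format = formats)
}

# Example call, assuming a file called report.Rmd exists in the
# working directory:
# render_formats("report.Rmd")
```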

This is what my code chunk looks like for the previous slide:

```{r, out.width = paste0(image_scaler * 400, "px"), echo = FALSE}
# code that displays the image goes here
```

  • Markdown tries to focus on simplicity
  • Chunk options control how the chunk is evaluated and used
  • You can knit the same document to different formats (sometimes easy to do, sometimes requires a bit of finagling)
  • Consider using in-line chunks instead of hard-coding results
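To make the last point concrete, here is a small sketch of what an in-line chunk replaces. The seed and the sentence are made up for illustration; the point is only that the number in the text is computed rather than typed by hand.

```r
set.seed(42)        # assumption: seed added so the example is reproducible
x <- rnorm(20)

# In the Rmd text, instead of typing the number by hand, you would write:
#   The sample mean of x is `r round(mean(x), 2)`.
# knitr evaluates the in-line expression every time the document is knit,
# so the sentence cannot drift out of sync with the data.

# The same sentence, built directly in R:
sentence <- sprintf("The sample mean of x is %.2f.", mean(x))
sentence
```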

Use Markdown to tell your story

Early code chunk

If you name a variable in an earlier code chunk, you can refer to it again in a later chunk.

library(tibble)  # provides tibble()
x <- rnorm(20)
y <- 3 * x + rnorm(length(x))
foo <- tibble(x = x, y = y)

Later code chunk

library(ggplot2)
# foo was created in the earlier chunk and is still available here
ggplot(data = foo) +
  geom_point(aes(x, y))


## # A tibble: 20 × 2
##          x      y
##      <dbl>  <dbl>
##  1  0.967   2.73 
##  2  0.596   0.734
##  3  0.271   1.71 
##  4 -2.56   -7.36 
##  5  1.21    2.15 
##  6  1.33    4.43 
##  7  1.56    4.38 
##  8  0.427   2.52 
##  9 -1.43   -5.18 
## 10  0.641   2.55 
## 11 -0.257  -1.76 
## 12 -0.521  -1.84 
## 13  1.40    5.24 
## 14  1.01    3.73 
## 15 -0.513  -1.82 
## 16  0.0642  0.551
## 17  0.168  -0.462
## 18  0.0676  0.248
## 19  2.08    6.66 
## 20  2.37    8.81

Tables using ‘kable’

       x         y
 0.96679   2.73453
 0.59581   0.73431
 0.27123   1.70752
-2.56319  -7.36126
 1.21407   2.15357
 1.33225   4.43215
 1.55682   4.38447
 0.42744   2.52272
-1.43141  -5.18110
 0.64062   2.54757
-0.25691  -1.76152
-0.52123  -1.83766
 1.39877   5.24033
 1.00760   3.73099
-0.51299  -1.82300
 0.06421   0.55076
 0.16755  -0.46174
 0.06756   0.24757
 2.08213   6.66245
 2.37490   8.81108
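A table like this one could be produced with a call along the following lines. This is a sketch, not the original chunk: the seed is my addition (so the numbers will differ from those shown), a plain data.frame stands in for the tibble, and digits = 5 is a guess based on the precision displayed.

```r
set.seed(1)  # assumption: not in the original slides
x <- rnorm(20)
y <- 3 * x + rnorm(length(x))
foo <- data.frame(x = x, y = y)  # stand-in for the tibble used earlier

# kable() turns a data frame into a markdown table when the document is knit
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(foo, digits = 5))
}
```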

Random lessons I’ve learned

Markdown can be really, really finicky about horizontal and vertical spacing

If something (a new header option, a code chunk, etc.) is not working as you expect, try adding an extra line break

If experimenting with a new feature, re-knit frequently


If, like me, you become a compulsive re-knitter, the code chunk option cache = TRUE is both useful and dangerous.

```{r, cache = TRUE}
# some intensive task
```

As long as you don’t change anything in the chunk, you won’t need to re-run the intensive task upon re-knitting. However, things can go awry…

  • Open the file caching_mishap.Rmd and make sure you understand the intended behavior (should be trivial!)

  • Knit the document

  • Now edit your first chunk, changing to x <- rnorm(n = 1, mean = 0) and leaving the second chunk alone

  • Re-knit your document

That’s how we get results like this:

x <- rnorm(n = 1, mean = 0)
## [1] 100.71

What happened

We invalidated the cache of the first chunk (triggering it to run again) without invalidating the cache of the second chunk, so the second chunk was skipped and its stored result, computed from the old x, was reused.
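The mechanism can be imitated with a toy cache in plain R. This is a deliberately simplified sketch, not knitr's actual implementation: the function run_chunk and the chunk contents are hypothetical, and fixed numbers replace rnorm() so the outcome is deterministic.

```r
# Toy cache: a result is stored under the chunk's source text, so a chunk
# is re-run only when its own text changes.
cache <- new.env()

run_chunk <- function(code, envir) {
  if (!is.null(cache[[code]])) {
    return(cache[[code]])  # cache hit: the code is NOT evaluated again
  }
  value <- eval(parse(text = code), envir = envir)
  cache[[code]] <- value
  value
}

work <- new.env()  # plays the role of the knitting session

# First knit: both chunks run and are cached
run_chunk("x <- 100", work)
first <- run_chunk("y <- x + 1", work)

# Second knit: chunk 1 was edited, chunk 2 was left alone
run_chunk("x <- 0", work)                 # new text, so it runs again
second <- run_chunk("y <- x + 1", work)   # unchanged text, stale cache hit

get("x", work)  # the new value of x
second          # still the result computed from the old x
```

Here second equals first even though x has changed, which is the same inconsistency as in the knitted document.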

Possible solutions

  • Consider if the chunks should be combined

  • You can invalidate a cache by adding a comment character (#) at the end of a line, or making some other innocuous change to your chunk. Even extra white space will invalidate the cache

  • Go to Knit > Clear Knitr Cache... or directly delete the folder named [filename]_cache in your working directory

The most appropriate use case of caching is to save and reload R objects that take too long to compute in a code chunk, and the code does not have any side effects, such as changing global R options via options() (such changes will not be cached). If a code chunk has side effects, we recommend that you do not cache it.

Section 11.4, R Markdown Cookbook

knitr can run code in other languages


  • Python

  • SQL

  • Julia

  • Stan

  • JavaScript

Use ```{python} to start a Python code chunk, ```{julia} to start a Julia code chunk, ```{bash} to start a shell script, etc.

You may need external language engines to successfully call other languages. I have not used this functionality before.

See Section 2.7, R Markdown: The Definitive Guide

More practice

You can knit R scripts!

You are not limited to using Markdown in Rmd files – you can knit R scripts using the same shortcut: Cmd+Shift+K / Ctrl+Shift+K

  • Use #' to indicate a switch to markdown

  • Use #+ to start a new chunk
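As an illustration, a small R script written for knitting might look like the sketch below. The content is made up; only the #' and #+ markers matter. Because both markers are ordinary R comments, the script also runs unchanged with source().

```r
#' # Simulated data
#'
#' This paragraph is rendered as markdown because each line starts with `#'`.

#+ simulate, echo = TRUE
set.seed(2021)  # assumption: seed added so the example is reproducible
x <- rnorm(20)
y <- 3 * x + rnorm(length(x))

#' The next chunk is named `correlation` and uses the default chunk options.

#+ correlation
r <- cor(x, y)
r
```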

Embedding html tags into your markdown

<iframe src=""></iframe> 


Data analyses in R

readr package

readr gives you tools to read data into R from external files and, once the data have been wrangled and manipulated, to write them back out to external files.

The workhorse of the readr package is read_csv, which reads a comma-separated value (csv) file into R as a tibble (a kind of data.frame). From the help page:

read_csv(file, col_names = TRUE, col_types = NULL, locale = default_locale(), 
na = c("", "NA"), quoted_na = TRUE, quote = "\"", comment = "", trim_ws = TRUE, 
skip = 0, n_max = Inf, guess_max = min(1000, n_max), progress = show_progress(), 
skip_empty_rows = TRUE)

Typical use is my_data <- read_csv("my_files_path.csv")
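A self-contained way to try this out is to write a small csv file to a temporary location and read it back. Everything below is an illustrative sketch: the column names and values are invented, and the code falls back to base R's read.csv() if readr is not installed.

```r
# Create a small csv file in a temporary location (made-up data)
path <- tempfile(fileext = ".csv")
write.csv(data.frame(mouse = 1:3, size = c(10.5, 20.25, 30)),
          path, row.names = FALSE)

if (requireNamespace("readr", quietly = TRUE)) {
  # Declaring the column types up front avoids readr's guessing message
  my_data <- readr::read_csv(
    path,
    col_types = readr::cols(
      mouse = readr::col_integer(),
      size  = readr::col_double()
    )
  )
} else {
  my_data <- read.csv(path)  # base R fallback
}

my_data
```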

Digression: How does read_csv know where to look?

Get your current working directory in R:

getwd()
## [1] "/Users/philb/Desktop/Work/BDSI2021/markdown-workshop"

How did R know this was my desired working directory? Why did I not need to do this:

setwd("/Users/philb/Desktop/Work/BDSI2021/markdown-workshop")


R Projects and working directories

Your working directory is automatically set when you work inside an R project.

To create an R project, go to File > New Project... > Existing Directory and choose your folder

To open an existing R project, find the .Rproj file or go to File > Open Project...
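This is why a bare file name is enough inside a project: R resolves relative paths against the working directory. A minimal sketch, using the file name from this workshop (the file does not need to exist for the path to be built):

```r
file_name <- "tumor_growth.csv"

# A relative path is interpreted relative to the working directory,
# so these two refer to the same location:
full_path <- file.path(getwd(), file_name)
full_path

# TRUE only if the file is actually present in the working directory
file.exists(file_name)
```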

Mouse xenograft study

  • \(n=37\) mice implanted with human tumor
  • Randomized to one of three treatment groups (radiation only; drug only; or both drug and radiation) or no treatment
  • Each tumor on each mouse measured daily for up to 4 weeks
  • Available at American Statistical Association’s Section on Teaching of Statistics in the Health Sciences (TSHS) data portal
  • File is called tumor_growth.csv

Varna, Bertheau, and Legrès (2014)

(tumor_growth <- read_csv("tumor_growth.csv"))
## # A tibble: 574 × 5
##    Grp   Group    ID   Day   Size
##    <chr> <dbl> <dbl> <dbl>  <dbl>
##  1 1.CTR     1   101     0   41.8
##  2 1.CTR     1   101     3   85  
##  3 1.CTR     1   101     4  114  
##  4 1.CTR     1   101     5  162. 
##  5 1.CTR     1   101     6  178. 
##  6 1.CTR     1   101     7  325  
##  7 1.CTR     1   101    10  624. 
##  8 1.CTR     1   101    11  648. 
##  9 1.CTR     1   101    12  836. 
## 10 1.CTR     1   101    13 1030. 
## # … with 564 more rows

Varna, Mariana, Philippe Bertheau, and Luc G Legrès. 2014. “Tumor Microenvironment in Human Tumor Xenografted Mouse Models.” Journal of Analytical Oncology 3 (3): 159–66.