BDSI 2022; University of Michigan

About me

  • Associate Professor in Biostatistics
  • Erstwhile PhD student at Michigan Biostatistics
  • R user and aficionado

The basics

When to use

  • Reports

  • Slides

  • Manuscripts / books

  • Simple websites

Why to use

  • R code and interpretations integrated into a single document

  • Separates task of reporting the results from formatting the results:

    • decreases risk of copy-paste errors

    • decreases workload

  • Quickly create the same document in different formats, e.g. slides to show and handouts for the audience

whatever format you want to create: html, pdf, docx, …

image source: rstudio.com

pandoc: “an open-source document converter” (wikipedia). Translates markup from one type of format, e.g. markdown, to another

md: a document written in markdown, “a lightweight markup language with plain text formatting syntax” (wikipedia). Github also uses markdown.

knitr: an R package for creating reports directly in R. Will translate your R markdown document (Rmd), including embedded R code, to a plain markdown document

Rmd: file type recognized by RStudio. This is where everything goes: your header, R code chunks, and your content written in markdown

From RStudio, go to File > New File > R Markdown...

Choose your document type

Get a template

“YAML” Header

Write R code in chunks

Write plain text

Knit your document to see the final product

This is what my code chunk looks like for the previous slide:

```{r, out.width = paste0(image_scaler*400,"px"), echo = F}
include_graphics("images/knit_menu.png")
```

Try it out: Option 1

Try it out: Option 2

Your turn

Complete the tasks in 01-exercise.Rmd (8 minutes). Answer the following question when you are done.

Takeaways

  • Markdown tries to focus on simplicity
  • Chunk options control how the chunk is evaluated and used
  • You can knit the same document to different formats (sometimes easy to do, sometimes requires a bit of finagling)
  • Consider using inline R code instead of hard-coding results
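
The last takeaway can be sketched as follows. Inline R code is written as a backtick, the letter r, an expression, and a closing backtick; knitr replaces it with the value of the expression when the document is knit. The data, object names, and sentence below are made up for illustration.

```r
# Hypothetical example data; in a real report this would come from your analysis.
set.seed(1)
sizes <- rnorm(20, mean = 50, sd = 10)

# Compute the summary once, in a code chunk ...
mean_size <- round(mean(sizes), 1)

# ... then refer to it in the text of the Rmd file instead of typing the number:
#
#   The mean tumor size was `r mean_size` units.
#
# If the data change, the sentence updates the next time you knit.
mean_size
```

Because the number in the sentence is computed rather than typed, it cannot drift out of sync with the analysis.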

Use Markdown to tell your story

Early code chunk

If you name a variable in an earlier code chunk, you can refer to it again in a later chunk.

library(tibble)  # provides tibble()

x <- rnorm(20)
y <- 3 * x + rnorm(length(x))  # linear in x, plus noise
foo <- tibble(x = x, y = y)

Later code chunk

library(ggplot2)
ggplot(data = foo) + 
  geom_point(aes(x, y))

Tables

foo
## # A tibble: 20 × 2
##          x      y
##      <dbl>  <dbl>
##  1  0.967   2.73 
##  2  0.596   0.734
##  3  0.271   1.71 
##  4 -2.56   -7.36 
##  5  1.21    2.15 
##  6  1.33    4.43 
##  7  1.56    4.38 
##  8  0.427   2.52 
##  9 -1.43   -5.18 
## 10  0.641   2.55 
## 11 -0.257  -1.76 
## 12 -0.521  -1.84 
## 13  1.40    5.24 
## 14  1.01    3.73 
## 15 -0.513  -1.82 
## 16  0.0642  0.551
## 17  0.168  -0.462
## 18  0.0676  0.248
## 19  2.08    6.66 
## 20  2.37    8.81

Tables using ‘kable’

kable(foo)

|        x|        y|
|--------:|--------:|
|  0.96679|  2.73453|
|  0.59581|  0.73431|
|  0.27123|  1.70752|
| -2.56319| -7.36126|
|  1.21407|  2.15357|
|  1.33225|  4.43215|
|  1.55682|  4.38447|
|  0.42744|  2.52272|
| -1.43141| -5.18110|
|  0.64062|  2.54757|
| -0.25691| -1.76152|
| -0.52123| -1.83766|
|  1.39877|  5.24033|
|  1.00760|  3.73099|
| -0.51299| -1.82300|
|  0.06421|  0.55076|
|  0.16755| -0.46174|
|  0.06756|  0.24757|
|  2.08213|  6.66245|
|  2.37490|  8.81108|

Random lessons I’ve learned

Markdown can be really, really finicky about horizontal and vertical spacing

If something (a new header option, a code chunk, etc.) is not working as you expect, try adding an extra line break

If experimenting with a new feature, re-knit frequently

Caching

If, like me, you become a compulsive re-knitter, the code chunk option cache = TRUE is both useful and dangerous.

```{r, cache = TRUE}
# some intensive task
```

As long as you don’t change anything in the chunk, you won’t need to re-run the intensive task upon re-knitting. However, things can go awry…

  • Open the file caching_mishap.Rmd and make sure you understand the intended behavior (should be trivial!)

  • Knit the document

  • Now edit your first chunk, changing to x <- rnorm(n = 1, mean = 0) and leaving the second chunk alone

  • Re-knit your document

That’s how we get results like this:

x <- rnorm(n = 1, mean = 0)
x
## [1] 100.71

What happened

We invalidated the cache of the first chunk (triggering it to run again) without invalidating the cache of the second chunk (so its stale cached result was reused).

Possible solutions

  • Consider if the chunks should be combined

  • You can invalidate a cache by adding a comment character (#) at the end of a line, or making some other innocuous change to your chunk. Even extra white space will invalidate the cache

  • Go to Knit > Clear Knitr Cache... or directly delete the folder named [filename]_cache in your working directory
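
The last option can also be done from the R console. This is a minimal sketch that assumes the document is called caching_mishap.Rmd and that its cache lives in a folder called caching_mishap_cache in the working directory; adjust the name if your file or cache path differs.

```r
# Assumed cache folder name: the Rmd file name (without extension) plus "_cache".
cache_dir <- "caching_mishap_cache"

if (dir.exists(cache_dir)) {
  # Delete the folder and everything in it; the next knit rebuilds every cached chunk.
  unlink(cache_dir, recursive = TRUE)
}

# The folder should now be gone (or never have existed).
dir.exists(cache_dir)
```

Deleting the whole folder is blunt but safe: nothing stale can survive, at the cost of re-running every cached chunk once.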

The most appropriate use case of caching is to save and reload R objects that take too long to compute in a code chunk, and the code does not have any side effects, such as changing global R options via options() (such changes will not be cached). If a code chunk has side effects, we recommend that you do not cache it.

Section 11.4, R Markdown Cookbook

knitr can run code in other languages

Including

  • Python

  • SQL

  • Julia

  • Stan

  • JavaScript

Use ```{python} to start a Python code chunk, ```{julia} to start a Julia code chunk, ```{bash} to start a shell script, etc.

You may need external language engines to successfully call other languages. I have not used this functionality before.

See Section 2.7, R Markdown: The Definitive Guide

More practice

You can knit R scripts!

You are not limited to using Markdown in Rmd files – you can knit R scripts using the same shortcut: Cmd+Shift+K / Ctrl+Shift+K

  • Use #' to indicate a switch to markdown

  • Use #+ to start a new chunk
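
For illustration, here is a small R script written with those two markers. The content is made up; everything in it is also valid plain R, because both markers begin with the comment character #.

```r
#' # A tiny report
#'
#' Lines starting with `#'` are treated as markdown text when the script is knit.

#+ simulate, echo = TRUE
# Ordinary comments like this one stay inside the code chunk.
set.seed(2022)
x <- rnorm(20)
y <- 3 * x + rnorm(length(x))

#' A new `#+` line starts the next chunk; chunk options go after the label.

#+ summarize
fit <- lm(y ~ x)
coef(fit)
```

Saved as an .R file, this can be run line by line as usual or knit with the same shortcut as an Rmd file.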

Your turn again

Open 02-exercise.R and complete the tasks (8 minutes). Answer the following question when you are done.

Embedding html tags into your markdown

<iframe src="https://isitchristmas.com/"></iframe> 

yields

Data analyses in R

readr package

readr gives you tools to read data into R from external files, so that it can be wrangled and manipulated, and then to write it back out to external files.

The workhorse of the readr package is read_csv, which reads a comma-separated values (csv) file into R as a tibble (a kind of data.frame). From the help page:

read_csv(file, col_names = TRUE, col_types = NULL, locale = default_locale(), 
na = c("", "NA"), quoted_na = TRUE, quote = "\"", comment = "", trim_ws = TRUE, 
skip = 0, n_max = Inf, guess_max = min(1000, n_max), progress = show_progress(), 
skip_empty_rows = TRUE)

Typical use is my_data <- read_csv("my_files_path.csv")
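
As a hedged sketch of that typical use, the example below writes a small made-up table to a temporary csv file and reads it back. The file and column names are invented for illustration; col_types is optional, and if you leave it out readr guesses the column types.

```r
library(readr)

# Hypothetical data, written to a temporary file so the example is self-contained.
toy <- data.frame(id = c(101, 102, 103), size = c(41.8, 85, 114.2))
toy_path <- tempfile(fileext = ".csv")
write_csv(toy, toy_path)

# Read it back; cols() states the expected type of each column explicitly.
my_data <- read_csv(
  toy_path,
  col_types = cols(id = col_double(), size = col_double())
)

my_data
```

Stating the column types is more typing, but it makes the read fail loudly if a file does not have the structure you expect.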

Digression: How does read_csv know where to look?

Get your current working directory in R:

getwd()
## [1] "/Users/philb/Desktop/Work/BDSI2021/markdown-workshop"

How did R know this was my desired working directory? Why did I not need to do this:

setwd("/Users/philb/Desktop/Work/BDSI2021/markdown-workshop")

R Projects and working directories

Your working directory is automatically set when you work inside an R project.

To create an R project, go to File > New Project... > Existing Directory and choose your folder

To open an existing R project, find the .Rproj file or go to File > Open Project...

Mouse xenograft study

  • \(n=37\) mice implanted with human tumor
  • Randomized to one of three treatment groups (radiation only; drug only; or both drug and radiation) or no treatment
  • Each tumor on each mouse measured daily for up to 4 weeks
  • Available at American Statistical Association’s Section on Teaching of Statistics in the Health Sciences (TSHS) data portal
  • File is called tumor_growth.csv

Varna, Bertheau, and Legrès (2014)

(tumor_growth <- read_csv("tumor_growth.csv"))
## # A tibble: 574 × 5
##    Grp   Group    ID   Day   Size
##    <chr> <dbl> <dbl> <dbl>  <dbl>
##  1 1.CTR     1   101     0   41.8
##  2 1.CTR     1   101     3   85  
##  3 1.CTR     1   101     4  114  
##  4 1.CTR     1   101     5  162. 
##  5 1.CTR     1   101     6  178. 
##  6 1.CTR     1   101     7  325  
##  7 1.CTR     1   101    10  624. 
##  8 1.CTR     1   101    11  648. 
##  9 1.CTR     1   101    12  836. 
## 10 1.CTR     1   101    13 1030. 
## # … with 564 more rows

One more time

Open 03-exercise.Rmd and complete the tasks (15 minutes). Answer the following question if desired.

What to do next

References

Varna, Mariana, Philippe Bertheau, and Luc G Legrès. 2014. “Tumor Microenvironment in Human Tumor Xenografted Mouse Models.” Journal of Analytical Oncology 3 (3): 159–66.