- Associate Professor in Biostatistics
- Former PhD student at Michigan Biostatistics
- R user and aficionado
https://github.com/psboonstra/markdown-workshop
BDSI 2023; University of Michigan
Reports
Slides
Manuscripts / books
Simple websites
R code and interpretations integrated into a single document
Separates task of reporting the results from formatting the results:
decreases risk of copy-paste errors
decreases workload
Quickly create the same document in different formats, e.g. slides to show and handouts for the audience
source: rstudio.com
whatever format you want to create: html, pdf, docx, …
image source: rstudio.com
pandoc: “an open-source document converter” (Wikipedia). Translates markup from one format, e.g. markdown, to another
md: a document written in markdown, “a lightweight markup language with plain text formatting syntax” (Wikipedia). GitHub also uses markdown.
knitr: an R package for creating reports directly in R. Will translate your R markdown document (Rmd), including embedded R code, to a plain markdown document
Rmd: file type recognized by RStudio. This is where everything goes: your header, R code chunks, and your content written in markdown
From RStudio, go to File > New File > R Markdown...
This is what my code chunk looks like for the previous slide:
```{r, out.width = paste0(image_scaler*400,"px"), echo = F}
include_graphics("images/knit_menu.png")
```
Open the .RProj file, then complete the tasks in 01-exercise.Rmd.
If you name a variable in an earlier code chunk, you can refer to it again in a later chunk.
library(tibble)
x <- rnorm(20)
y <- 3 * x + rnorm(length(x))
foo <- tibble(x = x, y = y)

library(ggplot2)
ggplot(data = foo) +
  geom_point(aes(x, y))
foo
## # A tibble: 20 × 2
##           x       y
##       <dbl>   <dbl>
##  1  2.05     7.38
##  2  0.660    3.73
##  3 -0.344    1.03
##  4 -0.121    0.0591
##  5 -0.0144  -0.709
##  6  0.930    2.83
##  7  0.00650  1.12
##  8  0.654    2.24
##  9 -0.0899  -1.86
## 10 -0.477   -2.51
## 11 -1.63    -4.74
## 12  0.254    0.449
## 13  0.858    1.11
## 14  1.47     4.31
## 15  1.82     4.23
## 16 -1.06    -3.48
## 17 -0.392   -2.99
## 18 -0.589   -3.33
## 19 -0.269   -0.0198
## 20 -3.54    -9.32
kable(foo)
x | y |
---|---|
2.04893 | 7.37650 |
0.66041 | 3.72549 |
-0.34450 | 1.02643 |
-0.12072 | 0.05913 |
-0.01436 | -0.70902 |
0.92964 | 2.82717 |
0.00650 | 1.11691 |
0.65366 | 2.23591 |
-0.08988 | -1.86226 |
-0.47742 | -2.50652 |
-1.62941 | -4.73572 |
0.25356 | 0.44936 |
0.85841 | 1.10619 |
1.47220 | 4.31446 |
1.81825 | 4.23217 |
-1.06355 | -3.48341 |
-0.39198 | -2.98898 |
-0.58903 | -3.32540 |
-0.26925 | -0.01978 |
-3.54323 | -9.31959 |
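kable() also accepts a few formatting arguments. A minimal sketch, assuming the knitr package is installed: foo_small is a stand-in that reuses the first three rows of the table above, and the column labels "Predictor" and "Response" are made up for illustration.

```r
library(knitr)

# Stand-in for `foo`: the first three rows of the table above, typed in by hand
foo_small <- data.frame(
  x = c(2.04893, 0.66041, -0.34450),
  y = c(7.37650, 3.72549, 1.02643)
)

# `digits` rounds the numeric columns; `col.names` relabels the header row
# (the labels are illustrative, not from the original slides)
tab <- kable(foo_small, digits = 2, col.names = c("Predictor", "Response"))
tab
```

Inside an Rmd chunk, the value returned by kable() is rendered as a formatted table in the knitted document.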
If something (a new header option, a code chunk, etc.) is not working as you expect, try adding an extra line break
If experimenting with a new feature, re-knit frequently
If, like me, you become a compulsive re-knitter, the code chunk option cache = TRUE is both useful and dangerous.
```{r, cache = TRUE}
# some intensive task
```
As long as you don’t change anything in the chunk, you won’t need to re-run the intensive task upon re-knitting. However, things can go awry…
Open the file caching_mishap.Rmd and make sure you understand the intended behavior (should be trivial!)
Knit the document
Now edit your first chunk, changing it to x <- rnorm(n = 1, mean = 0) and leaving the second chunk alone
Re-knit your document
That’s how we get results like this:
x <- rnorm(n = 1, mean = 0)
x
## [1] 100.71
We invalidated the cache in the first chunk (triggering it to run again) without invalidating the cache in the second chunk (so it was left alone)
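One way to tie the two caches together is knitr's chunk option dependson, which asks knitr to invalidate a chunk's cache whenever a named upstream cached chunk changes. A minimal sketch, written as a knittable R script in which each `#+` line carries a chunk label and its options. The labels first and second, and the mean = 100 in the first chunk, are assumptions about what caching_mishap.Rmd contains.

```r
#+ first, cache = TRUE
# Upstream chunk: editing this code invalidates its own cache
x <- rnorm(n = 1, mean = 100)

#+ second, cache = TRUE, dependson = "first"
# Downstream chunk: `dependson` asks knitr to re-run it whenever `first` changes
x
```

With this option in place, editing the first chunk should also trigger the second chunk to run again, so the printed value matches the new definition of x.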
Consider if the chunks should be combined
You can invalidate a cache by adding a comment character (#) at the end of a line, or making some other innocuous change to your chunk. Even extra white space will invalidate the cache
Go to Knit > Clear Knitr Cache... or directly delete the folder named [filename]_cache in your working directory
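The cache folder can also be deleted from the R console. A minimal sketch, assuming the document is caching_mishap.Rmd, so that the folder is called caching_mishap_cache. That name is inferred from the [filename]_cache pattern and is not confirmed.

```r
# Assumed folder name, following the [filename]_cache pattern
cache_dir <- "caching_mishap_cache"

if (dir.exists(cache_dir)) {
  # recursive = TRUE is required to delete a directory and its contents
  unlink(cache_dir, recursive = TRUE)
  message("Removed ", cache_dir)
} else {
  message("No cache folder found in ", getwd())
}
```

The next knit then recomputes every cached chunk from scratch.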
#RMarkdown question: If I cache the chunk that loads the R packages, then I sometimes (not always) get random errors from downstream chunks that can't find a loaded function, as if it wasn't loaded. Un-cache-ing the chunk seems to fix the problem. Anyone else encounter this?
— Philip Boonstra (@psboonstra) February 9, 2021
The cache mechanism is not to be used with code chunk that have side effect used by other chunks - loading package that are used by other chunk is one of this. See more in :https://t.co/bpKPzkFMad
— Christophe Dervieux (@chrisderv) February 10, 2021
If you activate cache globally you need to set cache=FALSE
The most appropriate use case of caching is to save and reload R objects that take too long to compute in a code chunk, and the code does not have any side effects, such as changing global R options via options() (such changes will not be cached). If a code chunk has side effects, we recommend that you do not cache it.
knitr can run code in other languages, including:
Python
SQL
Julia
Stan
JavaScript
Use ```{python} to start a Python code chunk, ```{julia} to start a Julia code chunk, ```{bash} to start a shell script, etc.
You may need external language engines to successfully call other languages. I have not used this functionality before.
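To see which engine names your installed version of knitr recognizes, and whether the matching programs are on your PATH, something like the following may help. This is a sketch, assuming knitr is installed. The executable names passed to Sys.which() are guesses and may differ on your system.

```r
library(knitr)

# knit_engines$get() returns a named list of engine functions;
# the names are the labels that can go inside the chunk header braces
engines <- names(knit_engines$get())
c("python", "julia", "bash") %in% engines

# Sys.which() returns "" for any executable it cannot find on the PATH
# (executable names here are assumptions, e.g. yours may be "python")
Sys.which(c("python3", "julia", "bash"))
```

An engine appearing in the list only means knitr knows how to hand code to it. The external program still has to be installed separately.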
You are not limited to using Markdown in Rmd files – you can knit R scripts using the same shortcut: Cmd+Shift+K / Ctrl+Shift+K
Use #' to indicate a switch to markdown
Use #+ to start a new chunk
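A small R script using both markers might look like the sketch below. The chunk labels simulate and model are made up for illustration. The script runs as ordinary R, and it can also be knitted with the shortcut above.

```r
#' ## Simulated data
#'
#' Lines that start with `#'` are rendered as markdown text.

#+ simulate, echo = TRUE
x <- rnorm(20)
y <- 3 * x + rnorm(length(x))

#' A new `#+` line starts another chunk, here with its code hidden.

#+ model, echo = FALSE
fit <- lm(y ~ x)
coef(fit)
```

Because `#'` and `#+` both begin with the comment character, R ignores them when the script is sourced normally.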
Open 02-exercise.R and complete the tasks.
<iframe src="https://isitchristmas.com/"></iframe>
yields an embedded view of that page inside the knitted document.
readr package
Part of the tidyverse (along with dplyr and ggplot2).
readr gives you tools to read data into R from files outside R, so that it can be wrangled and manipulated, and then to write the results back to files outside R.
The workhorse of the readr package is read_csv, which reads a comma-separated values (csv) file into R as a data.frame (more precisely, a tibble).
From the help page:
read_csv(
  file,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  skip_empty_rows = TRUE
)
Typical use is my_data <- read_csv("my_files_path.csv")
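A self-contained sketch of that typical use, assuming readr is installed. The temporary file and its three rows are made up for illustration (the column names are borrowed from tumor_growth.csv), and col_types is spelled out rather than guessed.

```r
library(readr)

# Write a tiny csv to a temporary file so the example needs no external data
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ID,Day,Size", "101,0,41.8", "101,3,85", "101,4,NA"), csv_path)

# col_types makes the parsing explicit instead of relying on guessing;
# na lists the strings that should be read as missing values
my_data <- read_csv(
  csv_path,
  col_types = cols(ID = col_integer(), Day = col_integer(), Size = col_double()),
  na = c("", "NA")
)
my_data
```

Stating col_types up front also avoids the column specification message that read_csv otherwise prints when it guesses.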
How does read_csv know where to look?
Get your current working directory in R:
getwd()
## [1] "/Users/philb/Desktop/Work/BDSI2023/markdown-workshop"
How did R know this was my desired working directory? Why did I not need to do this:
setwd("/Users/philb/Desktop/Work/BDSI2023/markdown-workshop")
Your working directory is automatically set when you work inside an R project.
To create an R project, go to File > New Project... > Existing Directory and choose your folder
To open an existing R project, find the .Rproj file or go to File > Open Project...
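Inside a project, a bare file name is resolved relative to the working directory. A minimal base R sketch for checking this before reading a file, using tumor_growth.csv from this workshop as the example file name:

```r
# Relative paths are resolved against the working directory
getwd()

# file.path() builds the full path in a platform-independent way
csv_path <- file.path(getwd(), "tumor_growth.csv")

# Check before reading, so a wrong working directory gives a clear message
if (file.exists("tumor_growth.csv")) {
  message("Found ", csv_path)
} else {
  message("tumor_growth.csv is not in ", getwd())
}
```

If the file is not found, opening the project via its .Rproj file is usually a cleaner fix than calling setwd() with a hard-coded path.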
tumor_growth.csv
Varna, Bertheau, and Legrès (2014)
(tumor_growth <- read_csv("tumor_growth.csv"))
## # A tibble: 574 × 5
##    Grp   Group    ID   Day   Size
##    <chr> <dbl> <dbl> <dbl>  <dbl>
##  1 1.CTR     1   101     0   41.8
##  2 1.CTR     1   101     3   85
##  3 1.CTR     1   101     4  114
##  4 1.CTR     1   101     5  162.
##  5 1.CTR     1   101     6  178.
##  6 1.CTR     1   101     7  325
##  7 1.CTR     1   101    10  624.
##  8 1.CTR     1   101    11  648.
##  9 1.CTR     1   101    12  836.
## 10 1.CTR     1   101    13 1030.
## # ℹ 564 more rows
Open 03-exercise.Rmd and complete the tasks.
R Markdown: The definitive guide
Varna, Mariana, Philippe Bertheau, and Luc G Legrès. 2014. “Tumor Microenvironment in Human Tumor Xenografted Mouse Models.” Journal of Analytical Oncology 3 (3): 159–66.