Everything you see on the course website has been created with R, relying on .Rmd files. What you see here is an Rmd file, which combines narrative text, R code, and the output produced by that code.
Rmd files can be “compiled” into various formats (html, doc, pdf, md) and are extremely handy.
In almost all my code I rely on pipes (%>%).
The pipe operator was introduced in R (as part of the magrittr package) to make code easier to read in the presence of nested function calls.
library(magrittr)
## consider for example
a <- 10
b <- sqrt(log(a+2)) ## this is a trivial nested function ...
## with pipes I could write the same as
b1 <- a %>% add(2) %>% log(.) %>% sqrt(.)
## the "dot" represents what is coming from the pipe
You can imagine the pipe operator just like a physical pipe: it connects two functions, passing the output of the first to the input of the second.
This way of programming is particularly effective when you want to do “data carpentry” on tabular data.
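As a small illustration of pipes applied to tabular data, here is a minimal sketch that writes the same summary twice, once nested and once piped. The table `samples` and its columns are made up for this example and are not part of the course material; the verbs `filter`, `group_by` and `summarise` come from the dplyr package (part of tidyverse).

```r
library(dplyr)

## hypothetical measurements, invented only for this illustration
samples <- tibble(
  group = c("a", "a", "b", "b"),
  value = c(1, 3, 10, 20)
)

## nested form: has to be read from the inside out
nested <- summarise(
  group_by(filter(samples, value > 1), group),
  mean_value = mean(value)
)

## piped form: reads top to bottom, one step per line
piped <- samples %>%
  filter(value > 1) %>%
  group_by(group) %>%
  summarise(mean_value = mean(value))

piped
```

Both forms should give the same table; the piped one simply lists the steps in the order in which they are carried out.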
Tibbles are a modern version of R “data.frames”. You can think of them as the analogue of Excel tables. Tibbles are the workhorse of the data analyst, since almost all the data we will be digesting are organized as tables.
An interesting feature of tibbles is that they can contain columns of tables. The advantage of that will become clear during the demos. Just as an appetizer:
library(tidyverse)
## here I create a dummy table
my_first_tibble <- tibble(names = c("One","Two"),
counts = c(1,2))
my_first_tibble
## # A tibble: 2 × 2
## names counts
## <chr> <dbl>
## 1 One 1
## 2 Two 2
and now I add a column of tables …
## this is a dummy matrix to be used just as an example
a_matrix <- matrix(1:4, nrow = 2)
## I'm adding here a 2 element column which combines twice the matrix
my_first_tibble$tables <- list(a_matrix,a_matrix)
## I'm doing that with the data carpentry machinery
my_first_tibble <- my_first_tibble %>%
add_column(other_tables = list(a_matrix,a_matrix))
my_first_tibble
## # A tibble: 2 × 4
## names counts tables other_tables
## <chr> <dbl> <list> <list>
## 1 One 1 <int [2 × 2]> <int [2 × 2]>
## 2 Two 2 <int [2 × 2]> <int [2 × 2]>
Now you can see that my table contains columns made of matrices.
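To get one of the stored matrices back out of such a column, double square brackets can be used, as with any R list. The sketch below rebuilds a small table so that it runs on its own; the name `demo_tibble` is only illustrative.

```r
library(tibble)

## same dummy matrix as above
a_matrix <- matrix(1:4, nrow = 2)

## a small stand-alone table with one list column (illustrative name)
demo_tibble <- tibble(names  = c("One", "Two"),
                      tables = list(a_matrix, a_matrix))

## `[[` extracts a single element of the list column: here the first matrix
first_matrix <- demo_tibble$tables[[1]]
first_matrix

## single brackets would instead return a list of length one
still_a_list <- demo_tibble$tables[1]
```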
purrr and cycles
In many cases you would like to apply some sort of function iteratively along a table. The standard way of doing this in many programming languages is to use for loops. Unfortunately, for loops can be slow in R, particularly when a result object is grown inside the loop. To work around this problem, base R provides a full set of apply functions (e.g. apply, lapply, sapply, vapply, …). The purrr package (which is part of tidyverse) offers a more consistent interface to the same idea and allows you to iteratively apply operations to data stored in tabular form.
BTW: Do you know how to write R functions???
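In case functions are new to you, here is a minimal sketch of how one is defined and then applied to several inputs in three ways: a for loop, base R `vapply`, and purrr `map_dbl`. The helper `value_range` and the list `inputs` are hypothetical and exist only for this example.

```r
library(purrr)

## hypothetical helper: spread (max - min) of a numeric vector
value_range <- function(x) {
  max(x) - min(x)
}

## made-up inputs
inputs <- list(c(1, 5, 3), c(10, 2), c(7, 7, 7))

## 1. for loop, with the result vector allocated in advance
loop_result <- numeric(length(inputs))
for (i in seq_along(inputs)) {
  loop_result[i] <- value_range(inputs[[i]])
}

## 2. base R: vapply also checks that each result is a single number
vapply_result <- vapply(inputs, value_range, numeric(1))

## 3. purrr: map_dbl returns a numeric vector, one value per input
map_result <- map_dbl(inputs, value_range)
```

All three approaches should return the same numeric vector; the purrr version states the expected output type in the function name (`map_dbl`).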
Suppose, for example, that we want to calculate the determinant of the previous matrices
## in R there is a specific function called "determinant" to do that
?determinant
If my data are organized in tabular form, the determinant of the two matrices can be calculated in an efficient and organized way as follows:
my_first_tibble <-
  my_first_tibble %>%
  ## anonymous function syntax
  mutate(determinant = map(tables, function(x) determinant(x))) %>%
  ## equivalent purrr formula shorthand
  mutate(determinant1 = map(tables, ~ determinant(.x)))
my_first_tibble
## # A tibble: 2 × 6
## names counts tables other_tables determinant determinant1
## <chr> <dbl> <list> <list> <list> <list>
## 1 One 1 <int [2 × 2]> <int [2 × 2]> <det> <det>
## 2 Two 2 <int [2 × 2]> <int [2 × 2]> <det> <det>
In words: my_first_tibble is piped into mutate, which creates a new column called determinant by mapping the R function determinant to the objects stored in the tables column.
## this is the determinant of the first matrix
my_first_tibble$determinant[[1]]
## $modulus
## [1] 0.6931472
## attr(,"logarithm")
## [1] TRUE
##
## $sign
## [1] -1
##
## attr(,"class")
## [1] "det"
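Note that `determinant()` returns the modulus on the log scale together with the sign, not the determinant itself. A minimal sketch of how the plain number could be recovered is given below, either by hand or with the base R function `det()`; the variable names are only illustrative.

```r
library(purrr)

## same dummy matrix as above
a_matrix <- matrix(1:4, nrow = 2)

## determinant() stores log(|det|) in $modulus and the sign in $sign
det_obj <- determinant(a_matrix)

## rebuild the value: sign * exp(log-modulus); as.numeric drops the attributes
det_value <- as.numeric(det_obj$sign * exp(det_obj$modulus))

## det() returns the number directly
det_direct <- det(a_matrix)

## map_dbl gives a plain numeric vector, one determinant per matrix
det_column <- map_dbl(list(a_matrix, a_matrix), det)
```

For this matrix the result is 1 * 4 - 3 * 2 = -2, which matches the output above: exp(0.6931472) is 2 and the sign is -1.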
To draw from different distributions:
runif for the uniform distribution
rlnorm for the lognormal distribution
This is a brute force demonstration of the central limit theorem
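A brute force check of this kind could look like the sketch below: draw many samples, take the mean of each, and compare the spread of those means with what the theorem predicts. The seed, the sample size `n_draws` and the number of repetitions `n_repeats` are arbitrary choices made for this example.

```r
set.seed(1)        ## arbitrary seed, only for reproducibility

n_draws   <- 50    ## size of each sample (assumed value)
n_repeats <- 5000  ## number of samples drawn (assumed value)

## mean of each sample, for a uniform and for a lognormal population
uniform_means   <- replicate(n_repeats, mean(runif(n_draws)))
lognormal_means <- replicate(n_repeats, mean(rlnorm(n_draws)))

## the theorem predicts a spread of sigma / sqrt(n) for the sample means;
## for runif(0, 1) the population standard deviation is sqrt(1 / 12)
predicted_sd <- sqrt(1 / 12) / sqrt(n_draws)
observed_sd  <- sd(uniform_means)

c(predicted = predicted_sd, observed = observed_sd)
```

Calling `hist(uniform_means)` and `hist(lognormal_means)` should show roughly bell-shaped histograms, even though neither parent distribution is normal; the lognormal case typically needs larger samples before the skew fades.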