Rmd

Everything you see on the course website has been created with R, relying on .Rmd files. What you see here is an Rmd file, which combines:

  1. text with a simplified format
  2. R chunks which can be executed independently

Rmd files can be “compiled” into various formats (HTML, Word, PDF, Markdown) and are extremely handy.
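As a rough sketch of what "compiling" looks like in practice, the snippet below writes a tiny Rmd file to a temporary folder and renders it to HTML with rmarkdown::render(). The file name, title, and chunk content are made up for illustration, and the render step is skipped when rmarkdown or pandoc is not installed.

```r
## the chunk delimiter (three backticks) is built with strrep()
fence <- strrep("`", 3)

## a minimal Rmd document: YAML header, some text, one R chunk
rmd_lines <- c(
  "---",
  "title: \"A tiny example\"",
  "---",
  "",
  "Some *text* with a simplified format.",
  "",
  paste0(fence, "{r}"),
  "sqrt(16)",
  fence
)

## hypothetical file name in a temporary directory
rmd_file <- file.path(tempdir(), "tiny_example.Rmd")
writeLines(rmd_lines, rmd_file)

## render only if the required tools are available on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file, output_format = "html_document", quiet = TRUE)
  print(out_file)
}
```

Changing output_format to "word_document" or "pdf_document" should produce the other formats, provided the corresponding tools (for PDF, a LaTeX installation) are present.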

Pipes

In almost all my code I rely on pipes (%>%). The pipe operator was introduced in R (as part of the magrittr package) to make code easier to read in the presence of nested function calls.

library(magrittr)

## consider for example
a <- 10
b <- sqrt(log(a+2)) ## this is a trivial nested function ...

## with pipes I could write the same as 
b1 <- a %>% add(2) %>% log(.) %>% sqrt(.)

## the "dot" represents what is coming from the pipe

You can imagine the pipe operator just like a pipe, which connects two functions passing the output of the first to the input of the second.
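To make the "connection" a bit more concrete, here is a small sketch of where the piped value ends up. The variable names r1, r2 and b2 are only illustrative.

```r
library(magrittr)

a <- 10

## by default the left-hand side becomes the FIRST argument of the next call
r1 <- a %>% log(base = 2)        # same as log(a, base = 2)

## the dot lets you send it to a different argument
r2 <- 2 %>% log(a, base = .)     # also log(a, base = 2)

## when the value goes in the first position, the dot can be dropped
b2 <- a %>% add(2) %>% log() %>% sqrt()

all.equal(r1, r2)
all.equal(b2, sqrt(log(a + 2)))
```

Both comparisons should return TRUE: the pipe only changes how the code reads, not what it computes.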

This way of programming is particularly convenient when you want to do “data carpentry” on tabular data.

Tibbles

Tibbles are the modern version of R “data.frames”. You can think of them as the analogue of Excel tables. Tibbles are the workhorse of the data analyst, since almost all the data we will be digesting are organized as tables.

An interesting feature of tibbles is that they can contain columns of tables (so-called list columns). The advantage of that will become clear during the demos. Just as an appetizer:

library(tidyverse)

## here I create a dummy table
my_first_tibble <- tibble(names = c("One","Two"),
                          counts = c(1,2))

my_first_tibble
## # A tibble: 2 × 2
##   names counts
##   <chr>  <dbl>
## 1 One        1
## 2 Two        2

And now I add a column of tables …

## this is a dummy matrix to be used just as an example
a_matrix <- matrix(1:4, nrow = 2)

## here I'm adding a two-element column that contains the matrix twice
my_first_tibble$tables <- list(a_matrix,a_matrix)

## here I'm doing the same with the data carpentry machinery
my_first_tibble <- my_first_tibble %>% 
  add_column(other_tables = list(a_matrix,a_matrix))


my_first_tibble
## # A tibble: 2 × 4
##   names counts tables        other_tables 
##   <chr>  <dbl> <list>        <list>       
## 1 One        1 <int [2 × 2]> <int [2 × 2]>
## 2 Two        2 <int [2 × 2]> <int [2 × 2]>

Now you can see that my table contains columns made of matrices.
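Getting a matrix back out of such a column works like indexing any R list. A minimal sketch, using a fresh tibble (small_tibble is just an illustrative name) so that it runs on its own:

```r
library(tibble)

a_matrix <- matrix(1:4, nrow = 2)
small_tibble <- tibble(names = c("One", "Two"),
                       tables = list(a_matrix, a_matrix))

## single brackets keep the list wrapper ...
class(small_tibble$tables[1])

## ... double brackets return the matrix itself
first_matrix <- small_tibble$tables[[1]]
dim(first_matrix)
```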

purrr and loops

In many cases you would like to apply some function iteratively along a table. The standard way of doing this in many programming languages is to use for loops. Unfortunately, for loops can be really slow in R, especially when the result is grown one element at a time. To work around this problem, base R provides a full set of apply functions (e.g. apply, lapply, sapply, vapply …). The purrr package makes life easier with a more consistent, modern interface.

The purrr package (which is part of the tidyverse) allows you to apply operations repeatedly to data stored in tabular form.
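To compare the three approaches side by side, here is a small sketch that computes the mean of each element of a list. The data and the variable names are invented for the example.

```r
library(purrr)

values <- list(1:3, 4:6, 7:9)

## 1. for loop, with the result vector allocated up front
means_loop <- numeric(length(values))
for (i in seq_along(values)) {
  means_loop[i] <- mean(values[[i]])
}

## 2. base R apply family; vapply checks the type and length of each result
means_vapply <- vapply(values, mean, FUN.VALUE = numeric(1))

## 3. purrr; map_dbl always returns a numeric vector
means_purrr <- map_dbl(values, mean)

means_purrr
```

All three give the same numbers (2, 5 and 8); what changes is how much bookkeeping you have to write yourself.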

BTW: Do you know how to write R functions???
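In case the syntax is new to you, here is a minimal sketch. The function rescale01 is a made-up helper, not something used elsewhere in the course.

```r
## a hypothetical helper: rescale a numeric vector to the 0-1 range
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])   # the last expression is the return value
}

rescaled <- rescale01(c(2, 4, 6))
rescaled
```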

Suppose, for example, that we want to calculate the determinant of the previous matrices

## in R there is a specific function called "determinant" to do that
?determinant

If my data are organized in tabular form, the determinant of the two matrices can be calculated in a very efficient and organized way as follows:

my_first_tibble <- 
  my_first_tibble %>% 
  mutate(determinant = map(tables, function(x) determinant(x))) %>% 
  mutate(determinant1 = map(tables, ~ determinant(.x)))

my_first_tibble
## # A tibble: 2 × 6
##   names counts tables        other_tables  determinant determinant1
##   <chr>  <dbl> <list>        <list>        <list>      <list>      
## 1 One        1 <int [2 × 2]> <int [2 × 2]> <det>       <det>       
## 2 Two        2 <int [2 × 2]> <int [2 × 2]> <det>       <det>

In words …

  1. take my_first_tibble
  2. pipe it to a function called mutate, which creates a new column called determinant.
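  3. the content of determinant is created by mapping the R function determinant onto the objects stored in the tables column
  4. repeat the step for determinant1, this time writing the function with purrr's formula shorthand (~ determinant(.x)); the result is the same

## this is the determinant of the first matrix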
my_first_tibble$determinant[[1]]
## $modulus
## [1] 0.6931472
## attr(,"logarithm")
## [1] TRUE
## 
## $sign
## [1] -1
## 
## attr(,"class")
## [1] "det"
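Note that determinant() returns a "det" object holding the logarithm of the modulus and the sign, not the number itself. As a sketch of how the plain value could be recovered, the snippet below rebuilds a small tibble (dets, det_object, det_value and det_direct are illustrative names) and compares two routes.

```r
library(tibble)
library(dplyr)
library(purrr)

a_matrix <- matrix(1:4, nrow = 2)

dets <- tibble(tables = list(a_matrix, a_matrix)) %>%
  mutate(
    ## the "det" object, as in the text above
    det_object = map(tables, determinant),
    ## value rebuilt by hand: sign * exp(log-modulus)
    det_value  = map_dbl(det_object, ~ .x$sign * exp(as.numeric(.x$modulus))),
    ## base R's det() returns the number directly
    det_direct = map_dbl(tables, det)
  )

dets$det_value
```

For this matrix both columns contain -2, since exp(0.6931472) is 2 and the sign is -1.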

Homework

  1. Create a tibble that contains 100 random samples from a non-normal distribution (I suggest a uniform or a lognormal distribution).
  2. Show that the means of the 100 samples follow an approximately normal distribution, regardless of the shape of the distribution we are drawing from.

To draw from different distributions:
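For instance, base R provides one random generator per distribution. The sketch below shows three of them; the sample size, the parameters, and the seed are arbitrary choices made only for illustration.

```r
set.seed(123)  # arbitrary seed, only to make the draws reproducible

u <- runif(5, min = 0, max = 1)          # uniform on [0, 1]
l <- rlnorm(5, meanlog = 0, sdlog = 1)   # lognormal
e <- rexp(5, rate = 1)                   # exponential

u
```

See ?runif, ?rlnorm and ?rexp for the meaning of each parameter.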

This exercise is a brute-force demonstration of the central limit theorem.