library(xcms)
library(tidyverse)
library(plotly)

Introduction

The objective of this demo is to make you familiar with the characteristics of LC-MS metabolomics data (both MS1 and MS2) by using two packages belonging to the xcms ecosystem.

Since its first release in 2006, xcms has been steadily growing as one of the standards for the analysis of LC(GC)-MS-(MS) metabolomics data.

In the last years a huge effort has been made to set up a full xcms centric ecosystem which now allows to:

… so many thanks to the people that are working on that!

The Data

For our demo we will use a subset of a dataset on wines which have been recently acquired at the Fondazione E. Mach in Italy. The dataset has been designed for training purposes and includes MS and DDA (MS2 e MS3) (in positive and negative polarity) acquisition of a 20 wines with the following characteristics:

The acquisition sequences included blanks, pooled QC and samples. Al data were acquired with a HPLC-LTQ-Orbitrap instrument. A set of standard injections is also available to illustrate the annotation strategy.

Reading one injection

To understand how the data behaves, let’s start reading in the data coming from one injection

raw_one <- readMSData(
  files = "../data/wines/x016_X_QC_X_4_NEG_DDA.mzML",
  msLevel. = c(1,2), ## we read only MS1 and MS2
  mode = "onDisk")  ## with this parameter the data are not loaded into RAM

Let’s look to the object

raw_one
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 0.44 Mb
## - - - Spectra data - - -
##  MS level(s): 1 2 
##  Number of spectra: 1092 
##  MSn retention times: 0:01 - 12:00 minutes
## - - - Processing information - - -
## Data loaded [Mon Nov 13 17:17:34 2023] 
## Filter: select MS level(s) 1 2. [Mon Nov 13 17:17:34 2023] 
##  MSnbase version: 2.28.1 
## - - - Meta data  - - -
## phenoData
##   rowNames: x016_X_QC_X_4_NEG_DDA.mzML
##   varLabels: sampleNames
##   varMetadata: labelDescription
## Loaded from:
##   x016_X_QC_X_4_NEG_DDA.mzML 
## protocolData: none
## featureData
##   featureNames: F1.S0001 F1.S0002 ... F1.S1637 (1092 total)
##   fvarLabels: fileIdx spIdx ... spectrum (35 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'

We see …

It is important that along the process one can be able to visualize the raw data, so let’s give a look to the structure of the R object we created.

Let’s first get the retention times where spectra were collected

## method to extract the retention times
rtime(raw_one)[1:20]
## F1.S0001 F1.S0002 F1.S0004 F1.S0005 F1.S0007 F1.S0008 F1.S0010 F1.S0011 
##   0.6727   1.0629   1.8932   2.3157   3.1473   3.5692   4.4018   4.8222 
## F1.S0013 F1.S0014 F1.S0016 F1.S0017 F1.S0019 F1.S0020 F1.S0022 F1.S0023 
##   5.6518   6.0737   6.9058   7.3267   8.1741   8.5963   9.4257   9.8563 
## F1.S0025 F1.S0026 F1.S0028 F1.S0029 
##  10.6962  11.1141  11.9442  12.3656

The variable names should be read as “F(ile)x.S(can)000y”. Since we are dealing with a DDA experiment, the data contains a combination of full scan and fragmentation spectra, so each one of the previous Scans should show either the full family of ions coming from the source, or what is produced by fragmenting a specific precursor

## this gives the type of the corresponding 10 scans
msLevel(raw_one)[1:10]
## F1.S0001 F1.S0002 F1.S0004 F1.S0005 F1.S0007 F1.S0008 F1.S0010 F1.S0011 
##        1        2        1        2        1        2        1        2 
## F1.S0013 F1.S0014 
##        1        2

As you see, they are interleaved.

Full Scans

Let’s now get out the full scans

## I'm telling xcms to filter only scans acquired in "Full scan mode"
fs_raw <- raw_one %>% filterMsLevel(1)

fs_raw
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 0.23 Mb
## - - - Spectra data - - -
##  MS level(s): 1 
##  Number of spectra: 546 
##  MSn retention times: 0:01 - 11:60 minutes
## - - - Processing information - - -
## Data loaded [Mon Nov 13 17:17:34 2023] 
## Filter: select MS level(s) 1 2. [Mon Nov 13 17:17:34 2023] 
## Filter: select MS level(s) 1. [Mon Nov 13 17:17:34 2023] 
##  MSnbase version: 2.28.1 
## - - - Meta data  - - -
## phenoData
##   rowNames: x016_X_QC_X_4_NEG_DDA.mzML
##   varLabels: sampleNames
##   varMetadata: labelDescription
## Loaded from:
##   x016_X_QC_X_4_NEG_DDA.mzML 
## protocolData: none
## featureData
##   featureNames: F1.S0001 F1.S0004 ... F1.S1636 (546 total)
##   fvarLabels: fileIdx spIdx ... spectrum (35 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'

We already now how the rt will look like, but what about mz and intensity

rt <- rtime(fs_raw)
mz <- mz(fs_raw)
I <-  intensity(fs_raw)

And their structure …

glimpse(mz[1:4])
## List of 4
##  $ F1.S0001: num [1:367] 51.4 53.6 53.7 53.7 53.7 ...
##  $ F1.S0004: num [1:346] 53.1 53.7 53.7 53.7 53.7 ...
##  $ F1.S0007: num [1:385] 51.2 53.5 53.7 53.7 53.7 ...
##  $ F1.S0010: num [1:382] 50.9 51.4 51.6 52.2 53.7 ...
glimpse(I[1:4])
## List of 4
##  $ F1.S0001: num [1:367] 5796 4898 14954 12601 18832 ...
##  $ F1.S0004: num [1:346] 5549 9187 10296 10868 9178 ...
##  $ F1.S0007: num [1:385] 5259 5481 4976 17168 22202 ...
##  $ F1.S0010: num [1:382] 5210 6492 5209 5071 15069 ...

So both object are lists: for every scan we have a vector of mz and a vector of I: a spectrum. We have 3D data!

Plot by hand

plot(mz[[1]], I[[1]], type = "h")

Plot with xcms

The first spectrum can be extracted by standard list subsetting

s1 <- fs_raw[[1]]
s1
## Object of class "Spectrum1"
##  Retention time: 0:01 
##  MSn level: 1 
##  Total ion count: 367 
##  Polarity: 0

So a method for this type of object is already in place

plot(s1)

This can be also made interactive with a little (and inelegant) trick

ggplotly(plot(s1))

This is an alternative 2D visualization of the ion map

ggplotly(tibble(rt = rt, mz = mz, I = I)  %>% 
  unnest(c("mz","I")) %>%
  filter(mz > 284 & mz < 300) %>% 
  filter(rt > 200 & rt < 300) %>% 
  ggplot() + 
  geom_point(aes(x = rt, y = mz, col = log10(I), size = I)) + 
  scale_color_viridis_c() + 
  theme_light())

As we have discussed in the presentation, the different metabolites present in the sample will show-up as peaks in their ion chromatograms, basically because when more molecule of a specific type are reaching the ionization source, more ions associated to that molecule are produced.

The previous map shows that the ion around m/z 295.0445 could show an interesting profile over time …

# extract the chromatogram
chr_raw <- chromatogram(fs_raw, 
                        mz = 295.0445 + 0.01*c(-1, 1), 
                        rt = 250 + 60*c(-1, 1))

chr_raw
## MChromatograms with 1 row and 1 column
##      x016_X_QC_X_4_NEG_DDA.mzML
##                  <Chromatogram>
## [1,]                 length: 87
## phenoData with 1 variables
## featureData with 5 variables

As before we can get out the data and manually plot the trace

plot(rtime(chr_raw[1,1]), intensity(chr_raw[1,1]), type = "b")

Note: the structure of the Mchromatograms is “matrix like” because it is designed to work best with many ions and many files.

Just to show what I mean let’s read in the data coming from two different injections

two_raw <- readMSData(
  files = c("../data/wines/x016_X_QC_X_4_NEG_DDA.mzML",
            "../data/wines/x020_wht_gewurz_A_1_NEG_DDA.mzML"),
  msLevel. = 1, ## we read only MS1 and MS2
  mode = "onDisk")  ## with this parameter the data are not loaded into RAM

As before we “slice” an extracted ion chromatogram

# extract the chromatogram
chr_raw_2 <- chromatogram(two_raw, 
                        mz = 295.0445 + 0.01*c(-1, 1), 
                        rt = 250 + 60*c(-1, 1))

chr_raw_2
## MChromatograms with 1 row and 2 columns
##      x016_X_QC_X_4_NEG_DDA.mzML x020_wht_gewurz_A_1_NEG_DDA.mzML
##                  <Chromatogram>                   <Chromatogram>
## [1,]                 length: 87                       length: 88
## phenoData with 1 variables
## featureData with 5 variables

Here we could extract the rt and intensity for the separate files, but we can plot them with a handy xcms method

The plot can be also obtained by a specific method, which is also giving some additional info

plot(chr_raw_2)

Here we see a clear difference between the two traces, looking to the names we see that one datafile is a QC, while the second is one injection of a white wine (wht)… food for brains ! ;-)

Note The previous plot shows a typical characteristic of Orbitrap (and in general FT) instruments: the signal off peaks is almost invariably zero. As we will see this can be a problem in presence of missing data, since no “real” noise can be integrated there.

DIY