library(xcms)
library(MsExperiment)
library(tidyverse)
library(plotly)
library(Spectra)

Introduction

The objective of this demo is to make you familiar with the characteristics of LC-MS metabolomics data (both MS1 and MS2) by using two packages belonging to the xcms ecosystem.

Since its first release in 2006, xcms has been steadily growing as one of the standards for the analysis of LC(GC)-MS-(MS) metabolomics data.

In the last years a huge effort has been made to set up a full xcms centric ecosystem which now allows to:

… so many thanks to the people that are working on that!

The Data

For our demo we will use a subset of a dataset on wines which have been recently acquired at the Fondazione E. Mach in Italy. The dataset has been designed for training purposes and includes MS and DDA (MS2 e MS3) (in positive and negative polarity) acquisition of a 20 wines with the following characteristics:

The acquisition sequences included blanks, pooled QC and samples. Al data were acquired with a HPLC-LTQ-Orbitrap instrument. A set of standard injections is also available to illustrate the annotation strategy.

Reading one injection

To understand how the data behaves, let’s start reading in the data coming from one injection

raw_one <- readMsExperiment(
  spectraFiles = "../data/wines/x016_X_QC_X_4_NEG_DDA.mzML", mode = "ondisk")  

Let’s look to the object

raw_one
## Object of class MsExperiment 
##  Spectra: MS1 (546) MS2 (546) MS3 (546) 
##  Experiment data: 1 sample(s)
##  Sample data links:
##   - spectra: 1 sample(s) to 1638 element(s).

We see …

It is important that along the process one can be able to visualize the raw data, so let’s give a look to the structure of the R object we created.

Let’s first get the retention times where spectra were collected

## method to extract the retention times
rtime(raw_one)[1:20]
##  [1] 0.6727 1.0629 1.5381 1.8932 2.3157 2.7924 3.1473 3.5692 4.0451 4.4018
## [11] 4.8222 5.2973 5.6518 6.0737 6.5510 6.9058 7.3267 7.8191 8.1741 8.5963

The variable names should be read as “F(ile)x.S(can)000y”. Since we are dealing with a DDA experiment, the data contains a combination of full scan and fragmentation spectra, so each one of the previous Scans should show either the full family of ions coming from the source, or what is produced by fragmenting a specific precursor

## this gives the type of experiment of the 
msLevel(spectra(raw_one))[1:10]
##  [1] 1 2 3 1 2 3 1 2 3 1

As you see, they are interleaved.

Full Scans

Let’s now get out the full scans

## I'm telling xcms to filter only scans acquired in "Full scan mode"
fs_raw <- raw_one %>% filterMsLevel(1)
## Filter spectra
fs_raw
## Object of class MsExperiment 
##  Spectra: MS1 (546) 
##  Experiment data: 1 sample(s)
##  Sample data links:
##   - spectra: 1 sample(s) to 546 element(s).

We already now how the rt will look like, but what about mz and intensity

rt <- rtime(fs_raw)
mz <- mz(spectra(fs_raw))
I <-  intensity(spectra(fs_raw))

And their structure …

glimpse(mz[1:4])
## Formal class 'SimpleNumericList' [package "IRanges"] with 4 slots
##   ..@ elementType    : chr "numeric"
##   ..@ elementMetadata: NULL
##   ..@ metadata       : list()
##   ..@ listData       :List of 4
##   .. ..$ : num [1:367] 51.4 53.6 53.7 53.7 53.7 ...
##   .. ..$ : num [1:346] 53.1 53.7 53.7 53.7 53.7 ...
##   .. ..$ : num [1:385] 51.2 53.5 53.7 53.7 53.7 ...
##   .. ..$ : num [1:382] 50.9 51.4 51.6 52.2 53.7 ...
glimpse(I[1:4])
## Formal class 'SimpleNumericList' [package "IRanges"] with 4 slots
##   ..@ elementType    : chr "numeric"
##   ..@ elementMetadata: NULL
##   ..@ metadata       : list()
##   ..@ listData       :List of 4
##   .. ..$ : num [1:367] 5796 4898 14954 12601 18832 ...
##   .. ..$ : num [1:346] 5549 9187 10296 10868 9178 ...
##   .. ..$ : num [1:385] 5259 5481 4976 17168 22202 ...
##   .. ..$ : num [1:382] 5210 6492 5209 5071 15069 ...

So both object are lists: for every scan we have a vector of mz and a vector of I: a spectrum. We have 3D data!

Plot by hand

plot(mz[[1]], I[[1]], type = "h")

Plot with xcms

The first spectrum can be extracted by standard list subsetting

s1 <- spectra(fs_raw)[1]
s1
## MSn data (Spectra) with 1 spectra in a MsBackendMzR backend:
##     msLevel     rtime scanIndex
##   <integer> <numeric> <integer>
## 1         1    0.6727         1
##  ... 33 more variables/columns.
## 
## file(s):
## x016_X_QC_X_4_NEG_DDA.mzML
## Processing:
##  Filter: select MS level(s) 1 [Wed Oct  9 16:58:39 2024]

So a method for this type of object is already in place

plotSpectra(s1)

This is an alternative 2D visualization of the ion map

raw_spectra <- spectra(fs_raw)

## we create a tibble of spectra ...

spectra_tibble <- tibble(rt = rt, 
                         mz = as.list(mz),
                         I = as.list(I))
ggplotly(spectra_tibble  %>% unnest(c(mz, I)) %>% 
  filter(mz > 284 & mz < 300) %>% 
  filter(rt > 200 & rt< 300) %>% 
  ggplot() + 
  geom_point(aes(x = rt, y = mz, col = log10(I), size = I)) + 
  scale_color_viridis_c() + 
  theme_light())

As we have discussed in the presentation, the different metabolites present in the sample will show-up as peaks in their ion chromatograms, basically because when more molecule of a specific type are reaching the ionization source, more ions associated to that molecule are produced.

The previous map shows that the ion around m/z 295.0445 could show an interesting profile over time …

# extract the chromatogram
chr_raw <- chromatogram(fs_raw, 
                        mz = 295.0445 + 0.01*c(-1, 1), 
                        rt = 250 + 60*c(-1, 1))

chr_raw
## MChromatograms with 1 row and 1 column
##                   1
##      <Chromatogram>
## [1,]     length: 87
## phenoData with 2 variables
## featureData with 4 variables

As before we can get out the data and manually plot the trace

plot(rtime(chr_raw[1,1]), intensity(chr_raw[1,1]), type = "b")

Note: the structure of the Mchromatograms is “matrix like” because it is designed to work best with many ions and many files.

Just to show what I mean let’s read in the data coming from two different injections

two_raw <- readMsExperiment(
  spectraFiles = c("../data/wines/x016_X_QC_X_4_NEG_DDA.mzML",
            "../data/wines/x020_wht_gewurz_A_1_NEG_DDA.mzML"),
  msLevel. = 1, ## we read only MS1 and MS2
  mode = "onDisk")  ## with this parameter the data are not loaded into RAM

As before we “slice” an extracted ion chromatogram

# extract the chromatogram
chr_raw_2 <- chromatogram(two_raw, 
                        mz = 295.0445 + 0.01*c(-1, 1), 
                        rt = 250 + 60*c(-1, 1))

chr_raw_2
## MChromatograms with 1 row and 2 columns
##                   1              2
##      <Chromatogram> <Chromatogram>
## [1,]     length: 87     length: 88
## phenoData with 2 variables
## featureData with 4 variables

Here we could extract the rt and intensity for the separate files, but we can plot them with a handy xcms method

The plot can be also obtained by a specific method, which is also giving some additional info

plot(chr_raw_2)

Here we see a clear difference between the two traces, looking to the names we see that one datafile is a QC, while the second is one injection of a white wine (wht)… food for brains ! ;-)

Note The previous plot shows a typical characteristic of Orbitrap (and in general FT) instruments: the signal off peaks is almost invariably zero. As we will see this can be a problem in presence of missing data, since no “real” noise can be integrated there.

DIY