library(xcms)
library(MsExperiment)
library(tidyverse)
library(plotly)
library(Spectra)
The objective of this demo is to make you familiar with the
characteristics of LC-MS metabolomics data (both MS1 and MS2) by using
two packages belonging to the xcms ecosystem.
Since its first release in 2006, xcms has been steadily
growing as one of the standards for the analysis of LC(GC)-MS-(MS)
metabolomics data.
In the last years a huge effort has been made to set up a full
xcms centric ecosystem which now allows to:
… so many thanks to the people that are working on that!
For our demo we will use a subset of a dataset on wines which have been recently acquired at the Fondazione E. Mach in Italy. The dataset has been designed for training purposes and includes MS and DDA (MS2 e MS3) (in positive and negative polarity) acquisition of a 20 wines with the following characteristics:
The acquisition sequences included blanks, pooled QC and samples. Al data were acquired with a HPLC-LTQ-Orbitrap instrument. A set of standard injections is also available to illustrate the annotation strategy.
To understand how the data behaves, let’s start reading in the data coming from one injection
raw_one <- readMsExperiment(
spectraFiles = "../data/wines/x016_X_QC_X_4_NEG_DDA.mzML", mode = "ondisk")
Let’s look to the object
raw_one
## Object of class MsExperiment
## Spectra: MS1 (546) MS2 (546) MS3 (546)
## Experiment data: 1 sample(s)
## Sample data links:
## - spectra: 1 sample(s) to 1638 element(s).
We see …
phenoData. More on that later!featureDatra. More on that later!It is important that along the process one can be able to visualize the raw data, so let’s give a look to the structure of the R object we created.
Let’s first get the retention times where spectra were collected
## method to extract the retention times
rtime(raw_one)[1:20]
## [1] 0.6727 1.0629 1.5381 1.8932 2.3157 2.7924 3.1473 3.5692 4.0451 4.4018
## [11] 4.8222 5.2973 5.6518 6.0737 6.5510 6.9058 7.3267 7.8191 8.1741 8.5963
The variable names should be read as “F(ile)x.S(can)000y”. Since we are dealing with a DDA experiment, the data contains a combination of full scan and fragmentation spectra, so each one of the previous Scans should show either the full family of ions coming from the source, or what is produced by fragmenting a specific precursor
## this gives the type of experiment of the
msLevel(spectra(raw_one))[1:10]
## [1] 1 2 3 1 2 3 1 2 3 1
As you see, they are interleaved.
Let’s now get out the full scans
## I'm telling xcms to filter only scans acquired in "Full scan mode"
fs_raw <- raw_one %>% filterMsLevel(1)
## Filter spectra
fs_raw
## Object of class MsExperiment
## Spectra: MS1 (546)
## Experiment data: 1 sample(s)
## Sample data links:
## - spectra: 1 sample(s) to 546 element(s).
We already now how the rt will look like, but what about
mz and intensity
rt <- rtime(fs_raw)
mz <- mz(spectra(fs_raw))
I <- intensity(spectra(fs_raw))
And their structure …
glimpse(mz[1:4])
## Formal class 'SimpleNumericList' [package "IRanges"] with 4 slots
## ..@ elementType : chr "numeric"
## ..@ elementMetadata: NULL
## ..@ metadata : list()
## ..@ listData :List of 4
## .. ..$ : num [1:367] 51.4 53.6 53.7 53.7 53.7 ...
## .. ..$ : num [1:346] 53.1 53.7 53.7 53.7 53.7 ...
## .. ..$ : num [1:385] 51.2 53.5 53.7 53.7 53.7 ...
## .. ..$ : num [1:382] 50.9 51.4 51.6 52.2 53.7 ...
glimpse(I[1:4])
## Formal class 'SimpleNumericList' [package "IRanges"] with 4 slots
## ..@ elementType : chr "numeric"
## ..@ elementMetadata: NULL
## ..@ metadata : list()
## ..@ listData :List of 4
## .. ..$ : num [1:367] 5796 4898 14954 12601 18832 ...
## .. ..$ : num [1:346] 5549 9187 10296 10868 9178 ...
## .. ..$ : num [1:385] 5259 5481 4976 17168 22202 ...
## .. ..$ : num [1:382] 5210 6492 5209 5071 15069 ...
So both object are lists: for every scan we have a vector of
mz and a vector of I: a spectrum. We have 3D
data!
plot(mz[[1]], I[[1]], type = "h")
xcmsThe first spectrum can be extracted by standard list subsetting
s1 <- spectra(fs_raw)[1]
s1
## MSn data (Spectra) with 1 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 0.6727 1
## ... 33 more variables/columns.
##
## file(s):
## x016_X_QC_X_4_NEG_DDA.mzML
## Processing:
## Filter: select MS level(s) 1 [Wed Oct 9 16:58:39 2024]
So a method for this type of object is already in place
plotSpectra(s1)
This is an alternative 2D visualization of the ion map
raw_spectra <- spectra(fs_raw)
## we create a tibble of spectra ...
spectra_tibble <- tibble(rt = rt,
mz = as.list(mz),
I = as.list(I))
ggplotly(spectra_tibble %>% unnest(c(mz, I)) %>%
filter(mz > 284 & mz < 300) %>%
filter(rt > 200 & rt< 300) %>%
ggplot() +
geom_point(aes(x = rt, y = mz, col = log10(I), size = I)) +
scale_color_viridis_c() +
theme_light())
As we have discussed in the presentation, the different metabolites present in the sample will show-up as peaks in their ion chromatograms, basically because when more molecule of a specific type are reaching the ionization source, more ions associated to that molecule are produced.
The previous map shows that the ion around m/z 295.0445 could show an interesting profile over time …
# extract the chromatogram
chr_raw <- chromatogram(fs_raw,
mz = 295.0445 + 0.01*c(-1, 1),
rt = 250 + 60*c(-1, 1))
chr_raw
## MChromatograms with 1 row and 1 column
## 1
## <Chromatogram>
## [1,] length: 87
## phenoData with 2 variables
## featureData with 4 variables
As before we can get out the data and manually plot the trace
plot(rtime(chr_raw[1,1]), intensity(chr_raw[1,1]), type = "b")
Note: the structure of the Mchromatograms is “matrix
like” because it is designed to work best with many ions and many
files.
Just to show what I mean let’s read in the data coming from two different injections
two_raw <- readMsExperiment(
spectraFiles = c("../data/wines/x016_X_QC_X_4_NEG_DDA.mzML",
"../data/wines/x020_wht_gewurz_A_1_NEG_DDA.mzML"),
msLevel. = 1, ## we read only MS1 and MS2
mode = "onDisk") ## with this parameter the data are not loaded into RAM
As before we “slice” an extracted ion chromatogram
# extract the chromatogram
chr_raw_2 <- chromatogram(two_raw,
mz = 295.0445 + 0.01*c(-1, 1),
rt = 250 + 60*c(-1, 1))
chr_raw_2
## MChromatograms with 1 row and 2 columns
## 1 2
## <Chromatogram> <Chromatogram>
## [1,] length: 87 length: 88
## phenoData with 2 variables
## featureData with 4 variables
Here we could extract the rt and intensity for the separate files,
but we can plot them with a handy xcms method
The plot can be also obtained by a specific method, which is also giving some additional info
plot(chr_raw_2)
Here we see a clear difference between the two traces, looking to the
names we see that one datafile is a QC, while the second is one
injection of a white wine (wht)… food for brains ! ;-)
Note The previous plot shows a typical characteristic of Orbitrap (and in general FT) instruments: the signal off peaks is almost invariably zero. As we will see this can be a problem in presence of missing data, since no “real” noise can be integrated there.