library(xcms)
library(tidyverse)
library(plotly)
The objective of this demo is to make you familiar with the
characteristics of LC-MS metabolomics data (both MS1 and MS2) by using
two packages belonging to the xcms
ecosystem.
Since its first release in 2006, xcms
has been steadily
growing as one of the standards for the analysis of LC(GC)-MS-(MS)
metabolomics data.
In the last years a huge effort has been made to set up a full
xcms
centric ecosystem which now allows to:
… so many thanks to the people that are working on that!
For our demo we will use a subset of a dataset on wines which have been recently acquired at the Fondazione E. Mach in Italy. The dataset has been designed for training purposes and includes MS and DDA (MS2 e MS3) (in positive and negative polarity) acquisition of a 20 wines with the following characteristics:
The acquisition sequences included blanks, pooled QC and samples. Al data were acquired with a HPLC-LTQ-Orbitrap instrument. A set of standard injections is also available to illustrate the annotation strategy.
To understand how the data behaves, let’s start reading in the data coming from one injection
raw_one <- readMSData(
files = "../data/wines/x016_X_QC_X_4_NEG_DDA.mzML",
msLevel. = c(1,2), ## we read only MS1 and MS2
mode = "onDisk") ## with this parameter the data are not loaded into RAM
Let’s look to the object
raw_one
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 0.44 Mb
## - - - Spectra data - - -
## MS level(s): 1 2
## Number of spectra: 1092
## MSn retention times: 0:01 - 12:00 minutes
## - - - Processing information - - -
## Data loaded [Mon Nov 13 17:17:34 2023]
## Filter: select MS level(s) 1 2. [Mon Nov 13 17:17:34 2023]
## MSnbase version: 2.28.1
## - - - Meta data - - -
## phenoData
## rowNames: x016_X_QC_X_4_NEG_DDA.mzML
## varLabels: sampleNames
## varMetadata: labelDescription
## Loaded from:
## x016_X_QC_X_4_NEG_DDA.mzML
## protocolData: none
## featureData
## featureNames: F1.S0001 F1.S0002 ... F1.S1637 (1092 total)
## fvarLabels: fileIdx spIdx ... spectrum (35 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
We see …
phenoData
. More on that later!featureDatra
. More on that later!It is important that along the process one can be able to visualize the raw data, so let’s give a look to the structure of the R object we created.
Let’s first get the retention times where spectra were collected
## method to extract the retention times
rtime(raw_one)[1:20]
## F1.S0001 F1.S0002 F1.S0004 F1.S0005 F1.S0007 F1.S0008 F1.S0010 F1.S0011
## 0.6727 1.0629 1.8932 2.3157 3.1473 3.5692 4.4018 4.8222
## F1.S0013 F1.S0014 F1.S0016 F1.S0017 F1.S0019 F1.S0020 F1.S0022 F1.S0023
## 5.6518 6.0737 6.9058 7.3267 8.1741 8.5963 9.4257 9.8563
## F1.S0025 F1.S0026 F1.S0028 F1.S0029
## 10.6962 11.1141 11.9442 12.3656
The variable names should be read as “F(ile)x.S(can)000y”. Since we are dealing with a DDA experiment, the data contains a combination of full scan and fragmentation spectra, so each one of the previous Scans should show either the full family of ions coming from the source, or what is produced by fragmenting a specific precursor
## this gives the type of the corresponding 10 scans
msLevel(raw_one)[1:10]
## F1.S0001 F1.S0002 F1.S0004 F1.S0005 F1.S0007 F1.S0008 F1.S0010 F1.S0011
## 1 2 1 2 1 2 1 2
## F1.S0013 F1.S0014
## 1 2
As you see, they are interleaved.
Let’s now get out the full scans
## I'm telling xcms to filter only scans acquired in "Full scan mode"
fs_raw <- raw_one %>% filterMsLevel(1)
fs_raw
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 0.23 Mb
## - - - Spectra data - - -
## MS level(s): 1
## Number of spectra: 546
## MSn retention times: 0:01 - 11:60 minutes
## - - - Processing information - - -
## Data loaded [Mon Nov 13 17:17:34 2023]
## Filter: select MS level(s) 1 2. [Mon Nov 13 17:17:34 2023]
## Filter: select MS level(s) 1. [Mon Nov 13 17:17:34 2023]
## MSnbase version: 2.28.1
## - - - Meta data - - -
## phenoData
## rowNames: x016_X_QC_X_4_NEG_DDA.mzML
## varLabels: sampleNames
## varMetadata: labelDescription
## Loaded from:
## x016_X_QC_X_4_NEG_DDA.mzML
## protocolData: none
## featureData
## featureNames: F1.S0001 F1.S0004 ... F1.S1636 (546 total)
## fvarLabels: fileIdx spIdx ... spectrum (35 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
We already now how the rt
will look like, but what about
mz
and intensity
rt <- rtime(fs_raw)
mz <- mz(fs_raw)
I <- intensity(fs_raw)
And their structure …
glimpse(mz[1:4])
## List of 4
## $ F1.S0001: num [1:367] 51.4 53.6 53.7 53.7 53.7 ...
## $ F1.S0004: num [1:346] 53.1 53.7 53.7 53.7 53.7 ...
## $ F1.S0007: num [1:385] 51.2 53.5 53.7 53.7 53.7 ...
## $ F1.S0010: num [1:382] 50.9 51.4 51.6 52.2 53.7 ...
glimpse(I[1:4])
## List of 4
## $ F1.S0001: num [1:367] 5796 4898 14954 12601 18832 ...
## $ F1.S0004: num [1:346] 5549 9187 10296 10868 9178 ...
## $ F1.S0007: num [1:385] 5259 5481 4976 17168 22202 ...
## $ F1.S0010: num [1:382] 5210 6492 5209 5071 15069 ...
So both object are lists: for every scan we have a vector of
mz
and a vector of I
: a spectrum. We have 3D
data!
plot(mz[[1]], I[[1]], type = "h")
xcms
The first spectrum can be extracted by standard list subsetting
s1 <- fs_raw[[1]]
s1
## Object of class "Spectrum1"
## Retention time: 0:01
## MSn level: 1
## Total ion count: 367
## Polarity: 0
So a method for this type of object is already in place
plot(s1)
This can be also made interactive with a little (and inelegant) trick
ggplotly(plot(s1))
This is an alternative 2D visualization of the ion map
ggplotly(tibble(rt = rt, mz = mz, I = I) %>%
unnest(c("mz","I")) %>%
filter(mz > 284 & mz < 300) %>%
filter(rt > 200 & rt < 300) %>%
ggplot() +
geom_point(aes(x = rt, y = mz, col = log10(I), size = I)) +
scale_color_viridis_c() +
theme_light())
As we have discussed in the presentation, the different metabolites present in the sample will show-up as peaks in their ion chromatograms, basically because when more molecule of a specific type are reaching the ionization source, more ions associated to that molecule are produced.
The previous map shows that the ion around m/z 295.0445 could show an interesting profile over time …
# extract the chromatogram
chr_raw <- chromatogram(fs_raw,
mz = 295.0445 + 0.01*c(-1, 1),
rt = 250 + 60*c(-1, 1))
chr_raw
## MChromatograms with 1 row and 1 column
## x016_X_QC_X_4_NEG_DDA.mzML
## <Chromatogram>
## [1,] length: 87
## phenoData with 1 variables
## featureData with 5 variables
As before we can get out the data and manually plot the trace
plot(rtime(chr_raw[1,1]), intensity(chr_raw[1,1]), type = "b")
Note: the structure of the Mchromatograms
is “matrix
like” because it is designed to work best with many ions and many
files.
Just to show what I mean let’s read in the data coming from two different injections
two_raw <- readMSData(
files = c("../data/wines/x016_X_QC_X_4_NEG_DDA.mzML",
"../data/wines/x020_wht_gewurz_A_1_NEG_DDA.mzML"),
msLevel. = 1, ## we read only MS1 and MS2
mode = "onDisk") ## with this parameter the data are not loaded into RAM
As before we “slice” an extracted ion chromatogram
# extract the chromatogram
chr_raw_2 <- chromatogram(two_raw,
mz = 295.0445 + 0.01*c(-1, 1),
rt = 250 + 60*c(-1, 1))
chr_raw_2
## MChromatograms with 1 row and 2 columns
## x016_X_QC_X_4_NEG_DDA.mzML x020_wht_gewurz_A_1_NEG_DDA.mzML
## <Chromatogram> <Chromatogram>
## [1,] length: 87 length: 88
## phenoData with 1 variables
## featureData with 5 variables
Here we could extract the rt and intensity for the separate files,
but we can plot them with a handy xcms
method
The plot can be also obtained by a specific method, which is also giving some additional info
plot(chr_raw_2)
Here we see a clear difference between the two traces, looking to the
names we see that one datafile is a QC, while the second is one
injection of a white wine (wht
)… food for brains ! ;-)
Note The previous plot shows a typical characteristic of Orbitrap (and in general FT) instruments: the signal off peaks is almost invariably zero. As we will see this can be a problem in presence of missing data, since no “real” noise can be integrated there.