Rt Correction and Feature definition

library(xcms)
library(tidyverse)
library(plotly)

Introduction

In the previous demo we have been dealing with peak picking, so we assume that we have now a list of chromatographic peaks which have been detected across the full set of samples.

The next step in the process is now to “match” these list of peaks in a consensus list of features which will be the variables which will be present in the final data matrix.

This process of matching is necessary to compensate:

measurement error in the m/z dimension
retention time shifts

In general, the first phenomenon is less important than the second: when you buy a very expensive mass spectrometer, the instrument producer will do its best to keep mass accuracy as high as possible. On that dimension you, basically, do binning. Chromatographic stability, on the other, end is by far less easy to control.

We start reading in the pick picked data:

load("wines.RData")

We already know that our dataset contains blank injections. They could be used to identify an exclusion list, but for now let’s take them out:

## identify non blank indices
no_blank_id <- which(raw_data$variety != "blank")
## subset the data with a specific xcms function
raw_data <- filterFile(raw_data, file = no_blank_id)
## check the metadata
pData(raw_data)

## # A tibble: 26 × 9
##    filepath              fname inj_ord color variety bottle rep   polarity mode 
##    <chr>                 <chr> <chr>   <chr> <chr>   <chr>  <chr> <chr>    <chr>
##  1 ../data/wines/x002_X… x002… x002    X     QC      X      1     NEG      DDA  
##  2 ../data/wines/x003_X… x003… x003    X     QC      X      2     NEG      DDA  
##  3 ../data/wines/x004_r… x004… x004    red   sangio  B      1     NEG      DDA  
##  4 ../data/wines/x005_r… x005… x005    red   ptnoir  A      1     NEG      DDA  
##  5 ../data/wines/x006_r… x006… x006    red   merlot  A      1     NEG      DDA  
##  6 ../data/wines/x007_w… x007… x007    wht   vermen  A      1     NEG      DDA  
##  7 ../data/wines/x008_r… x008… x008    red   cannon  A      1     NEG      DDA  
##  8 ../data/wines/x009_w… x009… x009    wht   chardo  B      1     NEG      DDA  
##  9 ../data/wines/x010_w… x010… x010    wht   chardo  A      1     NEG      DDA  
## 10 ../data/wines/x011_r… x011… x011    red   lagrei  B      1     NEG      DDA  
## # ℹ 16 more rows

Let’s now visualize the amount of retention time shift looking to the ionic trace of compound which should be present in almost all samples:

# extract the chromatogram
chr_raw <- chromatogram(raw_data, 
                        mz = 295.0445 + 0.01*c(-1, 1), 
                        rt = 250 + 20*c(-1, 1),
                        include = "none")

The include = "none" argument is used to force xcms to read the raw data, the same function can indeed be used to extract the EIC only of the ions which show a chromatrographic peak in that interval.

Find the CP in the slice:

sub_peaks <- chromPeaks(raw_data) %>% 
  as_tibble() %>% 
  filter(between(mz, 295.03,295.05)) %>% 
  filter(between(rt, 230,270))

library(RColorBrewer)
group_colors <- paste0(brewer.pal(3, "Set1"), "60")
names(group_colors) <- c("red","wht","X")
plot(chr_raw, col = group_colors[raw_data$color])
legend("topright", legend = c("red","white", "X"), col = group_colors, lty = 1)
abline(v = sub_peaks$rt, col = "red", lty = 3)

Here we have a tricky situation. The peaks are not aligned, even if the shift is really small, moreover this is actually a double feature, which is also potential biomarker for color and variety … so we should go back to peak detection and play around ;-) to be able to identify the two peaks.

Anyway, let’s try to correct the RT shift. As usual in xcms this can be done in different ways, here we will use the most simple solution (a potentially more flexible one based on dynamic time warping is also available).

What we will do is to apply this algorithm on the QCs and the extrapolate the estimated RT correction on the samples. The rationale behind this choice is the following: samples could be chemically different so it could be in principle possible to misinterpret a chemical difference (which produces a difference in RT) with a retention time shift of analytical origin.

If I use QCs to do that the chemical nature of the samples is the same so every difference is coming from analytical drifts!

The idea behind retention time correction is quite simple:

match groups of chromatographic peaks present in all samples (called house keeping peak groups)
estimate a consensus retention time
extrapolate/interpolate the retention time on regions of the RT axes where the peaks are not present (either with a linear or a loess function)

The first thing to do is to identify the house keeping groups:

The parameters here are:

sampleGroups: some peak could be missing in some sample, so a house keeping group could include less peaks than samples. The way to control that is to identify groups of samples and consider as good groups only those detected in
minFraction of at least one group
binSize is the width of the m/z slices
bw a density estimator is used to find the “correct” position of the peak inalong the RT dimension. This is the bandwith in seconds of the density estimator.

raw_data <- groupChromPeaks(
  raw_data, 
  param = PeakDensityParam(sampleGroups = raw_data$color,
                           minFraction = 2/3,           
                           binSize = 0.02,  ## width of the slices
                           bw = 3))  ## bandwidth of the density used to find the consensus position

After this grouping, the retention time correction is performed working:

raw_data <- adjustRtime(
  raw_data, 
  param = PeakGroupsParam(minFraction = 1,
                          subset = which(raw_data$variety == "QC"), span = 0.4))

## Performing retention time correction using 629 peak groups.

## Aligning sample number 3 against subset ... OK
## Aligning sample number 4 against subset ... OK
## Aligning sample number 5 against subset ... OK
## Aligning sample number 6 against subset ... OK
## Aligning sample number 7 against subset ... OK
## Aligning sample number 8 against subset ... OK
## Aligning sample number 9 against subset ... OK
## Aligning sample number 10 against subset ... OK
## Aligning sample number 11 against subset ... OK
## Aligning sample number 12 against subset ... OK
## Aligning sample number 15 against subset ... OK
## Aligning sample number 16 against subset ... OK
## Aligning sample number 17 against subset ... OK
## Aligning sample number 18 against subset ... OK
## Aligning sample number 19 against subset ... OK
## Aligning sample number 20 against subset ... OK
## Aligning sample number 21 against subset ... OK
## Aligning sample number 22 against subset ... OK
## Aligning sample number 23 against subset ... OK
## Aligning sample number 24 against subset ... OK
## Applying retention time adjustment to the identified chromatographic peaks ... OK

plotAdjustedRtime(raw_data, col = group_colors[raw_data$color])

So the retention time correction here really small, speaking of a remarkable analytical reproducibility. In general the reproducibility of the chromatography is considered good if the required shift is smaller than the typical chromatographic width. This is anyway a rule of thumb since everything is very much dependent on the analytical method. The black dots show the position of the housekeeping groups which were used to estimate the extent of retention time correction on the QCs.

After retention time correction is now time to move from chromatographic peaks to features. With this term we define a set of consensus groups of peaks which can be used reliably to measure the intensity of a given ion across all samples. If you look to the overall pre-processing from a distant perspective, what we have been basically doing is to try to compensate in a clever and intelligent way for mz inaccuracies and retention time shifts, and the core of the riddle is that the samples are actually (and luckily different).

Features are obtained by a further grouping of the chromatographic peaks after retention time correction:

raw_data <- groupChromPeaks(
  raw_data, 
  param = PeakDensityParam(sampleGroups = raw_data$color,
                           minFraction = 2/3,           
                           binSize = 0.02,  ## width of the slices
                           bw = 3))  ## bandwidth of the density used to find the consensus position

Let’s mow look top their definition

## extract them from the grouped 
features <- featureDefinitions(raw_data)
head(features)

## DataFrame with 6 rows and 12 columns
##           mzmed     mzmin     mzmax     rtmed     rtmin     rtmax    npeaks
##       <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## FT001   56.5699   56.5698   56.5699  274.4321  273.0029  275.6861         8
## FT002   59.0140   59.0139   59.0141   52.5710   51.2949   53.7408        17
## FT003   65.6779   65.6779   65.6780   61.2852   60.2421   68.9642        10
## FT004   65.6779   65.6779   65.6779   79.8001   78.4873   80.2757         9
## FT005   68.4565   68.4564   68.4565  182.2209  179.7875  183.5496        17
## FT006   68.4565   68.4564   68.4565  163.1186  161.3699  171.1034        10
##             red       wht         X            peakidx  ms_level
##       <numeric> <numeric> <numeric>             <list> <integer>
## FT001         8         0         0 1875,2539,3298,...         1
## FT002         6         6         5  405, 730,1460,...         1
## FT003         9         0         0 1546,2883,4377,...         1
## FT004         9         0         0 1545,2882,4376,...         1
## FT005         3         8         6  292,1010,1713,...         1
## FT006         2         2         6  293,1011,1714,...         1

The df is quite rich:

rownames are the feature id
mz* refer to the boundaries of the mz of the chromatographic peaks which were grouped in that feature
rt* is the same byt for retention time
npeaks tell us how many chromatographic peaks were assigned to this feature
the subsequent columns are gicing you the same info split by sample group
peakidx contains the index of peaks which were assigned to this feature

Let’s dig more on this. For example, can we visualize the position of the peaks assigned to FT005. The idx are the indexes of the row in the peak table:

idx_F005 <- features["FT005","peakidx"][[1]]

Let’s plot them across the retention time:

chromPeaks(raw_data) %>% 
  as_tibble() %>% 
  slice(idx_F005) %>% 
  ggplot() + 
  geom_density(aes(x = rt), bw = 3) + 
  geom_rug(aes(x = rt)) +
  geom_vline(xintercept = features["FT005","rtmed"], col = "red", lty = 2) + 
  xlim(140, 200) + 
  theme_bw()

Here the rug show the position of the chromatigraphic peaks, the line is the density profile (not a chromatrographic trace!) and the red line is our best “estimate” of the true position of the feature

Ok now that we have the definition of the features, let’s get the final data matrix! Mind the gap! We will need to transpose it to get it into the standard R shape.

DM <- featureValues(raw_data, value = "into") 
head(DM)

##       x002_X_QC_X_1_NEG_DDA.mzML x003_X_QC_X_2_NEG_DDA.mzML
## FT001                         NA                         NA
## FT002                   986147.4                   975746.5
## FT003                         NA                         NA
## FT004                         NA                         NA
## FT005                   947277.0                   802055.4
## FT006                   438418.3                   401059.7
##       x004_red_sangio_B_1_NEG_DDA.mzML x005_red_ptnoir_A_1_NEG_DDA.mzML
## FT001                         716179.8                          1131380
## FT002                         708724.8                               NA
## FT003                         475746.8                               NA
## FT004                         981039.5                               NA
## FT005                        1505677.8                               NA
## FT006                         614215.3                               NA
##       x006_red_merlot_A_1_NEG_DDA.mzML x007_wht_vermen_A_1_NEG_DDA.mzML
## FT001                          1035399                               NA
## FT002                           987357                               NA
## FT003                           717797                               NA
## FT004                          1070249                               NA
## FT005                               NA                          1066122
## FT006                               NA                               NA
##       x008_red_cannon_A_1_NEG_DDA.mzML x009_wht_chardo_B_1_NEG_DDA.mzML
## FT001                               NA                               NA
## FT002                         735192.5                               NA
## FT003                         332599.6                               NA
## FT004                         685215.9                               NA
## FT005                               NA                          1280785
## FT006                               NA                               NA
##       x010_wht_chardo_A_1_NEG_DDA.mzML x011_red_lagrei_B_1_NEG_DDA.mzML
## FT001                               NA                        1674083.6
## FT002                         631526.5                               NA
## FT003                               NA                         266367.1
## FT004                               NA                         726500.5
## FT005                        1409850.9                               NA
## FT006                               NA                               NA
##       x012_wht_moscat_A_1_NEG_DDA.mzML x013_wht_ptgrig_C_1_NEG_DDA.mzML
## FT001                               NA                               NA
## FT002                               NA                               NA
## FT003                               NA                               NA
## FT004                               NA                               NA
## FT005                          1324338                         900748.1
## FT006                               NA                               NA
##       x015_X_QC_X_3_NEG_DDA.mzML x016_X_QC_X_4_NEG_DDA.mzML
## FT001                         NA                         NA
## FT002                         NA                  1127542.5
## FT003                         NA                         NA
## FT004                         NA                         NA
## FT005                   952594.7                   794963.6
## FT006                   493059.5                   513929.8
##       x017_wht_ptgrig_B_1_NEG_DDA.mzML x018_red_ptnoir_B_1_NEG_DDA.mzML
## FT001                               NA                               NA
## FT002                         759701.4                        1010649.1
## FT003                               NA                         510139.4
## FT004                               NA                        1893196.4
## FT005                        1751762.1                        1680198.6
## FT006                         939508.5                         627432.0
##       x019_wht_sauvig_A_1_NEG_DDA.mzML x020_wht_gewurz_A_1_NEG_DDA.mzML
## FT001                               NA                               NA
## FT002                         895189.3                         924981.7
## FT003                               NA                               NA
## FT004                               NA                               NA
## FT005                        2058863.1                               NA
## FT006                               NA                               NA
##       x021_red_merlot_B_1_NEG_DDA.mzML x022_red_frappa_A_1_NEG_DDA.mzML
## FT001                        1721708.3                        1038064.2
## FT002                        1036225.4                        1173011.5
## FT003                         460387.9                         310784.4
## FT004                        1679054.1                         666971.4
## FT005                               NA                        1134783.9
## FT006                               NA                               NA
##       x023_wht_ptgrig_A_1_NEG_DDA.mzML x024_red_sangio_A_1_NEG_DDA.mzML
## FT001                               NA                         746794.4
## FT002                         667977.4                               NA
## FT003                               NA                        1315892.3
## FT004                               NA                        2223661.4
## FT005                               NA                               NA
## FT006                               NA                               NA
##       x025_red_lagrei_A_1_NEG_DDA.mzML x026_wht_chardo_C_1_NEG_DDA.mzML
## FT001                        2247568.5                               NA
## FT002                               NA                         743351.6
## FT003                         294771.2                               NA
## FT004                         629777.6                               NA
## FT005                               NA                         963538.4
## FT006                               NA                         291901.9
##       x028_X_QC_X_5_NEG_DDA.mzML x029_X_QC_X_6_NEG_DDA.mzML
## FT001                         NA                         NA
## FT002                   892491.9                   899920.7
## FT003                         NA                         NA
## FT004                         NA                         NA
## FT005                   847438.4                   791870.4
## FT006                   529113.8                   443991.4

Well … we see a lot of NAs there … Recalling what we discussed in the lecture, NAs can be there for two main reasons:

a compound is below the detection limit in one or more samples. No compounds, no features, no party …
our peak picking could have been mistaken somewhere

Anyway, xcms implement a clever strategy to try to fix, at least partially, the NAs. The function fillChromPeaks will take care of going back to the raw files and integrate the signal possibly present in the mz/rt region of each feature. If the peak was there and was missed by chance this will basically recover the lost signal.

raw_data_filled <- fillChromPeaks(raw_data)

## Defining peak areas for filling-in .... OK
## Start integrating peak areas from original files
## Requesting 146 peaks from x029_X_QC_X_6_NEG_DDA.mzML ... got 144.
## 
## Requesting 291 peaks from x012_wht_moscat_A_1_NEG_DDA.mzML ... got 154.
## Requesting 281 peaks from x013_wht_ptgrig_C_1_NEG_DDA.mzML ... got 153.
## Requesting 144 peaks from x015_X_QC_X_3_NEG_DDA.mzML ... got 140.
## Requesting 151 peaks from x016_X_QC_X_4_NEG_DDA.mzML ... got 149.
## Requesting 308 peaks from x017_wht_ptgrig_B_1_NEG_DDA.mzML ... got 186.
## 
## Requesting 144 peaks from x002_X_QC_X_1_NEG_DDA.mzML ... got 139.
## Requesting 133 peaks from x003_X_QC_X_2_NEG_DDA.mzML ... got 133.
## Requesting 240 peaks from x004_red_sangio_B_1_NEG_DDA.mzML ... got 203.
## Requesting 325 peaks from x005_red_ptnoir_A_1_NEG_DDA.mzML ... got 236.
## Requesting 201 peaks from x006_red_merlot_A_1_NEG_DDA.mzML ... got 173.
## 
## Requesting 230 peaks from x018_red_ptnoir_B_1_NEG_DDA.mzML ... got 192.
## Requesting 316 peaks from x019_wht_sauvig_A_1_NEG_DDA.mzML ... got 204.
## Requesting 299 peaks from x020_wht_gewurz_A_1_NEG_DDA.mzML ... got 158.
## Requesting 240 peaks from x021_red_merlot_B_1_NEG_DDA.mzML ... got 207.
## Requesting 234 peaks from x022_red_frappa_A_1_NEG_DDA.mzML ... got 189.
## 
## Requesting 257 peaks from x023_wht_ptgrig_A_1_NEG_DDA.mzML ... got 147.
## Requesting 323 peaks from x024_red_sangio_A_1_NEG_DDA.mzML ... got 239.
## Requesting 368 peaks from x025_red_lagrei_A_1_NEG_DDA.mzML ... got 193.
## Requesting 242 peaks from x026_wht_chardo_C_1_NEG_DDA.mzML ... got 147.
## Requesting 130 peaks from x028_X_QC_X_5_NEG_DDA.mzML ... got 123.
## 
## Requesting 296 peaks from x007_wht_vermen_A_1_NEG_DDA.mzML ... got 190.
## Requesting 234 peaks from x008_red_cannon_A_1_NEG_DDA.mzML ... got 201.
## Requesting 292 peaks from x009_wht_chardo_B_1_NEG_DDA.mzML ... got 181.
## Requesting 264 peaks from x010_wht_chardo_A_1_NEG_DDA.mzML ... got 144.
## Requesting 376 peaks from x011_red_lagrei_B_1_NEG_DDA.mzML ... got 208.

DM <- featureValues(raw_data_filled, value = "into") 
head(DM)

##       x002_X_QC_X_1_NEG_DDA.mzML x003_X_QC_X_2_NEG_DDA.mzML
## FT001                   360508.9                  353756.56
## FT002                   986147.4                  975746.51
## FT003                   112981.2                   23360.64
## FT004                   527294.7                  588971.93
## FT005                   947277.0                  802055.38
## FT006                   438418.3                  401059.72
##       x004_red_sangio_B_1_NEG_DDA.mzML x005_red_ptnoir_A_1_NEG_DDA.mzML
## FT001                         716179.8                       1131380.07
## FT002                         708724.8                        138880.65
## FT003                         475746.8                         69369.56
## FT004                         981039.5                        344743.51
## FT005                        1505677.8                        423297.72
## FT006                         614215.3                        164831.39
##       x006_red_merlot_A_1_NEG_DDA.mzML x007_wht_vermen_A_1_NEG_DDA.mzML
## FT001                        1035399.5                               NA
## FT002                         987357.0                         454742.3
## FT003                         717797.0                               NA
## FT004                        1070248.7                               NA
## FT005                          66347.0                        1066122.2
## FT006                         289532.3                         398657.3
##       x008_red_cannon_A_1_NEG_DDA.mzML x009_wht_chardo_B_1_NEG_DDA.mzML
## FT001                         110110.3                               NA
## FT002                         735192.5                         308628.3
## FT003                         332599.6                               NA
## FT004                         685215.9                               NA
## FT005                         526072.9                        1280785.3
## FT006                         142918.2                         273601.2
##       x010_wht_chardo_A_1_NEG_DDA.mzML x011_red_lagrei_B_1_NEG_DDA.mzML
## FT001                               NA                       1674083.64
## FT002                         631526.5                         88090.12
## FT003                               NA                        266367.13
## FT004                               NA                        726500.53
## FT005                        1409850.9                         24040.09
## FT006                         249928.5                               NA
##       x012_wht_moscat_A_1_NEG_DDA.mzML x013_wht_ptgrig_C_1_NEG_DDA.mzML
## FT001                               NA                               NA
## FT002                          1119334                         724985.1
## FT003                               NA                               NA
## FT004                               NA                               NA
## FT005                          1324338                         900748.1
## FT006                               NA                         139998.1
##       x015_X_QC_X_3_NEG_DDA.mzML x016_X_QC_X_4_NEG_DDA.mzML
## FT001                  579689.22                  273061.99
## FT002                  575473.90                 1127542.49
## FT003                   43828.02                   77739.66
## FT004                  330076.75                  525037.37
## FT005                  952594.73                  794963.58
## FT006                  493059.49                  513929.76
##       x017_wht_ptgrig_B_1_NEG_DDA.mzML x018_red_ptnoir_B_1_NEG_DDA.mzML
## FT001                               NA                         389920.3
## FT002                         759701.4                        1010649.1
## FT003                               NA                         510139.4
## FT004                               NA                        1893196.4
## FT005                        1751762.1                        1680198.6
## FT006                         939508.5                         627432.0
##       x019_wht_sauvig_A_1_NEG_DDA.mzML x020_wht_gewurz_A_1_NEG_DDA.mzML
## FT001                               NA                               NA
## FT002                         895189.3                         924981.7
## FT003                               NA                               NA
## FT004                               NA                               NA
## FT005                        2058863.1                               NA
## FT006                         536513.6                               NA
##       x021_red_merlot_B_1_NEG_DDA.mzML x022_red_frappa_A_1_NEG_DDA.mzML
## FT001                       1721708.27                        1038064.2
## FT002                       1036225.43                        1173011.5
## FT003                        460387.90                         310784.4
## FT004                       1679054.14                         666971.4
## FT005                        120163.61                        1134783.9
## FT006                         88149.44                         234804.2
##       x023_wht_ptgrig_A_1_NEG_DDA.mzML x024_red_sangio_A_1_NEG_DDA.mzML
## FT001                               NA                         746794.4
## FT002                        667977.44                         439009.6
## FT003                               NA                        1315892.3
## FT004                               NA                        2223661.4
## FT005                        130263.78                         304219.6
## FT006                         74391.78                         288818.9
##       x025_red_lagrei_A_1_NEG_DDA.mzML x026_wht_chardo_C_1_NEG_DDA.mzML
## FT001                        2247568.5                               NA
## FT002                         127789.9                         743351.6
## FT003                         294771.2                               NA
## FT004                         629777.6                               NA
## FT005                               NA                         963538.4
## FT006                               NA                         291901.9
##       x028_X_QC_X_5_NEG_DDA.mzML x029_X_QC_X_6_NEG_DDA.mzML
## FT001                   399528.8                   304736.0
## FT002                   892491.9                   899920.7
## FT003                   121386.2                   103181.5
## FT004                   422484.7                   485913.5
## FT005                   847438.4                   791870.4
## FT006                   529113.8                   443991.4

The definition of each feature can be extracted with a specific function:

feat <- featureDefinitions(raw_data_filled)
feat

## DataFrame with 734 rows and 12 columns
##           mzmed     mzmin     mzmax     rtmed     rtmin     rtmax    npeaks
##       <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## FT001   56.5699   56.5698   56.5699  274.4321  273.0029  275.6861         8
## FT002   59.0140   59.0139   59.0141   52.5710   51.2949   53.7408        17
## FT003   65.6779   65.6779   65.6780   61.2852   60.2421   68.9642        10
## FT004   65.6779   65.6779   65.6779   79.8001   78.4873   80.2757         9
## FT005   68.4565   68.4564   68.4565  182.2209  179.7875  183.5496        17
## ...         ...       ...       ...       ...       ...       ...       ...
## FT730   657.090   657.090   657.091   144.041   138.149   147.620        16
## FT731   657.090   657.090   657.091   252.483   250.670   253.659        12
## FT732   657.091   657.090   657.091   219.358   216.425   221.860        10
## FT733   657.090   657.090   657.091   132.686   125.768   137.218        12
## FT734   879.164   879.163   879.165   212.744   211.804   214.443        13
##             red       wht         X            peakidx  ms_level
##       <numeric> <numeric> <numeric>             <list> <integer>
## FT001         8         0         0 1875,2539,3298,...         1
## FT002         6         6         5  405, 730,1460,...         1
## FT003         9         0         0 1546,2883,4377,...         1
## FT004         9         0         0 1545,2882,4376,...         1
## FT005         3         8         6  292,1010,1713,...         1
## ...         ...       ...       ...                ...       ...
## FT730         8         0         6  384,1095,1808,...         1
## FT731         6         0         6  412,1137,2511,...         1
## FT732         7         0         3  382,1096,1809,...         1
## FT733         7         0         4  383,1097,1810,...         1
## FT734         7         0         6  356,1075,1791,...         1

The situation improved, but it is not yet perfect.

We now save the output which will be used for the subsequent statistical analysis and compound annotation:

save(raw_data_filled, DM, feat, file = "processed_wines_data.RData")

Checking Feature chromatograms

A typical thing one wants to DO at some point in the analysis is to check if the chromatographic peaks of a really interesting feature are really there …

In order to do that xcms provides a really handy function:

ft_chr1 <- featureChromatograms(raw_data, 
                                features = "FT005", 
                                expandRt = 15, 
                                filled = FALSE)

Now we plot it highlighting the integrated areas and the class of the samples:

sample_colors <- group_colors[raw_data$color]
plot(ft_chr1, col = group_colors[raw_data$color],
     peakBg = sample_colors[chromPeaks(ft_chr1)[, "sample"]])

That can be used to spot problems of integration, and, also, if a biomarker is really a biomarker ;-)

DIY

At the beginning of this demo we have been focussing our attention to the extracted ion trace around mz 289. Try to check how many features were defined there. Can you check if there the peak was integrated well?
If you go back to the feature tables, you will see that in some cases the number of peaks assigned to a features can be larger than the number of samples. How can this be? Try look to the extracted ion traces and understand what happens (ADVANCED)

Feature Quality Assessment

As we stressed along the full lecture, the best way to validate the outcomes of your analysis is to go back to the raw data and inspect the extracted ion traces. However it would be useful to get an overall feedback on “quality” of each feature.

xcms already implements some useful tool to do that:

head(featureSummary(raw_data))

##       count     perc multi_count multi_perc       rsd
## FT001     8 30.76923           0    0.00000 0.4176693
## FT002    17 65.38462           0    0.00000 0.1798341
## FT003     9 34.61538           1   11.11111 0.6343434
## FT004     9 34.61538           0    0.00000 0.5146304
## FT005    17 65.38462           0    0.00000 0.3213700
## FT006    10 38.46154           0    0.00000 0.3302743

Here we see:

counts: number of chromatographic peaks associated to each feature
perc: percentage of samples in which a peak was found
multi_count: how many multiple peaks you have in the data?
relative standard deviation

This can be also split by sample class:

sample_vs_qc <- ifelse(raw_data$variety == "QC","QC","S")

featureSummary(raw_data, group = sample_vs_qc, skipFilled = FALSE) %>% 
  as_tibble() %>% 
  arrange(desc(QC_rsd))

## # A tibble: 734 × 15
##    count  perc multi_count multi_perc   rsd QC_count QC_perc QC_multi_count
##    <dbl> <dbl>       <dbl>      <dbl> <dbl>    <dbl>   <dbl>          <dbl>
##  1    15  57.7           2       13.3 0.907        4    66.7              0
##  2    18  69.2           4       22.2 0.928        5    83.3              1
##  3    12  46.2           3       25   0.883        4    66.7              1
##  4    15  57.7           3       20   0.693        4    66.7              1
##  5    15  57.7           3       20   0.531        6   100                1
##  6    20  76.9          13       65   0.543        4    66.7              4
##  7    14  53.8           2       14.3 0.970        4    66.7              0
##  8     9  34.6           1       11.1 0.621        4    66.7              1
##  9    26 100            25       96.2 0.490        6   100                6
## 10    12  46.2           0        0   0.577        5    83.3              0
## # ℹ 724 more rows
## # ℹ 7 more variables: QC_multi_perc <dbl>, QC_rsd <dbl>, S_count <dbl>,
## #   S_perc <dbl>, S_multi_count <dbl>, S_multi_perc <dbl>, S_rsd <dbl>

This useful to filter features with higher variability in the QC (more than in the samples). But …

marker of variety can suffer of dilution effects on the pooled QC
variety specific QC! … but then the number of injections explodes

Rt Correction and Feature definition

Pietro Franceschi

2023-11-14

Introduction

Checking Feature chromatograms

DIY

Feature Quality Assessment