msalign

Align peaks in signal to reference peaks

Description

IntensitiesOut = msalign(X,Intensities,RefX) aligns the peaks in raw, noisy signal data, represented by Intensities and X, to reference peaks, provided by RefX.

IntensitiesOut = msalign(X,Intensities,RefX,Name,Value) modifies the behavior of msalign using one or more Name=Value arguments. For example, you can have msalign take more iterations than the default five by specifying Iterations=10.

[IntensitiesOut,RefXOut] = msalign(X,Intensities,RefX,Name,Value) also returns RefXOut, a vector of separation-unit values to use as reference masses for aligning the peaks. RefXOut differs from RefX only when the Group name-value argument is true.

Examples

Load the sample_lo_res file, which is included with the toolbox.

load sample_lo_res

Set the markers vector and the weights vector.

R = [3991.4 4598 7964 9160];
W = [60 100 60 100];

Display a color image of the mass spectra before alignment.

msheatmap(MZ_lo_res,Y_lo_res,'markers',R,'range',[3000 10000])
title('before alignment')

[Figure: heat map titled "before alignment"; x-axis: Mass/Charge (M/Z), y-axis: Spectrogram Indices]

Align spectra with reference masses and display a color image of mass spectra after alignment.

YA = msalign(MZ_lo_res,Y_lo_res,R,'weights',W);
msheatmap(MZ_lo_res,YA,'markers',R,'range',[3000 10000])
title('after alignment')

[Figure: heat map titled "after alignment"; x-axis: Mass/Charge (M/Z), y-axis: Spectrogram Indices]

After alignment, the peaks at the marked reference masses line up vertically across the spectra in the heat map.

If you have only one reference peak in your data, do not use the msalign function. Instead, use the following procedure, which shifts, but does not scale, the X input vector.

Load the sample_lo_res data and view the first sample spectrum.

load sample_lo_res
MZ = MZ_lo_res;
Y = Y_lo_res(:,1);
msviewer(MZ,Y)

[Figure: Mass Spectra Viewer window showing the first sample spectrum]

Use the tall peak around 4000 m/z as the reference peak. To determine the reference peak's m/z value, click the zoom icon, and then click and drag to zoom in on the peak. Right-click in the center of the peak, and then click Add Marker to label the peak with its m/z value.

[Figure: Mass Spectra Viewer zoomed in on the reference peak, with a marker at its m/z value]

Shift a spectrum by the difference between RP, the known reference mass of 4000 m/z, and SP, the experimental mass of 4051.14 m/z.

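RP = 4000;      % known reference mass (m/z)
SP = 4051.14;   % experimental mass of the reference peak (m/z)
% Resample Y so that the intensity measured at SP appears at RP.
YOut = interp1(MZ, Y, MZ-(RP-SP));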

Plot the original spectrum in red and the shifted spectrum in blue and zoom in on the reference peak.

plot(MZ,Y,'r',MZ,YOut,'b:')
xlabel('Mass/Charge (M/Z)')
ylabel('Relative Intensity')
legend('Y','YOut')
axis([3600 4800 -2 60])

[Figure: line plot of Y and YOut; x-axis: Mass/Charge (M/Z), y-axis: Relative Intensity]
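The shift-only procedure above can be checked on synthetic data without loading the sample file. The following is a minimal sketch: the m/z grid, the peak shape, and the rounded value SP = 4051 are hypothetical and chosen only so that the result is easy to inspect.

```matlab
% Synthetic spectrum: one Gaussian peak centered at the measured mass SP.
MZ = (3600:0.5:4800)';                 % hypothetical m/z grid
RP = 4000;                             % known reference mass
SP = 4051;                             % measured mass (rounded for the sketch)
Y  = 50*exp(-0.5*((MZ - SP)/10).^2);   % hypothetical peak shape

% Shift only: evaluate Y at MZ-(RP-SP), so the value at SP lands on RP.
% Query points that fall outside the range of MZ return NaN.
YOut = interp1(MZ, Y, MZ - (RP - SP));

% Locate the peak before and after the shift (max ignores NaN values).
[~, iBefore] = max(Y);
[~, iAfter]  = max(YOut);
peakBefore = MZ(iBefore);
peakAfter  = MZ(iAfter);
fprintf('Peak moved from %.1f to %.1f m/z\n', peakBefore, peakAfter);
```

Because interp1 does not extrapolate by default, the shifted spectrum contains NaN values at one end of the m/z range. Trim or pad that region if later processing cannot handle NaN.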

Input Arguments

X

Separation-unit values for a set of signals with peaks, specified as a real vector. The number of elements in X equals the number of rows in the matrix Intensities. The separation unit can quantify wavelength, frequency, distance, time, or m/z, depending on the instrument that generates the signal data.

Data Types: double

Intensities

Intensity values for a set of peaks that share the same separation-unit range, specified as a real matrix. Each row corresponds to a separation-unit value, and each column corresponds to either a set of signals with peaks or a retention time. The number of rows equals the number of elements in vector X.

Data Types: double

RefX

Separation-unit values of known reference masses in a sample signal, specified as a real vector.

For reference peaks, select compounds that are not expected to have significant shifts among the different signals. For example, in mass spectrometry, select compounds that do not undergo structural transformation, such as phosphorylation. Doing so increases the accuracy of your alignment and lets you detect compounds that exhibit structural transformations among the sample signal.

Data Types: double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: IntensitiesOut = msalign(X,Intensities,RefX,Iterations=10)

GridSteps

Number of steps for the search grid, specified as a positive integer. At every iteration, the search area is divided by GridSteps^2.

Example: 15

Data Types: double

Group

Flag to create a nondefault RefXOut, specified as false (do not create a nondefault vector) or true (create a nondefault vector). When true, msalign creates RefXOut by adjusting the values in RefX, based on the sample data from multiple signals in Intensities, such that the overall shifting and scaling of the peaks is minimized.

Set Group to true only if Intensities contains data for a large number of signals, and you are not confident of the separation-unit values used for your reference peaks in RefX. Leave Group set to false if you are confident of the separation-unit values used for your reference peaks in RefX.

Example: true

Data Types: logical

Iterations

Number of refining iterations, specified as a positive integer. At every iteration, the search grid is scaled down to improve the estimates.

Example: 10

Data Types: double
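As a rough illustration of how GridSteps and Iterations interact, the following sketch applies the rule stated for GridSteps (the search area is divided by GridSteps^2 at every iteration) to a search area normalized to 1. The variable names and the value of 15 are illustrative only. This is arithmetic on the documented rule, not a view into the msalign implementation.

```matlab
gridSteps  = 15;   % illustrative value
iterations = 5;    % default number of iterations stated in the description

% Search area relative to the initial area after k refinements,
% assuming each iteration divides the area by gridSteps^2.
k = 0:iterations;
relativeArea = gridSteps.^(-2*k);

for i = 1:numel(k)
    fprintf('after %d iteration(s): relative area = %.3g\n', k(i), relativeArea(i));
end
```

Under this assumption, raising either value narrows the final search area quickly, so a few iterations with a moderate grid are often enough to refine the estimates.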

Range

Range limits, specified as a two-element vector. The range limits are in separation units, relative to each peak. No peak shifts beyond these limits.

Use these values to tune the robustness of the algorithm. Ideally, you should keep the range within the maximum expected shift. If you try to correct larger shifts by increasing the limits, you increase the possibility of picking incorrect peaks to align to the reference masses.

Example: [-50 200]

Data Types: double

Rescaling

Flag to control the rescaling of X, specified as true or false. When false, the output signal is aligned only to the reference peaks by using constant shifts. By default, msalign estimates a rescaling factor, unless RefX contains only one reference peak.

Example: false

Data Types: logical

SearchSpace

Search space type, specified as "regular" or "latin".

  • "regular" — Evenly spaced lattice

  • "latin" — Random Latin hypercube with GridSteps^2 samples.

Example: "latin"

Data Types: char | string
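The two search-space types can be pictured with a small sketch. It builds a regular lattice and a random Latin hypercube, each with GridSteps^2 points, over a hypothetical two-parameter space (for example, shift and scale) normalized to the unit square. How msalign maps such points onto its actual search range is not described here, so treat the normalization, the construction, and the variable names as assumptions.

```matlab
gridSteps = 5;               % illustrative value
n = gridSteps^2;             % number of sample points in both designs

% "regular": evenly spaced lattice, gridSteps points along each axis.
g = linspace(0, 1, gridSteps);
[u, v] = ndgrid(g, g);
regularPts = [u(:), v(:)];

% "latin": one random point in each of n equal-width strata per axis,
% with the strata paired at random between the two axes.
latinPts = zeros(n, 2);
for d = 1:2
    strata = randperm(n)';                     % random stratum order
    latinPts(:, d) = (strata - rand(n, 1)) / n;
end

% Each axis of the Latin design hits every stratum exactly once.
hitsPerAxis = sort(ceil(latinPts * n));
```

Both designs use the same number of points. The lattice repeats only gridSteps distinct values along each axis, while the Latin hypercube covers n distinct values per axis at the cost of randomness between runs.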

ShowPlot

Flag to display a plot of an original signal and its aligned version over the reference masses in RefX, specified as one of the following:

  • false — Do not show a plot. The default when return values are specified.

  • true — Plot the first signal in Intensities. The default when return values are not specified.

  • positive integer — Index of the signal in Intensities to plot.

Example: 2 (plot the second signal in Intensities)

Data Types: double | logical

Weights

Relative weight for each mass in RefX, specified as a vector of positive values the same size as RefX. By default, each weight is 1, so all reference peaks are weighted equally. With equal weights, more intense reference peaks have a greater effect on the alignment. To give a less intense reference peak more influence, increase its weight.

Data Types: double

WidthOfPulses

Width, in separation units, for all the Gaussian pulses used to build the correlating synthetic signal, specified as a positive scalar or function handle.

  • positive scalar — The point of the peak where the Gaussian pulse reaches 60.65% of its maximum is set to the width you specify with WidthOfPulsesValue.

  • function handle — The function is evaluated at the respective separation-unit values and returns a variable width for the pulses. The function evaluation should give reasonable values from 0 to max(abs(Range)); otherwise, the function returns an error.

Tuning the spread of the Gaussian pulses controls a tradeoff between robustness (wider pulses) and precision (narrower pulses). However, the spread of the pulses is unrelated to the shape of the observed peaks in the signal. The purpose of the pulse spread is to drive the optimization algorithm.

Data Types: double | function_handle

WindowSizeRatio

Scaling factor to determine window size around every alignment peak, specified as a positive scalar. The synthetic signal is compared to the sample signal only within these regions, which saves computation time. The size of the window is given in separation units by WidthOfPulsesValue * WindowSizeRatioValue. For the default value 2.5, at the limits of the window, the Gaussian pulses have a value of 4.39% of their maximum.

Example: 4

Data Types: double
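The two percentages quoted for the pulse width (60.65%) and the window limit (4.39%) can be reproduced numerically. The sketch below assumes a unit-height Gaussian pulse of the form exp(-0.5*(d/width)^2), where d is the distance from the pulse center, and assumes the window extends width*ratio on each side of the peak. Both assumptions are consistent with the quoted figures, but the pulse formula and the variable names are not taken from the msalign source.

```matlab
width = 10;     % hypothetical pulse width, in separation units
ratio = 2.5;    % default window size ratio stated in the description

% Unit-height Gaussian pulse as a function of distance d from its center.
pulse = @(d) exp(-0.5*(d/width).^2);

atWidth       = pulse(width);          % value one width away from the center
atWindowLimit = pulse(width*ratio);    % value at the edge of the window

fprintf('at the width:        %.2f%% of maximum\n', 100*atWidth);
fprintf('at the window limit: %.2f%% of maximum\n', 100*atWindowLimit);
```

Under these assumptions the width acts as the standard deviation of the pulse, and the results do not depend on the particular value chosen for width.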

Output Arguments

IntensitiesOut

Intensity values for a set of peaks that share the same separation-unit range, returned as a real matrix. Each row corresponds to a separation-unit value, and each column corresponds to either a set of signals with peaks or a retention time. The intensity values represent a shifting and scaling of the data.

RefXOut

Separation-unit values of reference masses, returned as a real vector. RefXOut differs from RefX only when you set the Group name-value argument to true.

Algorithms

First, msalign creates a synthetic signal from the reference peaks using Gaussian pulses centered at the separation-unit values specified by RefX. Then, msalign shifts and scales the separation-unit scale to find the maximum alignment between the input signals and the synthetic signal. (msalign uses an iterative multiresolution grid search until it finds the best scale and shift factors for each signal.) Once msalign determines the new separation-unit scale, msalign creates the corrected signals by resampling their intensities at the original separation-unit values, creating IntensitiesOut, a vector or matrix of corrected intensity values. The resampling method preserves the shape of the peaks.
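The steps described in this section can be sketched in a few lines for a single signal. This is a simplified illustration and not the msalign implementation: the test data, the score (sum of products between the signal and the synthetic pulses), the rule for shrinking the search ranges, and the use of pchip interpolation for resampling are all assumptions made for this sketch.

```matlab
% Hypothetical test data: three peaks displaced by a known scale and shift.
X     = (0:0.5:1000)';             % separation-unit axis (column vector)
refX  = [200 500 800];             % reference peak locations
w     = [1 1 1];                   % relative weights of the reference peaks
sigma = 5;                         % width of the synthetic Gaussian pulses
trueScale = 1.01;  trueShift = -6; % corrected axis = trueScale*X + trueShift
obsPeaks  = (refX - trueShift) / trueScale;   % where the peaks are observed
Y = sum(40 * exp(-0.5 * ((X - obsPeaks) / 3).^2), 2);

% Synthetic signal: weighted Gaussian pulses centered at refX.
synth = @(x) exp(-0.5 * ((x - refX) / sigma).^2) * w';

% Iterative multiresolution grid search over (scale, shift).
scaleRange = [0.97 1.03];  shiftRange = [-20 20];
gridSteps = 11;  iterations = 5;
for it = 1:iterations
    scales = linspace(scaleRange(1), scaleRange(2), gridSteps);
    shifts = linspace(shiftRange(1), shiftRange(2), gridSteps);
    bestScore = -Inf;
    for a = scales
        for b = shifts
            score = sum(Y .* synth(a * X + b));  % overlap with synthetic signal
            if score > bestScore
                bestScore = score;  bestScale = a;  bestShift = b;
            end
        end
    end
    % Shrink both ranges to one grid step on either side of the best point.
    scaleRange = bestScale + [-1 1] * (scales(2) - scales(1));
    shiftRange = bestShift + [-1 1] * (shifts(2) - shifts(1));
end

% Resample the intensities at the original X values (zero outside the data).
XNew = bestScale * X + bestShift;
YAligned = interp1(XNew, Y, X, 'pchip', 0);

% Locate the aligned peaks near each reference value.
alignedPeaks = zeros(size(refX));
for k = 1:numel(refX)
    inWindow = abs(X - refX(k)) <= 20;
    Xw = X(inWindow);
    [~, idx] = max(YAligned(inWindow));
    alignedPeaks(k) = Xw(idx);
end
```

In this sketch the wide pulses (sigma = 5) keep the coarse first pass from missing the peaks, and the later passes refine the estimate, which mirrors the robustness and precision tradeoff described for the pulse width. The sketch scores the whole signal rather than windows around each reference peak, which is simpler but slower.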

Version History

Introduced before R2006a