The MSPredictLDA program classifies patient samples as cancer or normal using mass-spectrum data and the CA125 marker level.
LDA analysis of mass spectra data.
One promising application of MS data is disease prognosis. The problem can be formulated as follows: identify peaks in serum MS data whose intensity changes in the case of disease, such that these changes are detectable as early as possible. It was recently shown that the information contained in mass spectra, in combination with the serum level of the tumor marker CA125, is useful for early detection of ovarian cancer.
MS data processing can be used to solve this task.
Once the peaks in different spectra are identified, they can be aligned with each other, which reveals the peaks common to these spectra.
The Softberry SMS program package performs all these steps of analysis and outputs the set of spectral data as a single table. In this table, rows correspond to samples and each column corresponds to the MS intensity of a peak group identified at the preprocessing steps.
The table can also contain additional information passed in from additional files. For example, the patient ID, time of sampling, and patient status (cancer or non-cancer) can be added for each sample. Additional patient parameters that can be used for prognosis can be added as well. For example, the serum tumor marker CA125 is known to be useful for early detection of ovarian cancer, also in combination with mass spectra data.
In this example we demonstrate that mass spectra data together with the CA125 level can be used to classify MS samples as cancer or non-cancer with high precision. We used data from the paper of Gammerman et al. These data are a set of mass spectra for serum samples taken from patients up to 7 years prior to cancer detection (18 patients, 75 samples). The data also contain 154 control samples taken from healthy women.
In this work we treated the control samples as a general pool of healthy people and did not take into account the time at which a control sample was taken.
We tested the hypothesis that a linear combination of the CA125 level and peak intensities from MS data can separate serum samples of cancer patients from non-cancer control samples.
To solve this task we applied Linear Discriminant Analysis (LDA). It is used in statistics and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects. The resulting combination may be used as a linear classifier. In our case we have two classes of samples: cancer (class 1) and control (class 0). To find the linear classifier we used patient samples taken not later than 1 month and not earlier than 6 months before diagnosis (17 samples from class 1, 154 from class 0).
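The two-class discriminant described above can be sketched in a few lines. This is a minimal illustration and not the MSPredictLDA implementation: it assumes exactly two features per sample (CA125 level and one peak intensity), Fisher's criterion with a pooled within-class scatter matrix, and a decision threshold at the midpoint between the class centroids (class priors are ignored). The function names and the toy values are hypothetical. In this sketch a positive score means class 1; the sign convention of the program's own scores may differ.

```python
from statistics import mean


def fit_lda_2d(class0, class1):
    """Fit a two-class Fisher discriminant on (ca125, peak_intensity) tuples.

    Returns (w, threshold), where w is the weight pair and threshold is the
    projection of the midpoint between the two class centroids.
    """
    def centroid(rows):
        return (mean(r[0] for r in rows), mean(r[1] for r in rows))

    def scatter(rows, mu):
        # Within-class scatter: sums of products of centered coordinates.
        sxx = sum((r[0] - mu[0]) ** 2 for r in rows)
        sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows)
        syy = sum((r[1] - mu[1]) ** 2 for r in rows)
        return sxx, sxy, syy

    mu0, mu1 = centroid(class0), centroid(class1)
    a0, b0, c0 = scatter(class0, mu0)
    a1, b1, c1 = scatter(class1, mu1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # pooled scatter [[a, b], [b, c]]
    det = a * c - b * b
    if det == 0:
        raise ValueError("pooled scatter matrix is singular")

    dx, dy = mu1[0] - mu0[0], mu1[1] - mu0[1]
    # w = S^-1 (mu1 - mu0), using the closed-form inverse of a 2x2 matrix.
    w = ((c * dx - b * dy) / det, (a * dy - b * dx) / det)
    mid = ((mu0[0] + mu1[0]) / 2, (mu0[1] + mu1[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def ldf(w, threshold, sample):
    """Linear discriminant function value for one (ca125, peak) sample."""
    return w[0] * sample[0] + w[1] * sample[1] - threshold


def predict(w, threshold, sample):
    """Return 1 (cancer) if the LDF is positive, otherwise 0 (control)."""
    return 1 if ldf(w, threshold, sample) > 0 else 0


# Toy values for illustration only; they are not taken from the data set.
control = [(10.0, 1.0), (12.0, 1.2), (11.0, 0.8), (9.0, 1.1)]
cancer = [(30.0, 2.0), (35.0, 2.4), (28.0, 1.9), (33.0, 2.2)]

w, threshold = fit_lda_2d(control, cancer)
control_pred = [predict(w, threshold, s) for s in control]
cancer_pred = [predict(w, threshold, s) for s in cancer]
print(control_pred, cancer_pred)
```

The weights are obtained by multiplying the inverse pooled scatter matrix by the difference of the class centroids, which is the standard two-class Fisher solution. With more than two features the same formula applies, but a general linear solver would replace the closed-form inverse.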
We examined each of the 20 peak groups from the table above in combination with the CA125 level to build a linear classifier, and selected the peak group that delivers the best prediction performance.
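The selection step can be illustrated as an exhaustive loop over candidate peak groups. The sketch below is an assumption-laden illustration: the scoring function is a simple nearest-centroid classifier used as a stand-in for LDA to keep the example short, the score is computed on the same samples used for fitting, and all names and values are hypothetical. Peak group indices here are 0-based and do not correspond to the group numbers in the table.

```python
def nearest_centroid_accuracy(samples, labels):
    """Stand-in scorer (not LDA): fraction of samples that lie closer
    to the centroid of their own class than to the other centroid."""
    def centroid(cls):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        return tuple(sum(col) / len(rows) for col in zip(*rows))

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    c0, c1 = centroid(0), centroid(1)
    hits = 0
    for s, y in zip(samples, labels):
        predicted = 1 if dist2(s, c1) < dist2(s, c0) else 0
        hits += predicted == y
    return hits / len(samples)


def select_best_peak_group(ca125, peak_table, labels,
                           score=nearest_centroid_accuracy):
    """Pair CA125 with each peak-group column in turn and return
    (best_index, best_score) for the highest-scoring column."""
    best_index, best_score = None, -1.0
    for j in range(len(peak_table[0])):
        samples = [(c, row[j]) for c, row in zip(ca125, peak_table)]
        s = score(samples, labels)
        if s > best_score:
            best_index, best_score = j, s
    return best_index, best_score


# Toy table: 6 samples (rows) x 3 peak groups (columns), illustration only.
labels = [0, 0, 0, 1, 1, 1]
ca125 = [10.0, 12.0, 11.0, 11.0, 10.0, 12.0]
peak_table = [
    [1.0, 1.0, 1.0],
    [2.0, 1.1, 3.0],
    [1.5, 0.9, 1.2],
    [1.5, 5.0, 3.1],
    [2.0, 5.2, 1.1],
    [1.0, 4.8, 2.9],
]

best_index, best_score = select_best_peak_group(ca125, peak_table, labels)
print(best_index, best_score)
```

Any scorer with the same signature can be passed as `score`, for example one that fits a linear discriminant and returns its fraction of true predictions. Scoring on held-out samples would give a less optimistic estimate than scoring on the fitting samples.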
For example, the best overall performance was achieved for the combination of CA125 and peak group 17 (min mass 2983.11, max mass 2989.592).
Prediction results for this peak group are shown below:
Number of samples=171 (control(0)=154;disease(1)=17)
Class0 (control) (num/fract)=24/0.140351; mean_score=4.152928
Class1 (disease ) (num/fract)=147/0.859649; mean_score=-5.309040
Test results:
Fraction of true predictions: 0.959064
Class 0:
  Fraction of true positives : 0.954545
  Fraction of false negatives : 0.045455
Class 1:
  Fraction of true positives : 1.000000
  Fraction of false negatives : 0.000000
The overall fraction of true predictions is 0.959064. Interestingly, this classifier does not misclassify any cancer sample (17 true positives out of 17). This is useful because no cancer sample in this data set is missed by the analysis.
The change of the linear discriminant function (LDF) for this classifier (CA125 + peak group 17 intensity) is shown in the figure below for each cancer patient's samples at all times before diagnosis (including time=0, the time of diagnosis).
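The reported fractions can be reproduced from the sample counts with simple arithmetic. In the sketch below, the count of correctly classified controls (147) is not stated directly in the text; it is inferred here from the reported Class 0 fraction 0.954545 and the 154 control samples. Variable names are illustrative.

```python
n_control, n_disease = 154, 17

# 17 of 17 cancer samples are classified correctly (stated in the text).
disease_correct = 17
# Inferred: 0.954545 * 154 rounds to 147 correctly classified controls.
control_correct = round(0.954545 * n_control)

control_rate = control_correct / n_control
disease_rate = disease_correct / n_disease
overall = (control_correct + disease_correct) / (n_control + n_disease)

print(f"Class 0 true fraction: {control_rate:.6f}")
print(f"Class 1 true fraction: {disease_rate:.6f}")
print(f"Overall true fraction: {overall:.6f}")
```

Under these counts, 7 of the 154 control samples are misclassified, which accounts for the Class 0 false fraction of 0.045455.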
Gammerman et al., The Computer Journal (2008)
© 2022 www.softberry.com