Survival PLS and Classification for metabolic data
The function performs partial least squares (PLS) and principal component regression on the metabolomics matrix, then fits a Cox proportional hazards model that uses the first PLS scores as a covariate, together with any additional covariates supplied.
Survival: A vector of survival times whose length equals the number of subjects.
Mdata: A matrix of metabolic profiles, large or small, in which the number of rows equals the number of metabolites and the number of columns equals the number of patients.
Censor: A vector of censoring indicators.
Reduce: A boolean parameter indicating whether the metabolic profile matrix should be reduced. The default is TRUE, in which case a larger metabolic profile matrix is reduced by a supervised PCA approach and the first principal component is extracted from the reduced matrix to be used in the classifier.
Select: Number of metabolites (default is 15) to be selected by supervised PCA. This is valid only if the argument Reduce=TRUE.
Prognostic: A data frame containing possible prognostic factor(s) and/or treatment effect to be used in the model.
Plots: A boolean parameter indicating if the plots should be shown. Default is FALSE
Quantile: The cut off value for the classifier, default is the median cutoff
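The reduction controlled by Reduce and Select can be pictured with a small sketch. It is not the package's internal code: it assumes that "supervised PCA" means ranking metabolites by a univariate Cox statistic, keeping the top few, and taking the first principal component of what remains. The simulated data and all object names (mdata, select, z_abs, reduced) are hypothetical.

```r
library(survival)

set.seed(2)
n_patients <- 50
n_met <- 30
select <- 5  # hypothetical stand-in for the Select argument

# Simulated profiles: metabolites in rows, patients in columns,
# matching the orientation described for Mdata
mdata <- matrix(rnorm(n_met * n_patients), nrow = n_met,
                dimnames = list(paste0("Met", seq_len(n_met)), NULL))
surv_time <- rexp(n_patients, rate = exp(0.8 * mdata[1, ]))
censor <- rbinom(n_patients, 1, 0.8)

# Assumption: score each metabolite by the absolute Wald z statistic
# from a univariate Cox model
z_abs <- vapply(seq_len(n_met), function(i) {
  m <- mdata[i, ]
  fit <- coxph(Surv(surv_time, censor) ~ m)
  abs(as.numeric(coef(fit)) / sqrt(vcov(fit)[1, 1]))
}, numeric(1))
names(z_abs) <- rownames(mdata)

# Keep the `select` highest-scoring metabolites
top <- names(sort(z_abs, decreasing = TRUE))[seq_len(select)]
reduced <- mdata[top, , drop = FALSE]

# First principal component of the reduced matrix (patients as observations)
pc1 <- prcomp(t(reduced), scale. = TRUE)$x[, 1]
```

Here reduced has select rows and one column per patient, and pc1 holds one score per patient.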
Returns
An object is returned with the following values:
Survfit: The Cox proportional hazards regression result using the first principal component
Riskscores: A vector of risk scores whose length equals the number of patients.
Riskgroup: The classification of the subjects, based on the PCA, into a low or high risk group
pc1: The first PCA scores, based on either the reduced metabolite matrix or the full matrix
KMplot: The Kaplan-Meier survival plot of the riskgroup
SurvBPlot: The distribution of the survival in the riskgroup
Riskpca: The plot of Risk scores vs first PCA
Details
This function reduces a larger metabolomics matrix to a smaller one using a supervised PCA approach. It then performs PLS on the reduced metabolomics matrix and fits a Cox proportional hazards model with the first PLS scores as a covariate. A classifier is built from the first PLS scores multiplied by their estimated regression coefficient, and patients are classified using the median of the risk scores. The function can also perform a grid analysis in which the grid consists of several quantiles; the default is the median. The function can handle single and multiple metabolites. Prognostic factors can also be included to enhance classification.
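The classification step described above can be sketched in a few lines of R. This is an illustration under stated assumptions, not the function's implementation: the first PLS component is computed by hand with survival time as the response (the response actually used internally is not stated here), the data are simulated, and all object names (X, pls1, risk_score, risk_group) are hypothetical. Patients are stored in rows purely for convenience.

```r
library(survival)

set.seed(1)
n_patients <- 60
n_met <- 20

# Simulated data: patients in rows, metabolites in columns
X <- matrix(rnorm(n_patients * n_met), nrow = n_patients)
surv_time <- rexp(n_patients, rate = exp(0.5 * X[, 1]))
censor <- rbinom(n_patients, 1, 0.7)

# First PLS component for a single response (assumed here to be survival time):
# the weight vector is proportional to t(X) %*% y on centred, scaled data
Xs <- scale(X)
y <- surv_time - mean(surv_time)
w <- crossprod(Xs, y)
w <- w / sqrt(sum(w^2))
pls1 <- as.numeric(Xs %*% w)

# Cox proportional hazards model with the first PLS scores as covariate
fit <- coxph(Surv(surv_time, censor) ~ pls1)

# Risk score = first PLS score times its estimated regression coefficient
risk_score <- coef(fit)[["pls1"]] * pls1

# Classify at the median (equivalent to a quantile cut-off of 0.5)
cutoff <- quantile(risk_score, probs = 0.5)
risk_group <- ifelse(risk_score <= cutoff, "Low risk", "High risk")
table(risk_group)
```

Changing probs in the quantile() call mimics a different cut-off, and looping over several values of probs gives a simple version of the grid analysis mentioned above.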
Examples
## FIRSTLY SIMULATING A METABOLIC SURVIVAL DATA
Data = MSData(nPatients = 100, nMet = 150, Prop = 0.5)

## USING THE FUNCTION
Result = SurvPlsClass(Survival = Data$Survival, Mdata = t(Data$Mdata),
                      Censor = Data$Censor, Reduce = FALSE, Select = 150,
                      Prognostic = Data$Prognostic, Plots = FALSE, Quantile = 0.5)

## GETTING THE SURVIVAL REGRESSION OUTPUT
Result$SurvFit

## GETTING THE RISKSCORES
Result$Riskscores

### GETTING THE RISKGROUP
Result$Riskgroup

### OBTAINING THE FIRST PRINCIPAL COMPONENT SCORES
Result$pc1