Predict phenotypes from genome-wide SNP data based on a file of coefficients
Predict phenotypes from genome-wide SNP data based on a file of coefficients. Genotypes and fitted coefficients are provided as filenames, so predicted phenotypes can be computed even when the SNP data are too large to be read into R.
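To illustrate why working from files helps, the sketch below scores a genotypes file a block of rows at a time, so only one chunk of the genotype matrix is held in memory at once. It is plain base R and is not the package's implementation. The helper name predict_in_chunks, its chunk_size argument, the whitespace-delimited file layout and the in-memory coefficient vector beta are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the ridge package): compute
# intercept + genotypes %*% beta while reading the genotypes file
# chunk_size rows at a time. Assumes a whitespace-delimited file whose
# first line holds the SNP names; `beta` is a named numeric vector of
# SNP coefficients in the same order as those names.
predict_in_chunks <- function(genotypesfilename, beta, intercept = 0,
                              chunk_size = 1000) {
  con <- file(genotypesfilename, open = "r")
  on.exit(close(con))
  # Header row: SNP names, which must match the coefficient names.
  snp_names <- scan(text = readLines(con, n = 1), what = "", quiet = TRUE)
  stopifnot(identical(snp_names, names(beta)))
  chunks <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    # Parse this chunk into a (individuals x SNPs) numeric matrix.
    x <- matrix(scan(text = lines, quiet = TRUE),
                ncol = length(beta), byrow = TRUE)
    chunks[[length(chunks) + 1]] <- intercept + as.vector(x %*% beta)
  }
  as.numeric(unlist(chunks))
}

# Toy file: 3 individuals, 2 SNPs (made-up values).
g <- tempfile(fileext = ".txt")
writeLines(c("rs1 rs2", "0 1", "2 0", "1 1"), g)
pred <- predict_in_chunks(g, beta = c(rs1 = 1, rs2 = -2), intercept = 0.5,
                          chunk_size = 2)
# pred holds -1.5, 2.5, -0.5
```

Collecting chunk results in a list and unlisting once at the end avoids repeatedly copying a growing vector.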
Arguments
genotypesfilename: character string: path to file containing SNP genotypes coded 0, 1, 2. See Input file formats.
betafilename: character string: path to file containing fitted coefficients. See Input file formats.
phenotypesfilename: (optional) character string: path to file in which to write out the predicted phenotypes. See Output file format. Whether or not this argument is supplied, the predicted phenotypes are also returned by the function.
verbose: logical: if TRUE, additional information is printed to the R output as the code runs. Defaults to FALSE.
Input file formats
genotypesfilename:: A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. Each genotype is coded as 0, 1 or 2, the count of the minor allele. Missing values are not allowed.
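A minimal sketch of a genotypes file in this layout, together with a check for the two constraints stated above (0/1/2 coding, no missing values). It assumes a whitespace-delimited text file readable by read.table(); check_genotypes_file is a hypothetical helper, not part of the package, and the toy genotypes are made up.

```r
# Hypothetical helper: verify that a genotypes file has a header row of
# SNP names, one row per individual, values in {0, 1, 2} and no missing
# values. Assumes whitespace-delimited text.
check_genotypes_file <- function(path) {
  geno <- read.table(path, header = TRUE, check.names = FALSE)
  values <- as.matrix(geno)
  if (anyNA(values)) stop("missing genotypes are not accommodated")
  if (!all(values %in% c(0, 1, 2))) stop("genotypes must be coded 0, 1 or 2")
  invisible(dim(values))  # individuals x SNPs
}

# Toy data: 3 individuals (rows) by 4 SNPs (columns).
toy <- matrix(c(0, 1, 2, 1,
                2, 0, 1, 0,
                1, 1, 0, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, paste0("rs", 1:4)))
genofile <- tempfile(fileext = ".txt")
# Header row of SNP names, no row names, no quotes.
write.table(toy, genofile, quote = FALSE, row.names = FALSE)
check_genotypes_file(genofile)  # invisibly returns the dimensions, 3 x 4
```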
betafilename:: Two columns: the first contains SNP names, in the same order as in genotypesfilename; the second contains the fitted coefficients. If the coefficients include an intercept, the first row of betafilename should contain it, with the name Intercept in the first column; an intercept labelled this way is added to each individual's predicted phenotype. SNP names must match those in genotypesfilename. betafilename has the same format as the output of linearRidgeGenotypes, so linearRidgeGenotypesPredict can predict from coefficients fitted by linearRidgeGenotypes (see the example).
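The prediction is the linear predictor: for individual i, the intercept (if present) plus the sum over SNPs j of beta_j * x_ij. The sketch below computes it in base R from the two files. It assumes whitespace-delimited files and a beta file with no header row; predict_from_files is a hypothetical helper, not the code used by linearRidgeGenotypesPredict, and unlike the package function it reads the whole genotype matrix into memory.

```r
# Hypothetical helper: predicted phenotypes = intercept + genotypes %*% beta,
# read from a genotypes file and a beta file in the layouts described above.
predict_from_files <- function(genotypesfilename, betafilename) {
  geno <- as.matrix(read.table(genotypesfilename, header = TRUE,
                               check.names = FALSE))
  beta <- read.table(betafilename, header = FALSE,
                     col.names = c("name", "coef"),
                     stringsAsFactors = FALSE)
  # An optional first row named "Intercept" holds the intercept.
  intercept <- 0
  if (beta$name[1] == "Intercept") {
    intercept <- beta$coef[1]
    beta <- beta[-1, ]
  }
  # SNP names must match the genotype columns, in the same order.
  if (!identical(beta$name, colnames(geno))) {
    stop("SNP names in betafilename do not match genotypesfilename")
  }
  as.vector(intercept + geno %*% beta$coef)
}

# Toy files (made-up values): 3 individuals, 2 SNPs, plus an intercept.
genofile <- tempfile(fileext = ".txt")
writeLines(c("rs1 rs2", "0 1", "2 0", "1 1"), genofile)
betafile <- tempfile(fileext = ".dat")
writeLines(c("Intercept 0.5", "rs1 1", "rs2 -2"), betafile)
pred <- predict_from_files(genofile, betafile)
# pred holds -1.5, 2.5, -0.5
```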
Output file format
Whether or not phenotypesfilename is provided, predicted phenotypes are returned to the R workspace. If phenotypesfilename is provided, the predicted phenotypes are also written to the specified file.
phenotypesfilename:: One column, containing predicted phenotypes, one individual per row.
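A sketch of reading such a file back into R, assuming one value per line and no header. Here write.table() only stands in for the file that linearRidgeGenotypesPredict would write when given phenotypesfilename; the predicted values are made up.

```r
# Made-up predicted phenotypes, one per individual.
pred <- c(-1.5, 2.5, -0.5)
phenfile <- tempfile(fileext = ".txt")
# One column, one individual per row, no header or row names (assumed layout).
write.table(pred, phenfile, quote = FALSE, row.names = FALSE,
            col.names = FALSE)
# Read back as a numeric vector, in the same individual order as the file.
pred_back <- scan(phenfile, quiet = TRUE)
```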
Returns
A vector of fitted values, the same length as the number of individuals whose data are in genotypesfilename. If phenotypesfilename is supplied, the fitted values are also written there.
Author(s)
Erika Cule
See Also
linearRidgeGenotypes for model fitting. logisticRidgeGenotypes and logisticRidgeGenotypesPredict for corresponding functions to fit and predict on SNP data with binary outcomes.
Examples
## Not run:
genotypesfile <- system.file("extdata", "GenCont_genotypes.txt", package = "ridge")
phenotypesfile <- system.file("extdata", "GenCont_phenotypes.txt", package = "ridge")
betafile <- tempfile(pattern = "beta", fileext = ".dat")
beta_linearRidgeGenotypes <- linearRidgeGenotypes(genotypesfilename = genotypesfile,
                                                  phenotypesfilename = phenotypesfile,
                                                  betafilename = betafile)
pred_phen_geno <- linearRidgeGenotypesPredict(genotypesfilename = genotypesfile,
                                              betafilename = betafile)
## compare to output of linearRidge
data(GenCont)
## Same data as in GenCont_genotypes.txt and GenCont_phenotypes.txt
beta_linearRidge <- linearRidge(Phenotypes ~ ., data = as.data.frame(GenCont))
pred_phen <- predict(beta_linearRidge)
print(cbind(pred_phen_geno, pred_phen))
## Delete the temporary betafile
unlink(betafile)
## End(Not run)