Same Species Sample Contamination Detection
Feature Generation for Contamination Detection Model
Second alternative allele percentage
Annotation rate
Calculate average log-likelihood
Low depth percentage
Get the ratio of allele frequencies within a region
Get absolute value of skewness
SNV percentage
Calculate zygosity variable
Check input filename
Negative Log-Likelihood
VCF Data Input
Read in input VCF data in GATK format for contamination detection
Read in input VCF data in Strelka2 format for contamination detection
Read in input VCF data in VarDict format for contamination detection
Read in input VCF data in VarPROWL format
Estimate Rho for Alternative Allele Frequency
Remove CNV regions within VCF files using a change-point method
Remove CNV regions within VCF files given a CNV file
Same Species Sample Contamination
VCF Data Summary
Train Contamination Detection Model
Remove CNV regions within VCF files
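The feature functions listed above appear here by title only, so their exact definitions are not stated. The Python sketch below assumes simple definitions for four of them: SNV percentage, low depth percentage, second alternative allele percentage, and the absolute skewness of alternative allele frequencies. The `Variant` record, all function names, and the 10-read depth threshold are hypothetical illustrations, not the package's R functions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Variant:
    """Hypothetical minimal view of one VCF site (not the package's data structure)."""
    ref: str         # REF allele
    alts: List[str]  # ALT alleles, e.g. ["T"] or ["T", "G"]
    depth: int       # total read depth (DP)
    af: float        # alternative allele frequency of the first ALT allele


def _fraction(flags: List[bool]) -> float:
    """Share of True values; fails loudly on an empty site list."""
    if not flags:
        raise ValueError("no variant sites")
    return sum(flags) / len(flags)


def snv_percentage(sites: List[Variant]) -> float:
    # Assumed definition: REF and every ALT allele are single bases.
    return _fraction([len(v.ref) == 1 and all(len(a) == 1 for a in v.alts) for v in sites])


def low_depth_percentage(sites: List[Variant], min_depth: int = 10) -> float:
    # Assumed definition: depth below a hypothetical threshold of 10 reads.
    return _fraction([v.depth < min_depth for v in sites])


def second_alt_percentage(sites: List[Variant]) -> float:
    # Assumed definition: the site reports more than one ALT allele.
    return _fraction([len(v.alts) > 1 for v in sites])


def abs_skewness(values: List[float]) -> float:
    """Absolute population skewness |E[(x - mu)^3]| / sigma^3."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return 0.0
    return abs(mean([(x - mu) ** 3 for x in values])) / sigma ** 3


# Usage with four made-up sites.
sites = [
    Variant("A", ["G"], 30, 0.50),
    Variant("C", ["T", "G"], 8, 0.35),
    Variant("AT", ["A"], 25, 0.48),
    Variant("G", ["C"], 5, 0.95),
]
features = {
    "snv_pct": snv_percentage(sites),                # 3 of 4 sites are SNVs
    "low_depth_pct": low_depth_percentage(sites),    # 2 of 4 sites below 10 reads
    "second_alt_pct": second_alt_percentage(sites),  # 1 of 4 sites is multi-allelic
    "af_abs_skew": abs_skewness([v.af for v in sites]),
}
print(features)
```

Each feature reduces a whole sample to one number. This fits a per-sample classifier, where each sample becomes a single row of the training table.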
Imports Variant Call Format (VCF) files into R and detects whether a sample is contaminated by another sample from the same species. In the first stage, a change-point detection method identifies copy number variation (CNV) regions, which are then filtered out. Next, features are extracted from the remaining variants for a support vector machine (SVM) model. For the log-likelihood calculation, the deviation parameter is estimated by maximum likelihood. Finally, an SVM with a radial basis function (RBF) kernel classifies whether the sample is contaminated.
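The description does not state which likelihood model the deviation parameter (the "Rho" of "Estimate Rho for Alternative Allele Frequency") belongs to. As a minimal sketch, assume that alternative read counts at heterozygous sites follow a beta-binomial distribution with mean allele frequency 0.5 and overdispersion `rho`, with `rho` standing in for the deviation parameter. The Python code estimates `rho` by maximum likelihood, using a plain grid search in place of whatever optimizer the package uses, and then computes the average per-site log-likelihood. The beta-binomial model, the fixed mean of 0.5, and every name in the code are assumptions for illustration.

```python
from math import exp, lgamma
from typing import List, Tuple

Site = Tuple[int, int]  # (alternative read count k, total depth n)


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_logpmf(k: int, n: int, mu: float, rho: float) -> float:
    """log P(k | n) for a beta-binomial with mean mu and overdispersion rho in (0, 1).

    Uses alpha = mu (1 - rho) / rho and beta = (1 - mu) (1 - rho) / rho,
    so rho -> 0 recovers the binomial distribution.
    """
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def total_loglik(sites: List[Site], rho: float, mu: float = 0.5) -> float:
    return sum(betabinom_logpmf(k, n, mu, rho) for k, n in sites)


def estimate_rho(sites: List[Site], mu: float = 0.5, grid_size: int = 999) -> float:
    """Maximum-likelihood rho by grid search over (0, 1); a numeric optimizer would also work."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda r: total_loglik(sites, r, mu))


def average_loglik(sites: List[Site], rho: float, mu: float = 0.5) -> float:
    """Mean per-site log-likelihood, one candidate feature for the classifier."""
    return total_loglik(sites, rho, mu) / len(sites)


# Sanity check: the pmf over k = 0..10 should sum to about 1.0.
pmf_total = sum(exp(betabinom_logpmf(k, 10, 0.5, 0.2)) for k in range(11))

# Made-up heterozygous sites: tightly balanced vs. widely scattered allele counts.
balanced = [(15, 30), (14, 30), (16, 30)]
dispersed = [(5, 30), (25, 30), (15, 30), (2, 30), (28, 30)]

rho_bal = estimate_rho(balanced)
rho_disp = estimate_rho(dispersed)
print(pmf_total, rho_bal, rho_disp, average_loglik(dispersed, rho_disp))
```

In this example, the widely scattered counts yield a larger `rho` than the balanced ones. Under the stated assumptions, `rho` and the average log-likelihood would be candidate inputs to the RBF-kernel SVM described above, alongside the other features.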