Nominating Quality Control Outliers in Genomic Profiling Studies
Sum the sign-corrected z-scores into a total quality score for each sample
Corrects the z-score signs according to the metrics
Calculate an outlier cutoff using cosine similarity
Tests the accumulated quality scores for outliers using cosine similar...
Fits the QC data to distributions and returns the KS test result and B...
Generates the standard barplot of scores for each sample
Generates the standard heatmap of scores for each sample
Generates the multipanel plot of heatmap and barplot
Calculate z-scores for each metric across each sample
A method that analyzes quality control metrics from multi-sample genomic sequencing studies and nominates poor-quality samples for exclusion. Per-sample quality control data are transformed into z-scores and aggregated. The distribution of aggregated z-scores is modelled using parametric distributions. The parameters of the optimal model, selected either by goodness-of-fit statistics or by user designation, are used for outlier nomination. Two implementations of the Cosine Similarity Outlier Detection algorithm are provided, with flexible parameters for dataset customization.
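The z-score transformation and aggregation steps described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the package's own code. The function names (zscores, quality_scores), the higher_is_better mapping, and the example values are all hypothetical. The sketch assumes that sign correction simply flips each metric so that a more negative z-score always means worse quality; the package may apply a different rule.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize one metric across samples (sample standard deviation)."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        # A constant metric carries no information about relative quality.
        return [0.0] * len(values)
    return [(v - m) / s for v in values]


def quality_scores(samples, qc, higher_is_better):
    """Sum sign-corrected z-scores into one total score per sample.

    qc maps a metric name to a list of values aligned with `samples`.
    higher_is_better maps a metric name to True or False (an assumption
    of this sketch, used to orient every metric the same way).
    """
    totals = {s: 0.0 for s in samples}
    for metric, values in qc.items():
        sign = 1.0 if higher_is_better[metric] else -1.0
        for sample, z in zip(samples, zscores(values)):
            totals[sample] += sign * z
    return totals


# Made-up example data: sample D has low coverage and a high error rate.
samples = ["A", "B", "C", "D"]
qc = {
    "coverage": [30, 31, 29, 10],
    "error_rate": [0.010, 0.012, 0.011, 0.050],
}
higher_is_better = {"coverage": True, "error_rate": False}

totals = quality_scores(samples, qc, higher_is_better)
worst = min(totals, key=totals.get)  # sample with the lowest total score
```

In this sketch the sample with the lowest total is the natural candidate for closer inspection. Deciding whether it is actually an outlier is the job of the distribution-fitting and cosine-similarity steps, which are not reproduced here.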