The Cls vector is reordered from the lowest to the highest number. The ClassNames and ColorSequence vectors are matched to this ordering of Cls, i.e. the lowest number gets the first color and the first class name.
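The matching described above can be illustrated with a minimal base R sketch. It is not the package's internal code; the toy labels, names and colors are assumptions chosen only for illustration.

```r
# Hypothetical toy inputs; not taken from the package.
Cls <- c(3, 1, 2, 1, 3, 2)
ClassNames <- c("low", "mid", "high")        # assumed names, one per class
ColorSequence <- c("red", "green", "blue")   # assumed colors, one per class

# Distinct labels in ascending order: 1, 2, 3
SortedClasses <- sort(unique(Cls))

# Position of each observation's label within the sorted labels
ClassIndex <- match(Cls, SortedClasses)

# The lowest label receives the first name and the first color
NamePerPoint <- ClassNames[ClassIndex]
ColorPerPoint <- ColorSequence[ClassIndex]
print(data.frame(Cls, NamePerPoint, ColorPerPoint))
```

Under this reading, names and colors should be supplied in ascending order of the class labels, regardless of the order in which the labels first appear in Cls.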
ClassMDplot(Data, Cls, ColorSequence = DataVisualizations::DefaultColorSequence,
  ClassNames = NULL, PlotLegend = TRUE, Ordering = "Columnwise",
  main = 'MDplot for each Class', xlab = 'Classes', ylab = 'PDE of Data per Class',
  Fill = 'darkblue', MinimalAmoutOfData = 40, MinimalAmoutOfUniqueData = 12,
  SampleSize = 1e+05, ...)
Arguments
Data: [1:n] Vector of the data to be plotted
Cls: [1:n] Vector of class identifiers for k clusters; each number is the label of one cluster
ColorSequence: Optional: [1:k] vector with the sequence of colors used. Default: DataVisualizations::DefaultColorSequence
ClassNames: Optional: [1:k] named numerical vector with the names of the classes. Default: Class 1 - Class k, with k being the number of classes
PlotLegend: Optional: Add a legend to the plot. Default: TRUE
Ordering: Optional: Ordering of the classes, please see MDplot for details. Default: "Columnwise"
main: Optional: Title of the plot. Default: MDplot for each Class
Fill: Optional: [1:k] Vector with the colors, the MD's are to be colored with. If only one value is given, all MD's are colored in the same color.
xlab: Optional: Title of the x axis. Default: "Classes"
ylab: Optional: Title of the y axis. Default: "PDE of Data per Class"
MinimalAmoutOfData: Optional: numeric value defining a threshold. Below this threshold no density estimation is performed and a Jitter plot with a median line is drawn. Please see MDplot for details.
MinimalAmoutOfUniqueData: Optional: numeric value defining a threshold. Below this threshold neither density estimation nor statistical testing is performed and a Jitter plot is drawn. Only Data Science experts should change this value, and only after they understand how the density is estimated (see [Ultsch, 2005]).
SampleSize: Optional: numeric value defining a threshold. Above this threshold, class-wise uniform sampling of finite cases is performed in order to shorten computation time. If required, SampleSize=n can be set to omit this procedure.
...: Further arguments that are documented in MDplot except for OnlyPlotOutput which is always true.
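The two thresholds MinimalAmoutOfData and MinimalAmoutOfUniqueData can be read as a per-class decision rule. The sketch below is only an illustration of that reading: ChooseDisplay is a hypothetical helper that does not exist in DataVisualizations, and it assumes that "below the threshold" means strictly fewer values than the threshold and that only finite values are counted.

```r
# Hypothetical helper, not part of DataVisualizations.
# Assumption: "below the threshold" means strictly fewer values than the threshold.
ChooseDisplay <- function(x, MinimalAmoutOfData = 40, MinimalAmoutOfUniqueData = 12) {
  x <- x[is.finite(x)]                                   # assumed: count finite cases only
  if (length(x) < MinimalAmoutOfData) return("jitter")   # too few values
  if (length(unique(x)) < MinimalAmoutOfUniqueData) return("jitter")  # too few distinct values
  "density"
}

# Toy data: class 1 is large and varied, class 2 is small, class 3 has few unique values
Data <- c(seq(0, 1, length.out = 100), 1:10, rep(c(1, 2, 3), 20))
Cls  <- c(rep(1, 100), rep(2, 10), rep(3, 60))

# Apply the rule to each class separately
DisplayPerClass <- sapply(split(Data, Cls), ChooseDisplay)
print(DisplayPerClass)
```

With the default thresholds, only the first toy class would receive a density estimate under these assumptions; the other two would fall back to a jitter plot.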
Returns
A list, returned invisibly, with the elements
ClassData: The matrix [1:m,1:NoOfClasses] used for plotting, built with the reordered Cls. Rows are partly filled with NaN; m is the number of data points in the largest class.
ggobject: The ggplot2 plot object
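To make the shape of ClassData concrete, here is a minimal base R sketch of a matrix with one column per class and NaN padding. It is not the package's implementation; the toy data and the choice to pad at the end of each column are assumptions.

```r
# Hypothetical toy data; not taken from the package.
Data <- c(5.1, 4.8, 6.0, 7.2, 6.9)
Cls  <- c(2, 1, 2, 1, 2)

PerClass <- split(Data, Cls)     # list of classes, ordered by ascending label
m <- max(lengths(PerClass))      # number of data points in the largest class

# Pad each class with NaN up to m rows (assumed: padding at the end),
# then bind the classes together as columns
ClassData <- sapply(PerClass, function(x) c(x, rep(NaN, m - length(x))))
print(ClassData)
```

Here class 1 has two values and class 2 has three, so the result is a 3 x 2 matrix whose first column ends in NaN.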
Author(s)
Michael Thrun, Felix Pape
Examples
data(ITS)
#shortcut for example if AdaptGauss not installed
Classification = kmeans(ITS, centers = 2)$cluster
#better approach
#please download package from cran
#model=AdaptGauss::AdaptGauss(ITS)
#Classification=AdaptGauss::ClassifyByDecisionBoundaries(ITS,
#DecisionBoundaries = AdaptGauss::BayesDecisionBoundaries(model$Means,model$SDs,model$Weights))
ClassNames = c(1, 2)
names(ClassNames) = c("Insert name \n of Class 1", "Insert name \n of Class 2")
ClassMDplot(ITS, Classification, ClassNames = ClassNames)
References
Thrun, M. C., Breuer, L., & Ultsch, A.: Knowledge discovery from low-frequency stream nitrate concentrations: hydrology and biology contributions, Proc. European Conference on Data Analysis (ECDA), Paderborn, Germany, 2018.
Note
The function is still experimental: ColorSequence does not work yet because we are unable to specify the colors in ggplot2. If you know a solution, please mail the maintainer of the package. PlotLegend has a similar issue.