data: data frame or vector which contains the data.
nbins: number of bins (= levels).
labels: character vector of labels for the resulting category.
method: character string specifying the binning method, see 'Details'; can be abbreviated.
na.omit: logical value indicating whether instances with missing values should be removed.
Returns
A data frame or vector.
Details
Character and logical vectors are coerced into factors. Matrices are coerced into data frames. When called with a single vector, only the respective factor (and not a data frame) is returned. Method "length" gives intervals of equal length, method "content" gives intervals of equal content (via quantiles). Method "clusters" determines "nbins" clusters via 1D kmeans with deterministic seeding of the initial cluster centres (Jenks natural breaks optimization).
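To make the difference between "length" and "content" concrete, the following sketch approximates both with base R only. It is not the package's implementation: the variable names (breaks_length, breaks_content, by_length, by_content) are illustrative, and the exact break points and level labels produced by bin() may differ.

```r
# Sketch only: base-R approximation of the two interval-based methods.
set.seed(1)
x <- rnorm(900)
nbins <- 3

# Equal length: nbins intervals of identical width spanning range(x)
breaks_length <- seq(min(x), max(x), length.out = nbins + 1)
by_length <- cut(x, breaks = breaks_length, include.lowest = TRUE)

# Equal content: break points placed at evenly spaced quantiles
breaks_content <- quantile(x, probs = seq(0, 1, length.out = nbins + 1))
by_content <- cut(x, breaks = breaks_content, include.lowest = TRUE)

table(by_length)   # bin widths are equal, counts are not
table(by_content)  # counts are (nearly) equal, bin widths are not
```

For roughly normal data, equal-length bins put most observations in the middle interval, whereas quantile-based bins trade equal widths for balanced counts.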
When "na.omit = FALSE" an additional level "NA" is added to each factor with missing values.
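The effect of the extra level can be imitated in base R as follows. This is a sketch under the assumption that the level is the literal string "NA"; it uses cut() in place of bin(), so the interval labels are those of cut(), and the helper names (f, f_chr, f_na) are illustrative.

```r
# Sketch only: turn missing values into an explicit "NA" factor level.
x <- c(1:10, NA)
f <- cut(x, breaks = 2)              # the missing value stays NA in the factor

f_chr <- as.character(f)
f_chr[is.na(f_chr)] <- "NA"          # replace missing entries by the string "NA"
f_na <- factor(f_chr, levels = c(levels(f), "NA"))

table(f_na)                          # the "NA" level is counted like any other
```

Keeping missing values as their own category lets them be used as a regular factor level downstream instead of being dropped.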
Examples
data <- iris
str(data)
str(bin(data))
str(bin(data, nbins = 3))
str(bin(data, nbins = 3, labels = c("small", "medium", "large")))

## Difference between methods "length" and "content"
set.seed(1); table(bin(rnorm(900), nbins = 3))
set.seed(1); table(bin(rnorm(900), nbins = 3, method = "content"))

## Method "clusters"
intervals <- paste(levels(bin(faithful$waiting, nbins = 2, method = "cluster")), collapse = " ")
hist(faithful$waiting, main = paste("Intervals:", intervals))
abline(v = c(42.9, 67.5, 96.1), col = "blue")

## Missing values
bin(c(1:10, NA), nbins = 2, na.omit = FALSE) # adds new level "NA"
bin(c(1:10, NA), nbins = 2) # omits missing values by default (with warning)