opus function

Filtered Top-k Association Discovery of Self-Sufficient Itemsets

opus finds the top k productive, non-redundant itemsets on the measure of interest (leverage or lift) using the OPUS Miner algorithm.

opus(transactions, k = 100, format = "data.frame", sep = " ",
     print_closures = FALSE, filter_itemsets = TRUE,
     search_by_lift = FALSE, correct_for_mult_compare = TRUE,
     redundancy_tests = TRUE)

Arguments

  • transactions: A filename, list, or object of class transactions (arules).
  • k: The number of itemsets to return, an integer (default 100).
  • format: The output format ("data.frame", default, or "itemsets").
  • sep: The separator between items (for files, default " ").
  • print_closures: Whether to return the closure for each itemset (default FALSE).
  • filter_itemsets: Whether to filter out itemsets that are not independently productive (default TRUE).
  • search_by_lift: Whether to use lift (rather than leverage) as the measure of interest (default FALSE).
  • correct_for_mult_compare: Whether to correct alpha (the significance level) for the size of the search space (default TRUE).
  • redundancy_tests: Whether to exclude redundant itemsets (default TRUE).

Returns

The top k productive, non-redundant itemsets, with relevant statistics, returned as a data frame (format = "data.frame"), an object of class itemsets (arules; format = "itemsets"), or a list (any other value of format).

Details

opus provides an interface to the OPUS Miner algorithm (implemented in C++) to find the top k productive, non-redundant itemsets by leverage (default) or lift.
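To make the two measures of interest concrete, the following base R sketch computes leverage and lift by hand for a single two-item itemset. It does not call opus, and the toy transactions and the helper support() are assumptions made for illustration only. It uses the common pairwise definitions (leverage as observed support minus the support expected under independence, lift as their ratio); the statistics reported by the package for larger itemsets may be defined differently.

```r
# Toy transactions (hypothetical data, not taken from the package).
transactions <- list(
  c("bread", "butter"),
  c("bread", "butter", "milk"),
  c("bread"),
  c("milk"),
  c("butter", "bread"),
  c("milk", "eggs"),
  c("bread", "butter", "eggs"),
  c("eggs")
)

# Hypothetical helper: proportion of transactions containing all given items.
support <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

sup_a  <- support("bread", transactions)
sup_b  <- support("butter", transactions)
sup_ab <- support(c("bread", "butter"), transactions)

# Support expected if the two items occurred independently.
expected <- sup_a * sup_b

leverage <- sup_ab - expected  # absolute difference from independence
lift     <- sup_ab / expected  # ratio to independence

print(c(leverage = leverage, lift = lift))
```

Leverage favours itemsets that are both frequent and strongly associated, whereas lift can rank a rare but strongly associated itemset highly, which is one reason to choose between them with search_by_lift.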

transactions should be a filename, list (of transactions, each list element being a vector of character values representing item labels), or an object of class transactions (arules).

Files should contain one transaction per line, each transaction (i.e., line) being a sequence of item labels separated by the character specified by the parameter sep (default " "). See, for example, the files at http://fimi.ua.ac.be/data/. (Alternatively, files can be read separately using the read_transactions function.)
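As an illustration of the file layout described above, this base R sketch writes a small file with one transaction per line and parses it back into a list of character vectors, which is the shape of list input that opus accepts. The item labels and the temporary file are made up for the example, and the parsing shown here is only a sketch of the format, not a description of how read_transactions is implemented.

```r
# Three hypothetical transactions, items separated by a single space.
lines <- c(
  "bread butter",
  "bread butter milk",
  "milk eggs"
)

# Write them to a temporary file, one transaction per line.
path <- tempfile(fileext = ".dat")
writeLines(lines, path)

# Parse the file back: split each line on the separator.
sep <- " "
parsed <- strsplit(readLines(path), split = sep, fixed = TRUE)
unlink(path)

# 'parsed' is a list with one character vector of item labels per transaction.
str(parsed)
```

A file with a different delimiter, such as a comma, would need the matching value passed as sep.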

format should be either "data.frame" (the default) or "itemsets"; any other value returns a list.

Examples

## Not run:
result <- opus("mushroom.dat")
result <- opus("mushroom.dat", k = 50)

result[result$self_sufficient, ]
result[order(result$count, decreasing = TRUE), ]

trans <- read_transactions("mushroom.dat", format = "transactions")

result <- opus(trans, print_closures = TRUE)
result <- opus(trans, format = "itemsets")
## End(Not run)

References

Webb, G. I., & Vreeken, J. (2014). Efficient Discovery of the Most Interesting Associations. ACM Transactions on Knowledge Discovery from Data, 8(3), 1-15. doi: http://dx.doi.org/10.1145/2601433

  • Maintainer: Christoph Bergmeir
  • License: GPL-3
  • Last published: 2020-02-03
