OMLDataSetDescription function

Construct OMLDataSetDescription.

Creates a description for an OMLDataSet. To see a full list of all elements, please see the documentation.

makeOMLDataSetDescription(
  id = 0L,
  name,
  version = "0",
  description,
  format = "ARFF",
  creator = NA_character_,
  contributor = NA_character_,
  collection.date = NA_character_,
  upload.date = as.POSIXct(Sys.time()),
  language = NA_character_,
  licence = NA_character_,
  url = NA_character_,
  default.target.attribute = NA_character_,
  row.id.attribute = NA_character_,
  ignore.attribute = NA_character_,
  version.label = NA_character_,
  citation = NA_character_,
  visibility = NA_character_,
  original.data.url = NA_character_,
  paper.url = NA_character_,
  update.comment = NA_character_,
  md5.checksum = NA_character_,
  status = NA_character_,
  tags = NA_character_
)

Arguments

  • id: [integer(1)]

    Data set ID, autogenerated by the server. Ignored when set manually.

  • name: [character(1)]

    The name of the data set.

  • version: [character(1)]

    Version of the data set, autogenerated by the server. Ignored when set manually.

  • description: [character(1)]

    Description of the data set, given by the uploader.

  • format: [character(1)]

    Format of the data set. At the moment this is always "ARFF".

  • creator: [character]

    The person(s) who created this data set. Optional.

  • contributor: [character]

    People who contributed to this version of the data set (e.g., by reformatting). Optional.

  • collection.date: [character(1)]

    The date the data was originally collected. Given by the uploader. Optional.

  • upload.date: [POSIXt]

    The date the data was uploaded. Added by the server. Ignored when set manually.

  • language: [character(1)]

    Language in which the data is represented. Starts with one upper-case letter, the rest lower case, e.g. 'English'.

  • licence: [character(1)]

    Licence of the data. NA means: Public Domain or "don't know/care".

  • url: [character(1)]

    Valid URL that points to the data file.

  • default.target.attribute: [character]

    The default target attribute, if it exists. Tasks can still be defined that use a different attribute as the target.

  • row.id.attribute: [character(1)]

    The attribute that represents the row-id column, if present in the data set. Otherwise NA.

  • ignore.attribute: [character]

    Attributes that should be excluded in modelling, such as identifiers and indexes. Optional.

  • version.label: [character(1)]

    Version label provided by user, something relevant to the user. Can also be a date, hash, or some other type of id.

  • citation: [character(1)]

    Reference(s) that should be cited when building on this data.

  • visibility: [character(1)]

    Who can see the data set. Typical values: 'Everyone', 'All my friends', 'Only me'. Can also be any of the user's circles.

  • original.data.url: [character(1)]

    For derived data, the URL of the original data set. This can be an OpenML data set, e.g. 'http://openml.org/d/1'.

  • paper.url: [character(1)]

    Link to a paper describing the data set.

  • update.comment: [character(1)]

    When the data set is updated, add an explanation here.

  • md5.checksum: [character(1)]

    MD5 checksum to check if the data set is downloaded without corruption. Can be ignored by user.

  • status: [character(1)]

    The status of the data set, autogenerated by the server. Ignored when set manually.

  • tags: [character]

    Optional tags for the data set.
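
The md5.checksum argument above exists so that a downloaded data file can be checked for corruption. A minimal sketch of such a check in base R follows, assuming the checksum refers to the data file exactly as downloaded. The helper verify_md5 is hypothetical and not part of OpenML, and a temporary file stands in for a real download.

```r
# Hypothetical helper (not an OpenML function): compare the MD5 checksum of a
# local file with an expected value.
verify_md5 = function(path, expected) {
  # tools::md5sum() returns a character vector named by file path
  actual = unname(tools::md5sum(path))
  identical(tolower(actual), tolower(expected))
}

# A temporary file stands in for a downloaded data set.
tmp = tempfile(fileext = ".arff")
writeLines(c("@relation demo", "@attribute x numeric", "@data", "1"), tmp)
checksum = unname(tools::md5sum(tmp))

ok = verify_md5(tmp, checksum)              # file unchanged since checksum
cat("corrupt", file = tmp, append = TRUE)   # simulate a corrupted download
ok_after = verify_md5(tmp, checksum)        # contents no longer match
```

Here ok is TRUE and ok_after is FALSE, because appending to the file changes its checksum.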

Examples

data("airquality")
dsc = "Daily air quality measurements in New York, May to September 1973. This data is taken from R."
cit = "Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical Methods for Data Analysis. Belmont, CA: Wadsworth."
desc_airquality = makeOMLDataSetDescription(
  name = "airquality",
  description = dsc,
  creator = "New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data)",
  collection.date = "May 1, 1973 to September 30, 1973",
  language = "English",
  licence = "GPL-2",
  url = "https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html",
  default.target.attribute = "Ozone",
  citation = cit,
  tags = "R"
)
airquality_oml = makeOMLDataSet(
  desc = desc_airquality,
  data = airquality,
  colnames.old = colnames(airquality),
  colnames.new = colnames(airquality),
  target.features = "Ozone"
)
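
Arguments such as default.target.attribute, row.id.attribute and ignore.attribute name columns of the data, so it can help to confirm the names exist before building the description. The following is a base R sketch only. The helper missing_attributes is hypothetical and not part of OpenML, and whether the package performs a similar check itself is not assumed here.

```r
data("airquality")

# Hypothetical helper (not part of OpenML): return the attribute names that
# are not columns of `data`. NA values mean "not set" and are skipped.
missing_attributes = function(data, ...) {
  attrs = unlist(list(...))
  attrs = attrs[!is.na(attrs)]
  setdiff(attrs, colnames(data))
}

# "Ozone" is a column of airquality, so nothing is reported as missing.
found = missing_attributes(airquality, default.target.attribute = "Ozone")

# Column names are case-sensitive, so "ozone" is reported as missing.
not_found = missing_attributes(
  airquality,
  default.target.attribute = "ozone",
  row.id.attribute = NA_character_
)
```

An empty result, as in found, means every named attribute is a column of the data.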

See Also

Other data set-related functions: OMLDataSet, convertMlrTaskToOMLDataSet(), convertOMLDataSetToMlr(), deleteOMLObject(), getOMLDataSet(), listOMLDataSets(), tagOMLObject(), uploadOMLDataSet()

  • Maintainer: Giuseppe Casalicchio
  • License: BSD_3_clause + file LICENSE
  • Last published: 2022-10-19