regDataGen function

Artificial data for testing regression algorithms

The generator produces regression data with 4 discrete and 7 numeric columns (six numeric attributes and the numeric response).

regDataGen(noInst, t1=0.8, t2=0.5, noise=0.1)

Arguments

  • noInst: Number of instances to generate.
  • t1, t2: Parameters controlling the shape of the distribution.
  • noise: Parameter controlling the amount of noise. If noise=0, there is no noise. If noise=1, the signal and the noise have the same level.

Returns

Returns a data.frame with noInst rows and 11 columns. The ranges of values of the attributes and the response are:

  • a1: 0,1

  • a2: a,b,c,d

  • a3: 0,1 (irrelevant)

  • a4: a,b,c,d (irrelevant)

  • x1: numeric (gaussian with different sd for each class)

  • x2: numeric (gaussian with different sd for each class)

  • x3: numeric (gaussian, irrelevant)

  • x4: numeric from [0,1]

  • x5: numeric from [0,1]

  • x6: numeric from [0,1]

  • response: numeric
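
The structure listed above can be checked on a small generated sample. The following is a minimal sketch, assuming the CORElearn package is installed; the sample size of 50 is an arbitrary choice for illustration, and the block does nothing if the package is unavailable.

```r
# Sketch only: runs when the CORElearn package is installed.
if (requireNamespace("CORElearn", quietly = TRUE)) {
  # Generate a small sample with the default t1, t2 and noise.
  regData <- CORElearn::regDataGen(noInst = 50)

  # Dimensions: noInst rows and 11 columns, as documented above.
  print(dim(regData))

  # Column names and storage types of the attributes and the response.
  str(regData)

  # Summary of the numeric response column.
  print(summary(regData$response))
}
```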

Details

The response variable is derived from x4, x5, x6 using two different functions. The choice depends on a hidden variable, which determines whether the response follows a linear dependency $f = x_4 - 2x_5 + 3x_6$ or a nonlinear one $f = \cos(4\pi x_4)(2x_5 - 3x_6)$.
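
The two response functions described above can be written out in base R. This is only an illustrative sketch, not the package's implementation: the helper names `f_linear`, `f_nonlinear` and `response_sketch`, and the logical flag `hidden` standing in for the hidden variable, are hypothetical, and which value of the hidden variable selects which function is an assumption.

```r
# Hypothetical helpers; they are not part of CORElearn.

# Linear dependency: f = x4 - 2*x5 + 3*x6
f_linear <- function(x4, x5, x6) x4 - 2 * x5 + 3 * x6

# Nonlinear dependency: f = cos(4*pi*x4) * (2*x5 - 3*x6)
f_nonlinear <- function(x4, x5, x6) cos(4 * pi * x4) * (2 * x5 - 3 * x6)

# 'hidden' is an illustrative logical flag: TRUE picks the linear
# function, FALSE the nonlinear one (an assumed encoding).
response_sketch <- function(x4, x5, x6, hidden) {
  ifelse(hidden, f_linear(x4, x5, x6), f_nonlinear(x4, x5, x6))
}

# Same inputs evaluated under both settings of the hidden flag.
res <- response_sketch(x4 = 0.5, x5 = 0.25, x6 = 0.5,
                       hidden = c(TRUE, FALSE))
print(res)  # 1.5 for the linear case, -1 for the nonlinear case
```

For identical x4, x5, x6 the two functions can give very different responses, which is why attributes that reveal the hidden variable are useful to a regression learner.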

Attributes a1, a2, x1, x2 carry some information on the hidden variable, depending on parameters t1, t2. At one extreme, t1=0.5 and t2=1, these attributes carry no information. On the other hand, if t1=0 or t1=1, then each of the attributes a1, a2 carries full information. If t2=0, then each of x1, x2 carries full information on the hidden variable.
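
The role of t1 can be illustrated with a simplified model. The sketch below assumes, purely for illustration, that the hidden variable is a balanced binary variable and that a binary attribute agrees with it with probability t1; the actual mechanism used by regDataGen may differ. Under that assumption, the information the attribute carries is one bit minus the binary entropy of t1. The names `binary_entropy` and `info_bits` are hypothetical.

```r
# Hypothetical helpers for a simplified model; not part of CORElearn.

# Binary entropy in bits, with the convention 0 * log2(0) = 0.
binary_entropy <- function(p) {
  ifelse(p == 0 | p == 1, 0, -p * log2(p) - (1 - p) * log2(1 - p))
}

# Information (in bits) about a balanced binary hidden variable carried
# by an attribute that agrees with it with probability t1 (assumed model).
info_bits <- function(t1) 1 - binary_entropy(t1)

t1_values <- c(0, 0.5, 0.8, 1)
info <- info_bits(t1_values)
print(round(info, 3))  # full information at 0 and 1, none at 0.5
```

In this simplified picture the default t1=0.8 lies between the two extremes, so the attribute is informative but not decisive.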

The attributes x4, x5, x6 are available with a noise level depending on parameter noise. If noise=0, there is no noise. If noise=1, the signal and the noise have the same level.

Author(s)

Petr Savicky

See Also

classDataGen, ordDataGen, CoreModel

Examples

library(CORElearn)

# prepare a regression data set
regData <- regDataGen(noInst=200)

# build regression tree similar to CART
modelRT <- CoreModel(response ~ ., regData, model="regTree", modelTypeReg=1)
print(modelRT)

destroyModels(modelRT) # clean up