nBkp: number of breakpoints. If NULL, argument bkp must be provided.
bkp: a numeric vector of breakpoint positions that may be used to bypass the breakpoint generation step. Defaults to NULL.
regData: a data.frame containing copy number data for different types of copy number regions. Columns:
c: Total copy number
b: Allele B fraction (a.k.a. BAF)
region: a character value, annotation label for the region. See Details.
genotype: the (germline) genotype of SNPs. By definition, rows with missing genotypes are interpreted as non-polymorphic loci (a.k.a. copy number probes).
regions: a character vector of region labels that may be used to bypass the region label generation step. Defaults to NULL.
regAnnot: a data.frame containing annotation data for each copy number region. Columns:
region: region label of the form "(C1,C2)" (see Details); must match the labels in regData[["region"]].
freq: frequency (in [0,1]) of this type of region in the genome.
If NULL (the default), frequencies are set so that regions (0,1), (0,2), (1,1) and (1,2) (the most common region types) together account for 90% of the regions. The frequencies in regAnnot[["freq"]] should sum to 1.
minLength: minimum length of a region between two successive breakpoints. Defaults to 0.
regionSize: if regionSize>0, breakpoints are included in pairs, the two breakpoints of each pair being exactly regionSize apart. nBkp must then be an even number.
connex: if TRUE (the default), any two successive regions are constrained to be connex in the (minor CN, major CN) space, that is, they share either the same minor or the same major copy number. See 'Details'.
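For illustration, a regAnnot table can also be built by hand. The sketch below uses a hypothetical helper, makeRegAnnot (not part of the package), and assumes that the remaining 10% of the frequency mass is split evenly among the other region types; the package's own default may distribute it differently. It checks the stated requirement that the frequencies sum to 1.

```r
# Hypothetical helper (not part of the package): build a regAnnot
# data.frame in which the four most common region types together account
# for 'commonShare' of the regions. ASSUMPTION: the remaining mass is
# split evenly among the other region types.
makeRegAnnot <- function(regionLabels,
                         common = c("(0,1)", "(0,2)", "(1,1)", "(1,2)"),
                         commonShare = 0.9) {
  regionLabels <- unique(regionLabels)
  isCommon <- regionLabels %in% common
  nCommon <- sum(isCommon)
  nOther <- sum(!isCommon)
  if (nCommon == 0 || nOther == 0) {
    # Only one group is present: fall back to uniform frequencies
    freq <- rep(1 / length(regionLabels), length(regionLabels))
  } else {
    freq <- ifelse(isCommon, commonShare / nCommon, (1 - commonShare) / nOther)
  }
  data.frame(region = regionLabels, freq = freq, stringsAsFactors = FALSE)
}

labels <- c("(1,1)", "(0,1)", "(0,2)", "(1,2)", "(2,2)", "(0,0)")
regAnnot <- makeRegAnnot(labels)
print(regAnnot)
# Frequencies should sum to 1 (up to floating-point error)
stopifnot(isTRUE(all.equal(sum(regAnnot[["freq"]]), 1)))
```

With these six labels, each common type gets 0.9/4 = 0.225 and each of (2,2) and (0,0) gets 0.1/2 = 0.05.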
Returns
A list with elements
profile: the profile (a data.frame with 'length' rows, containing the same fields as the input data regData).
bkp: a vector of breakpoint positions (each value is the index of the last row before a breakpoint)
regions: a character vector of region labels
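Since each value of bkp is the index of the last row before a breakpoint, segment boundaries follow directly from bkp and the profile length. A minimal sketch, using a hypothetical helper bkpToSegments (not part of the package):

```r
# Hypothetical helper (not part of the package): convert breakpoint
# positions (index of the last row before each breakpoint) into the
# first and last row of every segment of a profile with n rows.
bkpToSegments <- function(bkp, n) {
  bkp <- sort(bkp)
  start <- c(1, bkp + 1)  # first row of each segment
  end <- c(bkp, n)        # last row of each segment
  data.frame(start = start, end = end, nRows = end - start + 1)
}

seg <- bkpToSegments(c(300, 750), n = 1000)
print(seg)
# Segment index of every row of the profile
segId <- rep(seq_len(nrow(seg)), times = seg$nRows)
```

Two breakpoints give three segments: rows 1 to 300, 301 to 750 and 751 to 1000.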
Details
This function generates a random copy number profile of length 'length', with 'nBkp' randomly chosen breakpoints. Between two successive breakpoints, the underlying copy number state is constant, and the data are taken from the rows of regData corresponding to a single type of region.
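One simple way to draw random breakpoints under a minLength constraint is rejection sampling. The sketch below uses a hypothetical function, drawBreakpoints, and is not necessarily how getCopyNumberDataByResampling generates breakpoints; it ignores regionSize and assumes that minLength applies to every segment, including the first and the last.

```r
# Hypothetical sketch (not the package's algorithm): draw nBkp distinct
# breakpoints by rejection sampling until every segment has at least
# minLength rows.
drawBreakpoints <- function(n, nBkp, minLength = 0, maxTries = 1000) {
  for (i in seq_len(maxTries)) {
    # Distinct positions in 1..(n-1); bkp = k means a breakpoint after row k
    bkp <- sort(sample.int(n - 1, nBkp))
    # Lengths of the nBkp + 1 resulting segments
    segLengths <- diff(c(0, bkp, n))
    if (all(segLengths >= minLength)) return(bkp)
  }
  stop("could not satisfy minLength in maxTries attempts")
}

set.seed(1)
bkp <- drawBreakpoints(n = 1e4, nBkp = 5, minLength = 100)
print(bkp)
```

Rejection sampling keeps the code short; it becomes slow when (nBkp + 1) * minLength approaches n, since most draws are then rejected.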
Elements of regData[["region"]] must be of the form "(C1,C2)", where C1 denotes the minor copy number and C2 denotes the major copy number. For example,
(1,1): Normal
(0,1): Hemizygous deletion
(0,0): Homozygous deletion
(1,2): Single copy gain
(0,2): Copy-neutral LOH
(2,2): Balanced two-copy gain
(1,3): Unbalanced two-copy gain
(0,3): Single-copy gain with LOH
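Labels of this form can be split into their minor (C1) and major (C2) copy numbers, the total copy number being C1 + C2. A sketch with a hypothetical helper, parseRegionLabel (not part of the package), assuming labels contain no spaces:

```r
# Hypothetical helper (not part of the package): parse labels of the form
# "(C1,C2)" into minor (C1), major (C2) and total (C1 + C2) copy numbers.
parseRegionLabel <- function(labels) {
  ok <- grepl("^\\([0-9]+,[0-9]+\\)$", labels)
  if (!all(ok)) stop("region labels must be of the form '(C1,C2)'")
  # Drop the parentheses, then split on the comma
  parts <- strsplit(gsub("[()]", "", labels), ",", fixed = TRUE)
  minor <- as.integer(vapply(parts, `[`, character(1), 1))
  major <- as.integer(vapply(parts, `[`, character(1), 2))
  data.frame(region = labels, minor = minor, major = major,
             total = minor + major, stringsAsFactors = FALSE)
}

r <- parseRegionLabel(c("(1,1)", "(0,2)", "(1,3)"))
print(r)
```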
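If 'connex' is set to TRUE (the default), transitions between copy number regions are constrained so that, at each breakpoint, either the minor or the major copy number stays the same (and the other one changes). As a consequence, every breakpoint changes the total copy number, and most breakpoints also change the allelic ratios; the exception is a transition between two states with minor copy number 0, such as (0,1) to (0,2), which leaves allelic ratios unchanged in a pure tumor sample.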
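The 'connex' condition can be checked for a single transition with a hypothetical helper, isConnexTransition (not part of the package). The commented calls show why the constraint matters: (1,1) to (0,2) keeps the total copy number at 2, and (1,1) to (2,2) keeps the minor copy number fraction at 1/2, so each of these non-connex transitions is invisible in one of the two signals.

```r
# Hypothetical helper (not part of the package): TRUE when a transition
# between two (minor, major) states keeps exactly one of the two copy
# numbers unchanged, i.e. satisfies the 'connex' constraint.
# Identical states are not a breakpoint, so they return FALSE.
isConnexTransition <- function(from, to) {
  sameMinor <- from[1] == to[1]
  sameMajor <- from[2] == to[2]
  xor(sameMinor, sameMajor)
}

isConnexTransition(c(1, 1), c(1, 2))  # single copy gain: TRUE
isConnexTransition(c(1, 1), c(0, 2))  # total CN stays 2: FALSE
isConnexTransition(c(1, 1), c(2, 2))  # minor/total stays 1/2: FALSE
```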
Examples
affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1)
sim <- getCopyNumberDataByResampling(len=1e4, nBkp=5, minLength=100, regData=affyDat)
plotSeg(sim$profile, sim$bkp)

## another run with identical parameters
bkp <- sim$bkp
regions <- sim$regions
sim2 <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=affyDat, regions=regions)
plotSeg(sim2$profile, bkp)

## change tumor fraction but keep same "truth"
affyDatC <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=0.5)
simC <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=affyDatC, regions=regions)
plotSeg(simC$profile, bkp)

## restrict to only normal, single copy gain, and copy-neutral LOH
## with the same bkp
affyDatR <- subset(affyDat, region %in% c("(1,1)","(0,2)","(1,2)"))
simR <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=affyDatR)
plotSeg(simR$profile, bkp)

## Same 'truth', on another dataSet
regions <- simR$regions
illuDat <- acnr::loadCnRegionData(dataSet="GSE11976", tumorFraction=1)
sim <- getCopyNumberDataByResampling(len=1e4, bkp=bkp, regData=illuDat, regions=regions)
plotSeg(sim$profile, sim$bkp)
References
Pierre-Jean, M., Rigaill, G. J. and Neuvial, P. (2015). "Performance Evaluation of DNA Copy Number Segmentation Methods." Briefings in Bioinformatics, no. 4: 600-615.