getCopyNumberDataByResampling(): R function from the jointseg package

Generate a copy number profile by resampling

Generate a copy number profile by resampling input data


getCopyNumberDataByResampling(length, nBkp = NA, bkp = NULL,
  regData = NULL, regions = NULL, regAnnot = NULL, minLength = 0,
  regionSize = 0, connex = TRUE)

Arguments

length: length of the profile, i.e. its number of rows (loci).
nBkp: number of breakpoints. If NA (the default), argument bkp must be provided instead.
bkp: a numeric vector of breakpoint positions that may be used to bypass the breakpoint generation step. Defaults to NULL.
regData: a data.frame containing copy number data for different types of copy number regions. Columns:
- c: Total copy number
- b: Allele B fraction (a.k.a. BAF)
- region: a character value, annotation label for the region. See Details.
- genotype: the (germline) genotype of SNPs. By definition, rows with missing genotypes are interpreted as non-polymorphic loci (a.k.a. copy number probes).
regions: a character vector of region labels that may be used to bypass the region label generation step. Defaults to NULL.
regAnnot: a data.frame containing annotation data for each copy number region. Columns:
- region: label of the form "(C1,C2)" (must match regData[["region"]]).
- freq: frequency (in [0,1]) of this type of region in the genome.
If NULL (the default), the frequencies of regions (0,1), (0,2), (1,1) and (1,2) (the most common types of regions) are set so that together they account for 90% of the regions. The values of regAnnot[["freq"]] should sum to 1.
minLength: minimum length of region between breakpoints. Defaults to 0.
regionSize: if regionSize > 0, breakpoints are generated in pairs, with the distance within each pair set to regionSize. nBkp must then be an even number.
connex: If TRUE, any two successive regions are constrained to be connex in the (minor CN, major CN) space. See 'Details'.
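The role of minLength in the breakpoint generation step can be illustrated with a small rejection-sampling sketch in base R. This is an assumption about how such a constraint can be enforced, not jointseg's actual implementation; the helper sampleBkp() is hypothetical, and the pairing implied by regionSize > 0 is not covered.

```r
## Hypothetical helper, not part of jointseg: draw 'nBkp' breakpoints at random
## so that every segment has at least 'minLength' rows. As in the 'bkp' return
## value, a breakpoint is stored as the last row index before the change.
sampleBkp <- function(length, nBkp, minLength = 0, maxIter = 1000) {
  stopifnot(nBkp >= 0, nBkp < length)
  for (iter in seq_len(maxIter)) {
    # distinct positions in 1..(length - 1), so every segment has >= 1 row
    bkp <- sort(sample.int(length - 1, nBkp))
    segLen <- diff(c(0, bkp, length))
    # rejection step: keep the draw only if all segments are long enough
    if (all(segLen >= minLength)) return(bkp)
  }
  stop("could not satisfy 'minLength' after ", maxIter, " attempts")
}

set.seed(1)
sampleBkp(length = 1e4, nBkp = 5, minLength = 100)
```

Rejection sampling keeps the sketch short and exactly uniform over valid configurations, but it can become slow when minLength * (nBkp + 1) approaches length.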

Returns

A list with elements

profile: the simulated profile, a data.frame with 'length' rows containing the same fields as the input data regData.
bkp: a vector of breakpoint positions (each is the last row index before a breakpoint)
regions: a character vector of region labels
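The relation between the bkp and regions elements can be illustrated with a short base R sketch. It assumes, which is not stated above, that regions holds one label per segment, so that length(regions) == length(bkp) + 1; the helper expandRegions() is hypothetical.

```r
## Hypothetical helper: expand breakpoints + region labels into one label per row.
## Assumes bkp[i] is the last row index of segment i and that 'regions' holds
## one label per segment (length(regions) == length(bkp) + 1).
expandRegions <- function(bkp, regions, n) {
  segLen <- diff(c(0, bkp, n))
  stopifnot(length(regions) == length(segLen), all(segLen > 0))
  rep(regions, times = segLen)
}

lab <- expandRegions(bkp = c(3, 7), regions = c("(1,1)", "(0,1)", "(1,2)"), n = 10)
print(lab)  # rows 1-3: "(1,1)"; rows 4-7: "(0,1)"; rows 8-10: "(1,2)"
```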

Details

This function generates a random copy number profile of length 'length', with 'nBkp' randomly chosen breakpoints. Between two successive breakpoints, the type of copy number region is constant, and the data are resampled from the rows of regData that belong to that type of region.
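The resampling idea can be sketched in base R as follows. This is a minimal illustration under the assumption that each simulated row is drawn with replacement among the regData rows carrying its segment's label; resampleProfile() and the toy data are hypothetical stand-ins (the toy b values are placeholders, not realistic BAF, and the genotype column is omitted).

```r
## Minimal sketch of the resampling idea (not jointseg's actual code):
## each row of the simulated profile is drawn at random, with replacement,
## among the rows of regData carrying the region label of its segment.
resampleProfile <- function(regData, rowLabels) {
  idx <- integer(length(rowLabels))
  for (lab in unique(rowLabels)) {
    pool <- which(regData[["region"]] == lab)
    if (length(pool) == 0) stop("no rows in regData for region ", lab)
    pos <- which(rowLabels == lab)
    # sample.int avoids the sample(x) pitfall when 'pool' has a single element
    idx[pos] <- pool[sample.int(length(pool), length(pos), replace = TRUE)]
  }
  out <- regData[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

## toy input standing in for real data such as acnr::loadCnRegionData() output
set.seed(1)
toy <- data.frame(
  c = c(rnorm(50, 2, 0.1), rnorm(50, 1, 0.1)),  # total copy number
  b = runif(100),                               # placeholder allele B fraction
  region = rep(c("(1,1)", "(0,1)"), each = 50),
  stringsAsFactors = FALSE
)
rowLabels <- rep(c("(1,1)", "(0,1)", "(1,1)"), times = c(30, 20, 50))
sim <- resampleProfile(toy, rowLabels)
```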

Elements of regData[["region"]] must be of the form "(C1,C2)", where C1 denotes the minor copy number and C2 denotes the major copy number. For example,

(1,1): Normal
(0,1): Hemizygous deletion
(0,0): Homozygous deletion
(1,2): Single copy gain
(0,2): Copy-neutral LOH
(2,2): Balanced two-copy gain
(1,3): Unbalanced two-copy gain
(0,3): Single-copy gain with LOH
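Labels of this form can be split into minor, major and total copy numbers (total = C1 + C2) with a few lines of base R. The helper parseRegion() is hypothetical and only assumes the "(C1,C2)" format described above.

```r
## Hypothetical helper: split "(C1,C2)" labels into minor/major copy numbers.
parseRegion <- function(region) {
  stopifnot(all(grepl("^\\([0-9]+,[0-9]+\\)$", region)))
  parts <- strsplit(gsub("[()]", "", region), ",", fixed = TRUE)
  minor <- as.integer(vapply(parts, `[`, character(1), 1))
  major <- as.integer(vapply(parts, `[`, character(1), 2))
  data.frame(region = region, minor = minor, major = major,
             total = minor + major, stringsAsFactors = FALSE)
}

parseRegion(c("(1,1)", "(0,1)", "(0,2)", "(1,3)"))
```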

If 'connex' is set to TRUE (the default), transitions between copy number regions are constrained so that at each breakpoint, either the minor or the major copy number stays unchanged. Equivalently, this means that all breakpoints can be seen in both total copy numbers and allelic ratios.
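The connex constraint can be expressed as a check on pairs of "(C1,C2)" labels: since a breakpoint implies a change of region, exactly one of the two copy numbers changes. The sketch below assumes, for illustration only, that successive labels are drawn uniformly among allowed successors; jointseg may instead weight them (for example by regAnnot frequencies). All helper names are hypothetical.

```r
## Minimal sketch (hypothetical helpers, not jointseg's code) of the 'connex'
## constraint: a transition from (m1,M1) to (m2,M2) is allowed when exactly one
## of the minor or major copy numbers changes.
cn <- function(region) {
  x <- as.integer(strsplit(gsub("[()]", "", region), ",", fixed = TRUE)[[1]])
  c(minor = x[1], major = x[2])
}
isConnex <- function(from, to) {
  sum(cn(from) != cn(to)) == 1
}

## draw a sequence of region labels in which every transition is connex
sampleConnexRegions <- function(candidates, nSeg, start = candidates[1]) {
  stopifnot(nSeg >= 1)
  out <- character(nSeg)
  out[1] <- start
  for (i in seq_len(nSeg)[-1]) {
    # keep only candidates reachable from the previous label
    ok <- candidates[vapply(candidates, isConnex, logical(1), from = out[i - 1])]
    if (length(ok) == 0) stop("no connex successor for ", out[i - 1])
    out[i] <- ok[sample.int(length(ok), 1)]
  }
  out
}

set.seed(3)
regs <- sampleConnexRegions(c("(1,1)", "(0,1)", "(0,2)", "(1,2)"), nSeg = 6)
all(mapply(isConnex, head(regs, -1), regs[-1]))  # TRUE by construction
```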

Examples


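library(jointseg)  # provides getCopyNumberDataByResampling() and plotSeg()
## the 'acnr' package must be installed to load the annotated data sets
affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1)
sim <- getCopyNumberDataByResampling(length=1e4, nBkp=5, minLength=100, regData=affyDat)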
plotSeg(sim$profile, sim$bkp)

## another run with the same breakpoints and region labels
bkp <- sim$bkp
regions <- sim$regions
sim2 <- getCopyNumberDataByResampling(length=1e4, bkp=bkp, regData=affyDat, regions=regions)
plotSeg(sim2$profile, bkp)

## change tumor fraction but keep same "truth"
affyDatC <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=0.5)
simC <- getCopyNumberDataByResampling(length=1e4, bkp=bkp, regData=affyDatC, regions=regions)
plotSeg(simC$profile, bkp)

## restrict to only normal, single copy gain, and copy-neutral LOH
## with the same bkp
affyDatR <- subset(affyDat, region %in% c("(1,1)", "(0,2)", "(1,2)"))
simR <- getCopyNumberDataByResampling(length=1e4, bkp=bkp, regData=affyDatR)
plotSeg(simR$profile, bkp)

## Same 'truth', on another dataSet
regions <- simR$regions
illuDat <- acnr::loadCnRegionData(dataSet="GSE11976", tumorFraction=1)
simI <- getCopyNumberDataByResampling(length=1e4, bkp=bkp, regData=illuDat, regions=regions)
plotSeg(simI$profile, simI$bkp)

References

Pierre-Jean, M., Rigaill, G. J. and Neuvial, P. (2015). "Performance Evaluation of DNA Copy Number Segmentation Methods." Briefings in Bioinformatics, no. 4: 600-615.

Author(s)

Morgane Pierre-Jean and Pierre Neuvial

jointseg package

Maintainer: Morgane Pierre-Jean
License: LGPL (>= 2.1)
Last published: 2019-01-11
