Preprocessing() R function from [scapGNN]

Data preprocessing

This function prepares data for the ConNetGNN function.


Preprocessing(data, parallel.cores = 1, verbose = TRUE)

Arguments

data: The input data should be a data frame or a matrix in which rows are genes and columns are cells. A Seurat object is also accepted.
parallel.cores: Number of processors to use when running the calculations in parallel (default: 1, as in the usage above). If parallel.cores = 0, all available cores are used; set a smaller positive number to limit the cores used.
verbose: Whether to print information about each step. Default: TRUE.
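
The behaviour described for parallel.cores can be pictured with a small base R sketch. The helper resolve_cores() is hypothetical and is not part of scapGNN; it only illustrates one plausible reading of the rule (0 means all available cores, any other value is capped at what the machine has). How the package resolves the value internally is not stated here.

```r
library(parallel)

# Hypothetical helper (not a scapGNN function): turn a parallel.cores
# request into a concrete number of worker processes.
resolve_cores <- function(parallel.cores = 1) {
  available <- detectCores()
  # detectCores() can return NA on some platforms; fall back to one core.
  if (is.na(available)) available <- 1L
  # Assumed reading of the documentation: 0 means "use every available core".
  if (parallel.cores == 0) return(available)
  # Otherwise never ask for more cores than the machine reports.
  min(parallel.cores, available)
}

n_default <- resolve_cores(1)
n_all     <- resolve_cores(0)
```

The resolved value could then be passed on, for example as Preprocessing(data, parallel.cores = n_all).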

Returns

A list:

orig_dara: User-submitted raw data; rows are highly variable genes and columns are cells.
cell_features: Cell feature matrix.
gene_features: Gene feature matrix.
ltmg_matrix: Gene regulatory signal matrix for LTMG.
cell_adj: The adjacency matrix of the cell correlation network.
gene_adj: The adjacency matrix of the gene correlation network.

Details

Preprocessing

The function can interface with the Seurat framework; see Examples for the Seurat data processing workflow. The input data should contain highly variable genes and be log-transformed. Left-truncated mixed Gaussian (LTMG) modeling is used to calculate the gene regulatory signal matrix. Positively correlated gene-gene and cell-cell pairs are used as the initial gene correlation matrix and cell correlation matrix.
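
The idea of keeping only positive correlations can be sketched in base R. This is a minimal illustration on toy data, assuming Pearson correlation and a simple "set negatives to zero" rule; the helper positive_cor() and the toy matrix are hypothetical, and the package's actual computation may differ.

```r
# Toy log-transformed expression matrix: 6 genes (rows) x 8 cells (columns).
set.seed(1)
expr <- matrix(log1p(rpois(48, lambda = 5)), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("cell", 1:8)))

# Hypothetical helper (not a scapGNN function): Pearson correlation
# between the columns of m, with non-positive structure removed.
positive_cor <- function(m) {
  adj <- cor(m)          # column-by-column Pearson correlation
  adj[is.na(adj)] <- 0   # constant columns give NA; treat as no correlation
  adj[adj < 0] <- 0      # keep only positive correlations
  adj
}

cell_adj_sketch <- positive_cor(expr)     # cells are the columns of expr
gene_adj_sketch <- positive_cor(t(expr))  # transpose so genes become columns
```

The two results have one row and one column per cell and per gene respectively, which matches the shape one would expect of cell_adj and gene_adj in the returned list.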

Examples


# Load dependent packages.
# require(coop)

# Seurat data processing.
# require(Seurat)

# Load the PBMC dataset (example data for Seurat).
# pbmc.data <- Read10X(data.dir = "../data/pbmc3k/filtered_gene_bc_matrices/hg19/")

# We recommend keeping only genes with non-zero expression in more than
# 1% of cells, and cells with non-zero expression in more than 1% of genes.
# Users can also filter mitochondrial genes according to their own needs.
# pbmc <- CreateSeuratObject(counts = pbmc.data, project = "case",
#                                     min.cells = 3, min.features = 200)
# pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
# pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

# Normalizing the data.
# pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize")

# Identification of highly variable features.
# pbmc <- FindVariableFeatures(pbmc, selection.method = 'vst', nfeatures = 2000)

# Run Preprocessing.
# Prep_data <- Preprocessing(pbmc)


# Users can also directly input data
# in data frame or matrix format
# containing highly variable genes.
data("Hv_exp")
Hv_exp <- Hv_exp[,1:20]
Hv_exp <- Hv_exp[which(rowSums(Hv_exp) > 0),]
Prep_data <- Preprocessing(Hv_exp[1:10,])
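
For users who supply a plain matrix rather than a Seurat object, the 1% filtering rule recommended in the comments above can be sketched in base R. The toy count matrix and the variable names below are illustrative assumptions; both thresholds are computed on the unfiltered matrix, which is one possible reading of the rule.

```r
# Toy count matrix: 50 genes (rows) x 200 cells (columns), mostly zeros.
set.seed(42)
counts <- matrix(rpois(50 * 200, lambda = 0.3), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("cell", 1:200)))
counts[1, ] <- 0  # a gene that is never expressed, so the filter drops something

# Keep genes with non-zero counts in more than 1% of cells.
gene_keep <- rowSums(counts > 0) > 0.01 * ncol(counts)
# Keep cells with non-zero counts in more than 1% of genes.
cell_keep <- colSums(counts > 0) > 0.01 * nrow(counts)

# drop = FALSE keeps the result a matrix even if one row or column remains.
filtered <- counts[gene_keep, cell_keep, drop = FALSE]
```

The filtered matrix would still need normalization, log transformation, and selection of highly variable genes before being passed to Preprocessing().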

scapGNN package

Maintainer: Xudong Han
License: GPL (>= 2)
Last published: 2023-08-08
