correct_lip_for_abundance function

Protein abundance correction for LiP-data

Performs the correction of LiP-peptides for changes in protein abundance and calculates their significance using a t-test. This function was implemented based on the MSstatsLiP package developed by the Vitek lab.

correct_lip_for_abundance(
  lip_data,
  trp_data,
  protein_id,
  grouping,
  comparison = comparison,
  diff = diff,
  n_obs = n_obs,
  std_error = std_error,
  p_adj_method = "BH",
  retain_columns = NULL,
  method = c("satterthwaite", "no_df_approximation")
)

Arguments

  • lip_data: a data frame containing at least the input variables. Ideally, the result from the calculate_diff_abundance function is used.
  • trp_data: a data frame containing at least the input variables minus the grouping column. Ideally, the result from the calculate_diff_abundance function is used.
  • protein_id: a character column in the lip_data and trp_data data frames that contains protein identifiers.
  • grouping: a character column in the lip_data data frame that contains precursor or peptide identifiers.
  • comparison: a character column in the lip_data and trp_data data frames that contains the comparisons between conditions.
  • diff: a numeric column in the lip_data and trp_data data frames that contains log2-fold changes for peptide or protein quantities.
  • n_obs: a numeric column in the lip_data and trp_data data frames containing the number of observations used to calculate fold changes.
  • std_error: a numeric column in the lip_data and trp_data data frames containing the standard error of fold changes.
  • p_adj_method: a character value, specifies the p-value correction method. Possible methods are c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). Default method is "BH".
  • retain_columns: a vector indicating which additional columns should be retained from the input data frame. By default no additional columns are retained (retain_columns = NULL). Specific columns can be retained by providing their names (not in quotation marks, just like other column names, but in a vector). Please note that if you retain columns that have multiple rows per grouped variable, there will be duplicated rows in the output.
  • method: a character value, specifies the method used to estimate the degrees of freedom. Possible methods are c("satterthwaite", "no_df_approximation"). satterthwaite uses the Welch-Satterthwaite equation to estimate the pooled degrees of freedom, as described in https://doi.org/10.1016/j.mcpro.2022.100477 and implemented in the MSstatsLiP package. This approach respects the number of protein measurements for the degrees of freedom. no_df_approximation just takes the number of peptides into account when calculating the degrees of freedom.
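The correction described by the arguments above can be sketched for a single peptide in base R. This is a minimal illustration, not the package's implementation: the helper name correct_one and all input values are hypothetical, and it assumes that each fold change carries n_obs - 2 degrees of freedom before the Welch-Satterthwaite equation pools them.

```r
# Hypothetical helper: corrects one LiP fold change for the fold change of
# its parent protein. Not part of the package; for illustration only.
correct_one <- function(lip_diff, lip_se, lip_n, trp_diff, trp_se, trp_n) {
  # Subtract the protein-level log2 fold change from the peptide-level one.
  adj_diff <- lip_diff - trp_diff

  # Standard errors of independent estimates add in quadrature.
  adj_std_error <- sqrt(lip_se^2 + trp_se^2)

  # Welch-Satterthwaite estimate of the pooled degrees of freedom.
  # Assumption: each input estimate has (n_obs - 2) degrees of freedom.
  df <- (lip_se^2 + trp_se^2)^2 /
    (lip_se^4 / (lip_n - 2) + trp_se^4 / (trp_n - 2))

  # Two-sided t-test on the corrected fold change.
  t_stat <- adj_diff / adj_std_error
  pval <- 2 * stats::pt(abs(t_stat), df = df, lower.tail = FALSE)

  data.frame(adj_diff, adj_std_error, df, pval)
}

# Made-up values for one peptide (LiP) and its protein (Trp).
res <- correct_one(
  lip_diff = 1.2, lip_se = 0.20, lip_n = 8,
  trp_diff = 0.5, trp_se = 0.15, trp_n = 8
)
print(res)
```

The pooled degrees of freedom always fall between the smaller of the two input values and their sum, so a noisy protein-level estimate lowers the confidence in the corrected peptide-level change.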

Returns

A data frame containing corrected differential abundances (adj_diff), adjusted standard errors (adj_std_error), degrees of freedom (df), p-values (pval) and adjusted p-values (adj_pval).
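To show how the pval and adj_pval columns relate, the sketch below applies stats::p.adjust to a made-up vector of p-values with the default "BH" method and, for comparison, with "bonferroni". The p-values are invented, and whether the function applies the correction within each comparison or across the whole table is not stated here, so treat the grouping as an open assumption.

```r
# Made-up raw p-values standing in for the pval column.
pvals <- c(0.001, 0.012, 0.030, 0.200, 0.800)

# Benjamini-Hochberg correction, the default p_adj_method ("BH").
adj_bh <- stats::p.adjust(pvals, method = "BH")

# Bonferroni correction for comparison; it is never less strict than BH.
adj_bonf <- stats::p.adjust(pvals, method = "bonferroni")

print(data.frame(pval = pvals, adj_pval_BH = adj_bh, adj_pval_bonferroni = adj_bonf))
```

Adjusted p-values are never smaller than the raw ones, so filtering on adj_pval is the more conservative choice when selecting peptides for follow-up.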

Examples

# Load libraries
library(dplyr)

# Load example data and simulate tryptic data by summing up precursors
data <- rapamycin_10uM

data_trp <- data %>%
  dplyr::group_by(pg_protein_accessions, r_file_name) %>%
  dplyr::mutate(pg_quantity = sum(fg_quantity)) %>%
  dplyr::distinct(
    r_condition,
    r_file_name,
    pg_protein_accessions,
    pg_quantity
  )

# Calculate differential abundances for LiP and Trp data
diff_lip <- data %>%
  dplyr::mutate(fg_intensity_log2 = log2(fg_quantity)) %>%
  assign_missingness(
    sample = r_file_name,
    condition = r_condition,
    intensity = fg_intensity_log2,
    grouping = eg_precursor_id,
    ref_condition = "control",
    retain_columns = "pg_protein_accessions"
  ) %>%
  calculate_diff_abundance(
    sample = r_file_name,
    condition = r_condition,
    grouping = eg_precursor_id,
    intensity_log2 = fg_intensity_log2,
    comparison = comparison,
    method = "t-test",
    retain_columns = "pg_protein_accessions"
  )

diff_trp <- data_trp %>%
  dplyr::mutate(pg_intensity_log2 = log2(pg_quantity)) %>%
  assign_missingness(
    sample = r_file_name,
    condition = r_condition,
    intensity = pg_intensity_log2,
    grouping = pg_protein_accessions,
    ref_condition = "control"
  ) %>%
  calculate_diff_abundance(
    sample = r_file_name,
    condition = r_condition,
    grouping = pg_protein_accessions,
    intensity_log2 = pg_intensity_log2,
    comparison = comparison,
    method = "t-test"
  )

# Correct for abundance changes
corrected <- correct_lip_for_abundance(
  lip_data = diff_lip,
  trp_data = diff_trp,
  protein_id = pg_protein_accessions,
  grouping = eg_precursor_id,
  retain_columns = c("missingness"),
  method = "satterthwaite"
)

head(corrected, n = 10)

Author(s)

Aaron Fehr

  • Maintainer: Jan-Philipp Quast
  • License: MIT + file LICENSE
  • Last published: 2024-10-21