x: data.frame to be saved to an 'Excel' 'xlsx' file.
file: character valid path to save an 'Excel' 'xlsx' file. If the file exists, and append=TRUE the new data will be added to the existing file withthe defined sheetName.
Note when file=NULL the output is not saved to a file, instead the Workbook object is returned by this function. The Workbook object can be passed as argument wb in order to add multiple sheets to the same Workbook prior to saving them together. This operation is intended to provide a substantial improvement in speed.
wb: Workbook object as defined in R package openxlsx. When this argument is defined, data is not imported from file, and instead the workbook data is used from wb. This option is intended to improve speed of writing several sheets to the same output file, by preventing the slow read/write steps each time a new sheet is added.
sheetName: character value less with a valid 'Excel' 'xlsx' worksheet name. At this time (version 0.0.29.900) the sheetName is restricted to 31 characters, with no puntuation except "-" and "_".
startRow, startCol: integer indicating the row and column number to start with the top,left cell written to the worksheet, default are 1.
append: logical default FALSE, whether to append to file (TRUE), or to write over an existing file. The append=TRUE is useful when adding a worksheet to an existing file.
headerColors, columnColors, highlightHeaderColors, highlightColors, borderColor, borderPosition: default values for the 'Excel' worksheet background and border colors. As of version 0.0.29.900, colors must use valid 'Excel' color names.
highlightColumns, numColumns, fcColumns, lfcColumns, hitColumns, intColumns, pvalueColumns: integer vector referring the column number in the input data.frame
x to define as each column type, as relevant.
numFormat, fcFormat, lfcFormat, hitFormat, intFormat, pvalueFormat: character string with valid 'Excel' cell formatting, for example "#,##0.00" defines a column to use comma-delimited numbers above one thousand, and display two decimal places in all numeric cells. See [https://support.microsoft.com] topic "Excel Create and apply a custom number format." or "Excel Number format codes" for more details. Some examples below:
"#,##0" : display only integer values, using comma as delimiter for every thousands place. The number 2142.12 would be represented: "2,142"
"###0.0" : display numeric values rounded to the 0.1 place, using no comma delimiter for values above one thousand. The number 2142.12 would be represented: "2142.1"
"[\>0.01]0.00#;0.00E+00" : this rule is a conditional format, values above 0.01 are represented as numbers rounded to the thousandths position 0.001; values below 0.01 are represented with scientific notation with three digits. The number 0.1256 would be represented: "0.126"
The number 0.001256 would be represented: "1.26E-03"
"[Red]#,###.00_);[Blue](#,###.00);[Black]0.00_)" : this format applies to positive values, negative values, and zero, in order delimited by semicolons. Positive values are colored red. The string "_)" adds whitespace (defined by "_") equale to the width of the character ")" to the end of positive values. Negative values are surrounded by parentheses "()"
and are colored blue. Values equal to zero are represented with two trailing digits, and whitespace ("_") equal to width ")". The whitespace at the end of positive values and zero are used to align all values at the same decimal position.
numRule, fcRule, lfcRule, hitRule, intRule, pvalueRule: numeric vector length=3 indicating the breakpoints for 'Excel' to apply conditional color formatting, using the corresponding style. Note that all conditional formatting applied by this function uses the "3-Color Scale", therefore there should be three values, and three corresponding colors in the corresponding Style arguments.
numStyle, fcStyle, lfcStyle, intStyle, hitStyle, pvalueStyle: character
vector length=3 containing three valid R colors. Note that alpha transparency will be removed prior to use in 'Excel', as required. Note that all conditional formatting applied by this function uses the "3-Color Scale", therefore there should be three colors, which match three values in the corresponding Rule arguments.
doConditional: logical indicating whether to apply conditional formatting of cells, with this function only the background cell color (and contrasting text color) is affected.
doCategorical: logical indicating whether to apply categorical color formatting, of only the background cell colors and contrasting text color. This argument requires colorSub be defined.
colorSub: character vector of R colors, whose names refer to cell values in the input x data.frame.
freezePaneColumn, freezePaneRow: integer value of the row or column before which the 'Excel' "freeze panes" is applied. Note that these values are adjusted relative by startRow and startCol in the 'Excel' worksheet, so that the values are applied relative to the data.frame argument x.
doFilter: logical indicating whether to enable column filtering by default.
fontName: character default font configuration, containing a valid 'Excel' font name.
fontSize: numeric default font size in 'Excel' point units.
minWidth, maxWidth, autoWidth: numeric minimum, maximum size for each 'Excel' cell, in character units as defined by 'Excel', used when autoWidth=TRUE to restrict cell widths to this range. Note that the argument colWidths is generally preferred, if the numeric widths can be reasonable calculated or anticipated upfront. When autoWidth=FALSE 'Excel' typically auto-sizes cells to the width of the largest value in each column, which may not be ideal when values are extremely large.
colWidths: numeric width of each column in x, recycled to the total number of columns required. Note that when keepRownames=TRUE, the first column will contain rownames(x), therefore the length of colWidths in that case will be ncol(x) + 1.
wrapCells: logical default FALSE, indicating whether to enable word-wrap within cells.
wrapHeaders: logical indicating whether to enable word wrap for column headers, which is helpful when autoWidth=TRUE since it fixed the cell width while allowing the column header to be seen.
headerRowMultiplier: numeric value to define the row height of the first header row in 'Excel'. This value is defined as a multiple of subsequent rows, and should usually represent the maximum number of lines after word-wrapping, as relevant. This argument is helpful when wrapHeaders=TRUE and autoWidth=TRUE.
keepRownames: logical indicating whether to include rownames(x) in its own column in 'Excel'.
verbose: logical indicating whether to print verbose output.
...: additional arguments are passed to applyXlsxConditionalFormat()
and applyXlsxCategoricalFormat() as relevant.
Returns
Workbook object as defined by the openxlsx package is returned invisibly with invisible(). This Workbook
can be used in argument wb to provide a speed boost when saving multiple sheets to the same file.
Details
This function is a minor but useful customization of the openxlsx::saveWorkbook() and associated functions, intended to provide some pre-configured formatting of known column types, typically relevant to statistical values, and in some cases, gene or transcript expression values.
There are numerous configurable options when saving an 'Excel' worksheet, most of the defaults in this function are intended not to require changes, but are listed as formal function arguments to make each option visibly obvious.
If colorSub is supplied as a named vector of colors, then by default text values will be colorized accordingly, which can be especially helpful when including data with categorical text values.
This function pre-configures formatting options for the following column data types, each of which has conditional color-formatting, defined numeric ranges, and color scales.
int: integer values, where numeric values are formatted without visible decimal places, and the big.mark="," standard is used to help visually distinguish large integers. The color scale is by default c(0, 100, 10000).
num: numeric values, with fixed number of visible decimal places, which helps visibly align values along each row.
hit: numeric type, a subset of "int" intended when data is flagged with something like a "+1" or "-1" to indicate a statistical increase or decrease.
pvalue: P-value, where numeric values range from 1 down near zero, and values are formatted consistently with scientific notation.
fc: numeric fold change, whose values are expected to range from 1 and higher, and -1 and lower. Decimal places are by default configured to show one decimal place, to simplify the 'Excel' visual summary.
lfc: numeric log fold change, whose values are expected to be centered at zero. Decimal places are by default configured to show one decimal place, to simplify the 'Excel' visual summary.
highlight: character and undefined columns to be highlighted with a brighter background color, and bold text.
For each column data type, a color scale and default numeric range is defined, which allows conditional formatting of cells based upon expected ranges of values.
A screenshot of the file produced by the example is shown below.
Examples
# set up a test data.frameset.seed(123);lfc <--3:3+ stats::rnorm(7)/3;colorSub <- nameVector( rainbow2(7), LETTERS[1:7])df <- data.frame(name=LETTERS[1:7], int=round(4^(1:7)), num=(1:7)*4-2+ stats::rnorm(7), fold=2^abs(lfc)*sign(lfc), lfc=lfc, pvalue=10^(-1:-7+ stats::rnorm(7)), hit=sample(c(-1,0,0,1,1), replace=TRUE, size=7));df;# write to tempfile for examplesif(check_pkg_installed("openxlsx")){ out_xlsx <- tempfile(pattern="writeOpenxlsx_", fileext=".xlsx") writeOpenxlsx(x=df, file=out_xlsx, sheetName="jamba_test", colorSub=colorSub, intColumns=2, numColumns=3, fcColumns=4, lfcColumns=5, pvalueColumns=6, hitColumn=7, freezePaneRow=2, freezePaneColumn=2, append=FALSE);# now read it back df_list <- readOpenxlsx(xlsx=out_xlsx); sdim(df_list)}
See Also
Other jam export functions: applyXlsxCategoricalFormat(), applyXlsxConditionalFormat(), readOpenxlsx(), set_xlsx_colwidths(), set_xlsx_rowheights()