html_table function

Parse an HTML table into a data frame

The algorithm mimics what a browser does, but repeats the values of merged cells in every cell that they cover.

html_table(x, header = NA, trim = TRUE, fill = deprecated(), dec = ".", na.strings = "NA", convert = TRUE)

Arguments

  • x: A document (from read_html()), node set (from html_elements()), node (from html_element()), or session (from session()).

  • header: Use first row as header? If NA, will use first row if it consists of <th> tags.

    If TRUE, column names are left exactly as they are in the source document, which may require post-processing to generate a valid data frame.

  • trim: Remove leading and trailing whitespace within each cell?

  • fill: Deprecated - missing cells in tables are now always automatically filled with NA.

  • dec: The character used as decimal place marker.

  • na.strings: Character vector of values that will be converted to NA if convert is TRUE.

  • convert: If TRUE, will run type.convert() to interpret texts as integer, double, or NA.
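
As a rough illustration of how dec, na.strings, and convert interact, here is a minimal sketch. The table contents (comma decimals, "n/a" as a missing-value marker) and the object names are invented for this example and do not come from the package documentation.

```r
library(rvest)

# Hypothetical table: comma as the decimal mark, "n/a" marking a missing value
page <- minimal_html("
  <table>
    <tr><th>item</th><th>price</th></tr>
    <tr><td>a</td><td>1,5</td></tr>
    <tr><td>b</td><td>n/a</td></tr>
  </table>")

tbl <- page %>% html_element("table")

# Read "," as the decimal mark and turn "n/a" into NA during type conversion
converted <- html_table(tbl, dec = ",", na.strings = "n/a")

# Skip type conversion: every column stays character and "n/a" is kept as text
raw <- html_table(tbl, convert = FALSE)
```

Because na.strings only takes effect when convert is TRUE, the second call leaves the "n/a" cell untouched.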

Returns

When applied to a single element, html_table() returns a single tibble. When applied to multiple elements or a document, html_table() returns a list of tibbles.
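
The difference between the two return shapes can be seen in a small sketch like the one below. The two-table page and the object names are made up for illustration.

```r
library(rvest)

# Hypothetical page containing two small tables
page <- minimal_html("
  <table><tr><th>A</th></tr><tr><td>1</td></tr></table>
  <table><tr><th>B</th></tr><tr><td>2</td></tr></table>")

# A single node (the first matching table) gives a single tibble
one <- page %>% html_element("table") %>% html_table()

# A node set gives a list with one tibble per table
many <- page %>% html_elements("table") %>% html_table()

# Passing the whole document also gives a list of tibbles
all_tables <- html_table(page)
```

Individual tables can then be pulled out of the list with ordinary indexing, for example many[[2]].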

Examples

library(rvest)

sample1 <- minimal_html("<table>
  <tr><th>Col A</th><th>Col B</th></tr>
  <tr><td>1</td><td>x</td></tr>
  <tr><td>4</td><td>y</td></tr>
  <tr><td>10</td><td>z</td></tr>
</table>")
sample1 %>%
  html_element("table") %>%
  html_table()

# Values in merged cells will be duplicated
sample2 <- minimal_html("<table>
  <tr><th>A</th><th>B</th><th>C</th></tr>
  <tr><td>1</td><td>2</td><td>3</td></tr>
  <tr><td colspan='2'>4</td><td>5</td></tr>
  <tr><td>6</td><td colspan='2'>7</td></tr>
</table>")
sample2 %>%
  html_element("table") %>%
  html_table()

# If a row is missing cells, they'll be filled with NAs
sample3 <- minimal_html("<table>
  <tr><th>A</th><th>B</th><th>C</th></tr>
  <tr><td colspan='2'>1</td><td>2</td></tr>
  <tr><td colspan='2'>3</td></tr>
  <tr><td>4</td></tr>
</table>")
sample3 %>%
  html_element("table") %>%
  html_table()

  • Maintainer: Hadley Wickham
  • License: MIT + file LICENSE
  • Last published: 2024-02-12