contentanalysis R package

analyze_scientific_content

Enhanced scientific content analysis with citation extraction

author_names_match

Compare two author names with fuzzy matching
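
A common way to do this kind of comparison, also relevant to normalize_author_name and check_author_conflict below, is to reduce each name to an accent-free surname plus a first initial and then compare the surnames with a similarity ratio. The Python sketch below illustrates that idea only. It is not the package's R code, and the helper names (`surname_and_initial`, `names_match`) and the 0.85 threshold are hypothetical choices.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def surname_and_initial(name: str) -> tuple[str, str]:
    """Reduce a name to (surname, first initial), lowercase and accent-free."""
    # Decompose accented letters (e.g. "ü" -> "u" + diaeresis) and drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
    if "," in plain:
        # "Surname, Given" ordering
        surname, given = plain.split(",", 1)
    else:
        # "Given Surname" ordering: treat the last token as the surname
        parts = plain.split()
        surname, given = (parts[-1], " ".join(parts[:-1])) if parts else ("", "")
    surname = re.sub(r"[^a-z]", "", surname)  # also drops hyphens and apostrophes
    given_tokens = re.sub(r"[^a-z]", " ", given).split()
    initial = given_tokens[0][0] if given_tokens else ""
    return surname, initial


def names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical fuzzy match: similar surnames and no conflicting first initials."""
    sur_a, ini_a = surname_and_initial(a)
    sur_b, ini_b = surname_and_initial(b)
    if ini_a and ini_b and ini_a != ini_b:
        return False
    return SequenceMatcher(None, sur_a, sur_b).ratio() >= threshold
```

With this sketch, "Aria, M." matches "Massimo Aria" and "Müller, J." matches "Muller, Jan", while "Aria, M." and "Aria, C." are rejected because their initials conflict.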

calculate_readability_indices

Calculate readability indices for text
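
Readability indices of this kind are usually computed from word, sentence, and syllable counts. The Python sketch below implements the classic published formulas for Flesch Reading Ease, Flesch-Kincaid Grade Level, and the Gunning Fog index. Whether calculate_readability_indices reports exactly these indices, or counts words and syllables the same way, is not stated here, so treat this as an illustration of the formulas rather than the package's output.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Higher scores mean easier text (roughly 0-100 for ordinary prose).
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Approximate U.S. school grade level needed to follow the text.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    # complex_words: words with three or more syllables.
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))
```

For example, 100 words in 5 sentences with 150 syllables give a Reading Ease of about 59.6 and a grade level of about 9.9.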

calculate_word_distribution

Calculate word distribution across text segments or sections
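
One simple way to measure how terms are distributed through a document is to cut it into equal-length word segments and count each term per segment. The Python sketch below shows that segment mode only; `word_distribution` is a hypothetical name, and the package's own tokenization and section handling may differ.

```python
import re
from collections import Counter


def word_distribution(text: str, terms: list[str], n_segments: int = 4):
    """Split text into equal-length word segments and count each term per segment."""
    words = re.findall(r"[a-z']+", text.lower())
    result = []
    for i in range(n_segments):
        # Integer boundaries spread any remainder across segments.
        start = i * len(words) // n_segments
        end = (i + 1) * len(words) // n_segments
        counts = Counter(words[start:end])
        result.append((end - start, {t: counts[t.lower()] for t in terms}))
    return result
```

Each entry pairs the segment length with the term counts, so a relative frequency is simply a count divided by that length.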

check_author_conflict

Check whether an author conflict is real or just a normalization difference

complete_references_from_oa

Complete references from OpenAlex with intelligent conflict resolution

count_syllables

Count syllables in a word
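
Syllable counters for English readability scores are often heuristic: count groups of consecutive vowels, then correct for a silent final "e". The Python sketch below is one such heuristic (the name `estimate_syllables` is hypothetical, and this is not necessarily the rule count_syllables applies). It miscounts some words, for example "rhythm" is counted as one syllable; dictionary-based approaches avoid this.

```python
import re


def estimate_syllables(word: str) -> int:
    """Heuristic syllable count: vowel groups, minus a silent final 'e'."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    n = len(re.findall(r"[aeiouy]+", w))
    # A final "e" is usually silent ("make"), but not in "-le" ("table") or "-ee" ("free").
    if w.endswith("e") and not w.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)
```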

create_citation_network

Create Citation Co-occurrence Network
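
In a citation co-occurrence network, two cited works are linked when they are cited in the same unit of text, and the edge weight counts how often that happens. The Python sketch below builds such weighted, undirected edges from a list of units; which unit the package uses (sentence, paragraph, or a fixed character window) is not stated here, and `cooccurrence_edges` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(units: list[list[str]]) -> Counter:
    """Count how often each pair of citations appears in the same text unit."""
    edges = Counter()
    for unit in units:
        # Deduplicate and sort so (A, B) and (B, A) are the same undirected edge.
        for a, b in combinations(sorted(set(unit)), 2):
            edges[(a, b)] += 1
    return edges
```

The resulting counter can be passed to any graph library as a weighted edge list; a unit with a single citation contributes no edges.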

create_empty_readability_tibble

Create empty readability tibble

extract_doi_from_pdf

Extract DOI from PDF Metadata (Legacy Function)

extract_pdf_metadata

Extract DOI and Metadata from PDF

gemini_content_ai

Process Content with Google Gemini AI

get_crossref_references

Retrieve rich metadata from the CrossRef API for a given DOI
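
CrossRef's public REST API returns a work's metadata from `https://api.crossref.org/works/{DOI}`, and any deposited reference list appears under `message.reference`. Many records have no deposited references at all. The Python sketch below only illustrates that request and response shape; it is not the package's R code, the helper names are hypothetical, and the fields get_crossref_references actually returns are not stated here.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str, mailto: Optional[str] = None) -> str:
    # DOIs contain "/", which the /works/{doi} route accepts unescaped.
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    if mailto:
        # Identifying yourself routes requests to CrossRef's "polite" pool.
        url += "?" + urllib.parse.urlencode({"mailto": mailto})
    return url


def extract_references(payload: dict) -> list[dict]:
    # Deposited references, when present, live under message.reference.
    refs = payload.get("message", {}).get("reference", [])
    return [
        {"key": r.get("key"), "doi": r.get("DOI"), "text": r.get("unstructured")}
        for r in refs
    ]


def fetch_references(doi: str, mailto: Optional[str] = None, timeout: float = 30) -> list[dict]:
    # Performs the HTTP request; network errors propagate to the caller.
    with urllib.request.urlopen(crossref_url(doi, mailto), timeout=timeout) as resp:
        return extract_references(json.load(resp))
```

Keeping the parsing in `extract_references` separate from the network call makes it easy to test against a saved JSON response.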

get_example_paper

Get path to example paper

map_citations_to_segments

Map citations to document segments or sections

match_citations_to_references

Match citations to references
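
For author-year styles, a basic matching strategy keys each reference by its first author's surname and year and looks up each in-text citation by the same key. The Python sketch below handles only author-year citations such as "(Smith et al., 2020)"; numbered citations like "[3]" return no match. The reference record fields (`id`, `first_author`, `year`) and function names are hypothetical, and the package's own matching is likely more elaborate.

```python
import re
from typing import Optional

# Author-year citations such as "(Smith et al., 2020)" or "Lee and Kim 2019b".
CITATION = re.compile(r"\(?\s*([A-Z][A-Za-z'\-]+).*?(\d{4}[a-z]?)\s*\)?$")


def parse_citation(text: str) -> Optional[tuple[str, str]]:
    m = CITATION.match(text.strip())
    return (m.group(1).lower(), m.group(2)) if m else None


def match_citations(citations: list[str], references: list[dict]) -> dict:
    # Index references by (lowercased first-author surname, year).
    # Two references with the same key collide; year suffixes like "2019b" help.
    index = {(r["first_author"].lower(), r["year"]): r["id"] for r in references}
    return {c: index.get(parse_citation(c)) for c in citations}
```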

merge_text_chunks_named

Merge Text Chunks into Named Sections

normalize_author_name

Normalize author name for robust comparison

normalize_references_section

Normalize references section formatting

parse_references_section

Parse references section from text

pdf2txt_auto

Import PDF with Automatic Section Detection

pdf2txt_multicolumn_safe

Extract text from multi-column PDF with structure preservation

pipe

Pipe operator

plot_word_distribution

Create interactive word distribution plot

process_large_pdf

Process Large PDF Documents with Google Gemini AI

readability_multiple

Calculate readability indices for multiple texts

remove_all_tables

Remove All Types of Tables (Markdown and Plain Text)

remove_code_blocks

Remove Markdown Code Block Markers

remove_figure_caps

Remove Figure Captions
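
A simple caption filter drops lines that begin with a figure label followed by a number and a period or colon, while leaving in-text mentions such as "as Figure 1 shows" untouched. The Python sketch below implements that single-line rule; it is an assumption about what counts as a caption, the name `remove_figure_captions` is hypothetical, and captions that wrap onto several lines are not handled.

```python
import re

# A line starting with "Figure 3." or "Fig. 3:" (case-insensitive) is treated as a caption.
CAPTION = re.compile(
    r"^[ \t]*(?:figure|fig\.)[ \t]*\d+[.:].*(?:\n|$)",
    re.IGNORECASE | re.MULTILINE,
)


def remove_figure_captions(text: str) -> str:
    # Delete each matching caption line together with its trailing newline.
    return CAPTION.sub("", text)
```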

split_into_sections

Split document text into sections
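
Section splitting is often done by looking for lines that consist only of a known heading, optionally numbered ("2. Methods"), and taking the text up to the next such line as that section's body. The Python sketch below follows that approach. The heading list, the `Preface` and `Full_text` labels, and the function name are hypothetical and may not match the section names the package produces.

```python
import re

DEFAULT_HEADINGS = ("Abstract", "Introduction", "Methods", "Results",
                    "Discussion", "Conclusion", "References")


def split_sections(text: str, headings=DEFAULT_HEADINGS) -> dict[str, str]:
    """Split text at lines consisting of a known heading, optionally numbered."""
    names = "|".join(re.escape(h) for h in headings)
    pattern = re.compile(
        rf"^[ \t]*(?:\d+(?:\.\d+)*\.?[ \t]+)?({names})[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    matches = list(pattern.finditer(text))
    if not matches:
        return {"Full_text": text.strip()}
    sections = {}
    preamble = text[: matches[0].start()].strip()
    if preamble:
        sections["Preface"] = preamble  # hypothetical label for text before the first heading
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # A repeated heading name overwrites the earlier entry in this simple sketch.
        sections[m.group(1).title()] = text[m.end():end].strip()
    return sections
```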


Provides comprehensive tools for extracting and analyzing scientific content from PDF documents, including citation extraction, reference matching, text analysis, and bibliometric indicators. Supports multi-column PDF layouts, 'CrossRef' API <https://www.crossref.org/documentation/retrieve-metadata/rest-api/> integration, and advanced citation parsing.

Maintainer: Massimo Aria
License: GPL (>= 3)
Last published: 2025-12-12
Version: 0.2.1