contentanalysis 0.2.1 package

Scientific Content and Citation Analysis from PDF Documents

analyze_scientific_content

Enhanced scientific content analysis with citation extraction

author_names_match

Compare two author names with fuzzy matching

calculate_readability_indices

Calculate readability indices for text

calculate_word_distribution

Calculate word distribution across text segments or sections

check_author_conflict

Check if author conflict is real or just normalization difference

complete_references_from_oa

Complete references from OpenAlex with intelligent conflict resolution

count_syllables

Count syllables in a word

create_citation_network

Create Citation Co-occurrence Network

create_empty_readability_tibble

Create empty readability tibble

extract_doi_from_pdf

Extract DOI from PDF Metadata (Legacy Function)

extract_pdf_metadata

Extract DOI and Metadata from PDF

gemini_content_ai

Process Content with Google Gemini AI

get_crossref_references

Retrieve rich metadata from the CrossRef API for a given DOI

get_example_paper

Get path to example paper

map_citations_to_segments

Map citations to document segments or sections

match_citations_to_references

Match citations to references

merge_text_chunks_named

Merge Text Chunks into Named Sections

normalize_author_name

Normalize author name for robust comparison

normalize_references_section

Normalize references section formatting

parse_references_section

Parse references section from text

pdf2txt_auto

Import PDF with Automatic Section Detection

pdf2txt_multicolumn_safe

Extract text from multi-column PDF with structure preservation

pipe

Pipe operator

plot_word_distribution

Create interactive word distribution plot

process_large_pdf

Process Large PDF Documents with Google Gemini AI

readability_multiple

Calculate readability indices for multiple texts

remove_all_tables

Remove All Types of Tables (Markdown and Plain Text)

remove_code_blocks

Remove Markdown Code Block Markers

remove_figure_caps

Remove Figure Captions

split_into_sections

Split document text into sections

Provides comprehensive tools for extracting and analyzing scientific content from PDF documents, including citation extraction, reference matching, text analysis, and bibliometric indicators. Supports multi-column PDF layouts, 'CrossRef' API <https://www.crossref.org/documentation/retrieve-metadata/rest-api/> integration, and advanced citation parsing.

  • Maintainer: Massimo Aria
  • License: GPL (>= 3)
  • Last published: 2025-12-12