Scientific Content and Citation Analysis from PDF Documents
Enhanced scientific content analysis with citation extraction
Compare two author names with fuzzy matching
Calculate readability indices for text
Calculate word distribution across text segments or sections
Check if author conflict is real or just normalization difference
Complete references from OpenAlex with intelligent conflict resolution
Count syllables in a word
Create Citation Co-occurrence Network
Create empty readability tibble
Extract DOI from PDF Metadata (Legacy Function)
Extract DOI and Metadata from PDF
Process Content with Google Gemini AI
Retrieve rich metadata from the CrossRef API for a given DOI
Get path to example paper
Map citations to document segments or sections
Match citations to references
Merge Text Chunks into Named Sections
Normalize author name for robust comparison
Normalize references section formatting
Parse references section from text
Import PDF with Automatic Section Detection
Extract text from multi-column PDF with structure preservation
Pipe operator
Create interactive word distribution plot
Process Large PDF Documents with Google Gemini AI
Calculate readability indices for multiple texts
Remove All Types of Tables (Markdown and Plain Text)
Remove Markdown Code Block Markers
Remove Figure Captions
Split document text into sections
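Several entries above (readability indices, syllable counting) rest on standard formulas. The Python sketch below illustrates two classic indices, Flesch Reading Ease and Flesch-Kincaid Grade Level, using a vowel-group heuristic for syllables. The function names (`count_syllables`, `readability`), the tokenization rules, and the syllable heuristic are assumptions for illustration. They are not the package's own R code, and the package may compute different or additional indices.

```python
import re

VOWELS = "aeiouy"

def count_syllables(word: str) -> int:
    """Approximate English syllables by counting vowel groups (assumed heuristic)."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    n = len(re.findall(r"[aeiouy]+", w))
    # A final "e" after a consonant is usually silent ("make"), except in a
    # consonant + "le" ending, which forms its own syllable ("ta-ble").
    if n > 1 and w.endswith("e") and w[-2] not in VOWELS:
        consonant_le = w.endswith("le") and w[-3] not in VOWELS
        if not consonant_le:
            n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    """Flesch Reading Ease and Flesch-Kincaid Grade Level for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        return {"flesch_reading_ease": None, "flesch_kincaid_grade": None}
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
    }

print(readability("The cat sat on the mat."))
```

Higher Reading Ease scores indicate easier text, while the grade level maps roughly onto U.S. school grades. Because the syllable counts are heuristic, scores can differ slightly from dictionary-based tools.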
Provides tools for extracting and analyzing scientific content from PDF documents, including citation extraction, reference matching, text analysis, and bibliometric indicators. Supports multi-column PDF layouts, metadata retrieval through the 'CrossRef' API <https://www.crossref.org/documentation/retrieve-metadata/rest-api/>, and advanced citation parsing.
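CrossRef metadata lookups are keyed on a DOI, so a typical first step is to locate DOIs in the extracted PDF text. The minimal Python sketch below adapts a pattern Crossref has published for matching modern DOIs ("10." + a 4 to 9 digit registrant code + "/" + suffix). The `extract_dois` name and the clean-up rules for trailing punctuation are illustrative assumptions and are not the package's R API.

```python
import re

# Adapted from Crossref's published guidance for matching modern DOIs;
# an assumption for illustration, not the package's own pattern.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)

def extract_dois(text: str) -> list[str]:
    """Return unique DOIs in order of first appearance."""
    found, seen = [], set()
    for raw in DOI_PATTERN.findall(text):
        # Drop sentence punctuation glued to the end of the match
        doi = raw.rstrip(".,;:")
        # A closing parenthesis with no matching "(" belongs to the prose
        while doi.endswith(")") and doi.count(")") > doi.count("("):
            doi = doi[:-1].rstrip(".,;:")
        key = doi.lower()  # DOIs are case-insensitive
        if key not in seen:
            seen.add(key)
            found.append(doi)
    return found

text = ("Deep learning (doi:10.1038/nature14539). See also "
        "https://doi.org/10.1126/science.aaa8415 and DOI 10.1038/NATURE14539.")
print(extract_dois(text))
```

Each DOI found this way can then be sent to CrossRef's public REST API (for example `https://api.crossref.org/works/{doi}`), which returns the work's metadata as JSON.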