daiR 1.2.0 package

Interface with Google Cloud Document AI API

draw_blocks

Draw block bounding boxes

reassign_tokens2

Assign tokens to a single new block

delete_processor

Delete processor

deprecated

Deprecated functions

disable_processor

Disable processor

dot-onAttach

Run when daiR is attached

get_entities

Get entities

get_ids_by_type

List ids of available processors of a given type

get_processor_info

Get information about processor

get_processor_versions

List available versions of processor

get_processors

List created processors

get_project_id

Get project id

get_tables

Get tables

get_text

Get text

get_versions_by_type

List versions of available processors of a given type

image_to_pdf

Convert images to PDF

img_to_binbase

Image to base64 TIFF

is_colour

Check that a string is a valid colour representation

is_json

Check that a file is JSON

is_pdf

Check that a file is PDF

list_processor_types

List available processor types

build_block_df

Build block dataframe

make_hocr

Make hOCR file

merge_shards

Merge shards

pdf_to_binbase

PDF to base64 TIFF

reassign_tokens

Assign tokens to new blocks

build_token_df

Build token dataframe

create_processor

Create processor

defunct

Defunct functions

dai_async

OCR documents asynchronously

dai_auth

Check authentication

dai_notify

Notify on job completion

dai_status

Check job status

dai_sync

OCR document synchronously

dai_token

Produce access token

dai_user

Get user information

draw_entities

Draw entity bounding boxes

draw_lines

Draw line bounding boxes

draw_paragraphs

Draw paragraph bounding boxes

draw_tokens

Draw token bounding boxes

enable_processor

Enable processor

from_labelme

Extract block coordinates from labelme files

redraw_blocks

Inspect revised block bounding boxes

split_block

Split a block bounding box

tables_from_dai_file

Get tables from output file

tables_from_dai_response

Get tables from response object

text_from_dai_file

Get text from output file

text_from_dai_response

Get text from HTTP response object

R interface for the Google Cloud Services 'Document AI API' <https://cloud.google.com/document-ai> with additional tools for output file parsing and text reconstruction. 'Document AI' is a powerful server-based OCR service that extracts text and tables from images and PDF files with high accuracy. 'daiR' gives R users programmatic access to this service and additional tools to handle and visualize the output. See the package website <https://dair.info/> for more information and examples.

  • Maintainer: Thomas Hegghammer
  • License: MIT + file LICENSE
  • Last published: 2025-11-18