Running Local LLMs with 'llama.cpp' Backend
Create a CSV sink for streaming annotation chunks
Apply Chat Template to Format Conversations
Apply Gemma-Compatible Chat Template
Free localLLM backend
Initialize localLLM backend
Compute confusion matrices from multi-model annotations
Create Inference Context for Text Generation
Convert Token IDs Back to Text
Finish automatic run documentation
Start automatic run documentation
Temporarily apply an HF token for a scoped operation
Download a model manually
Compare multiple LLMs over a shared set of prompts
Generate Text in Parallel for Multiple Prompts
Generate Text Using Language Model Context
Get Backend Library Path
Get the model cache directory
Inspect detected hardware resources
Install localLLM Backend Library
Intercoder reliability for LLM annotations
Check if Backend Library is Installed
List cached models on disk
List GGUF models managed by Ollama
R Interface to llama.cpp with Runtime Library Loading
Load Language Model with Automatic Download Support
Reset quick_llama state
Quick LLaMA Inference
Configure Hugging Face access token
Smart Chat Template Application
Test tokenize function (debugging)
Convert Text to Token IDs
Validate model predictions against gold labels and peer agreement
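The annotation-analysis entries above (CSV sinks, confusion matrices, intercoder reliability, validation against gold labels and peer agreement) all revolve around comparing labels produced by several models. As a point of reference only, and deliberately not using this package's API, the base-R sketch below shows the kind of computation involved: a confusion matrix between two LLM annotators and Cohen's kappa as an agreement measure.

```r
## Generic base-R illustration (not functions from this package):
## agreement between two sets of LLM annotations over the same items.
a <- c("pos", "neg", "pos", "neu", "pos", "neg")   # labels from model A
b <- c("pos", "neg", "neu", "neu", "pos", "pos")   # labels from model B

tab <- table(A = a, B = b)          # confusion matrix of the two annotators
p_o <- sum(diag(tab)) / sum(tab)    # observed agreement

## expected agreement from the marginal label distributions
p_e <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2

kappa <- (p_o - p_e) / (1 - p_e)    # Cohen's kappa
kappa
```

The package's helpers presumably wrap steps like these for multi-model runs; see the individual help pages for their actual interfaces.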
Provides R bindings to the 'llama.cpp' library for running large language models. The package uses a lightweight architecture in which the C++ backend library is downloaded at runtime rather than bundled with the package itself. Features include text generation, reproducible generation, and parallel inference.
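As a rough orientation to how the pieces listed above fit together, here is a minimal usage sketch. Only `quick_llama()` is named in the index; every other function, argument, and path below is an illustrative placeholder, not the documented API, so consult the individual help pages for the real names and signatures.

```r
library(localLLM)

## Placeholder names (install_backend, backend_init, model_load, tokenize,
## detokenize, backend_free) are assumptions for illustration only;
## quick_llama() is the one name taken from the index above.
install_backend()                              # fetch the llama.cpp runtime library
backend_init()                                 # load the backend into the session

model <- model_load("path/to/model.gguf")      # load a GGUF model (auto-download supported)
ids   <- tokenize(model, "Hello, world")       # text -> token IDs
cat(detokenize(model, ids))                    # token IDs -> text

## One-call convenience interface; a seed is shown to suggest reproducible
## generation (the argument name is an assumption).
out <- quick_llama("Summarise this abstract in one sentence.", seed = 42)

backend_free()                                 # release backend resources
```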
Useful links