Local Large Language Model Inference Engine
Build chat prompt from conversation history
Download using curl command
Download using R's download.file with libcurl
Download using wget command
Check if a file is a valid GGUF file
Robust file download with retry and resume support
Performance benchmarking for model inference
Interactive chat session with streaming responses
Clean up cache directory and manage storage
Generate text completion using loaded model
Download a GGUF model from Hugging Face
Download a model from a direct URL
Find and prepare GGUF models for use with edgemodelr
Find and load Ollama models
Free model context and release memory
List popular pre-configured models
Load a local GGUF model for inference
Load an Ollama model by partial SHA-256 hash
Quick setup for a popular model
Control llama.cpp logging verbosity
Get optimized configuration for small language models
Stream text completion with real-time token generation
edgemodelr: Local Large Language Model Inference Engine
Check if model context is valid
Test if an Ollama model blob can be used with edgemodelr
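The topics above describe one end-to-end workflow: fetch a model, load it, generate or stream text, then free the context. A minimal sketch of that workflow follows; the function names (`edge_quick_setup`, `edge_completion`, `edge_stream_completion`, `edge_free_model`) and the callback signature are assumptions inferred from the topic titles and may differ from the package's actual exported API.

```r
# Hedged sketch of the typical edgemodelr workflow; names below are
# assumptions based on the topic list, not a confirmed API reference.
library(edgemodelr)

# "Quick setup for a popular model": download + load in one step
setup <- edge_quick_setup("TinyLlama-1.1B")   # model name is illustrative
ctx   <- setup$context

# "Generate text completion using loaded model"
out <- edge_completion(ctx, "R is a language for", n_predict = 50)
cat(out)

# "Stream text completion with real-time token generation";
# the callback argument shape is an assumption
edge_stream_completion(ctx, "Explain GGUF in one sentence.",
                       callback = function(tok) cat(tok$token))

# "Free model context and release memory"
edge_free_model(ctx)
```

Freeing the context explicitly matters here because the model weights live in native (llama.cpp) memory that R's garbage collector does not manage.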
Enables R users to run large language models locally using 'GGUF' model files and the 'llama.cpp' inference engine. Provides a complete R interface for loading models, generating text completions, and streaming responses in real-time. Supports local inference without requiring cloud APIs or internet connectivity, ensuring complete data privacy and control. Based on the 'llama.cpp' project by Georgi Gerganov (2023) <https://github.com/ggml-org/llama.cpp>.
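Since several topics above concern downloading and validating GGUF files, a small base-R sketch may help: it fetches a file with `download.file(method = "libcurl")` and checks the 4-byte `GGUF` magic that every valid GGUF file begins with. The URL is a placeholder, and the cache location is an illustrative choice, not necessarily where edgemodelr stores models.

```r
# Illustrative URL only -- substitute a real Hugging Face "resolve" link
url  <- "https://huggingface.co/.../model.gguf"
dest <- file.path(tools::R_user_dir("edgemodelr", "cache"), "model.gguf")
dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)

# mode = "wb" is required on Windows to avoid corrupting binary files
download.file(url, dest, method = "libcurl", mode = "wb")

# A valid GGUF file starts with the 4-byte magic "GGUF"
is_gguf <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  identical(rawToChar(readBin(con, "raw", 4L)), "GGUF")
}
is_gguf(dest)
```

Checking the magic bytes before loading catches the common failure mode of an interrupted download or an HTML error page saved in place of the model file.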
Useful links