pairwiseLLM 1.1.0 package

Pairwise Comparison Tools for Large Language Model-Based Writing Evaluation

alternate_pair_order

Deterministically alternate sample order in pairs

anthropic_compare_pair_live

Live Anthropic (Claude) comparison for a single pair of samples

anthropic_create_batch

Create an Anthropic Message Batch

anthropic_download_batch_results

Download Anthropic Message Batch results (.jsonl)

anthropic_get_batch

Retrieve an Anthropic Message Batch by ID

anthropic_poll_batch_until_complete

Poll an Anthropic Message Batch until completion

build_anthropic_batch_requests

Build Anthropic Message Batch requests from a tibble of pairs

build_bt_data

Build Bradley–Terry comparison data from pairwise results

build_elo_data

Build EloChoice comparison data from pairwise results

build_gemini_batch_requests

Build Gemini batch requests from a tibble of pairs

build_openai_batch_requests

Build OpenAI batch JSONL lines for paired comparisons

build_prompt

Build a concrete LLM prompt from a template

check_llm_api_keys

Check configured API keys for LLM backends

check_positional_bias

Check positional bias and bootstrap consistency reliability

compute_reverse_consistency

Compute consistency between forward and reverse pair comparisons

dot-gemini_api_key

Internal: Google Gemini API key helper

dot-parse_gemini_pair_response

Internal: parse a Gemini GenerateContentResponse into the standard tib...

dot-together_api_key

Internal: Together.ai API key helper

ensure_only_ollama_model_loaded

Ensure only one Ollama model is loaded in memory

fit_bt_model

Fit a Bradley–Terry model with sirt and fallback to BradleyTerry2

fit_elo_model

Fit an EloChoice model to pairwise comparison data

gemini_compare_pair_live

Live Google Gemini comparison for a single pair of samples

gemini_create_batch

Create a Gemini Batch job from request objects

gemini_download_batch_results

Download Gemini Batch results to a JSONL file

gemini_get_batch

Retrieve a Gemini Batch job by name

gemini_poll_batch_until_complete

Poll a Gemini Batch job until completion

get_prompt_template

Retrieve a named prompt template

list_prompt_templates

List available prompt templates

llm_compare_pair

Backend-agnostic live comparison for a single pair of samples

llm_download_batch_results

Extract results from a pairwiseLLM batch object

llm_submit_pairs_batch

Submit pairs to an LLM backend via batch API

make_pairs

Create all unordered pairs of writing samples

ollama_compare_pair_live

Live Ollama comparison for a single pair of samples

openai_compare_pair_live

Live OpenAI comparison for a single pair of samples

openai_create_batch

Create an OpenAI batch from an uploaded file

openai_download_batch_output

Download the output file for a completed batch

openai_get_batch

Retrieve an OpenAI batch

openai_poll_batch_until_complete

Poll an OpenAI batch until it completes or fails

openai_upload_batch_file

Upload a JSONL batch file to OpenAI

parse_anthropic_batch_output

Parse Anthropic Message Batch output into a tibble

parse_gemini_batch_output

Parse Gemini batch JSONL output into a tibble of pairwise results

parse_openai_batch_output

Parse an OpenAI Batch output JSONL file

randomize_pair_order

Randomly assign samples to positions SAMPLE_1 and SAMPLE_2

read_samples_df

Read writing samples from a data frame

read_samples_dir

Read writing samples from a directory of .txt files

register_prompt_template

Register a named prompt template

remove_prompt_template

Remove a registered prompt template

run_anthropic_batch_pipeline

Run an Anthropic batch pipeline for pairwise comparisons

run_gemini_batch_pipeline

Run a Gemini batch pipeline for pairwise comparisons

run_openai_batch_pipeline

Run a full OpenAI batch pipeline for pairwise comparisons

sample_pairs

Randomly sample pairs of writing samples

sample_reverse_pairs

Sample reversed versions of a subset of pairs

set_prompt_template

Get or set a prompt template for pairwise comparisons

submit_anthropic_pairs_live

Live Anthropic (Claude) comparisons for a tibble of pairs

submit_gemini_pairs_live

Live Google Gemini comparisons for a tibble of pairs

submit_llm_pairs

Backend-agnostic live comparisons for a tibble of pairs

submit_ollama_pairs_live

Live Ollama comparisons for a tibble of pairs

submit_openai_pairs_live

Live OpenAI comparisons for a tibble of pairs

submit_together_pairs_live

Live Together.ai comparisons for a tibble of pairs

summarize_bt_fit

Summarize a Bradley–Terry model fit

together_compare_pair_live

Live Together.ai comparison for a single pair of samples

trait_description

Get a trait name and description for prompts

write_openai_batch_file

Write an OpenAI batch table to a JSONL file
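
The pair-construction and consistency entries above (make_pairs, alternate_pair_order, sample_reverse_pairs, compute_reverse_consistency) describe steps that are easy to picture in a few lines. The following is a minimal, language-neutral sketch in Python, not the package's R API: the function names, the "swap every second pair" rule, and the dictionary layout used for results are all assumptions made for illustration.

```python
from itertools import combinations


def make_all_pairs(ids):
    """Return every unordered pair of sample IDs, following input order."""
    return list(combinations(ids, 2))


def alternate_order(pairs):
    """Swap positions in every second pair (an assumed alternation rule),
    so that no sample always appears in the first position."""
    return [(b, a) if i % 2 else (a, b) for i, (a, b) in enumerate(pairs)]


def reverse_pairs(pairs):
    """Return the same pairs with the two positions exchanged."""
    return [(b, a) for a, b in pairs]


def reverse_consistency(forward, reverse):
    """Share of pairs for which both presentation orders chose the same sample.

    `forward` and `reverse` map frozenset({id1, id2}) -> winning ID
    (a hypothetical layout, chosen so that order does not affect the key).
    """
    shared = forward.keys() & reverse.keys()
    if not shared:
        return float("nan")
    agree = sum(forward[k] == reverse[k] for k in shared)
    return agree / len(shared)


pairs = make_all_pairs(["A", "B", "C"])
ordered = alternate_order(pairs)
flipped = reverse_pairs(ordered)
```

A consistency value well below 1 suggests that the judged winner depends on presentation position, which is the kind of positional bias the forward and reversed comparison sets are meant to expose.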

Provides a unified framework for generating, submitting, and analyzing pairwise comparisons of writing quality using large language models (LLMs). The package supports live and/or batch evaluation workflows across multiple providers ('OpenAI', 'Anthropic', 'Google Gemini', 'Together AI', and locally-hosted 'Ollama' models), includes bias-tested prompt templates and a flexible template registry, and offers tools for constructing forward and reversed comparison sets to analyze consistency and positional bias. Results can be modeled using Bradley–Terry (1952) <doi:10.2307/2334029> or Elo rating methods to derive writing quality scores. For information on the method of pairwise comparisons, see Thurstone (1927) <doi:10.1037/h0070288> and Heldsinger & Humphry (2010) <doi:10.1007/BF03216919>. For information on Elo ratings, see Clark et al. (2018) <doi:10.1371/journal.pone.0190393>.
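
To make the Bradley–Terry step in the description concrete, here is a minimal Python sketch of how pairwise wins can be turned into quality scores. It uses the classic iterative (minorization-maximization) update for the Bradley–Terry model and is only an illustration: the package itself fits the model through the sirt and BradleyTerry2 R packages, and the function name, input layout, and stopping rule below are assumptions. The sketch also assumes each comparison involves two distinct samples and that every sample wins at least once.

```python
def fit_bradley_terry(comparisons, n_iter=500, tol=1e-12):
    """Estimate Bradley-Terry strengths from (winner, loser) tuples.

    Returns a dict mapping each sample ID to a strength; strengths sum to 1.
    Under the model, P(i beats j) = p[i] / (p[i] + p[j]).
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = {i: 0 for i in items}
    counts = {}  # frozenset({i, j}) -> number of comparisons between i and j
    for winner, loser in comparisons:
        wins[winner] += 1
        key = frozenset((winner, loser))
        counts[key] = counts.get(key, 0) + 1

    p = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        new_p = {}
        for i in items:
            denom = 0.0
            for key, n_ij in counts.items():
                if i in key:
                    (j,) = key - {i}  # the opponent in this pairing
                    denom += n_ij / (p[i] + p[j])
            new_p[i] = wins[i] / denom
        total = sum(new_p.values())
        new_p = {i: v / total for i, v in new_p.items()}  # fix the scale
        delta = max(abs(new_p[i] - p[i]) for i in items)
        p = new_p
        if delta < tol:
            break
    return p


# Sample A is preferred over sample B in 3 of 4 comparisons.
strengths = fit_bradley_terry([("A", "B")] * 3 + [("B", "A")])
```

With only two samples the estimate reduces to the observed win proportions, so A's strength is three times B's. Elo ratings, the alternative mentioned above, update scores one comparison at a time instead of fitting all comparisons jointly.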

  • Maintainer: Sterett H. Mercer
  • License: MIT + file LICENSE
  • Last published: 2025-12-22