Testing Guide

MonoLLM includes a comprehensive test suite to validate functionality across all supported providers and models. This guide covers how to use the testing utilities to verify your setup and ensure everything works correctly.

Overview

The test suite is located in the test/ directory and includes:

  • Model Testing: Validate all configured models

  • Provider Testing: Test provider-specific functionality

  • Reasoning Testing: Specialized tests for thinking-capable models

  • Integration Testing: End-to-end functionality validation

Test Scripts

Quick Test Runner

The unified test runner provides a convenient entry point for all testing:

# Quick test with a known working model
python test/run_tests.py --quick

# Run comprehensive test suite
python test/run_tests.py --all

# Test specific provider
python test/run_tests.py --provider qwen

# Test specific model
python test/run_tests.py --model qwq-32b --reasoning

Individual Test Scripts

test_all_models.py - Comprehensive model testing:

# Test all configured models
python test/test_all_models.py

test_single_model.py - Individual model testing:

# Basic test
python test/test_single_model.py gpt-4o-mini

# Test with streaming
python test/test_single_model.py claude-3-5-sonnet-20241022 --stream

# Test reasoning model
python test/test_single_model.py qwq-32b --reasoning --stream

test_thinking.py - Reasoning model testing:

# Test all thinking models
python test/test_thinking.py

# Test specific model
python test/test_thinking.py --model qwq-32b

# Test specific reasoning scenario
python test/test_thinking.py --test logic_puzzle

test_providers.py - Provider-specific testing:

# Test all providers
python test/test_providers.py

# Test specific provider
python test/test_providers.py --provider anthropic

# Test specific functionality
python test/test_providers.py --provider qwen --test thinking

Setting Up Tests

Prerequisites

  1. API Keys: Configure API keys for the providers you want to test

  2. Environment: Set up your .env file or environment variables

  3. Dependencies: Ensure all dependencies are installed

Environment Setup

Create a .env file in your project root:

OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
DASHSCOPE_API_KEY=your-dashscope-api-key
DEEPSEEK_API_KEY=your-deepseek-api-key

Running Your First Test

Start with a quick test to verify your setup:

python test/run_tests.py --quick

This will test a known working model (QwQ-32B) with reasoning capabilities.

Test Categories

Basic Functionality Tests

These tests verify core functionality:

  • Text Generation: Basic prompt-response functionality

  • Configuration: Model parameter handling

  • Error Handling: Graceful failure scenarios

  • Token Usage: Usage tracking and reporting

Example Output:

Testing Model: gpt-4o-mini
✓ Basic generation: Success (1.2s, 45 tokens)
✓ Configuration: Temperature and max_tokens applied
✓ Usage tracking: 45 total tokens
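
If you want to reproduce the basic generation check in a standalone script, a minimal sketch follows. The import path and names used here (UnifiedLLMClient, generate, response.content) are assumptions modeled on common unified-client patterns, not MonoLLM's confirmed API; check the project's API reference for the exact interface.

# Minimal sketch of a standalone generation check.
# NOTE: UnifiedLLMClient, generate(), and response.content are assumed
# names -- verify them against the MonoLLM API reference.
import asyncio
from monollm import UnifiedLLMClient  # assumed import path

async def basic_check(model_id: str) -> None:
    async with UnifiedLLMClient() as client:
        response = await client.generate(model_id, "Reply with the word OK.")
        assert response.content.strip(), "empty response"
        print(f"{model_id}: {len(response.content)} characters returned")

asyncio.run(basic_check("gpt-4o-mini"))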

Streaming Tests

Validate real-time streaming capabilities:

  • Stream Chunks: Proper chunk delivery

  • Completion Detection: Stream termination handling

  • Content Assembly: Correct content reconstruction

Example Output:

Testing Streaming: claude-3-5-sonnet-20241022
✓ Stream initialization: Success
✓ Chunk delivery: 23 chunks received
✓ Stream completion: Properly terminated
✓ Content assembly: 156 characters total
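
A do-it-yourself version of the content-assembly check might look like the sketch below. As with the previous sketch, generate_stream and the chunk attributes are assumed names; consult the MonoLLM documentation for the actual streaming interface.

# Sketch of a streaming check: collect chunks, then verify the
# assembled text is non-empty. Assumed API names, as noted above.
import asyncio
from monollm import UnifiedLLMClient  # assumed import path

async def stream_check(model_id: str) -> None:
    chunks = []
    async with UnifiedLLMClient() as client:
        async for chunk in client.generate_stream(model_id, "Count to five."):
            if chunk.content:
                chunks.append(chunk.content)
    assembled = "".join(chunks)
    assert assembled, "no content received"
    print(f"{len(chunks)} chunks, {len(assembled)} characters total")

asyncio.run(stream_check("claude-3-5-sonnet-20241022"))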

Reasoning Tests

Specialized tests for thinking-capable models:

  • Thinking Steps: Reasoning process validation

  • Quality Analysis: Thinking content evaluation

  • Step Coverage: Expected reasoning step detection

Test Scenarios:

  • basic_math: Simple arithmetic with step-by-step solving

  • logic_puzzle: Constraint satisfaction problems

  • multi_step_problem: Complex multi-step calculations

  • complex_reasoning: Advanced problem-solving strategies

  • code_reasoning: Code debugging and analysis

Example Output:

Testing QwQ-32B Reasoning:
✓ basic_math: Quality score 0.95 (4/4 steps covered)
✓ logic_puzzle: Quality score 0.88 (3/3 steps covered)
✓ Thinking length: 1,247 characters
✓ Final answer: Correct and complete
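
The quality scores above reflect how many of the expected reasoning steps appear in the model's thinking output. The helper below illustrates the idea with a naive substring match; it is an illustration of the metric, not the suite's actual scoring code.

# Naive illustration of step coverage: the fraction of expected
# reasoning steps mentioned in the thinking text.
def step_coverage(thinking: str, expected_steps: list[str]) -> float:
    text = thinking.lower()
    hits = sum(1 for step in expected_steps if step.lower() in text)
    return hits / len(expected_steps)

# e.g. step_coverage(thinking_text, ["step1", "step2", "step3"]) -> 0.0..1.0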

Provider-Specific Tests

Test provider-unique features and edge cases:

OpenAI Provider:

  • Temperature control

  • Special character handling

  • Long prompt processing

Anthropic Provider:

  • System message support

  • Multi-turn conversations

  • MCP integration

Qwen Provider:

  • Chinese language support

  • Code generation

  • Thinking mode capabilities

DeepSeek Provider:

  • Code analysis

  • Algorithm design

  • Reasoning capabilities

Understanding Test Results

Success Indicators

  • ✓ PASS: Test completed successfully

  • Quality Score: 0.8+ indicates high-quality reasoning

  • Response Time: Latency within the typical range for the model

  • Token Usage: Accurate usage tracking

Partial Success

  • ⚠ PARTIAL (2/3): Only some subtests passed (here, 2 of 3)

  • ⏭ SKIP: Test not applicable (e.g., a thinking test on a non-reasoning model)

  • Stream Only: Model requires streaming mode

Failure Indicators

  • ✗ FAIL: Test failed with error

  • Provider Error: API-related issues

  • Timeout: Request exceeded time limit

  • Configuration Error: Setup issues

Common Test Scenarios

Validating New Setup

When setting up MonoLLM for the first time:

# 1. Quick validation
python test/run_tests.py --quick

# 2. Test your primary provider
python test/run_tests.py --provider openai

# 3. Validate reasoning models if needed
python test/run_tests.py --thinking

Testing After Configuration Changes

After modifying config/models.json or adding new API keys:

# Test specific model
python test/test_single_model.py new-model-id

# Test provider functionality
python test/test_providers.py --provider new-provider

Continuous Integration

For CI/CD pipelines:

# Run all tests with timeout
timeout 300 python test/run_tests.py --all

# Test critical models only
python test/test_single_model.py gpt-4o-mini
python test/test_single_model.py qwq-32b --reasoning
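
To gate a pipeline on these results, you can wrap the critical commands in a small helper that propagates failure. This assumes the test scripts exit with a nonzero status when a test fails, which is worth verifying against your version:

# Hypothetical CI helper: run the critical tests, exit nonzero on failure.
import subprocess, sys

CRITICAL = [
    ["python", "test/test_single_model.py", "gpt-4o-mini"],
    ["python", "test/test_single_model.py", "qwq-32b", "--reasoning"],
]

failed = [cmd for cmd in CRITICAL if subprocess.run(cmd).returncode != 0]
sys.exit(1 if failed else 0)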

Troubleshooting Tests

Common Issues

API Key Missing

Warning: No API key found for provider 'openai'

Solution: Add the API key to your .env file or environment variables.

Model Not Found

Model 'gpt-5' not found in any provider

Solution: Check config/models.json for available models.

Rate Limiting

Error code: 429 - Rate limit exceeded

Solution: Wait and retry, or add a backoff strategy such as the one sketched below.
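
The sketch below retries a single-model test with increasing delays, assuming the test script exits with a nonzero status when the provider returns a 429:

# Simple retry-with-backoff wrapper around a test run.
import subprocess, time

delays = [5, 15, 60]  # seconds between attempts
while True:
    if subprocess.run(["python", "test/test_single_model.py", "gpt-4o-mini"]).returncode == 0:
        break
    if not delays:
        raise SystemExit("still failing after backoff retries")
    delay = delays.pop(0)
    print(f"Test failed (possibly rate limited), retrying in {delay}s...")
    time.sleep(delay)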

Quota Exceeded

Error code: 429 - You exceeded your current quota

Solution: Check your API billing or use a different provider.

Debug Mode

For detailed error information, check the console output during test runs. The test scripts provide comprehensive error messages and suggestions.

Performance Monitoring

The test suite includes timing information to help monitor:

  • Response Latency: Time to first response (see the sketch after this list)

  • Streaming Performance: Chunk delivery rate

  • Thinking Generation: Reasoning process speed
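
To spot-check response latency yourself, the sketch below times the first streamed chunk, reusing the assumed client interface from the streaming example above:

# Measure time-to-first-chunk for a streaming call (assumed API names).
import asyncio, time
from monollm import UnifiedLLMClient  # assumed import path

async def first_chunk_latency(model_id: str) -> float:
    start = time.perf_counter()
    async with UnifiedLLMClient() as client:
        async for chunk in client.generate_stream(model_id, "Hello"):
            return time.perf_counter() - start
    return float("nan")  # stream produced no chunks

print(f"{asyncio.run(first_chunk_latency('gpt-4o-mini')):.2f}s to first chunk")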

Custom Testing

Creating Custom Tests

You can extend the test scripts for your specific needs:

# Add to test_thinking.py
CUSTOM_PROMPTS = {
    "domain_specific": {  # scenario name, usable as --test domain_specific
        "prompt": "Your domain-specific test prompt",
        "expected_steps": ["step1", "step2", "step3"],  # phrases the thinking should cover
        "difficulty": "medium"
    }
}
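
Once defined, run the new scenario the same way as the built-in ones, for example: python test/test_thinking.py --test domain_specific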

Batch Testing

Test multiple models sequentially:

# Test multiple models
for model in qwq-32b claude-3-5-sonnet-20241022 deepseek-chat; do
    echo "Testing $model..."
    python test/test_single_model.py $model --stream
done

Integration with Development

Integrate testing into your development workflow:

# Pre-commit testing
python test/run_tests.py --quick

# Feature testing
python test/test_single_model.py your-model --custom-prompt "Your test"

# Performance testing
python test/test_providers.py --provider your-provider

Best Practices

  1. Start Small: Begin with quick tests before running comprehensive suites

  2. Test Incrementally: Test new configurations immediately

  3. Monitor Usage: Be aware of API costs during extensive testing

  4. Document Results: Keep track of which models work best for your use cases

  5. Regular Validation: Run tests periodically to catch configuration drift

The testing suite provides a robust foundation for validating your MonoLLM setup and ensuring reliable operation across all supported providers and models.