Testing Guide
MonoLLM includes a comprehensive test suite to validate functionality across all supported providers and models. This guide covers how to use the testing utilities to verify your setup and ensure everything works correctly.
Overview
The test suite is located in the test/ directory and includes:
Model Testing: Validate all configured models
Provider Testing: Test provider-specific functionality
Reasoning Testing: Specialized tests for thinking-capable models
Integration Testing: End-to-end functionality validation
Test Scripts
Quick Test Runner
The unified test runner provides a convenient entry point for all testing:
# Quick test with a known working model
python test/run_tests.py --quick
# Run comprehensive test suite
python test/run_tests.py --all
# Test specific provider
python test/run_tests.py --provider qwen
# Test specific model
python test/run_tests.py --model qwq-32b --reasoning
Individual Test Scripts
test_all_models.py - Comprehensive model testing:
# Test all configured models
python test/test_all_models.py
test_single_model.py - Individual model testing:
# Basic test
python test/test_single_model.py gpt-4o-mini
# Test with streaming
python test/test_single_model.py claude-3-5-sonnet-20241022 --stream
# Test reasoning model
python test/test_single_model.py qwq-32b --reasoning --stream
test_thinking.py - Reasoning model testing:
# Test all thinking models
python test/test_thinking.py
# Test specific model
python test/test_thinking.py --model qwq-32b
# Test specific reasoning scenario
python test/test_thinking.py --test logic_puzzle
test_providers.py - Provider-specific testing:
# Test all providers
python test/test_providers.py
# Test specific provider
python test/test_providers.py --provider anthropic
# Test specific functionality
python test/test_providers.py --provider qwen --test thinking
Setting Up Tests
Prerequisites
API Keys: Configure API keys for the providers you want to test
Environment: Set up your .env file or environment variables
Dependencies: Ensure all dependencies are installed
Environment Setup
Create a .env file in your project root:
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
DASHSCOPE_API_KEY=your-dashscope-api-key
DEEPSEEK_API_KEY=your-deepseek-api-key
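To confirm the keys are actually visible to Python before running any tests, a small check can help. This sketch assumes the optional python-dotenv package (pip install python-dotenv); MonoLLM itself may load the environment differently.
# check_env.py -- verify that provider API keys are set before testing.
# Assumes python-dotenv is installed; this is an optional convenience,
# not part of MonoLLM's test suite.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY",
            "DASHSCOPE_API_KEY", "DEEPSEEK_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'missing'}")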
Running Your First Test
Start with a quick test to verify your setup:
python test/run_tests.py --quick
This will test a known working model (QwQ-32B) with reasoning capabilities.
Test Categories
Basic Functionality Tests
These tests verify core functionality:
Text Generation: Basic prompt-response functionality
Configuration: Model parameter handling
Error Handling: Graceful failure scenarios
Token Usage: Usage tracking and reporting
Example Output:
Testing Model: gpt-4o-mini
✓ Basic generation: Success (1.2s, 45 tokens)
✓ Configuration: Temperature and max_tokens applied
✓ Usage tracking: 45 total tokens
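For reference, the pattern behind a check like this is roughly the sketch below. The client interface and response attributes are illustrative placeholders, not MonoLLM's actual API.
# Illustrative only: the client, generate(), and response attributes are
# hypothetical placeholders, not MonoLLM's real API.
import time

def basic_generation_check(client, model_id: str) -> None:
    start = time.perf_counter()
    response = client.generate(model_id, "Say hello in one sentence.",
                               temperature=0.7, max_tokens=64)
    elapsed = time.perf_counter() - start
    assert response.text, "empty response"
    print(f"✓ Basic generation: Success ({elapsed:.1f}s, "
          f"{response.usage.total_tokens} tokens)")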
Streaming Tests
Validate real-time streaming capabilities:
Stream Chunks: Proper chunk delivery
Completion Detection: Stream termination handling
Content Assembly: Correct content reconstruction
Example Output:
Testing Streaming: claude-3-5-sonnet-20241022
✓ Stream initialization: Success
✓ Chunk delivery: 23 chunks received
✓ Stream completion: Properly terminated
✓ Content assembly: 156 characters total
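The assembly check amounts to collecting chunks and measuring the joined text; a rough sketch, again with a hypothetical chunk interface:
# Hypothetical chunk objects with a .text attribute; only the counting and
# reassembly pattern is the point here.
def streaming_check(stream) -> None:
    chunks = [chunk.text for chunk in stream]
    content = "".join(chunks)
    assert content, "stream produced no content"
    print(f"✓ Chunk delivery: {len(chunks)} chunks received")
    print(f"✓ Content assembly: {len(content)} characters total")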
Reasoning Tests
Specialized tests for thinking-capable models:
Thinking Steps: Reasoning process validation
Quality Analysis: Thinking content evaluation
Step Coverage: Expected reasoning step detection
Test Scenarios:
basic_math: Simple arithmetic with step-by-step solving
logic_puzzle: Constraint satisfaction problems
multi_step_problem: Complex multi-step calculations
complex_reasoning: Advanced problem-solving strategies
code_reasoning: Code debugging and analysis
Example Output:
Testing QwQ-32B Reasoning:
✓ basic_math: Quality score 0.95 (4/4 steps covered)
✓ logic_puzzle: Quality score 0.88 (3/3 steps covered)
✓ Thinking length: 1,247 characters
✓ Final answer: Correct and complete
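A step-coverage score like the 4/4 above can be computed as the fraction of expected step keywords that appear in the thinking text. The actual scoring in test_thinking.py may be more involved; the basic idea is:
def quality_score(thinking: str, expected_steps: list[str]) -> float:
    # Fraction of expected step keywords found in the thinking text.
    text = thinking.lower()
    covered = sum(1 for step in expected_steps if step.lower() in text)
    return covered / len(expected_steps) if expected_steps else 0.0

# e.g. quality_score("First add 2 and 3, then multiply.", ["add", "multiply"]) -> 1.0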
Provider-Specific Tests
Test provider-unique features and edge cases:
OpenAI Provider: temperature control, special character handling, long prompt processing
Anthropic Provider: system message support, multi-turn conversations, MCP integration
Qwen Provider: Chinese language support, code generation, thinking mode capabilities
DeepSeek Provider: code analysis, algorithm design, reasoning capabilities
Understanding Test Results
Success Indicators
✓ PASS: Test completed successfully
Quality Score: 0.8+ indicates high-quality reasoning
Response Time: Latency falls within typical ranges for the model
Token Usage: Accurate usage tracking
Partial Success
⚠ PARTIAL (2/3): Only some tests in a group passed (here, 2 of 3)
⏭ SKIP: Test not applicable (e.g., thinking test on non-reasoning model)
Stream Only: Model requires streaming mode
Failure Indicators
✗ FAIL: Test failed with error
Provider Error: API-related issues
Timeout: Request exceeded time limit
Configuration Error: Setup issues
Common Test Scenarios
Validating New Setup
When setting up MonoLLM for the first time:
# 1. Quick validation
python test/run_tests.py --quick
# 2. Test your primary provider
python test/run_tests.py --provider openai
# 3. Validate reasoning models if needed
python test/run_tests.py --thinking
Testing After Configuration Changes
After modifying config/models.json or adding new API keys:
# Test specific model
python test/test_single_model.py new-model-id
# Test provider functionality
python test/test_providers.py --provider new-provider
Continuous Integration
For CI/CD pipelines:
# Run all tests with timeout
timeout 300 python test/run_tests.py --all
# Test critical models only
python test/test_single_model.py gpt-4o-mini
python test/test_single_model.py qwq-32b --reasoning
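If you prefer a single script as a CI gate, a minimal wrapper that fails the pipeline when any critical test fails might look like this (model IDs taken from the commands above):
# ci_gate.py -- run critical-model tests and propagate the first failure.
import subprocess
import sys

CRITICAL = [
    ["python", "test/test_single_model.py", "gpt-4o-mini"],
    ["python", "test/test_single_model.py", "qwq-32b", "--reasoning"],
]

for cmd in CRITICAL:
    result = subprocess.run(cmd, timeout=300)  # seconds per test script
    if result.returncode != 0:
        sys.exit(result.returncode)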
Troubleshooting Tests
Common Issues
API Key Missing
Warning: No API key found for provider 'openai'
Solution: Add the API key to your .env file or environment variables.
Model Not Found
Model 'gpt-5' not found in any provider
Solution: Check config/models.json for available models.
Rate Limiting
Error code: 429 - Rate limit exceeded
Solution: Wait and retry, or add a backoff strategy such as the sketch below.
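A generic exponential-backoff wrapper for rate-limited calls (MonoLLM or the provider SDK may already retry internally, so treat this as a pattern rather than library behavior):
import random
import time

def with_backoff(call, max_retries: int = 5):
    # Retry `call` on rate-limit errors with exponential backoff plus jitter.
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:  # narrow to the provider's rate-limit error type
            if "429" not in str(exc) or attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())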
Quota Exceeded
Error code: 429 - You exceeded your current quota
Solution: Check your API billing or use a different provider.
Debug Mode
For detailed error information, check the console output during test runs. The test scripts provide comprehensive error messages and suggestions.
Performance Monitoring
The test suite includes timing information to help monitor:
Response Latency: Time to first response
Streaming Performance: Chunk delivery rate
Thinking Generation: Reasoning process speed
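To collect similar numbers yourself, time-to-first-chunk and chunk delivery rate can be measured with the standard library. The stream object here is a hypothetical iterable of chunks, as in the streaming sketch earlier:
import time

def measure_stream(stream) -> None:
    # Reports time to first chunk, total chunk count, and delivery rate.
    start = time.perf_counter()
    first = None
    count = 0
    for _chunk in stream:
        count += 1
        if first is None:
            first = time.perf_counter() - start
    if first is None:
        print("Stream produced no chunks")
        return
    total = time.perf_counter() - start
    print(f"First chunk: {first:.2f}s, {count} chunks, {count / total:.1f} chunks/s")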
Custom Testing
Creating Custom Tests
You can extend the test scripts for your specific needs:
# Add to test_thinking.py
CUSTOM_PROMPTS = {
    "domain_specific": {
        "prompt": "Your domain-specific test prompt",
        "expected_steps": ["step1", "step2", "step3"],
        "difficulty": "medium",
    }
}
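After adding an entry, it should be selectable by key with the scenario flag shown earlier, e.g. python test/test_thinking.py --test domain_specific.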
Batch Testing
Test multiple models sequentially:
# Test multiple models
for model in qwq-32b claude-3-5-sonnet-20241022 deepseek-chat; do
    echo "Testing $model..."
    python test/test_single_model.py "$model" --stream
done
Integration with Development
Integrate testing into your development workflow:
# Pre-commit testing
python test/run_tests.py --quick
# Feature testing
python test/test_single_model.py your-model --custom-prompt "Your test"
# Performance testing
python test/test_providers.py --provider your-provider
Best Practices
Start Small: Begin with quick tests before running comprehensive suites
Test Incrementally: Test new configurations immediately
Monitor Usage: Be aware of API costs during extensive testing
Document Results: Keep track of which models work best for your use cases
Regular Validation: Run tests periodically to catch configuration drift
The testing suite provides a robust foundation for validating your MonoLLM setup and ensuring reliable operation across all supported providers and models.