Command Line Interface
MonoLLM provides a powerful command-line interface (CLI) for interacting with multiple LLM providers without writing code.
Installation
The CLI is automatically available after installing MonoLLM:
# Verify installation
monollm --help
Basic Usage
The CLI follows this general pattern:
monollm <command> [arguments] [options]
Available Commands
list-providers
List all available LLM providers:
monollm list-providers
Example output:
Available Providers:
┌─────────────┬──────────────────────┬───────────┬──────────────┐
│ Provider ID │ Name                 │ Streaming │ Reasoning    │
├─────────────┼──────────────────────┼───────────┼──────────────┤
│ qwen        │ Qwen (DashScope)     │ ✅        │ ✅           │
│ anthropic   │ Anthropic Claude     │ ✅        │ ❌           │
│ openai      │ OpenAI               │ ✅        │ ✅           │
│ deepseek    │ DeepSeek             │ ✅        │ ✅           │
└─────────────┴──────────────────────┴───────────┴──────────────┘
list-models
List available models:
# List all models
monollm list-models
# List models for specific provider
monollm list-models --provider qwen
Example output:
Qwen Models:
┌─────────────┬─────────────┬────────────┬──────────────┬─────────────┐
│ Model ID    │ Name        │ Max Tokens │ Reasoning    │ Streaming   │
├─────────────┼─────────────┼────────────┼──────────────┼─────────────┤
│ qwq-32b     │ QwQ 32B     │ 8192       │ ✅           │ ✅          │
│ qwen-plus   │ Qwen Plus   │ 4096       │ ❌           │ ✅          │
└─────────────┴─────────────┴────────────┴──────────────┴─────────────┘
generate
Generate text using a specified model:
monollm generate "Your prompt here" --model MODEL_NAME [options]
Required Arguments:
prompt: The text prompt to send to the model
--model: The model to use for generation
Optional Arguments:
--temperature FLOAT: Creativity level (0.0-2.0, default: 0.7)
--max-tokens INT: Maximum output tokens (default: 1000)
--stream: Enable streaming output
--thinking: Show reasoning process (for reasoning models)
--system TEXT: System message to set context
Examples
Basic Text Generation
# Simple generation
monollm generate "What is artificial intelligence?" --model qwen-plus
# With custom parameters
monollm generate "Write a creative story" --model qwen-plus --temperature 0.9 --max-tokens 500
Streaming Output
# Stream the response in real-time
monollm generate "Tell me a long story about space exploration" --model qwen-plus --stream
Reasoning Models
# Use reasoning model with thinking steps
monollm generate "Solve: If a train travels 60 miles in 45 minutes, what is its speed in mph?" --model qwq-32b --thinking
# Complex reasoning problem
monollm generate "A farmer has 17 sheep. All but 9 die. How many are left?" --model qwq-32b --thinking
System Messages
# Set context with system message
monollm generate "What is 15 × 23?" --model qwen-plus --system "You are a helpful math tutor. Show your work step by step."
# Creative writing with context
monollm generate "Write a poem about coding" --model qwen-plus --system "You are a poet who loves technology"
Provider-Specific Examples
Qwen (DashScope):
# Regular model
monollm generate "Explain quantum computing" --model qwen-plus
# Reasoning model
monollm generate "Solve this logic puzzle step by step" --model qwq-32b --thinking
Anthropic Claude:
# Claude 3.5 Sonnet
monollm generate "Write a technical blog post about APIs" --model claude-3-5-sonnet-20241022
OpenAI:
# GPT-4o
monollm generate "Explain machine learning concepts" --model gpt-4o
# O1 reasoning model
monollm generate "Solve this complex math problem" --model o1-preview --thinking
DeepSeek:
# DeepSeek V3
monollm generate "Code review this Python function" --model deepseek-chat
# DeepSeek R1 (reasoning)
monollm generate "Analyze this algorithm's complexity" --model deepseek-reasoner --thinking
Advanced Usage
Environment Variables
Set default values using environment variables:
# Set default model
export MONOLLM_DEFAULT_MODEL=qwen-plus
# Set default temperature
export MONOLLM_DEFAULT_TEMPERATURE=0.7
# Set default max tokens
export MONOLLM_DEFAULT_MAX_TOKENS=1000
Configuration Files
Create a configuration file at ~/.monollm/config.json:
{
"default_model": "qwen-plus",
"default_temperature": 0.7,
"default_max_tokens": 1000,
"preferred_providers": ["qwen", "anthropic", "openai"]
}
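If you manage this file from a script, a minimal Python sketch (using only the keys shown above) could write it like this:
import json
from pathlib import Path

# Write the example defaults above to ~/.monollm/config.json (illustrative sketch)
config_path = Path.home() / ".monollm" / "config.json"
config_path.parent.mkdir(parents=True, exist_ok=True)
config = {
    "default_model": "qwen-plus",
    "default_temperature": 0.7,
    "default_max_tokens": 1000,
    "preferred_providers": ["qwen", "anthropic", "openai"],
}
config_path.write_text(json.dumps(config, indent=2))
print(f"Wrote {config_path}")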
Batch Processing
Process multiple prompts from a file:
# Create a file with prompts (one per line)
echo "What is AI?" > prompts.txt
echo "Explain quantum computing" >> prompts.txt
echo "Benefits of renewable energy" >> prompts.txt
# Process each prompt
while IFS= read -r prompt; do
echo "Prompt: $prompt"
monollm generate "$prompt" --model qwen-plus
echo "---"
done < prompts.txt
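The same batch loop can be written in Python if you need it to run outside a Unix shell; this is a minimal sketch that assumes the prompts.txt file created above:
import subprocess

# Send each non-empty line of prompts.txt to the CLI (illustrative sketch)
with open("prompts.txt") as f:
    for prompt in (line.strip() for line in f if line.strip()):
        print(f"Prompt: {prompt}")
        result = subprocess.run(
            ["monollm", "generate", prompt, "--model", "qwen-plus"],
            capture_output=True, text=True,
        )
        print(result.stdout)
        print("---")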
Output Formatting
Control output format:
# JSON output
monollm generate "Hello world" --model qwen-plus --format json
# Markdown output
monollm generate "Write a README" --model qwen-plus --format markdown
# Plain text (default)
monollm generate "Simple response" --model qwen-plus --format text
Error Handling
The CLI provides helpful error messages:
Missing API Key:
Error: No API key found for provider 'qwen'
Please set the DASHSCOPE_API_KEY environment variable.
Invalid Model:
Error: Model 'invalid-model' not found
Available models: qwen-plus, qwq-32b, claude-3-5-sonnet-20241022
Network Issues:
Error: Failed to connect to provider
Please check your internet connection and proxy settings.
Debugging
Enable verbose output for debugging:
# Verbose mode
monollm generate "Hello" --model qwen-plus --verbose
# Debug mode
monollm generate "Hello" --model qwen-plus --debug
Performance Tips
Use streaming for long responses to see output immediately
Set appropriate token limits to control response length and cost
Choose the right model for your task (reasoning vs. general)
Use lower temperatures for factual content, higher for creative content
Cache responses when possible to avoid repeated API calls (a caching sketch follows this list)
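To illustrate the last tip, here is a minimal caching sketch that wraps the CLI; the cache directory and keying scheme are illustrative assumptions, not something MonoLLM creates itself:
import hashlib
import json
import subprocess
from pathlib import Path

CACHE_DIR = Path.home() / ".monollm" / "cli-cache"  # assumed location, not managed by MonoLLM

def cached_generate(prompt, model):
    # Key the cache on the model and prompt so repeated calls skip the API
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())["content"]
    result = subprocess.run(
        ["monollm", "generate", prompt, "--model", model, "--machine"],
        capture_output=True, text=True, check=True,
    )
    cache_file.write_text(result.stdout)
    return json.loads(result.stdout)["content"]

print(cached_generate("What is AI?", "qwen-plus"))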
Integration with Other Tools
Machine Interface & JSON API
MonoLLM provides a machine-friendly JSON API suited to integration with external applications, automation scripts, and Tauri sidecars. Use the --machine flag to get structured JSON output:
# Basic JSON output
monollm list-providers --machine
monollm generate "Hello world" --model gpt-4o --machine
JSON Response Format:
{
"content": "Hello! How can I help you today?",
"model": "gpt-4o",
"provider": "openai",
"timestamp": "2025-01-01T12:00:00.000000",
"usage": {
"prompt_tokens": 8,
"completion_tokens": 12,
"total_tokens": 20
}
}
Error Format:
{
"error": true,
"error_type": "ProviderError",
"error_message": "API key not found",
"timestamp": "2025-01-01T12:00:00.000000",
"context": "generate"
}
Available Machine Commands:
# Information commands
monollm list-providers --machine
monollm list-models --machine
monollm model-config qwq-32b --machine
monollm env-info --machine
# Configuration commands
monollm set-defaults qwq-32b --temperature 0.8 --thinking --machine
monollm proxy-config --show --machine
monollm validate-config qwq-32b --temperature 0.8
# Generation commands
monollm generate "prompt" --model gpt-4o --machine
monollm generate-stream "prompt" --model qwq-32b --thinking
monollm chat-api '[{"role": "user", "content": "Hello"}]' --model gpt-4o
Streaming JSON Output:
# Outputs one JSON object per line
monollm generate-stream "Tell a story" --model qwq-32b --thinking
{"type": "chunk", "content": "Once", "is_complete": false, "timestamp": "..."}
{"type": "chunk", "content": " upon", "is_complete": false, "timestamp": "..."}
{"type": "chunk", "thinking": "I should create...", "timestamp": "..."}
{"type": "chunk", "is_complete": true, "timestamp": "..."}
Tauri Sidecar Integration
Perfect for desktop applications built with Tauri:
Rust Example:
use serde_json::Value;
use std::process::Command;
// Synchronous execution
fn generate_response(prompt: &str, model: &str) -> Result<Value, Box<dyn std::error::Error>> {
let output = Command::new("monollm")
.args(&["generate", prompt, "--model", model, "--machine"])
.output()?;
if output.status.success() {
let result: Value = serde_json::from_slice(&output.stdout)?;
Ok(result)
} else {
let error: Value = serde_json::from_slice(&output.stderr)?;
Err(format!("Generation failed: {}", error).into())
}
}
// Usage
let response = generate_response("What is AI?", "gpt-4o")?;
println!("Response: {}", response["content"]);
JavaScript/Node.js Example:
const { exec } = require('child_process');
const { promisify } = require('util');
const execAsync = promisify(exec);
async function generateResponse(prompt, model) {
const cmd = `monollm generate "${prompt}" --model ${model} --machine`;
const { stdout } = await execAsync(cmd);
return JSON.parse(stdout);
}
// Usage
const response = await generateResponse("What is quantum computing?", "gpt-4o");
console.log(response.content);
Python Integration:
import subprocess
import json
def generate_response(prompt, model):
    result = subprocess.run([
        "monollm", "generate", prompt,
        "--model", model, "--machine"
    ], capture_output=True, text=True)
    if result.returncode == 0:
        return json.loads(result.stdout)
    else:
        error = json.loads(result.stderr)
        raise Exception(f"Generation failed: {error['error_message']}")

# Usage
response = generate_response("Explain AI", "gpt-4o")
print(response["content"])
Configuration Management API
Programmatically manage model defaults and proxy settings:
# Set model defaults
monollm set-defaults qwq-32b --temperature 0.8 --thinking --stream --machine
# Configure proxy
monollm proxy-config --http http://proxy:8080 --machine
# Validate configuration
monollm validate-config qwq-32b --temperature 0.8 --stream false
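These commands can be driven from code as well; the sketch below runs set-defaults with --machine and relies only on the documented error shape, without assuming a particular success payload:
import json
import subprocess

# Apply model defaults and inspect the JSON the CLI returns (illustrative sketch)
result = subprocess.run(
    ["monollm", "set-defaults", "qwq-32b", "--temperature", "0.8", "--thinking", "--machine"],
    capture_output=True, text=True,
)
output = json.loads(result.stdout or result.stderr)
if output.get("error"):
    print(f"Failed: {output['error_message']}")
else:
    print("Defaults updated:", output)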
See src/monollm/cli/README-MACHINE.md for the complete machine interface documentation.
Pipe Output
# Save to file
monollm generate "Write a Python script" --model qwq-32b > script.py
# Pipe to other commands
monollm generate "List of programming languages" --model qwen-plus | grep -i python
Shell Scripts
#!/bin/bash
# ai-helper.sh
MODEL="qwen-plus"
PROMPT="$1"
if [ -z "$PROMPT" ]; then
echo "Usage: $0 'your prompt here'"
exit 1
fi
monollm generate "$PROMPT" --model "$MODEL" --stream
# Usage: ./ai-helper.sh "Explain Docker containers"
Aliases
Create convenient aliases:
# Add to ~/.bashrc or ~/.zshrc
alias ai='monollm generate'
alias ai-reason='monollm generate --model qwq-32b --thinking'
alias ai-stream='monollm generate --stream'
alias ai-creative='monollm generate --temperature 0.9'
# Usage:
# ai "What is machine learning?" --model qwen-plus
# ai-reason "Solve this math problem"
# ai-stream "Tell me a story" --model qwen-plus
Configuration Reference
Command Line Options
Global Options:
--help, -h Show help message
--version, -v Show version information
--config PATH Custom configuration file path
--verbose Enable verbose output
--debug Enable debug output
Generate Command Options:
--model, -m TEXT Model to use (required)
--temperature FLOAT Temperature (0.0-2.0)
--max-tokens INT Maximum output tokens
--stream Enable streaming
--thinking Show reasoning (reasoning models only)
--system TEXT System message
--format TEXT Output format (text|json|markdown)
Environment Variables
# API Keys
OPENAI_API_KEY # OpenAI API key
ANTHROPIC_API_KEY # Anthropic API key
GOOGLE_API_KEY # Google Gemini API key
DASHSCOPE_API_KEY # Qwen/DashScope API key
DEEPSEEK_API_KEY # DeepSeek API key
VOLCENGINE_API_KEY # Volcengine API key
# Proxy Settings
PROXY_ENABLED # Enable proxy (true/false)
PROXY_TYPE # Proxy type (http/socks5)
PROXY_HOST # Proxy host
PROXY_PORT # Proxy port
PROXY_USERNAME # Proxy username (optional)
PROXY_PASSWORD # Proxy password (optional)
# CLI Defaults
MONOLLM_DEFAULT_MODEL # Default model
MONOLLM_DEFAULT_TEMPERATURE # Default temperature
MONOLLM_DEFAULT_MAX_TOKENS # Default max tokens
Troubleshooting
Common Issues
Command not found:
# Ensure MonoLLM is installed
pip install -e .
# Check if it's in PATH
which monollm
Permission denied:
# On Unix systems, ensure execute permissions
chmod +x $(which monollm)
Slow responses:
# Use streaming for immediate feedback
monollm generate "long prompt" --model qwen-plus --stream
# Reduce max tokens for faster responses
monollm generate "prompt" --model qwen-plus --max-tokens 100
Getting Help
# General help
monollm --help
# Command-specific help
monollm generate --help
monollm list-models --help
# Version information
monollm --version
The CLI provides a convenient way to access MonoLLM’s capabilities without writing code, making it perfect for quick tasks, scripting, and experimentation.