Command Line Interface

MonoLLM provides a powerful command-line interface (CLI) for interacting with multiple LLM providers without writing code.

Installation

The CLI is automatically available after installing MonoLLM:

# Verify installation
monollm --help

Basic Usage

The CLI follows this general pattern:

monollm <command> [arguments] [options]

Available Commands

list-providers

List all available LLM providers:

monollm list-providers

Example output:

Available Providers:
┌─────────────┬──────────────────────┬───────────┬──────────────┐
│ Provider ID │ Name                 │ Streaming │ Reasoning    │
├─────────────┼──────────────────────┼───────────┼──────────────┤
│ qwen        │ Qwen (DashScope)     │ ✅        │ ✅           │
│ anthropic   │ Anthropic Claude     │ ✅        │ ❌           │
│ openai      │ OpenAI               │ ✅        │ ✅           │
│ deepseek    │ DeepSeek             │ ✅        │ ✅           │
└─────────────┴──────────────────────┴───────────┴──────────────┘

list-models

List available models:

# List all models
monollm list-models

# List models for specific provider
monollm list-models --provider qwen

Example output:

Qwen Models:
┌─────────────┬─────────────┬───────────┬──────────────┬─────────────┐
│ Model ID    │ Name        │ Max Tokens│ Reasoning    │ Streaming   │
├─────────────┼─────────────┼───────────┼──────────────┼─────────────┤
│ qwq-32b     │ QwQ 32B     │ 8192      │ ✅           │ ✅          │
│ qwen-plus   │ Qwen Plus   │ 4096      │ ❌           │ ✅          │
└─────────────┴─────────────┴───────────┴──────────────┴─────────────┘

generate

Generate text using a specified model:

monollm generate "Your prompt here" --model MODEL_NAME [options]

Required Arguments:

  • prompt: The text prompt to send to the model

  • --model: The model to use for generation

Optional Arguments:

  • --temperature FLOAT: Sampling temperature; higher values produce more varied output (0.0-2.0, default: 0.7)

  • --max-tokens INT: Maximum output tokens (default: 1000)

  • --stream: Enable streaming output

  • --thinking: Show reasoning process (for reasoning models)

  • --system TEXT: System message to set context

Examples

Basic Text Generation

# Simple generation
monollm generate "What is artificial intelligence?" --model qwen-plus

# With custom parameters
monollm generate "Write a creative story" --model qwen-plus --temperature 0.9 --max-tokens 500

Streaming Output

# Stream the response in real-time
monollm generate "Tell me a long story about space exploration" --model qwen-plus --stream

Reasoning Models

# Use reasoning model with thinking steps
monollm generate "Solve: If a train travels 60 miles in 45 minutes, what is its speed in mph?" --model qwq-32b --thinking

# Complex reasoning problem
monollm generate "A farmer has 17 sheep. All but 9 die. How many are left?" --model qwq-32b --thinking

System Messages

# Set context with system message
monollm generate "What is 15 × 23?" --model qwen-plus --system "You are a helpful math tutor. Show your work step by step."

# Creative writing with context
monollm generate "Write a poem about coding" --model qwen-plus --system "You are a poet who loves technology"

Provider-Specific Examples

Qwen (DashScope):

# Regular model
monollm generate "Explain quantum computing" --model qwen-plus

# Reasoning model
monollm generate "Solve this logic puzzle step by step" --model qwq-32b --thinking

Anthropic Claude:

# Claude 3.5 Sonnet
monollm generate "Write a technical blog post about APIs" --model claude-3-5-sonnet-20241022

OpenAI:

# GPT-4o
monollm generate "Explain machine learning concepts" --model gpt-4o

# O1 reasoning model
monollm generate "Solve this complex math problem" --model o1-preview --thinking

DeepSeek:

# DeepSeek V3
monollm generate "Code review this Python function" --model deepseek-chat

# DeepSeek R1 (reasoning)
monollm generate "Analyze this algorithm's complexity" --model deepseek-reasoner --thinking

Advanced Usage

Environment Variables

Set default values using environment variables:

# Set default model
export MONOLLM_DEFAULT_MODEL=qwen-plus

# Set default temperature
export MONOLLM_DEFAULT_TEMPERATURE=0.7

# Set default max tokens
export MONOLLM_DEFAULT_MAX_TOKENS=1000

Configuration Files

Create a configuration file at ~/.monollm/config.json:

{
  "default_model": "qwen-plus",
  "default_temperature": 0.7,
  "default_max_tokens": 1000,
  "preferred_providers": ["qwen", "anthropic", "openai"]
}

Batch Processing

Process multiple prompts from a file:

# Create a file with prompts (one per line)
echo "What is AI?" > prompts.txt
echo "Explain quantum computing" >> prompts.txt
echo "Benefits of renewable energy" >> prompts.txt

# Process each prompt
while IFS= read -r prompt; do
    echo "Prompt: $prompt"
    monollm generate "$prompt" --model qwen-plus
    echo "---"
done < prompts.txt

Output Formatting

Control output format:

# JSON output
monollm generate "Hello world" --model qwen-plus --format json

# Markdown output
monollm generate "Write a README" --model qwen-plus --format markdown

# Plain text (default)
monollm generate "Simple response" --model qwen-plus --format text

Error Handling

The CLI provides helpful error messages:

Missing API Key:

Error: No API key found for provider 'qwen'
Please set the DASHSCOPE_API_KEY environment variable.

Invalid Model:

Error: Model 'invalid-model' not found
Available models: qwen-plus, qwq-32b, claude-3-5-sonnet-20241022

Network Issues:

Error: Failed to connect to provider
Please check your internet connection and proxy settings.
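
These failures can also be detected in scripts without parsing the message text. A minimal sketch that relies on the CLI exiting with a non-zero status on failure (the Python integration example further down relies on the same behavior):

# Hedged sketch: assumes the CLI exits non-zero on failure
if response=$(monollm generate "Summarize the report" --model qwen-plus 2>error.log); then
    echo "$response"
else
    echo "Generation failed; see error.log for the provider message" >&2
fi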

Debugging

Enable verbose output for debugging:

# Verbose mode
monollm generate "Hello" --model qwen-plus --verbose

# Debug mode
monollm generate "Hello" --model qwen-plus --debug

Performance Tips

  1. Use streaming for long responses to see output immediately

  2. Set appropriate token limits to control response length and cost

  3. Choose the right model for your task (reasoning vs. general)

  4. Use lower temperatures for factual content, higher for creative content

  5. Cache responses when possible to avoid repeated API calls
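
A minimal caching sketch for tip 5, assuming a simple file cache keyed on a hash of the prompt (the cache directory and behavior here are illustrative, not a built-in MonoLLM feature):

# Hedged sketch: reuse a cached response for an identical prompt
CACHE_DIR="$HOME/.cache/monollm-cli"
mkdir -p "$CACHE_DIR"

prompt="What is artificial intelligence?"
key=$(printf '%s' "$prompt" | sha256sum | cut -d' ' -f1)
cache_file="$CACHE_DIR/$key.txt"

if [ -f "$cache_file" ]; then
    cat "$cache_file"                                      # cached response
else
    monollm generate "$prompt" --model qwen-plus | tee "$cache_file"
fi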

Integration with Other Tools

Machine Interface & JSON API

MonoLLM also provides a machine-friendly JSON interface for integration with external applications, automation scripts, and Tauri sidecars. Use the --machine flag to get structured JSON output:

# Basic JSON output
monollm list-providers --machine
monollm generate "Hello world" --model gpt-4o --machine

JSON Response Format:

{
  "content": "Hello! How can I help you today?",
  "model": "gpt-4o",
  "provider": "openai",
  "timestamp": "2025-01-01T12:00:00.000000",
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 12,
    "total_tokens": 20
  }
}

Error Format:

{
  "error": true,
  "error_type": "ProviderError",
  "error_message": "API key not found",
  "timestamp": "2025-01-01T12:00:00.000000",
  "context": "generate"
}
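
A caller can branch on this error format rather than scraping human-readable messages. A small sketch, assuming jq is installed and that failures exit non-zero with the error JSON on stderr (the Python integration example further down makes the same assumptions):

# Hedged sketch: handle machine-mode errors via exit status and the error JSON
if result=$(monollm generate "Hello" --model gpt-4o --machine 2>err.json); then
    echo "$result" | jq -r '.content'
else
    echo "Error ($(jq -r '.error_type' err.json)): $(jq -r '.error_message' err.json)" >&2
fi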

Available Machine Commands:

# Information commands
monollm list-providers --machine
monollm list-models --machine
monollm model-config qwq-32b --machine
monollm env-info --machine

# Configuration commands
monollm set-defaults qwq-32b --temperature 0.8 --thinking --machine
monollm proxy-config --show --machine
monollm validate-config qwq-32b --temperature 0.8

# Generation commands
monollm generate "prompt" --model gpt-4o --machine
monollm generate-stream "prompt" --model qwq-32b --thinking
monollm chat-api '[{"role": "user", "content": "Hello"}]' --model gpt-4o
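
The chat-api command takes the conversation as a JSON array of messages. A small sketch, assuming jq is installed, that builds the array programmatically instead of hand-quoting it:

# Hedged sketch: construct the message array with jq to avoid shell-quoting mistakes
messages=$(jq -n --arg q "What is quantum computing?" '[{"role": "user", "content": $q}]')
monollm chat-api "$messages" --model gpt-4o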

Streaming JSON Output:

# Outputs one JSON object per line
monollm generate-stream "Tell a story" --model qwq-32b --thinking
{"type": "chunk", "content": "Once", "is_complete": false, "timestamp": "..."}
{"type": "chunk", "content": " upon", "is_complete": false, "timestamp": "..."}
{"type": "chunk", "thinking": "I should create...", "timestamp": "..."}
{"type": "chunk", "is_complete": true, "timestamp": "..."}

Tauri Sidecar Integration

The machine interface works well as a sidecar for desktop applications built with Tauri:

Rust Example:

use serde_json::Value;
use std::process::Command;

// Synchronous execution
fn generate_response(prompt: &str, model: &str) -> Result<Value, Box<dyn std::error::Error>> {
    let output = Command::new("monollm")
        .args(&["generate", prompt, "--model", model, "--machine"])
        .output()?;

    if output.status.success() {
        let result: Value = serde_json::from_slice(&output.stdout)?;
        Ok(result)
    } else {
        let error: Value = serde_json::from_slice(&output.stderr)?;
        Err(format!("Generation failed: {}", error).into())
    }
}

// Usage
let response = generate_response("What is AI?", "gpt-4o")?;
println!("Response: {}", response["content"]);

JavaScript/Node.js Example:

const { execFile } = require('child_process');
const { promisify } = require('util');
const execFileAsync = promisify(execFile);

async function generateResponse(prompt, model) {
    // Pass arguments as an array so quotes in the prompt are not shell-interpreted
    const { stdout } = await execFileAsync('monollm', [
        'generate', prompt, '--model', model, '--machine',
    ]);
    return JSON.parse(stdout);
}

// Usage
const response = await generateResponse("What is quantum computing?", "gpt-4o");
console.log(response.content);

Python Integration:

import subprocess
import json

def generate_response(prompt, model):
    result = subprocess.run([
        "monollm", "generate", prompt,
        "--model", model, "--machine"
    ], capture_output=True, text=True)

    if result.returncode == 0:
        return json.loads(result.stdout)
    else:
        error = json.loads(result.stderr)
        raise Exception(f"Generation failed: {error['error_message']}")

# Usage
response = generate_response("Explain AI", "gpt-4o")
print(response["content"])

Configuration Management API

Programmatically manage model defaults and proxy settings:

# Set model defaults
monollm set-defaults qwq-32b --temperature 0.8 --thinking --stream --machine

# Configure proxy
monollm proxy-config --http http://proxy:8080 --machine

# Validate configuration
monollm validate-config qwq-32b --temperature 0.8 --stream false

Complete Machine Interface Documentation: src/monollm/cli/README-MACHINE.md

Pipe Output

# Save to file
monollm generate "Write a Python script" --model qwq-32b > script.py

# Pipe to other commands
monollm generate "List of programming languages" --model qwen-plus | grep -i python

Shell Scripts

#!/bin/bash
# ai-helper.sh

MODEL="qwen-plus"
PROMPT="$1"

if [ -z "$PROMPT" ]; then
    echo "Usage: $0 'your prompt here'"
    exit 1
fi

monollm generate "$PROMPT" --model "$MODEL" --stream

# Usage: ./ai-helper.sh "Explain Docker containers"

Aliases

Create convenient aliases:

# Add to ~/.bashrc or ~/.zshrc
alias ai='monollm generate'
alias ai-reason='monollm generate --model qwq-32b --thinking'
alias ai-stream='monollm generate --stream'
alias ai-creative='monollm generate --temperature 0.9'

# Usage:
# ai "What is machine learning?" --model qwen-plus
# ai-reason "Solve this math problem"
# ai-stream "Tell me a story" --model qwen-plus

Configuration Reference

Command Line Options

Global Options:
  --help, -h          Show help message
  --version, -v       Show version information
  --config PATH       Custom configuration file path
  --verbose           Enable verbose output
  --debug             Enable debug output

Generate Command Options:
  --model, -m TEXT    Model to use (required)
  --temperature FLOAT Temperature (0.0-2.0)
  --max-tokens INT    Maximum output tokens
  --stream            Enable streaming
  --thinking          Show reasoning (reasoning models only)
  --system TEXT       System message
  --format TEXT       Output format (text|json|markdown)

Environment Variables

# API Keys
OPENAI_API_KEY          # OpenAI API key
ANTHROPIC_API_KEY       # Anthropic API key
GOOGLE_API_KEY          # Google Gemini API key
DASHSCOPE_API_KEY       # Qwen/DashScope API key
DEEPSEEK_API_KEY        # DeepSeek API key
VOLCENGINE_API_KEY      # Volcengine API key

# Proxy Settings
PROXY_ENABLED           # Enable proxy (true/false)
PROXY_TYPE              # Proxy type (http/socks5)
PROXY_HOST              # Proxy host
PROXY_PORT              # Proxy port
PROXY_USERNAME          # Proxy username (optional)
PROXY_PASSWORD          # Proxy password (optional)

# CLI Defaults
MONOLLM_DEFAULT_MODEL       # Default model
MONOLLM_DEFAULT_TEMPERATURE # Default temperature
MONOLLM_DEFAULT_MAX_TOKENS  # Default max tokens
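
For example, the proxy variables above can be exported before running any command (the host and port values here are placeholders):

# Hedged sketch: route CLI requests through an HTTP proxy
export PROXY_ENABLED=true
export PROXY_TYPE=http
export PROXY_HOST=proxy.example.com   # placeholder host
export PROXY_PORT=8080                # placeholder port

monollm generate "Hello" --model qwen-plus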

Troubleshooting

Common Issues

Command not found:

# Ensure MonoLLM is installed
pip install -e .

# Check if it's in PATH
which monollm

Permission denied:

# On Unix systems, ensure execute permissions
chmod +x $(which monollm)

Slow responses:

# Use streaming for immediate feedback
monollm generate "long prompt" --model qwen-plus --stream

# Reduce max tokens for faster responses
monollm generate "prompt" --model qwen-plus --max-tokens 100

Getting Help

# General help
monollm --help

# Command-specific help
monollm generate --help
monollm list-models --help

# Version information
monollm --version

The CLI provides a convenient way to access MonoLLM’s capabilities without writing code, making it perfect for quick tasks, scripting, and experimentation.