Quick Start Guide

This guide walks you through installing MonoLLM, configuring API keys, and making your first requests.

Installation

Prerequisites

  • Python 3.12 or higher

  • API keys for the providers you want to use

Install MonoLLM

Using pip:

pip install monollm

From source:

git clone https://github.com/cyborgoat/MonoLLM.git
cd MonoLLM
pip install -e .
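
You can verify that the package is importable (a clean import with no error is enough; no version attribute is assumed):

python -c "import monollm"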

Configuration

Environment Variables

Set up your API keys as environment variables:

# OpenAI
export OPENAI_API_KEY="your-openai-api-key"

# Anthropic
export ANTHROPIC_API_KEY="your-anthropic-api-key"

# Qwen/DashScope
export DASHSCOPE_API_KEY="your-dashscope-api-key"

# DeepSeek
export DEEPSEEK_API_KEY="your-deepseek-api-key"

# Google (optional)
export GOOGLE_API_KEY="your-google-api-key"

Using a .env File

Create a .env file in your project root:

OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
DASHSCOPE_API_KEY=your-dashscope-api-key
DEEPSEEK_API_KEY=your-deepseek-api-key
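
MonoLLM may load the .env file for you; if it does not in your setup, you can load it yourself before creating the client. This sketch assumes the python-dotenv package is installed (an assumption, not a documented MonoLLM dependency):

from dotenv import load_dotenv

# Read the .env file from the current working directory so the keys
# become regular environment variables visible to MonoLLM.
load_dotenv()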

Basic Usage

Simple Text Generation

import asyncio
from monollm import UnifiedLLMClient, RequestConfig

async def basic_example():
    async with UnifiedLLMClient() as client:
        config = RequestConfig(
            model="gpt-4o-mini",
            temperature=0.7,
            max_tokens=100
        )

        response = await client.generate(
            "Explain machine learning in one paragraph",
            config
        )

        print(response.content)
        print(f"Tokens used: {response.usage.total_tokens}")

asyncio.run(basic_example())

Streaming Responses

For real-time response streaming:

import asyncio
from monollm import UnifiedLLMClient, RequestConfig

async def streaming_example():
    async with UnifiedLLMClient() as client:
        config = RequestConfig(
            model="claude-3-5-sonnet-20241022",
            temperature=0.7,
            stream=True
        )

        print("Streaming response:")
        async for chunk in await client.generate_stream(
            "Write a short story about a robot",
            config
        ):
            if chunk.content:
                print(chunk.content, end="", flush=True)

            if chunk.is_complete:
                print("\n\nStreaming complete!")
                break

asyncio.run(streaming_example())
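
If you also need the complete text after streaming finishes, accumulate the chunks as they arrive. This reuses only the streaming API shown above and runs inside the same async context:

# Inside an async function, with client and config as in streaming_example:
full_text = ""
async for chunk in await client.generate_stream(
    "Write a short story about a robot",
    config
):
    if chunk.content:
        full_text += chunk.content  # collect each streamed piece
    if chunk.is_complete:
        break

print(full_text)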

Multi-turn Conversations

import asyncio
from monollm import UnifiedLLMClient, RequestConfig, Message

async def conversation_example():
    async with UnifiedLLMClient() as client:
        messages = [
            Message(role="system", content="You are a helpful programming assistant."),
            Message(role="user", content="How do I create a list in Python?"),
            Message(role="assistant", content="You can create a list using square brackets: my_list = [1, 2, 3]"),
            Message(role="user", content="How do I add items to it?")
        ]

        config = RequestConfig(model="gpt-4o")
        response = await client.generate(messages, config)

        print("Assistant:", response.content)

asyncio.run(conversation_example())
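
To continue the conversation, append the assistant's reply and the next user turn to the same message list and call generate again. This reuses only the Message and generate calls shown above; the follow-up question is illustrative:

# Inside conversation_example, after printing the first response:
messages.append(Message(role="assistant", content=response.content))
messages.append(Message(role="user", content="Can I add several items at once?"))

follow_up = await client.generate(messages, config)
print("Assistant:", follow_up.content)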

Reasoning Models

Use models with thinking capabilities:

import asyncio
from monollm import UnifiedLLMClient, RequestConfig

async def reasoning_example():
    async with UnifiedLLMClient() as client:
        config = RequestConfig(
            model="qwq-32b",  # Qwen's reasoning model
            temperature=0.7,
            show_thinking=True,
            stream=True  # Required for QwQ models
        )

        prompt = "Solve this step by step: If a train travels 60 miles in 45 minutes, what is its speed in mph?"

        thinking_content = ""
        final_answer = ""

        async for chunk in await client.generate_stream(prompt, config):
            if chunk.thinking:
                thinking_content += chunk.thinking
            if chunk.content:
                final_answer += chunk.content
            if chunk.is_complete:
                break

        print("Thinking process:")
        print(thinking_content[:200] + "..." if len(thinking_content) > 200 else thinking_content)
        print("\nFinal answer:")
        print(final_answer)

asyncio.run(reasoning_example())

Command Line Interface

MonoLLM includes a command-line interface (CLI) for quick interactions from the terminal:

List Available Providers

monollm list-providers

List Available Models

monollm list-models
monollm list-models --provider qwen

Generate Text

# Basic generation
monollm generate "What is artificial intelligence?" --model gpt-4o-mini

# With streaming
monollm generate "Write a poem about coding" --model claude-3-5-sonnet-20241022 --stream

# Reasoning model with thinking
monollm generate "Solve: 2x + 5 = 13" --model qwq-32b --thinking

Interactive Chat

# Start interactive chat
monollm chat gpt-4o --stream

# Chat with reasoning model
monollm chat qwq-32b --thinking

Error Handling

Always handle potential errors in production code:

import asyncio
from monollm import UnifiedLLMClient, RequestConfig
from monollm.core.exceptions import (
    ProviderError,
    RateLimitError,
    QuotaExceededError,
    ModelNotFoundError
)

async def robust_example():
    async with UnifiedLLMClient() as client:
        try:
            config = RequestConfig(model="gpt-4o")
            response = await client.generate("Hello, world!", config)
            print(response.content)

        except ModelNotFoundError as e:
            print(f"Model not found: {e.message}")
        except RateLimitError as e:
            print(f"Rate limit exceeded: {e.message}")
        except QuotaExceededError as e:
            print(f"Quota exceeded: {e.message}")
        except ProviderError as e:
            print(f"Provider error: {e.message}")
        except Exception as e:
            print(f"Unexpected error: {e}")

asyncio.run(robust_example())

Testing Your Setup

Use the built-in test utilities to verify your configuration:

# Quick test with a working model
python test/run_tests.py --quick

# Test specific provider
python test/run_tests.py --provider qwen

# Test reasoning capabilities
python test/run_tests.py --thinking

Common Issues

API Key Not Found

Make sure your environment variables are set correctly or your .env file is in the right location.
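
A quick way to check which keys are visible to your Python process (using the same variable names as above):

import os

for name in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "DASHSCOPE_API_KEY", "DEEPSEEK_API_KEY"):
    print(name, "set" if os.environ.get(name) else "missing")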

Model Not Available

Check if the model is configured in config/models.json and your API key has access to it.
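
You can confirm which models your installation knows about with the CLI command shown earlier:

monollm list-models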

Rate Limiting

MonoLLM includes built-in retry mechanisms, but you may need to implement additional backoff strategies for high-volume usage.
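
A minimal sketch of such a backoff strategy, built only from the generate call and RateLimitError exception shown earlier; the retry count and delays are illustrative, not MonoLLM defaults:

import asyncio
from monollm import UnifiedLLMClient, RequestConfig
from monollm.core.exceptions import RateLimitError

async def generate_with_backoff(prompt: str, config: RequestConfig, max_retries: int = 5):
    async with UnifiedLLMClient() as client:
        delay = 1.0
        for attempt in range(max_retries):
            try:
                return await client.generate(prompt, config)
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise  # give up after the last attempt
                await asyncio.sleep(delay)  # wait before retrying
                delay *= 2  # double the delay each time

asyncio.run(generate_with_backoff("Hello, world!", RequestConfig(model="gpt-4o-mini")))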

Streaming Issues

Some models require streaming mode (like QwQ models). The client will automatically enable streaming when needed.