Client API

The UnifiedLLMClient is the main entry point for interacting with multiple LLM providers through a unified interface.

UnifiedLLMClient

Usage Examples

Basic Usage

import asyncio
from monollm import UnifiedLLMClient, RequestConfig

async def main():
    async with UnifiedLLMClient() as client:
        config = RequestConfig(model="qwq-32b")
        response = await client.generate("Hello, world!", config)
        print(response.content)

asyncio.run(main())

Initialization Options

from pathlib import Path
from rich.console import Console

# Custom configuration directory
client = UnifiedLLMClient(config_dir=Path("./my_config"))

# Custom console for logging
console = Console()
client = UnifiedLLMClient(console=console)

Provider Management

async with UnifiedLLMClient() as client:
    # List all available providers
    providers = client.list_providers()
    for provider_id, info in providers.items():
        print(f"{provider_id}: {info.name}")

    # List models for a specific provider
    models = client.list_models(provider_id="qwen")
    for model_id, info in models["qwen"].items():
        print(f"{model_id}: {info.name}")

    # Get information about a specific model
    provider_id, model_info = client.get_model_info("qwq-32b")
    print(f"Model: {model_info.name}")
    print(f"Provider: {provider_id}")
    print(f"Max tokens: {model_info.max_tokens}")

Text Generation

async with UnifiedLLMClient() as client:
    # Simple text generation
    config = RequestConfig(
        model="qwen-plus",
        temperature=0.7,
        max_tokens=1000
    )

    response = await client.generate(
        "Explain quantum computing",
        config
    )

    print(f"Response: {response.content}")
    print(f"Tokens used: {response.usage.total_tokens}")

Streaming Generation

async with UnifiedLLMClient() as client:
    config = RequestConfig(
        model="qwen-plus",
        stream=True
    )

    streaming_response = await client.generate_stream(
        "Tell me a story",
        config
    )

    async for chunk in streaming_response:
        if chunk.content:
            print(chunk.content, end="", flush=True)
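
If you also need the complete text once streaming finishes (for logging, or to append to a conversation history), accumulate the chunks as they arrive. A small sketch using only the chunk.content attribute shown above:

async with UnifiedLLMClient() as client:
    config = RequestConfig(model="qwen-plus", stream=True)
    streaming_response = await client.generate_stream("Tell me a story", config)

    # Collect chunks while printing them, then join into the full reply
    parts = []
    async for chunk in streaming_response:
        if chunk.content:
            print(chunk.content, end="", flush=True)
            parts.append(chunk.content)
    full_text = "".join(parts)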

Multi-turn Conversations

from monollm import Message

async with UnifiedLLMClient() as client:
    config = RequestConfig(model="qwen-plus")

    messages = [
        Message(role="system", content="You are a helpful assistant."),
        Message(role="user", content="What's the weather like?"),
    ]

    response = await client.generate(messages, config)

    # Continue the conversation
    messages.append(Message(role="assistant", content=response.content))
    messages.append(Message(role="user", content="What about tomorrow?"))

    response = await client.generate(messages, config)
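
The append-and-resend pattern above is easy to wrap in a small helper that owns the history. A sketch built only on Message, RequestConfig, and client.generate as used above; the ChatSession class is illustrative, not part of monollm:

class ChatSession:
    """Illustrative helper, not a monollm API."""

    def __init__(self, client, config, system_prompt=None):
        self.client = client
        self.config = config
        self.messages = []
        if system_prompt:
            self.messages.append(Message(role="system", content=system_prompt))

    async def ask(self, text):
        # Record the user turn, query the model, record the assistant turn
        self.messages.append(Message(role="user", content=text))
        response = await self.client.generate(self.messages, self.config)
        self.messages.append(Message(role="assistant", content=response.content))
        return response.content

async with UnifiedLLMClient() as client:
    session = ChatSession(
        client,
        RequestConfig(model="qwen-plus"),
        system_prompt="You are a helpful assistant.",
    )
    print(await session.ask("What's the weather like?"))
    print(await session.ask("What about tomorrow?"))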

Error Handling

from monollm.core.exceptions import MonoLLMError, ProviderError

async with UnifiedLLMClient() as client:
    try:
        config = RequestConfig(model="invalid-model")
        response = await client.generate("Hello", config)
    except ProviderError as e:
        # Handle the specific error first; if ProviderError subclasses
        # MonoLLMError, a base-class handler placed first would shadow it
        print(f"Provider error: {e}")
    except MonoLLMError as e:
        print(f"MonoLLM error: {e}")

Configuration Management

# The client automatically loads configuration from:
# 1. Environment variables
# 2. Configuration files in config/ directory
# 3. Default settings

# You can specify a custom config directory
from pathlib import Path

client = UnifiedLLMClient(config_dir=Path("./custom_config"))
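
Provider credentials usually come from environment variables. The exact variable names depend on your configuration and provider; the name below is illustrative only:

import os
from pathlib import Path

# Illustrative variable name -- consult your provider configuration
# for the key it actually reads
os.environ.setdefault("QWEN_API_KEY", "sk-...")

client = UnifiedLLMClient(config_dir=Path("./custom_config"))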

Context Manager Usage

The client should always be used as an async context manager to ensure proper resource cleanup:

# Recommended: Using async context manager
async with UnifiedLLMClient() as client:
    # Your code here
    pass

# Manual management (not recommended)
client = UnifiedLLMClient()
await client.initialize()
try:
    # Your code here
    pass
finally:
    await client.close()

Thread Safety

The UnifiedLLMClient is designed for use with asyncio and is not thread-safe. Each thread should have its own client instance:

import asyncio
import threading

async def worker():
    async with UnifiedLLMClient() as client:
        # Each thread gets its own client
        config = RequestConfig(model="qwen-plus")
        response = await client.generate("Hello", config)
        print(response.content)

def run_in_thread():
    asyncio.run(worker())

# Start multiple threads
threads = []
for i in range(3):
    thread = threading.Thread(target=run_in_thread)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

Performance Considerations

  • Connection Pooling: The client maintains connection pools for each provider

  • Async Operations: All operations are async, so independent requests can run concurrently (see the sketch after this list)

  • Resource Management: Use context managers for automatic cleanup

  • Caching: Provider configurations are cached for better performance
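
Because every call is a coroutine, independent requests can share one client and run concurrently. A sketch using asyncio.gather; actual parallelism is bounded by each provider's connection pool and rate limits:

import asyncio
from monollm import UnifiedLLMClient, RequestConfig

async def main():
    async with UnifiedLLMClient() as client:
        config = RequestConfig(model="qwen-plus")
        prompts = ["Define entropy", "Define enthalpy", "Define exergy"]

        # All three requests run concurrently over the shared client
        responses = await asyncio.gather(
            *(client.generate(p, config) for p in prompts)
        )
        for prompt, response in zip(prompts, responses):
            print(f"{prompt}: {response.content[:60]}")

asyncio.run(main())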

Note

The client automatically initializes providers on first use. This means the first request to a provider may take slightly longer as the provider is set up.
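
If that first-request latency matters, you can warm a provider up at startup. A sketch, assuming a tiny max_tokens keeps the warm-up request cheap:

async with UnifiedLLMClient() as client:
    # Trigger provider initialization before real traffic arrives
    warmup_config = RequestConfig(model="qwen-plus", max_tokens=1)
    await client.generate("ping", warmup_config)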

Warning

Always use the client as an async context manager or manually call close() to ensure proper cleanup of resources.