LLMs

Aether supports many LLM providers out of the box, and you can add your own by implementing a trait.

You can specify the model an agent uses in your agent settings file.

settings.json
{
  "name": "my-agent",
  "model": "anthropic:claude-sonnet-4-5"
}

In the TUI, only LLM providers with configured credentials appear in the model selector.

Anthropic
  Credentials: ANTHROPIC_API_KEY
  Model syntax: anthropic:<model-id>

OpenRouter
  Credentials: OPENROUTER_API_KEY
  Model syntax: openrouter:<vendor>/<model-id>

OpenRouter proxies 100+ models from various vendors. Use the vendor/model format from their model list.
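For example, an agent pointed at a model served through OpenRouter would use a spec like the following (the vendor/model ID is illustrative; check OpenRouter's model list for exact IDs):

"model": "openrouter:anthropic/claude-sonnet-4-5"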

OpenAI
  Credentials: OPENAI_API_KEY
  Model syntax: openai:<model-id>

Codex
  Credentials: OAuth login (no API key needed)
  Model syntax: codex:<model-id>

Codex authenticates through a browser-based OAuth flow with OpenAI. On first use, Aether opens a browser window to complete the login. Credentials are stored securely in your OS keychain — you only need to log in once.

In the model selector, Codex models show a “Needs login” badge until you’ve authenticated.

Gemini
  Credentials: GEMINI_API_KEY
  Model syntax: gemini:<model-id>

DeepSeek
  Credentials: DEEPSEEK_API_KEY
  Model syntax: deepseek:<model-id>

Amazon Bedrock
  Credentials: AWS credential chain (environment, config file, or IAM role)
  Model syntax: bedrock:<model-id>

Ollama
  Credentials: None (requires a running Ollama server)
  Model syntax: ollama:<model-id>

Aether auto-discovers models from your Ollama instance. Any model you’ve pulled with ollama pull will appear in the model selector.

Set OLLAMA_HOST to override the default address (http://localhost:11434).
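For example, if you've pulled a model named llama3.1 (an illustrative name; any model you've pulled works), the agent's model spec would be:

"model": "ollama:llama3.1"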

llama.cpp
  Credentials: None (requires a running llama.cpp server)
  Model syntax: llamacpp:<model-id>

Llama.cpp serves a single model at a time. Aether queries the server’s /v1/models endpoint to discover the loaded model, which then appears in the model selector.

Set LLAMA_CPP_HOST to override the default address (http://localhost:8080).

Moonshot
  Credentials: MOONSHOT_API_KEY
  Model syntax: moonshot:<model-id>

Z.AI
  Credentials: ZAI_API_KEY
  Model syntax: zai:<model-id>

Aether’s llm crate exposes the StreamingModelProvider trait, so you can integrate any LLM backend:

pub trait StreamingModelProvider: Send + Sync {
    fn stream_response(&self, context: &Context) -> LlmResponseStream;
    fn display_name(&self) -> String;
    fn context_window(&self) -> Option<u32>;
    fn model(&self) -> Option<LlmModel> { None }
}

Implement this trait on your own struct, and it can be passed directly to the agent builder. See the Custom Providers guide for a full walkthrough with examples.
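
As a rough sketch of the shape such an implementation takes (the import path, struct, field names, and placeholder bodies below are illustrative assumptions, not the crate's actual API):

use llm::{Context, LlmResponseStream, StreamingModelProvider};

// Hypothetical provider wrapping an in-house inference server.
pub struct MyBackendProvider {
    pub endpoint: String,
    pub model_id: String,
}

impl StreamingModelProvider for MyBackendProvider {
    fn stream_response(&self, _context: &Context) -> LlmResponseStream {
        // Translate the conversation context into the backend's request format,
        // open a streaming completion against self.endpoint, and adapt the
        // chunks into an LlmResponseStream.
        todo!("call the backend and adapt its stream")
    }

    fn display_name(&self) -> String {
        format!("my-backend ({})", self.model_id)
    }

    fn context_window(&self) -> Option<u32> {
        Some(32_768) // report the backend's context length, if known
    }

    // model() keeps its default implementation and returns None.
}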

Some models support extended thinking. Set reasoningEffort on an agent to control the thinking budget:

Level      Description
"low"      Minimal thinking; fastest responses
"medium"   Moderate thinking
"high"     Extended thinking for complex tasks
"xhigh"    Maximum thinking budget

{
  "name": "deep-thinker",
  "model": "anthropic:claude-sonnet-4-5",
  "reasoningEffort": "high",
  "..."
}

Model alloying lets you combine multiple LLM providers into a single agent by comma-separating model specs. Each turn uses the next model in the list, cycling through in round-robin order.

"model": "provider1:model1,provider2:model2,provider3:model3"

An agent that alternates between DeepSeek and Anthropic:

{
  "name": "alloy-coder",
  "description": "Cost-optimized coding agent",
  "model": "deepseek:deepseek-chat,anthropic:claude-sonnet-4-5",
  "userInvocable": true
}

Turn 1 uses DeepSeek, turn 2 uses Anthropic, turn 3 uses DeepSeek, and so on.

Alloying is useful for:

  • Cost optimization: alternate between expensive and cheaper models, using a powerful model for complex turns and a lighter one for simple follow-ups.
  • Redundancy: if one provider has an outage, the other turns still work.
  • Comparison: see how different models handle the same conversation context.

Keep in mind:

  • Each model in the alloy must have its corresponding API key set.
  • Reasoning effort applies to all models in the alloy; models that don't support it ignore the setting.
  • The conversation context is shared across all models: each model sees the full history, regardless of which model generated previous turns.