LLMs

Aether supports many LLM providers out of the box, and you can add your own by implementing a trait.

You can specify the model an agent uses in your agent settings file.

settings.json
{
  "name": "my-agent",
  "model": "anthropic:claude-sonnet-4-5"
}

In the TUI, only LLM providers with configured credentials appear in the model selector.

Anthropic
  Credentials: ANTHROPIC_API_KEY
  Model syntax: anthropic:<model-id>

OpenRouter
  Credentials: OPENROUTER_API_KEY
  Model syntax: openrouter:<vendor>/<model-id>

OpenRouter proxies 100+ models from various vendors. Use the vendor/model format from their model list.
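For example, an agent pointed at a model served through OpenRouter would use a spec like the following (the vendor/model ID is illustrative; check OpenRouter's model list for exact IDs):

"model": "openrouter:anthropic/claude-sonnet-4-5"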

OpenAI
  Credentials: OPENAI_API_KEY
  Model syntax: openai:<model-id>

Codex
  Credentials: OAuth login (no API key needed)
  Model syntax: codex:<model-id>

Codex authenticates through a browser-based OAuth flow with OpenAI. On first use, Aether opens a browser window to complete the login. Credentials are stored securely in your OS keychain — you only need to log in once.

In the model selector, Codex models show a “Needs login” badge until you’ve authenticated.

Gemini
  Credentials: GEMINI_API_KEY
  Model syntax: gemini:<model-id>

DeepSeek
  Credentials: DEEPSEEK_API_KEY
  Model syntax: deepseek:<model-id>

Amazon Bedrock
  Credentials: AWS credential chain (environment, config file, or IAM role)
  Model syntax: bedrock:<model-id>

Ollama
  Credentials: None (requires a running Ollama server)
  Model syntax: ollama:<model-id>

Aether auto-discovers models from your Ollama instance. Any model you’ve pulled with ollama pull will appear in the model selector.

Set OLLAMA_HOST to override the default address (http://localhost:11434).
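For example, if you've pulled a model named llama3.1 (an illustrative name; any model you've pulled works), the agent's model spec would be:

"model": "ollama:llama3.1"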

llama.cpp
  Credentials: None (requires a running llama.cpp server)
  Model syntax: llamacpp:<model-id>

Llama.cpp serves a single model at a time. Aether queries the server’s /v1/models endpoint to discover the loaded model, which then appears in the model selector.

Set LLAMA_CPP_HOST to override the default address (http://localhost:8080).

Moonshot
  Credentials: MOONSHOT_API_KEY
  Model syntax: moonshot:<model-id>

Z.AI
  Credentials: ZAI_API_KEY
  Model syntax: zai:<model-id>

Aether’s llm crate exposes the StreamingModelProvider trait, so you can integrate any LLM backend:

pub trait StreamingModelProvider: Send + Sync {
    fn stream_response(&self, context: &Context) -> LlmResponseStream;
    fn display_name(&self) -> String;
    fn context_window(&self) -> Option<u32>;
    fn model(&self) -> Option<LlmModel> { None }
}

Implement this trait on your own struct, and it can be passed directly to the agent builder. See the Custom Providers guide for a full walkthrough with examples.
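
As a rough sketch of the shape such an implementation takes (the import path, struct, field names, and placeholder bodies below are illustrative assumptions, not the crate's actual API):

use llm::{Context, LlmResponseStream, StreamingModelProvider};

// Hypothetical provider wrapping an in-house inference server.
pub struct MyBackendProvider {
    pub endpoint: String,
    pub model_id: String,
}

impl StreamingModelProvider for MyBackendProvider {
    fn stream_response(&self, _context: &Context) -> LlmResponseStream {
        // Translate the conversation context into the backend's request format,
        // open a streaming completion against self.endpoint, and adapt the
        // chunks into an LlmResponseStream.
        todo!("call the backend and adapt its stream")
    }

    fn display_name(&self) -> String {
        format!("my-backend ({})", self.model_id)
    }

    fn context_window(&self) -> Option<u32> {
        Some(32_768) // report the backend's context length, if known
    }

    // model() keeps its default implementation and returns None.
}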

Some models support extended thinking. Set reasoningEffort on an agent to control the thinking budget:

Level      Description
"low"      Minimal thinking; fastest responses
"medium"   Moderate thinking
"high"     Extended thinking for complex tasks
"xhigh"    Maximum thinking budget

{
  "name": "deep-thinker",
  "model": "anthropic:claude-sonnet-4-5",
  "reasoningEffort": "high",
  "..."
}

Model alloying lets you combine multiple LLM providers into a single agent by comma-separating model specs. Each turn uses the next model in the list, cycling through in round-robin order.

"model": "provider1:model1,provider2:model2,provider3:model3"

An agent that alternates between DeepSeek and Anthropic:

{
  "name": "alloy-coder",
  "description": "Cost-optimized coding agent",
  "model": "deepseek:deepseek-chat,anthropic:claude-sonnet-4-5",
  "userInvocable": true
}

Turn 1 uses DeepSeek, turn 2 uses Anthropic, turn 3 uses DeepSeek, and so on.

Alloying is useful for:

  • Cost optimization: alternate between expensive and cheaper models, using a powerful model for complex turns and a lighter one for simple follow-ups.
  • Redundancy: if one provider has an outage, the other turns still work.
  • Comparison: see how different models handle the same conversation context.

Keep in mind:

  • Each model in the alloy must have its corresponding API key set.
  • Reasoning effort applies to all models in the alloy; models that don't support it ignore the setting.
  • The conversation context is shared across all models: each model sees the full history, regardless of which model generated previous turns.