LLM Cascade

HYRE uses a multi-model cascade to generate AI insights. Models are tried in priority order. If one fails (timeout, rate limit, content policy), the next model is tried automatically. If all models fail, raw data is returned with insight: null and HTTP 206 status.

Cascade Order

Primary: OpenRouter

Priority	Model	Timeout	Notes
1	DeepSeek V3.2	10s	Primary. Fast, high-quality JSON output.
2	DeepSeek V3 (Chat)	8s	Fallback. Slightly older model version.
3	GLM 4.5 Air (Free)	8s	Free tier. Good for simple enrichment.
4	Claude 3.5 Haiku	8s	Premium fallback. Highest quality reasoning.

Secondary: Venice AI

If all OpenRouter models fail, HYRE falls back to Venice AI:

Priority	Model	Timeout	Notes
5	Venice DeepSeek V3.2	10s	Same model, different provider.
6	Venice GLM 4.7 Flash	8s	Fast, lightweight.

Chat Agent (Playground)

The Playground chat agent uses a separate model:

Model	Provider	Use Case
Gemini 2.5 Flash-Lite	Google AI	Conversation flow, tool selection, response summarization

Failure Modes

Failure	Behavior
Timeout (>8-10s)	Abort and try next model
HTTP 429 (rate limit)	Skip and try next model
HTTP 5xx (server error)	Skip and try next model
Content policy block	Skip and try next model
Empty response	Skip and try next model
All models fail	Return raw data, `insight: null`, HTTP 206

HTTP 206 (Partial Content) indicates the data was fetched successfully but the LLM enrichment failed. The data field contains the full upstream data. The signal field will be absent or set to a default value.

LLM Call Configuration

Every LLM call uses these parameters:

{
  "temperature": 0.3,
  "max_tokens": 800,
  "response_format": { "type": "json_object" }
}

Low temperature (0.3) — Prioritizes consistent, factual output over creative variation.
JSON mode — Forces the model to return valid JSON, parsed into the response envelope.
800 token limit — Keeps insights concise (1-2 sentences) and response times fast.

System Prompts

Each endpoint segment has a dedicated system prompt that instructs the LLM:

Segment	Signal Vocabulary	Prompt Focus
Trenches	`snipe`, `watch`, `avoid`	Token risk assessment, sniper detection, dev behavior
Traders	`follow`, `ignore`	Wallet profitability, copy-trade worthiness
LPs	`add_liquidity`, `rebalance`, `hold`, `remove`	Pool APR sustainability, IL risk, range optimization
DeFi	`high_yield`, `medium_yield`, `low_yield`, `risky`	TVL trends, yield opportunity assessment
deBridge	`execute`, `wait`, `avoid` (quote) / `migrate`, `stay`, `wait` (yield)	Bridge cost efficiency, cross-chain yield comparison
Nansen	`follow`, `ignore`, `accumulate`, `distribute`	Smart money flow interpretation, wallet classification

Response Format

The LLM returns JSON matching this structure:

{
  "insight": "Solana TVL surged 12% this week, driven by...",
  "signal": "high_yield",
  "confidence": 0.87
}

The enrich() function merges this with the raw data:

{
  "data": { ... },
  "insight": "Solana TVL surged 12% this week, driven by...",
  "signal": "high_yield",
  "confidence": 0.87,
  "sources": ["defillama"],
  "model_used": "deepseek-v3.2",
  "latency_ms": 342,
  "timestamp": "2026-04-17T10:30:00.000Z"
}

Getting Started

Agent Playground

Payment

Architecture

ME Protocol

API Reference

LLM Cascade

LLM Cascade

Cascade Order

Primary: OpenRouter

Secondary: Venice AI

Chat Agent (Playground)

Failure Modes

LLM Call Configuration

System Prompts

Response Format

Getting Started

Agent Playground

Payment

Architecture

ME Protocol

API Reference

Documentation Index

​LLM Cascade

​Cascade Order

​Primary: OpenRouter

​Secondary: Venice AI

​Chat Agent (Playground)

​Failure Modes

​LLM Call Configuration

​System Prompts

​Response Format

LLM Cascade

Cascade Order

Primary: OpenRouter

Secondary: Venice AI

Chat Agent (Playground)

Failure Modes

LLM Call Configuration

System Prompts

Response Format