# LLM Cascade
HYRE uses a multi-model cascade to generate AI insights. Models are tried in priority order; if one fails (timeout, rate limit, content-policy block), the next model is tried automatically. If all models fail, the raw data is returned with `insight: null` and an HTTP 206 status. A sketch of the cascade loop follows the model tables below.
## Cascade Order
### Primary: OpenRouter
| Priority | Model | Timeout | Notes |
|---|---|---|---|
| 1 | DeepSeek V3.2 | 10s | Primary. Fast, high-quality JSON output. |
| 2 | DeepSeek V3 (Chat) | 8s | Fallback. Slightly older model version. |
| 3 | GLM 4.5 Air (Free) | 8s | Free tier. Good for simple enrichment. |
| 4 | Claude 3.5 Haiku | 8s | Premium fallback. Highest quality reasoning. |
### Secondary: Venice AI
If all OpenRouter models fail, HYRE falls back to Venice AI:

| Priority | Model | Timeout | Notes |
|---|---|---|---|
| 5 | Venice DeepSeek V3.2 | 10s | Same model, different provider. |
| 6 | Venice GLM 4.7 Flash | 8s | Fast, lightweight. |
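
Putting the two tables together, the cascade behaves like an ordered candidate list that is walked until one model returns usable text. A minimal TypeScript sketch, assuming a hypothetical `callModel` helper and illustrative model IDs (HYRE's actual client code is not shown here):

```typescript
interface Candidate {
  provider: "openrouter" | "venice";
  model: string; // illustrative IDs, not confirmed provider slugs
  timeoutMs: number;
}

// Priority order mirrors the tables above.
const CASCADE: Candidate[] = [
  { provider: "openrouter", model: "deepseek-v3.2", timeoutMs: 10_000 },
  { provider: "openrouter", model: "deepseek-chat", timeoutMs: 8_000 },
  { provider: "openrouter", model: "glm-4.5-air:free", timeoutMs: 8_000 },
  { provider: "openrouter", model: "claude-3.5-haiku", timeoutMs: 8_000 },
  { provider: "venice", model: "deepseek-v3.2", timeoutMs: 10_000 },
  { provider: "venice", model: "glm-4.7-flash", timeoutMs: 8_000 },
];

// Hypothetical single-model call; throws on timeout, 429, 5xx, or policy block.
declare function callModel(c: Candidate, prompt: string): Promise<string>;

// Walk the cascade until one model returns a non-empty response.
async function runCascade(prompt: string): Promise<string | null> {
  for (const c of CASCADE) {
    try {
      const text = await callModel(c, prompt);
      if (text.trim()) return text; // empty response: fall through
    } catch {
      // timeout / rate limit / server error / policy block: fall through
    }
  }
  return null; // all models failed: caller returns raw data with HTTP 206
}
```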
## Chat Agent (Playground)
The Playground chat agent uses a separate model:

| Model | Provider | Use Case |
|---|---|---|
| Gemini 2.5 Flash-Lite | Google AI | Conversation flow, tool selection, response summarization |
## Failure Modes
| Failure | Behavior |
|---|---|
| Timeout (exceeds the model's 8-10s budget) | Abort and try next model |
| HTTP 429 (rate limit) | Skip and try next model |
| HTTP 5xx (server error) | Skip and try next model |
| Content policy block | Skip and try next model |
| Empty response | Skip and try next model |
| All models fail | Return raw data with `insight: null` and HTTP 206 |

HTTP 206 (Partial Content) indicates the data was fetched successfully but the LLM enrichment failed. The `data` field contains the full upstream data. The `signal` field will be absent or set to a default value.
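
Callers should therefore branch on the status code (or a null `insight`) rather than assume enrichment succeeded. A minimal consumer-side sketch; the endpoint path and the two consumer functions are hypothetical:

```typescript
declare function useRawData(data: unknown): void; // placeholder consumer
declare function useEnriched(body: unknown): void; // placeholder consumer

async function fetchDefiInsight(): Promise<void> {
  // Hypothetical endpoint path; substitute the segment you are calling.
  const res = await fetch("https://api.hyreagent.fun/defi/tvl");
  const body = await res.json();

  // HTTP 206 means the upstream data arrived but enrichment failed.
  if (res.status === 206 || body.insight == null) {
    useRawData(body.data); // full upstream payload is still present
  } else {
    useEnriched(body); // insight, signal, confidence are populated
  }
}
```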
## LLM Call Configuration
Every LLM call uses these parameters:
```json
{
  "temperature": 0.3,
  "max_tokens": 800,
  "response_format": { "type": "json_object" }
}
```
- Low temperature (0.3) — Prioritizes consistent, factual output over creative variation.
- JSON mode — Forces the model to return valid JSON, parsed into the response envelope.
- 800 token limit — Keeps insights concise (1-2 sentences) and response times fast.
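
Applied to OpenRouter's OpenAI-compatible chat-completions endpoint, a single cascade step might look like the sketch below; the model ID, prompt variables, and function shape are assumptions, not HYRE's actual code:

```typescript
declare const SEGMENT_SYSTEM_PROMPT: string; // per-segment prompt (see System Prompts)
declare const rawData: unknown; // upstream data to enrich

async function callOpenRouter(): Promise<unknown> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "deepseek/deepseek-chat", // illustrative; see the cascade tables
      messages: [
        { role: "system", content: SEGMENT_SYSTEM_PROMPT },
        { role: "user", content: JSON.stringify(rawData) },
      ],
      temperature: 0.3,
      max_tokens: 800,
      response_format: { type: "json_object" },
    }),
    signal: AbortSignal.timeout(10_000), // per-model timeout from the tables
  });
  return res.json();
}
```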
## System Prompts
Each endpoint segment has a dedicated system prompt that fixes the LLM's signal vocabulary and analytical focus:

| Segment | Signal Vocabulary | Prompt Focus |
|---|---|---|
| Trenches | snipe, watch, avoid | Token risk assessment, sniper detection, dev behavior |
| Traders | follow, ignore | Wallet profitability, copy-trade worthiness |
| LPs | add_liquidity, rebalance, hold, remove | Pool APR sustainability, IL risk, range optimization |
| DeFi | high_yield, medium_yield, low_yield, risky | TVL trends, yield opportunity assessment |
| deBridge | execute, wait, avoid (quote) / migrate, stay, wait (yield) | Bridge cost efficiency, cross-chain yield comparison |
| Nansen | follow, ignore, accumulate, distribute | Smart money flow interpretation, wallet classification |
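
Because each segment's signal vocabulary is closed, a returned `signal` can be checked against it before being trusted. A small sketch with the vocabularies transcribed from the table above (the segment keys themselves are illustrative):

```typescript
// Allowed signals per segment, transcribed from the table above.
const SIGNAL_VOCAB: Record<string, readonly string[]> = {
  trenches: ["snipe", "watch", "avoid"],
  traders: ["follow", "ignore"],
  lps: ["add_liquidity", "rebalance", "hold", "remove"],
  defi: ["high_yield", "medium_yield", "low_yield", "risky"],
  debridge_quote: ["execute", "wait", "avoid"],
  debridge_yield: ["migrate", "stay", "wait"],
  nansen: ["follow", "ignore", "accumulate", "distribute"],
};

// Reject signals outside the segment's vocabulary (e.g. hallucinated values).
function isValidSignal(segment: string, signal: string): boolean {
  return SIGNAL_VOCAB[segment]?.includes(signal) ?? false;
}
```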
The LLM returns JSON matching this structure:
```json
{
  "insight": "Solana TVL surged 12% this week, driven by...",
  "signal": "high_yield",
  "confidence": 0.87
}
```
The `enrich()` function merges this output with the raw upstream data into the final response envelope:
```json
{
  "data": { ... },
  "insight": "Solana TVL surged 12% this week, driven by...",
  "signal": "high_yield",
  "confidence": 0.87,
  "sources": ["defillama"],
  "model_used": "deepseek-v3.2",
  "latency_ms": 342,
  "timestamp": "2026-04-17T10:30:00.000Z"
}
```
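
A minimal sketch of what that merge might look like; the field names follow the envelope above, but the function signature and internals are assumptions:

```typescript
interface Envelope {
  data: unknown;
  insight: string | null;
  signal: string | null;
  confidence: number | null;
  sources: string[];
  model_used: string | null;
  latency_ms: number;
  timestamp: string;
}

// Merge the parsed LLM output with the raw upstream data. When the whole
// cascade failed, `llm` is null and the insight fields ship as null
// (served with HTTP 206 by the route handler).
function enrich(
  data: unknown,
  sources: string[],
  llm: { insight: string; signal: string; confidence: number } | null,
  modelUsed: string | null,
  latencyMs: number,
): Envelope {
  return {
    data,
    insight: llm?.insight ?? null,
    signal: llm?.signal ?? null,
    confidence: llm?.confidence ?? null,
    sources,
    model_used: modelUsed,
    latency_ms: latencyMs,
    timestamp: new Date().toISOString(),
  };
}
```

Keeping the same envelope shape on failure means clients that already parse the enriched form can consume a 206 response without a separate code path.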