# Skill: AI Provider - z.ai
## Description
Fallback AI provider exposing Zhipu AI's GLM (General Language Model) family. Use when synthetic.new is unavailable or when a GLM model is better suited to the task.
## Status
**FALLBACK** - Use when:
1. synthetic.new rate limits or errors
2. GLM models outperform alternatives for the task
3. New models available earlier on z.ai
4. Extended context (200K+) needed
5. Vision/multimodal tasks required
## Configuration
```yaml
provider: z.ai (Zhipu AI / BigModel)
base_url: https://open.bigmodel.cn/api/paas/v4
api_key_env: Z_AI_API_KEY
compatibility: openai
rate_limit: 60 requests/minute
```
**API Key configured:** `Z_AI_API_KEY` in environment variables.
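Because the API is OpenAI-compatible, a plain HTTP call works without an SDK. A minimal sketch, assuming Node 18+ with global `fetch` and `Z_AI_API_KEY` set in the environment (model IDs are documented below):
```javascript
// Minimal z.ai chat call against the OpenAI-compatible endpoint.
// Assumes Node 18+ (global fetch) and Z_AI_API_KEY in the environment.
async function zaiChat(model, messages, maxTokens = 1024) {
  const res = await fetch('https://open.bigmodel.cn/api/paas/v4/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.Z_AI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ model, messages, max_tokens: maxTokens })
  });
  if (!res.ok) throw new Error(`z.ai request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```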
## Available Models (Updated Dec 2025)
### GLM-4.6 (Flagship - Latest)
```json
{
  "model_id": "glm-4.6",
  "best_for": ["agentic", "reasoning", "coding", "frontend_dev"],
  "context_window": 202752,
  "max_output": 128000,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.5,
  "strengths": ["200K context", "Tool use", "Agent workflows", "15% more token efficient"],
  "pricing": { "input": "$0.40/M", "output": "$1.75/M" },
  "released": "2025-09-30"
}
```
**Use when:**
- Complex agentic tasks
- Advanced reasoning
- Frontend/UI development
- Tool-calling workflows
- Extended context needs (200K)
### GLM-4.6V (Vision - Latest)
```json
{
  "model_id": "glm-4.6v",
  "best_for": ["image_analysis", "multimodal", "document_processing", "video_understanding"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["Native tool calling", "Up to 150 document pages or 1 hour of video input", "SOTA vision understanding"],
  "parameters": "106B",
  "released": "2025-12-08"
}
```
**Use when:**
- Image analysis and understanding
- Document OCR and processing
- Video content analysis
- Multimodal reasoning
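A hedged request sketch for image input, assuming `glm-4.6v` accepts the OpenAI-style multimodal content format (text plus `image_url` parts) used by earlier GLM-4V models; the image URL is a placeholder:
```javascript
// Vision request sketch (message format assumed OpenAI-compatible; verify
// against the official docs). Reuses zaiChat from the Configuration section.
const visionMessages = [{
  role: 'user',
  content: [
    { type: 'text', text: 'Summarize the key elements of this diagram.' },
    { type: 'image_url', image_url: { url: 'https://example.com/diagram.png' } }
  ]
}];
// const description = await zaiChat('glm-4.6v', visionMessages);
```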
### GLM-4.6V-Flash (Vision - Lightweight)
```json
{
  "model_id": "glm-4.6v-flash",
  "best_for": ["fast_image_analysis", "local_deployment", "low_latency"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["9B parameters", "Fast inference", "Local deployable"],
  "parameters": "9B",
  "released": "2025-12-08"
}
```
**Use when:**
- Quick image classification
- Edge/local deployment
- Low-latency vision tasks
### GLM-4.5 (Previous Flagship)
```json
{
  "model_id": "glm-4.5",
  "best_for": ["reasoning", "tool_use", "coding", "agents"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.5,
  "strengths": ["355B MoE", "32B active params", "Proven stability"],
  "parameters": "355B (32B active)"
}
```
**Use when:**
- Need proven stable model
- Standard reasoning tasks
- Backward compatibility
### GLM-4.5-Air (Efficient)
```json
{
  "model_id": "glm-4.5-air",
  "best_for": ["cost_efficient", "standard_tasks", "high_volume"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.5,
  "strengths": ["106B MoE", "12B active params", "Cost effective"],
  "parameters": "106B (12B active)"
}
```
**Use when:**
- Cost-sensitive operations
- High-volume processing
- Standard quality acceptable
### GLM-4.5-Flash (Fast)
```json
{
  "model_id": "glm-4.5-flash",
  "best_for": ["ultra_fast", "simple_tasks", "streaming"],
  "context_window": 32000,
  "max_output": 2048,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["Fastest inference", "Lowest cost", "Simple tasks"]
}
```
**Use when:**
- Real-time responses needed
- Simple classification/extraction
- Budget constraints
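Since streaming is a listed use case, here is a sketch assuming the endpoint honors the OpenAI-style `stream: true` flag and emits `data: ...` server-sent events; partial-line buffering across chunks is omitted for brevity:
```javascript
// Streaming sketch for glm-4.5-flash (SSE format assumed OpenAI-compatible).
// Assumes Node 18+, where the fetch response body is async-iterable.
async function zaiStream(userPrompt, onToken) {
  const res = await fetch('https://open.bigmodel.cn/api/paas/v4/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.Z_AI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'glm-4.5-flash',
      messages: [{ role: 'user', content: userPrompt }],
      stream: true
    })
  });
  const decoder = new TextDecoder();
  for await (const chunk of res.body) {
    for (const line of decoder.decode(chunk).split('\n')) {
      if (!line.startsWith('data: ') || line.includes('[DONE]')) continue;
      const delta = JSON.parse(line.slice(6)).choices?.[0]?.delta?.content;
      if (delta) onToken(delta);
    }
  }
}
```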
### GLM-Z1-Rumination-32B (Deep Reasoning)
```json
{
  "model_id": "glm-z1-rumination-32b-0414",
  "best_for": ["deep_reasoning", "complex_analysis", "deliberation"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.7,
  "strengths": ["Rumination capability", "Step-by-step reasoning", "Complex problems"],
  "released": "2025-04-14"
}
```
**Use when:**
- Complex multi-step reasoning
- Problems requiring deliberation
- Chain-of-thought tasks
## Model Selection Logic
```javascript
function selectZAIModel(taskType, contextLength, needsVision = false) {
  // Vision tasks
  if (needsVision) {
    return contextLength > 64000 ? 'glm-4.6v' : 'glm-4.6v-flash';
  }
  // Context-based selection
  if (contextLength > 128000) {
    return 'glm-4.6'; // 200K context
  }
  const modelMap = {
    // Flagship tasks
    'agentic': 'glm-4.6',
    'frontend': 'glm-4.6',
    'tool_use': 'glm-4.6',
    // Deep reasoning
    'deep_reasoning': 'glm-z1-rumination-32b-0414',
    'deliberation': 'glm-z1-rumination-32b-0414',
    // Standard reasoning
    'reasoning': 'glm-4.5',
    'analysis': 'glm-4.5',
    'planning': 'glm-4.5',
    'coding': 'glm-4.5',
    // Cost-efficient
    'cost_efficient': 'glm-4.5-air',
    'high_volume': 'glm-4.5-air',
    // Fast operations
    'classification': 'glm-4.5-flash',
    'extraction': 'glm-4.5-flash',
    'simple_qa': 'glm-4.5-flash',
    'streaming': 'glm-4.5-flash',
    // Default to flagship
    'default': 'glm-4.6'
  };
  return modelMap[taskType] || modelMap.default;
}
```
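Example routing decisions:
```javascript
selectZAIModel('agentic', 150000);          // 'glm-4.6' (context > 128K)
selectZAIModel('classification', 2000);     // 'glm-4.5-flash'
selectZAIModel('multimodal', 80000, true);  // 'glm-4.6v' (vision, large context)
selectZAIModel('unknown_task', 4000);       // 'glm-4.6' (default)
```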
## Fallback Logic
### When to Fallback from synthetic.new
```javascript
async function callWithFallback(systemPrompt, userPrompt, options = {}) {
  const primaryResult = await callSyntheticAI(systemPrompt, userPrompt, options);
  // Check for fallback conditions
  if (primaryResult.error) {
    const errorCode = primaryResult.error.code;
    // Rate limit or server error - fall back to z.ai
    if ([429, 500, 502, 503, 504].includes(errorCode)) {
      console.log('Falling back to z.ai GLM-4.6');
      return await callZAI(systemPrompt, userPrompt, options);
    }
  }
  return primaryResult;
}
```
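A variant that retries transient errors with exponential backoff before switching providers; `callSyntheticAI` is assumed to come from the synthetic-new skill, and `callZAI` is defined in the n8n section below:
```javascript
// Retry transient synthetic.new errors (1s, 2s, 4s backoff), then fall back.
async function callWithRetryAndFallback(systemPrompt, userPrompt, options = {}, retries = 2) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const result = await callSyntheticAI(systemPrompt, userPrompt, options);
    if (!result.error) return result;
    // Non-transient errors are returned as-is
    if (![429, 500, 502, 503, 504].includes(result.error.code)) return result;
    await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** attempt));
  }
  console.log('synthetic.new still failing, falling back to z.ai');
  return await callZAI(systemPrompt, userPrompt, options);
}
```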
### GLM Superiority Conditions
```javascript
function shouldPreferGLM(task) {
  const glmSuperiorTasks = [
    'chinese_translation',
    'chinese_content',
    'extended_context_200k',
    'vision_analysis',
    'multimodal',
    'frontend_development',
    'deep_rumination',
    'cost_optimization'
  ];
  return glmSuperiorTasks.includes(task.type);
}
```
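Combining both checks gives a single routing entry point. A sketch; the `task` object shape (`type`, `contextLength`, `needsVision`) is this skill's convention, not an API contract:
```javascript
// Route a task to a provider: GLM-superior tasks go straight to z.ai,
// everything else uses synthetic.new with z.ai as fallback.
function routeTask(task) {
  if (shouldPreferGLM(task)) {
    return {
      provider: 'z.ai',
      model: selectZAIModel(task.type, task.contextLength ?? 0, task.needsVision ?? false)
    };
  }
  return { provider: 'synthetic.new', fallback: 'z.ai' };
}
```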
## n8n Integration
### HTTP Request Node Configuration
```json
{
  "method": "POST",
  "url": "https://open.bigmodel.cn/api/paas/v4/chat/completions",
  "headers": {
    "Authorization": "Bearer {{ $env.Z_AI_API_KEY }}",
    "Content-Type": "application/json"
  },
  "body": {
    "model": "glm-4.6",
    "messages": [
      { "role": "system", "content": "{{ systemPrompt }}" },
      { "role": "user", "content": "{{ userPrompt }}" }
    ],
    "max_tokens": 4000,
    "temperature": 0.5
  },
  "timeout": 90000
}
```
### Code Node Helper
```javascript
// z.ai request helper for an n8n Code node.
// Uses n8n's built-in this.helpers.httpRequest; written as an arrow function
// so `this` lexically binds to the Code node context. Access to $env must
// be enabled in the n8n instance.
const callZAI = async (systemPrompt, userPrompt, options = {}) => {
  const {
    model = 'glm-4.6',
    maxTokens = 4000,
    temperature = 0.5
  } = options;
  const response = await this.helpers.httpRequest({
    method: 'POST',
    url: 'https://open.bigmodel.cn/api/paas/v4/chat/completions',
    headers: {
      'Authorization': `Bearer ${$env.Z_AI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: {
      model,
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userPrompt }
      ],
      max_tokens: maxTokens,
      temperature
    },
    json: true
  });
  return response.choices[0].message.content;
};
```
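Example usage inside a Code node (the `articleText` field on `$json` is illustrative):
```javascript
const summary = await callZAI(
  'You are a concise technical summarizer.',
  $json.articleText,
  { model: 'glm-4.5-air', maxTokens: 1000 }
);
return [{ json: { summary } }];
```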
## Comparison: synthetic.new vs z.ai
| Feature | synthetic.new | z.ai |
|---------|---------------|------|
| Primary Use | All tasks | Fallback + GLM tasks |
| Best Model (Code) | DeepSeek-V3 | GLM-4.6 |
| Best Model (Reasoning) | Kimi-K2-Thinking | GLM-Z1-Rumination |
| Best Model (Vision) | N/A | GLM-4.6V |
| Max Context | 200K | 200K (GLM-4.6) |
| Chinese Support | Good | Excellent |
| Rate Limit | 100/min | 60/min |
| Cost (Input) | ~$0.50/M | $0.40/M (GLM-4.6) |
| Open Source | No | Yes (MIT) |
## Model Hierarchy (Recommended)
```
Task Complexity:
  HIGH   → GLM-Z1-Rumination (deep reasoning)
         → GLM-4.6 (agentic, coding)
         → GLM-4.6V (vision tasks)
  MEDIUM → GLM-4.5 (standard tasks)
         → GLM-4.5-Air (cost-efficient)
  LOW    → GLM-4.5-Flash (fast, simple)
         → GLM-4.6V-Flash (fast vision)
```
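A sketch turning this hierarchy into a picker; the complexity labels are this skill's convention, not an API concept:
```javascript
function pickByComplexity(complexity, needsVision = false) {
  if (needsVision) return complexity === 'LOW' ? 'glm-4.6v-flash' : 'glm-4.6v';
  const tiers = {
    HIGH: 'glm-4.6',      // or glm-z1-rumination-32b-0414 for deep reasoning
    MEDIUM: 'glm-4.5',    // or glm-4.5-air when cost matters
    LOW: 'glm-4.5-flash'
  };
  return tiers[complexity] || 'glm-4.6'; // default to flagship
}
```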
## Setup Instructions
### 1. Get API Key
1. Visit https://z.ai/dashboard or https://open.bigmodel.cn
2. Create an account or log in
3. Navigate to API Keys
4. Generate a new key
5. Store as `Z_AI_API_KEY` environment variable
### 2. Configure in Coolify
```bash
# Add to the service's environment variables (use your own key;
# never commit real API keys to documentation or version control)
Z_AI_API_KEY=<your-z.ai-api-key>
```
### 3. Test Connection
```bash
curl -X POST https://open.bigmodel.cn/api/paas/v4/chat/completions \
  -H "Authorization: Bearer $Z_AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.6",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 50
  }'
```
## Error Handling
| Error Code | Cause | Action |
|------------|-------|--------|
| 401 | Invalid API key | Check Z_AI_API_KEY |
| 429 | Rate limit | Wait and retry |
| 400 | Invalid model | Check model name |
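A small response handler mapping these codes to the actions above (a sketch, assuming the API surfaces them as HTTP status codes):
```javascript
async function handleZAIResponse(res) {
  if (res.ok) return (await res.json()).choices[0].message.content;
  switch (res.status) {
    case 401: throw new Error('Invalid API key: check Z_AI_API_KEY');
    case 429: throw new Error('Rate limited: wait and retry (60 req/min cap)');
    case 400: throw new Error('Bad request: check model name and payload');
    default:  throw new Error(`z.ai error ${res.status}`);
  }
}
```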
## References
- [GLM-4.6 Announcement](https://z.ai/blog/glm-4.6)
- [GLM-4.6V Multimodal](https://z.ai/blog/glm-4.6v)
- [OpenRouter GLM-4.6](https://openrouter.ai/z-ai/glm-4.6)
- [Hugging Face Models](https://huggingface.co/zai-org)
## Related Skills
- `ai-providers/synthetic-new.md` - Primary provider
- `code/implement.md` - Code generation
- `design-thinking/ideate.md` - Solution brainstorming
## Token Budget
- Max input: 500 tokens
- Max output: 800 tokens
## Model
- Recommended: haiku (configuration lookup)