From 4c6ec6f10dc83ea5a7a32aefef0b17f56ce3aaca Mon Sep 17 00:00:00 2001
From: christiankrag
Date: Sun, 14 Dec 2025 19:45:30 +0100
Subject: [PATCH] Update z.ai skill with latest GLM models (Dec 2025)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add GLM-4.6 (flagship, 200K context, agentic)
- Add GLM-4.6V and GLM-4.6V-Flash (vision models)
- Add GLM-4.5, GLM-4.5-Air, GLM-4.5-Flash
- Add GLM-Z1-Rumination-32B (deep reasoning)
- Update model selection logic and pricing
- Add references to official documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 skills/ai-providers/z-ai.md | 239 +++++++++++++++++++++++++++---------
 1 file changed, 183 insertions(+), 56 deletions(-)

diff --git a/skills/ai-providers/z-ai.md b/skills/ai-providers/z-ai.md
index 41de751..21acb2e 100644
--- a/skills/ai-providers/z-ai.md
+++ b/skills/ai-providers/z-ai.md
@@ -1,13 +1,15 @@
 # Skill: AI Provider - z.ai
 
 ## Description
-Fallback AI provider with GLM (General Language Model) support. Use when synthetic.new is unavailable or when GLM models are superior for specific tasks.
+Fallback AI provider with GLM (General Language Model) support from Zhipu AI. Use when synthetic.new is unavailable or when GLM models are superior for specific tasks.
 
 ## Status
 **FALLBACK** - Use when:
 1. synthetic.new rate limits or errors
 2. GLM models outperform alternatives for the task
 3. New models available earlier on z.ai
+4. Extended context (200K+) needed
+5. Vision/multimodal tasks required
 
 ## Configuration
 ```yaml
@@ -20,89 +22,187 @@ rate_limit: 60 requests/minute
 
 **API Key configured:** `Z_AI_API_KEY` in environment variables.
-## Available Models
+## Available Models (Updated Dec 2025)
 
-### GLM-4-Plus (Reasoning & Analysis)
+### GLM-4.6 (Flagship - Latest)
 ```json
 {
-  "model_id": "glm-4-plus",
-  "best_for": ["reasoning", "analysis", "chinese_language", "long_context"],
+  "model_id": "glm-4.6",
+  "best_for": ["agentic", "reasoning", "coding", "frontend_dev"],
+  "context_window": 202752,
+  "max_output": 128000,
+  "temperature_range": [0.0, 1.0],
+  "recommended_temp": 0.5,
+  "strengths": ["200K context", "Tool use", "Agent workflows", "15% more token efficient"],
+  "pricing": { "input": "$0.40/M", "output": "$1.75/M" },
+  "released": "2025-09-30"
+}
+```
+
+**Use when:**
+- Complex agentic tasks
+- Advanced reasoning
+- Frontend/UI development
+- Tool-calling workflows
+- Extended context needs (200K)
+
+### GLM-4.6V (Vision - Latest)
+```json
+{
+  "model_id": "glm-4.6v",
+  "best_for": ["image_analysis", "multimodal", "document_processing", "video_understanding"],
+  "context_window": 128000,
+  "max_output": 4096,
+  "temperature_range": [0.0, 1.0],
+  "recommended_temp": 0.3,
+  "strengths": ["Native tool calling", "150 pages/1hr video input", "SOTA vision understanding"],
+  "parameters": "106B",
+  "released": "2025-12-08"
+}
+```
+
+**Use when:**
+- Image analysis and understanding
+- Document OCR and processing
+- Video content analysis
+- Multimodal reasoning
+
+### GLM-4.6V-Flash (Vision - Lightweight)
+```json
+{
+  "model_id": "glm-4.6v-flash",
+  "best_for": ["fast_image_analysis", "local_deployment", "low_latency"],
+  "context_window": 128000,
+  "max_output": 4096,
+  "temperature_range": [0.0, 1.0],
+  "recommended_temp": 0.3,
+  "strengths": ["9B parameters", "Fast inference", "Local deployable"],
+  "parameters": "9B",
+  "released": "2025-12-08"
+}
+```
+
+**Use when:**
+- Quick image classification
+- Edge/local deployment
+- Low-latency vision tasks
+
+### GLM-4.5 (Previous Flagship)
+```json
+{
+  "model_id": "glm-4.5",
+  "best_for": ["reasoning", "tool_use", "coding", "agents"],
   "context_window": 128000,
   "max_output": 4096,
   "temperature_range": [0.0, 1.0],
   "recommended_temp": 0.5,
-  "strengths": ["Chinese content", "Logical reasoning", "Long documents"]
+  "strengths": ["355B MoE", "32B active params", "Proven stability"],
+  "parameters": "355B (32B active)"
 }
 ```
 
 **Use when:**
-- Complex logical reasoning
-- Chinese language content
-- Long document analysis
-- Comparative analysis
+- Need proven stable model
+- Standard reasoning tasks
+- Backward compatibility
 
-### GLM-4-Flash (Fast Responses)
+### GLM-4.5-Air (Efficient)
 ```json
 {
-  "model_id": "glm-4-flash",
-  "best_for": ["quick_responses", "simple_tasks", "high_volume"],
+  "model_id": "glm-4.5-air",
+  "best_for": ["cost_efficient", "standard_tasks", "high_volume"],
+  "context_window": 128000,
+  "max_output": 4096,
+  "temperature_range": [0.0, 1.0],
+  "recommended_temp": 0.5,
+  "strengths": ["106B MoE", "12B active params", "Cost effective"],
+  "parameters": "106B (12B active)"
+}
+```
+
+**Use when:**
+- Cost-sensitive operations
+- High-volume processing
+- Standard quality acceptable
+
+### GLM-4.5-Flash (Fast)
+```json
+{
+  "model_id": "glm-4.5-flash",
+  "best_for": ["ultra_fast", "simple_tasks", "streaming"],
   "context_window": 32000,
   "max_output": 2048,
   "temperature_range": [0.0, 1.0],
   "recommended_temp": 0.3,
-  "strengths": ["Speed", "Cost efficiency", "Simple tasks"]
+  "strengths": ["Fastest inference", "Lowest cost", "Simple tasks"]
 }
 ```
 
 **Use when:**
-- Quick classification
-- Simple transformations
-- High-volume processing
-- Cost-sensitive operations
+- Real-time responses needed
+- Simple classification/extraction
+- Budget constraints
 
-### GLM-4-Long (Extended Context)
+### GLM-Z1-Rumination-32B (Deep Reasoning)
 ```json
 {
-  "model_id": "glm-4-long",
-  "best_for": ["long_documents", "codebase_analysis", "summarization"],
-  "context_window": 1000000,
+  "model_id": "glm-z1-rumination-32b-0414",
+  "best_for": ["deep_reasoning", "complex_analysis", "deliberation"],
+  "context_window": 128000,
   "max_output": 4096,
   "temperature_range": [0.0, 1.0],
-  "recommended_temp": 0.3,
-  "strengths": ["1M token context", "Document processing", "Code analysis"]
+  "recommended_temp": 0.7,
+  "strengths": ["Rumination capability", "Step-by-step reasoning", "Complex problems"],
+  "released": "2025-04-14"
 }
 ```
 
 **Use when:**
-- Entire codebase analysis
-- Long document summarization
-- Multi-file code review
+- Complex multi-step reasoning
+- Problems requiring deliberation
+- Chain-of-thought tasks
 
 ## Model Selection Logic
 ```javascript
-function selectZAIModel(taskType, contextLength) {
+function selectZAIModel(taskType, contextLength, needsVision = false) {
+  // Vision tasks
+  if (needsVision) {
+    return contextLength > 64000 ? 'glm-4.6v' : 'glm-4.6v-flash';
+  }
+
   // Context-based selection
   if (contextLength > 128000) {
-    return 'glm-4-long';
+    return 'glm-4.6'; // 200K context
   }
 
   const modelMap = {
+    // Flagship tasks
+    'agentic': 'glm-4.6',
+    'frontend': 'glm-4.6',
+    'tool_use': 'glm-4.6',
+
+    // Deep reasoning
+    'deep_reasoning': 'glm-z1-rumination-32b-0414',
+    'deliberation': 'glm-z1-rumination-32b-0414',
+
+    // Standard reasoning
+    'reasoning': 'glm-4.5',
+    'analysis': 'glm-4.5',
+    'planning': 'glm-4.5',
+    'coding': 'glm-4.5',
+
+    // Cost-efficient
+    'cost_efficient': 'glm-4.5-air',
+    'high_volume': 'glm-4.5-air',
+
     // Fast operations
-    'classification': 'glm-4-flash',
-    'extraction': 'glm-4-flash',
-    'simple_qa': 'glm-4-flash',
+    'classification': 'glm-4.5-flash',
+    'extraction': 'glm-4.5-flash',
+    'simple_qa': 'glm-4.5-flash',
+    'streaming': 'glm-4.5-flash',
 
-    // Complex reasoning
-    'reasoning': 'glm-4-plus',
-    'analysis': 'glm-4-plus',
-    'planning': 'glm-4-plus',
-
-    // Long context
-    'codebase': 'glm-4-long',
-    'summarization': 'glm-4-long',
-
-    // Default
-    'default': 'glm-4-plus'
+    // Default to flagship
+    'default': 'glm-4.6'
   };
 
   return modelMap[taskType] || modelMap.default;
@@ -122,7 +222,7 @@ async function callWithFallback(systemPrompt, userPrompt, options = {}) {
     // Rate limit or server error - fallback to z.ai
     if ([429, 500, 502, 503, 504].includes(errorCode)) {
-      console.log('Falling back to z.ai');
+      console.log('Falling back to z.ai GLM-4.6');
       return await callZAI(systemPrompt, userPrompt, options);
     }
   }
@@ -137,9 +237,12 @@ function shouldPreferGLM(task) {
   const glmSuperiorTasks = [
     'chinese_translation',
     'chinese_content',
-    'million_token_context',
-    'cost_optimization',
-    'new_model_testing'
+    'extended_context_200k',
+    'vision_analysis',
+    'multimodal',
+    'frontend_development',
+    'deep_rumination',
+    'cost_optimization'
   ];
   return glmSuperiorTasks.includes(task.type);
 }
@@ -158,7 +261,7 @@ function shouldPreferGLM(task) {
     "Content-Type": "application/json"
   },
   "body": {
-    "model": "glm-4-plus",
+    "model": "glm-4.6",
     "messages": [
       { "role": "system", "content": "{{ systemPrompt }}" },
       { "role": "user", "content": "{{ userPrompt }}" }
@@ -175,7 +278,7 @@ function shouldPreferGLM(task) {
 // z.ai Request Helper for n8n Code Node
 async function callZAI(systemPrompt, userPrompt, options = {}) {
   const {
-    model = 'glm-4-plus',
+    model = 'glm-4.6',
     maxTokens = 4000,
     temperature = 0.5
   } = options;
@@ -207,17 +310,35 @@ async function callZAI(systemPrompt, userPrompt, options = {}) {
 | Feature | synthetic.new | z.ai |
 |---------|---------------|------|
 | Primary Use | All tasks | Fallback + GLM tasks |
-| Best Model (Code) | DeepSeek-V3 | GLM-4-Flash |
-| Best Model (Reasoning) | Kimi-K2-Thinking | GLM-4-Plus |
-| Max Context | 200K | 1M (GLM-4-Long) |
+| Best Model (Code) | DeepSeek-V3 | GLM-4.6 |
+| Best Model (Reasoning) | Kimi-K2-Thinking | GLM-Z1-Rumination |
+| Best Model (Vision) | N/A | GLM-4.6V |
+| Max Context | 200K | 200K (GLM-4.6) |
 | Chinese Support | Good | Excellent |
 | Rate Limit | 100/min | 60/min |
-| Cost | Standard | Lower (Flash) |
+| Cost (Input) | ~$0.50/M | $0.40/M (GLM-4.6) |
+| Open Source | No | Yes (MIT) |
+
+## Model Hierarchy (Recommended)
+
+```
+Task Complexity:
+
+HIGH   → GLM-Z1-Rumination (deep reasoning)
+       → GLM-4.6 (agentic, coding)
+       → GLM-4.6V (vision tasks)
+
+MEDIUM → GLM-4.5 (standard tasks)
+       → GLM-4.5-Air (cost-efficient)
+
+LOW    → GLM-4.5-Flash (fast, simple)
+       → GLM-4.6V-Flash (fast vision)
+```
 
 ## Setup Instructions
 
 ### 1. Get API Key
-1. Visit https://z.ai/dashboard
+1. Visit https://z.ai/dashboard or https://open.bigmodel.cn
 2. Create account or login
 3. Navigate to API Keys
 4. Generate new key
@@ -226,7 +347,7 @@ async function callZAI(systemPrompt, userPrompt, options = {}) {
 
 ### 2. Configure in Coolify
 ```bash
 # Add to service environment variables
-Z_AI_API_KEY=your_key_here
+Z_AI_API_KEY=your_key_here  # set the real key in Coolify env vars; never commit it
 ```
 
 ### 3. Test Connection
@@ -235,7 +356,7 @@ curl -X POST https://open.bigmodel.cn/api/paas/v4/chat/completions \
   -H "Authorization: Bearer $Z_AI_API_KEY" \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "glm-4-flash",
+    "model": "glm-4.6",
     "messages": [{"role": "user", "content": "Hello"}],
     "max_tokens": 50
   }'
@@ -248,6 +369,12 @@ curl -X POST https://open.bigmodel.cn/api/paas/v4/chat/completions \
 | 429 | Rate limit | Wait and retry |
 | 400 | Invalid model | Check model name |
 
+## References
+- [GLM-4.6 Announcement](https://z.ai/blog/glm-4.6)
+- [GLM-4.6V Multimodal](https://z.ai/blog/glm-4.6v)
+- [OpenRouter GLM-4.6](https://openrouter.ai/z-ai/glm-4.6)
+- [Hugging Face Models](https://huggingface.co/zai-org)
+
 ## Related Skills
 - `ai-providers/synthetic-new.md` - Primary provider
 - `code/implement.md` - Code generation
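
Reviewer note (not part of the patch): the selection logic this patch adds can be sanity-checked standalone. Below is a minimal sketch mirroring a subset of the diff's `selectZAIModel`; the model IDs and thresholds are copied from the patch as written, not verified against the live z.ai API.

```javascript
// Minimal standalone sketch of the patch's selectZAIModel logic.
// Model IDs and the 64K/128K thresholds come from the diff, not the z.ai docs.
function selectZAIModel(taskType, contextLength, needsVision = false) {
  if (needsVision) {
    // Larger vision inputs go to the full vision model, small ones to Flash
    return contextLength > 64000 ? 'glm-4.6v' : 'glm-4.6v-flash';
  }
  if (contextLength > 128000) {
    return 'glm-4.6'; // only GLM-4.6 offers ~200K context in the patch
  }
  const modelMap = {
    agentic: 'glm-4.6',
    deep_reasoning: 'glm-z1-rumination-32b-0414',
    reasoning: 'glm-4.5',
    high_volume: 'glm-4.5-air',
    classification: 'glm-4.5-flash',
    default: 'glm-4.6'
  };
  return modelMap[taskType] || modelMap.default;
}

console.log(selectZAIModel('reasoning', 4000));     // glm-4.5
console.log(selectZAIModel('summarize', 150000));   // glm-4.6 (context overflow wins)
console.log(selectZAIModel('caption', 2000, true)); // glm-4.6v-flash
```

Note the precedence the patch encodes: vision beats context length, which beats the task-type map; unknown task types fall through to the flagship default.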