# Skill: AI Provider - z.ai

## Description

Fallback AI provider with GLM (General Language Model) support from Zhipu AI. Use when synthetic.new is unavailable or when GLM models are better suited to a specific task.

## Status

**FALLBACK** - Use when:

1. synthetic.new hits rate limits or returns errors
2. GLM models outperform alternatives for the task
3. New models become available earlier on z.ai
4. Extended context (200K+) is needed
5. Vision/multimodal tasks are required

## Configuration

```yaml
provider: z.ai (Zhipu AI / BigModel)
base_url: https://open.bigmodel.cn/api/paas/v4
api_key_env: Z_AI_API_KEY
compatibility: openai
rate_limit: 60 requests/minute
```

**API key:** set `Z_AI_API_KEY` in the environment variables.

## Available Models (Updated Dec 2025)

### GLM-4.6 (Flagship - Latest)

```json
{
  "model_id": "glm-4.6",
  "best_for": ["agentic", "reasoning", "coding", "frontend_dev"],
  "context_window": 202752,
  "max_output": 128000,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.5,
  "strengths": ["200K context", "Tool use", "Agent workflows", "15% more token efficient"],
  "pricing": { "input": "$0.40/M", "output": "$1.75/M" },
  "released": "2025-09-30"
}
```

**Use when:**

- Complex agentic tasks
- Advanced reasoning
- Frontend/UI development
- Tool-calling workflows
- Extended context needs (200K)

### GLM-4.6V (Vision - Latest)

```json
{
  "model_id": "glm-4.6v",
  "best_for": ["image_analysis", "multimodal", "document_processing", "video_understanding"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["Native tool calling", "150 pages / 1 hr video input", "SOTA vision understanding"],
  "parameters": "106B",
  "released": "2025-12-08"
}
```

**Use when:**

- Image analysis and understanding
- Document OCR and processing
- Video content analysis
- Multimodal reasoning

### GLM-4.6V-Flash (Vision - Lightweight)

```json
{
  "model_id": "glm-4.6v-flash",
  "best_for": ["fast_image_analysis", "local_deployment", "low_latency"],
"context_window": 128000, "max_output": 4096, "temperature_range": [0.0, 1.0], "recommended_temp": 0.3, "strengths": ["9B parameters", "Fast inference", "Local deployable"], "parameters": "9B", "released": "2025-12-08" } ``` **Use when:** - Quick image classification - Edge/local deployment - Low-latency vision tasks ### GLM-4.5 (Previous Flagship) ```json { "model_id": "glm-4.5", "best_for": ["reasoning", "tool_use", "coding", "agents"], "context_window": 128000, "max_output": 4096, "temperature_range": [0.0, 1.0], "recommended_temp": 0.5, "strengths": ["355B MoE", "32B active params", "Proven stability"], "parameters": "355B (32B active)" } ``` **Use when:** - Need proven stable model - Standard reasoning tasks - Backward compatibility ### GLM-4.5-Air (Efficient) ```json { "model_id": "glm-4.5-air", "best_for": ["cost_efficient", "standard_tasks", "high_volume"], "context_window": 128000, "max_output": 4096, "temperature_range": [0.0, 1.0], "recommended_temp": 0.5, "strengths": ["106B MoE", "12B active params", "Cost effective"], "parameters": "106B (12B active)" } ``` **Use when:** - Cost-sensitive operations - High-volume processing - Standard quality acceptable ### GLM-4.5-Flash (Fast) ```json { "model_id": "glm-4.5-flash", "best_for": ["ultra_fast", "simple_tasks", "streaming"], "context_window": 32000, "max_output": 2048, "temperature_range": [0.0, 1.0], "recommended_temp": 0.3, "strengths": ["Fastest inference", "Lowest cost", "Simple tasks"] } ``` **Use when:** - Real-time responses needed - Simple classification/extraction - Budget constraints ### GLM-Z1-Rumination-32B (Deep Reasoning) ```json { "model_id": "glm-z1-rumination-32b-0414", "best_for": ["deep_reasoning", "complex_analysis", "deliberation"], "context_window": 128000, "max_output": 4096, "temperature_range": [0.0, 1.0], "recommended_temp": 0.7, "strengths": ["Rumination capability", "Step-by-step reasoning", "Complex problems"], "released": "2025-04-14" } ``` **Use when:** - Complex multi-step 
  reasoning
- Problems requiring deliberation
- Chain-of-thought tasks

## Model Selection Logic

```javascript
function selectZAIModel(taskType, contextLength, needsVision = false) {
  // Vision tasks: full model for long inputs, Flash variant for speed
  if (needsVision) {
    return contextLength > 64000 ? 'glm-4.6v' : 'glm-4.6v-flash';
  }

  // Only GLM-4.6 supports ~200K context
  if (contextLength > 128000) {
    return 'glm-4.6';
  }

  const modelMap = {
    // Flagship tasks
    agentic: 'glm-4.6',
    frontend: 'glm-4.6',
    tool_use: 'glm-4.6',
    // Deep reasoning
    deep_reasoning: 'glm-z1-rumination-32b-0414',
    deliberation: 'glm-z1-rumination-32b-0414',
    // Standard reasoning
    reasoning: 'glm-4.5',
    analysis: 'glm-4.5',
    planning: 'glm-4.5',
    coding: 'glm-4.5',
    // Cost-efficient
    cost_efficient: 'glm-4.5-air',
    high_volume: 'glm-4.5-air',
    // Fast operations
    classification: 'glm-4.5-flash',
    extraction: 'glm-4.5-flash',
    simple_qa: 'glm-4.5-flash',
    streaming: 'glm-4.5-flash',
    // Default to flagship
    default: 'glm-4.6'
  };

  return modelMap[taskType] || modelMap.default;
}
```

## Fallback Logic

### When to Fall Back from synthetic.new

```javascript
async function callWithFallback(systemPrompt, userPrompt, options = {}) {
  const primaryResult = await callSyntheticAI(systemPrompt, userPrompt, options);

  // Rate limit or server error - fall back to z.ai
  if (primaryResult.error) {
    const errorCode = primaryResult.error.code;
    if ([429, 500, 502, 503, 504].includes(errorCode)) {
      console.log('Falling back to z.ai GLM-4.6');
      return await callZAI(systemPrompt, userPrompt, options);
    }
  }

  return primaryResult;
}
```

### GLM Superiority Conditions

```javascript
function shouldPreferGLM(task) {
  const glmSuperiorTasks = [
    'chinese_translation',
    'chinese_content',
    'extended_context_200k',
    'vision_analysis',
    'multimodal',
    'frontend_development',
    'deep_rumination',
    'cost_optimization'
  ];
  return glmSuperiorTasks.includes(task.type);
}
```

## n8n Integration

### HTTP Request Node Configuration

```json
{
  "method": "POST",
"url": "https://open.bigmodel.cn/api/paas/v4/chat/completions", "headers": { "Authorization": "Bearer {{ $env.Z_AI_API_KEY }}", "Content-Type": "application/json" }, "body": { "model": "glm-4.6", "messages": [ { "role": "system", "content": "{{ systemPrompt }}" }, { "role": "user", "content": "{{ userPrompt }}" } ], "max_tokens": 4000, "temperature": 0.5 }, "timeout": 90000 } ``` ### Code Node Helper ```javascript // z.ai Request Helper for n8n Code Node async function callZAI(systemPrompt, userPrompt, options = {}) { const { model = 'glm-4.6', maxTokens = 4000, temperature = 0.5 } = options; const response = await $http.request({ method: 'POST', url: 'https://open.bigmodel.cn/api/paas/v4/chat/completions', headers: { 'Authorization': `Bearer ${$env.Z_AI_API_KEY}`, 'Content-Type': 'application/json' }, body: { model, messages: [ { role: 'system', content: systemPrompt }, { role: 'user', content: userPrompt } ], max_tokens: maxTokens, temperature } }); return response.choices[0].message.content; } ``` ## Comparison: synthetic.new vs z.ai | Feature | synthetic.new | z.ai | |---------|---------------|------| | Primary Use | All tasks | Fallback + GLM tasks | | Best Model (Code) | DeepSeek-V3 | GLM-4.6 | | Best Model (Reasoning) | Kimi-K2-Thinking | GLM-Z1-Rumination | | Best Model (Vision) | N/A | GLM-4.6V | | Max Context | 200K | 200K (GLM-4.6) | | Chinese Support | Good | Excellent | | Rate Limit | 100/min | 60/min | | Cost (Input) | ~$0.50/M | $0.40/M (GLM-4.6) | | Open Source | No | Yes (MIT) | ## Model Hierarchy (Recommended) ``` Task Complexity: HIGH → GLM-Z1-Rumination (deep reasoning) → GLM-4.6 (agentic, coding) → GLM-4.6V (vision tasks) MEDIUM → GLM-4.5 (standard tasks) → GLM-4.5-Air (cost-efficient) LOW → GLM-4.5-Flash (fast, simple) → GLM-4.6V-Flash (fast vision) ``` ## Setup Instructions ### 1. Get API Key 1. Visit https://z.ai/dashboard or https://open.bigmodel.cn 2. Create account or login 3. Navigate to API Keys 4. Generate new key 5. 
   Store as the `Z_AI_API_KEY` environment variable

### 2. Configure in Coolify

```bash
# Add to service environment variables.
# Never commit a real key to documentation or version control.
Z_AI_API_KEY=<your-z.ai-api-key>
```

### 3. Test the Connection

```bash
curl -X POST https://open.bigmodel.cn/api/paas/v4/chat/completions \
  -H "Authorization: Bearer $Z_AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.6",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 50
  }'
```

## Error Handling

| Error Code | Cause | Action |
|------------|-------|--------|
| 401 | Invalid API key | Check `Z_AI_API_KEY` |
| 429 | Rate limit | Wait and retry |
| 400 | Invalid model | Check the model name |

## References

- [GLM-4.6 Announcement](https://z.ai/blog/glm-4.6)
- [GLM-4.6V Multimodal](https://z.ai/blog/glm-4.6v)
- [OpenRouter GLM-4.6](https://openrouter.ai/z-ai/glm-4.6)
- [Hugging Face Models](https://huggingface.co/zai-org)

## Related Skills

- `ai-providers/synthetic-new.md` - Primary provider
- `code/implement.md` - Code generation
- `design-thinking/ideate.md` - Solution brainstorming

## Token Budget

- Max input: 500 tokens
- Max output: 800 tokens

## Model

- Recommended: haiku (configuration lookup)
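## Appendix: Retry Helper (Sketch)

The 429 and 5xx rows in the error-handling table map naturally onto a retry-with-exponential-backoff wrapper. A minimal sketch in plain Node.js (18+, for the global `fetch`); `withRetry` and `callZAIWithRetry` are illustrative names, not part of any z.ai SDK:

```javascript
// Status codes worth retrying: rate limit and transient server errors.
const RETRYABLE = new Set([429, 500, 502, 503, 504]);

// Retry `fn` (which must resolve to an object with a numeric `status`)
// up to `retries` extra times, doubling the delay between attempts.
async function withRetry(fn, { retries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const result = await fn();
    if (!RETRYABLE.has(result.status) || attempt >= retries) return result;
    // Exponential backoff: 1s, 2s, 4s, ...
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
}

// Usage against the chat/completions endpoint (requires Z_AI_API_KEY):
async function callZAIWithRetry(messages, model = 'glm-4.6') {
  return withRetry(async () => {
    const res = await fetch('https://open.bigmodel.cn/api/paas/v4/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.Z_AI_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ model, messages, max_tokens: 4000 })
    });
    return { status: res.status, body: res.ok ? await res.json() : null };
  });
}
```

The wrapper keeps retry policy separate from the request itself, so the same helper can front both the z.ai and synthetic.new calls before the cross-provider fallback in `callWithFallback` kicks in.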