Compare commits

...

2 Commits

Author SHA1 Message Date
4c6ec6f10d Update z.ai skill with latest GLM models (Dec 2025)
- Add GLM-4.6 (flagship, 200K context, agentic)
- Add GLM-4.6V and GLM-4.6V-Flash (vision models)
- Add GLM-4.5, GLM-4.5-Air, GLM-4.5-Flash
- Add GLM-Z1-Rumination-32B (deep reasoning)
- Update model selection logic and pricing
- Add references to official documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 19:45:30 +01:00
d04a7439a8 Update z.ai skill with correct BigModel API URL
- Update base_url to https://open.bigmodel.cn/api/paas/v4
- Mark API key as configured

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 19:29:28 +01:00

View File

@@ -1,108 +1,208 @@
# Skill: AI Provider - z.ai
## Description
Fallback AI provider with GLM (General Language Model) support from Zhipu AI. Use when synthetic.new is unavailable or when GLM models are superior for specific tasks.
## Status
**FALLBACK** - Use when:
1. synthetic.new rate limits or errors
2. GLM models outperform alternatives for the task
3. New models available earlier on z.ai
4. Extended context (200K+) needed
5. Vision/multimodal tasks required
## Configuration
```yaml
provider: z.ai (Zhipu AI / BigModel)
base_url: https://open.bigmodel.cn/api/paas/v4
api_key_env: Z_AI_API_KEY
compatibility: openai
rate_limit: 60 requests/minute
```
**API Key configured:** `Z_AI_API_KEY` in environment variables.
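Because the endpoint is OpenAI-compatible, a request needs only the base URL plus a bearer header. A minimal sketch of assembling one (the `buildZAIRequest` helper is illustrative, not part of any SDK):

```javascript
// Build an OpenAI-compatible chat request for the BigModel endpoint.
// Only the URL, header, and body shape come from the configuration above.
const Z_AI_BASE_URL = 'https://open.bigmodel.cn/api/paas/v4';

function buildZAIRequest(apiKey, model, messages, maxTokens = 1024) {
  return {
    method: 'POST',
    url: `${Z_AI_BASE_URL}/chat/completions`,
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ model, messages, max_tokens: maxTokens })
  };
}
```

The returned object maps directly onto n8n's `$http.request` options.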
## Available Models (Updated Dec 2025)
### GLM-4.6 (Flagship - Latest)
```json
{
"model_id": "glm-4-plus",
"best_for": ["reasoning", "analysis", "chinese_language", "long_context"],
"model_id": "glm-4.6",
"best_for": ["agentic", "reasoning", "coding", "frontend_dev"],
"context_window": 202752,
"max_output": 128000,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.5,
"strengths": ["200K context", "Tool use", "Agent workflows", "15% more token efficient"],
"pricing": { "input": "$0.40/M", "output": "$1.75/M" },
"released": "2025-09-30"
}
```
**Use when:**
- Complex agentic tasks
- Advanced reasoning
- Frontend/UI development
- Tool-calling workflows
- Extended context needs (200K)
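At the listed rates, per-call cost is a straight per-token calculation. A sketch (rates hardcoded from the pricing block above; verify against current z.ai pricing before relying on them):

```javascript
// Estimate a GLM-4.6 call cost in USD from token counts.
// Rates are copied from the pricing block above; confirm before use.
const GLM46_RATES = { inputPerM: 0.40, outputPerM: 1.75 };

function estimateCostUSD(inputTokens, outputTokens, rates = GLM46_RATES) {
  return (inputTokens / 1e6) * rates.inputPerM +
         (outputTokens / 1e6) * rates.outputPerM;
}
```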
### GLM-4.6V (Vision - Latest)
```json
{
"model_id": "glm-4.6v",
"best_for": ["image_analysis", "multimodal", "document_processing", "video_understanding"],
"context_window": 128000,
"max_output": 4096,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.3,
"strengths": ["Native tool calling", "150 pages/1hr video input", "SOTA vision understanding"],
"parameters": "106B",
"released": "2025-12-08"
}
```
**Use when:**
- Image analysis and understanding
- Document OCR and processing
- Video content analysis
- Multimodal reasoning
### GLM-4.6V-Flash (Vision - Lightweight)
```json
{
"model_id": "glm-4.6v-flash",
"best_for": ["fast_image_analysis", "local_deployment", "low_latency"],
"context_window": 128000,
"max_output": 4096,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.3,
"strengths": ["9B parameters", "Fast inference", "Local deployable"],
"parameters": "9B",
"released": "2025-12-08"
}
```
**Use when:**
- Quick image classification
- Edge/local deployment
- Low-latency vision tasks
### GLM-4.5 (Previous Flagship)
```json
{
"model_id": "glm-4.5",
"best_for": ["reasoning", "tool_use", "coding", "agents"],
"context_window": 128000,
"max_output": 4096,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.5,
"strengths": ["Chinese content", "Logical reasoning", "Long documents"]
"strengths": ["355B MoE", "32B active params", "Proven stability"],
"parameters": "355B (32B active)"
}
```
**Use when:**
- Need proven stable model
- Standard reasoning tasks
- Backward compatibility
### GLM-4.5-Air (Efficient)
```json
{
"model_id": "glm-4-flash",
"best_for": ["quick_responses", "simple_tasks", "high_volume"],
"model_id": "glm-4.5-air",
"best_for": ["cost_efficient", "standard_tasks", "high_volume"],
"context_window": 128000,
"max_output": 4096,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.5,
"strengths": ["106B MoE", "12B active params", "Cost effective"],
"parameters": "106B (12B active)"
}
```
**Use when:**
- Cost-sensitive operations
- High-volume processing
- Standard quality acceptable
### GLM-4.5-Flash (Fast)
```json
{
"model_id": "glm-4.5-flash",
"best_for": ["ultra_fast", "simple_tasks", "streaming"],
"context_window": 32000,
"max_output": 2048,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.3,
"strengths": ["Speed", "Cost efficiency", "Simple tasks"]
"strengths": ["Fastest inference", "Lowest cost", "Simple tasks"]
}
```
**Use when:**
- Real-time responses needed
- Simple classification/extraction
- Budget constraints
### GLM-Z1-Rumination-32B (Deep Reasoning)
```json
{
"model_id": "glm-4-long",
"best_for": ["long_documents", "codebase_analysis", "summarization"],
"context_window": 1000000,
"model_id": "glm-z1-rumination-32b-0414",
"best_for": ["deep_reasoning", "complex_analysis", "deliberation"],
"context_window": 128000,
"max_output": 4096,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.3,
"strengths": ["1M token context", "Document processing", "Code analysis"]
"recommended_temp": 0.7,
"strengths": ["Rumination capability", "Step-by-step reasoning", "Complex problems"],
"released": "2025-04-14"
}
```
**Use when:**
- Complex multi-step reasoning
- Problems requiring deliberation
- Chain-of-thought tasks
## Model Selection Logic
```javascript
function selectZAIModel(taskType, contextLength, needsVision = false) {
// Vision tasks
if (needsVision) {
return contextLength > 64000 ? 'glm-4.6v' : 'glm-4.6v-flash';
}
// Context-based selection
if (contextLength > 128000) {
return 'glm-4.6'; // 200K context
}
const modelMap = {
// Flagship tasks
'agentic': 'glm-4.6',
'frontend': 'glm-4.6',
'tool_use': 'glm-4.6',
// Deep reasoning
'deep_reasoning': 'glm-z1-rumination-32b-0414',
'deliberation': 'glm-z1-rumination-32b-0414',
// Standard reasoning
'reasoning': 'glm-4.5',
'analysis': 'glm-4.5',
'planning': 'glm-4.5',
'coding': 'glm-4.5',
// Cost-efficient
'cost_efficient': 'glm-4.5-air',
'high_volume': 'glm-4.5-air',
// Fast operations
'classification': 'glm-4.5-flash',
'extraction': 'glm-4.5-flash',
'simple_qa': 'glm-4.5-flash',
'streaming': 'glm-4.5-flash',
// Default to flagship
'default': 'glm-4.6'
};
return modelMap[taskType] || modelMap.default;
@@ -122,7 +222,7 @@ async function callWithFallback(systemPrompt, userPrompt, options = {}) {
// Rate limit or server error - fallback to z.ai
if ([429, 500, 502, 503, 504].includes(errorCode)) {
console.log('Falling back to z.ai GLM-4.6');
return await callZAI(systemPrompt, userPrompt, options);
}
}
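The same fallback rule can be tested outside n8n by injecting both providers as plain functions; a sketch (the provider callables are stand-ins, not real API clients):

```javascript
// Status codes that justify switching provider, per the rule above.
const FALLBACK_CODES = new Set([429, 500, 502, 503, 504]);

// callPrimary / callFallback are injected stand-ins for synthetic.new / z.ai.
async function callWithFallback(callPrimary, callFallback, prompt) {
  try {
    return await callPrimary(prompt);
  } catch (err) {
    if (FALLBACK_CODES.has(err.statusCode)) {
      return await callFallback(prompt); // rate limit or server error: use z.ai
    }
    throw err; // e.g. 400: switching provider cannot fix a bad request
  }
}
```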
@@ -137,9 +237,12 @@ function shouldPreferGLM(task) {
const glmSuperiorTasks = [
'chinese_translation',
'chinese_content',
'extended_context_200k',
'vision_analysis',
'multimodal',
'frontend_development',
'deep_rumination',
'cost_optimization'
];
return glmSuperiorTasks.includes(task.type);
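Combined with a provider router, the check reads as follows; a sketch (`routeProvider` is a hypothetical wrapper, and the task list restates the one above):

```javascript
// Task types where GLM models are preferred, restated from the list above.
const glmSuperiorTasks = [
  'chinese_translation', 'chinese_content', 'extended_context_200k',
  'vision_analysis', 'multimodal', 'frontend_development',
  'deep_rumination', 'cost_optimization'
];

function shouldPreferGLM(task) {
  return glmSuperiorTasks.includes(task.type);
}

// Hypothetical wrapper: pick the provider before any API call is made.
function routeProvider(task) {
  return shouldPreferGLM(task) ? 'z.ai' : 'synthetic.new';
}
```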
@@ -152,13 +255,13 @@ function shouldPreferGLM(task) {
```json
{
"method": "POST",
"url": "https://api.z.ai/v1/chat/completions",
"url": "https://open.bigmodel.cn/api/paas/v4/chat/completions",
"headers": {
"Authorization": "Bearer {{ $env.Z_AI_API_KEY }}",
"Content-Type": "application/json"
},
"body": {
"model": "glm-4-plus",
"model": "glm-4.6",
"messages": [
{ "role": "system", "content": "{{ systemPrompt }}" },
{ "role": "user", "content": "{{ userPrompt }}" }
@@ -175,14 +278,14 @@ function shouldPreferGLM(task) {
// z.ai Request Helper for n8n Code Node
async function callZAI(systemPrompt, userPrompt, options = {}) {
const {
model = 'glm-4.6',
maxTokens = 4000,
temperature = 0.5
} = options;
const response = await $http.request({
method: 'POST',
url: 'https://open.bigmodel.cn/api/paas/v4/chat/completions',
headers: {
'Authorization': `Bearer ${$env.Z_AI_API_KEY}`,
'Content-Type': 'application/json'
@@ -207,17 +310,35 @@ async function callZAI(systemPrompt, userPrompt, options = {}) {
| Feature | synthetic.new | z.ai |
|---------|---------------|------|
| Primary Use | All tasks | Fallback + GLM tasks |
| Best Model (Code) | DeepSeek-V3 | GLM-4.6 |
| Best Model (Reasoning) | Kimi-K2-Thinking | GLM-Z1-Rumination |
| Best Model (Vision) | N/A | GLM-4.6V |
| Max Context | 200K | 200K (GLM-4.6) |
| Chinese Support | Good | Excellent |
| Rate Limit | 100/min | 60/min |
| Cost (Input) | ~$0.50/M | $0.40/M (GLM-4.6) |
| Open Source | No | Yes (MIT) |
## Model Hierarchy (Recommended)
```
Task Complexity:
HIGH → GLM-Z1-Rumination (deep reasoning)
→ GLM-4.6 (agentic, coding)
→ GLM-4.6V (vision tasks)
MEDIUM → GLM-4.5 (standard tasks)
→ GLM-4.5-Air (cost-efficient)
LOW → GLM-4.5-Flash (fast, simple)
→ GLM-4.6V-Flash (fast vision)
```
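The hierarchy can be encoded as a lookup so callers only state complexity and modality. A sketch (tier keys are illustrative; deep-reasoning tasks may still warrant GLM-Z1-Rumination over GLM-4.6):

```javascript
// Map task complexity and modality to the recommended tier above.
// Tier names ('high'/'medium'/'low') are illustrative keys.
function recommendModel(complexity, vision = false) {
  const tiers = {
    high:   vision ? 'glm-4.6v'       : 'glm-4.6',
    medium: vision ? 'glm-4.6v'       : 'glm-4.5',
    low:    vision ? 'glm-4.6v-flash' : 'glm-4.5-flash'
  };
  return tiers[complexity] || 'glm-4.6'; // default to the flagship
}
```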
## Setup Instructions
### 1. Get API Key
1. Visit https://z.ai/dashboard or https://open.bigmodel.cn
2. Create account or login
3. Navigate to API Keys
4. Generate new key
@@ -226,16 +347,16 @@ async function callZAI(systemPrompt, userPrompt, options = {}) {
### 2. Configure in Coolify
```bash
# Add to service environment variables
Z_AI_API_KEY=your_key_here
```
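A quick startup check catches a missing key before the first request; a sketch (`env` is injected so the check is testable — in an n8n Code node this would be `$env`):

```javascript
// Fail fast if the key is absent from the service environment.
function assertApiKeyConfigured(env) {
  if (!env.Z_AI_API_KEY || env.Z_AI_API_KEY.trim() === '') {
    throw new Error('Z_AI_API_KEY is not set; add it to the Coolify environment variables');
  }
  return true;
}
```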
### 3. Test Connection
```bash
curl -X POST https://open.bigmodel.cn/api/paas/v4/chat/completions \
-H "Authorization: Bearer $Z_AI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "glm-4-flash",
"model": "glm-4.6",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 50
}'
@@ -248,6 +369,12 @@ curl -X POST https://api.z.ai/v1/chat/completions \
| 429 | Rate limit | Wait and retry |
| 400 | Invalid model | Check model name |
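The table suggests a simple policy: back off and retry on 429 and server errors, fail fast on 4xx configuration errors. A sketch (retry cap and backoff constants are illustrative, not from the z.ai docs):

```javascript
// Decide how to react to an error status, per the table above.
// Backoff base (1s), cap (30s), and max attempts (5) are illustrative.
function retryPolicy(statusCode, attempt) {
  if (statusCode === 429 || statusCode >= 500) {
    return { retry: attempt < 5, delayMs: Math.min(1000 * 2 ** attempt, 30000) };
  }
  return { retry: false, delayMs: 0 }; // 4xx: fix the request, do not retry
}
```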
## References
- [GLM-4.6 Announcement](https://z.ai/blog/glm-4.6)
- [GLM-4.6V Multimodal](https://z.ai/blog/glm-4.6v)
- [OpenRouter GLM-4.6](https://openrouter.ai/z-ai/glm-4.6)
- [Hugging Face Models](https://huggingface.co/zai-org)
## Related Skills
- `ai-providers/synthetic-new.md` - Primary provider
- `code/implement.md` - Code generation