
Skill: AI Provider - z.ai

Description

Fallback AI provider with GLM (General Language Model) support from Zhipu AI. Use when synthetic.new is unavailable or when GLM models are superior for specific tasks.

Status

FALLBACK - Use when:

  1. synthetic.new is rate-limited or returning errors
  2. GLM models outperform the alternatives for the task
  3. A new model is available earlier on z.ai
  4. Extended context (200K+) is needed
  5. Vision/multimodal tasks are required

Configuration

provider: z.ai (Zhipu AI / BigModel)
base_url: https://open.bigmodel.cn/api/paas/v4
api_key_env: Z_AI_API_KEY
compatibility: openai
rate_limit: 60 requests/minute

API Key configured: Z_AI_API_KEY in environment variables.
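
Because compatibility is openai, the endpoint accepts standard OpenAI-style chat completion requests. Below is a minimal connectivity sketch using the official openai Node SDK; the SDK choice is an assumption, since the skill itself does not mandate a client library:

// Connectivity sketch (npm install openai). Assumes Z_AI_API_KEY is set
// and reuses the base_url from the configuration above.
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.Z_AI_API_KEY,
  baseURL: 'https://open.bigmodel.cn/api/paas/v4'
});

const completion = await client.chat.completions.create({
  model: 'glm-4.6',
  messages: [{ role: 'user', content: 'Hello' }],
  max_tokens: 50
});

console.log(completion.choices[0].message.content);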

Available Models (Updated Dec 2025)

GLM-4.6 (Flagship - Latest)

{
  "model_id": "glm-4.6",
  "best_for": ["agentic", "reasoning", "coding", "frontend_dev"],
  "context_window": 202752,
  "max_output": 128000,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.5,
  "strengths": ["200K context", "Tool use", "Agent workflows", "15% more token efficient"],
  "pricing": { "input": "$0.40/M", "output": "$1.75/M" },
  "released": "2025-09-30"
}

Use when:

  • Complex agentic tasks
  • Advanced reasoning
  • Frontend/UI development
  • Tool-calling workflows
  • Extended context needs (200K)

GLM-4.6V (Vision - Latest)

{
  "model_id": "glm-4.6v",
  "best_for": ["image_analysis", "multimodal", "document_processing", "video_understanding"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["Native tool calling", "150 pages/1hr video input", "SOTA vision understanding"],
  "parameters": "106B",
  "released": "2025-12-08"
}

Use when:

  • Image analysis and understanding
  • Document OCR and processing
  • Video content analysis
  • Multimodal reasoning
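
For image input, the OpenAI-compatible multimodal message format (content parts with an image_url entry) should apply; this is an assumption based on the openai compatibility setting, and the image URL below is a placeholder. Reusing the client from the configuration sketch above:

// Vision request sketch; recommended_temp 0.3 from the table above.
const visionResponse = await client.chat.completions.create({
  model: 'glm-4.6v',
  temperature: 0.3,
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Describe the chart in this image.' },
      { type: 'image_url', image_url: { url: 'https://example.com/chart.png' } } // placeholder URL
    ]
  }]
});

console.log(visionResponse.choices[0].message.content);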

GLM-4.6V-Flash (Vision - Lightweight)

{
  "model_id": "glm-4.6v-flash",
  "best_for": ["fast_image_analysis", "local_deployment", "low_latency"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["9B parameters", "Fast inference", "Local deployable"],
  "parameters": "9B",
  "released": "2025-12-08"
}

Use when:

  • Quick image classification
  • Edge/local deployment
  • Low-latency vision tasks

GLM-4.5 (Previous Flagship)

{
  "model_id": "glm-4.5",
  "best_for": ["reasoning", "tool_use", "coding", "agents"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.5,
  "strengths": ["355B MoE", "32B active params", "Proven stability"],
  "parameters": "355B (32B active)"
}

Use when:

  • Need proven stable model
  • Standard reasoning tasks
  • Backward compatibility

GLM-4.5-Air (Efficient)

{
  "model_id": "glm-4.5-air",
  "best_for": ["cost_efficient", "standard_tasks", "high_volume"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.5,
  "strengths": ["106B MoE", "12B active params", "Cost effective"],
  "parameters": "106B (12B active)"
}

Use when:

  • Cost-sensitive operations
  • High-volume processing
  • Standard quality acceptable

GLM-4.5-Flash (Fast)

{
  "model_id": "glm-4.5-flash",
  "best_for": ["ultra_fast", "simple_tasks", "streaming"],
  "context_window": 32000,
  "max_output": 2048,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["Fastest inference", "Lowest cost", "Simple tasks"]
}

Use when:

  • Real-time responses needed
  • Simple classification/extraction
  • Budget constraints
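
Since streaming is a listed use case for this model, here is a token-streaming sketch using the OpenAI-compatible stream flag (assumed to be supported, given the openai compatibility setting), reusing the client from the configuration sketch:

// Streaming sketch: print tokens as they arrive.
const stream = await client.chat.completions.create({
  model: 'glm-4.5-flash',
  stream: true,
  messages: [{ role: 'user', content: 'Classify the sentiment: "great product!"' }]
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}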

GLM-Z1-Rumination-32B (Deep Reasoning)

{
  "model_id": "glm-z1-rumination-32b-0414",
  "best_for": ["deep_reasoning", "complex_analysis", "deliberation"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.7,
  "strengths": ["Rumination capability", "Step-by-step reasoning", "Complex problems"],
  "released": "2025-04-14"
}

Use when:

  • Complex multi-step reasoning
  • Problems requiring deliberation
  • Chain-of-thought tasks

Model Selection Logic

function selectZAIModel(taskType, contextLength, needsVision = false) {
  // Vision tasks: full GLM-4.6V for longer inputs, the 9B Flash variant otherwise
  if (needsVision) {
    return contextLength > 64000 ? 'glm-4.6v' : 'glm-4.6v-flash';
  }

  // Context-based selection
  if (contextLength > 128000) {
    return 'glm-4.6'; // 200K context
  }

  const modelMap = {
    // Flagship tasks
    'agentic': 'glm-4.6',
    'frontend': 'glm-4.6',
    'tool_use': 'glm-4.6',

    // Deep reasoning
    'deep_reasoning': 'glm-z1-rumination-32b-0414',
    'deliberation': 'glm-z1-rumination-32b-0414',

    // Standard reasoning
    'reasoning': 'glm-4.5',
    'analysis': 'glm-4.5',
    'planning': 'glm-4.5',
    'coding': 'glm-4.5',

    // Cost-efficient
    'cost_efficient': 'glm-4.5-air',
    'high_volume': 'glm-4.5-air',

    // Fast operations
    'classification': 'glm-4.5-flash',
    'extraction': 'glm-4.5-flash',
    'simple_qa': 'glm-4.5-flash',
    'streaming': 'glm-4.5-flash',

    // Default to flagship
    'default': 'glm-4.6'
  };

  return modelMap[taskType] || modelMap.default;
}
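
Illustrative routing decisions for the function above:

// Expected outputs, per the mappings above
selectZAIModel('coding', 8000);                 // → 'glm-4.5'
selectZAIModel('agentic', 8000);                // → 'glm-4.6'
selectZAIModel('anything', 150000);             // → 'glm-4.6' (context > 128K)
selectZAIModel('image_analysis', 10000, true);  // → 'glm-4.6v-flash'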

Fallback Logic

When to Fallback from synthetic.new

async function callWithFallback(systemPrompt, userPrompt, options = {}) {
  const primaryResult = await callSyntheticAI(systemPrompt, userPrompt, options);

  // Check for fallback conditions
  if (primaryResult.error) {
    const errorCode = primaryResult.error.code;

    // Rate limit or server error - fallback to z.ai
    if ([429, 500, 502, 503, 504].includes(errorCode)) {
      console.log('Falling back to z.ai GLM-4.6');
      return await callZAI(systemPrompt, userPrompt, options);
    }
  }

  return primaryResult;
}

GLM Superiority Conditions

function shouldPreferGLM(task) {
  const glmSuperiorTasks = [
    'chinese_translation',
    'chinese_content',
    'extended_context_200k',
    'vision_analysis',
    'multimodal',
    'frontend_development',
    'deep_rumination',
    'cost_optimization'
  ];

  return glmSuperiorTasks.includes(task.type);
}
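
Putting the two checks together, a dispatcher might route GLM-superior tasks straight to z.ai and everything else through the primary-with-fallback path. A sketch: the task fields (contextLength, needsVision) are assumptions, callSyntheticAI comes from the synthetic-new skill, and callZAI is defined below.

// Routing sketch combining shouldPreferGLM, selectZAIModel, and the fallback path.
async function routeRequest(task, systemPrompt, userPrompt, options = {}) {
  if (shouldPreferGLM(task)) {
    const model = selectZAIModel(task.type, task.contextLength ?? 0, task.needsVision ?? false);
    return await callZAI(systemPrompt, userPrompt, { ...options, model });
  }
  return await callWithFallback(systemPrompt, userPrompt, options);
}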

n8n Integration

HTTP Request Node Configuration

{
  "method": "POST",
  "url": "https://open.bigmodel.cn/api/paas/v4/chat/completions",
  "headers": {
    "Authorization": "Bearer {{ $env.Z_AI_API_KEY }}",
    "Content-Type": "application/json"
  },
  "body": {
    "model": "glm-4.6",
    "messages": [
      { "role": "system", "content": "{{ systemPrompt }}" },
      { "role": "user", "content": "{{ userPrompt }}" }
    ],
    "max_tokens": 4000,
    "temperature": 0.5
  },
  "timeout": 90000
}

Code Node Helper

// z.ai Request Helper for n8n Code Node
// Note: uses this.helpers.httpRequest, the HTTP helper built into the Code
// node ($http is not a standard n8n global). Declared as an arrow function
// so `this` stays bound to the Code node context.
const callZAI = async (systemPrompt, userPrompt, options = {}) => {
  const {
    model = 'glm-4.6',
    maxTokens = 4000,
    temperature = 0.5
  } = options;

  const response = await this.helpers.httpRequest({
    method: 'POST',
    url: 'https://open.bigmodel.cn/api/paas/v4/chat/completions',
    headers: {
      'Authorization': `Bearer ${$env.Z_AI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: {
      model,
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userPrompt }
      ],
      max_tokens: maxTokens,
      temperature
    },
    json: true // parse the JSON response body
  });

  return response.choices[0].message.content;
};
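
Example use inside a Code node set to "Run Once for Each Item" ($json is only available in per-item mode; the question field name is illustrative):

// Invoke the helper and return one item for downstream nodes.
const answer = await callZAI(
  'You are a concise assistant.',
  $json.question ?? 'Hello'
);
return { json: { answer } };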

Comparison: synthetic.new vs z.ai

Feature                  synthetic.new       z.ai
Primary Use              All tasks           Fallback + GLM tasks
Best Model (Code)        DeepSeek-V3         GLM-4.6
Best Model (Reasoning)   Kimi-K2-Thinking    GLM-Z1-Rumination
Best Model (Vision)      N/A                 GLM-4.6V
Max Context              200K                200K (GLM-4.6)
Chinese Support          Good                Excellent
Rate Limit               100/min             60/min
Cost (Input)             ~$0.50/M            $0.40/M (GLM-4.6)
Open Source              No                  Yes (MIT)

Task Complexity:

HIGH    → GLM-Z1-Rumination (deep reasoning)
        → GLM-4.6 (agentic, coding)
        → GLM-4.6V (vision tasks)

MEDIUM  → GLM-4.5 (standard tasks)
        → GLM-4.5-Air (cost-efficient)

LOW     → GLM-4.5-Flash (fast, simple)
        → GLM-4.6V-Flash (fast vision)

Setup Instructions

1. Get API Key

  1. Visit https://z.ai/dashboard or https://open.bigmodel.cn
  2. Create an account or log in
  3. Navigate to API Keys
  4. Generate a new key
  5. Store it as the Z_AI_API_KEY environment variable

2. Configure in Coolify

# Add to service environment variables (placeholder shown; never commit a real key)
Z_AI_API_KEY=<your-z-ai-api-key>

3. Test Connection

curl -X POST https://open.bigmodel.cn/api/paas/v4/chat/completions \
  -H "Authorization: Bearer $Z_AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.6",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 50
  }'
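
A successful response follows the OpenAI-compatible chat completion shape (abridged; the field values are illustrative):

{
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 8, "completion_tokens": 10, "total_tokens": 18 }
}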

Error Handling

Error Code   Cause             Action
401          Invalid API key   Check Z_AI_API_KEY
429          Rate limit        Wait and retry (see backoff sketch below)
400          Invalid model     Check model name
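
For 429 and transient 5xx errors, a simple exponential-backoff wrapper around the callZAI helper above; the retryable status list and delays are assumptions, and the status extraction depends on the HTTP client in use:

// Retry sketch: back off 1s, 2s, 4s, ... on retryable errors.
async function callZAIWithRetry(systemPrompt, userPrompt, options = {}, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await callZAI(systemPrompt, userPrompt, options);
    } catch (err) {
      // Status extraction varies by client; adjust for your error shape.
      const status = Number(err.response?.status ?? err.httpCode);
      const retryable = [429, 500, 502, 503, 504].includes(status);
      if (!retryable || attempt === maxRetries) throw err;
      await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** attempt));
    }
  }
}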

References

  • ai-providers/synthetic-new.md - Primary provider
  • code/implement.md - Code generation
  • design-thinking/ideate.md - Solution brainstorming

Token Budget

  • Max input: 500 tokens
  • Max output: 800 tokens

Model

  • Recommended: haiku (configuration lookup)