
Skill: AI Provider - z.ai

Description

Fallback AI provider with GLM (General Language Model) support from Zhipu AI. Use when synthetic.new is unavailable or when GLM models are superior for specific tasks.

Status

FALLBACK - Use when:

  1. synthetic.new is rate-limited or returning errors
  2. GLM models outperform the alternatives for the task
  3. A new model is available earlier on z.ai
  4. Extended context (200K+) is needed
  5. Vision/multimodal tasks are required

Configuration

provider: z.ai (Zhipu AI / BigModel)
base_url: https://open.bigmodel.cn/api/paas/v4
api_key_env: Z_AI_API_KEY
compatibility: openai
rate_limit: 60 requests/minute

API Key configured: Z_AI_API_KEY in environment variables.
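
Because compatibility is openai, the endpoint accepts standard OpenAI-style chat completion requests. Below is a minimal connectivity sketch using the official openai Node SDK; the SDK choice is an assumption, since the skill itself does not mandate a client library:

// Connectivity sketch (npm install openai). Assumes Z_AI_API_KEY is set
// and reuses the base_url from the configuration above.
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.Z_AI_API_KEY,
  baseURL: 'https://open.bigmodel.cn/api/paas/v4'
});

const completion = await client.chat.completions.create({
  model: 'glm-4.6',
  messages: [{ role: 'user', content: 'Hello' }],
  max_tokens: 50
});

console.log(completion.choices[0].message.content);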

Available Models (Updated Dec 2025)

GLM-4.6 (Flagship - Latest)

{
  "model_id": "glm-4.6",
  "best_for": ["agentic", "reasoning", "coding", "frontend_dev"],
  "context_window": 202752,
  "max_output": 128000,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.5,
  "strengths": ["200K context", "Tool use", "Agent workflows", "15% more token efficient"],
  "pricing": { "input": "$0.40/M", "output": "$1.75/M" },
  "released": "2025-09-30"
}

Use when:

  • Complex agentic tasks
  • Advanced reasoning
  • Frontend/UI development
  • Tool-calling workflows
  • Extended context needs (200K)

GLM-4.6V (Vision - Latest)

{
  "model_id": "glm-4.6v",
  "best_for": ["image_analysis", "multimodal", "document_processing", "video_understanding"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["Native tool calling", "150 pages/1hr video input", "SOTA vision understanding"],
  "parameters": "106B",
  "released": "2025-12-08"
}

Use when:

  • Image analysis and understanding
  • Document OCR and processing
  • Video content analysis
  • Multimodal reasoning
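
For image input, the OpenAI-compatible multimodal message format (content parts with an image_url entry) should apply; this is an assumption based on the openai compatibility setting, and the image URL below is a placeholder. Reusing the client from the configuration sketch above:

// Vision request sketch; recommended_temp 0.3 from the table above.
const visionResponse = await client.chat.completions.create({
  model: 'glm-4.6v',
  temperature: 0.3,
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Describe the chart in this image.' },
      { type: 'image_url', image_url: { url: 'https://example.com/chart.png' } } // placeholder URL
    ]
  }]
});

console.log(visionResponse.choices[0].message.content);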

GLM-4.6V-Flash (Vision - Lightweight)

{
  "model_id": "glm-4.6v-flash",
  "best_for": ["fast_image_analysis", "local_deployment", "low_latency"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["9B parameters", "Fast inference", "Local deployable"],
  "parameters": "9B",
  "released": "2025-12-08"
}

Use when:

  • Quick image classification
  • Edge/local deployment
  • Low-latency vision tasks

GLM-4.5 (Previous Flagship)

{
  "model_id": "glm-4.5",
  "best_for": ["reasoning", "tool_use", "coding", "agents"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.5,
  "strengths": ["355B MoE", "32B active params", "Proven stability"],
  "parameters": "355B (32B active)"
}

Use when:

  • Need proven stable model
  • Standard reasoning tasks
  • Backward compatibility

GLM-4.5-Air (Efficient)

{
  "model_id": "glm-4.5-air",
  "best_for": ["cost_efficient", "standard_tasks", "high_volume"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.5,
  "strengths": ["106B MoE", "12B active params", "Cost effective"],
  "parameters": "106B (12B active)"
}

Use when:

  • Cost-sensitive operations
  • High-volume processing
  • Standard quality acceptable

GLM-4.5-Flash (Fast)

{
  "model_id": "glm-4.5-flash",
  "best_for": ["ultra_fast", "simple_tasks", "streaming"],
  "context_window": 32000,
  "max_output": 2048,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["Fastest inference", "Lowest cost", "Simple tasks"]
}

Use when:

  • Real-time responses needed
  • Simple classification/extraction
  • Budget constraints
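
Since streaming is a listed use case for this model, here is a token-streaming sketch using the OpenAI-compatible stream flag (assumed to be supported, given the openai compatibility setting), reusing the client from the configuration sketch:

// Streaming sketch: print tokens as they arrive.
const stream = await client.chat.completions.create({
  model: 'glm-4.5-flash',
  stream: true,
  messages: [{ role: 'user', content: 'Classify the sentiment: "great product!"' }]
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}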

GLM-Z1-Rumination-32B (Deep Reasoning)

{
  "model_id": "glm-z1-rumination-32b-0414",
  "best_for": ["deep_reasoning", "complex_analysis", "deliberation"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.7,
  "strengths": ["Rumination capability", "Step-by-step reasoning", "Complex problems"],
  "released": "2025-04-14"
}

Use when:

  • Complex multi-step reasoning
  • Problems requiring deliberation
  • Chain-of-thought tasks

Model Selection Logic

function selectZAIModel(taskType, contextLength, needsVision = false) {
  // Vision tasks: full GLM-4.6V for longer inputs, the 9B Flash variant otherwise
  if (needsVision) {
    return contextLength > 64000 ? 'glm-4.6v' : 'glm-4.6v-flash';
  }

  // Context-based selection
  if (contextLength > 128000) {
    return 'glm-4.6'; // 200K context
  }

  const modelMap = {
    // Flagship tasks
    'agentic': 'glm-4.6',
    'frontend': 'glm-4.6',
    'tool_use': 'glm-4.6',

    // Deep reasoning
    'deep_reasoning': 'glm-z1-rumination-32b-0414',
    'deliberation': 'glm-z1-rumination-32b-0414',

    // Standard reasoning
    'reasoning': 'glm-4.5',
    'analysis': 'glm-4.5',
    'planning': 'glm-4.5',
    'coding': 'glm-4.5',

    // Cost-efficient
    'cost_efficient': 'glm-4.5-air',
    'high_volume': 'glm-4.5-air',

    // Fast operations
    'classification': 'glm-4.5-flash',
    'extraction': 'glm-4.5-flash',
    'simple_qa': 'glm-4.5-flash',
    'streaming': 'glm-4.5-flash',

    // Default to flagship
    'default': 'glm-4.6'
  };

  return modelMap[taskType] || modelMap.default;
}
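
Illustrative routing decisions for the function above:

// Expected outputs, per the mappings above
selectZAIModel('coding', 8000);                 // → 'glm-4.5'
selectZAIModel('agentic', 8000);                // → 'glm-4.6'
selectZAIModel('anything', 150000);             // → 'glm-4.6' (context > 128K)
selectZAIModel('image_analysis', 10000, true);  // → 'glm-4.6v-flash'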

Fallback Logic

When to Fallback from synthetic.new

async function callWithFallback(systemPrompt, userPrompt, options = {}) {
  const primaryResult = await callSyntheticAI(systemPrompt, userPrompt, options);

  // Check for fallback conditions
  if (primaryResult.error) {
    const errorCode = primaryResult.error.code;

    // Rate limit or server error - fallback to z.ai
    if ([429, 500, 502, 503, 504].includes(errorCode)) {
      console.log('Falling back to z.ai GLM-4.6');
      return await callZAI(systemPrompt, userPrompt, options);
    }
  }

  return primaryResult;
}

GLM Superiority Conditions

function shouldPreferGLM(task) {
  const glmSuperiorTasks = [
    'chinese_translation',
    'chinese_content',
    'extended_context_200k',
    'vision_analysis',
    'multimodal',
    'frontend_development',
    'deep_rumination',
    'cost_optimization'
  ];

  return glmSuperiorTasks.includes(task.type);
}
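
Putting the two checks together, a dispatcher might route GLM-superior tasks straight to z.ai and everything else through the primary-with-fallback path. A sketch: the task fields (contextLength, needsVision) are assumptions, callSyntheticAI comes from the synthetic-new skill, and callZAI is defined below.

// Routing sketch combining shouldPreferGLM, selectZAIModel, and the fallback path.
async function routeRequest(task, systemPrompt, userPrompt, options = {}) {
  if (shouldPreferGLM(task)) {
    const model = selectZAIModel(task.type, task.contextLength ?? 0, task.needsVision ?? false);
    return await callZAI(systemPrompt, userPrompt, { ...options, model });
  }
  return await callWithFallback(systemPrompt, userPrompt, options);
}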

n8n Integration

HTTP Request Node Configuration

{
  "method": "POST",
  "url": "https://open.bigmodel.cn/api/paas/v4/chat/completions",
  "headers": {
    "Authorization": "Bearer {{ $env.Z_AI_API_KEY }}",
    "Content-Type": "application/json"
  },
  "body": {
    "model": "glm-4.6",
    "messages": [
      { "role": "system", "content": "{{ systemPrompt }}" },
      { "role": "user", "content": "{{ userPrompt }}" }
    ],
    "max_tokens": 4000,
    "temperature": 0.5
  },
  "timeout": 90000
}

Code Node Helper

// z.ai Request Helper for n8n Code Node
// Note: uses this.helpers.httpRequest, the HTTP helper built into the Code
// node ($http is not a standard n8n global). Declared as an arrow function
// so `this` stays bound to the Code node context.
const callZAI = async (systemPrompt, userPrompt, options = {}) => {
  const {
    model = 'glm-4.6',
    maxTokens = 4000,
    temperature = 0.5
  } = options;

  const response = await this.helpers.httpRequest({
    method: 'POST',
    url: 'https://open.bigmodel.cn/api/paas/v4/chat/completions',
    headers: {
      'Authorization': `Bearer ${$env.Z_AI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: {
      model,
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userPrompt }
      ],
      max_tokens: maxTokens,
      temperature
    },
    json: true // parse the JSON response body
  });

  return response.choices[0].message.content;
};
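
Example use inside a Code node set to "Run Once for Each Item" ($json is only available in per-item mode; the question field name is illustrative):

// Invoke the helper and return one item for downstream nodes.
const answer = await callZAI(
  'You are a concise assistant.',
  $json.question ?? 'Hello'
);
return { json: { answer } };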

Comparison: synthetic.new vs z.ai

Feature                  synthetic.new       z.ai
Primary Use              All tasks           Fallback + GLM tasks
Best Model (Code)        DeepSeek-V3         GLM-4.6
Best Model (Reasoning)   Kimi-K2-Thinking    GLM-Z1-Rumination
Best Model (Vision)      N/A                 GLM-4.6V
Max Context              200K                200K (GLM-4.6)
Chinese Support          Good                Excellent
Rate Limit               100/min             60/min
Cost (Input)             ~$0.50/M            $0.40/M (GLM-4.6)
Open Source              No                  Yes (MIT)

Task Complexity:

HIGH    → GLM-Z1-Rumination (deep reasoning)
        → GLM-4.6 (agentic, coding)
        → GLM-4.6V (vision tasks)

MEDIUM  → GLM-4.5 (standard tasks)
        → GLM-4.5-Air (cost-efficient)

LOW     → GLM-4.5-Flash (fast, simple)
        → GLM-4.6V-Flash (fast vision)

Setup Instructions

1. Get API Key

  1. Visit https://z.ai/dashboard or https://open.bigmodel.cn
  2. Create an account or log in
  3. Navigate to API Keys
  4. Generate a new key
  5. Store it as the Z_AI_API_KEY environment variable

2. Configure in Coolify

# Add to service environment variables (placeholder shown; never commit a real key)
Z_AI_API_KEY=<your-z-ai-api-key>

3. Test Connection

curl -X POST https://open.bigmodel.cn/api/paas/v4/chat/completions \
  -H "Authorization: Bearer $Z_AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.6",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 50
  }'
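
A successful response follows the OpenAI-compatible chat completion shape (abridged; the field values are illustrative):

{
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 8, "completion_tokens": 10, "total_tokens": 18 }
}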

Error Handling

Error Code   Cause             Action
401          Invalid API key   Check Z_AI_API_KEY
429          Rate limit        Wait and retry (see backoff sketch below)
400          Invalid model     Check model name
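
For 429 and transient 5xx errors, a simple exponential-backoff wrapper around the callZAI helper above; the retryable status list and delays are assumptions, and the status extraction depends on the HTTP client in use:

// Retry sketch: back off 1s, 2s, 4s, ... on retryable errors.
async function callZAIWithRetry(systemPrompt, userPrompt, options = {}, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await callZAI(systemPrompt, userPrompt, options);
    } catch (err) {
      // Status extraction varies by client; adjust for your error shape.
      const status = Number(err.response?.status ?? err.httpCode);
      const retryable = [429, 500, 502, 503, 504].includes(status);
      if (!retryable || attempt === maxRetries) throw err;
      await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** attempt));
    }
  }
}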

References

  • ai-providers/synthetic-new.md - Primary provider
  • code/implement.md - Code generation
  • design-thinking/ideate.md - Solution brainstorming

Token Budget

  • Max input: 500 tokens
  • Max output: 800 tokens

Model

  • Recommended: haiku (configuration lookup)