skills-library/skills/ai-providers/z-ai.md

# Skill: AI Provider - z.ai

## Description
Fallback AI provider that exposes GLM (General Language Model) models through an OpenAI-compatible API. Use it when synthetic.new is unavailable or when a GLM model is the better fit for a specific task.

## Status
**FALLBACK** - Use when:
1. synthetic.new returns a rate-limit error (429) or a server error (5xx)
2. A GLM model outperforms the alternatives for the task
3. A new model is available on z.ai before it reaches synthetic.new

## Configuration
```yaml
provider: z.ai
base_url: https://api.z.ai/v1
api_key_env: Z_AI_API_KEY
compatibility: openai
rate_limit: 60 requests/minute
```

**Note:** The API key must be configured before use. Generate one in the z.ai dashboard (see Setup Instructions).
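
The `rate_limit` value above can be enforced on the client side before requests are sent. The following is a minimal sketch of a sliding-window limiter sized to 60 requests per minute; `createRateLimiter` and its option names are hypothetical and not part of any z.ai SDK.

```javascript
// Sliding-window limiter sized to the documented 60 requests/minute.
// createRateLimiter and its options are hypothetical names used for illustration.
function createRateLimiter({ maxRequests = 60, windowMs = 60_000, now = Date.now } = {}) {
  const timestamps = []; // send times of requests still inside the window

  return {
    // Returns 0 if a request may be sent now, otherwise the milliseconds to wait.
    msUntilAvailable() {
      const cutoff = now() - windowMs;
      while (timestamps.length > 0 && timestamps[0] <= cutoff) {
        timestamps.shift(); // drop entries that have left the window
      }
      if (timestamps.length < maxRequests) return 0;
      return timestamps[0] + windowMs - now();
    },
    // Records a request; call this right before sending.
    record() {
      timestamps.push(now());
    }
  };
}
```

A caller would check `msUntilAvailable()`, wait that long if it is non-zero, then call `record()` and send the request. The `now` option exists only so the clock can be replaced in tests.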

## Available Models

### GLM-4-Plus (Reasoning & Analysis)
```json
{
  "model_id": "glm-4-plus",
  "best_for": ["reasoning", "analysis", "chinese_language", "long_context"],
  "context_window": 128000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.5,
  "strengths": ["Chinese content", "Logical reasoning", "Long documents"]
}
```

**Use when:**
- Complex logical reasoning
- Chinese language content
- Long document analysis
- Comparative analysis

### GLM-4-Flash (Fast Responses)
```json
{
  "model_id": "glm-4-flash",
  "best_for": ["quick_responses", "simple_tasks", "high_volume"],
  "context_window": 32000,
  "max_output": 2048,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["Speed", "Cost efficiency", "Simple tasks"]
}
```

**Use when:**
- Quick classification
- Simple transformations
- High-volume processing
- Cost-sensitive operations

### GLM-4-Long (Extended Context)
```json
{
  "model_id": "glm-4-long",
  "best_for": ["long_documents", "codebase_analysis", "summarization"],
  "context_window": 1000000,
  "max_output": 4096,
  "temperature_range": [0.0, 1.0],
  "recommended_temp": 0.3,
  "strengths": ["1M token context", "Document processing", "Code analysis"]
}
```

**Use when:**
- Entire codebase analysis
- Long document summarization
- Multi-file code review
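
The three model cards above share a shape, so their limits can be kept in one lookup table and used to keep `max_tokens` within each model's `max_output`. This is a sketch under that assumption; `MODEL_LIMITS` and `clampMaxTokens` are hypothetical names, and the numbers are copied from the cards above.

```javascript
// Limits copied from the model cards above.
// MODEL_LIMITS and clampMaxTokens are hypothetical names used for illustration.
const MODEL_LIMITS = {
  'glm-4-plus': { contextWindow: 128000, maxOutput: 4096 },
  'glm-4-flash': { contextWindow: 32000, maxOutput: 2048 },
  'glm-4-long': { contextWindow: 1000000, maxOutput: 4096 }
};

// Caps the requested output length at the model's documented max_output.
function clampMaxTokens(modelId, requested) {
  if (!Object.prototype.hasOwnProperty.call(MODEL_LIMITS, modelId)) {
    throw new Error(`Unknown z.ai model: ${modelId}`);
  }
  return Math.min(requested, MODEL_LIMITS[modelId].maxOutput);
}
```

For example, a request for 4000 output tokens would be capped at 2048 for `glm-4-flash` and passed through unchanged for `glm-4-plus`.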

## Model Selection Logic
```javascript
function selectZAIModel(taskType, contextLength) {
  // Context-based selection
  if (contextLength > 128000) {
    return 'glm-4-long';
  }

  const modelMap = {
    // Fast operations
    'classification': 'glm-4-flash',
    'extraction': 'glm-4-flash',
    'simple_qa': 'glm-4-flash',

    // Complex reasoning
    'reasoning': 'glm-4-plus',
    'analysis': 'glm-4-plus',
    'planning': 'glm-4-plus',

    // Long context
    'codebase': 'glm-4-long',
    'summarization': 'glm-4-long',

    // Default
    'default': 'glm-4-plus'
  };

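  const model = modelMap[taskType] || modelMap.default;

  // glm-4-flash has a 32K window; route larger inputs to glm-4-plus (128K)
  if (model === 'glm-4-flash' && contextLength > 32000) {
    return 'glm-4-plus';
  }

  return model;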
}
```
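
`selectZAIModel` expects `contextLength` in tokens. When no tokenizer is at hand, a character-based estimate can stand in. The sketch below assumes roughly four characters per token, which is a common rule of thumb for English text and not a z.ai figure; it likely undercounts Chinese text, so prefer a real tokenizer where accuracy matters. `estimateTokens` is a hypothetical helper.

```javascript
// Rough token estimate from character count.
// ASSUMPTION: about 4 characters per token; this is a heuristic, not a z.ai value.
function estimateTokens(text, charsPerToken = 4) {
  return Math.ceil(text.length / charsPerToken);
}
```

The result can be passed as the second argument, for example `selectZAIModel('analysis', estimateTokens(documentText))`, where `documentText` is whatever input is about to be sent.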

## Fallback Logic

### When to Fallback from synthetic.new
```javascript
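// Statuses that trigger a fallback: rate limit (429) and server errors (5xx)
const FALLBACK_STATUS_CODES = [429, 500, 502, 503, 504];

async function callWithFallback(systemPrompt, userPrompt, options = {}) {
  let primaryResult;
  let errorCode;

  try {
    primaryResult = await callSyntheticAI(systemPrompt, userPrompt, options);
    // Failures may be reported on the result...
    errorCode = primaryResult?.error?.code;
  } catch (err) {
    // ...or thrown; re-throw anything that is not a fallback condition
    errorCode = err?.code ?? err?.status;
    if (!FALLBACK_STATUS_CODES.includes(errorCode)) throw err;
  }

  if (FALLBACK_STATUS_CODES.includes(errorCode)) {
    console.log(`Falling back to z.ai (status ${errorCode})`);
    // options.model names a synthetic.new model, so let callZAI pick its default
    const { model, ...zaiOptions } = options;
    return await callZAI(systemPrompt, userPrompt, zaiOptions);
  }

  return primaryResult;
}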
```

### GLM Superiority Conditions
```javascript
function shouldPreferGLM(task) {
  const glmSuperiorTasks = [
    'chinese_translation',
    'chinese_content',
    'million_token_context',
    'cost_optimization',
    'new_model_testing'
  ];

  return glmSuperiorTasks.includes(task.type);
}
```
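
The preference check can be combined with provider availability to produce a single routing decision. The following is a sketch only: `chooseProvider` and the `primaryAvailable` flag are hypothetical, and the task types mirror the `glmSuperiorTasks` list above.

```javascript
// Task types copied from glmSuperiorTasks above.
const GLM_PREFERRED_TASKS = new Set([
  'chinese_translation',
  'chinese_content',
  'million_token_context',
  'cost_optimization',
  'new_model_testing'
]);

// chooseProvider is a hypothetical helper: it returns the provider name to call first.
function chooseProvider(task, { primaryAvailable = true } = {}) {
  if (!primaryAvailable) return 'z.ai'; // primary is down, so use the fallback
  return GLM_PREFERRED_TASKS.has(task.type) ? 'z.ai' : 'synthetic.new';
}
```

With these definitions, a `chinese_translation` task goes to z.ai directly, while a generic task goes to synthetic.new unless the primary is marked unavailable.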

## n8n Integration

### HTTP Request Node Configuration
```json
{
  "method": "POST",
  "url": "https://api.z.ai/v1/chat/completions",
  "headers": {
    "Authorization": "Bearer {{ $env.Z_AI_API_KEY }}",
    "Content-Type": "application/json"
  },
  "body": {
    "model": "glm-4-plus",
    "messages": [
      { "role": "system", "content": "{{ $json.systemPrompt }}" },
      { "role": "user", "content": "{{ $json.userPrompt }}" }
    ],
    "max_tokens": 4000,
    "temperature": 0.5
  },
  "timeout": 90000
}
```

### Code Node Helper
```javascript
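// z.ai Request Helper for n8n Code Node
// Note: `$http` is assumed to be an HTTP helper available in this environment;
// n8n's documented request helper for custom nodes is `this.helpers.httpRequest`.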
async function callZAI(systemPrompt, userPrompt, options = {}) {
  const {
    model = 'glm-4-plus',
    maxTokens = 4000, // lower this for glm-4-flash (max_output: 2048)
    temperature = 0.5
  } = options;

  const response = await $http.request({
    method: 'POST',
    url: 'https://api.z.ai/v1/chat/completions',
    headers: {
      'Authorization': `Bearer ${$env.Z_AI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: {
      model,
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userPrompt }
      ],
      max_tokens: maxTokens,
      temperature
    }
  });

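  // Fail loudly if the response carries no completion text
  const content = response?.choices?.[0]?.message?.content;
  if (typeof content !== 'string') {
    throw new Error('z.ai response contained no message content');
  }

  return content;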
}
```

## Comparison: synthetic.new vs z.ai

| Feature | synthetic.new | z.ai |
|---------|---------------|------|
| Primary Use | All tasks | Fallback + GLM tasks |
| Best Model (Code) | DeepSeek-V3 | GLM-4-Flash |
| Best Model (Reasoning) | Kimi-K2-Thinking | GLM-4-Plus |
| Max Context | 200K tokens | 1M tokens (GLM-4-Long) |
| Chinese Support | Good | Excellent |
| Rate Limit | 100 requests/min | 60 requests/min |
| Cost | Standard | Lower (GLM-4-Flash) |

## Setup Instructions

### 1. Get API Key
1. Visit https://z.ai/dashboard
2. Create an account or log in
3. Navigate to API Keys
4. Generate a new key
5. Store it in the `Z_AI_API_KEY` environment variable

### 2. Configure in Coolify
```bash
# Add to service environment variables
Z_AI_API_KEY=your_key_here
```
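
A missing or placeholder key otherwise only surfaces as a 401 on the first request, so it can help to check the variable at startup. This is a minimal sketch for a Node.js service; `requireApiKey` is a hypothetical helper, and the placeholder check assumes the `your_key_here` value shown above.

```javascript
// requireApiKey is a hypothetical helper: it returns the trimmed key or throws.
function requireApiKey(env = process.env) {
  const key = (env.Z_AI_API_KEY || '').trim();
  if (!key || key === 'your_key_here') {
    throw new Error('Z_AI_API_KEY is not set; add it to the service environment variables');
  }
  return key;
}
```

Calling `requireApiKey()` once during startup makes the service fail fast with a clear message instead of failing on its first fallback call.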

### 3. Test Connection
```bash
curl -X POST https://api.z.ai/v1/chat/completions \
  -H "Authorization: Bearer $Z_AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4-flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 50
  }'
```

## Error Handling
| Error Code | Cause | Action |
|------------|-------|--------|
| 401 | Invalid API key | Check Z_AI_API_KEY |
| 429 | Rate limit | Wait and retry |
| 400 | Invalid model | Check model name |
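
The "Wait and retry" action for 429 can be sketched as exponential backoff. The code below is an illustration under assumptions: `backoffDelayMs` and `withRetry` are hypothetical helpers, the delays (1 s base, 30 s cap) are arbitrary defaults rather than z.ai recommendations, and the thrown error is assumed to carry the HTTP status in a `status` property.

```javascript
// Delay before the next attempt: base * 2^attempt, capped.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retries requestFn on 429 only; 401 and 400 need a config fix, not a retry.
// ASSUMPTION: failed requests throw an error with a numeric `status` property.
async function withRetry(requestFn, options = {}) {
  const {
    maxAttempts = 3,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
  } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await requestFn();
    } catch (err) {
      const retryable = err && err.status === 429;
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

A call such as `withRetry(() => callZAI(systemPrompt, userPrompt))` would then retry rate-limited requests, provided the underlying HTTP helper exposes the status as assumed.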

## Related Skills
- `ai-providers/synthetic-new.md` - Primary provider
- `code/implement.md` - Code generation
- `design-thinking/ideate.md` - Solution brainstorming

## Token Budget
- Max input: 500 tokens
- Max output: 800 tokens

## Model
- Recommended: haiku (configuration lookup)