Compare commits

...

2 Commits

Author SHA1 Message Date
4c6ec6f10d Update z.ai skill with latest GLM models (Dec 2025)
- Add GLM-4.6 (flagship, 200K context, agentic)
- Add GLM-4.6V and GLM-4.6V-Flash (vision models)
- Add GLM-4.5, GLM-4.5-Air, GLM-4.5-Flash
- Add GLM-Z1-Rumination-32B (deep reasoning)
- Update model selection logic and pricing
- Add references to official documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 19:45:30 +01:00
d04a7439a8 Update z.ai skill with correct BigModel API URL
- Update base_url to https://open.bigmodel.cn/api/paas/v4
- Mark API key as configured

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 19:29:28 +01:00

View File

@@ -1,108 +1,208 @@
# Skill: AI Provider - z.ai
## Description
Fallback AI provider with GLM (General Language Model) support from Zhipu AI. Use when synthetic.new is unavailable or when GLM models are superior for specific tasks.
## Status
**FALLBACK** - Use when:
1. synthetic.new rate limits or errors
2. GLM models outperform alternatives for the task
3. New models available earlier on z.ai
4. Extended context (200K+) needed
5. Vision/multimodal tasks required
## Configuration
```yaml
provider: z.ai (Zhipu AI / BigModel)
base_url: https://open.bigmodel.cn/api/paas/v4
api_key_env: Z_AI_API_KEY
compatibility: openai
rate_limit: 60 requests/minute
```
**API Key configured:** `Z_AI_API_KEY` in environment variables.
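Because the endpoint is OpenAI-compatible, a request needs only the base URL plus a bearer header. A minimal sketch of assembling one (the `buildZAIRequest` helper is illustrative, not part of any SDK):

```javascript
// Build an OpenAI-compatible chat request for the BigModel endpoint.
// Only the URL, header, and body shape come from the configuration above.
const Z_AI_BASE_URL = 'https://open.bigmodel.cn/api/paas/v4';

function buildZAIRequest(apiKey, model, messages, maxTokens = 1024) {
  return {
    method: 'POST',
    url: `${Z_AI_BASE_URL}/chat/completions`,
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ model, messages, max_tokens: maxTokens })
  };
}
```

The returned object maps directly onto n8n's `$http.request` options.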
## Available Models (Updated Dec 2025)
### GLM-4.6 (Flagship - Latest)
```json
{
"model_id": "glm-4-plus",
"best_for": ["reasoning", "analysis", "chinese_language", "long_context"],
"model_id": "glm-4.6",
"best_for": ["agentic", "reasoning", "coding", "frontend_dev"],
"context_window": 202752,
"max_output": 128000,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.5,
"strengths": ["200K context", "Tool use", "Agent workflows", "15% more token efficient"],
"pricing": { "input": "$0.40/M", "output": "$1.75/M" },
"released": "2025-09-30"
}
```
**Use when:**
- Complex agentic tasks
- Advanced reasoning
- Frontend/UI development
- Tool-calling workflows
- Extended context needs (200K)
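At the listed rates, per-call cost is a straight per-token calculation. A sketch (rates hardcoded from the pricing block above; verify against current z.ai pricing before relying on them):

```javascript
// Estimate a GLM-4.6 call cost in USD from token counts.
// Rates are copied from the pricing block above; confirm before use.
const GLM46_RATES = { inputPerM: 0.40, outputPerM: 1.75 };

function estimateCostUSD(inputTokens, outputTokens, rates = GLM46_RATES) {
  return (inputTokens / 1e6) * rates.inputPerM +
         (outputTokens / 1e6) * rates.outputPerM;
}
```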
### GLM-4.6V (Vision - Latest)
```json
{
"model_id": "glm-4.6v",
"best_for": ["image_analysis", "multimodal", "document_processing", "video_understanding"],
"context_window": 128000,
"max_output": 4096,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.3,
"strengths": ["Native tool calling", "150 pages/1hr video input", "SOTA vision understanding"],
"parameters": "106B",
"released": "2025-12-08"
}
```
**Use when:**
- Image analysis and understanding
- Document OCR and processing
- Video content analysis
- Multimodal reasoning
### GLM-4.6V-Flash (Vision - Lightweight)
```json
{
"model_id": "glm-4.6v-flash",
"best_for": ["fast_image_analysis", "local_deployment", "low_latency"],
"context_window": 128000,
"max_output": 4096,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.3,
"strengths": ["9B parameters", "Fast inference", "Local deployable"],
"parameters": "9B",
"released": "2025-12-08"
}
```
**Use when:**
- Quick image classification
- Edge/local deployment
- Low-latency vision tasks
### GLM-4.5 (Previous Flagship)
```json
{
"model_id": "glm-4.5",
"best_for": ["reasoning", "tool_use", "coding", "agents"],
"context_window": 128000,
"max_output": 4096,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.5,
"strengths": ["Chinese content", "Logical reasoning", "Long documents"]
"strengths": ["355B MoE", "32B active params", "Proven stability"],
"parameters": "355B (32B active)"
}
```
**Use when:**
- Need proven stable model
- Standard reasoning tasks
- Backward compatibility
### GLM-4.5-Air (Efficient)
```json
{
"model_id": "glm-4-flash",
"best_for": ["quick_responses", "simple_tasks", "high_volume"],
"model_id": "glm-4.5-air",
"best_for": ["cost_efficient", "standard_tasks", "high_volume"],
"context_window": 128000,
"max_output": 4096,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.5,
"strengths": ["106B MoE", "12B active params", "Cost effective"],
"parameters": "106B (12B active)"
}
```
**Use when:**
- Cost-sensitive operations
- High-volume processing
- Standard quality acceptable
### GLM-4.5-Flash (Fast)
```json
{
"model_id": "glm-4.5-flash",
"best_for": ["ultra_fast", "simple_tasks", "streaming"],
"context_window": 32000,
"max_output": 2048,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.3,
"strengths": ["Speed", "Cost efficiency", "Simple tasks"]
"strengths": ["Fastest inference", "Lowest cost", "Simple tasks"]
}
```
**Use when:**
- Real-time responses needed
- Simple classification/extraction
- Budget constraints
### GLM-Z1-Rumination-32B (Deep Reasoning)
```json
{
"model_id": "glm-4-long",
"best_for": ["long_documents", "codebase_analysis", "summarization"],
"context_window": 1000000,
"model_id": "glm-z1-rumination-32b-0414",
"best_for": ["deep_reasoning", "complex_analysis", "deliberation"],
"context_window": 128000,
"max_output": 4096,
"temperature_range": [0.0, 1.0],
"recommended_temp": 0.3,
"strengths": ["1M token context", "Document processing", "Code analysis"]
"recommended_temp": 0.7,
"strengths": ["Rumination capability", "Step-by-step reasoning", "Complex problems"],
"released": "2025-04-14"
}
```
**Use when:**
- Complex multi-step reasoning
- Problems requiring deliberation
- Chain-of-thought tasks
## Model Selection Logic
```javascript
function selectZAIModel(taskType, contextLength, needsVision = false) {
// Vision tasks
if (needsVision) {
return contextLength > 64000 ? 'glm-4.6v' : 'glm-4.6v-flash';
}
// Context-based selection
if (contextLength > 128000) {
return 'glm-4.6'; // 200K context
}
const modelMap = {
// Flagship tasks
'agentic': 'glm-4.6',
'frontend': 'glm-4.6',
'tool_use': 'glm-4.6',
// Deep reasoning
'deep_reasoning': 'glm-z1-rumination-32b-0414',
'deliberation': 'glm-z1-rumination-32b-0414',
// Standard reasoning
'reasoning': 'glm-4.5',
'analysis': 'glm-4.5',
'planning': 'glm-4.5',
'coding': 'glm-4.5',
// Cost-efficient
'cost_efficient': 'glm-4.5-air',
'high_volume': 'glm-4.5-air',
// Fast operations
'classification': 'glm-4.5-flash',
'extraction': 'glm-4.5-flash',
'simple_qa': 'glm-4.5-flash',
'streaming': 'glm-4.5-flash',
// Default to flagship
'default': 'glm-4.6'
};
return modelMap[taskType] || modelMap.default;
@@ -122,7 +222,7 @@ async function callWithFallback(systemPrompt, userPrompt, options = {}) {
// Rate limit or server error - fallback to z.ai
if ([429, 500, 502, 503, 504].includes(errorCode)) {
console.log('Falling back to z.ai GLM-4.6');
return await callZAI(systemPrompt, userPrompt, options);
}
}
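The same fallback rule can be tested outside n8n by injecting both providers as plain functions; a sketch (the provider callables are stand-ins, not real API clients):

```javascript
// Status codes that justify switching provider, per the rule above.
const FALLBACK_CODES = new Set([429, 500, 502, 503, 504]);

// callPrimary / callFallback are injected stand-ins for synthetic.new / z.ai.
async function callWithFallback(callPrimary, callFallback, prompt) {
  try {
    return await callPrimary(prompt);
  } catch (err) {
    if (FALLBACK_CODES.has(err.statusCode)) {
      return await callFallback(prompt); // rate limit or server error: use z.ai
    }
    throw err; // e.g. 400: switching provider cannot fix a bad request
  }
}
```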
@@ -137,9 +237,12 @@ function shouldPreferGLM(task) {
const glmSuperiorTasks = [
'chinese_translation',
'chinese_content',
'extended_context_200k',
'vision_analysis',
'multimodal',
'frontend_development',
'deep_rumination',
'cost_optimization'
];
return glmSuperiorTasks.includes(task.type);
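Combined with a provider router, the check reads as follows; a sketch (`routeProvider` is a hypothetical wrapper, and the task list restates the one above):

```javascript
// Task types where GLM models are preferred, restated from the list above.
const glmSuperiorTasks = [
  'chinese_translation', 'chinese_content', 'extended_context_200k',
  'vision_analysis', 'multimodal', 'frontend_development',
  'deep_rumination', 'cost_optimization'
];

function shouldPreferGLM(task) {
  return glmSuperiorTasks.includes(task.type);
}

// Hypothetical wrapper: pick the provider before any API call is made.
function routeProvider(task) {
  return shouldPreferGLM(task) ? 'z.ai' : 'synthetic.new';
}
```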
@@ -152,13 +255,13 @@ function shouldPreferGLM(task) {
```json
{
"method": "POST",
"url": "https://api.z.ai/v1/chat/completions",
"url": "https://open.bigmodel.cn/api/paas/v4/chat/completions",
"headers": {
"Authorization": "Bearer {{ $env.Z_AI_API_KEY }}",
"Content-Type": "application/json"
},
"body": {
"model": "glm-4-plus",
"model": "glm-4.6",
"messages": [
{ "role": "system", "content": "{{ systemPrompt }}" },
{ "role": "user", "content": "{{ userPrompt }}" }
@@ -175,14 +278,14 @@ function shouldPreferGLM(task) {
// z.ai Request Helper for n8n Code Node
async function callZAI(systemPrompt, userPrompt, options = {}) {
const {
model = 'glm-4.6',
maxTokens = 4000,
temperature = 0.5
} = options;
const response = await $http.request({
method: 'POST',
url: 'https://open.bigmodel.cn/api/paas/v4/chat/completions',
headers: {
'Authorization': `Bearer ${$env.Z_AI_API_KEY}`,
'Content-Type': 'application/json'
@@ -207,17 +310,35 @@ async function callZAI(systemPrompt, userPrompt, options = {}) {
| Feature | synthetic.new | z.ai |
|---------|---------------|------|
| Primary Use | All tasks | Fallback + GLM tasks |
| Best Model (Code) | DeepSeek-V3 | GLM-4.6 |
| Best Model (Reasoning) | Kimi-K2-Thinking | GLM-Z1-Rumination |
| Best Model (Vision) | N/A | GLM-4.6V |
| Max Context | 200K | 200K (GLM-4.6) |
| Chinese Support | Good | Excellent |
| Rate Limit | 100/min | 60/min |
| Cost (Input) | ~$0.50/M | $0.40/M (GLM-4.6) |
| Open Source | No | Yes (MIT) |
## Model Hierarchy (Recommended)
```
Task Complexity:
HIGH → GLM-Z1-Rumination (deep reasoning)
→ GLM-4.6 (agentic, coding)
→ GLM-4.6V (vision tasks)
MEDIUM → GLM-4.5 (standard tasks)
→ GLM-4.5-Air (cost-efficient)
LOW → GLM-4.5-Flash (fast, simple)
→ GLM-4.6V-Flash (fast vision)
```
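The hierarchy can be encoded as a lookup so callers only state complexity and modality. A sketch (tier keys are illustrative; deep-reasoning tasks may still warrant GLM-Z1-Rumination over GLM-4.6):

```javascript
// Map task complexity and modality to the recommended tier above.
// Tier names ('high'/'medium'/'low') are illustrative keys.
function recommendModel(complexity, vision = false) {
  const tiers = {
    high:   vision ? 'glm-4.6v'       : 'glm-4.6',
    medium: vision ? 'glm-4.6v'       : 'glm-4.5',
    low:    vision ? 'glm-4.6v-flash' : 'glm-4.5-flash'
  };
  return tiers[complexity] || 'glm-4.6'; // default to the flagship
}
```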
## Setup Instructions
### 1. Get API Key
1. Visit https://z.ai/dashboard or https://open.bigmodel.cn
2. Create account or login
3. Navigate to API Keys
4. Generate new key
@@ -226,16 +347,16 @@ async function callZAI(systemPrompt, userPrompt, options = {}) {
### 2. Configure in Coolify
```bash
# Add to service environment variables
Z_AI_API_KEY=your_key_here
```
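A quick startup check catches a missing key before the first request; a sketch (`env` is injected so the check is testable — in an n8n Code node this would be `$env`):

```javascript
// Fail fast if the key is absent from the service environment.
function assertApiKeyConfigured(env) {
  if (!env.Z_AI_API_KEY || env.Z_AI_API_KEY.trim() === '') {
    throw new Error('Z_AI_API_KEY is not set; add it to the Coolify environment variables');
  }
  return true;
}
```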
### 3. Test Connection
```bash
curl -X POST https://open.bigmodel.cn/api/paas/v4/chat/completions \
-H "Authorization: Bearer $Z_AI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "glm-4-flash",
"model": "glm-4.6",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 50
}'
@@ -248,6 +369,12 @@ curl -X POST https://api.z.ai/v1/chat/completions \
| 429 | Rate limit | Wait and retry |
| 400 | Invalid model | Check model name |
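The table suggests a simple policy: back off and retry on 429 and server errors, fail fast on 4xx configuration errors. A sketch (retry cap and backoff constants are illustrative, not from the z.ai docs):

```javascript
// Decide how to react to an error status, per the table above.
// Backoff base (1s), cap (30s), and max attempts (5) are illustrative.
function retryPolicy(statusCode, attempt) {
  if (statusCode === 429 || statusCode >= 500) {
    return { retry: attempt < 5, delayMs: Math.min(1000 * 2 ** attempt, 30000) };
  }
  return { retry: false, delayMs: 0 }; // 4xx: fix the request, do not retry
}
```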
## References
- [GLM-4.6 Announcement](https://z.ai/blog/glm-4.6)
- [GLM-4.6V Multimodal](https://z.ai/blog/glm-4.6v)
- [OpenRouter GLM-4.6](https://openrouter.ai/z-ai/glm-4.6)
- [Hugging Face Models](https://huggingface.co/zai-org)
## Related Skills
- `ai-providers/synthetic-new.md` - Primary provider
- `code/implement.md` - Code generation