moonshot-ai

Moonshot AI Kimi API - Trillion-parameter MoE model with 256K context, tool calling, and agentic capabilities for chat, coding, and autonomous task execution

Updated 2/15/2026

Moonshot AI Skill

Moonshot AI provides the Kimi large language model series, featuring the flagship Kimi K2 - a state-of-the-art mixture-of-experts (MoE) model with 1 trillion total parameters. The API offers OpenAI-compatible endpoints with 256K context length, strong tool calling capabilities, and competitive pricing.

Key Value Proposition: Access a trillion-parameter model optimized for agentic tasks, tool use, and coding at significantly lower per-token cost than competitors (reportedly up to 100x cheaper than GPT-4 for some tasks), with strong multilingual support for Chinese and English.

When to Use This Skill

  • Integrating Moonshot AI/Kimi models into applications
  • Building agentic AI systems with autonomous tool calling
  • Processing long documents with 128K-256K context windows
  • Developing cost-effective LLM solutions
  • Creating multilingual applications (Chinese/English)
  • Implementing function calling and tool use patterns

When NOT to Use This Skill

  • For OpenAI API specifically (use openai skill)
  • For Claude/Anthropic API (use anthropic skill)
  • For image generation or multimodal tasks (Kimi is text-focused)
  • For real-time voice interaction

Core Concepts

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                    Moonshot AI Platform                          │
│                  platform.moonshot.ai                            │
└─────────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│  Kimi K2      │    │ moonshot-v1   │    │   Tool Use    │
│  (Latest)     │    │  (Legacy)     │    │               │
├───────────────┤    ├───────────────┤    ├───────────────┤
│ • 1T params   │    │ • v1-8k       │    │ • Functions   │
│ • 32B active  │    │ • v1-32k      │    │ • Web Search  │
│ • 128K-256K   │    │ • v1-128k     │    │ • Code Exec   │
│ • MoE arch    │    │               │    │ • Custom      │
└───────────────┘    └───────────────┘    └───────────────┘
        │                     │                     │
        └─────────────────────┼─────────────────────┘
                              │
                              ▼
                    ┌───────────────────┐
                    │   API Endpoints   │
                    ├───────────────────┤
                    │ • OpenAI compat   │
                    │ • Anthropic compat│
                    │ • Streaming       │
                    │ • Tool calling    │
                    └───────────────────┘

Model Specifications

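| Model | Parameters | Active | Context | Best For |
| --- | --- | --- | --- | --- |
| kimi-k2-0905-preview | 1T | 32B | 256K | Latest, agentic tasks |
| kimi-k2-turbo-preview | 1T | 32B | 128K | Fast, general use |
| kimi-k2-thinking | 1T | 32B | 128K | Multi-step reasoning |
| moonshot-v1-8k | - | - | 8K | Short context |
| moonshot-v1-32k | - | - | 32K | Medium context |
| moonshot-v1-128k | - | - | 128K | Long documents |
| kimi-latest | - | - | Auto | Auto-selects tier |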

Kimi K2 Technical Details

Architecture: Mixture-of-Experts (MoE)
Total Parameters: 1 Trillion
Activated Parameters: 32 Billion per token
Layers: 61 (including 1 dense layer)
Experts: 384 total, 8 selected per token
Attention: MLA (Multi-head Latent Attention)
Activation: SwiGLU
Vocabulary: 160K tokens
Context: 128K tokens (256K for 0905-preview)
Training Data: 15.5T tokens
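
The sparsity implied by these figures can be worked out directly. The following is a small arithmetic sketch that uses only the numbers listed above; the helper name `fraction` is illustrative.

```python
# Figures taken from the specification list above.
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32 billion activated per token
TOTAL_EXPERTS = 384
EXPERTS_PER_TOKEN = 8


def fraction(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


active_param_pct = fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
routed_expert_pct = fraction(EXPERTS_PER_TOKEN, TOTAL_EXPERTS)

print(f"Active parameters per token: {active_param_pct}%")   # 3.2%
print(f"Experts selected per token: {routed_expert_pct}%")   # 2.1%
```

The two percentages differ because not every parameter sits in a routed expert: attention weights, embeddings, and the dense layer are used for every token.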

Quick Start

Get API Key

  1. Visit platform.moonshot.ai
  2. Create an account
  3. Generate API key from dashboard

Environment Setup

export MOONSHOT_API_KEY="your-api-key-here"

# Optional: Use China endpoint
export MOONSHOT_API_BASE="https://api.moonshot.cn/v1"
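
On the Python side, these two variables might be read as follows. This is a minimal sketch: the helper name `resolve_moonshot_config` is hypothetical, and falling back to the global endpoint when `MOONSHOT_API_BASE` is unset is an assumption, not documented SDK behavior.

```python
import os

GLOBAL_BASE_URL = "https://api.moonshot.ai/v1"


def resolve_moonshot_config(env=None):
    """Return (api_key, base_url) from an environment-style mapping.

    Hypothetical helper: the OpenAI SDK does not read MOONSHOT_* variables
    on its own, so they have to be passed to the client explicitly.
    """
    env = os.environ if env is None else env
    api_key = env.get("MOONSHOT_API_KEY")
    if not api_key:
        raise RuntimeError("MOONSHOT_API_KEY is not set")
    # Assumption: use the global endpoint unless an override is provided.
    base_url = env.get("MOONSHOT_API_BASE") or GLOBAL_BASE_URL
    return api_key, base_url.rstrip("/")


# Usage with the OpenAI SDK:
#   api_key, base_url = resolve_moonshot_config()
#   client = OpenAI(api_key=api_key, base_url=base_url)
```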

Basic Chat Completion

import os

from openai import OpenAI

# Read the key from the environment instead of hard-coding it
client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],
    base_url="https://api.moonshot.ai/v1"
)

response = client.chat.completions.create(
    model="kimi-k2-0905-preview",
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.6,  # Recommended
    max_tokens=1024
)

print(response.choices[0].message.content)

API Reference

Base URLs

| Region | URL |
| --- | --- |
| Global | https://api.moonshot.ai/v1 |
| China | https://api.moonshot.cn/v1 |

Authentication

curl https://api.moonshot.ai/v1/chat/completions \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2-0905-preview",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Chat Completions

Endpoint: POST /v1/chat/completions

Request Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier |
| messages | array | Yes | Conversation history |
| temperature | float | No | 0.0-1.0, recommended 0.6 |
| max_tokens | int | No | Maximum response length |
| stream | bool | No | Enable streaming |
| top_p | float | No | Nucleus sampling |
| tools | array | No | Function definitions |
| tool_choice | string | No | auto, none, or specific |

Message Format:

{
  "messages": [
    {"role": "system", "content": "System prompt"},
    {"role": "user", "content": "User message"},
    {"role": "assistant", "content": "Previous response"},
    {"role": "user", "content": [
      {"type": "text", "text": "Content can also be a list of typed parts"}
    ]}
  ]
}

Response:

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "kimi-k2-0905-preview",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Response text"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 100,
    "total_tokens": 150
  }
}
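
A response of this shape can be reduced to the fields an application usually needs. The sketch below works on the plain JSON dictionary shown above; `summarize_completion` is a hypothetical helper, and treating `finish_reason == "length"` as "output was cut off" follows the OpenAI convention, which is assumed to carry over to this OpenAI-compatible endpoint.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the text, finish reason, and token usage from a response dict."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    finish_reason = choice.get("finish_reason")
    return {
        "text": choice["message"].get("content") or "",
        "finish_reason": finish_reason,
        # Assumption: "length" means max_tokens was reached (OpenAI convention).
        "truncated": finish_reason == "length",
        "total_tokens": usage.get("total_tokens", 0),
    }


sample_response = {
    "id": "chatcmpl-xxx",
    "object": "chat.completion",
    "model": "kimi-k2-0905-preview",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Response text"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 50, "completion_tokens": 100, "total_tokens": 150},
}

summary = summarize_completion(sample_response)
print(summary["text"], summary["total_tokens"])
```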

Tool Calling / Function Calling

Kimi K2 has strong native support for tool calling, enabling agentic applications.

Define Tools

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "required": ["city"],
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                }
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "required": ["query"],
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    }
                }
            }
        }
    }
]
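
Because the model returns tool arguments as a JSON string, it can help to check them against the declared schema before running anything. The following is a minimal sketch, not a full JSON Schema validator: it only checks required names, unknown names, and `enum` values, and `validate_arguments` is a hypothetical helper.

```python
import json

# Same shape as the get_weather definition above, repeated so the
# example is self-contained.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
        },
    },
}


def validate_arguments(tool: dict, raw_arguments: str) -> list[str]:
    """Return a list of problems; an empty list means the arguments look usable."""
    schema = tool["function"]["parameters"]
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    problems = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            problems.append(f"unexpected parameter: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


print(validate_arguments(weather_tool, '{"city": "Tokyo", "unit": "celsius"}'))  # []
```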

Make Tool Call Request

response = client.chat.completions.create(
    model="kimi-k2-0905-preview",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=tools,
    tool_choice="auto",
    temperature=0.6
)

# Check if model wants to call a tool
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        print(f"Function: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")

Complete Tool Call Loop

import json

def execute_tool(name: str, args: dict) -> str:
    """Execute tool and return result."""
    if name == "get_weather":
        return json.dumps({"temp": 22, "condition": "sunny"})
    elif name == "search_web":
        return json.dumps({"results": ["Result 1", "Result 2"]})
    return json.dumps({"error": "Unknown tool"})

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

while True:
    response = client.chat.completions.create(
        model="kimi-k2-0905-preview",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        temperature=0.6
    )

    message = response.choices[0].message
    messages.append(message)

    if not message.tool_calls:
        # No more tool calls, done
        print(message.content)
        break

    # Execute each tool call
    for tool_call in message.tool_calls:
        result = execute_tool(
            tool_call.function.name,
            json.loads(tool_call.function.arguments)
        )
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result
        })
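
In a longer-running agent, a malformed argument string or an unknown tool name should not crash the loop. One possible approach, sketched here under assumptions, is to route every call through a registry and hand errors back to the model as the tool result. `dispatch_tool_call` and the registry layout are hypothetical, not part of any SDK.

```python
import json


def dispatch_tool_call(registry: dict, name: str, raw_arguments: str) -> str:
    """Run the named tool and always return a JSON string for the tool message."""
    handler = registry.get(name)
    if handler is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        arguments = json.loads(raw_arguments or "{}")
        return json.dumps(handler(**arguments))
    except (json.JSONDecodeError, TypeError) as exc:
        # Bad JSON, a non-object payload, or wrong parameter names end up
        # here, so the model can see the error and retry.
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})


# Hypothetical registry with a stubbed weather tool.
registry = {
    "get_weather": lambda city, unit="celsius": {"city": city, "temp": 22, "unit": unit},
}

print(dispatch_tool_call(registry, "get_weather", '{"city": "Tokyo"}'))
```

It is also worth replacing `while True` with a bounded loop (for example `for _ in range(10)`) so that a model that keeps requesting tools cannot run indefinitely.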

Streaming

Python Streaming

stream = client.chat.completions.create(
    model="kimi-k2-0905-preview",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True,
    temperature=0.6
)

for chunk in stream:
    # Some chunks may carry no choices or an empty delta; skip those
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
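
When the full text is needed after streaming (for logging or for appending to the conversation history), the deltas can be accumulated while they are consumed. The sketch below uses stand-in objects so it runs without network access; `collect_stream` and `fake_chunk` are hypothetical helpers, and the attribute shape mirrors the chunks used in the loop above.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas of a chat-completion stream into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:          # chunk without choices: nothing to read
            continue
        piece = chunk.choices[0].delta.content
        if piece:                      # skip None and empty deltas
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in with the attribute shape of a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [
    fake_chunk("Hel"),
    fake_chunk(None),
    fake_chunk("lo"),
    SimpleNamespace(choices=[]),
]
print(collect_stream(demo_stream))  # Hello
```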

JavaScript/Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.MOONSHOT_API_KEY,
  baseURL: 'https://api.moonshot.ai/v1'
});

async function chat() {
  const stream = await client.chat.completions.create({
    model: 'kimi-k2-0905-preview',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
}

// Run the example and surface any request error
chat().catch(console.error);

cURL Streaming

curl https://api.moonshot.ai/v1/chat/completions \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2-0905-preview",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Pricing

Kimi K2 Models

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| kimi-k2-0905-preview | ~$0.15 | ~$2.50 |
| kimi-k2-turbo-preview | ~$0.15 | ~$2.50 |

moonshot-v1 Models (kimi-latest auto-selects)

| Context Tier | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| 8K | $0.20 | $2.00 |
| 32K | $1.00 | $3.00 |
| 128K | $2.00 | $5.00 |

Built-in Tools

| Tool | Cost per Call |
| --- | --- |
| $web_search | ~$0.005 |
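
A rough cost estimate can be derived from the `usage` block of a response and the prices above. This is a sketch under assumptions: the prices are the approximate figures from these tables and may be out of date, the mapping of the 8K/32K/128K tiers onto `moonshot-v1-*` model names is assumed, and `estimate_cost` is a hypothetical helper.

```python
# (input, output) price in USD per 1M tokens, copied from the tables above.
PRICES_PER_MILLION = {
    "kimi-k2-0905-preview": (0.15, 2.50),
    "kimi-k2-turbo-preview": (0.15, 2.50),
    "moonshot-v1-8k": (0.20, 2.00),     # assumed mapping of the 8K tier
    "moonshot-v1-32k": (1.00, 3.00),    # assumed mapping of the 32K tier
    "moonshot-v1-128k": (2.00, 5.00),   # assumed mapping of the 128K tier
}
WEB_SEARCH_PER_CALL = 0.005


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int,
                  web_searches: int = 0) -> float:
    """Estimate the USD cost of one request from its token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    token_cost = (prompt_tokens * input_price
                  + completion_tokens * output_price) / 1_000_000
    return token_cost + web_searches * WEB_SEARCH_PER_CALL


cost = estimate_cost("kimi-k2-0905-preview", 1_000_000, 200_000)
print(f"${cost:.2f}")  # $0.65
```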

LiteLLM Integration

Configuration

from litellm import completion

response = completion(
    model="moonshot/kimi-k2-0905-preview",
    messages=[{"role": "user", "content": "Hello"}]
)

Proxy Config (config.yaml)

model_list:
  - model_name: kimi-k2
    litellm_params:
      model: moonshot/kimi-k2-0905-preview
      api_key: os.environ/MOONSHOT_API_KEY

  - model_name: kimi-128k
    litellm_params:
      model: moonshot/moonshot-v1-128k
      api_key: os.environ/MOONSHOT_API_KEY

Handled Quirks

LiteLLM automatically handles:

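  • Temperature capping: Moonshot accepts temperatures in the range 0 to 1, so values > 1 are clamped to 1
  • Temperature constraint: Sets temperature to 0.3 when temperature < 0.3 and n > 1, because lower temperatures can only return a single result
  • Tool choice: tool_choice="required" is not supported, so LiteLLM removes the parameter and adds a message asking the model to call a tool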
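
For callers that talk to the API directly rather than through LiteLLM, the two temperature rules above can be applied by hand. The sketch below illustrates those rules as described in this list; it is not LiteLLM's source code, `normalize_moonshot_params` is a hypothetical name, and the tool-choice rule is left out.

```python
def normalize_moonshot_params(params: dict) -> dict:
    """Return a copy of params with the temperature rules applied."""
    out = dict(params)
    temperature = out.get("temperature")
    if temperature is not None:
        if temperature > 1:
            # Rule 1: values above 1 are clamped to 1.
            out["temperature"] = 1
        elif temperature < 0.3 and out.get("n", 1) > 1:
            # Rule 2: several completions at very low temperature
            # are requested at 0.3 instead.
            out["temperature"] = 0.3
    return out


print(normalize_moonshot_params({"temperature": 1.5}))          # {'temperature': 1}
print(normalize_moonshot_params({"temperature": 0.1, "n": 2}))  # {'temperature': 0.3, 'n': 2}
```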

Anthropic-Compatible API

Moonshot also offers an Anthropic-compatible API endpoint:

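import os

from anthropic import Anthropic

# The Anthropic SDK appends /v1/messages to base_url, so the base URL
# must not end in /v1. The /anthropic path is the one Moonshot publishes
# for Anthropic-compatible clients; confirm it against the current docs.
client = Anthropic(
    api_key=os.environ["MOONSHOT_API_KEY"],
    base_url="https://api.moonshot.ai/anthropic"
)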

# Note: Temperature mapping
# real_temperature = request_temperature * 0.6
response = client.messages.create(
    model="kimi-k2-0905-preview",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024,
    temperature=1.0  # Will become 0.6 internally
)
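
The temperature mapping noted in the comment can be made explicit with two small helpers. This is a sketch based only on the `real_temperature = request_temperature * 0.6` rule stated above; the function names are hypothetical, and capping the request value at 1.0 assumes the same 0.0-1.0 range as the chat completions endpoint.

```python
ANTHROPIC_TEMPERATURE_SCALE = 0.6


def effective_temperature(request_temperature: float) -> float:
    """Temperature actually applied for a given request value."""
    return request_temperature * ANTHROPIC_TEMPERATURE_SCALE


def request_temperature_for(target: float) -> float:
    """Request value that yields `target` after scaling, capped at 1.0."""
    # Assumption: request temperatures above 1.0 are not accepted.
    return min(target / ANTHROPIC_TEMPERATURE_SCALE, 1.0)


print(effective_temperature(1.0))     # 0.6
print(request_temperature_for(0.3))   # 0.5
```

With this mapping, an effective temperature above 0.6 cannot be reached through the Anthropic-compatible endpoint if requests are limited to 1.0.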

Best Practices

Temperature Settings

# Recommended default
temperature = 0.6

# For creative tasks
temperature = 0.8

# For factual/deterministic tasks
temperature = 0.3

System Prompts

# Default system prompt (good starting point)
system_prompt = "You are Kimi, an AI assistant created by Moonshot AI."

# Custom for specific tasks
system_prompt = """You are a coding assistant.
Provide clean, well-documented code with explanations.
Use Python unless otherwise specified."""

Long Context Usage

# For documents up to 256K tokens; very_long_document is a str holding the full text
response = client.chat.completions.create(
    model="kimi-k2-0905-preview",  # Supports 256K
    messages=[
        {"role": "system", "content": "Analyze the following document."},
        {"role": "user", "content": very_long_document}
    ],
    temperature=0.3  # Lower for analysis tasks
)

Performance Benchmarks

| Benchmark | Score | Notes |
| --- | --- | --- |
| AIME 2024 | 69.6% | Math reasoning |
| MATH-500 | 97.4% | Mathematics |
| LiveCodeBench | 53.7% | Code generation |
| SWE-bench Verified | 71.6% | Agentic coding |
| MMLU | 89.5% | General knowledge |
| MMLU-Redux | 92.7% | Updated evaluation |
| Tau2 Retail | 70.6% | Tool use |
| AceBench | 76.5% | Agent evaluation |

Troubleshooting

Authentication Errors

Error: 401 Unauthorized

Solutions:

  1. Verify API key is correct
  2. Check environment variable is set
  3. Ensure key hasn't expired

Rate Limiting

Error: 429 Too Many Requests

Solutions:

  1. Implement exponential backoff
  2. Reduce request frequency
  3. Consider upgrading plan
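
Exponential backoff, the first solution above, might look like the following. This is a generic sketch rather than Moonshot-specific code: `call_with_backoff` is a hypothetical helper, the delay values are arbitrary defaults, and the `sleep` parameter exists only so the function can be exercised without real waiting.

```python
import random
import time


def call_with_backoff(func, is_retryable, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying retryable errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 50% jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

With the OpenAI SDK, `is_retryable` could be `lambda e: isinstance(e, openai.RateLimitError)`. The SDK client also accepts a `max_retries` argument, which may be enough for simple cases.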

Context Length Exceeded

Error: Context length exceeded

Solutions:

  1. Use a longer-context model (kimi-k2-0905-preview for 256K)
  2. Truncate input text
  3. Summarize previous messages
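
The second and third solutions both come down to fitting the conversation into a token budget. The sketch below keeps system messages and as many of the most recent turns as fit. The four-characters-per-token estimate is a crude assumption (a real tokenizer will differ, especially for Chinese text), both helper names are hypothetical, and message content is assumed to be a plain string.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages: list, max_tokens: int) -> list:
    """Keep system messages plus the most recent turns that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)

    kept = []
    for message in reversed(rest):       # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "x" * 40},      # ~10 tokens
    {"role": "user", "content": "a" * 400},       # ~100 tokens
    {"role": "assistant", "content": "b" * 400},  # ~100 tokens
    {"role": "user", "content": "c" * 40},        # ~10 tokens
]
trimmed = trim_history(history, max_tokens=130)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

When the history contains tool calls, each `tool` message should be kept or dropped together with the assistant message that requested it; this sketch does not handle that pairing.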

Tool Call Issues

Error: Invalid tool definition

Solutions:

  1. Verify JSON schema is valid
  2. Check required fields are present
  3. Ensure parameter types are correct
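
These checks can be automated before a request is sent. The following is a minimal sketch covering only the structural points in the list above; it is not a complete JSON Schema validator, and `lint_tool_definition` is a hypothetical helper.

```python
def lint_tool_definition(tool: dict) -> list[str]:
    """Return a list of structural problems found in one tool definition."""
    problems = []
    if tool.get("type") != "function":
        problems.append('"type" must be "function"')

    function = tool.get("function")
    if not isinstance(function, dict):
        return problems + ['"function" must be an object']
    if not function.get("name"):
        problems.append("function.name is missing")

    parameters = function.get("parameters")
    if parameters is not None:
        if parameters.get("type") != "object":
            problems.append('parameters.type must be "object"')
        properties = parameters.get("properties", {})
        for name in parameters.get("required", []):
            if name not in properties:
                problems.append(f"required parameter {name!r} is not in properties")
    return problems


broken_tool = {
    "type": "function",
    "function": {
        "name": "",
        "parameters": {"type": "object", "required": ["city"], "properties": {}},
    },
}
print(lint_tool_definition(broken_tool))
```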

Resources

Official Documentation

Open Source

Integration Guides

Support


Version History

  • 1.0.0 (2026-01-12): Initial skill release
    • Complete Kimi K2 model documentation
    • API reference with all parameters
    • Tool calling / function calling guide
    • Streaming examples (Python, Node.js, cURL)
    • Pricing information
    • LiteLLM and Anthropic-compatible API integration
    • Performance benchmarks
    • Troubleshooting guide

Install

Requires askill CLI v1.0+

Metadata

License: unknown
Version: 1.0.0
Updated: 2/15/2026
Publisher: majiayu000

Tags

api, github, llm, prompting