```bash
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "杭州天气怎么样" }
    ],
    "stream": true,
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_time",
          "description": "当你想知道现在的时间时非常有用。",
          "parameters": {}
        }
      },
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "当你想查询指定城市的天气时非常有用。",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "城市或县区,比如北京市、杭州市、余杭区等。"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
```
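For reference, the same request can be issued from Python. This is a minimal sketch, assuming the `openai` SDK (v1+) pointed at the compatible-mode `base_url` taken from the curl command above; the raw events it consumes are shown next.

```python
# Sketch: same request via the OpenAI Python SDK (assumes `pip install openai>=1.0`).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# Same tool definitions as in the curl payload above.
tools = [
    {"type": "function", "function": {"name": "get_current_time",
        "description": "当你想知道现在的时间时非常有用。", "parameters": {}}},
    {"type": "function", "function": {"name": "get_current_weather",
        "description": "当你想查询指定城市的天气时非常有用。",
        "parameters": {"type": "object",
                       "properties": {"location": {"type": "string",
                           "description": "城市或县区,比如北京市、杭州市、余杭区等。"}},
                       "required": ["location"]}}},
]

stream = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "杭州天气怎么样"},
    ],
    tools=tools,
    stream=True,
)
for chunk in stream:
    # Each `chunk` corresponds to one `data:` event in the raw stream below.
    print(chunk.choices[0].delta if chunk.choices else chunk)
```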
```text
data: {"choices":[{"delta":{"content":null,"tool_calls":[{"index":0,"id":"call_0bdcc155f2534f65a05cb1","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":"}}],"role":"assistant"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1770343950,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-6b9f079d-c440-9fc5-bb6a-963ad8387e02"}

data: {"choices":[{"delta":{"tool_calls":[{"function":{"arguments":" \"杭州市\"}"},"index":0,"id":"","type":"function"}]},"index":0}],"object":"chat.completion.chunk","usage":null,"created":1770343950,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-6b9f079d-c440-9fc5-bb6a-963ad8387e02"}

data: {"choices":[{"finish_reason":"tool_calls","delta":{},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1770343950,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-6b9f079d-c440-9fc5-bb6a-963ad8387e02"}

data: [DONE]
```
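If you are not using an SDK, the `data:` framing above is plain SSE and easy to consume by hand. A minimal sketch with `requests` (the helper name `sse_events` is mine, not part of any library):

```python
# Sketch: consume a `text/event-stream` response line by line.
import json
import requests

def sse_events(url, headers, payload):
    """Yield parsed JSON chunks from an SSE response."""
    with requests.post(url, headers=headers, json=payload, stream=True) as r:
        for raw in r.iter_lines(decode_unicode=True):
            if not raw or not raw.startswith("data:"):
                continue                      # skip blank keep-alive lines
            data = raw[len("data:"):].strip()
            if data == "[DONE]":              # SSE-layer end marker, not model output
                break
            yield json.loads(data)            # one chat.completion.chunk per event
```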
This example is textbook, and a step up from the previous one: here the LLM is streaming a function / tool call. We apply the same knife as before and strictly separate model output from API wrapping.
First, one general principle; then a chunk-by-chunk breakdown.
Anything inside `choices[].delta` that the model actually decided is LLM output. Control flow, finish markers, `[DONE]`, `usage`, `id`, `object`, `model`, and so on are all added by the API vendor.
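To make that principle concrete, here is a tiny hypothetical helper that splits one parsed chunk into the model-decided part and the vendor envelope:

```python
# Hypothetical helper: separate "model decided" from "vendor added" in one chunk.
def split_chunk(chunk: dict):
    choice = chunk["choices"][0] if chunk.get("choices") else {}
    delta = choice.get("delta") or {}
    model_part = {                                   # what the LLM decided
        "content": delta.get("content"),
        "tool_calls": [
            {"name": (tc.get("function") or {}).get("name"),
             "arguments": (tc.get("function") or {}).get("arguments")}
            for tc in delta.get("tool_calls") or []
        ],
    }
    envelope = {k: chunk.get(k)                      # what the API wrapped around it
                for k in ("id", "object", "model", "created", "usage")}
    envelope["finish_reason"] = choice.get("finish_reason")
    return model_part, envelope
```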
`choices[].delta`, chunk 1: in the tool_calls scenario, what the LLM outputs here is not natural language, but:
"delta": { "content": null, "tool_calls": [{ "index": 0, "id": "call_0bdcc155f2534f65a05cb1", "type": "function", "function": { "name": "get_current_weather", "arguments": "{\"location\":" } }], "role": "assistant" }
✅ LLM output:

- `tool_calls[0].function.name = "get_current_weather"`
- `tool_calls[0].function.arguments = "{\"location\":"`

⚠️ Note: `arguments` is JSON assembled token by token across chunks, not handed over in one piece (see the accumulator sketch further down).
❌ API protocol fields:

- `role: "assistant"`
- `index`
- `id: "call_0bdcc155f2534f65a05cb1"`
- `type: "function"`
📌 Conclusion: the LLM has decided "I'm calling get_current_weather, and here is the start of the arguments."
"delta": { "tool_calls": [{ "function": { "arguments": " \"杭州市\"}" } }] }
{"location": "杭州市"}
📌 结论:
index / id / type
"finish_reason": "tool_calls", "delta": {}
模型已经 完成 tool call 的生成
tool_calls
❌ 完全不是 LLM 输出✅ API 控制信号
❌ 不是 LLM 输出✅ SSE 协议层结束标记
Field-by-field summary:

✅ LLM output:

- `delta.content`
- `delta.tool_calls[].function.name`
- `delta.tool_calls[].function.arguments`

❌ API layer:

- `delta.role`
- `tool_calls[].id`
- `finish_reason = "tool_calls"`
On the client side, you assemble the streamed fragments per tool-call index:

```
// pseudo
args[index] += delta.tool_calls[i].function.arguments
```
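Fleshed out into a runnable sketch, assuming chunks arrive as parsed dicts (e.g. from the `sse_events()` helper above; `accumulate_tool_calls` is my name, not part of any SDK):

```python
# Sketch: accumulate streamed tool-call fragments until finish_reason fires.
def accumulate_tool_calls(chunks):
    calls = {}  # tool-call index -> {"id", "name", "arguments"}
    for chunk in chunks:
        choice = chunk["choices"][0]
        for tc in choice.get("delta", {}).get("tool_calls") or []:
            slot = calls.setdefault(tc["index"],
                                    {"id": "", "name": "", "arguments": ""})
            if tc.get("id"):                         # id appears once, in the first fragment
                slot["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):                       # name appears once as well
                slot["name"] = fn["name"]
            slot["arguments"] += fn.get("arguments") or ""   # concatenate JSON fragments
        if choice.get("finish_reason") == "tool_calls":
            return calls                             # arguments strings are now complete JSON
    return calls
```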
Once `finish_reason == "tool_calls"` arrives:
👉 don't keep waiting for `content`
👉 execute the tool immediately
The LLM does not "call the function".
It only generates a span of structured tokens:
👉 "which tool to call + what the arguments look like"
Actually invoking the function is done by you / the API gateway.
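A sketch of that hand-off, with a hypothetical local `get_current_weather` stub; in the OpenAI-compatible protocol the tool result goes back to the model as a `role: "tool"` message keyed by `tool_call_id`:

```python
import json

# Hypothetical local implementation of the tool the model asked for.
TOOLS = {"get_current_weather": lambda location: f"{location}: 晴, 25°C (stub)"}

def run_tool_call(call: dict, messages: list) -> list:
    """`call` is one accumulated entry: {"id", "name", "arguments"}."""
    args = json.loads(call["arguments"])   # the arguments string is complete JSON by now
    result = TOOLS[call["name"]](**args)   # <- the actual "function call" happens here
    # Protocol note: the assistant message carrying tool_calls must already be
    # in `messages` before this tool result is appended (omitted in this sketch).
    messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": result,
    })
    return messages                        # send back to the model for the final answer
```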
If you like, I can take this a step further for you next.
At this point you are implementing the LLM runtime layer, not just consuming the API 😄