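Hello everyone! 🌺 Today we are releasing Ornith-1.0, a family of self-improving, open-source agentic coding models.
Key highlights:
This model card covers Ornith-1.0-35B, the lightweight member of the Ornith family, designed for efficient single-GPU deployment.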
| | Ornith-1.0-35B | Qwen3.5-35B | Qwen3.6-35B | Gemma4-31B | Qwen3.5-397B |
|---|---|---|---|---|---|
| Agentic coding | | | | | |
| Terminal-Bench 2.1 (Terminus-2) | 64.2 | 41.4 | 52.5 | 42.1 | 53.5 |
| Terminal-Bench 2.1 (Claude Code) | 62.8 | 38.9 | 49.2 | - | 48.6 |
| SWE-bench Verified | 75.6 | 70 | 73.4 | 52 | 76.4 |
| SWE-bench Pro | 50.4 | 44.6 | 49.5 | 35.7 | 51.6 |
| SWE-bench Multilingual | 69.3 | 60.3 | 67.2 | 51.7 | 69.3 |
| NL2Repo | 34.6 | 20.5 | 29.4 | 15.5 | 36.8 |
| Claw-eval Avg | 69.8 | 65.4 | 68.7 | 48.5 | 70.7 |
| SWE Atlas - QnA | 37.1 | 13.2 | 15.5 | - | 20.4 |
| SWE Atlas - RF | 29.7 | 10.2 | 11.4 | - | 18.4 |
| SWE Atlas - TW | 27.8 | 9.8 | 13.3 | - | 18.5 |
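* Terminal-Bench 2.1 (Terminus-2): We evaluate Terminal-Bench 2.1 with the Harbor/Terminus-2 harness, using parser=json, temperature=1.0, top_p=1.0, and a 128K context window. Each run has a 4-hour timeout with 32 CPU cores and 48GB of memory, and results are averaged over 5 runs. We adjusted the Qwen chat template to keep training and inference consistent (https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B/blob/main/chat_template.jinja) and modified Harbor to accept vLLM's reasoning_content key.
* Terminal-Bench 2.1 (Claude Code): We evaluate Terminal-Bench 2.1 with Claude Code 2.1.126, using parser=json, temperature=1.0, top_p=1.0, and max_new_tokens=131072. Results are averaged over 5 runs. As above, the Qwen chat template needs to be modified.
* SWE-bench Verified, Pro, and Multilingual: evaluated with the OpenHands toolkit, using temp=1.0, top_p=0.95, and a 256K context window.
* SWE Atlas QnA, RF, and TW: evaluated with the mini-SWE-agent toolkit, using temp=1.0, top_p=0.95, and a 128K context window. Results are averaged over 5 runs.
* NL2Repo: evaluated with temperature=1.0, top_p=1.0, 400K context, 48K output length, and anti-hacking filtering.
* Claw-eval: an agentic coding benchmark based on the distribution of real user tasks; evaluated with temp=0.6 and 256K context.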
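For anyone trying to reproduce a row, the sampling values in the notes above can be gathered into a small lookup. This is only a sketch: `EVAL_SAMPLING` and `sampling_kwargs` are hypothetical names introduced here, and a key is left out wherever the notes do not state a value (Claw-eval gives no top_p). Context windows, timeouts, and harness options are not covered.

```python
# Sampling values copied from the evaluation notes above.
# A key is omitted when the notes do not state it.
_SWE = {"temperature": 1.0, "top_p": 0.95}

EVAL_SAMPLING = {
    "Terminal-Bench 2.1 (Terminus-2)": {"temperature": 1.0, "top_p": 1.0},
    "Terminal-Bench 2.1 (Claude Code)": {"temperature": 1.0, "top_p": 1.0},
    "SWE-bench Verified": _SWE,
    "SWE-bench Pro": _SWE,
    "SWE-bench Multilingual": _SWE,
    "SWE Atlas": _SWE,
    "NL2Repo": {"temperature": 1.0, "top_p": 1.0},
    "Claw-eval": {"temperature": 0.6},
}


def sampling_kwargs(benchmark: str) -> dict:
    """Return a fresh dict of sampling parameters for one benchmark."""
    return dict(EVAL_SAMPLING[benchmark])


print(sampling_kwargs("SWE-bench Pro"))
```

The returned dict can be unpacked into an OpenAI-compatible request, for example `client.chat.completions.create(..., **sampling_kwargs("NL2Repo"))`.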
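Ornith-1.0-35B is a reasoning model: by default, an assistant reply opens with a <think> … </think> block before the final answer. The serving recipes below enable a reasoning parser, which returns the chain of thought separately in the reasoning_content field, and a tool-call parser, which converts the model's <tool_call> blocks into OpenAI-style tool_calls.
Deploying Ornith-1.0-35B requires recent runtime versions:
Both recipes below start an OpenAI-compatible server on a single node with 8×80GB GPUs (tensor parallel size 8). Adjust --tensor-parallel-size / --tp to match the number of GPUs you actually have.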
vllm serve deepreinforce-ai/Ornith-1.0-35B \
  --served-model-name Ornith-1.0-35B \
  --tensor-parallel-size 8 \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.90 \
  --enable-prefix-caching \
  --enable-auto-tool-choice --tool-call-parser qwen3_xml \
  --reasoning-parser qwen3 \
  --trust-remote-code

python -m sglang.launch_server \
  --model-path deepreinforce-ai/Ornith-1.0-35B \
  --served-model-name Ornith-1.0-35B \
  --tp 8 \
  --host 0.0.0.0 --port 8000 \
  --context-length 262144 \
  --mem-fraction-static 0.85 \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3

For a quick local test (or an offline generation script), you can load the model directly with Transformers. Make sure you have the latest version installed (see the Transformers installation guide); Ornith-1.0-35B requires transformers >= 5.8.1.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepreinforce-ai/Ornith-1.0-35B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function is_prime(n). Keep it short."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

generated = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
# Keep only the newly generated tokens, dropping the prompt.
output_ids = generated[0][inputs.input_ids.shape[1]:]

# The reply contains a <think> ... </think> reasoning block followed by the answer.
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)

To separate the reasoning from the final answer, split on the </think> marker:
text = tokenizer.decode(output_ids, skip_special_tokens=True)
if "</think>" in text:
    # Everything before the first </think> is reasoning; the rest is the answer.
    reasoning, answer = text.split("</think>", 1)
    reasoning = reasoning.replace("<think>", "").strip()
    answer = answer.strip()
else:
    reasoning, answer = "", text.strip()

Once a vLLM or SGLang server is running, you can talk to it with any OpenAI-compatible client.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # any non-empty string works for a local server
)

response = client.chat.completions.create(
    model="Ornith-1.0-35B",
    messages=[
        {"role": "user", "content": "Write a one-line Python lambda that squares a number."}
    ],
    temperature=0.6,
    top_p=0.95,
    max_tokens=1024,
)

message = response.choices[0].message
# reasoning_content holds the <think> trace; content holds the final answer.
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)

You can also stream tokens or give the model tools. Ornith-1.0-35B emits well-formed function calls, which the server parses into the standard tool_calls field:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="Ornith-1.0-35B",
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
    tools=tools,
    tool_choice="auto",
    temperature=0.6,
    max_tokens=2048,
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
# -> get_weather {"city": "Paris"}

You can point any OpenAI-compatible SDK (Python, Node.js, and so on) or curl at the same /v1/chat/completions endpoint.
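Streaming is mentioned above without an example. The sketch below shows one way to accumulate a streamed reply, assuming the server puts reasoning text in a `reasoning_content` attribute on each delta (the field name used in this card; other server versions may name it differently). `collect_stream` and `_chunk` are hypothetical helpers, and the demo uses stand-in chunk objects so it runs without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate reasoning and answer text from streamed chat-completion chunks."""
    reasoning, answer = [], []
    for chunk in chunks:
        if not chunk.choices:  # some chunks (e.g. usage-only) carry no choices
            continue
        delta = chunk.choices[0].delta
        piece = getattr(delta, "reasoning_content", None)
        if piece:
            reasoning.append(piece)
        piece = getattr(delta, "content", None)
        if piece:
            answer.append(piece)
    return "".join(reasoning), "".join(answer)


# Against a live server (not executed here):
# stream = client.chat.completions.create(
#     model="Ornith-1.0-35B", messages=[...], stream=True)
# reasoning, answer = collect_stream(stream)


def _chunk(**delta):
    """Build a stand-in object shaped like one streamed chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])


demo = [
    _chunk(reasoning_content="Squaring means x * x."),
    _chunk(content="lambda x: "),
    _chunk(content="x * x"),
]
reasoning, answer = collect_stream(demo)
print("reasoning:", reasoning)
print("answer:", answer)
```

Reading both fields with `getattr` keeps the helper usable when a delta carries only one of them.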
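Ornith-1.0-35B performs strongly at tool calling and agentic coding.
Because Ornith-1.0-35B exposes an OpenAI-compatible endpoint with tool-calling support, it works directly with standard agent frameworks. Below is a simple example of connecting Ornith-1.0-35B to tools through an MCP server.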
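In an agent loop, a returned tool call still has to be executed and its result sent back as a `tool` message. The following is a minimal sketch of that step, not the project's own harness: `run_tool_calls`, `registry`, and the stub weather function are hypothetical, and the demo uses a stand-in for `response.choices[0].message` so it runs offline.

```python
import json
from types import SimpleNamespace


def run_tool_calls(message, registry):
    """Run every tool call on an assistant message and build the `tool` replies."""
    replies = []
    for call in message.tool_calls or []:
        func = registry.get(call.function.name)
        try:
            # Arguments arrive as a JSON string, e.g. '{"city": "Paris"}'.
            args = json.loads(call.function.arguments or "{}")
            result = func(**args) if func else f"unknown tool: {call.function.name}"
        except Exception as exc:  # report failures to the model instead of crashing
            result = f"tool error: {exc}"
        replies.append(
            {"role": "tool", "tool_call_id": call.id, "content": str(result)}
        )
    return replies


# Offline demo: a stand-in for response.choices[0].message.
fake_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            id="call_0",
            function=SimpleNamespace(name="get_weather", arguments='{"city": "Paris"}'),
        )
    ]
)
registry = {"get_weather": lambda city: f"(stub) no live data for {city}"}
tool_messages = run_tool_calls(fake_message, registry)
print(tool_messages)
```

With a live server, the assistant message and then these `tool` messages would be appended to `messages` before calling the endpoint again, so the model can produce its final answer.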
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL", "http://localhost:8000/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "EMPTY"),
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "run_shell",
            "description": "Run a shell command and return its output.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "The command to run"}
                },
                "required": ["command"],
            },
        },
    }
]

messages = [{"role": "user", "content": "List the Python files in the current directory."}]

response = client.chat.completions.create(
    # Must match the name the server registers (--served-model-name in the recipes above).
    model="Ornith-1.0-35B",
    messages=messages,
    tools=tools,
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message)

Examples of using Ornith with agent harnesses:
# Hermes talks to any OpenAI-compatible endpoint; point it at your Ornith server.
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export MODEL="deepreinforce-ai/Ornith-1.0-35B"

# Both runtimes load a GGUF build of Ornith (publish one at deepreinforce-ai/Ornith-1.0-35B-GGUF).
# llama.cpp: serve an OpenAI-compatible API on port 8000.
llama-server -hf deepreinforce-ai/Ornith-1.0-35B-GGUF --port 8000 -c 262144
# Ollama: pull and chat with the same GGUF straight from Hugging Face.
ollama run hf.co/deepreinforce-ai/Ornith-1.0-35B-GGUF

# OpenClaw talks to any OpenAI-compatible endpoint; point it at your Ornith server.
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export OPENAI_MODEL="deepreinforce-ai/Ornith-1.0-35B"

pip install unsloth
# Load Ornith for fast local inference or fine-tuning (Python):
# from unsloth import FastLanguageModel
# model, tokenizer = FastLanguageModel.from_pretrained(
#     "deepreinforce-ai/Ornith-1.0-35B",
#     max_seq_length=262144,
#     load_in_4bit=True,
# )

pip install openhands-ai
# OpenHands routes through LiteLLM; the "openai/" prefix selects the OpenAI-compatible path.
export LLM_MODEL="openai/deepreinforce-ai/Ornith-1.0-35B"
export LLM_BASE_URL="http://localhost:8000/v1"
export LLM_API_KEY="EMPTY"
# Launch the CLI (or run the official OpenHands Docker image with the same env vars).
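openhands

Ornith-1.0-35B is optimized for terminal-based coding agents. Point any OpenAI-compatible coding CLI at your Ornith-1.0-35B endpoint (set OPENAI_BASE_URL and OPENAI_API_KEY) to understand large codebases, automate tedious work, and ship faster.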
# Register your local Ornith endpoint as a provider in ~/.config/opencode/opencode.json:
#
# {
#   "$schema": "https://opencode.ai/config.json",
#   "provider": {
#     "ornith": {
#       "npm": "@ai-sdk/openai-compatible",
#       "name": "Ornith (local)",
#       "options": { "baseURL": "http://localhost:8000/v1", "apiKey": "EMPTY" },
#       "models": { "deepreinforce-ai/Ornith-1.0-35B": { "name": "Ornith-1.0-35B" } }
#     }
#   }
# }
opencode

If you find our work helpful, please consider citing it.
@misc{ornith-35b,
title = {{Ornith-1.0-35B}: Agentic Coding, Open to All},
url = {https://deep-reinforce.com/ornith_1_0.html},
author = {{DeepReinforce Team}},
year = {2026}
}