Hy3 Preview: Tencent’s Base-Model Play Built For The Larger Ecosystem
A user on Little Red Book recently asked Yuanbao, Tencent's AI chatbot: "I always feel lonely. What should I do?" The response was not a list of coping strategies. The model spent two seconds, visible in its reasoning trace, calibrating an empathetic tone and leaving the conversation open. The post went viral. What made it notable was not that an AI had learned to be nice, but that an AI product had learned to behave in a way its users actually wanted. That gap between model capability and product fit is what Hy3 Preview is designed to close, and it points to Tencent's broader base-model strategy.
In February 2026, Tencent tore down its pre-training and reinforcement-learning infrastructure and rebuilt both from scratch. Six weeks later it began training Hy3 preview. Ten weeks after that, it went live. The rebuild was guided by three principles: capability systematisation (refusing to let any model "specialise" its way out of product usefulness), evaluation authenticity (testing against real tasks, not leaderboards), and cost-performance (co-designing model and inference framework so capability gains do not price the model out of deployment). The 90-day timeline is impressive. These three principles explain how it was possible.
The Deliberate Choice Not to Go Bigger
Hy3 Preview runs a mixture-of-experts architecture that Tencent describes as a fusion of fast and slow thinking: 294 billion parameters total, 21 billion activated per forward pass, routing routine queries to quick pattern-matching experts and complex problems to deeper reasoning chains. Tencent claims this architecture delivers substantially more reasoning capability than its predecessor, Hy2.0, at a fraction of the compute per query. The 300B range is not a compromise. It is a deliberate ceiling — beyond roughly one trillion parameters, multi-node deployment erodes latency and throughput faster than marginal capability gains justify.
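Tencent has not published Hy3's routing internals, but the general mechanism behind "21 billion activated out of 294 billion" is standard sparse-MoE top-k gating: a small router scores every expert per token, and only the few highest-scoring experts actually run. The sketch below is illustrative only; the function names and the eight-expert toy setup are invented for the example, not taken from Hy3.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(router_logits, k=2):
    """Pick the top-k experts for one token and renormalise their gate weights.

    Because only the selected experts execute a forward pass, a model can
    hold hundreds of billions of parameters in total while activating only
    a small fraction of them per token.
    """
    probs = softmax(router_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in topk)
    return {i: probs[i] / total for i in topk}  # expert index -> gate weight

# Toy usage: 8 experts, one token routed to its 2 best experts.
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(8)]
gates = route_token(logits, k=2)
# gates maps 2 expert indices to weights that sum to 1.
```

In a real MoE layer the gate weights scale each chosen expert's output before the results are summed; the "fast and slow thinking" framing amounts to the router learning to send easy tokens to cheap experts and hard ones to deeper reasoning paths.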
Built With Its Products, Not Just For Them
What makes Hy3 different is how it was made. The model team and the product teams behind Yuanbao, WorkBuddy, CodeBuddy, ima, and QQ Browser merged into a single development loop, with live product metrics shaping training priorities directly. Tencent calls the result "the feeling of a living person." As the Yuanbao team described it:
Both teams jointly optimised training data — fine-tuning the model's writing style, emotional intelligence, content organisation and subject matter depth. The result is an interaction that feels smarter and more genuinely human.
Ready to Learn, Not Just Ready
Tencent is treating the release as part of the training process — integrating Hy3 preview into Yuanbao, CodeBuddy, WorkBuddy and other products and exposing it to real queries at scale. Instead of a linear pipeline, Hy3 preview runs in a loop: deployment generates feedback, feedback drives optimisation, optimisation improves deployment.
What does "ready" mean for a 294B model built in 90 days? It looks more like "ready to learn" than plain "ready." If future AI competition comes down to which players can most efficiently convert product interaction into training, Tencent's integration of products and models gives it a data flywheel that few competitors can match.
Reliability Over Rankings
Tencent has chosen to compete on different terrain from most AI releases — not leaderboard scores, but reductions in factual errors, improvements in instruction following, and better handling of real-world queries. Benchmarks capture what a model can do under ideal conditions. Real users send incomplete requests, contradictory instructions, fragmented context. It's the lower bound — what the model does when things get messy — that determines whether it gets trusted.
In a workplace scheduling task, Hy3 preview was given meeting minutes with implicit start dates, leave arrangements, and overtime requirements scattered across multiple exchanges. It produced a correct, executable schedule without guessing. In a multi-day travel task, it handled cross-day budgets, opening hours, and deduplication simultaneously — without speculative reasoning. A model that resists inventing answers when information is incomplete is not a benchmark achievement. It's what reliability looks like in practice.
The early numbers are specific. In CodeBuddy and WorkBuddy: latency down 54%, end-to-end duration down 47%, success rate above 99.99%, agent workflows of up to 495 steps stable in production. On the benchmark side, Hy3 preview reportedly scored highest domestically on the Tsinghua University mathematics PhD qualifying exam, posted competitive results on SWE-Bench Verified (coding agents) and BrowseComp (search agents), and outperformed comparable open-source models on Tencent's internal ClawEval agent-evaluation framework. These numbers are what a base-model approach looks like when it ships.
Which brings it back to the user on Little Red Book. In two seconds, before writing a single word, the model assessed their emotional state, chose a tone, and planned how to leave the conversation open. That's what the Yuanbao team means by more genuinely human interaction. It's also what the engineers mean when they say demand understanding showed the biggest improvement of any metric tested.
Tencent has calibrated Hy3 preview to respond like a friend: present, practical, conversationally fluent. As models converge on baseline capability, tone and interaction quality become real differentiators, and for a company built on social platforms that alignment is structural. The caveat is that optimising for emotional attachment is a different proposition from warmth that emerges naturally.
Tencent has open-sourced the model on GitHub and Hugging Face, priced API access at roughly a tenth of GPT-4-class rates, and framed the release as the first step, not the final form. Chief AI Scientist Shunyu Yao says the team is "exploring non-homogeneous capabilities": features shaped by specific products and users, not better versions of what every model can do. The loop is running. Whether it can keep accelerating is the question.