Hummingbird Labs

What to Do When AI Coding Tasks Exceed Context Limits or Output Length?

Mon, 25 May 2026 14:42:00 +0800

When I use the Claude Sonnet 4.6 model for programming, I frequently encounter two situations:

A single session’s task exceeds the context limit or output length, causing a significant drop in code quality.
The AI generates an excessively long response, triggering the error “Sorry, the response hit the length limit. Please rephrase your prompt.”

GitHub Codespaces has a very useful feature: you can see the token consumption status for the current session.

My solution is: before exceeding the context limit, open a new session and re-describe the unfinished task.

I strongly recommend never letting a single session’s task exceed the context limit; otherwise, the quality of the AI-generated code drops significantly. I’ve run into this more than once in my own development work.
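The hand-off rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 160K limit is the figure discussed in this post, the 80% hand-off margin is an arbitrary choice of mine, and `SessionBudget` and `handoff_prompt` are hypothetical helpers, not part of Copilot or Codespaces. The token counts would be read manually from the session’s token consumption display.

```python
from dataclasses import dataclass

# Assumed numbers: 160K is the threshold mentioned in this post; the 80%
# hand-off margin is an arbitrary illustration, not a measured value.
CONTEXT_LIMIT_TOKENS = 160_000
HANDOFF_RATIO = 0.8


@dataclass
class SessionBudget:
    """Tracks how much of the context window one session has consumed."""
    limit: int = CONTEXT_LIMIT_TOKENS
    used: int = 0

    def record(self, tokens: int) -> None:
        """Add the token count reported by the tool for the latest turn."""
        self.used += tokens

    def should_hand_off(self) -> bool:
        """True once usage crosses the hand-off margin."""
        return self.used >= self.limit * HANDOFF_RATIO


def handoff_prompt(done: list[str], remaining: list[str]) -> str:
    """Build the text to paste into a fresh session."""
    lines = ["Completed so far:"]
    lines += [f"- {item}" for item in done]
    lines.append("Unfinished work to continue:")
    lines += [f"- {item}" for item in remaining]
    return "\n".join(lines)
```

When `should_hand_off()` returns true, the output of `handoff_prompt(...)` becomes the opening message of the new session, so the unfinished task is re-described before quality starts to degrade.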

It is widely understood that an overly long context makes AI hallucinations more likely. I don’t think the model’s own capability is the cause, though; 160K tokens should be well within its reach. Another possibility is that Copilot’s context management strategy is responsible.

I Asked Gemini 3.5 Flash and Got This Response:

When coding quality drops significantly after exceeding 160K tokens - this is actually a classic “scapegoat” phenomenon between underlying model capabilities and engineering wrappers.

This is mainly caused by GitHub Copilot’s context handling (engineering truncation, retrieval-augmented suppression, etc.), not the limits of the Claude 3.5 Sonnet model itself.

Core Reason: Copilot’s “Context Management” Strategy

Claude 3.5 Sonnet itself has a 200K token context window. Under ideal conditions, it maintains extremely high “Needle In A Haystack” retrieval capabilities across the entire 200K range. But inside Copilot, things get complicated:

Sliding Window and Implicit Truncation: To save token costs (API fees) and ensure response speed, Copilot rarely sends the entire 160K+ raw conversation history word-for-word to Anthropic’s servers. It uses special algorithms for summarization, sliding window truncation, or vector retrieval (RAG).

Information Distortion: When your session is extremely long, what Copilot sends to Claude may no longer be the code you originally wrote, but “second-hand context” compressed and refined by Copilot. This engineering processing causes logical gaps in the code the model receives, and code quality naturally snowballs downward.

System Prompt Interference: Copilot injects very heavy system prompts (to constrain its behavior as an IDE programming assistant). When the context is extremely long, the model may experience conflicts in attention allocation between “following the Copilot framework” and “understanding user’s long code”.

My Solution: Let AI Design Phased Tasks, Open New Sessions, and Re-describe New Stage Tasks

Taking code review as an example:

Session 1: Have the AI perform a code review and rank the issues to fix by priority.
Session 2: Open a new session, paste the review conclusions as the prompt, and have the AI fix the P0 issues.
Session 3: Open another session, paste the review conclusions again, and have the AI fix the P1 issues.

This way, you split one code review + bug fix into different sessions, which not only ensures coding quality but also reduces context consumption (reducing your costs).
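To make the phased workflow concrete, here is a minimal Python sketch that turns review findings into one prompt per session. The `(priority, description)` tuple format and the `plan_sessions` helper are assumptions for illustration only; in practice the review conclusions are whatever text the AI produced in Session 1.

```python
def plan_sessions(findings: list[tuple[str, str]]) -> list[str]:
    """Return one prompt per priority level, each carrying the full review.

    findings is a list of (priority, description) pairs such as ("P0", "...").
    Priorities are ordered as plain strings, which is fine for P0..P9.
    """
    # Every session receives the complete review conclusions again.
    review = "\n".join(f"- [{p}] {d}" for p, d in sorted(findings))
    priorities = sorted({p for p, _ in findings})
    return [
        f"Code review conclusions:\n{review}\n\n"
        f"In this session, fix only the {p} issues."
        for p in priorities
    ]


if __name__ == "__main__":
    # Hypothetical findings, not taken from a real review.
    demo = [("P1", "Rename unclear variables"), ("P0", "Fix null dereference")]
    for number, prompt in enumerate(plan_sessions(demo), start=2):
        print(f"--- Session {number} ---\n{prompt}\n")
```

Each prompt is small compared with a long-running conversation, so every fix session starts with a short, clean context.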

A Brief About Me

I’ve worked at NetEase Games, Baidu, Tencent (8 years), and Meituan (nearly 7 years), leading large-scale R&D projects and managing teams of 100+ engineers.

Currently, I’m pursuing entrepreneurship in the AI field.

Why? The world runs on uncertainty — staying in corporate roles too long breeds addiction to certainty. Starting an AI venture is like setting sail into uncharted waters.

Feel free to reach out: mailto:HummingbirdLabs@outlook.com.

DeepSeek V4 Pro Price Drops Again on May 23, 2026

Sun, 24 May 2026 09:36:00 +0800

I was worried about DeepSeek returning to full price in June 2026, but unexpectedly the price dropped again yesterday.

Here’s the information from DeepSeek’s official website: for all models, the input cache hit price has been reduced to 1/10 of the launch price. This price adjustment takes effect from 2026/4/26 12:15 UTC.

The deepseek-v4-pro model API pricing will be officially adjusted to 1/4 of the original price after the 75% discount promotion ends on 2026/05/31 15:59 UTC.

This means DeepSeek V4 Pro will permanently stay at 25% of the original price.

DeepSeek is so generous that I felt I had to support them by adding funds, so I happily topped up another 700 RMB (about $100 USD). Let’s see how long this $100 credit will last.

Model capabilities still need improvement. Fixing certain UI interaction bugs isn’t necessarily faster than human developers.

I’ve found that on Windows WPF UI, all models (deepseek-v4-pro, Qwen3.6 Plus, Claude Sonnet 4.6, etc.) don’t perform very well. They easily produce compilation errors, runtime errors, and various UI data passing anomalies.

Is it because the Windows tech stack has fewer training materials for models, plus the numerous and complex library versions, leading to poor model performance?

My current solution is: add detailed local logs in debug mode to provide AI with more runtime information.

Static code + runtime logs = complete program information.

Simply letting the LLM review code is not enough, because the LLM can only judge program behavior from static code, and that involves many hidden assumptions. When the AI reviews only a portion of the code, it assumes the other modules are working correctly.

If you let AI review all code in a project, the context becomes too long, leading to AI hallucinations (I mentioned this in a previous blog).

So I strongly recommend: always add detailed local logs in debug mode to provide AI with more runtime information.
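Here is a minimal sketch of what “detailed local logs in debug mode” can look like. My own projects are C# and WPF, so treat this Python version as a language-neutral illustration: `make_debug_logger` and `apply_filter` are hypothetical names, and the log format is just one reasonable choice. The idea is that every UI-side operation records its inputs and outcome to a local file that can be pasted into the AI session alongside the code.

```python
import logging


def make_debug_logger(name: str, log_path: str, debug: bool = True) -> logging.Logger:
    """Create a file logger that is verbose in debug mode and quiet otherwise."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.WARNING)
    # Drop handlers from earlier calls so lines are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s.%(funcName)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def apply_filter(logger: logging.Logger, image_id: str, width: int, height: int) -> bool:
    """Hypothetical UI operation that logs its inputs and its outcome."""
    logger.debug("apply_filter start image_id=%s size=%dx%d", image_id, width, height)
    if width <= 0 or height <= 0:
        logger.error("apply_filter rejected invalid size image_id=%s", image_id)
        return False
    logger.debug("apply_filter done image_id=%s", image_id)
    return True
```

With logs like these, the AI no longer has to assume that the data passed between UI layers is correct; it can read what actually happened at runtime.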

More on TRAE China Version: Free Models Are Great But Slow

Fri, 22 May 2026 21:16:00 +0800

TRAE China Version: Real Data on Free Model Speed

Let me start with the conclusion: if you use the free models in the TRAE China version, they will be slower than custom paid models (even when comparing the same model).

The free models are mainly slower in 3 aspects:

When processing large tasks, free models may display a prompt: “Model has reached the maximum number of thinking attempts. Please type ‘continue’ to get more results.” When this happens, you need to manually type “continue” to proceed. As shown in Figure 1.
Calling free models requires queuing; this was already mentioned in the previous blog. As shown in Figure 2. Additionally, during task processing, there’s a chance you’ll need to queue again.
The DeepSeek V4 Pro model has think mode enabled by default, with a thinking depth of around 200, which makes it slower when processing large tasks. To illustrate this point, I recorded the execution time from my own project.

The specific execution time for a task was: begin 11:34 / end 12:24. As shown in Figure 3.

But here’s what I found: the free Qwen3.6 Plus model is significantly faster than the paid DeepSeek V4 Pro model when handling large tasks — and I mean much faster. Moreover, the free Qwen3.6 Plus model has shorter queue times.

Those Are the Real Data and Facts. Now Here’s My Opinion: DeepSeek Is a Great Company, and Its Pricing and Services Truly Benefit the Public

I can understand that DeepSeek is still in development, which is why there are some limitations.

But the technology and services they provide are deeply imbued with a sense of human mission; based on this sense of mission and responsibility, this company is contributing to increasing the average intelligence of humanity.

This sense of mission and responsibility, in my personal view, stems from the founder’s simple beliefs and character. In short, it’s not about money — it’s about serving the people.

Why I Use TRAE: Free LLMs, Stability, and 1M Token Context

Fri, 22 May 2026 14:37:00 +0800

My Main Reason for Using TRAE: Free Programming LLMs

Yes — the TRAE China version lets you try multiple large models for free. As shown in the screenshot, all of these models are available at no cost for trial. Here’s the full list of free models:

Doubao-Seed-2.0-Code, Doubao-Seed-1.8, Doubao-Seed-Code, MiniMax-M2.7, MiniMax-M2.5, GLM-5.1, GLM-5V-Turbo, GLM-5, DeepSeek-V4-Pro, DeepSeek-V4-Flash, Kimi-K2.6, Kimi-K2.5, Qwen3.6-Plus, Qwen3.5-Plus.

But here’s the catch: when using these free models, you often need to wait anywhere from 1 to 10 minutes. In my experience, the average wait is around 3 minutes. But honestly — when you’re heading to bed or stepping away for a coffee, waiting 3–10 minutes is perfectly acceptable.

Another thing worth noting: TRAE also supports custom models. You can top up credits directly on DeepSeek’s official platform, or on Alibaba Cloud, then use your API key inside TRAE to call models. As shown below:

My Second Main Reason for Using TRAE: Fewer Freezes and Timeouts During Task Execution

When I previously used Copilot’s LLMs for AI coding, a recurring problem was the model getting stuck on a command, effectively blocking all subsequent tasks.

On TRAE, I encounter far fewer of these situations. Moreover, the entire workflow requires very few manual permission confirmations. This frees up my time and lets me run more tasks in parallel.

In fact, I’m currently juggling 4 projects simultaneously:

TRAE: rendering astronomical survey data into images.
GitHub Codespaces: an offline old-photo AI restoration tool built with C# and WPF on Windows.
Local VS 2026 IDE: a pet costume image generator built with C# and WPF — for example, dressing a puppy in a spacesuit or a kitten in a gothic dress.
Local VS 2026 IDE: deploying LLMs locally on Windows with C# and WPF, and benchmarking model performance across different GPUs and CPUs.

My Third Main Reason for Using TRAE: DeepSeek v4 Pro Supports a 1-Million-Token Context Window

I’ve observed that Claude Sonnet 4.6 and Opus 4.7 both show noticeable code quality degradation once the task context exceeds 168K tokens.

DeepSeek v4 Pro, by contrast, supports a 1-million-token context window. This allows it to maintain consistent code quality even when working on large-scale projects.

My Next Blog: Rendering Astronomical Survey Data into Images

I love astronomy. I love looking at images of the universe. That’s why I built this project. I hope to share it with you soon — I think you’ll enjoy it too.

Beyond Earth lie the stars and the vast cosmic ocean. That is the ultimate destination for humanity.

Using Qwen 3.6 Plus: Great but a Bit Expensive

Fri, 22 May 2026 08:22:00 +0800

I Think Qwen 3.6 Plus Has Strong Coding Capabilities, But My Costs Are Higher Than Expected

I compared two approaches:

1. Using Qwen 3.6 Plus to write large-scale C# programs, then having DeepSeek v4 Pro conduct code reviews.
2. Using DeepSeek v4 Pro to write large-scale C# programs, then having Qwen 3.6 Plus conduct code reviews.

I prefer the second approach for these reasons:

1. DeepSeek v4 Pro supports a context length of up to 1 million tokens. For large projects, this helps maintain clear logical connections between modules. Additionally, DeepSeek v4 Pro is currently more affordable (until May 31, 2026, it’s offered at 25% of the regular price; see screenshots in my previous blog).
2. Qwen 3.6 Plus delivers higher code quality but at a higher cost. Using it only for code reviews helps reduce overall expenses.

Below is a partial cost breakdown from my usage of Qwen 3.6 Plus. It might look cheap at first glance: one entry shows 876K tokens costing 1.7 RMB (≈ $0.24). But in practice, completing a single large engineering task often costs 30 RMB (≈ $4.00). The credits I top up on Alibaba Cloud deplete much faster with Qwen than with DeepSeek.

Another important note: Alibaba grants new users 1 million free tokens for many models, as shown below.

But is 1 million tokens truly generous? From my hands-on coding experience: 1 million tokens only cover 1–3 large programming tasks or several code reviews. For heavy AI-assisted coding users, 1 million tokens feel like a 100ml beer—barely a sip.

So, if an article boasts about “burning 100 million tokens” as though that were a lot, it likely reflects limited real-world AI coding experience.

To wrap up, I’d like to acknowledge:

1. ByteDance’s TRAE IDE for its innovation.
2. DeepSeek v4 Pro for its generous long-context support and current affordability (I’ll share updated billing data in June).
3. Qwen 3.6 Plus for its strong coding capabilities and responsive API.

In upcoming blogs, I’ll detail how to leverage AI coding within TRAE.

DeepSeek v4 Pro, Qwen 3.6 Plus, or Others: Which Should I Use?

Thu, 21 May 2026 19:11:00 +0800

I Like DeepSeek and Qwen

Before May 2026, I had never used DeepSeek, Qwen 3.6 Plus, or any other Chinese LLMs for programming. As readers of my previous blog might recall, I primarily relied on GitHub Copilot’s models, favoring Claude Sonnet 4.6 and Claude Opus 4.7 (a bit pricey; if you’re wealthy, pretend I didn’t say that). My secondary choice was GPT Codex 5.3.

So when I first considered using DeepSeek or Qwen 3.6 Plus, I was skeptical—worried their code quality wouldn’t meet my standards.

I knew strategies like syntax/structure constraints and cross-model code reviews could mitigate risks, but I still wanted the base model’s capability to be as strong as possible.

First Steps with DeepSeek

I started by topping up credits on DeepSeek’s official platform. Proof below:

Over the next few days, I intensively tested DeepSeek v4 Pro. To give you a clear picture, here’s my usage breakdown:

May 17, 2026

Cost: 18.28 RMB (≈ $2.53)
Total tokens: 66,488,180
Input (cached): 61,606,016
Input (uncached): 4,193,347
Output: 687,817

May 20, 2026

Cost: 6.61 RMB (≈ $0.92)
Total tokens: 38,690,681
Input (cached): 37,049,600
Input (uncached): 1,387,345
Output: 253,736

If I maintain my recent high-intensity AI coding pace with DeepSeek v4 Pro: Daily cost: ~40 RMB (≈ $5.55); Monthly cost (30 days): ~1,200 RMB (≈ $166.50).
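For readers who want to check the arithmetic, here is a small Python sketch over the figures quoted above. The `DailyUsage` class and `monthly_cost` helper are hypothetical names for illustration; the only inputs are the numbers from my billing page and the ~40 RMB/day estimate. Note that the per-million figure is a blended rate across cached input, uncached input, and output, not an official price.

```python
from dataclasses import dataclass


@dataclass
class DailyUsage:
    """One day of usage, copied from the figures quoted above."""
    date: str
    cost_rmb: float
    cached_input: int
    uncached_input: int
    output: int

    @property
    def total_tokens(self) -> int:
        # Recomputed from the three components rather than taken from the bill.
        return self.cached_input + self.uncached_input + self.output

    @property
    def cache_hit_ratio(self) -> float:
        """Share of input tokens served from the cache."""
        return self.cached_input / (self.cached_input + self.uncached_input)

    @property
    def rmb_per_million(self) -> float:
        """Blended cost per million tokens for the day."""
        return self.cost_rmb / (self.total_tokens / 1_000_000)


def monthly_cost(daily_rmb: float, days: int = 30) -> float:
    """Simple projection that assumes the same spend every day."""
    return daily_rmb * days


MAY_17 = DailyUsage("2026-05-17", 18.28, 61_606_016, 4_193_347, 687_817)
MAY_20 = DailyUsage("2026-05-20", 6.61, 37_049_600, 1_387_345, 253_736)

if __name__ == "__main__":
    for day in (MAY_17, MAY_20):
        print(f"{day.date}: cache hit {day.cache_hit_ratio:.1%}, "
              f"{day.rmb_per_million:.2f} RMB per million tokens")
    print(f"Projected month at ~40 RMB/day: {monthly_cost(40):.0f} RMB")
```

The high cache hit ratio on both days is a large part of why the daily bill stays low despite tens of millions of tokens.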

Is this cheap? Compared to Copilot Pro+ ($39/month for 1,500 premium requests, where one request is, for example, one Claude Sonnet 4.6 call), no.

But compared to Copilot’s post-June 2026 pricing (see my first blog), it’s a bargain.

Important Note: DeepSeek v4 Pro is currently offered at 25% of the regular price (a 75% discount) until May 31, 2026 (see screenshot below). After June, prices will revert to standard rates.

I’ll share updated billing data in a follow-up blog to track real-world costs post-discount.

What’s Next?

In my next post, I’ll analyze Qwen 3.6 Plus’s AI coding costs.

After that, I’ll dive into:

1. Token-saving strategies without sacrificing code quality.
2. Cost-cutting methods that don’t rely on reducing token usage.
3. Balancing affordability and reliability: how to save money while maintaining high code standards.

A Shoutout to Google Gemini

Today, I must praise Google Gemini. When I pasted an image asking for help, it returned a step-by-step guide image—truly impressive!

About Me

I’ve worked at Tencent (8 years), Meituan (7 years), Baidu, and NetEase Games, leading large-scale R&D projects.

Now, I’m building an AI startup—because uncertainty fuels innovation, and corporate roles breed complacency.

Reach out: mailto:HummingbirdLabs@outlook.com.

Let’s discuss AI coding, cost optimization, or the future of LLMs.

GitHub Copilot’s June 2026 Billing Changes: My April and May Statements (Preview) Shocked Me

Tue, 19 May 2026 20:00:00 +0800

I Love GitHub Copilot, But Its June 2026 Billing Changes Worried Me

Let me be clear: I genuinely love GitHub Copilot. As a loyal user and Copilot Pro+ subscriber ($39.00 per month), the 1,500 premium requests per month shown below have been invaluable—it’s the fuel behind my AI-powered coding workflow.

I’d call this the most affordable, seamless token fuel for AI coding available. I’m deeply grateful to Microsoft for this service—though it’s disappointing that starting June 2026, billing will shift to a per-token model.

Using Microsoft’s Preview tool (https://copilot-billing-preview.github.com/), I analyzed my April and May statements. Under the new June pricing:

1. April would cost $141.04.
2. May (through May 18) would cost $425.15.

This is extremely expensive. Extremely expensive. Extremely expensive. (Yes, I’m repeating it three times.) I understand Microsoft’s pricing strategy—they rely on third-party models and lack full control over upstream LLM costs. Still, the jump is staggering.
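To put the jump in numbers, here is a rough Python sketch comparing the preview figures with the current $39 subscription. The full-month projection for May is my own linear extrapolation (it assumes daily spend stays constant after May 18), so treat it as an estimate, not something the preview tool reported. The function names are hypothetical.

```python
SUBSCRIPTION_USD = 39.00    # Copilot Pro+ monthly price
APRIL_PREVIEW_USD = 141.04  # full month, from the preview tool
MAY_PREVIEW_USD = 425.15    # May 1 through May 18 only
MAY_DAYS_BILLED = 18
MAY_DAYS_TOTAL = 31


def multiple_of_subscription(cost_usd: float) -> float:
    """How many times the flat subscription price a per-token bill would be."""
    return cost_usd / SUBSCRIPTION_USD


def project_full_month(partial_usd: float, days_billed: int, days_total: int) -> float:
    """Linear extrapolation: assumes the daily spend stays constant."""
    return partial_usd / days_billed * days_total


if __name__ == "__main__":
    print(f"April: {multiple_of_subscription(APRIL_PREVIEW_USD):.1f}x the subscription")
    print(f"May so far: {multiple_of_subscription(MAY_PREVIEW_USD):.1f}x the subscription")
    may_full = project_full_month(MAY_PREVIEW_USD, MAY_DAYS_BILLED, MAY_DAYS_TOTAL)
    print(f"May projected (assumption): ${may_full:.2f}")
```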

What’s Next?

Given this, I’ll now explore more affordable large models that can handle complex coding tasks, like DeepSeek v4 Pro and Qwen 3.6 Plus. My next blog will compare their coding capabilities and cost efficiency. For developers deeply reliant on AI coding, tokens should feel as abundant and accessible as rain, not a luxury resource.

Feel free to reach out to discuss AI coding tools, cost strategies, or stormy billing surprises: mailto:HummingbirdLabs@outlook.com.

P.S. As a former engineer at Tencent (8 years), Meituan (7 years), Baidu, and NetEase Games, I’ve seen tech pricing shifts before. But this one stings.