Using GitHub Copilot Affordably in Neovim

Feb 3, 2026 · updated Feb 3, 2026

Translation Notice: This article was translated from the original by LLM and may contain inaccuracies. Please refer to the original article for the most accurate content.

Using AI to assist in programming is no longer a novel topic. From GitHub Copilot at the beginning of last year to more recent tools like Cursor and Claude Code, which lean towards Agent-based architectures, AI and “vibe coding” have become increasingly prevalent in software development.

Requirements

As a long-time Neovim enthusiast, I have a few specific needs:

I don’t fully trust delegating code entirely to an AI via Agent architectures for automated generation.
- I’ve tried it, but it ultimately requires heavy code reviews, and the sense of control over the codebase is significantly diminished.
However, I strongly desire to leverage AI for assistive tasks:
- For instance, code completion, snippet generation, and documentation.
- Even some simple Agent functionalities, such as tool-calling (MCP, function calling, etc.) within a chat session to help me quickly understand repository structures or write scoped code. My role would be to manually approve tool calls and interpret its relatively concise responses (unlike vibe coding, which requires massive effort to grok the generated output).
I refuse to leave my “cyber-worn,” heavily seasoned Neovim environment for my primary languages.
- I have invested years into configuring various plugins, LSPs, and debuggers.
- I don’t want to switch to VSCode or other IDEs just for AI assistance, even if they offer official support.

Naturally, these needs could be met by purchasing APIs from major model providers, but that presents a significant hurdle:

Cost. If used frequently for chat and tool calling, the expenses can skyrocket.

Admittedly, cost is the only real issue, but it’s sufficient to deter me from using those raw APIs. For GitHub Copilot, however, student discounts provide unlimited access to base models like GPT-4.5GPT-4.5 and a generous quota for premium models.

Billing Model

The keyword here is “request quota.” GitHub Copilot’s billing model differs from typical provider APIs. While providers like OpenAI or Anthropic usually charge based on token count, GitHub Copilot (for individuals/students) effectively operates on a “per-request” basis (refer to Billing Standards). This means for complex tasks—like code completion or snippet generation—Copilot’s cost is much lower than token-based APIs, which might consume thousands of tokens per turn. (Of course, I wouldn’t use a coding-oriented Copilot account for casual life chats).

In this “per-request” framework, my goal is to use GitHub Copilot in Neovim to balance utility (accessing high-end models like Claude-Opus-4.5 or GPT-5.2-codex for complex problems) with cost-efficiency.

The Solution

While GitHub does not provide an official Copilot Chat plugin for Neovim (only the autocomplete plugin exists, with community lualua implementations), the community has stepped up with excellent alternatives like CopilotChat.nvim and avante.nvim. These plugins generally follow a consistent approach:

Use public GitHub OAuth authentication to log into Copilot services.
Reverse-engineer the Copilot Chat API used in VSCode to implement a Neovim client.

Since these plugins aim to support multiple backends, they are abstracted into:

Frontend: Neovim UI.
Adapter Layer: An abstracted structure for sending and receiving messages.
Backend: Specific providers (OpenAI, Anthropic, GitHub Copilot, etc.).

This architecture allows for flexibility and code reuse; once the Copilot Chat API is reversed, as long as an OAuth token is obtained, the adapter and frontend logic remain the same.

However, because Copilot Chat’s billing model (per-request) differs from standard APIs (per-token), specific issues arise.

Issues

I personally prefer using CopilotChat.nvim as my client due to its clean design, restrained functionality (focusing on coding assistance rather than being a full agent), and support for the latest tool-calling APIs.

Dynamic Model Selection

The first issue involves “Dynamic Model Selection.” As a hub for various models, GitHub introduced this feature in the latest Copilot Chat API to encourage the use of advanced models while managing server load (see Auto Model Selection). Its current behavior is:

The user selects the AutoAuto option in model selection.
The server evaluates current usage and task complexity to return a specific model (e.g., GPT-5.2, Claude-Sonnet-4.5).
The client uses this model for that specific session request.

Remark

GitHub has stated that this routing will become increasingly intelligent, evaluating task complexity to select the most appropriate model. In the source code of the latest vscode-copilot-chat, we can already see AutoModeRouterUrlAutoModeRouterUrl configurations marked as advanceadvance or experimentalexperimental (source), along with related logic and issues. Since these are experimental and subject to change, we will set them aside for now.

Remark

This behavior is unique to GitHub Copilot. Standard API providers don’t dynamically schedule models like this. For Copilot users, this feature is a vital “cost-saving” mechanism because premium models often complete tasks in fewer turns, and diverted premium models are billed at a discount compared to explicitly selecting them.

Analysis and Solution

Because standard APIs lack this feature, community Neovim plugins often overlook it, preventing users from benefiting from dynamic model selection.

The fix is simple: patch the plugin. For the frontend, I suggest mirroring VSCode’s behavior by adding autoauto as a model option. For the backend, we need to analyze VSCode’s network traffic to understand the request flow.

Warning

Reverse-engineering “Dynamic Model Selection” may violate GitHub Copilot’s Terms of Service. Please assess the risks yourself. This article is for technical research and educational purposes only and does not constitute legal advice.

Warning

Capturing VSCode packets is straightforward: set Http: ProxyHttp: Proxy to a tool like Charles, Fiddler, or Proxyman, and set Http: Proxy SupportHttp: Proxy Support to onon.

Through packet analysis, we see the workflow (excluding daemon requests):

First, a POSTPOST request to /models/session/models/session with the body:
{

"auto_mode": {

"model_hints": ["auto"]

}

}
{

"auto_mode": {

"model_hints": ["auto"]

}

}
{

"auto_mode": {

"model_hints": ["auto"]

}

}
{

"auto_mode": {

"model_hints": ["auto"]

}

}
The response looks like:
{

"available_models": ["gpt-4.1", "gpt-4o", "claude-sonnet-4.5", ...],

"selected_model": "claude-haiku-4.5",

"session_token": "xxx",

"expires_at": 1770013986,

"discounted_costs": { "claude-haiku-4.5": 0.1, ... }

}
{

"available_models": ["gpt-4.1", "gpt-4o", "claude-sonnet-4.5", ...],

"selected_model": "claude-haiku-4.5",

"session_token": "xxx",

"expires_at": 1770013986,

"discounted_costs": { "claude-haiku-4.5": 0.1, ... }

}
{

"available_models": ["gpt-4.1", "gpt-4o", "claude-sonnet-4.5", ...],

"selected_model": "claude-haiku-4.5",

"session_token": "xxx",

"expires_at": 1770013986,

"discounted_costs": { "claude-haiku-4.5": 0.1, ... }

}
{

"available_models": ["gpt-4.1", "gpt-4o", "claude-sonnet-4.5", ...],

"selected_model": "claude-haiku-4.5",

"session_token": "xxx",

"expires_at": 1770013986,

"discounted_costs": { "claude-haiku-4.5": 0.1, ... }

}
After receiving the selected_modelselected_model and session_tokensession_token, the client sends a POSTPOST to /chat/completion/chat/completion. The headers include the session_tokensession_token (though tests show it works without it), and the body specifies the model:
{

"model": "claude-haiku-4.5",

...

}
{

"model": "claude-haiku-4.5",

...

}
{

"model": "claude-haiku-4.5",

...

}
{

"model": "claude-haiku-4.5",

...

}
The server returns the AI’s response, including any tool calls.

By replicating this sequence—requesting /models/session/models/session during initialization and caching the selected_modelselected_model—we can support dynamic selection in Neovim. I have submitted this modification as a Pull Request to CopilotChat.nvim, which has been merged.

Tool Calling

The second issue concerns Tool Calling. This mechanism allows AI to execute external tools (like searching code or running scripts). While supported in VSCode and the latest Copilot API, I noticed a discrepancy: why does using tools in CopilotChat.nvim consume significantly more quota than in VSCode?

Back to the VSCode source code. The logic resides in toolCallingLoop.tstoolCallingLoop.ts. The plugin uses a boolean to track the request initiator:

/** Indicates if the request was user-initiated */

userInitiatedRequest?: boolean;

/** Indicates if the request was user-initiated */

userInitiatedRequest?: boolean;

/** Indicates if the request was user-initiated */

userInitiatedRequest?: boolean;

/** Indicates if the request was user-initiated */

userInitiatedRequest?: boolean;

It checks this value to decide whether to bill the request:

User initiates a request: userInitiatedRequestuserInitiatedRequest is truetrue, and it counts against the quota.
AI returns a tool call, the user confirms/executes it, and the results are sent back: userInitiatedRequestuserInitiatedRequest is falsefalse, and the request is not billed.

This allows VSCode users to only pay for their own prompts, not the internal overhead of tool calling.

This mechanism is absent in token-based APIs because they don’t distinguish between user and agent requests—everything is just tokens. Consequently, community plugins often miss this, causing all tool-related turns to burn user quota.

Remark

During my research, I found a Zhihu article on this. At first, the tone sounded like AI-generated fluff, but upon closer inspection, it cited specific OpenCodeOpenCode implementations and verified technical details. It goes to show that while AI-generated content can feel unreliable, if you focus on the results rather than the tone, even “misleading” information can lead to “enlightenment.”

Remark

In the final /chat/completion/chat/completion request, userInitiatedRequestuserInitiatedRequest is passed in the x-initiatorx-initiator header. When x-initiatorx-initiator is useruser, it counts; when it is agentagent (for tool result returns), it does not.

Implementing this in CopilotChat.nvim was simple: detect if a request is a tool response and set the x-initiatorx-initiator header accordingly. This fix was merged in PR 1520.

Back to all articles