The Week That Cost £600

You are deep in a Playwright test automation project for a corporate system that has more layers than an onion company’s organogram. The agent session has been running for four hours, refactoring test fixtures, generating page objects, and retrying flaky assertions across three different browser contexts. You glance at your GitHub Copilot dashboard and notice something unexpected: your monthly AI credits are nearly gone, and it is only Tuesday.

The question that keeps developers awake at night is straightforward: at what point does buying a gaming rig and running an open model locally become the cheaper option?

Understanding Copilot Plans, Credits and Costs

Before diving into the numbers, it helps to understand what GitHub Copilot actually offers and how its credit system works in practice.

The Plan Options

GitHub currently offers six Copilot plans:

| Plan | Monthly price (UK) | Included AI credits (face value) | Best for |
| --- | --- | --- | --- |
| Copilot Free | £0 | ~£0.40 worth | Occasional users, students |
| Copilot Pro | £8 ($10) | 1,500 (£12 / $15) | Individual developers with light usage |
| Copilot Pro+ | £31 ($39) | 7,000 (£56 / $70) | Power users needing more credits |
| Copilot Max | £80 ($100) | 20,000 (£160 / $200) | Heavy agentic coders like the author |
| Copilot Business | £15 ($19) per user | 1,900 per user (£15 / $19), plus promo credits | Teams needing org-level features |
| Copilot Enterprise | £31 ($39) per user | 3,900 per user (£31 / $39), plus promo credits | Large organisations |

The key thing to understand is that credits are a currency, not a cap. Each individual paid plan includes a monthly credit allowance whose face value exceeds the subscription price: Pro gets £12 of credits for £8, Max gets £160 for £80. This subsidy makes even heavy usage feel like a deal, at least until you see how fast agentic sessions burn through them.

What AI Credits Actually Buy You

One credit equals $0.01 USD (approximately £0.008). Crucially, code completions – the tab-completion autocomplete most people associate with Copilot – remain free and unlimited on all paid plans. They do not consume credits at all.

Credits are only consumed by chat interactions, agentic coding sessions, and premium model access. Here is what different credit amounts translate into for agentic coding workflows:

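| Credits spent | GPT-5.4 nano output | Claude Sonnet 4.6 output | Claude Opus 4.8 output |
| --- | --- | --- | --- |
| 40 credits ($0.40, about £0.32) | 320K output tokens | 26.7K output tokens | 16K output tokens |
| 100 credits ($1.00, about £0.80) | 800K output tokens | 66.7K output tokens | 40K output tokens |
| 500 credits ($5.00, about £4.00) | 4M output tokens | 333K output tokens | 200K output tokens |
| 1,000 credits ($10.00, about £8.00) | 8M output tokens | 667K output tokens | 400K output tokens |

These figures count output tokens only, at $1.25, $15 and $25 per million output tokens respectively; input tokens are billed on top.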

These numbers look generous until you consider what a single agentic session actually consumes, because input tokens are billed as well. A moderately complex coding agent step – reading a codebase section, generating refactored code, and writing it back – typically involves 10K input tokens plus 2K output tokens. On Claude Sonnet 4.6 that costs roughly 6 credits per step. Run ten such steps during a focused agentic session and you have spent 60 credits (£0.48). A full day of aggressive agentic work – fifty steps across multiple files – consumes about 300 credits (£2.40).
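The per-step arithmetic can be reproduced in a few lines. This is a minimal sketch, assuming Claude Sonnet 4.6 rates of $3 per million input tokens and $15 per million output tokens (the figures listed in the per-token cost matrix further down) and the article's working exchange rate of £0.80 to the dollar. The function and constant names are illustrative.

```python
# Sketch: credits consumed by one agent step, from token counts and per-token rates.
# Rates are the Claude Sonnet 4.6 figures quoted in this article; names are illustrative.

USD_PER_CREDIT = 0.01
GBP_PER_USD = 0.80  # the article's working conversion ($0.01 is about £0.008)

SONNET_INPUT_USD_PER_M = 3.00
SONNET_OUTPUT_USD_PER_M = 15.00


def step_credits(input_tokens: int, output_tokens: int) -> float:
    """Return the credits charged for one step on Claude Sonnet 4.6."""
    usd = (input_tokens * SONNET_INPUT_USD_PER_M
           + output_tokens * SONNET_OUTPUT_USD_PER_M) / 1_000_000
    return usd / USD_PER_CREDIT


def credits_to_gbp(credits: float) -> float:
    """Convert credits to pounds at the article's exchange rate."""
    return credits * USD_PER_CREDIT * GBP_PER_USD


per_step = step_credits(10_000, 2_000)
print(f"one step: {per_step:.1f} credits")
for steps in (10, 50):
    total = per_step * steps
    print(f"{steps} steps: {total:.0f} credits, £{credits_to_gbp(total):.2f}")
```

Changing the two token counts is the quickest way to see how a larger context pass moves the daily total.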

This is the maths behind the headline numbers. A Max subscriber burning through 20,000 credits in five days is not an anomaly: it is what happens when you run dozens of agentic coding sessions daily on frontier models across a large corporate codebase. Each session reads files, analyses context, generates code, and writes changes, so its token consumption is many times that of a simple chat prompt.

The June 2026 Billing Revolution

On 1 June 2026, GitHub Copilot fundamentally changed how it charges for AI assistance. The old Premium Request Unit system was replaced with GitHub AI Credits – a token-based billing model where one credit equals $0.01 USD (approximately £0.008). The base subscription prices remained unchanged, but the economics beneath them shifted dramatically.

Your Copilot Max Allowance

The Copilot Max plan costs $100 per month (approximately £80 for UK subscribers) and includes 20,000 AI credits – comprising 10,000 base credits plus 10,000 flex credits. At face value, this represents $200 (£160) in credit value, effectively subsidising half your usage.

However, the subsidy vanishes quickly when you are running frontier models in agentic sessions. The flex component is also subject to change at GitHub’s discretion, introducing a layer of uncertainty into any cost planning.

The Per-Token Cost Matrix

The critical insight from June’s billing change is that model choice now dominates your entire bill. Here are the published rates:

| Model | Input per 1M tokens | Cached input per 1M tokens | Output per 1M tokens |
| --- | --- | --- | --- |
| GPT-5.5 | $5.00 (£4.00) | $0.50 (£0.40) | $30.00 (£24.00) |
| Claude Sonnet 4.6 | $3.00 (£2.40) | $0.30 (£0.24) | $15.00 (£12.00) |
| Claude Opus 4.8 | $5.00 (£4.00) | $0.50 (£0.40) | $25.00 (£20.00) |
| MAI-Code-1-Flash | $0.75 (£0.60) | $0.075 (£0.06) | $4.50 (£3.60) |

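The spread is staggering. GPT-5.5 output costs 24 times as much per million tokens as GPT-5.4 nano, which implies a nano output rate of about $1.25 per million tokens, and nearly seven times as much as MAI-Code-1-Flash. A developer who switches between models without tracking consumption is effectively setting fire to their budget.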
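To see how far model choice moves the bill, the sketch below prices one identical workload on each model in the rate table. It assumes the published rates above and one credit per $0.01. The 50% cache-hit share in the second call is a hypothetical illustration, not a measured figure, and the function name is made up for this example.

```python
# Sketch: price one workload on each model in the rate table above.
# Rates (USD per 1M tokens) are copied from the table; the 50% cache-hit
# share used in the second call is a hypothetical illustration.

RATES = {
    # model: (input, cached input, output)
    "GPT-5.5": (5.00, 0.50, 30.00),
    "Claude Sonnet 4.6": (3.00, 0.30, 15.00),
    "Claude Opus 4.8": (5.00, 0.50, 25.00),
    "MAI-Code-1-Flash": (0.75, 0.075, 4.50),
}


def cost_in_credits(model, input_tokens, output_tokens, cached_share=0.0):
    """Credits for one request; cached_share is the fraction of input billed at the cached rate."""
    fresh_rate, cached_rate, output_rate = RATES[model]
    cached = input_tokens * cached_share
    fresh = input_tokens - cached
    usd = (fresh * fresh_rate + cached * cached_rate + output_tokens * output_rate) / 1_000_000
    return usd * 100  # one credit is $0.01


for model in RATES:
    cold = cost_in_credits(model, 250_000, 20_000)
    warm = cost_in_credits(model, 250_000, 20_000, cached_share=0.5)
    print(f"{model:<18} {cold:7.1f} credits uncached, {warm:7.1f} with half the input cached")
```

The same 250K-in, 20K-out request ranges from under 30 credits on MAI-Code-1-Flash to 185 on GPT-5.5, which is why tracking the selected model matters more than tracking the number of requests.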

What Agentic Coding Actually Costs

For a full-time developer running agentic coding sessions – the kind of workflow where you direct an AI agent to explore codebases, generate tests, and refactor architecture across large repositories – here is what individual operations consume:

| Task | Token shape | Claude Sonnet 4.6 | Claude Opus 4.8 |
| --- | --- | --- | --- |
| Small bug fix | 3K in / 1K out | 2.4 credits | 4.0 credits |
| Medium agent step | 10K in / 2K out | 6.0 credits | 10.0 credits |
| Large repo context pass | 80K in / 5K out | 31.5 credits | 52.5 credits |
| Heavy agent iteration | 250K in / 20K out | 105 credits | 175 credits |
| Review-heavy task | 100K in / 40K out | 90 credits | 150 credits |

A single heavy agentic iteration with Claude Opus costs 175 credits – that is $1.75 (£1.40) from your monthly allowance for one operation. For complex Playwright test generation across a large codebase, where the agent must repeatedly read test results, analyse failures, modify page objects, update fixtures, and regenerate assertions, you are easily executing dozens of heavy iterations per session.

The developer experience that prompted this analysis confirmed the mathematics in practice: 20,000 credits consumed within five working days of full-time agentic development. That is approximately 4,000 credits per day, or roughly £32 per day solely for AI assistance on top of the base subscription.

At that burn rate, the 20,000 included credits last one week. The remaining three weeks require either purchasing additional credits at $0.01 each or accepting blocked usage, depending on organisational policy. Over a four-week month, 4,000 credits a day comes to about 80,000 credits: 20,000 included and roughly 60,000 purchased for $600. Add the $100 subscription and the realistic monthly cost for this developer, running Claude Sonnet and Opus models in agentic sessions, is approximately £560 ($700).

To put that in perspective: the total bill is seven times the base subscription price.
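The monthly figure follows from the daily burn rate. Here is a minimal sketch of that projection, assuming the Copilot Max figures quoted in this article ($100 subscription, 20,000 included credits, $0.01 per extra credit) and 20 working days per month. The working-day count is an assumption chosen for illustration.

```python
# Sketch: project a month's Copilot Max bill from a daily credit burn rate.
# Plan figures come from this article; 20 working days per month is an assumption.

SUBSCRIPTION_USD = 100
INCLUDED_CREDITS = 20_000
USD_PER_CREDIT = 0.01
GBP_PER_USD = 0.80


def monthly_bill_usd(credits_per_day: int, working_days: int = 20) -> float:
    """Subscription plus any credits bought beyond the included allowance."""
    used = credits_per_day * working_days
    overage = max(0, used - INCLUDED_CREDITS)
    return SUBSCRIPTION_USD + overage * USD_PER_CREDIT


bill = monthly_bill_usd(4_000)
print(f"${bill:.0f} per month, about £{bill * GBP_PER_USD:.0f}")
```

At 1,000 credits a day or fewer the allowance covers the whole month and the bill stays at the subscription price, so the overage only appears once daily usage passes that threshold.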

A Broader Developer Experience

My experience is not unique. Since the 1 June billing change, developers across Reddit, X, and GitHub forums have documented a wide spectrum of outcomes – from those who barely notice the change to others whose bills have inflated tenfold. TechTimes reported projected cost increases of 10x to 50x for power users running agentic coding sessions. On Reddit, one developer estimated their company’s Copilot bill would jump from $29 (£23) per month to nearly $750 (£600) per month, while another projected $50 (£40) to around $3,000 (£2,400). GitHub’s own community FAQ thread accumulated 435 comments with 904 downvotes – one of the most lopsided reactions in the forum’s history.

Septim Labs published a detailed calculator analysing three representative developer profiles using the Copilot Pro plan (£8/month / $10 per month, 1,000 credits included), which provides useful comparative benchmarks even for Max subscribers:

The Light User – A Non-Event

A developer running 150 chat sessions per month on GPT-5 mini (the cheapest model) at 800 input and 400 output tokens each consumes just 15 credits total – roughly 1.5% of the Pro plan’s 1,000 credit allotment. For this developer, the June change is invisible. This profile represents the majority of Copilot’s user base by most estimates: tab-completion-heavy users who ask occasional questions and rely on the free autocomplete feature that remains unlimited across all plans.

The Moderate User – Manageable with Care

A developer mixing daily chat with four agentic sessions per week on Claude Sonnet and eight code reviews monthly uses approximately 192 credits – 19% of their allowance. The weekly agentic work stays well within budget because the session frequency is low enough to monitor. However, add just two more agentic sessions per week and this profile crosses into the heavy category entirely.

The Heavy User – Where Credits Become a Ceiling

A developer using agentic techniques against a complex codebase – directing Copilot to explore code, generate tests, and refactor architecture across large repositories, with daily Sonnet sessions, chats, Opus brainstorming sessions for architectural decisions, and team pull-request reviews – can exhaust a month’s allowance in days. Scale this to a team and the numbers become unsustainable very quickly.

This is the scenario I lived through: using agentic techniques to work on a complex corporate codebase, getting through 20,000 credits in a single week of full-time development with Claude Sonnet and Opus models powering the agent interactions within Visual Studio Code.

The critical insight from these profiles is that there is no universal answer to whether Copilot remains cost-effective after June 2026. It depends entirely on your workflow profile. Light users save money compared to pre-June because completions are still free and chat on cheap models costs pennies. Heavy agentic users face a fundamentally different product – one where every interaction has a visible token cost and the safety net of unlimited usage is gone.

GitHub’s own product team acknowledged this transformation. Mario Rodriguez, Chief Product Officer, wrote that “Copilot is not the same product it was a year ago.” On Microsoft’s most recent earnings call, CEO Satya Nadella declared that every per-user business at Microsoft – whether productivity, coding, or security – would become a per-user and usage-based business.

The Local Alternative: Hardware Upfront, Pennies Ongoing

The counter-proposal from the open-source camp is straightforward: buy the hardware, run the models locally, pay nothing per token thereafter.

The Hardware Investment

An NVIDIA RTX 4090 with 24GB of VRAM is the minimum viable GPU for running quantised versions of capable coding models locally. Here is the UK pricing as of June 2026:

| Component | Cost |
| --- | --- |
| NVIDIA RTX 4090 24GB | £1,600-1,800 |
| System upgrades (CPU/RAM/PSU if needed) | £300-500 |
| Total one-time investment | £1,900-2,300 |

The RTX 4090’s 24GB VRAM is the critical specification. It can run Qwen 3.6 at Q4 quantisation (requiring approximately 18-20GB VRAM) with some room left for the context window, or comfortably fit smaller variants at higher-precision quantisation with significant headroom for extended context.
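A rough way to sanity-check whether a model fits is to multiply parameter count by bits per weight. The sketch below is a back-of-envelope estimate only: the 32B parameter count and the figure of about 4.5 bits per weight for a 4-bit quantisation are assumptions for illustration, real quantised files vary by format, and the context cache needs memory on top of the weights.

```python
# Sketch: rough VRAM needed for model weights at a given quantisation.
# The 32B parameter count and 4.5 bits per weight are assumptions for illustration;
# real quantised files vary by format, and the KV cache needs space on top.

def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


VRAM_GB = 24  # RTX 4090

for bits in (4.5, 8.0, 16.0):
    need = weight_gb(32, bits)
    verdict = "fits" if need < VRAM_GB else "does not fit"
    print(f"32B at {bits:>4} bits/weight: {need:5.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

Under these assumptions the 4-bit weights take about 18 GB, leaving roughly 6 GB for context, while 8-bit and 16-bit versions of the same model exceed the card.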

The Software Stack

Ollama provides the local inference server, completely free and open-source. The Qwen models are similarly free under their open licence. The Cline extension for Visual Studio Code routes your agentic coding requests to the local Ollama instance instead of GitHub’s servers.
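Under the hood, a tool that targets a local model is talking to Ollama over HTTP. The sketch below shows one way to do that by hand, assuming Ollama's default address of localhost:11434 and its `/api/generate` endpoint. The model tag is a hypothetical placeholder, and how Cline itself issues requests is not covered here.

```python
# Sketch: build a request for a local Ollama server's /api/generate endpoint.
# Assumes Ollama's default address (localhost:11434); the model tag is a
# hypothetical placeholder for whichever Qwen build you have pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "qwen-local-placeholder"  # hypothetical; substitute a tag from `ollama list`


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Return a POST request asking Ollama for a single, non-streamed completion."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def generate(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=300) as reply:
        return json.loads(reply.read())["response"]
```

With the server running and a model pulled, calling `generate("Write a Playwright test that checks the login page title.")` returns the model's text. Nothing in the request leaves the machine, which is the property the rest of this comparison relies on.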

Every token processed costs nothing beyond electricity. A gaming PC running a 32B model locally might draw an additional 300-400 watts under sustained load. At UK electricity rates of approximately £0.25 per kWh, running this hardware for eight hours on each of roughly 22 working days costs £13-18, so roughly £15 per month.
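The electricity estimate is simple enough to check directly. This sketch uses the wattage range, tariff and daily hours quoted above. The two day counts (22 working days, or all 30 days of a month) are assumptions, included to show how much the answer depends on them.

```python
# Sketch: monthly electricity cost of local inference.
# Wattage, tariff and hours come from this article; the day counts are assumptions.

def monthly_cost_gbp(extra_watts: float, hours_per_day: float, days: int,
                     gbp_per_kwh: float = 0.25) -> float:
    """Energy used (kWh) multiplied by the unit price."""
    kwh = extra_watts / 1000 * hours_per_day * days
    return kwh * gbp_per_kwh


for days in (22, 30):  # working days only vs every day of the month
    low = monthly_cost_gbp(300, 8, days)
    high = monthly_cost_gbp(400, 8, days)
    print(f"{days} days: £{low:.2f} to £{high:.2f}")
```

Either way the result is in the tens of pounds per month, an order of magnitude below the Copilot spend it is being compared with.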

The Capability Question

This is where the debate becomes genuinely interesting. A locally run Qwen 3.6 model, while impressive and rapidly improving, does not match Claude Opus 4.8 or GPT-5.5 in reasoning capability. There is a real capability gap between open-weight models running on consumer hardware and frontier models that have hundreds of billions of parameters.

However, Qwen’s development has been steep. For many coding tasks – particularly those within familiar codebases where context window retention provides significant advantage – the local model can be surprisingly effective. The developer in this analysis found that for Playwright test generation on known systems, the local model handled routine patterns well while reserving Copilot sessions for genuinely complex reasoning tasks.

The Breakeven Calculation

This is the number every developer wants to know: when does the hardware investment pay for itself?

Scenario Analysis

| Monthly Copilot spend | Breakeven period | Monthly savings after breakeven |
| --- | --- | --- |
| £200/month (light agentic use) | 13.0 months | £177/month |
| £400/month (moderate agentic use) | 6.1 months | £377/month |
| £560/month (heavy agentic use - author’s experience) | 4.3 months | £537/month |
| £800/month (extensive Opus usage) | 3.0 months | £777/month |

The calculation divides the hardware cost by the monthly saving and assumes:

  • Hardware cost of £2,300 (upper estimate including system upgrades)
  • Monthly electricity cost of £15 for local model inference
  • Continued Copilot Pro base subscription of £8/month for completions and lightweight tasks
  • No depreciation cost for the RTX 4090 (it retains value as a general-purpose GPU)
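The breakeven figure is the hardware cost divided by the monthly saving once the ongoing local costs are subtracted. The sketch below is a minimal version of that calculation, assuming the £2,300 hardware cost and £15 electricity listed above, with the retained Copilot Pro subscription taken as £8, the price in the plans table. The function name is illustrative.

```python
# Sketch: months until local hardware pays for itself.
# Defaults mirror the assumptions listed above, with Copilot Pro taken as £8/month.

HARDWARE_GBP = 2_300
ELECTRICITY_GBP = 15
PRO_SUBSCRIPTION_GBP = 8


def breakeven_months(current_spend: float, hardware: float = HARDWARE_GBP) -> float:
    """Hardware cost divided by the monthly saving after ongoing local costs."""
    saving = current_spend - ELECTRICITY_GBP - PRO_SUBSCRIPTION_GBP
    if saving <= 0:
        raise ValueError("no monthly saving, so the hardware never pays for itself")
    return hardware / saving


for spend in (200, 400, 560, 800):
    print(f"£{spend}/month -> {breakeven_months(spend):.1f} months")
```

Passing `hardware=1_900` shows the lower end of the price range, which shortens each period by roughly a sixth.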

The Capability Adjustment

The table above assumes equivalent capability between local and cloud models, which is not quite accurate. If the local model handles only 70% of tasks effectively – requiring Copilot fallback for the remaining 30% – the numbers change:

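| Monthly Copilot spend (full) | Adjusted Copilot cost (30% fallback + Pro) | Monthly saving after £15 electricity | Breakeven period |
| --- | --- | --- | --- |
| £200/month | £60 + £8 = £68 | £117 | 19.7 months |
| £400/month | £120 + £8 = £128 | £257 | 8.9 months |
| £560/month | £168 + £8 = £176 | £369 | 6.2 months |
| £800/month | £240 + £8 = £248 | £537 | 4.3 months |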

This adjustment is where the decision becomes genuinely personal. If your work involves complex reasoning across unfamiliar domains, the capability gap matters more. If you are working within established codebases – which describes much enterprise software development – the local model’s familiarity with your patterns becomes a genuine advantage.
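Because the fallback share is a personal estimate, it helps to treat it as a parameter. The sketch below is one way to do that, assuming a £2,300 hardware cost, £15 of electricity and an £8 Copilot Pro subscription. The 30% default is the article's example, and any other share you pass in is your own assumption. Here the "new cost" includes electricity as well as the remaining Copilot spend.

```python
# Sketch: breakeven when a share of the work still falls back to Copilot.
# The 30% fallback share is the article's example; other values are hypothetical.

def adjusted_breakeven(full_spend, fallback_share=0.30, hardware=2_300,
                       electricity=15, pro_subscription=8):
    """Return (new monthly cost, monthly saving, breakeven in months)."""
    new_cost = full_spend * fallback_share + pro_subscription + electricity
    saving = full_spend - new_cost
    months = hardware / saving if saving > 0 else float("inf")
    return new_cost, saving, months


for spend in (200, 400, 560, 800):
    cost, saving, months = adjusted_breakeven(spend)
    print(f"£{spend}: new cost £{cost:.0f}, saving £{saving:.0f}, breakeven {months:.1f} months")
```

Raising `fallback_share` towards 0.5 is a quick way to test how pessimistic you can be about the local model before the hardware stops paying for itself within a year.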

The RTX 4090 Retention Factor

An important consideration often omitted from this calculation is that the RTX 4090 is not a sunk cost. It retains significant resale value and serves general-purpose GPU workloads beyond AI inference: video editing, rendering, machine learning experimentation, and potentially future model runs as open models grow more efficient.

If the hardware retains 50% of its value after two years (a reasonable assumption given the GPU market trajectory), the effective hardware cost becomes £950-1,150 rather than £1,900-2,300. Halving the cost halves the breakeven period in every scenario, although the saving is only realised when the hardware is eventually sold.
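Resale value can be folded into the same calculation by counting only the part of the purchase price that will not come back. In this sketch the 50% residual value is the assumption stated above, and the £537 monthly saving is an illustrative input (a £560 spend less £15 electricity and an £8 Pro subscription).

```python
# Sketch: fold an assumed resale value into the breakeven period.
# The 50% residual value is the article's assumption; the monthly saving is an input.

def breakeven_with_resale(purchase_gbp, monthly_saving_gbp, residual_fraction=0.5):
    """Months to recover the part of the purchase price that resale will not return."""
    effective_cost = purchase_gbp * (1 - residual_fraction)
    return effective_cost / monthly_saving_gbp


for purchase in (1_900, 2_300):
    months = breakeven_with_resale(purchase, monthly_saving_gbp=537)
    print(f"£{purchase} system: effective cost £{purchase * 0.5:.0f}, breakeven {months:.1f} months")
```

Setting `residual_fraction=0` recovers the unadjusted figure, which makes the effect of the resale assumption easy to isolate.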

The Hidden Costs Nobody Talks About

Context Window Economics

One advantage of local models that does not appear in any pricing table is context continuity. When running Qwen locally via Ollama, the conversation history, codebase analysis, and architectural decisions stay on your own machine, and feeding them back to the model costs nothing beyond compute time for as long as they fit in the model’s context window. Cloud agentic sessions accumulate token costs precisely because each interaction either re-sends context at the input rate or pays the cached-input rate for it.

A single heavy agent iteration (250K input tokens) with Claude Opus costs 175 credits. Run the same operation locally and the marginal cost is zero. For developers running dozens of such iterations daily – as is typical in agentic workflows – this is not a marginal saving.

The Model Auto-Selection Problem

Copilot’s new billing introduces another subtlety: model auto-selection. Without explicit model controls, the interface may route requests to higher-cost models when cheaper alternatives would suffice. A developer focused on writing code rather than monitoring credit burn rates might easily run frontier models on tasks that a lightweight model could handle adequately.

The local approach eliminates this problem entirely. You choose the model, it runs locally, and there is no incentive to downgrade because the marginal cost is identical regardless of model size.

The Energy Externalities

Running an RTX 4090 under sustained AI load consumes approximately 300-400 watts on top of your baseline system draw. Eight hours of agentic coding every day of the month adds approximately 72-96 kWh, or £18-24 at £0.25 per kWh; restricted to working days, the figure is closer to £15. While not free, this is small compared to the £200-£600 monthly Copilot surcharge it replaces.

From an environmental perspective, the comparison is less clear-cut. A home GPU’s additional draw is modest and easy to measure, but cloud data centres batch requests from thousands of developers onto shared hardware, and without published per-request energy figures it is hard to say which approach uses less energy per inference.

What This Means for Different Developer Profiles

The Hobbyist Weekend Coder

If you experiment with AI assistance for a few hours weekly, a Copilot subscription alone is your optimal path. The hardware investment cannot justify itself on 5-10 hours of monthly usage. Stick with Copilot Pro’s 1,500 credits and use lightweight models aggressively.

The Full-Time Enterprise Developer

For developers working eight hours daily on complex corporate systems – the scenario this analysis describes – the maths overwhelmingly favours local inference for the majority of work. Even accounting for capability gaps, running Qwen locally via Ollama for routine coding tasks while reserving Copilot Max for genuine frontier-model requirements represents the rational economic choice.

The Small Team (2-10 developers)

For teams, the calculation shifts slightly. Shared infrastructure costs are lower per person, but the collective credit burn across multiple agentic sessions can be substantial. A team of five full-time developers each burning £560/month on Copilot represents £2,800 monthly – approximately £33,600 annually. The hardware investment for five RTX 4090 systems would be approximately £10,000-12,000, paying for itself in roughly four months at these usage levels.

The Hybrid Approach That Makes Sense

After three months of daily comparison between GitHub Copilot’s Claude Sonnet/Opus agentic sessions and local Qwen via Ollama with Cline, the author’s recommended approach is neither pure cloud nor pure local:

Route by complexity. Use lightweight models (MAI-Code-1-Flash or GPT-5.4 nano) for routine edits and simple chat within Copilot – these consume so few credits they are effectively free. Reserve Claude Sonnet 4.6 for moderately complex agentic tasks where its reasoning advantage matters. Deploy Claude Opus 4.8 only when the task genuinely requires frontier-level reasoning – architectural decisions, complex algorithm design, or debugging deeply intertwined systems.

Simultaneously, run Qwen locally via Ollama for all routine coding, testing patterns, code review suggestions, and documentation generation. The Cline extension makes this routing seamless – your agentic workflow continues identically; only the model backing it changes.

This hybrid approach reduced the author’s effective Copilot costs from approximately £560/month to roughly £80-120/month while maintaining development productivity. The local model handles perhaps 60-70% of coding tasks adequately, and for the remaining work you pay for frontier models only where they earn their cost.
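The routing rule described in this section can be written down as a small decision function. This is a sketch only: the `Task` fields and the 5,000-token threshold are hypothetical, chosen to illustrate the idea rather than taken from the author's setup, and in practice the choice is made by hand in the editor.

```python
# Sketch: pick a model tier from a rough task profile.
# The Task fields and the token threshold are hypothetical; tune them to your workload.
from dataclasses import dataclass


@dataclass
class Task:
    input_tokens: int
    needs_frontier_reasoning: bool = False  # architecture, algorithm design, tangled bugs
    routine: bool = False                   # boilerplate, test patterns, documentation


def choose_backend(task: Task) -> str:
    """Return the cheapest backend that is likely to handle the task."""
    if task.routine:
        return "local Qwen via Ollama"
    if task.needs_frontier_reasoning:
        return "Claude Opus 4.8"
    if task.input_tokens <= 5_000:
        return "MAI-Code-1-Flash"
    return "Claude Sonnet 4.6"


examples = [
    Task(2_000, routine=True),
    Task(3_000),
    Task(80_000),
    Task(250_000, needs_frontier_reasoning=True),
]
for task in examples:
    print(f"{task.input_tokens:>7} tokens -> {choose_backend(task)}")
```

The order of the checks encodes the priority: anything routine stays local regardless of size, and a paid frontier model is chosen only when the task is explicitly flagged as needing one.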

The Trajectory That Matters

The most important variable in this calculation is not today’s prices but tomorrow’s trajectory.

Open-weight models are improving rapidly. Qwen’s development has been particularly aggressive, with each iteration closing the capability gap with frontier proprietary models. A 32B model in late 2026 may rival a 72B model from early 2026, and the trend suggests continued convergence.

Meanwhile, the flex credit component is subject to change at GitHub’s discretion. The current subsidy (where Max plan holders receive $200 (£160) of credits for $100 (£80) monthly) may not persist indefinitely. If flex credits are reduced or eliminated, the effective cost of agentic coding on Copilot rises proportionally.

The breakeven calculation that looks like 12 months today could become six months tomorrow if GitHub reduces included credits. The hardware investment becomes more attractive with each passing month as local models improve and cloud pricing remains fixed or increases.

Data Governance: The Cost Beyond the Spreadsheet

Beyond the spreadsheet numbers lies a factor that matters enormously for enterprise developers – data governance, intellectual property protection, and compliance risk. This is where local AI’s advantage is not just economic but structural.

The Cloud Data Problem

When you use GitHub Copilot with Claude Sonnet or Opus, every line of code your agent reads, every proprietary API specification your session analyses, and every architectural decision discussed is transmitted to GitHub and on to the cloud infrastructure that hosts the model. Your code is sent across the internet, and details of sensitive internal systems pass through third-party data centres.

This is not theoretical – it is an inevitability of how hosted LLM inference works. The model needs your context. For a developer working on proprietary software, confidential client systems, or regulated financial infrastructure, every agentic session represents a potential data exfiltration vector:

  • Training data concerns: Even if Anthropic states they do not train on Max plan data, their terms can change at any time. Once you have shipped it across the internet, control is no longer yours.
  • Audit trail gaps: Cloud AI interactions leave opaque logging trails. When a regulated organisation needs to know exactly what data touched which systems, proprietary AI vendors provide black boxes.
  • Compliance exposure: GDPR, HIPAA, SOC 2, PCI-DSS – each compliance framework has specific requirements about where personal and sensitive data can reside. Sending code containing customer identifiers, internal architecture diagrams, or infrastructure configurations to cloud AI providers may violate these obligations depending on jurisdiction and industry sector.
  • Supply chain risk: Cloud AI adds another vendor to your supply chain. If Anthropic experiences an outage, a breach, or policy change affecting your Copilot access, you have zero control over the resolution timeline.

The Local Model Advantage

A locally run model via Ollama on your own hardware sends nothing to a third-party inference provider by design. Provided the surrounding tools are configured to stay offline, your code never leaves your network. Your architecture diagrams, API specifications, and business logic remain entirely under your control. There is no vendor term that can change this – it is a fundamental property of running inference on your own GPU.

For enterprise developers working on complex corporate systems with sensitive infrastructure, compliance requirements, or confidential client data, this is not a marginal benefit. It is decisive.

No amount of cost savings justifies sending proprietary source code across the internet to an external provider – and local AI delivers both governance certainty and economic sense simultaneously.

The Verdict

For the full-time developer doing complex agentic coding work, using Claude Sonnet and Opus models within GitHub Copilot at the usage levels this analysis describes, the NVIDIA RTX 4090 hardware investment pays for itself in under five months – and potentially in under four months when the resale value is factored in.

The hybrid approach – local Qwen via Ollama for the majority of work, selective Copilot usage for tasks requiring frontier models – delivers the best of both worlds: the capability of frontier AI where it matters combined with the economics of local inference everywhere else.

The weeks that cost £600 do not need to define your relationship with AI assistance. The hardware sits on the shelf ready to be plugged in. The software is free and waiting. The question is simply whether you will keep renting intelligence or start owning it.
