GPT-2 Announced
OpenAI announces GPT-2, a large language model that generates coherent paragraphs of text, but withholds the full weights due to safety concerns.
"This was the spark. It demonstrated that simply scaling up parameters and data (1.5B parameters) resulted in emergent capabilities."
Deep TabNine (GPT-2)
APPS Benchmark Released
GitHub Copilot Technical Preview
GitHub Copilot technical preview announced by GitHub and OpenAI.
"Copilot is the moment coding assistance moves from "tool" to "companion". Shipping inside the editor rewires habits."
Codex & HumanEval Paper
Codex + HumanEval introduced in "Evaluating Large Language Models Trained on Code" (arXiv:2107.03374).
"Two things land at once: a model that can write real code, and a yardstick that makes progress legible."
OpenAI Codex API Beta
OpenAI Codex API private beta announced (natural language → code).
"This turns code generation from a demo into an ingredient. APIs are how ideas escape labs and become defaults."
AlphaCode (DeepMind)
GitHub Copilot GA
GitHub Copilot becomes generally available (paid subscription).
"GA means the weirdness is over: it's now normal to pay for a machine pair programmer."
Amazon CodeWhisperer Preview
ReAct Paper (Reason+Act)
ReAct (Reason + Act) introduced (arXiv:2210.03629).
"ReAct is the blueprint for agents: think, do, observe, repeat. It's the moment "chatbot" starts becoming "worker"."
Replit Ghostwriter
ChatGPT Launch
OpenAI launches ChatGPT (GPT-3.5). Chat-driven programming workflows take off.
"ChatGPT makes prompting a new programming interface. "Chat-Driven Development" becomes a cultural default."
Cursor IDE (Alpha)
Anysphere launches Cursor, a fork of VS Code designed to be "AI-native".
"The editor itself must evolve. By indexing the local filesystem, Cursor made the case that plugins alone are insufficient for true agentic integration."
GPT-4 Released
GPT-4 released with a major jump in coding + reasoning capability.
"The leap isn't just better code—it's fewer hallucinated steps and more coherent plans. That's what makes multi-step coding feasible."
GitHub Copilot X
GitHub Copilot X announced (chat + PRs + GPT-4-powered experience).
"This is the pivot from inline suggestions to end-to-end workflow help. It starts to own outcomes, not keystrokes."
ChatGPT Plugins
ChatGPT plugins announced (tool use becomes mainstream).
"Tools are what turn language into leverage. The moment models can call systems, "write code" becomes "ship work"."
AutoGPT Released
StarCoder Released
StarCoder released (open-access code LLM) by ServiceNow and Hugging Face.
"Open models are how a category gets commoditized. StarCoder makes "coding LLM" something builders can actually own and modify."
OpenAI Function Calling
Function calling added to OpenAI API models.
"This is the missing primitive for agents: structured intent. It makes "call the tool" a dependable move instead of a prompt hack."
Code Llama Released
Code Llama released (coding-focused LLM family) by Meta.
"Another major lab shipping code models accelerates commoditization. When supply increases, people stop rationing usage."
SWE-bench Introduced
SWE-bench introduced (real GitHub issues as software-engineering eval).
"This is the benchmark that forces honesty. Real issues collapse the gap between "writes code" and "fixes bugs"."
v0.dev by Vercel
Assistants API
Assistants API announced at OpenAI DevDay.
"The SDK moment: threads, tools, and retrieval become off-the-shelf. Platforms win when they turn hard problems into defaults."
Copilot Chat GA
GitHub Copilot Chat becomes generally available.
"Chat turns the assistant into a place you go, not a thing that occasionally interrupts you."
Gemini 1.5 Pro
Google releases Gemini 1.5 Pro with a context window of up to 1 million tokens, later extended to 2 million.
"Near-infinite context changes the architecture of coding tools. Instead of complex RAG, you can often put the entire codebase into the prompt."
Devin (Cognition)
Devin announced as an autonomous AI software engineer; evaluated on SWE-bench.
"Devin is the marketing name for a real transition: agents that own a backlog item end-to-end. The bar just moved."
SWE-agent Released
SWE-agent released with strong SWE-bench results.
"This is the proof that the loop works: read repo, plan edits, run tests, patch. It's less magic, more engineering."
Copilot Workspace Preview
GitHub Copilot Workspace technical preview.
"Workspace is "Spec to PR" as a product. It's an admission that the unit of work isn't a function—it's a task."
Claude 3.5 Sonnet
Claude 3.5 Sonnet released with "Artifacts" workflow.
"This is a quality jump that shows "coding model" isn't a monoculture. Multiple frontier models mean assistants become a layer you can swap."
DeepSeek Coder V2
DeepSeek releases Coder V2, the first open-weights model to rival GPT-4 Turbo in coding.
"State-of-the-art coding intelligence becomes a commodity. This fuels the "Local AI" boom and puts pressure on paid API pricing."
SWE-bench Verified
SWE-bench Verified released (human-verified solvable subset).
"Verified eliminates the "maybe it's impossible" excuse. Once the target is clean, leaderboards become meaningful—and investment follows."
Replit Agent
Replit integrates an autonomous agent that can plan, scaffold, and deploy full-stack apps with database provisioning.
"App development becomes accessible to non-coders on mobile devices. The "Hosting Environment" meets the "Agent"."
OpenAI o1-preview
OpenAI releases o1-preview (codenamed "Strawberry"), its first "reasoning" model.
"Agents gain the ability to "Plan" effectively, reducing logic bugs in complex systems. It marked the shift from "fast thinking" to "slow thinking"."
Qwen 2.5 Coder
Alibaba Cloud releases Qwen 2.5 Coder. The 32B model brings GPT-4o-level coding to consumer hardware.
"Local AI coding becomes viable for serious work. Privacy-conscious developers get a powerful open alternative to cloud APIs."
OpenAI Canvas
OpenAI introduces Canvas, a collaborative interface for coding and writing.
"The "Chat" interface is officially insufficient for complex work. The industry shifts toward "Artifacts" and "Canvas" styles of interaction."
Lovable
Bolt.new
Claude "Computer Use"
Claude "computer use" capability introduced.
"GUIs are the surface area of the world. If an agent can click and type like a human, it can work anywhere without an API."
Copilot Multi-Model
GitHub Copilot adds multi-model choice (Claude + Gemini + OpenAI models).
"This is the "browser wars" moment for coding models. Once the platform supports multiple engines, the assistant layer becomes permanent."
Windsurf Editor
Model Context Protocol
Model Context Protocol (MCP) introduced by Anthropic.
"Standards are how ecosystems form. MCP is a bet that context and tools will be as modular as libraries."
The "Vibe Coding" Era
Andrej Karpathy crystallizes the industry feeling with a tweet about "Vibe Coding".
"Coding transitions from a typing task to a managerial task. The "Vibe Check" becomes the new Code Review."
Copilot Agent Mode
Copilot Agent Mode preview ("The agent awakens").
"This is GitHub saying the assistant should act, not just suggest. When the default tool becomes an agent, teams reorganize around delegation."
Claude 3.7 Sonnet + Claude Code
Claude 3.7 Sonnet + Claude Code announced (agentic coding tool).
"CLI-native agents feel like "real programmers" because they can run tools and keep state. This is the path from chat to craftsmanship."
Replit Agent v2
Replit releases Agent v2 with "real-time app design preview" and improved autonomous hypothesis formation.
"The agent now sees what it builds as it builds it. Design and logic loops tighten, reducing the "blind coding" errors of previous generations."
Continue 1.0
OpenAI o3
OpenAI o3 becomes generally available, setting a new record on SWE-bench Verified (69.1%).
"Reasoning models mature. With 69.1% on SWE-bench Verified, o3 proves that "slow thinking" is the key to solving complex engineering tasks autonomously."
Codex CLI Open Source
OpenAI Codex CLI open-sourced (local terminal coding agent).
"Open-sourcing the client makes agents feel like tooling, not a website. The terminal is where developers already live."
Codex Agent Preview
OpenAI launches Codex (cloud-based software engineering agent) research preview.
"This is "agent as a teammate": async work, long-running tasks, and PR-shaped output. Once agents can wait and retry, they look like employees."
Copilot Coding Agent
GitHub announces Copilot Coding Agent (async agent that opens PRs).
"The PR is the unit of integration. When an agent can open one, it's no longer helping you code—it's shipping code into your process."
OpenCode AI
OpenCode AI launched as an open-source, terminal-native AI coding agent supporting 75+ LLMs.
"The terminal becomes the agentic workspace. OpenCode proves that developers want powerful, flexible AI tools that live where they code."
Sourcegraph Amp
Sourcegraph launches Amp, its next-generation AI coding assistant succeeding Cody, designed for enterprise-scale codebase reasoning.
"Enterprise context is the moat. Amp leverages knowledge graphs to understand massive repositories better than generic models."
Copilot Agent GA
Copilot Coding Agent becomes generally available.
"GA is where experiments become budget lines. Once agents are paid-for defaults, management starts asking what else can be delegated."
Beads Memory System
Steve Yegge releases Beads, a Git-backed memory and task-tracking system for coding agents. Issues are stored as JSONL, hash-based IDs prevent merge collisions, and semantic "memory decay" summarizes old tasks.
"'Your agents simply cannot keep track of work using Markdown files.' Beads solves the long-horizon planning problem by giving agents a structured place to track state across sessions."
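To see why hash-based IDs avoid merge collisions, compare them with sequential counters: two branches that each create "issue 42" conflict on merge, while IDs derived from content do not. The sketch below illustrates that idea only. It is not Beads's actual schema; the `bd-` prefix, the field names, and the choice of hashed fields are all assumptions.

```python
import hashlib
import json
from typing import Dict, List

def issue_id(title: str, created_at: str, prefix: str = "bd") -> str:
    """Derive a short ID from issue content instead of a shared counter."""
    digest = hashlib.sha256(f"{title}|{created_at}".encode("utf-8")).hexdigest()
    return f"{prefix}-{digest[:8]}"

def new_issue(title: str, created_at: str) -> Dict[str, str]:
    """Build one issue record (hypothetical fields)."""
    return {
        "id": issue_id(title, created_at),
        "title": title,
        "created_at": created_at,
        "status": "open",
    }

def to_jsonl(issues: List[Dict[str, str]]) -> str:
    """Serialize one issue per line so Git diffs and merges stay line-oriented."""
    return "".join(json.dumps(issue, sort_keys=True) + "\n" for issue in issues)
```

Two agents working on separate branches can each append lines to the same file, and their new IDs will not clash unless they record an identical title and timestamp.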
Amp Free (Ad-Supported)
Sourcegraph launches Amp Free, an ad-supported tier making agentic coding accessible to everyone. Ads appear discreetly at the bottom of the editor or CLI, and code snippets are never shared with ad partners.
"'Agentic coding is now free for everyone.' The Netflix/Spotify model comes to dev tools: ads cover costs without compromising agent behavior or code privacy."
GitHub Agent HQ
GitHub introduces Agent HQ to orchestrate multiple third-party coding agents.
"This is the "app store" move: the platform becomes the place agents run. Once orchestration is centralized, agents become plugins."
Claude Opus 4.5
Anthropic releases Claude Opus 4.5, completing the 4.5 model family. Leads on SWE-bench Verified and introduces an "effort parameter" for computation tradeoffs.
"Opus 4.5 sets a new bar for agentic coding: 80.9% on SWE-bench Verified. The effort parameter signals a shift toward adaptive compute—models that think harder when tasks demand it."
The Ralph Wiggum Technique
Geoffrey Huntley's "Ralph Wiggum" technique goes mainstream: a Bash loop that iteratively feeds prompts to Claude Code until tasks complete autonomously.
"As Matt Pocock put it: 'Ralph Wiggum + Opus 4.5 is really, really good.' The technique reduces software costs dramatically by letting agents fail, learn from git history, and retry indefinitely."
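The shape of the loop is simple enough to sketch. The original technique is a Bash loop around the Claude Code CLI; the version below is a Python rendering of the same control flow, with the agent invocation abstracted behind a hypothetical `run_once` callable so that no CLI flags are assumed. The iteration cap is an addition for safety in this sketch, not part of the technique as described.

```python
from typing import Callable

def ralph_loop(run_once: Callable[[str], bool], prompt: str,
               max_iterations: int = 50) -> int:
    """Feed the same prompt to the agent until it reports success.

    run_once is a hypothetical hook: it should invoke the agent with the
    prompt and return True once the task's completion check passes.
    Returns the number of attempts used.
    """
    for attempt in range(1, max_iterations + 1):
        if run_once(prompt):
            return attempt
        # On failure, nothing is carried over in memory: the next attempt
        # starts fresh and is expected to read prior work from the repo.
    raise RuntimeError(f"task not complete after {max_iterations} attempts")
```

State lives in the repository rather than in the loop, which is why each retry can pick up from what earlier attempts committed.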
GPT-5.2
GPT-5.2 released (Instant, Thinking, and Pro variants) as a frontier agentic coding model line.
"Naming a model line after the job is a tell: "coding agent" is no longer an application, it's a first-class product."
Gas Town (Yegge)
Steve Yegge releases Gas Town, a multi-agent orchestrator for coordinating 20-30 concurrent Claude Code agents. Named after the oil refinery settlement in Mad Max: Fury Road.
"'The biggest problem with Claude Code is that it ends.' GUPP (Gastown Universal Propulsion Principle) solves this: 'If there is work on your hook, YOU MUST RUN IT.'"
Cursor: Scaling Multi-Agent Coding
Cursor publishes research on running hundreds of concurrent agents for weeks: building a web browser (1M+ lines), migrating Solid to React (+266K/-193K edits over 3 weeks), and building a Windows 7 emulator (14.6K commits, 1.2M lines).
"'The harness and models matter, but the prompts matter more.' Flat hierarchies failed when 20 agents slowed to the throughput of 2-3 due to lock contention. The fix: separating planners (explore, spawn tasks) from workers (grind until done)."
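The planner/worker split can be illustrated with a small queue-based sketch. This is not Cursor's harness: the function names, the task strings, and the placeholder "work" are assumptions chosen to show the structure, in which a planner produces tasks up front and workers pull from a queue without coordinating with each other.

```python
import queue
import threading
from typing import List

def planner(goal: str, parts: int) -> List[str]:
    """Explore the goal and spawn independent tasks (trivially, here)."""
    return [f"{goal}: part {i}" for i in range(1, parts + 1)]

def worker(tasks: "queue.Queue[str]", done: List[str], lock: threading.Lock) -> None:
    """Grind through tasks until the queue is empty."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        result = task.upper()  # placeholder for real agent work
        with lock:             # the lock guards only the shared results list
            done.append(result)

def run(goal: str, parts: int = 6, n_workers: int = 3) -> List[str]:
    tasks: "queue.Queue[str]" = queue.Queue()
    for task in planner(goal, parts):
        tasks.put(task)
    done: List[str] = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, done, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(done)
```

Workers never negotiate over who does what; the only shared points are the queue and the results list, which keeps contention from growing with the number of workers.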