Copilot’s Fits and Starts

Useful in the Workflow, Outmatched at the Frontier

Dec 19, 2025

Two years into broad availability, Microsoft 365 Copilot remains a product that can feel genuinely helpful on Tuesday and strangely disposable by Friday. Understanding why requires looking at what Copilot actually is, what it’s being compared to, and why the gap between demo and production keeps tripping up enterprise buyers.

Copilot performs well inside the Microsoft stack, especially when the work is bounded and the source material already lives in Microsoft 365. But put it next to frontier model experiences like ChatGPT, Claude, and Gemini for open-ended reasoning and sustained analysis, and it feels noticeably constrained. Microsoft’s own leadership has effectively acknowledged this dynamic, even while arguing that Copilot has differentiated strengths.

Copilot isn’t competing in the same arena as the frontier models, which are evaluated on unconstrained reasoning in a clean prompt window. Copilot, by contrast, is generally evaluated on whether it can retrieve, summarize, and act safely inside your tenant, with all the permissions, policy constraints, and document sprawl that entails. The difference is by design, and that architectural reality is why Copilot can feel “worse” in side-by-side comparisons even when it is solving a different problem.

Thanks for reading Deep Finance Dispatch! This post is public so feel free to share it.

Where Copilot Actually Works

Let’s start with what Copilot does well, because there is real value here, and Copilot users aren’t imagining it.

Copilot’s advantage has never been raw intelligence. Its advantage is integration into the Microsoft ecosystem. The tool is embedded directly into Word, Excel, PowerPoint, Outlook, Teams, and the content layer beneath them. Microsoft set out to build something that drafts, summarizes, rewrites, and composes inside the workflow people already use every day. That’s a fundamentally different bet than building the smartest model.

In finance work, the strongest “positive surprise” I keep hearing about is Copilot’s performance in Excel, especially for the unglamorous work that actually eats time: drafting variance commentary, pressure-testing driver narratives, and turning a messy close package into a board-ready story. Microsoft has been steadily deepening Copilot’s Excel surface area, including rolling out the COPILOT function, which brings natural-language prompting into the grid itself for licensed users. This matters because spreadsheets are where finance teams spend their time and validate outputs. You can tie a narrative out to a table, sanity-check a classification, then rerun and compare. Copilot doesn’t need to be a frontier model to be useful here. It needs to be good enough, integrated enough, and consistent enough to reduce time-to-first-draft. For finance, the bar is not “sounds right.” It’s whether the output is reproducible, ties out cleanly, and can be defended in review or audit, which is why Excel is one of the few places Copilot can earn trust quickly.

*Integrated, useful, and a little outmatched.*

SharePoint and OneDrive are the other quiet win, with a caveat: they only deliver when they are treated as a real knowledge layer instead of a dumping ground. Microsoft’s recent SharePoint messaging leans hard into the idea that Copilot reasons over the richness of Office documents, pages, and metadata, including content protected with sensitivity labels. If your finance team actually lives in SharePoint libraries with sane permissions and current content, Copilot starts to feel less like a chatbot and more like an accelerator for how work flows through the organization.

Where Copilot Falls Short

In direct comparison with frontier models, Copilot often feels weaker at raw reasoning, at workshopping ambiguous questions, and at sustained analytical back-and-forth. That gap comes from how Microsoft designed the tool: part of it is model choice, part is policy and guardrails, and part is architecture. Copilot is constantly trying to be enterprise-safe, tenant-aware, and permission-correct. Those constraints are rational, but they can also make the tool feel less capable than a clean frontier chat interface where the only context is what you paste into the chat.

Many users experience Copilot as “fine for drafts,” but not something they reach for when the thinking gets harder. The most damaging part is that this perception becomes sticky and spreads across organizations, regardless of whether it reflects Copilot’s actual capabilities in a given workflow.

The Adoption Problem

Earlier this month, Reuters reported that Microsoft lowered sales growth targets for certain AI products after customers resisted adoption and sales staff failed to meet targets. One example: Carlyle reduced spending on Copilot Studio because of data integration challenges. Microsoft publicly disputed that it lowered overall targets, but the broader signal is clear. The pilot-to-scale curve is not compounding fast enough.

This is a value-realization issue, not just a go-to-market problem. In June 2025, BBB National Programs’ National Advertising Division reviewed Microsoft’s Copilot claims, found some supported, and recommended others be modified or discontinued. For CFOs, the subtext is familiar: if the ROI story is mostly survey sentiment, procurement will eventually ask for harder proof.

Microsoft is clearly shipping. Its own Copilot updates for November and December 2025 call out quality and performance improvements in Copilot Chat, navigation and search improvements in the Copilot app, and continued feature expansion across Teams, Excel, and PowerPoint. But most enterprises do not adopt AI based on feature velocity. They adopt when the tool becomes reliable inside a handful of repeatable workflows and when someone can show measured impact.

Copilot’s adoption ceiling is set by the quality of the tenant it is dropped into. If SharePoint and OneDrive are disorganized, if permissions are chaotic, if the “final” deck is really twelve decks, and if the close package lives in email attachments, Copilot will amplify all of it. The tool looks “dumb” because it is grounded in a messy, inconsistent corpus. Frontier chat models often look better because the user curates the context manually, sidestepping the enterprise substrate problem entirely. This is why Copilot can feel simultaneously impressive in a demo and disappointing in production.

The Investment Thesis

Microsoft is spending like a company that expects Copilot to become a default layer of work. The company disclosed total funding commitments of $13 billion to OpenAI in its filings, and Reuters reported Microsoft’s plan to invest about $80 billion in AI-enabled data centers in fiscal 2025. That level of commitment only makes sense if Microsoft can convert AI infrastructure into durable, repeatable enterprise value across the suite.

I expect Microsoft will come out of this period by doing what it historically does well: turning capabilities into a coherent enterprise operating layer. The next phase has to look less like Copilot-as-a-feature and more like Copilot-as-a-system. That means tighter instrumentation of ROI by workflow, better administrative controls that make adoption safer and more predictable, and more opinionated guidance to make SharePoint and OneDrive “Copilot-ready” by default. It also means continuing to invest where finance actually lives, which right now is Excel.

Competition will accelerate these choices. Google has been pushing Gemini directly into Workspace surfaces and bundling AI features more broadly into Workspace plans, reducing procurement friction. Apple is resetting user expectations entirely with on-device intelligence and privacy guarantees baked into its architecture. Neither is building the same enterprise seat business as Microsoft, but both are shaping what users expect AI to feel like. Those expectations bleed into the workplace.

The Bottom Line

Copilot is a workflow tool that performs best when the work is structured, the content is governed, and the output can be verified. Treat it like a frontier model substitute and it will disappoint. Treat it like a finance productivity layer that lives inside Excel and a clean SharePoint knowledge base, and it can be genuinely useful.

The deeper lesson for enterprise buyers is that no AI tool will paper over bad information architecture. Microsoft is trying to industrialize frontier-model capability inside messy enterprises, under real security constraints, with CFO-grade expectations for reliability. The companies that get value from Copilot will be the ones that do the unglamorous work of making their own data ready for it.

Upgrade to Pro to receive the finance Copilot playbook, complete with a 30-day rollout plan, Excel and SharePoint workflow templates, and an ROI tracker designed for CFO-grade accountability.

Glenn Hopper
