Do we need MCP anymore?
by Dev
TL;DR
MCP has a bad reputation, and for good reason. Most MCP servers dump dozens of tools into the context window before doing anything useful. But that's a bad implementation, not a bad protocol. Instead of switching to the CLI, we should learn from it and rethink MCP.
There's a recent wave of MCP hate online. As an MCP skeptic, I'm happy about that. So I decided to benchmark Corsair's MCP and CLI performance to join in on the hate. Before you run an experiment, it's best not to have a bias. So step one of my experiment failed, because I am very biased. On to step two.
Corsair has a simple internal structure: you call any integration's API through Corsair's ORM. Every integration exposes the same interface, so an agent knows what to do without needing much upfront knowledge. I wrapped this internal structure in both an MCP server and a CLI. The MCP server exposes 4 tools, and the CLI exposes 4 commands. Keeping the two as similar as possible means any difference we see comes from the MCP architecture versus the CLI architecture, not from the tools underneath.
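To make the "same core, two wrappers" setup concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory registry standing in for Corsair's ORM; the function names (list_integrations, list_endpoints, get_schema, call_endpoint), the program name, and the sample gmail endpoint are illustrative only, not Corsair's actual tool or command names. The point is that the MCP tool table and the CLI subcommands are both thin dispatchers over the same four functions.

```python
import argparse
import json

# Hypothetical stand-in for Corsair's ORM: every integration has the same
# shape (endpoint name -> schema + handler), so one interface covers all of them.
REGISTRY = {
    "gmail": {
        "send_email": {
            "schema": {"required": ["to", "subject", "body"]},
            "handler": lambda args: {"status": "sent", "to": args["to"]},
        },
    },
}


# The shared core: four operations both surfaces call into.
def list_integrations():
    return sorted(REGISTRY)


def list_endpoints(integration):
    return sorted(REGISTRY[integration])


def get_schema(integration, endpoint):
    return REGISTRY[integration][endpoint]["schema"]


def call_endpoint(integration, endpoint, args):
    return REGISTRY[integration][endpoint]["handler"](args)


# MCP-style surface: four tools, each dispatching to the shared core.
MCP_TOOLS = {
    "list_integrations": lambda p: list_integrations(),
    "list_endpoints": lambda p: list_endpoints(p["integration"]),
    "get_schema": lambda p: get_schema(p["integration"], p["endpoint"]),
    "call_endpoint": lambda p: call_endpoint(p["integration"], p["endpoint"], p["args"]),
}


def cli_main(argv):
    """CLI surface: four subcommands dispatching to the same shared core."""
    parser = argparse.ArgumentParser(prog="corsair")  # hypothetical program name
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("list-integrations")
    p = sub.add_parser("list-endpoints")
    p.add_argument("integration")
    p = sub.add_parser("get-schema")
    p.add_argument("integration")
    p.add_argument("endpoint")
    p = sub.add_parser("call")
    p.add_argument("integration")
    p.add_argument("endpoint")
    p.add_argument("args", help="JSON object of arguments")
    ns = parser.parse_args(argv)
    if ns.command == "list-integrations":
        return list_integrations()
    if ns.command == "list-endpoints":
        return list_endpoints(ns.integration)
    if ns.command == "get-schema":
        return get_schema(ns.integration, ns.endpoint)
    return call_endpoint(ns.integration, ns.endpoint, json.loads(ns.args))
```

Calling the call_endpoint tool through MCP_TOOLS and running the call subcommand through cli_main both land in the same call_endpoint function, so in a setup like this any measured difference comes from how the agent interacts with each surface, not from different logic underneath.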
Here's an overview of how the Corsair MCP and CLI work. You don't need this context to read the article, but it helps.
I authenticated 6 integrations (Gmail, Google Calendar, Google Sheets, Linear, Slack, and Airtable) and wrote 3 tasks: one relatively simple and two more difficult. I ran each task 3 times through the MCP and 3 times through the CLI, for a total of 18 runs (3 tasks × 3 runs × 2 interfaces).
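The run matrix above can be sketched in a few lines of Python: every combination of task, interface, and run index, plus a helper that averages each metric per task and interface, the way the averages view of the results is described. The task keys and metric names here are illustrative assumptions; no measured data is included.

```python
from itertools import product
from statistics import mean

TASKS = ["email_summarization", "new_sales_lead", "inventory_update"]
INTERFACES = ["mcp", "cli"]
RUNS_PER_CELL = 3


def run_matrix():
    """Every (task, interface, run index) combination in the benchmark."""
    return list(product(TASKS, INTERFACES, range(RUNS_PER_CELL)))


def averages(results):
    """Average each metric per (task, interface).

    results maps (task, interface, run) -> {metric_name: value}.
    """
    grouped = {}
    for (task, interface, _run), metrics in results.items():
        grouped.setdefault((task, interface), []).append(metrics)
    return {
        key: {metric: mean(run[metric] for run in runs) for metric in runs[0]}
        for key, runs in grouped.items()
    }
```

With 3 tasks, 2 interfaces, and 3 runs per cell, run_matrix yields the 18 runs, and averages collapses them into 6 (task, interface) cells.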
One thing to note: I tested MCP over the stdio transport, which represents the best case for MCP performance. With stdio, the MCP server runs as a local subprocess and exchanges messages over standard input and output, with no network in the path. This keeps the experiment comparable, since an HTTP MCP server would add a second uncontrolled variable (the network) that the CLI isn't subject to.
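Per the MCP specification, the stdio transport carries JSON-RPC 2.0 messages over the subprocess's stdin and stdout, one message per line, with no embedded newlines. A minimal Python sketch of that framing follows; the tools/call method and the name/arguments params come from the spec, while the tool names and arguments used with it are hypothetical, and spawning the server process is left out.

```python
import json


def encode_message(message):
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    line = json.dumps(message, separators=(",", ":"))
    # json.dumps escapes newlines inside string values (as backslash + n),
    # so the trailing newline is the only delimiter on the wire.
    if "\n" in line:
        raise ValueError("stdio messages must not contain embedded newlines")
    return (line + "\n").encode("utf-8")


def decode_stream(data):
    """Split a chunk read from the server's stdout into messages, one per line."""
    return [json.loads(line) for line in data.decode("utf-8").splitlines() if line.strip()]


def tools_call(request_id, name, arguments):
    """Build an MCP tools/call request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
```

A client would write encode_message(tools_call(...)) to the server's stdin and read responses line by line from its stdout. Nothing here touches a socket, which is why stdio is the cleanest comparison point against a CLI.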
Task 1 — Email Summarization
Can you pull the last 10 emails from new york times and summarize the news for me?
Uses: Gmail
Task 2 — New Sales Lead
Send an email to [email_address] at Brick Corp telling him I look forward to our call at 2:30 PM EDT next Thursday. Then send him the calendar invite as well.
Update the CRM as well.
In preparation for the meeting with Mukul, we need to finish up 3 things. Create three Linear issues with highest priority and assign them all to Dev:
- Finish Asana integration and make sure we're pulling company data properly
- Update invoicing so we're charging him on net-90 terms
- Reach out to their new CTO and send over the SOC II cert info
Once you've created all three of those Linear issues, I want you to send them in a Slack DM to Dev telling him to do these before my meeting with Mukul.
Uses: Gmail, Google Calendar, Google Sheets (CRM Link), Linear, Slack
Task 3 — Inventory Update
Please pull the latest inventory from Airtable. Send Dev an email (dev@corsair.dev) telling him everything that's in Warehouse B and the current stock there right now. If anything in Warehouse B is under 50 units in quantity, I want you to also send Dev a DM on Slack alerting him of this so he can place a new order from the distributor.
If quantity is less than 50, leave a note in Airtable after you've messaged Dev recording that you've done so.
I also want you to create a Sheet called "Warehouse A Data" and put the items in warehouse A in that. Don't include pricing since that's sensitive. Send a follow-up email to Dev with a link to that sheet. I don't want to share our main Airtable with him since that has sensitive information.
Uses: Airtable (Inventory Link), Gmail, Slack, Google Sheets
The chart below compares the two approaches on each metric, for individual runs and as averages.
Once both interfaces share the same execution model, the numbers converge. The CLI edges out MCP slightly on output tokens across all three tasks, but that gap doesn't hold on cost.[4] There's no consistent winner. What this tells you is that the protocol isn't the variable: when the underlying tool is the same, MCP and CLI produce essentially the same result.
In this experiment, the agent has to introspect the endpoints it has access to, inspect the schema of an endpoint, and call that endpoint with the proper arguments. It has to do this for both MCP and CLI. The result is that MCP and CLI perform similarly. This is because the Corsair MCP is structured the way every MCP server should be: the number of MCP tools doesn't grow with the number of integrations. The tools the MCP exposes are solely for introspection and execution. If you think about it, that's how a CLI natively works: --help gives you introspection, and the command itself gives you execution.
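The introspect, inspect, execute loop described above can be sketched in Python as follows. This is an illustration under stated assumptions: the registry format and the three tool names (list_endpoints, describe, execute) are hypothetical, and the sketch uses three tools for brevity where the article's server uses four. The comments map each step to its --help analogue.

```python
def make_tools(registry):
    """Return the fixed tool set; its size is independent of len(registry)."""

    def list_endpoints(integration):
        return sorted(registry[integration])

    def describe(integration, endpoint):
        return registry[integration][endpoint]

    def execute(integration, endpoint, args):
        # Stand-in for the real API call behind the ORM.
        return {"ok": True, "called": f"{integration}.{endpoint}", "args": args}

    return {"list_endpoints": list_endpoints, "describe": describe, "execute": execute}


def introspect_then_call(tools, integration, endpoint, args):
    """The three steps the agent takes, identical over MCP or CLI."""
    # 1. Introspect the available endpoints (CLI analogue: top-level --help).
    if endpoint not in tools["list_endpoints"](integration):
        return {"error": f"unknown endpoint: {integration}.{endpoint}"}
    # 2. Inspect the endpoint's schema (CLI analogue: --help on a subcommand).
    schema = tools["describe"](integration, endpoint)
    missing = [field for field in schema["required"] if field not in args]
    if missing:
        return {"error": "missing required fields: " + ", ".join(missing)}
    # 3. Execute with arguments that match the schema.
    return tools["execute"](integration, endpoint, args)
```

Building the tools over a registry with one integration or with fifty gives the same three tools; only the data behind list_endpoints and describe changes. That is the property that keeps the context window flat as integrations are added.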
This is not a perfect experiment. More tasks across a larger range of integrations would have reduced the variance across individual runs. Other experiments preload dozens of tools into the context window before the agent does anything. For example, the GitHub MCP server loads 43 tools. Those same operations have CLI equivalents that the LLM has been trained on. The LLM already knows that a GitHub CLI command starts with gh, and it has likely seen many variants of those 43 tool calls in CLI form during training. This means many online experiments are flawed from the beginning. They're set up so the CLI will win, and then they're mad at MCP when the CLI inevitably wins.
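To see why preloading tools skews those comparisons, here is a back-of-the-envelope Python sketch of the upfront context cost of tool definitions. It relies on two assumptions: a rough heuristic of about 4 characters per token (real counts depend on the model's tokenizer), and synthetic tool definitions in the MCP name/description/inputSchema shape, not the GitHub MCP server's actual schemas. It produces no measured data; it only shows the shape of the curve.

```python
import json

# Rough heuristic: ~4 characters per token. An assumption, not a tokenizer.
CHARS_PER_TOKEN = 4


def make_tool(name, params):
    """A synthetic MCP tool definition (name, description, inputSchema)."""
    return {
        "name": name,
        "description": f"Call the {name} endpoint.",
        "inputSchema": {
            "type": "object",
            "properties": {p: {"type": "string"} for p in params},
            "required": list(params),
        },
    }


def upfront_tokens(tools):
    """Approximate tokens spent on tool definitions before the first turn."""
    chars = sum(len(json.dumps(tool)) for tool in tools)
    return -(-chars // CHARS_PER_TOKEN)  # ceiling division


def eager_server(n_endpoints):
    """One tool per endpoint: cost grows with every endpoint exposed."""
    return [make_tool(f"endpoint_{i}", ["owner", "repo"]) for i in range(n_endpoints)]


# Fixed introspection/execution tools: same cost for any number of endpoints.
LAZY_SERVER = [
    make_tool("list_integrations", []),
    make_tool("list_endpoints", ["integration"]),
    make_tool("get_schema", ["integration", "endpoint"]),
    make_tool("call_endpoint", ["integration", "endpoint", "args"]),
]
```

The eager cost grows roughly linearly with the number of tools, while the lazy set stays constant. The trade-off is that the lazy design spends tokens later, on introspection calls, which is exactly the work both the MCP and CLI runs in this benchmark had to do.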
I started this experiment as a skeptic looking for data to confirm that the CLI is more token-efficient. The data didn't cooperate. When I gave the CLI and MCP the same execution model, the gap I expected to find wasn't there. That's because we ended up building a better MCP server.
MCP scaled faster than the understanding of how to build it well. Servers got shipped half-baked because the window to claim MCP support felt like it was closing. The backlash is a natural response to that, but it's not the end. MCP is in its second phase now: being questioned, benchmarked, and rebuilt. Within a few months, I think the sentiment will flip. The lesson is to build the integration layer properly first; then the MCP becomes trivial. When we did that, MCP and CLI performed about the same. The protocol was never the problem.