Picture this: you need to write a proposal, research a recent competitor move, generate a social visual, and debug a spreadsheet formula — all before a 2pm call. You open ChatGPT and start typing. An hour later, you have something. It’s fine. But it doesn’t feel sharp, and you’re not entirely sure why.
The problem isn’t AI. It’s that each of those tasks has a different requirement:
depth of reasoning
access to current data
visual output
structured execution
And no single model handles all four at the same level. Using one model for everything is like using a screwdriver for every job because it handled the last screw well.
The concept is called model routing: matching each task type to the model built for it. It sounds like extra work. In practice, once you have the habit, it takes about five seconds and produces noticeably better results. This guide gives you the exact framework — with the copy-paste prompts we use ourselves — so you can start today.
TL;DR
The problem: Most people use one AI model for everything, treating all models as interchangeable. They're not. Each is optimized for a different profile of work.
The solution: Match the task to the model's actual strengths. This is called model routing, and it's the single biggest lever most people haven't pulled yet.
The models that matter right now: Claude for writing and complex reasoning, ChatGPT for versatility and visuals, Gemini for live research and Google Workspace, Grok for real-time social data, and Microsoft Copilot if you’re on Microsoft 365.
The workflow in one sentence: Identify what your task actually requires — depth of reasoning, real-time data, visual output, or autonomous execution — then send it to the model built for that.
The honest caveat: This landscape is moving fast. Model rankings shift with every release, and the "best" model for coding or writing today may not hold that position in three months. What doesn't change is the framework: task type drives model choice.
1. What We Use At Practicaly AI
Before the breakdown, here is our actual stack. No caveats, no ‘it depends.’
For writing anything that matters — newsletters, client reports, proposals — we use Claude. Not because it wins every benchmark, but because when we read the output back, it sounds like a person wrote it. That is harder to achieve than it looks, and it’s the thing that matters most when a real human is going to read your work.
For research on anything time-sensitive, we use Gemini’s Deep Research mode. It runs dozens of searches, synthesises the results, and cites its sources. A research task that would take 45 minutes of manual reading comes back in 15 minutes, fully cited. The free tier handles most of it. (We also use Perplexity, but Gemini is better for the work we do.)
For images alongside text — social posts, visual mockups, anything where we want a layout fast — we use ChatGPT. The image generation is the only reason we keep a Plus subscription, and it earns it.
We don’t pay for Grok or Copilot as standalone subscriptions. Grok comes bundled with X Premium+, which some of us have for other reasons. Copilot is bundled into Microsoft 365.
The two-model minimum we’d recommend to anyone doing serious knowledge work: Claude + Gemini. Together they cover 80% of what most people need. Add ChatGPT if you produce visual content. Add Grok if you monitor brands or track social trends professionally.
2. The Test: Same Prompt, Five Models, Real Outputs
We ran the same writing prompt through Claude Sonnet 4.6, ChatGPT GPT-5.4, Gemini Pro, Grok, and Microsoft Copilot to see what actually comes out. The prompt was deliberately specific: it required a clear voice, precise constraints, and no filler.
Here it is:
Write the opening paragraph for a newsletter about why most knowledge workers are using AI tools wrong. The reader uses at least one AI tool daily. Do not start with a question. Do not use the word 'revolutionize'. Sound like a smart friend giving advice, not a blog post. 120 words max.


Claude tends to open with a concrete observation that pulls you in immediately. The constraint about "sounding like a smart friend" is treated as the primary instruction, not a secondary note. Voice and specificity show up in the first sentence. When you read it aloud, you don't have to make adjustments.


ChatGPT produces clean, correct writing. The constraint is followed technically but the result often reads slightly more like a polished article than a conversation. It satisfies the brief without fully inhabiting it. Good for general use; less good for writing that needs a specific person's voice.


Grok is fast and can be punchy. When the constraint is factual (list, summary, data) it performs well. When the constraint is voice-based — as this one is — it more often defaults to a generic opener that technically avoids the stated prohibitions but doesn't nail the tone.


Gemini is accurate and well-structured. It tends toward slightly more formal register even when asked not to. The output is useful as a first draft but often needs one editing pass to bring it down to the conversational level the prompt asked for.


Copilot produces competent, safe writing that reads like internal comms. It's unlikely to embarrass you; it's also unlikely to make anyone lean forward.
The key insight: The gap between models is subtle when you read one output in isolation. It becomes obvious when you read all five back to back. "Writing quality" isn't a benchmark number — it's the paragraph you read and think: that's exactly how I'd say it.
3. Claude: For Writing, Reasoning, and Anything Read by a Human
One-line verdict:
The strongest model for writing that needs to sound like a person wrote it, and for multi-step reasoning tasks where instruction-following over a long conversation matters.
Why Claude for writing
Claude's output has a characteristic quality that's difficult to articulate but immediately recognizable: it doesn't pad. Most AI writing has filler — transitional phrases that exist to reach a word count, hedges that exist to cover all bases, openers that warm up before saying anything. Claude tends to cut to the thing more quickly and hold a more consistent tone across a long piece.
This matters more than it sounds. If you're writing something a client, manager, or audience will read, the margin between "fine" and "good" is felt even by people who can't name what's different.
SAMPLE PROMPT: Refining Copywriting to Improve Tone of Voice
This section isn't landing. Here's what I'm trying to say: [your intent in plain language].
Here's what I wrote: [paste section]
Rewrite it. Don't add length. Don't add bullet points unless I've already used them.
Preserve any phrases I've underlined or marked [keep].

When NOT to use Claude
Claude doesn't have real-time web access in its base form (though it can use tools). For anything requiring current information — recent news, live prices, what happened this week — use Gemini or Grok first, then bring the research to Claude for the writing pass.
For image generation, Claude is not the right tool. Use ChatGPT.
For tasks where you need to paste in a very long document (100k+ tokens) and ask detailed questions about specific sections, Claude handles this well but Gemini's context window may give you more headroom on extremely long inputs.
Best for: Any writing that will be read by a human, document review, multi-step reasoning, email editing, decision analysis, complex instruction-following
Avoid for: Real-time information, image generation, tasks requiring live web data
Pricing: Claude Pro — $20/month. Teams and Enterprise tiers available. API from ~$3/M input tokens for Sonnet.
4. ChatGPT: For Versatility, Visuals, and Broad Execution
One-line verdict:
The most versatile model available — and the only major one that generates strong images alongside text in a single workflow. If your work combines writing and visuals, this is your tool.
What the benchmarks actually mean in plain English
GPT-5.4 scores 92.8% on GPQA (graduate-level science questions) and 97% on SimpleQA (direct factual questions).
In practical terms, it is the most reliably accurate of the major models for factual recall. If you ask it a question with a known, verifiable answer, it is less likely to hallucinate a confident-sounding wrong answer than most of its competitors. It also leads on LongBench v2 with a score of 95, which measures performance over very long documents. For research synthesis or legal review where you’re feeding in long source material, that’s worth knowing.
The prompt style that works best with ChatGPT
ChatGPT handles open-ended, exploratory prompts more flexibly than Claude. It’s more forgiving with loose instructions and will usually produce something useful even from an underspecified brief. For image generation specifically, the key is to describe the style, mood, and layout explicitly — vague image prompts produce generic stock-photo results.
SAMPLE PROMPT: Image + social caption in one go
Create an image for a LinkedIn post about [your topic]. Style: clean, minimal, modern — like a high-end tech brand launch visual, not a stock photo. Include the text '[your headline]' in the image in a bold sans-serif font. Background: [describe — e.g. 'deep navy with subtle geometric shapes']. After generating the image, write a 3-sentence LinkedIn caption to accompany it, ending with a question to drive comments.

When NOT to use ChatGPT
For detailed multi-part prompts that require strict instruction-following across a long conversation, GPT-5.4 can drift from earlier constraints. Teams doing precision work — legal drafting, compliance review, structured documentation — often notice this more than casual users. For that kind of work, Claude’s instruction-following tends to be tighter. For live social data or breaking news, Grok and Gemini are better choices.
Best for: Image generation, general daily use, broad factual questions, brainstorming, social content
Avoid for: Precision-format long documents, real-time information, complex agentic tasks
Pricing: ChatGPT Plus — $20/month. Lighter tier ~$8/month. API from $2.50/M tokens
5. Gemini: For Research That Needs to Be Current
One-line verdict:
The best tool for research where current information matters, and the only major model that natively lives inside Google Workspace. Its free tier is surprisingly capable.
What the benchmarks actually mean in plain English
Gemini 3.1 Pro posts the highest overall composite benchmark score of the group (93). What that score doesn’t tell you is why: Gemini’s strength is breadth, speed, and live data access — not necessarily depth of reasoning on complex problems. It returns answers fast, handles a 1-million-token context window, and its multilingual capability is the strongest of the group, covering 100+ languages.
But the real differentiator is Deep Research mode. This feature runs hundreds of searches across the web, synthesises the results, and produces a fully cited report. A topic that would take a skilled researcher 3–4 hours of reading and note-taking can come back in 15–20 minutes with sources you can verify. No other consumer-facing model does this at the same depth.
The prompt style that works best with Gemini
For Deep Research, be explicit about time frame and source quality. Gemini will search broadly by default — if you want recent developments, say so. If you want academic or primary sources rather than blog posts, say so. The tighter the scope, the more useful the citations.
SAMPLE PROMPT: Deep Research report (enable the toggle first)
Using Deep Research: write a comprehensive report on [topic]. Focus on developments from the last 6 months. Prioritise primary sources, official announcements, and reputable publications — avoid opinion pieces and personal blogs. Format: 200-word executive summary first, then key findings with sources cited inline. Flag any claims where sources contradict each other.
When we tested this ourselves, Gemini generated a detailed report that included the latest information, detailed breakdowns, impact figures and tables, and citations.

When NOT to use Gemini
For final written documents, Gemini’s output is accurate but functional — it reads more like a report than a piece of writing. If the document needs to sound polished and human, take Gemini’s research output and pass it to Claude for the final draft. That two-step workflow (Gemini researches, Claude writes) is more powerful than either model used alone, and it’s how we produce most of the research-heavy content in this newsletter.
Best for: Current-events research, Google Workspace tasks, multilingual content, high-volume API work
Avoid for: Final written documents with voice, tasks requiring strict instruction adherence
Pricing: Gemini Advanced — $20/month via Google One AI Premium. Free tier is capable. API from $1.25/M tokens
6. Grok: For Real-Time Social Intelligence
One-line verdict:
The only major model with real-time access to X (Twitter) data. If you need to know what people are saying about something right now — not six months ago — there is no close second.
What the benchmarks actually mean in plain English
Grok 4.20’s defining specification isn’t a benchmark score — it’s the data feed. Around 68 million English-language X posts per day flow into its context in real time. On any topic where recency and social sentiment matter, Grok is answering from data that is hours old, not months old. That is a structural advantage no amount of general benchmark improvement can replicate. On pure generation speed, Grok 4.20 runs at around 235 tokens per second — roughly three times faster than GPT-5.4. For high-throughput use cases, that matters. For typical daily work, the live data feed is the more relevant advantage.
The prompt style that works best with Grok
Be explicit about wanting real-time data. Ask for what people are saying ‘right now’ or ‘in the last 48 hours’. Without that framing, Grok may draw on general training knowledge rather than its live feed — which defeats the entire purpose.
SAMPLE PROMPT: Real-time X sentiment scan
Search X for posts from the last 48 hours mentioning [brand or topic or keyword]. Tell me: (1) the dominant sentiment — positive, negative, or mixed, (2) the top 3 specific themes or complaints appearing most frequently, (3) any breaking news or announcements driving the conversation. Include 2–3 direct quotes from representative posts, with approximate timestamps.
When we tested this, we knew ChatGPT Codex had released its Pets feature on 2 May 2026. We queried Grok on 4 May 2026, and its response was accurate.

When NOT to use Grok
Grok’s real-time advantage is specific to the X ecosystem. For live web data more broadly, Gemini’s Deep Research is more comprehensive. For writing quality and complex reasoning, Claude and ChatGPT are noticeably stronger. Grok is a specialist — use it when current social sentiment or X-specific trends are the actual thing you need, not as a general-purpose substitute.
Best for: Brand monitoring, trending topic research, real-time social sentiment, X-specific intelligence
Avoid for: Long-form writing, complex reasoning, web-wide research beyond X
Pricing: SuperGrok — $30/month or $300/year. Included with X Premium+ at $40/month
7. Microsoft Copilot: Only if You Live in Microsoft 365
One-line verdict:
Strong inside Teams and Outlook; mediocre outside them. If your organisation runs on Microsoft 365 and you’re in back-to-back meetings, it earns its place. For most other situations, standalone alternatives are cheaper and better.
Copilot’s clearest win is Teams meeting summaries. It can join a meeting, summarise what was discussed, list action items, and give a catch-up to anyone who joined late — without leaving the Teams interface. For organisations in relentless back-to-back meetings, that is genuinely useful and saves real time.
Outside that specific use case, Copilot’s writing quality doesn’t match Claude, its reasoning doesn’t match ChatGPT, and its research capability doesn’t match Gemini. The real combined cost — $30/user/month for Copilot on top of an existing Microsoft 365 subscription — runs $50–80+ per user per month before implementation overhead.
One development worth knowing: as of Copilot Wave 3 (March 2026), Microsoft added multi-model selection to Copilot Studio, including access to Claude Opus 4.6 for demanding reasoning and writing tasks.
If your organisation is already running Copilot at enterprise scale, you may be able to route your most demanding work to Claude without leaving the Microsoft ecosystem.
Best for: Teams meeting summaries, Outlook drafts, Excel formula help — for teams already in M365
Avoid for: Anything outside M365 where standalone tools are cheaper and higher quality
Pricing: $30/user/month, requires existing M365 subscription — $50–80+ combined per user
8. The Complete Routing Table
Keep this open the first two weeks. The habit forms faster than you'd expect.
| Task | Model | Prompt structure |
|---|---|---|
| Long-form writing | Claude | Document type + audience + tone + what to avoid + structure |
| Contract / doc review | Claude | Flag risk + summarise obligations + what to push back on |
| Email editing | Claude | Keep voice + cut filler + sharpen ask + don't over-formalise |
| Decision analysis | Claude | Full context + optimise for what? + assumptions + recommendation |
| Scenario stress-test | Claude | Plan + three failure modes + most fragile assumption |
| Deep research report | Gemini Deep Research | Topic + timeframe + source quality + format + flag contradictions |
| Competitive intelligence | Gemini Deep Research | Company + recent moves + strengths + weaknesses + customer sentiment |
| Google Workspace task | Gemini | App + task + recipient + tone + constraints |
| Long document Q&A | Gemini | Quote before analysis + say if not in doc + flag contradictions |
| Image + social caption | ChatGPT | Platform + style + text in image + colour + caption requirements |
| Data analysis | ChatGPT | Paste data + 3 insights + anomalies + what data can't answer |
| Brainstorming | ChatGPT | 10 ideas + constraint + for each: description + why it might fail |
| Real-time brand monitoring | Grok | Last 48 hours + sentiment + top themes + breaking news + quotes |
| Trending content angles | Grok | Last 24 hours + industry + why trending + content angle |
| Competitor social monitoring | Grok | 7 days + praise + complaints + announcements + sentiment |
| Crisis detection | Grok | Last 6 hours + developing situation? + scale + tone |
| Teams meeting summary | Copilot | Overview + action items with owners + open questions + escalations |
| Outlook email draft | Copilot | My position + tone + length + what not to do |
| Excel formula help | Copilot | What it should do + data structure + edge cases |
| Research → polished doc | Gemini → Claude | Research with Gemini, write with Claude (Workflow 1) |
| Social content with visual | Grok → Claude → ChatGPT | Angle from Grok, copy from Claude, image from ChatGPT (Workflow 2) |
| Proposal from scratch | Claude (multi-turn) | Questions first, then structure, then draft (Workflow 3) |
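If you like to operationalise habits in code, the routing table collapses to a simple lookup. Here is a minimal Python sketch; the category keys, the `route_task` helper, and the keyword matching are illustrative assumptions for your own notes or scripts, not any vendor API:

```python
# A handful of rows from the routing table, as a task-to-model lookup.
ROUTING_TABLE = {
    "long-form writing": "Claude",
    "contract / doc review": "Claude",
    "deep research report": "Gemini Deep Research",
    "image + social caption": "ChatGPT",
    "real-time brand monitoring": "Grok",
    "teams meeting summary": "Copilot",
}

def route_task(task: str, default: str = "ChatGPT") -> str:
    """Return the model for a task, falling back to a generalist default."""
    key = task.strip().lower()
    for category, model in ROUTING_TABLE.items():
        # Loose keyword match in either direction, e.g. "brand monitoring"
        # should still hit "real-time brand monitoring".
        if category in key or key in category:
            return model
    return default

print(route_task("Deep research report"))   # Gemini Deep Research
print(route_task("quick trivia question"))  # ChatGPT
```

Swap in your own task names; the point is that routing becomes a five-second lookup rather than a judgment call you remake every time.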
9. Advanced Multi-Model Workflows
Workflow 1: Research-to-Report (45 minutes → 15 minutes)
Use when: You need to produce a well-researched, well-written document on a topic you don't know deeply — competitor landscape, market analysis, policy brief, industry overview.
Step 1 — Gemini Deep Research
Using Deep Research: produce a comprehensive briefing on [topic] covering:
- The current state as of [month/year]
- Key players / stakeholders and their positions
- Recent developments (last 6 months)
- Contested areas where evidence or opinion is divided
- Gaps — what the research doesn't resolve
Cite inline. Flag low-confidence claims. Format: bullet points for scanning, not prose.
Step 2 — Claude: Turn research into a document
I have a research briefing. Turn it into a [document type — e.g. "5-page strategy memo"] for [audience].
Audience context: [who they are, what they know, what they care about]
Tone: [e.g. "confident and direct, not academic"]
Structure: [your preferred structure, or "you decide"]
Length: [range]
Things that must be in the final document: [list]
Things to cut or deprioritise: [list]
Do not invent anything not in the research. If something needs a number or fact you don't have, leave a [bracket placeholder].
Research: [paste Gemini output]
Step 3 (optional) — Claude: Final edit pass
Read this back as the target reader.
Tell me:
1. The one paragraph that loses them
2. The one sentence that needs to be the headline but isn't
3. Anything that reads like AI wrote it
Then make those fixes.
Workflow 2: Social Content Engine (Research → Post → Image)
Want the full breakdown?
This is where you get real AI workflows, prompts, and systems you can use to automate your work. If you're serious about using tools like Claude to grow your business, this is for you.
Unlock full access today