Step 1 — Define the prompt panel
The prompt panel is the list of queries you'll measure against. It should contain 100–500 queries, depending on your category size. Draw the queries from three sources: your sales team (what real prospects ask on discovery calls), your support team (what customers ask post-sale), and your own rewrites of keyword research into natural-language questions. Avoid keyword lists translated mechanically into questions: prompt panels that look like SEO keyword lists don't measure AI behavior well, because real LLM users ask in complete sentences with context.
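A minimal sketch of the assembly step, assuming hypothetical CSV exports (sales_questions.csv, support_questions.csv, keyword_rewrites.csv) each with question and topic columns; the file and field names are illustrative, not a prescribed format:

```python
import csv
from dataclasses import dataclass

@dataclass(frozen=True)
class Prompt:
    text: str    # the natural-language question a buyer would actually ask
    source: str  # "sales", "support", or "synthetic"
    topic: str   # cluster label, used later for cross-tabulation (Step 4)

def load_prompts(path: str, source: str) -> list[Prompt]:
    """Load one source's questions from a CSV with 'question' and 'topic' columns."""
    with open(path, newline="") as f:
        return [Prompt(row["question"].strip(), source, row["topic"])
                for row in csv.DictReader(f)]

# Merge the three sources, then deduplicate on normalized text.
panel = (load_prompts("sales_questions.csv", "sales")
         + load_prompts("support_questions.csv", "support")
         + load_prompts("keyword_rewrites.csv", "synthetic"))
panel = list({p.text.lower(): p for p in panel}.values())
assert 100 <= len(panel) <= 500, "panel outside the recommended 100-500 range"
```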
Step 2 — Run the panel across six engines
Every prompt gets run in Google AI Overviews, Perplexity, ChatGPT (both memory-only and ChatGPT Search), Gemini, Claude, and Microsoft Copilot. That's six engines but seven surfaces once you split ChatGPT's two modes, so seven runs per prompt. For 200 prompts that's 1,400 query executions. This is where specialist tooling earns its keep: running the panel manually is slow and introduces bias. We use a combination of API access (where available) and browser automation. Record the full response, the cited sources, and the position of every mention.
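In outline, the runner loop might look like the sketch below. Here query_engine is a placeholder for whatever per-engine API client or browser-automation wrapper you use (these surfaces share no common API), and EngineResult lists the fields worth persisting:

```python
import json
from dataclasses import dataclass, asdict

ENGINES = ["ai_overviews", "perplexity", "chatgpt_memory",
           "chatgpt_search", "gemini", "claude", "copilot"]

@dataclass
class EngineResult:
    engine: str
    prompt: str
    response_text: str                 # the full answer, verbatim
    cited_sources: list[str]           # URLs the engine surfaced as citations
    mention_positions: dict[str, int]  # brand -> character offset of first mention

def query_engine(engine: str, prompt: str) -> EngineResult:
    """Placeholder: dispatch to the API client or browser automation for `engine`."""
    raise NotImplementedError

def run_panel(prompts: list[str], out_path: str = "panel_results.jsonl") -> None:
    """Run every prompt on every surface and append results as JSON lines."""
    with open(out_path, "a") as out:
        for prompt in prompts:
            for engine in ENGINES:
                result = query_engine(engine, prompt)
                out.write(json.dumps(asdict(result)) + "\n")
```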
Step 3 — Score citation presence
For each prompt, record: whether the brand was named at all, at what position in the response, with what surrounding context (neutral, positive, or negative), alongside which competitors, and with what attribution (direct link, indirect mention, or implied reference). Aggregate into share-of-voice metrics per engine and overall. This gives you the baseline: today, for this prompt panel, what percentage of answers name your brand.
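A minimal sketch of the aggregation, assuming each scored result is a row with an engine field and a brands_named list per (prompt, engine) pair; the field names follow the earlier sketch and are assumptions:

```python
from collections import defaultdict

def share_of_voice(rows: list[dict], brand: str) -> tuple[dict, float]:
    """rows: dicts with 'engine' and 'brands_named' (list of brand names).
    Returns (per-engine share of voice, overall share of voice) as fractions."""
    hits, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row["engine"]] += 1
        if brand in row["brands_named"]:
            hits[row["engine"]] += 1
    per_engine = {engine: hits[engine] / totals[engine] for engine in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return per_engine, overall
```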
Step 4 — Analyze by competitor and topic
Cross-tabulate the results. Which competitors win which topics? Are there engines where you dominate and engines where you're absent? Are there topic clusters where no brand dominates (blue-ocean GEO opportunities)? Are there engines where a specific competitor holds a disproportionate share (suggesting they did focused GEO work there)? The answers tell you where to allocate the next 90 days of work.
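One way to produce those cross-tabs, sketched with pandas over a long-format table of scored results; the column names follow the earlier sketches and are assumptions, as is the 20% blue-ocean threshold:

```python
import pandas as pd

df = pd.read_json("panel_results_scored.jsonl", lines=True)
# Explode so each row is one (prompt, engine, brand) mention.
mentions = df.explode("brands_named").rename(columns={"brands_named": "brand"})

# Which competitors win which topics? Mention share per topic, rows sum to 1.
topic_by_brand = pd.crosstab(mentions["topic"], mentions["brand"], normalize="index")

# Engines where you dominate vs. engines where you're absent.
engine_by_brand = pd.crosstab(mentions["engine"], mentions["brand"], normalize="index")

# Blue-ocean topics: no single brand above an (assumed) 20% mention share.
blue_ocean = topic_by_brand[topic_by_brand.max(axis=1) < 0.20]
```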
Step 5 — Flag the top leverage points
The audit should end with a ranked list of concrete actions: not 40 findings, but 10 leverage points, ranked by expected impact and effort. Examples: "Fix entity ambiguity in Wikidata (high impact, low effort)." "Rewrite /solutions hub for extractability (high impact, medium effort)." "Publish pillar article on [topic X] where no brand currently dominates (high impact, high effort, longest payback)."
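The force-ranking itself is trivial; a sketch using an assumed 1–3 scale for impact and effort:

```python
actions = [
    {"action": "Fix entity ambiguity in Wikidata",          "impact": 3, "effort": 1},
    {"action": "Rewrite /solutions hub for extractability", "impact": 3, "effort": 2},
    {"action": "Publish pillar article on [topic X]",       "impact": 3, "effort": 3},
]
# Highest impact first; lowest effort breaks ties.
ranked = sorted(actions, key=lambda a: (-a["impact"], a["effort"]))
top_ten = ranked[:10]
```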
What to do with the audit afterward
Run the same panel monthly. Nothing else you do in GEO matters if you're not measuring the same thing consistently; the month-over-month delta is what tells you whether the work is working. And refresh the panel itself every quarter: buyer questions evolve, and so should the measurement.
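Tracking the delta can be as simple as diffing the per-engine share-of-voice figures from two monthly runs (the output of the Step 3 sketch); the numbers here are made up for illustration:

```python
def month_over_month(prev: dict[str, float], curr: dict[str, float]) -> dict[str, float]:
    """Delta in share of voice per engine, in percentage points."""
    return {engine: round((curr.get(engine, 0.0) - prev.get(engine, 0.0)) * 100, 1)
            for engine in set(prev) | set(curr)}

# Example: +4.2pp on Perplexity, -1.0pp on Gemini since last month.
delta = month_over_month({"perplexity": 0.120, "gemini": 0.30},
                         {"perplexity": 0.162, "gemini": 0.29})
```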
If this sounds like a lot, it is — but it's also a one-week project for an experienced practitioner, and the output shapes every downstream decision. We deliver this as the first phase of every GEO engagement, and many clients subscribe to an ongoing version as a standalone measurement service.