Why schema matters more in the AI era, not less
There's a persistent myth that because LLMs can parse unstructured text, schema is obsolete. The opposite is true. LLMs parse unstructured text generatively, which means they also generate hallucinations, ambiguities and confidently wrong answers. Schema is how you give the engine a machine-readable fact-check of what the prose is claiming. It's the difference between "the model thinks your brand is a pharma company" and "the model knows your brand is a pharma company because the schema says so and the prose confirms it." Every engine we've measured rewards the coherent pair.
The schema types that actually matter
Organization: one per site, as the canonical node. Include name, url, logo, sameAs (pointing to Wikidata, LinkedIn, Crunchbase, etc.), foundingDate, address, and contactPoint. This is the root node every other schema block should reference.
ProfessionalService / Service / Product: one for every offering. Include name, description and offers. On a Service, add serviceType, provider (pointing back to the Organization by @id) and areaServed. A Product has no provider property, so point back to the Organization with brand or manufacturer instead. Note that schema.org defines ProfessionalService as a LocalBusiness subtype, so it describes the business rather than a single offering. This is where most sites are either absent or shipping junk.
Article: on every editorial page. Include headline, description, datePublished, dateModified, author (with @id), publisher (with @id pointing to the Organization), and mainEntityOfPage. Critical for getting articles cited.
FAQPage: on service and article pages that answer questions. Each Question gets an explicit, structured Answer that matches what the prose says. Do not use FAQPage as a keyword-stuffing vehicle; engines will ignore the markup or demote the page.
BreadcrumbList: on every page. Unglamorous, but the payoff compounds. It tells the engine where the page sits in the site's hierarchy.
HowTo: on step-based content. Heavily used by Google AI Overviews for "how do I" queries.
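To make the property lists above concrete, here is a minimal sketch that builds Organization, Service and Article nodes as Python dictionaries and serializes them as one JSON-LD document. The domain yoursite.com comes from the article; the company name, URLs, dates and helper function names are hypothetical placeholders, not a prescribed implementation.

```python
import json

SITE = "https://yoursite.com"        # placeholder domain used in the article
ORG_ID = f"{SITE}/#organization"     # one @id for the Organization, reused everywhere


def organization_node():
    """Root node. All values below are illustrative placeholders."""
    return {
        "@type": "Organization",
        "@id": ORG_ID,
        "name": "Example Co",
        "url": f"{SITE}/",
        "logo": f"{SITE}/logo.png",
        "sameAs": ["https://www.linkedin.com/company/example-co"],
        "foundingDate": "2015-01-01",
    }


def service_node(slug, name, description):
    """A Service whose provider points back to the Organization by @id."""
    return {
        "@type": "Service",
        "@id": f"{SITE}/services/{slug}#service",
        "name": name,
        "description": description,
        "serviceType": name,
        "provider": {"@id": ORG_ID},
    }


def article_node(slug, headline, published, modified):
    """An Article whose publisher points back to the Organization by @id."""
    page_url = f"{SITE}/blog/{slug}"
    return {
        "@type": "Article",
        "@id": f"{page_url}#article",
        "headline": headline,
        "datePublished": published,
        "dateModified": modified,
        "publisher": {"@id": ORG_ID},
        "mainEntityOfPage": page_url,
    }


def to_json_ld(nodes):
    """Wrap nodes in a single @graph so they can ship in one script tag."""
    return json.dumps({"@context": "https://schema.org", "@graph": nodes}, indent=2)


if __name__ == "__main__":
    print(to_json_ld([
        organization_node(),
        service_node("audit", "Schema audit", "Review of a site's structured data."),
    ]))
```

The output would typically be embedded in a script element of type application/ld+json. Author, address, contactPoint and offers are left out here only to keep the sketch short.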
The high-leverage move: stable @id chaining
The move that separates serious schema from decorative schema is using stable @id references to build one coherent graph across the whole site. Every page's Organization block uses the same @id (e.g. https://yoursite.com/#organization). Every Service block references that same @id as its provider. Every Article references the same Organization as publisher. The result: an answer engine parsing any single page can reconstruct the whole knowledge graph of the brand. This is what LLMs reward — because it's how their internal models of entities are structured anyway.
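One way to check that the chaining actually holds is to collect every @id that is defined somewhere on the site and every @id that is only referenced, then look for references that point at nothing. The sketch below assumes you already have each page's JSON-LD parsed into Python objects; the function names and the convention that a bare {"@id": ...} object counts as a reference are assumptions made for illustration.

```python
def iter_dicts(value):
    """Yield every dict nested anywhere inside a parsed JSON-LD value."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_dicts(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_dicts(item)


def dangling_references(pages):
    """Return @ids that are referenced somewhere but never defined.

    `pages` maps a page path to that page's parsed JSON-LD.
    A dict containing only "@id" is treated as a reference;
    a dict with "@id" plus other keys is treated as a definition.
    """
    defined, referenced = set(), set()
    for graph in pages.values():
        for node in iter_dicts(graph):
            node_id = node.get("@id")
            if node_id is None:
                continue
            if set(node) == {"@id"}:
                referenced.add(node_id)
            else:
                defined.add(node_id)
    return sorted(referenced - defined)
```

In this sketch, an Article whose publisher points at https://yoursite.com/#org while the homepage defines https://yoursite.com/#organization would be reported as a dangling reference, which is the kind of drift that quietly breaks the graph.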
Three mistakes that kill citation share
Mistake 1 — shipping schema that contradicts the prose. If your schema says "founded in 2015" and your About page says "founded in 2018," the engine drops trust in both. Consistency is the whole game.
Mistake 2 — keyword-stuffing FAQPage blocks. Every modern answer engine has filters for this. Stuffing FAQPage with marketing questions that aren't really questions will get your schema ignored or the page demoted. Only ship FAQ schema for questions real users actually ask.
Mistake 3 — unstable or unresolvable @ids. If your @ids don't resolve to real URL fragments, or change between page loads, the engine cannot link nodes into a graph. Pick a convention (e.g. https://yoursite.com/#organization) and use it everywhere.
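Mistakes 1 and 3 are the easiest to catch automatically. The sketch below shows two small checks: one compares the year in the schema's foundingDate with any "founded in YYYY" phrase in the page prose, and one tests whether an @id follows the absolute-URL-plus-fragment convention. The function names, the regular expression and the exact rules are assumptions for illustration; a real pipeline would need broader phrase matching.

```python
import re
from urllib.parse import urlsplit


def founding_year_mismatches(org_node, prose):
    """Return years claimed in the prose that differ from the schema's foundingDate.

    Only matches the literal phrase "founded in YYYY" (case-insensitive).
    If the schema has no foundingDate, every year found in the prose is returned.
    """
    schema_year = str(org_node.get("foundingDate", ""))[:4]
    prose_years = set(re.findall(r"founded in (\d{4})", prose, flags=re.IGNORECASE))
    return sorted(year for year in prose_years if year != schema_year)


def follows_id_convention(node_id, site_host):
    """True if the @id is an absolute https URL on the site with a fragment."""
    parts = urlsplit(node_id)
    return (
        parts.scheme == "https"
        and parts.netloc == site_host
        and bool(parts.fragment)
    )
```

Running both checks in CI against rendered pages is one way to keep a contradiction or a malformed @id from shipping. Detecting @ids that change between page loads would additionally require rendering the same page twice and comparing the two sets.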
A minimal schema architecture to start from
Every page: BreadcrumbList.
The homepage: Organization + WebSite.
Every service page: Service + FAQPage + BreadcrumbList.
Every article: Article + FAQPage + BreadcrumbList.
Every product page: Product + FAQPage + BreadcrumbList.
This covers 95% of the GEO-relevant cases for a typical B2B site and takes one senior engineer about two weeks to implement cleanly.
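The architecture above can be written down as a lookup table and enforced with a small check. This is a sketch only: the page-kind keys ("homepage", "service", "article", "product") and the function name are hypothetical, and it assumes each page's JSON-LD has already been parsed into a list of nodes.

```python
# Required schema types per page kind, taken from the architecture above.
# BreadcrumbList appears on the homepage too because it is required on every page.
REQUIRED_TYPES = {
    "homepage": {"Organization", "WebSite", "BreadcrumbList"},
    "service": {"Service", "FAQPage", "BreadcrumbList"},
    "article": {"Article", "FAQPage", "BreadcrumbList"},
    "product": {"Product", "FAQPage", "BreadcrumbList"},
}


def missing_types(page_kind, graph_nodes):
    """Return the required @type values that a page's top-level nodes lack."""
    present = set()
    for node in graph_nodes:
        node_type = node.get("@type", [])
        # JSON-LD allows @type to be a single string or a list of strings.
        present.update([node_type] if isinstance(node_type, str) else node_type)
    return sorted(REQUIRED_TYPES[page_kind] - present)
```

For example, a service page that ships Service and BreadcrumbList but no FAQPage would get back a one-item list naming FAQPage, which makes the gap easy to surface in a build report.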