Why AI search engines quote some paragraphs and skip others

April 29, 2026 · Sichao

If you've spent time watching what AI search engines actually quote, you've probably noticed the same thing I have: the engines are surprisingly picky about which paragraph on a page ends up in the answer.

A page can be the first or second citation in a Perplexity response, and yet the quoted sentence will come from paragraph eleven rather than paragraph one. A ChatGPT answer might pull a single line from the middle of a long post and ignore everything around it. The page was "chosen." The paragraph that represented it was not the obvious one.

After watching this happen across a few hundred queries over the past few months, I've started to see a pattern. Here is what it looks like.

Short, self-contained paragraphs win

The paragraphs that get quoted are almost always ones that could stand on their own. Usually 1–3 sentences. They don't start with "This means that..." or "As mentioned above..." They don't rely on setup earlier in the post. Pull them out of context and they still make sense.
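If you want to screen your own drafts for this, a rough heuristic is easy to sketch. The Python below checks only the two criteria named above: one to three sentences, and no opener that leans on earlier text. `looks_self_contained` and the `CONTEXT_OPENERS` list are hypothetical names of my own, not anything an engine is known to use.

```python
import re

# Hypothetical openers that lean on text the model may never see.
CONTEXT_OPENERS = (
    "this means",
    "as mentioned",
    "as we discussed",
    "as noted above",
    "that approach",
    "these issues",
)


def split_sentences(paragraph: str) -> list[str]:
    """Naive split on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def looks_self_contained(paragraph: str, max_sentences: int = 3) -> bool:
    """Rough check: 1..max_sentences sentences and no back-referencing opener."""
    sentences = split_sentences(paragraph)
    if not 1 <= len(sentences) <= max_sentences:
        return False
    # str.startswith accepts a tuple and matches if any entry matches.
    return not paragraph.strip().lower().startswith(CONTEXT_OPENERS)


print(looks_self_contained("Chunking splits a page into smaller passages before retrieval."))  # True
print(looks_self_contained("This means that the approach above fails."))  # False
```

A phrase-list check like this misses back-references buried mid-sentence, so treat it as a first pass rather than a verdict.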

This isn't surprising once you think about how a retrieval pipeline typically works. The engine splits pages into chunks, often a few hundred tokens each, pulls the chunks that best match the query, and feeds them to the model as context. If a chunk assumes context the model doesn't have, the model is unlikely to quote it: it has no way to resolve "that approach" or "these issues" back to what they originally referred to.

The result: a 2,000-word explainer with one self-contained paragraph buried halfway through often gets that one paragraph quoted while the rest is ignored.
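To make the chunk-level argument concrete, here is a toy version of that kind of pipeline. It assumes whole paragraphs are packed greedily into chunks and ranked by plain word overlap with the query. Real engines don't publish their chunk sizes or ranking functions, so `chunk_paragraphs`, `top_chunk`, and the 300-word default are illustrative assumptions, with words standing in for model tokens.

```python
import re
from collections import Counter


def chunk_paragraphs(paragraphs: list[str], max_tokens: int = 300) -> list[str]:
    """Greedily pack whole paragraphs into chunks of roughly max_tokens words.

    Words approximate tokens here. A single paragraph longer than max_tokens
    still becomes its own (oversized) chunk rather than being split.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in paragraphs:
        n = len(para.split())
        # Close the current chunk if this paragraph would push it over budget.
        if current and count + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def word_counts(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def top_chunk(query: str, chunks: list[str]) -> str:
    """Return the chunk sharing the most words with the query (ties: first wins)."""
    q = word_counts(query)

    def overlap(chunk: str) -> int:
        c = word_counts(chunk)
        return sum(min(q[w], c[w]) for w in q)

    return max(chunks, key=overlap)
```

Run a page through `chunk_paragraphs` and look at where the boundaries fall. A paragraph that opens with "that approach" can land in a different chunk from the paragraph that defines it, which is exactly the case where the model has nothing to resolve the reference against.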

Definitional shapes beat narrative shapes

Engines seem to prefer paragraphs that answer in a definitional shape:

  • "X is Y that does Z."
  • "The main reason X happens is Y."
  • "To do X, you need to do Y."

Narrative-shaped paragraphs ("When we were building the system, we ran into a problem...") almost never get quoted, even on pages that are otherwise well-ranked. Narrative is great for humans reading linearly. It's terrible for a model that sees a single chunk without the story around it.
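The three definitional shapes are regular enough to approximate with surface patterns. This is a minimal sketch, assuming English prose and phrasings close to the ones listed; the regexes and the `paragraph_shape` function are my own hypothetical illustration and will misclassify plenty of real paragraphs.

```python
import re

# Hypothetical surface patterns, one per definitional shape.
DEFINITIONAL = [
    re.compile(r"^[\w\s-]+ (?:is|are) (?:a|an|the) ", re.I),  # "X is Y that does Z."
    re.compile(r"^the main reason .+ (?:is|are) ", re.I),     # "The main reason X happens is Y."
    re.compile(r"^to .+, you (?:need|have) to ", re.I),        # "To do X, you need to do Y."
]
# Narrative: a time clause opening a first-person story.
NARRATIVE = re.compile(r"^(?:when|while|after|back when)\b.*\b(?:we|i)\b", re.I)


def paragraph_shape(paragraph: str) -> str:
    """Label a paragraph 'definitional', 'narrative', or 'other'."""
    text = paragraph.strip()
    if any(p.search(text) for p in DEFINITIONAL):
        return "definitional"
    if NARRATIVE.search(text):
        return "narrative"
    return "other"
```

For example, `paragraph_shape("When we were building the system, we ran into a problem.")` comes back as `"narrative"`.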

Bullets and short lists get lifted cleanly

If a page answers a how-to query and has a clean 4-step list somewhere, that list tends to get quoted as a unit. Sometimes the model restates it in its own words; sometimes it lifts it verbatim. Either way the structure carries over.

This is easy to test yourself. Find a page with a numbered list answering a common how-to question, then ask Perplexity or ChatGPT the same question. Odds are the list structure — sometimes the exact wording — shows up in the answer.
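If you run that test more than a handful of times, scoring it by eye gets tedious. The sketch below assumes you have pasted the page and the engine's answer in as plain text. It pulls numbered or bulleted items from the page and checks whether each one reappears, verbatim or nearly so, in the answer. The helper names and the 0.6 similarity threshold are arbitrary assumptions, and `difflib` similarity only catches near-verbatim reuse, not paraphrase.

```python
import re
from difflib import SequenceMatcher

# Leading "1.", "2)", "-", "*" or "•" marker on a list line.
MARKER = re.compile(r"^[ \t]*(?:\d+[.)]|[-*•])[ \t]+")


def list_items(page_text: str) -> list[str]:
    """Pull numbered or bulleted items out of plain text, markers removed."""
    return [MARKER.sub("", line).strip()
            for line in page_text.splitlines() if MARKER.match(line)]


def carried_over(items: list[str], answer: str, threshold: float = 0.6) -> list[bool]:
    """For each page item, is some answer line at least `threshold` similar to it?"""
    answer_lines = [MARKER.sub("", line).strip().lower()
                    for line in answer.splitlines() if line.strip()]

    def best(item: str) -> float:
        return max((SequenceMatcher(None, item.lower(), line).ratio()
                    for line in answer_lines), default=0.0)

    return [best(item) >= threshold for item in items]
```

A result like `[True, True, False]` means two of the three steps came through nearly word for word and the third did not.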

First paragraphs rarely win

This one surprised me. I had assumed, from SEO intuition, that the opening paragraph would be the most frequently quoted. In practice it's closer to the opposite. First paragraphs in published articles are usually narrative ledes — scene-setting, framing the question, introducing the author's angle. Those are bad candidates for a retrieval pipeline. They don't contain the answer; they contain the path toward the answer.
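If you want to check this on your own pages, the bookkeeping is simple: for each answer, locate the paragraph the quote came from and tally the positions. This is a hypothetical sketch, assuming you have already split each page into paragraphs and copied out the quoted sentence; `quoted_paragraph_index` and `position_tally` are illustrative names, not an existing tool.

```python
from collections import Counter
from difflib import SequenceMatcher


def quoted_paragraph_index(paragraphs: list[str], quote: str) -> int:
    """Return the 0-based index of the paragraph the quote most likely came from.

    An exact (case-insensitive) substring match wins. Otherwise, fall back to the
    paragraph sharing the longest common run of characters with the quote, since
    engines often trim or lightly reword what they lift.
    """
    q = quote.strip().lower()
    for i, para in enumerate(paragraphs):
        if q in para.lower():
            return i

    def longest_run(para: str) -> int:
        p = para.lower()
        # autojunk=False so long paragraphs don't have common characters ignored.
        m = SequenceMatcher(None, q, p, autojunk=False)
        return m.find_longest_match(0, len(q), 0, len(p)).size

    return max(range(len(paragraphs)), key=lambda i: longest_run(paragraphs[i]))


def position_tally(observations: list[tuple[list[str], str]]) -> Counter:
    """Count how often each paragraph position (0 = first) supplied the quote."""
    return Counter(quoted_paragraph_index(paras, quote) for paras, quote in observations)
```

If opening paragraphs were favored, position 0 would dominate the tally.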

The paragraphs that actually get quoted tend to look like they belong in a TL;DR box, a "key takeaways" section, or a direct answer under a subheading. That suggests that if you want a paragraph to have a chance of being quoted, you should write it as if it will appear in isolation, because one way or another, it will.

What this means if you're writing for AI engines

Three implications, none of them surprising once you accept that the engines retrieve and quote at the chunk level:

  1. Write at least one paragraph per page that could stand completely alone. No "this," no "above," no "as we discussed." Short, declarative, self-contained.
  2. Put the direct answer early. Not because the first paragraph is automatically preferred, but because if nothing else on the page contains a self-contained answer, the engine has nothing to work with.
  3. Don't waste the space under a question-form heading. The paragraph immediately under an H2 phrased as a question is prime real estate. Make it a direct answer, not a transition.
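The third point is easy to audit mechanically. The sketch below assumes the page is Markdown with `## ` headings and treats "phrased as a question" as "ends in a question mark", which is my own simplification. It flags whether the paragraph under each such heading opens like a direct answer rather than a transition. The `TRANSITION_OPENERS` list is a hypothetical starting point to tune, not a rule any engine is known to apply.

```python
# Hypothetical openers that signal a transition rather than an answer.
TRANSITION_OPENERS = ("this means", "as mentioned", "as we discussed", "so,", "now,", "let's")


def question_h2_answers(markdown: str) -> list[tuple[str, str, bool]]:
    """For each '## ...?' heading, return (heading, first paragraph, looks_direct)."""
    lines = markdown.splitlines()
    results = []
    for i, line in enumerate(lines):
        heading = line.strip()
        # Only H2s ("## ", not "### ") whose text ends in a question mark.
        if not (heading.startswith("## ") and heading.endswith("?")):
            continue
        # First paragraph after the heading: skip leading blanks, then read
        # until the next blank line or heading.
        para: list[str] = []
        for nxt in lines[i + 1:]:
            s = nxt.strip()
            if not s:
                if para:
                    break
                continue
            if s.startswith("#"):
                break
            para.append(s)
        answer = " ".join(para)
        direct = bool(answer) and not answer.lower().startswith(TRANSITION_OPENERS)
        results.append((heading[3:], answer, direct))
    return results
```

Any tuple with `False` in the last position is a question-form heading whose prime real estate went to a transition or to nothing at all.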

None of this is a guarantee. Engines are stochastic, and what gets quoted varies from run to run. But the pattern has been consistent enough that it has changed how I write every long-form piece: somewhere in the middle, there is always a block that would survive being pulled out of the page entirely.