Methodology

How the Standout Score works

A deterministic 0-100 design quality score. Under a given rulebook version, the same page always gets the same score: it is computed from measurements and a versioned rulebook, never judged by a language model. That is what makes it comparable across sites and over time, and worth putting on a report.

What feeds the score

Every audited page is scored from the same three inputs:

A rendered audit. The page loads in a real headless browser and is measured, not eyeballed: computed WCAG contrast ratios, mobile layout at 360, 390, and 768 pixel viewport widths, readability floors, and generic-hero detection, plus phone and desktop screenshots.
The taste rulebook. 73 rules distilled from real production sites, covering everything from typography and color discipline to the AI tells (rulebook v2026.07; changes below). Each detection deducts points from its category, scaled by its severity.
Category basics for the site type. A restaurant page without a visible menu path, or a clinic without a booking path, is missing something a real visitor needs; the score knows what kind of site it is looking at.
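The contrast measurement in the rendered audit follows the WCAG 2.x definition of contrast ratio. Below is a minimal Python sketch of that formula, assuming colors arrive as `#rrggbb` strings; the function names are hypothetical, and this illustrates the published formula rather than reproducing the production scorer's code.

```python
def _channel_to_linear(c8: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per WCAG relative luminance."""
    c = c8 / 255
    # Current WCAG text uses the sRGB threshold 0.04045; earlier editions printed
    # 0.03928. No 8-bit channel value falls between the two, so results match.
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected #rrggbb, got {hex_color!r}")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * _channel_to_linear(r)
            + 0.7152 * _channel_to_linear(g)
            + 0.0722 * _channel_to_linear(b))


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter color."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)


def passes_aa(ratio: float, large_text: bool = False) -> bool:
    """WCAG AA text contrast: at least 4.5:1 for normal text, 3:1 for large text."""
    return ratio >= (3.0 if large_text else 4.5)
```

For example, `contrast_ratio("#777777", "#ffffff")` comes out at roughly 4.48:1. That is just under the 4.5:1 AA bar for body text, the kind of near miss a measured check catches and an eyeball check does not.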

The eight categories and their weights

Category	Weight	What it measures
Visual quality	16%	Does the design look intentional: committed palette, real hierarchy, one focal moment.
Conversion clarity	16%	One clear promise, one primary action, proof where the decision happens.
Mobile quality	14%	Measured at real phone widths: gutters, tap targets, nothing clipped or overlapping.
Accessibility	14%	Computed WCAG contrast ratios, readable text sizes, focus states, reduced motion.
Brand distinctiveness	12%	Would you recognize this site tomorrow, or could it be any template?
Trust & proof	10%	Real specifics instead of claims: named results, verifiable details.
AI-slop	10%	The tells that read as AI-built: gradient text, glow blobs, template skins, stock phrases.
Context fit	8%	Does the design fit what this kind of business actually needs to convince its visitors?

These are the live weights the production scorer uses, and they sum to 100%. This page imports them from the same module, so the published table cannot drift from what is actually scored.
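Read literally, the table defines a weighted average. Here is a sketch of that arithmetic, assuming each category is itself scored 0-100 and the total is rounded half up; the category keys and `overall_score` are hypothetical names, not the production module.

```python
import math

# Category weights in percent, as listed in the table above (they sum to 100).
# The snake_case keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "visual_quality": 16,
    "conversion_clarity": 16,
    "mobile_quality": 14,
    "accessibility": 14,
    "brand_distinctiveness": 12,
    "trust_and_proof": 10,
    "ai_slop": 10,
    "context_fit": 8,
}
assert sum(WEIGHTS.values()) == 100


def overall_score(category_scores: dict[str, float]) -> int:
    """Combine per-category scores (each assumed to be 0-100) into one 0-100 score."""
    missing = sorted(WEIGHTS.keys() - category_scores.keys())
    if missing:
        raise ValueError(f"missing categories: {missing}")
    total = 0.0
    for name, weight in WEIGHTS.items():
        score = category_scores[name]
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score out of range: {score}")
        total += weight * score
    # Weights are percentages, so divide by 100; round half up (an assumption).
    return math.floor(total / 100 + 0.5)
```

With every category at 100 except Visual quality at 50, the weighted score is 92: the 16% weight turns a 50-point category loss into an 8-point overall loss.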

The 85 bar, and why a high score can still be blocked

Client-ready starts at 85. But the verdict is a gate, not just a number: a page is only called client-ready when it scores 85 or higher and has no kill-severity detections, no missing category basics, no ship-blocking tells (like gradient headline text), and a real successful render behind the measurement. A 96 with a banned tell on it is still blocked.
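The gate is a conjunction of checks: failing any one of them blocks the verdict regardless of the score. Here is a minimal sketch, assuming the audit exposes its detections as lists of labels; `AuditResult` and `client_ready_verdict` are hypothetical names.

```python
from dataclasses import dataclass, field

CLIENT_READY_BAR = 85


@dataclass
class AuditResult:
    score: int
    render_ok: bool  # True only if a real render succeeded behind the measurement
    kill_detections: list[str] = field(default_factory=list)
    missing_basics: list[str] = field(default_factory=list)
    ship_blocking_tells: list[str] = field(default_factory=list)


def client_ready_verdict(r: AuditResult) -> tuple[bool, list[str]]:
    """Return (is_client_ready, reasons); any single reason blocks the verdict."""
    reasons = []
    if r.score < CLIENT_READY_BAR:
        reasons.append(f"score {r.score} is below {CLIENT_READY_BAR}")
    if r.kill_detections:
        reasons.append("kill-severity detections: " + ", ".join(r.kill_detections))
    if r.missing_basics:
        reasons.append("missing category basics: " + ", ".join(r.missing_basics))
    if r.ship_blocking_tells:
        reasons.append("ship-blocking tells: " + ", ".join(r.ship_blocking_tells))
    if not r.render_ok:
        reasons.append("no successful render behind the measurement")
    return (not reasons, reasons)
```

Returning the reasons alongside the boolean means a blocked 96 explains itself (for instance, "ship-blocking tells: gradient headline text") instead of just failing.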

Honesty rule: if the visual render fails, the score is capped at 60 and labeled a markup-only estimate. A clean-compiling page must never read as visually verified when nothing was actually seen.
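The honesty rule is a cap plus a label. Here is a sketch, assuming the render outcome is available as a single boolean; the label strings and the function name are hypothetical.

```python
MARKUP_ONLY_CAP = 60


def apply_render_cap(score: int, render_ok: bool) -> tuple[int, str]:
    """Cap the score at 60 and relabel it when no visual render succeeded."""
    if render_ok:
        return score, "visually verified"
    # min() only ever lowers the score; it never lifts a low markup-only result.
    return min(score, MARKUP_ONLY_CAP), "markup-only estimate"
```

Using `min` means a failed render never raises a score: a markup-only estimate of 45 stays 45, and a 92 becomes 60.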

Taking our own medicine

Our own homepage is scored by the same critic, with nothing relaxed. It blocked us at 83 until every finding was fixed; the public report for the current page stands at 99/100. If we ever softened a rule to make our own page pass, the score would mean nothing, so we do not.

Rulebook changes (current: v2026.07)

v2026.07

Six new template-skin tells from the 2026 wave of AI-built sites, calibrated both ways against live production pages: zero false fires on clean sites, confirmed positives on known offenders.

emoji used as UI
the fake browser-mockup hero
glassmorphism cards on a void
the raw shadcn default skin
glow blobs on a dark void
the faded grid backdrop
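One way to read "calibrated both ways" is as two checks per rule: it must not fire on any known-clean page, and it must fire on every known offender. Here is a sketch under those assumptions, with pages reduced to hypothetical feature dictionaries and a deliberately naive toy detector for the "emoji used as UI" tell; neither is the real detector.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical representation: a page reduced to extracted features.
Page = dict[str, list[str]]


@dataclass
class CalibrationReport:
    false_fires: list[str]  # clean pages the rule wrongly flagged
    misses: list[str]       # known offenders the rule failed to flag

    @property
    def passed(self) -> bool:
        return not self.false_fires and not self.misses


def calibrate(rule: Callable[[Page], bool],
              clean: dict[str, Page],
              offenders: dict[str, Page]) -> CalibrationReport:
    """Run one rule against both reference sets, keyed by page URL."""
    return CalibrationReport(
        false_fires=[url for url, page in clean.items() if rule(page)],
        misses=[url for url, page in offenders.items() if not rule(page)],
    )


def emoji_as_ui(page: Page) -> bool:
    """Toy detector: fires if a button label contains a pictograph emoji.

    Only checks the U+1F300..U+1FAFF blocks, so it is illustrative, not complete.
    """
    return any(
        0x1F300 <= ord(ch) <= 0x1FAFF
        for label in page.get("button_labels", [])
        for ch in label
    )
```

A rule ships only when `passed` is true, so a single false fire on a clean site is enough to send it back for tuning.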

v2026.06

The base rulebook: taste rules across typography, color, layout, imagery, motion, copy, trust, and the classic AI tells (the purple gradient, gradient headline text, three identical cards, centered template heroes), plus the rendered audit, which computes WCAG contrast, checks mobile layout at real phone widths, and enforces readability floors.

Scores are comparable within a rulebook version. When the rulebook tightens, a page's score can drop without the page changing; that is the score keeping up with what reads as AI-built, not noise.
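Because a score only means something relative to the rulebook that produced it, any before/after comparison should carry the version along. Here is a minimal sketch, assuming each stored run records its rulebook version string; `ScoredRun` and `score_change` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredRun:
    url: str
    score: int
    rulebook_version: str  # e.g. "v2026.07"


def score_change(before: ScoredRun, after: ScoredRun) -> int:
    """Score delta between two runs, refused if they used different rulebooks."""
    if before.rulebook_version != after.rulebook_version:
        raise ValueError(
            f"not comparable: {before.rulebook_version} vs {after.rulebook_version}; "
            "re-score both runs under one rulebook version"
        )
    return after.score - before.score
```

Refusing the cross-version comparison, rather than silently subtracting, keeps a drop caused by a tightened rulebook from being reported as a regression in the page itself.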

See your own score

Paste any public URL and get the full eight-category breakdown for free.
