Comparing visual artifacts can be a powerful, if fickle, approach to automated testing. Playwright makes this seem simple for websites, but the details might take a little finessing.
Recent downtime prompted me to scratch an itch that had been plaguing me for a while: The style sheet of a website I maintain has grown a bit unwieldy as we've been adding code while exploring new features. Now that we have a better idea of the requirements, it's time for internal CSS refactoring to pay down some of our technical debt, taking advantage of modern CSS features (like using CSS nesting for more obvious structure). More importantly, a cleaner foundation should make it easier to introduce that dark mode feature we're sorely lacking, so we can finally respect users' preferred color scheme.
However, being of the apprehensive persuasion, I was reluctant to make large changes for fear of unwittingly introducing bugs. I needed something to guard against visual regressions while refactoring, except that means snapshot testing, which is notoriously slow and brittle.
In this context, snapshot testing means taking screenshots to establish a reliable baseline against which we can compare future results. As we'll see, these artifacts are influenced by a multitude of factors that might not always be fully controllable (e.g. timing, variable hardware resources, or randomized content). We also have to maintain state between test runs, i.e. save those screenshots, which complicates the setup and means our test code alone doesn't fully describe expectations.
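As a rough mental model (not Playwright's actual implementation, whose names and heuristics differ), a screenshot comparison boils down to counting differing pixels and tolerating a small ratio of mismatches; every name below is made up for illustration:

```javascript
// Minimal sketch of pixel-based comparison. `baseline` and `actual` are
// flat arrays of pixel values; `maxDiffRatio` is the tolerated fraction
// of differing pixels. All names here are illustrative, not Playwright's.
function imagesMatch(baseline, actual, maxDiffRatio = 0.01) {
    if(baseline.length !== actual.length) {
        return false; // dimensions changed; nothing to compare
    }
    let diffs = 0;
    for(let i = 0; i < baseline.length; i++) {
        if(baseline[i] !== actual[i]) {
            diffs++;
        }
    }
    return diffs / baseline.length <= maxDiffRatio;
}
```

This also hints at why such tests are brittle: a one-pixel layout shift can push the diff ratio past any fixed threshold.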
Having procrastinated without a more agreeable solution revealing itself, I finally set out to create what I assumed would be a quick spike. After all, this wouldn't be part of the regular test suite; just a one-off utility for this particular refactoring task.
Fortunately, I had vague recollections of past research and quickly rediscovered Playwright's built-in visual comparison feature. Because I try to choose dependencies carefully, I was glad to see that Playwright appears not to rely on many external packages.
Setup
The recommended setup with npm init playwright@latest does a decent job, but my minimalist taste had me set everything up from scratch instead. This do-it-yourself approach also helped me understand how the different pieces fit together.
Given that I expect snapshot testing to only be used on rare occasions, I wanted to isolate everything in a dedicated subdirectory, called test/visual; that will be our working directory from here on out. We'll start with package.json to declare our dependencies, adding a few helper scripts (spoiler!) while we're at it:
{
    "scripts": {
        "test": "playwright test",
        "update": "playwright test --update-snapshots",
        "report": "playwright show-report"
    },
    "devDependencies": {
        "@playwright/test": "^1.49.1"
    }
}
If you don't want node_modules hidden in some subdirectory but also don't want to burden the root project with this rarely-used dependency, you might resort to manually invoking npm install --no-save @playwright/test in the root directory when needed.
With that in place, npm install downloads Playwright. Afterwards, npx playwright install downloads a range of headless browsers. (We'll use npm here, but you might prefer a different package manager and task runner.)
We define our test environment via playwright.config.js with about a dozen basic Playwright settings:
import { defineConfig, devices } from "@playwright/test";

let BROWSERS = ["Desktop Firefox", "Desktop Chrome", "Desktop Safari"];
let BASE_URL = "http://localhost:8000";
let SERVER = "cd ../../dist && python3 -m http.server";
let IS_CI = !!process.env.CI;

export default defineConfig({
    testDir: "./",
    fullyParallel: true,
    forbidOnly: IS_CI,
    retries: 2,
    workers: IS_CI ? 1 : undefined,
    reporter: "html",
    webServer: {
        command: SERVER,
        url: BASE_URL,
        reuseExistingServer: !IS_CI
    },
    use: {
        baseURL: BASE_URL,
        trace: "on-first-retry"
    },
    projects: BROWSERS.map(ua => ({
        name: ua.toLowerCase().replaceAll(" ", "-"),
        use: { ...devices[ua] }
    }))
});
Here we expect our static website to already reside within the root directory's dist folder and to be served at localhost:8000 (see SERVER; I chose Python there because it's widely available). I've included multiple browsers for illustration purposes. Still, we might reduce that number to speed things up (thus our simple BROWSERS list, which we then map to Playwright's more elaborate projects data structure). Similarly, continuous integration is YAGNI for my particular scenario, so that whole IS_CI dance could be discarded.
Capture and compare
Let's turn to the actual tests, starting with a minimal sample.test.js file:
import { test, expect } from "@playwright/test";

test("home page", async ({ page }) => {
    await page.goto("/");
    await expect(page).toHaveScreenshot();
});
npm test executes this little test suite (based on file-name conventions). The initial run always fails because it first needs to create baseline snapshots against which subsequent runs compare their results. Invoking npm test once more should report a passing test.
Changing our website, e.g. by recklessly messing with build artifacts in dist, should make the test fail again. Such failures will offer various options to compare expected and actual visuals:

We can also inspect those baseline snapshots directly: Playwright creates a folder for screenshots named after the test file (sample.test.js-snapshots in this case), with file names derived from the respective test's title (e.g. home-page-desktop-firefox.png).
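The naming scheme is predictable enough to sketch: roughly, slugify the test title and append the project name. This is my own approximation of the convention, not Playwright's actual code:

```javascript
// Approximate how Playwright derives snapshot file names: lowercase the
// test title, collapse runs of non-alphanumeric characters into hyphens,
// then append the project name. Illustrative only, not the real algorithm.
function snapshotName(testTitle, projectName) {
    let slug = testTitle.toLowerCase().replace(/[^a-z0-9]+/g, "-");
    return `${slug}-${projectName}.png`;
}

snapshotName("home page", "desktop-firefox"); // "home-page-desktop-firefox.png"
```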
Generating tests
Getting back to our original motivation, what we want is a test for every page. Instead of arduously writing and maintaining repetitive tests, we'll create a simple web crawler for our website and have tests generated automatically; one for each URL we've identified.
Playwright's global setup enables us to perform preparatory work before test discovery begins: Determine those URLs and write them to a file. Afterwards, we can dynamically generate our tests at runtime.
While there are other ways to pass data between the setup and test-discovery phases, having a file on disk makes it easy to modify the list of URLs before test runs (e.g. temporarily ignoring irrelevant pages).
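For instance, one might prune sitemap.json before a run to skip a section that's in flux; a hypothetical helper for that (the name and the prefix convention are my own invention):

```javascript
// Hypothetical helper to prune entries from the site map before a test
// run, e.g. to temporarily skip a section that's known to be changing.
function pruneSiteMap(urls, ignoredPrefix) {
    return urls.filter(url => !url.startsWith(ignoredPrefix));
}

pruneSiteMap(["/", "/topics", "/wip/draft"], "/wip/");
// → ["/", "/topics"]
```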
Site map
The first step is to extend playwright.config.js by inserting globalSetup and exporting two of our configuration values:
export let BROWSERS = ["Desktop Firefox", "Desktop Chrome", "Desktop Safari"];
export let BASE_URL = "http://localhost:8000";

// etc.

export default defineConfig({
    // etc.
    globalSetup: require.resolve("./setup.js")
});
Although we're using ES modules here, we can still rely on CommonJS-specific APIs like require.resolve and __dirname. It appears there's some Babel transpilation happening in the background, so what's actually being executed is probably CommonJS? Such nuances sometimes confuse me because it isn't always obvious what's being executed where.
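For what it's worth, in native ESM (without that transpilation step) the conventional substitute for __dirname derives the path from import.meta.url via node:url; sketched here with a fixed example URL standing in for import.meta.url:

```javascript
import { fileURLToPath } from "node:url";
import { dirname } from "node:path";

// In native ESM one would typically write:
//     let __dirname = dirname(fileURLToPath(import.meta.url));
// Demonstrated with a fixed file URL instead of import.meta.url:
let modulePath = fileURLToPath("file:///home/user/test/visual/setup.js");
let moduleDir = dirname(modulePath);
// on POSIX systems:
// modulePath → "/home/user/test/visual/setup.js"
// moduleDir  → "/home/user/test/visual"
```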
We can now reuse those exported values within a newly created setup.js, which spins up a headless browser to crawl our website (just because that's easier here than using a separate HTML parser):
import { BASE_URL, BROWSERS } from "./playwright.config.js";
import { createSiteMap, readSiteMap } from "./sitemap.js";
import playwright from "@playwright/test";

export default async function globalSetup(config) {
    // only create site map if it doesn't already exist
    try {
        readSiteMap();
        return;
    } catch(err) {}

    // launch browser and initiate crawler
    let browser = playwright.devices[BROWSERS[0]].defaultBrowserType;
    browser = await playwright[browser].launch();
    let page = await browser.newPage();
    await createSiteMap(BASE_URL, page);
    await browser.close();
}
That is pretty boring glue code; the precise crawling is going on inside sitemap.js
:
createSiteMap
determines URLs and writes them to disk.readSiteMap
merely reads any beforehand created web site map from disk. This can be our basis for dynamically producing exams. (We’ll see later why this must be synchronous.)
Fortunately, the website in question provides a comprehensive index of all pages, so my crawler only needs to collect unique local URLs from that index page:
function extractLocalLinks(baseURL) {
    let urls = new Set();
    let offset = baseURL.length;
    for(let { href } of document.links) {
        if(href.startsWith(baseURL)) {
            let path = href.slice(offset);
            urls.add(path);
        }
    }
    return Array.from(urls);
}
Wrapping that in a bit more boring glue code gives us our sitemap.js:
import { readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

let ENTRY_POINT = "/topics";
let SITEMAP = join(__dirname, "./sitemap.json");

export async function createSiteMap(baseURL, page) {
    await page.goto(baseURL + ENTRY_POINT);
    let urls = await page.evaluate(extractLocalLinks, baseURL);
    let data = JSON.stringify(urls, null, 4);
    writeFileSync(SITEMAP, data, { encoding: "utf-8" });
}

export function readSiteMap() {
    try {
        var data = readFileSync(SITEMAP, { encoding: "utf-8" });
    } catch(err) {
        if(err.code === "ENOENT") {
            throw new Error("missing site map");
        }
        throw err;
    }
    return JSON.parse(data);
}

function extractLocalLinks(baseURL) {
    // etc.
}
The interesting bit here is that extractLocalLinks is evaluated within the browser context, so we can rely on DOM APIs, notably document.links, while the rest is executed within the Playwright environment (i.e. Node).
Tests
Now that we have our list of URLs, we basically just need a test file with a simple loop to dynamically generate corresponding tests:
for(let url of readSiteMap()) {
    test(`page at ${url}`, async ({ page }) => {
        await page.goto(url);
        await expect(page).toHaveScreenshot();
    });
}
This is why readSiteMap had to be synchronous above: Playwright doesn't currently support top-level await within test files.
In practice, we'll want better error reporting for when the site map doesn't exist yet. Let's call our actual test file viz.test.js:
import { readSiteMap } from "./sitemap.js";
import { test, expect } from "@playwright/test";

let sitemap = [];
try {
    sitemap = readSiteMap();
} catch(err) {
    test("site map", ({ page }) => {
        throw new Error("missing site map");
    });
}

for(let url of sitemap) {
    test(`page at ${url}`, async ({ page }) => {
        await page.goto(url);
        await expect(page).toHaveScreenshot();
    });
}
Getting here was a bit of a journey, but we're pretty much done… unless we have to deal with reality, which typically takes a bit more tweaking.
Exceptions
Because visual testing is inherently flaky, we sometimes need to compensate via special casing. Playwright lets us inject custom CSS, which is often the easiest and most effective approach. Tweaking viz.test.js…
// etc.
import { join } from "node:path";

let OPTIONS = {
    stylePath: join(__dirname, "./viz.tweaks.css")
};

// etc.
await expect(page).toHaveScreenshot(OPTIONS);
// etc.
… allows us to define exceptions in viz.tweaks.css:
/* suppress state */
main a:visited {
    color: var(--color-link);
}

/* suppress randomness */
iframe[src$="/articles/signals-reactivity/demo.html"] {
    visibility: hidden;
}

/* suppress flakiness */
body:has(h1 a[href="/wip/unicode-symbols/"]) {
    main tbody > tr:last-child > td:first-child {
        font-size: 0;
        visibility: hidden;
    }
}
:has() strikes again!
Page vs. viewport
At this point, everything seemed hunky-dory to me, until I realized that my tests didn't actually fail after I had changed some styling. That's not good! What I hadn't taken into account is that .toHaveScreenshot only captures the viewport rather than the entire page. We can rectify that by further extending playwright.config.js…
export let WIDTH = 800;
export let HEIGHT = WIDTH;

// etc.

projects: BROWSERS.map(ua => ({
    name: ua.toLowerCase().replaceAll(" ", "-"),
    use: {
        ...devices[ua],
        viewport: {
            width: WIDTH,
            height: HEIGHT
        }
    }
}))
…and then by adjusting viz.test.js's test-generating loop:
import { WIDTH, HEIGHT } from "./playwright.config.js";

// etc.

for(let url of sitemap) {
    test(`page at ${url}`, async ({ page }) => {
        await checkSnapshot(url, page);
    });
}

async function checkSnapshot(url, page) {
    // determine page height with default viewport
    await page.setViewportSize({
        width: WIDTH,
        height: HEIGHT
    });
    await page.goto(url);
    await page.waitForLoadState("networkidle");
    let height = await page.evaluate(getFullHeight);

    // resize viewport before snapshotting
    await page.setViewportSize({
        width: WIDTH,
        height: Math.ceil(height)
    });
    await page.waitForLoadState("networkidle");
    await expect(page).toHaveScreenshot(OPTIONS);
}

function getFullHeight() {
    return document.documentElement.getBoundingClientRect().height;
}
Note that we've also introduced a waiting condition, holding until there's no network traffic for a while in a crude attempt to account for stuff like lazy-loading images.
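networkidle is a blunt instrument; a slightly more targeted (though still not bulletproof) alternative would be to wait until every image reports itself as loaded, e.g. via page.waitForFunction. The predicate below takes the image collection as a parameter so it remains testable outside the browser; the function name is my own:

```javascript
// Predicate: true once every image reports itself as loaded. In the
// browser one would pass document.images; accepting it as a parameter
// keeps this sketch testable in Node with plain objects.
function allImagesSettled(images) {
    return Array.from(images).every(img => img.complete);
}

// usage within a test (sketch):
// await page.waitForFunction(() => [...document.images].every(img => img.complete));
```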
Be aware that capturing the entire page is more resource-intensive and doesn't always work reliably: You might have to deal with layout shifts or run into timeouts for long or asset-heavy pages. In other words: This risks exacerbating flakiness.
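If that happens, toHaveScreenshot's tolerance settings (maxDiffPixelRatio or maxDiffPixels) can absorb minor jitter at the cost of sensitivity; the 1% value below is an arbitrary starting point, not a recommendation:

```javascript
// Loosened comparison settings for flaky full-page captures; the 1%
// ratio is an arbitrary example value to be tuned per site.
let RELAXED_OPTIONS = {
    stylePath: "./viz.tweaks.css",
    maxDiffPixelRatio: 0.01 // tolerate up to 1% of pixels differing
};

// await expect(page).toHaveScreenshot(RELAXED_OPTIONS);
```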
Conclusion
So much for that quick spike. While it took more effort than expected (I believe that's called "software development"), this might actually solve my original problem now (not a common feature of software these days). Of course, shaving this yak still leaves me itchy, as I've yet to do the actual work of scratching CSS without breaking anything. Then comes the real challenge: Retrofitting dark mode onto an existing website. I just might need more downtime.