This article is part 3 of the series itch v25 postmortem:
We’ve just looked at how v25 of the itch app was developed, so now it’s time to look at its testing infrastructure.
When I started working on the app, I was happy to get anything running at all. But it became clear pretty quickly that without proper testing, regressions would keep happening.
But unit testing that beast is pretty hard - for three major reasons:
- It’s all written in TypeScript
- A lot of it assumes we have a UI present (React components etc.)
- A lot of it assumes we’re running in an electron environment
- (which has additional APIs compared to regular node.js / a regular browser)
The TypeScript meant, for example, that at some point, the entire app’s source was compiled before running tests. Unit tests should be fast, but that incurred a fixed cost of... a few seconds, at best.
For a long while I tried to make the tests happy with a regular node.js environment, so that it would be easy to run on CI. So I spent a bunch of time stubbing APIs - returning fixed paths, turning GUI operations into no-ops, etc.
At some point I tried to unit test React components, because some tools showed great promise for it - but getting them to work ended up being a big waste of time.
Also, almost every unit testing solution I tried gave me trouble with:
- Collecting code coverage
- (the amount of code that’s actually executed while tests are running)
- Getting proper stack traces
- Computing code coverage for the original TypeScript code.
The way most code coverage tools work is: they transform the code to “instrument” it. They keep a global object that tracks every line, statement, and branch executed, and the code gets littered with writes to that object. That means we now have to deal with three forms of the codebase:
- A) The original TypeScript (what I write)
- B) The JavaScript compiled from it, which has a sourcemap for A
- C) The instrumented JavaScript, which has a sourcemap for B
And back then, TypeScript was still pretty fresh, and its ecosystem was really separate from babel’s[1]. Tests were run by node.js in C) form, so both stack traces and coverage info needed to be transformed twice to make sense.
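To make the instrumentation step concrete, here’s a minimal hand-instrumented sketch in Go (tools like nyc do this automatically for JavaScript). The `coverage` map plays the role of the global object, `hit` is the write that gets sprinkled through the code, and the `file:line` IDs and the `classify` function are made up for illustration. Note that those IDs point at whatever code the instrumenter saw, which is exactly why they need source maps to be translated back to the original TypeScript.

```go
package main

import "fmt"

// coverage stands in for the global object an instrumenting tool injects:
// one counter per statement or branch, keyed by a made-up "file:line" ID.
var coverage = map[string]int{}

// hit is the write the instrumenter litters through the code.
func hit(id string) { coverage[id]++ }

// Before instrumentation, the (hypothetical) function read:
//
//	func classify(n int) string {
//		if n < 0 {
//			return "negative"
//		}
//		return "non-negative"
//	}
//
// After instrumentation, the logic is unchanged, but every statement
// first records that it ran.
func classify(n int) string {
	hit("classify.go:2")
	if n < 0 {
		hit("classify.go:3")
		return "negative"
	}
	hit("classify.go:5")
	return "non-negative"
}

func main() {
	classify(5)
	// Any counter still at zero marks code the tests never executed
	// (here, the "negative" branch on line 3).
	for _, id := range []string{"classify.go:2", "classify.go:3", "classify.go:5"} {
		fmt.Printf("%s: %d\n", id, coverage[id])
	}
}
```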
I ended up going pretty deep into this - I wrote a custom test runner tapping into nyc’s undocumented APIs directly, and using third-party libraries to merge source maps together.
So, I had somewhat slow unit tests, that sometimes broke without a proper stack trace, and inaccurate coverage info. Great!
Three things ended up happening.
First, I threw away most of the tests, because most of the code ended up rewritten in Golang as part of butler anyway. Since I had refined that code a bunch of times already (and had gotten increasingly frustrated with node.js APIs), I had a good idea how to make it happen in a more modular fashion - which is easier to test.
Second, I stopped trying to mock electron APIs. Electron has its own unit tests, so it’s not up to me to check that they work as advertised (I find bugs occasionally, report them, and document workarounds, but those are edge cases). The parts I need to unit test shouldn’t rely on GUI at all: they are number and currency formatting routines, state transitions, internationalization, stuff like that. We can do that in regular node.js.
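As an illustration of what “no GUI required” buys you, here’s a minimal sketch of a pure formatting routine checked against a table of cases. It’s written in Go rather than the app’s TypeScript, and `formatPrice` (cents in, dollar string out) is a hypothetical example, not actual itch code. Since it only maps inputs to outputs, checking it needs no electron, no DOM, and no stubs.

```go
package main

import "fmt"

// formatPrice is a hypothetical formatting routine: it turns an amount in
// cents into a dollar string. It touches no GUI, no electron API, and no
// disk, so testing it is just comparing inputs and outputs.
// (No thousands separators or other currencies: this is only a sketch.)
func formatPrice(cents int64) string {
	sign := ""
	if cents < 0 {
		sign = "-"
		cents = -cents
	}
	return fmt.Sprintf("%s$%d.%02d", sign, cents/100, cents%100)
}

func main() {
	// Table-driven checks: each case pairs an input with its expected output.
	cases := []struct {
		cents int64
		want  string
	}{
		{0, "$0.00"},
		{5, "$0.05"},
		{199, "$1.99"},
		{-250, "-$2.50"},
	}
	failures := 0
	for _, c := range cases {
		if got := formatPrice(c.cents); got != c.want {
			failures++
			fmt.Printf("formatPrice(%d) = %q, want %q\n", c.cents, got, c.want)
		}
	}
	fmt.Printf("%d case(s) checked, %d failure(s)\n", len(cases), failures)
}
```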
Third, I threw away all the custom testing machinery and fell back to mocha and chai, using ts-node to seamlessly compile the required TypeScript files. While I was busy doing something else, those had improved to a point where they would all work together, even letting me test asynchronous code. The output is not quite as nice as my custom runner was, but I’m always happy to throw out custom code when it becomes unnecessary.
Today, there’s only a handful of unit tests left, and they’re all pretty simple:
Other components, however, all have their own test suites! wharf, which powers all the diffing and patching throughout itch.io infrastructure, was test-driven from the start - you can’t mess with data integrity. The same goes for savior, which handles resumable file decompression - it was a hard problem to solve, but it was also very clearly delimited, so it was easy to test, and test well.
I haven’t talked about Golang unit tests a lot here, but there’s not much to say: the testing facilities come with the language, and they’re adequate. Everybody’s doing it, coverage tooling is there, there’s even a race detector, and tests are very quick to run[2]. It’s just good.
I mentioned regressions earlier - it’s not uncommon for a feature to break while all unit tests are still green. For that, you have integration tests.
I started with a few assumptions, or rather, admissions:
- Integration tests will be slower to run. It’s not something you can trigger every time you save in your code editor.
- Integration tests won’t cover everything. You cover the happy path (what users do with the app 90% of the time), and the rest is careful development and QA rounds.
- Integration tests will be flaky on occasion. There’ll be network connectivity drops, there’ll be full disks, CI workers will run out of RAM, someone changed a setting - it happens.
I started out by using spectron, the recommended toolkit for writing electron integration tests. I was expecting some flakiness, but not that much. Sometimes, it failed to start the application altogether. Some other times, a test would just hang forever. When it timed out, the application wasn’t closed properly, mostly on Windows.
Soon, I was spending more time reading the source of spectron & friends, trying to figure out what could possibly be causing these intermittent failures, than I was writing integration tests, or, you know, actually developing the application.
So I decided to throw everything away and use only parts I understood. Electron is powered in part by Chrome, which is a browser. Most browsers can be controlled with WebDriver, which is a W3C standard. They usually need a third-party program that translates WebDriver commands to their own protocol - for Chrome, that’s chromedriver. In fact, electron releases come with their own chromedriver builds[3].
WebDriver is just standard commands over HTTP, so you can speak it from any programming language. I picked a clean WebDriver client implementation in Golang, and went on my merry way. Sure, I had to fork it at some point to add some features I needed, but doing so was much easier than just using spectron[4].
Writing the test runner required a bit of research, but everything made sense. First we spin up chromedriver, connect to it, create a session and specify how it should launch our app. Then we can send commands, get their results, and so on. When you’re the one sending commands, timeouts are easy to do. When you’re the one controlling the driver, it’s easy to shut down. It’s easy to take screenshots on failure. It’s not buried deep in a dependency of a dependency of a dependency, two of which are compatibility layers.
Also, I was finally able to take care of the “zombie process” problem I’d been experiencing. When you start a process with node.js on Windows, it has its own process group. Well, on Windows, they’re actually called Job Objects. If the parent dies… the child keeps on living (by default). This caused additional issues because the app is single-instance: if a test failed and left a process behind, the next pipeline would also fail because it would try to wake up the process from the previous run.
But creating processes is like 30% of my job, so I had already encountered job objects, and written Go code to use them properly. Since my new runner was also written in Go, it was trivial to re-use that code.[5]
Nowadays, our integration tests are pretty stable. Pipelines fail more often while running webpack than they do performing webdriver magic, which I feel is a pretty solid accomplishment.
Another thing I was able to do was run integration tests on packaged versions of the app. In other words, it’s the final, ready-for-release version of the app that’s being tested. Not the weird development environment with its peculiarities.
After a long and painful stint with Jenkins, we’ve been happy users of Gitlab CI for a while now. For itch in particular, on every commit we:
- Compile TypeScript code once
- Run a custom script that checks all the i18n translation strings we use are defined
- Run unit tests on all 3 platforms
- Package the app for all 5 platform/arch combos, and run integration tests for each of these
When integration tests fail, screenshots are uploaded as artifacts, and we can view them directly from Gitlab CI’s interface. The integration tests run directly in the ‘package’ phase, and that entire phase usually finishes in under 3 minutes.
I haven’t managed to make integration tests work headless, so they still need some sort of graphical server running. That was no issue for the Windows CI workers, nor for the macOS CI workers, as long as gitlab-runner is started from a graphical session. On Linux, xvfb-run does the job quite well. Beware, though: it defaults to a measly 800x600!
QA / Human testing
On top of all that, we obviously run the app ourselves. Jesus and I do most of the heavy lifting. We use a mix of personal computers, virtual machines, remote physical machines, and baiting friends into trying new features first.
We find it useful to have two completely separate versions of the app installed side by side (much like Chrome and Firefox have multiple release channels). In our case, the beta version is called kitch - and if you’ve been anticipating itch v25’s release, chances are you’ve been running it!
It’s a completely separate app, with its own name, shortcut, icon, settings, library, etc. It doesn’t clash with the stable app at all, which is perfect for what we want.
The machines we test on are not necessarily beefy - I often run the app on a 2011 Lenovo X200 running Lubuntu (from which I’m writing this article), and I was happy to find that itch v25 runs quite well on it!
Issue reports come to us via... so many channels. Some people mention me or @itchio on Twitter, or they go for direct messages[6]. Some people manage to find the right GitHub issue tracker and open an issue there (thank you!). Some people write to our support e-mail, or track down my personal email somehow. Some people use the built-in feedback system, which includes (if they let it) information about their system, and detailed logs - this is probably the most efficient way to get something resolved. And then there’s the good old “by the way…” at real-life events when you’re trying to listen to a conference speaker.
But it’s all good! In the end itch v25 shipped, and we couldn’t have done it without all our testers and all our contributors, some of whom are listed on this page. I definitely owe all of you a drink, so please start organizing yourselves to avoid scheduling conflicts.
I’m not sure what other postmortem articles I’m still going to write for v25, but there’s a lot of ground I haven’t covered yet. If you have suggestions, please let me know by replying to the social media post on which you found this article :)
- [1] Things have changed since: Babel 7 comes with a transformer for TypeScript code!
- [2] Unless you bring cgo into the mix. But that exception holds true for pretty much all of Golang’s benefits. It’s safe, unless cgo. It’s portable, unless cgo. Etc.
- [3] That said, most of the time, the stock chromedriver releases from Google work fine with electron. You just have to be careful about versions: electron is consistently behind by a few Chrome versions, so if you just grab the latest chromedriver, you’re in for a few surprises.
- [4] Maybe it’s my fault, maybe I’m weird, I don’t know. I just couldn’t get spectron to behave. I don’t mean to discredit the developers’ effort here.
- [5] It’s all in github.com/ox, if you want to take a look!
- [6] Even though I disabled Twitter notifications months ago! GitHub is a much better way to reach me.
If you liked this article, please support my work on Patreon!