Benchmark Testing
Running longitudinal usability tests

At Zapier, we used A/B testing as we shipped and iterated on features. While helpful, A/B testing focuses attention on single features and a few backstop metrics.
Benchmark usability testing helps you monitor macro-level changes to the user experience of a product.
Planning
Benchmark testing demands repeatability. You want to be able to compare results from one interval to the next to see how things are changing. This involved:
I. Tasks
Creating a set of tasks that any user could complete.
II. Rubric
To limit researcher bias, we developed a standard tagging taxonomy with strict definitions.
III. Tooling
We built tools to catalog observations.


We used Airtable to code and tally the responses.
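As a rough illustration of what tallying coded responses can look like outside of Airtable, here is a minimal Python sketch. Everything in it is hypothetical: the round labels, task names, tags, and function names are placeholders made up for the example, not the taxonomy or data from the actual study. It counts how often each tag appears in a round, then compares two rounds.

```python
from collections import Counter, defaultdict

# Made-up coded observations: (testing round, task, tag).
# In practice these rows would be exported from wherever observations are cataloged.
OBSERVATIONS = [
    ("round-1", "task-a", "completed"),
    ("round-1", "task-a", "confusion"),
    ("round-1", "task-b", "failed"),
    ("round-1", "task-b", "confusion"),
    ("round-2", "task-a", "completed"),
    ("round-2", "task-b", "completed"),
    ("round-2", "task-b", "confusion"),
]


def tally_by_round(observations):
    """Count how often each tag appears in each testing round."""
    tallies = defaultdict(Counter)
    for round_id, _task, tag in observations:
        tallies[round_id][tag] += 1
    return tallies


def tag_change(tallies, earlier, later):
    """Return the change in count for every tag between two rounds."""
    tags = set(tallies[earlier]) | set(tallies[later])
    # A Counter returns 0 for tags it has not seen, so missing tags are safe.
    return {tag: tallies[later][tag] - tallies[earlier][tag] for tag in sorted(tags)}


if __name__ == "__main__":
    tallies = tally_by_round(OBSERVATIONS)
    for tag, delta in tag_change(tallies, "round-1", "round-2").items():
        print(f"{tag}: {delta:+d}")
```

Because the tags come from a fixed taxonomy, the same comparison can be rerun after every round without reinterpreting earlier notes, which is what makes the intervals comparable.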


Report
The Airtable summaries made it easy to create a one-page scorecard.

I followed this up with a report describing the qualitative changes we saw. It was a good opportunity to highlight recent changes that had alleviated prior issues.

