At Zapier, we used A/B testing as we shipped and iterated on features. While helpful, A/B testing focuses attention on single features and a few backstop metrics.
Benchmark usability testing helps you monitor macro-level changes to the user experience of a product.
Benchmark testing demands repeatability. You want to be able to compare results from one interval to the next to see how things are changing. This involved:
Creating a set of tasks that any user could complete.
Developing a standard tagging taxonomy with strict definitions, to limit researcher bias.
Building tools to catalog observations.
Using Airtable to code and tally the responses.
From the Airtable summaries, it was easy to create a one-page scorecard.
I followed the scorecard with a report describing the qualitative changes we saw. This was a great opportunity to highlight recent changes that alleviated prior issues.
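As a rough illustration of the tally step, here is a minimal Python sketch of how coded observations from two benchmark rounds might be rolled up into per-task success rates and compared. It is not the Airtable setup described above. The round labels, task names, tags, and the functions `task_success_rates` and `scorecard` are all hypothetical, and the sample rows are made-up placeholders rather than real results.

```python
from collections import Counter

# Hypothetical coded observations: (round, task, tag).
# In practice these rows would come from wherever the tagged sessions are stored.
observations = [
    ("round_1", "task_a", "success"),
    ("round_1", "task_a", "failure"),
    ("round_2", "task_a", "success"),
    ("round_2", "task_a", "success"),
    ("round_1", "task_b", "success"),
    ("round_2", "task_b", "failure"),
]


def task_success_rates(observations, round_name):
    """Return {task: share of attempts tagged 'success'} for one round."""
    totals = Counter()
    successes = Counter()
    for rnd, task, tag in observations:
        if rnd != round_name:
            continue
        totals[task] += 1
        if tag == "success":
            successes[task] += 1
    # Counter returns 0 for tasks with no successes, so every attempted task gets a rate.
    return {task: successes[task] / totals[task] for task in totals}


def scorecard(observations, previous, current):
    """Compare success rates for tasks that appear in both rounds."""
    prev = task_success_rates(observations, previous)
    curr = task_success_rates(observations, current)
    rows = {}
    for task in sorted(set(prev) & set(curr)):
        rows[task] = {
            "previous": prev[task],
            "current": curr[task],
            "change": curr[task] - prev[task],
        }
    return rows


for task, row in scorecard(observations, "round_1", "round_2").items():
    print(f"{task}: {row['previous']:.0%} -> {row['current']:.0%} ({row['change']:+.0%})")
```

Keeping the tasks and tag definitions fixed between rounds is what makes the "change" column meaningful. If either shifts, the two rates are no longer measuring the same thing.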