Between Honeywell and Qualtrics, I founded a company called ℤ→ℤ Technologies. The first thing I tried was a steganography plug-in for Firefox. The plug-in attempted to hide text by using a user-provided corpus to compute a statistical language model. I documented the early research on emulating tweets. Since that wasn’t fruitful, I pivoted toward an automated statistical analysis service. The idea was to turn the high-skill process of selecting which analyses to run into a low-skill process of searching through precomputed results for something valuable. (I never found a way to keep this from being p-hacking as a service.) When I found that integration tests were taking far longer than they would on a simple, single-computer batch job system, I developed a lasting distrust of big data technologies that carry high startup costs.
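The plug-in’s exact scheme was never published, but corpus-based text steganography generally works by letting hidden bits steer word choices in generated text. A toy sketch under that assumption (the bigram model and the `encode`/`decode` functions are illustrative, not the plug-in’s actual code):

```python
from collections import defaultdict

def build_bigrams(corpus):
    """Map each word to the sorted list of distinct words that follow it."""
    words = corpus.split()
    successors = defaultdict(set)
    for a, b in zip(words, words[1:]):
        successors[a].add(b)
    return {w: sorted(s) for w, s in successors.items()}

def encode(bits, model, start):
    """Emit text whose word choices carry the bits: whenever a word has two
    or more successors, the next bit picks between the first two."""
    out, word, i = [start], start, 0
    while i < len(bits):
        choices = model.get(word)
        if not choices:
            raise ValueError("model dead end before all bits were hidden")
        if len(choices) >= 2:
            word = choices[bits[i]]
            i += 1
        else:
            word = choices[0]
        out.append(word)
    return " ".join(out)

def decode(text, model, n_bits):
    """Recover the bits by replaying the same choice points over the text."""
    words, bits = text.split(), []
    for a, b in zip(words, words[1:]):
        choices = model.get(a, [])
        if len(choices) >= 2:
            bits.append(choices.index(b))
        if len(bits) == n_bits:
            break
    return bits
```

The cover text reads like (bad) prose drawn from the corpus, while the hidden payload rides along in the branch decisions; anyone with the same corpus can rebuild the model and decode.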
The analysis service project led to two side projects: a functional reimplementation of the squarified treemap visualization and a Python library for inferring the format of a date. (Although I forgot about the library and left it unmaintained for seven years, it has seen a number of forks and improvements from others.)
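A date-format inference library of this kind typically works by testing candidate format strings against sample values. A minimal sketch of the idea (the candidate list and function name here are illustrative, not the library’s actual API):

```python
from datetime import datetime

# strftime-style patterns to try, most specific first.
# Illustrative only; a real library would cover far more formats.
CANDIDATES = [
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%d/%m/%Y",
    "%m/%d/%Y",
    "%B %d, %Y",
]

def infer_date_format(samples):
    """Return the first candidate format that parses every sample, else None."""
    for fmt in CANDIDATES:
        try:
            for s in samples:
                datetime.strptime(s, fmt)
        except ValueError:
            continue
        return fmt
    return None
```

Note that ambiguous inputs such as `"01/02/2014"` match both `%d/%m/%Y` and `%m/%d/%Y`; the candidate ordering silently decides, which is why real inference benefits from many samples and tie-breaking heuristics.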
At Qualtrics, I wrote two articles for our engineering blog. The more popular article was about using wargames for training. A wargame, in this sense, is a simulation of one or more scenarios that requires the trainees to respond with real-world tools: for example, causing high load on a box and having the trainees diagnose and kill the offending process. I was often asked whether we still used this practice. The answer is yes: multiple teams created artificial outages or degraded conditions in our staging environments to train new on-call engineers. Over time, the training tended to become more specialized. As the Qualtrics infrastructure moved from virtual machines to orchestration environments, we stopped training for certain failure modes; for instance, operators no longer needed to practice restarting processes, since the orchestrator handled that for them.
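A high-load scenario like the one above is easy to script. A minimal sketch of one way to stage it (the function names and drill duration are invented for illustration, not taken from any Qualtrics tooling):

```python
import multiprocessing
import time

def burn(seconds):
    """Busy-loop to pin one CPU core for the given duration."""
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        pass

def spawn_load(n_workers, seconds):
    """Start n_workers CPU-burning processes and return them.
    The trainees' task is to spot these in `top` and kill them."""
    procs = [multiprocessing.Process(target=burn, args=(seconds,))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    return procs
```

In a drill, a facilitator might run `spawn_load(multiprocessing.cpu_count(), 600)` on a staging box and then watch how the trainee works through diagnosis, triage, and cleanup.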
The less popular article was a discussion of design of experiments, a technique I learned and used at Honeywell. Design of experiments (DoE) is a structured way to find which parameters matter most to a system. DoE is meant to be carried out by humans, not computers, so many features of the practice exist to keep the number of experimental runs down. (DoE is often used in domains with destructive testing, where every run consumes a test article.) Those in the machine learning community would recognize it as a form of hill climbing.
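The simplest DoE building block is the two-level full factorial: run every combination of each factor at a low and a high setting, then compare average responses to estimate each factor’s main effect. A minimal sketch (the three-factor `process` function is invented for illustration):

```python
from itertools import product

def full_factorial(n_factors):
    """All 2**n combinations of low (-1) and high (+1) settings."""
    return list(product([-1, +1], repeat=n_factors))

def main_effects(design, responses):
    """For each factor: mean response at its high level minus its low level."""
    n = len(design[0])
    effects = []
    for i in range(n):
        high = [r for run, r in zip(design, responses) if run[i] == +1]
        low = [r for run, r in zip(design, responses) if run[i] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects

# Hypothetical process: factor 0 matters a lot, factor 1 barely, factor 2 not at all.
def process(a, b, c):
    return 10 * a + 1 * b + 0 * c

design = full_factorial(3)                       # 8 runs, not a fine-grained sweep
responses = [process(*run) for run in design]
effects = main_effects(design, responses)        # factor 0 dominates
```

With only eight runs, the effect estimates make it obvious which knob to pursue; fractional factorial designs cut the run count further, which is the cost-saving the article emphasized.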
Although I was not the author, one of my co-workers wrote a detailed article on our language processing and indexing pipeline, which serves as the de facto public technical document on the subject. As such, it is worthy of another link.