Interviewing Questions

Hacker News likes to complain about interviewing, and, well, there is a lot to complain about. One of the recent links on the matter was Kislay Verma’s “Competitive programming is useless”. Kislay decries an over-emphasis on competitive-coding questions as the measure of interview performance, i.e., asking increasingly obscure and trivial algorithm and data-structure questions rather than focusing on fundamentals and on questions that engage and measure the breadth of the candidate’s experience and talent.

When I went to college, ACM Programming Competitions were the primary form of competitive programming. In contrast to today’s Hacker Rank and Leetcode, these were team events. Over a period of six hours, a team would share a single computer and try to solve the most questions in the least amount of time. Questions were written in a narrative style, so contestants had to recognize which approach to take; these were not “implement methods for a priority queue” but rather “solve a problem where a priority queue happens to be a convenient and appropriate data structure.” Questions were graded objectively by judging software; they were purely about generating the right output (within some processing-time limit) rather than about code quality itself. In the lower competitions, there was usually one question that required graph algorithms or dynamic programming, while the other questions were more straightforward and (in my opinion) could have appeared as homework for sophomores and juniors. The higher competitions usually required combining multiple algorithms and targeted those who had mastered a senior-level algorithms course (2018 example). As I recall, these were usually open-book, although you weren’t allowed any digital sources.

In my experience, these competitions actually encouraged good form. Knowledge of idioms and the standard library reduced the time spent writing code through muscle memory and re-use. Since the team was restricted to a single computer, some teams paired on programming, or teammates had to double up to debug a program; both practices encouraged good variable naming and functional decomposition.

That said, I’m sympathetic to the notion that tricky algorithmic questions do not yield good signal on candidates. Instead, I sought signal on how they would handle everyday problems.

At Qualtrics, I’ve conducted over 300 interviews with candidates who ranged from interns to experienced (20+ years) hires. In the early days, I usually asked questions that required dynamic programming, but I eventually switched to questions drawn from my day-to-day experiences. (The Qualtrics interviewing philosophy is to ask “straightforward” questions. This does not mean easy questions, but rather questions that do not involve any tricks or obscure knowledge. The blog post references an example clock-hand question; that question was a typical starter problem at the lower-level ACM competitions. Personally, I think both of the example questions provide too few opportunities for signal within a software developer interview.)

To provide good signal, a problem requires some algorithm applied to a non-trivial data structure or data object. I tried to target problems that required a screen’s worth of code in order to give the candidate opportunities to show how they would structure a solution (functional decomposition). As an example, one of my preferred questions required modeling a color value in RGB space. Since this was an internal data object, candidates had a lot of freedom in their choice, and that choice provided good signal on how they approached trade-offs between convenience, efficiency, and legibility.
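To make that trade-off concrete, here is a minimal sketch (in Python, with hypothetical names; it is not the interview question itself) of two representations a candidate might weigh against each other:

```python
from dataclasses import dataclass

# Two hypothetical ways a candidate might model an RGB color.
# Neither is "the" expected answer; the trade-off discussion is the point.

@dataclass(frozen=True)
class Color:
    """Legible and convenient: named fields, easy to validate and extend."""
    r: int
    g: int
    b: int

    def to_packed(self) -> int:
        # Pack into a single 24-bit integer (0xRRGGBB): compact and cheap to
        # compare or hash, but less readable at call sites.
        return (self.r << 16) | (self.g << 8) | self.b

    @classmethod
    def from_packed(cls, value: int) -> "Color":
        return cls((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)


if __name__ == "__main__":
    teal = Color(0, 128, 128)
    assert Color.from_packed(teal.to_packed()) == teal
```

A candidate who reaches for the packed form unprompted invites a useful conversation about when that efficiency is worth the loss of legibility.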

I ran the interviews as open-book because this was also a source of signal. A typical sign of a poor candidate was searching for example code and attempting to copy-paste it into something that worked; a positive signal was a candidate who used the official API documentation and understood it. One candidate provided very positive signal by first following some code from StackOverflow, pausing to discuss why the code was bad, and then changing their approach.

I preferred problems where the candidate had to actually run their code, because doing so provided signal on how they debugged problems and dealt with compiler errors.

Once candidates completed the exercise (I tried to push candidates to a successful completion), I asked them to perform a code review of their own code, and then I asked about its scaling characteristics. I think the code review is an important part of the process because it 1) allowed candidates to acknowledge and suggest improvements to any bad code they might have written and 2) revealed signal about what they expected from good code. In some cases, I had to suggest a change to the code to obtain signal. Most candidates noted a lack of comments in their samples and thought naming could be improved (although they rarely had any specific recommendations). This was weak signal. There was better signal in discussions of idiomatic compliance, memory lifecycles, and alternative approaches to data modeling and functional decomposition. Memorably, a few candidates provided excellent signal by describing how the compiler would interpret various approaches or by comparing solutions in multiple languages. (The best candidates had fun with the problem.)

My intent in asking about the scaling characteristics of their code was not to get a recital of Big-O (although many candidates provided that anyway without my prompting) but rather to prompt a discussion of what to do if the inputs were millions or billions of times larger. I sought two different kinds of signal: 1) at what level of scale do candidates feel they need to complicate the solution, and 2) what range of options do candidates consider for scaling? For 1), poor candidates would reach for a distributed map-reduce solution when the number of inputs was in the hundreds of thousands (and processing time per record was less than a millisecond). Better candidates would probe the use case (turning this coding exercise into a mini design exercise) or discuss the point where the gain from parallelization exceeded the required overhead. For 2), I found candidates could rarely differentiate themselves when discussing technical options, but the “median” candidate could suggest one appropriate strategy or technology.
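As a rough illustration of that break-even reasoning, the back-of-envelope arithmetic a strong candidate might sketch looks like the following. The specific numbers (worker count, setup overhead) are my own hypothetical assumptions, not figures from any interview:

```python
# Back-of-envelope sketch of when parallelization pays off: the work saved
# must exceed the coordination overhead. All numbers below are hypothetical.

def serial_seconds(n_records: int, per_record_s: float) -> float:
    """Total time to process everything on one worker."""
    return n_records * per_record_s

def parallel_seconds(n_records: int, per_record_s: float,
                     workers: int, overhead_s: float) -> float:
    """Idealized parallel time: perfect work division plus a fixed setup cost."""
    return (n_records * per_record_s) / workers + overhead_s

if __name__ == "__main__":
    per_record = 0.001   # ~1 ms per record, as in the example above
    overhead = 120.0     # assumed cost to spin up a distributed job
    for n in (100_000, 100_000_000):
        s = serial_seconds(n, per_record)
        p = parallel_seconds(n, per_record, workers=32, overhead_s=overhead)
        print(f"{n:>11,} records: serial {s:10.1f}s  parallel {p:10.1f}s")
```

With these assumptions, a hundred thousand records finish serially in under two minutes and the distributed job actually loses to its own setup cost; the machinery only earns its overhead a few orders of magnitude later.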

Ultimately, the pass-fail decision was based on a mixture of these various signals rather than a single measure. For instance, I passed a number of candidates who failed to pass all the tests within the interview time, when there was sufficient positive signal elsewhere. Good problems provide opportunities to gather a broad range of signal.