Page Ranges

In my earlier post on interviewing, I discussed aspects of a coding interview question that I thought provided better signal on the candidate than algorithmic-focused questions. In this post, I’ll provide an example fairly open-ended interview question, a solution, and a transcript between the interviewer and interviewee as an example for analyzing signal.

Page Ranges, a Question

When a user prints a document, they might not want to print every page. The user interface allows them to write a ‘page range specification’ to express the range of pages to print. For example, 1-3, 5 will print pages 1, 2, 3, and 5. Your task is to write code that will:

  1. Parse a string containing a page range specification
  2. Report if the string is valid or invalid
  3. Include functionality to report if a page should be printed or not

You do not need to write unit tests, debug your code, or write documentation.

After you write your code, you will perform a code review of the code. Provide positive and negative feedback. Requirements, design, and implementation are all fair game.

Detailed Specs

Pages may be numbered from 1 to 999999. Your program may be given the maximum number of pages within the document, which will not exceed 999999.

Ranges are inclusive, i.e. 1-3 means print page 1, 2, and 3.

A valid specification string will only contain white space, digits, and commas. Commas separate ranges only; numbers do not include separators.

We will assume the code will be part of a library or a larger program, so you do not need to write a main or the calling code.

A Solution

In Scala:


import scala.util.Try

trait PageRange {
  def contains(pageNo: Int): Boolean
}

case class DiscretePageRange(ranges: Vector[Range]) extends PageRange {
  override def contains(pageNo: Int): Boolean = {
    ranges.exists { _.contains(pageNo) }
  }
}

object PageRange {
  private val singleNumber = """\s*([0-9]+)\s*""".r
  private val pairNumber = """\s*([0-9]+)\s*-\s*([0-9]+)\s*""".r

  private def checkRange(page: Int, maxPageNo: Int): Unit = {
    if (page < 1) {
      throw new Exception("page must be at least 1")
    }
    if (page > maxPageNo) {
      throw new Exception(s"page must be equal or less than $maxPageNo")
    }
  }

  def from(spec: String, maxPageNo: Int): Try[PageRange] = Try {
    val ranges = spec.split(',').map { s =>
      (pairNumber.findFirstMatchIn(s), singleNumber.findFirstMatchIn(s)) match {
        case (Some(pair), _) =>
          val pageL = Integer.parseInt(pair.group(1))
          val pageR = Integer.parseInt(pair.group(2))
          checkRange(pageL, maxPageNo)
          checkRange(pageR, maxPageNo)

          Range.inclusive(pageL, pageR)
        case (_, Some(single)) =>
          val page = Integer.parseInt(single.group(1))
          checkRange(page, maxPageNo)

          Range.inclusive(page, page)
        case (None, None) =>
          throw new Exception(s"Bad format for $s")
      }
    }.toVector

    DiscretePageRange(ranges)
  }
}

Transcript

(I is the interviewer, S is the subject or interviewee)

I: Now that you have a solution, can you act as a code reviewer? Consider not only the code, but also the design and requirements.

S: From the requirements side, there seems to be an assumption that the caller will query each page of the document. A more natural API may be for the caller to get an iterator that returns each page number to be printed.

S: Also, the specification is very open. Ranges may overlap, may be in any order, and there is no constraint on the length. If the interface was more restrictive, users might be better protected from their own mistakes.

S: From a class design perspective, we have an interface, a concrete implementation, and a companion object to handle construction. Since the strings may be invalid, the companion object is an idiomatic way to separate the construction concerns from the ‘runtime’ concerns. Having an interface may be overkill; it is uncertain if there would be need for multiple implementations.

I: The user interface may provide convenience options to print all even pages or odd pages. Does that influence your opinion?

S: Maybe. Having odd and even methods on the companion object can return the same concrete class, as Range supports steps — odd() is Range(1, maxPageNo, 2). Specializing the class for a vector with a single element would just eliminate a single memory indirection.

I: Okay, what else in the code review?

S: The contains method does a linear search of the ranges for the first inclusive item, or returns false. Positively, the code is very short and direct. Negatively, this can be a O(P*N) algorithm where P is the number of pages in the document and N is the number of ranges.

S: One solution would be to change the interface to an iterator like design so it just returns the next valid page. That’ll be linear. Another solution would be to sort the ranges and use a binary search, although if the number of ranges is small the cost will be effectively the same. Third, we could change the data structure to allow a constant time lookup. A bitset, where each page is represented by a single bit, would take very little memory and support efficient look-ups and scans. Fourth, there are data structures meant for storing ranges, but that seems overkill for this application unless we had one at hand.

I: How about the construction code?

S: Since this is user-facing code, the error messages don’t say how to fix the problem and don’t include information about the location of the error. They can be improved, but rich error reporting is hard to include in an hour of coding.

S: In Scala, companion objects often build the object via an apply() method so it will look like normal object creation. In this case, since we are returning a Try of the object, to signify that construction may fail, I think it is appropriate to use a named function instead.

S: The parsing code goes through the string multiple times — we first split, and then apply two regular expressions to each character. The regular expression could be written to handle all the groups in a single invocation, although that might become hard to read. The [0-9] in the regular expressions could be replaced by \d. The regular expressions could also be dropped and the code could use a simple state machine. It would be more code, but error reporting could be better since it would retain all the context. A parser library could also be used, but adding a dependency to handle this simple of a format is likely overkill, unless we already had it.

S: Storing a vector of ranges is pretty intensive memory-wise. There are more efficient ways to store the data; bitsets or even arrays of numbers, but page ranges are likely going to be short-lived and, outside of very long examples, we’re talking hundreds of kilobytes or less of RAM.

I: To summarize, how well does this code meet the requirements and use case?

S: Overall, the requirements and use case don’t seem that performance sensitive, so my code review would focus on UX, clarity, and maintainability. If this was meant for a feature that the business wanted to invest a lot in to be very polished, then we’d want more specific error messages. That could lead to a change in approach for parsing. Similarly, if the use case routinely handled very complex page ranges, then performance may become important. Otherwise, the design is fairly modular and separates concerns. Unit testing will be straight-forward and it doesn’t involve anything outside the standard library. So, it’s close to approved.