
Regarding Survey Paths

The key invariant we wish to preserve throughout the survey is that an individual’s answers are the same, no matter which version of the survey they see. That is, if the same respondent were to take the survey twice, beginning to end, the total number of questions seen and answers given would remain the same. We’ll see in this post how this invariant influences enumerating and computing metrics on paths. In a later post, we’ll see how this invariant leads us to analyze survey “bugs.”

We use paths through the survey to aid in the debugging process and to automate some parameters to the crowdsourcing backend. Let’s take a moment to look at how some of the design choices we’ve made impact possible paths through the survey:

Top-level blocks

At the top level, a survey is composed of blocks. If there are no blocks (i.e. the survey is completely flat), we can say that all questions belong to a single block.

Top-level blocks have the following branch requirements:

  1. There is only one “true” branch question per block.
  2. Branch question destinations must all be top-level blocks.

If there is a branch question, it may appear in any position, subject to any blocking constraints. The branch is not taken until every question we plan to ask in the block has been asked.
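
To make that evaluation order concrete, here is a minimal sketch of deferred branch evaluation; the Question and Block classes are invented for illustration and are not the SurveyMan runtime:

    from dataclasses import dataclass, field

    @dataclass
    class Question:
        text: str
        # Option -> destination block; non-empty only for branch questions.
        branch_map: dict = field(default_factory=dict)

    @dataclass
    class Block:
        questions: list

    def run_block(block, ask):
        """Ask every question in the block; return the branch target, if any.

        The branch question may appear at any position, but its target is
        only acted on after the whole block has been answered.
        """
        target = None
        for q in block.questions:
            answer = ask(q)
            if q.branch_map:
                target = q.branch_map.get(answer)
        return target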

Randomizable blocks

Top-level randomizable blocks are not allowed to have branch questions. They also cannot be branch destinations. Permitting branching to or from randomizable blocks could cause us to have a loop in the program. For example, if block 3 branches to some randomizable block we’ll call X, and X happens to appear before block 1, we have a problem, since we will have already executed X. Similarly, if we branch from X to 3 and X appears as the last block in a survey, we will also have a loop.

In addition to violating our DAG guarantee, branching to or from randomizable blocks could cause nondeterminism in the number of questions asked. If block 3 branches to X, and X is the last block in the survey for one respondent but immediately follows block 3 for another, we allow randomness to influence the answer set.

Note that if we are using branching for sampling, the above doesn’t apply.

On the other hand, randomizable subblocks may be a branch source. If their branch paradigm is SAMPLE, we have no conflicts, since we will choose the next (randomly determined) sequential question as usual. If their branch paradigm is ONE, then their branch question becomes the single “true” branch question for the topmost enclosing block. This is exactly the same as for non-randomizable subblocks.

Enumerating paths

We stated at the beginning of this post that answer sets should be invariant with respect to the version of the survey seen. In the previous post, we described how path data can be used for static analysis.

When we view a path through the survey as the series of questions seen and answered, the computation of all such paths is intractable. Consider a short survey we recently ran that had 16 top-level randomizable blocks, each having four variants and no branching. We have $$16!$$ total block orderings. Then, since each block independently displays one of its four question variants, we end up with $$16! \cdot 4^{16}$$ unique paths.
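
For a sanity check on that count, a quick sketch:

    import math

    orderings = math.factorial(16)   # possible top-level block orderings
    variants = 4 ** 16               # one of four variants chosen per block
    print(orderings * variants)      # about 9e22 unique paths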

Fortunately, the computations we mentioned in the static analysis post do not require knowing the contents of the path. They only require that we know the size of the path. Since we have required that every appropriate question in a top-level block be executed, we can treat top-level blocks as nodes in the possible survey DAG. The weight of these nodes (i.e. the size of the block) can be computed statically. Then we only need to consider the branching between these top-level blocks.

Static Analysis

Tell a PL researcher that you have a new programming language and one of the first things they want to know is whether you can do static analysis on it. Lucky for us, we can! But first, some background in this space:

Prior work on static analysis for surveys

I mentioned earlier that there is some prior work on using programming languages to address survey problems.

Static analysis in Topsl is constrained by the fact that Topsl permits users to define questions whose text depends on the answers to previous questions. Matthews is primarily concerned with ensuring that each question is asked. While Topsl can support randomization, this feature belongs to the containing language, Scheme, and so is not considered part of the Topsl core. Although the 2004 paper mentions randomization as a feature that can be implemented, there is no formal treatment in any of the three Topsl-related publications.

If we also consider type-checking answer fields or enforcing constraints on responses to be static analysis, then most online survey tools, as well as Blaise, QPL, and probably many other tools and services, perform some kind of static analysis.

Static Analysis in SurveyMan

Surveys are interpreted by a finite state machine implemented in Javascript. To ensure that we don’t cause undefined behavior, we must check that the csv is well-formed. After parsing, the SurveyMan runtime performs the following checks:

  • Forward Branch Since we require the survey to be a DAG, all branching must be forward. We can do this easily by comparing branch destination ids with the ids of the branch question’s containing block. This check takes time linear in the block’s depth.
  • Top Level Branch At the moment, we only allow branching to top level blocks. This check is constant.
  • Consistent Branch Paradigms Having one “true” branch question per top-level block is critically important for our survey language. Every block has a flag for its “branch paradigm.” This flag indicates whether there is no branching in the block (NONE), exactly one branch question (ONE), or whether the block should emulate sampling behavior (SAMPLE). This check ensures that for every block with a parent or siblings, the following relations hold:

    | My Branch Paradigm | Parent's Branch Paradigm(s) | Siblings' Branch Paradigm(s) | Children's Branch Paradigm(s) | Additional Constraints |
    |---|---|---|---|---|
    | NONE | NONE, ONE | NONE, ONE, SAMPLE | NONE, SAMPLE | If my parent is ONE, then only one of my siblings is ONE. |
    | SAMPLE | NONE, ONE | NONE, ONE, SAMPLE | -- (I cannot have children.) | If my parent is ONE, then only one of my siblings is ONE. |
    | ONE | ONE | NONE, SAMPLE | ONE, SAMPLE, NONE | If I immediately contain the branch question, then my children cannot be ONE. Otherwise, exactly one of my descendants must be ONE. |

    We use the following classification algorithm to assign branch paradigms:

    function getAllQuestionsForBlock
      input : a block
      output : a list of questions
    begin
      questions <- this block's questions
      if this block's branch paradigm is SAMPLE
        q <- randomly select one of questions
        return [ q ]
      else
        qList <- shuffled questions
        for each subblock b in this block
        begin
          bqList <- getAllQuestionsForBlock(b)
          append bqList to qList
        end
        return qList
      fi
    end


    function classifyBlockParadigm
      input : a block
      output : one of { NONE, ONE, SAMPLE }
    begin
      questions <- this block's questions
      subblocks <- this block's subblocks
      if every q in questions has a branch map
        branchMap <- select one branch map
        if every target block in branchMap is NULL
          return SAMPLE
        else
          return ONE
      else if one q in questions has a branch map
        return ONE
      else if there is no q in questions having a branch map
        for each b in subblocks
        begin
          paradigm <- classifyBlockParadigm(b)
          if paradigm is ONE
            return ONE
        end
        return NONE
      fi
    end

    The ONE branch paradigm gets propagated up the block hierarchy and pushes constraints down the block hierarchy. The SAMPLE branch paradigm pushes constraints down the block hierarchy, but has no impact on parent blocks. Finally, NONE is the weakest paradigm, as it imposes no constraints on its parents or children. All blocks are set to NONE by default and are overwritten by ONE when appropriate.
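
    For concreteness, the two procedures above translate into Python roughly as follows. This is a sketch, not the SurveyMan source: the block objects are assumed to carry questions, subblocks, and branch_paradigm fields, and each question an option-to-target branch_map (empty when the question does not branch):

      import random

      NONE, ONE, SAMPLE = "NONE", "ONE", "SAMPLE"

      def get_all_questions_for_block(block):
          """Return the questions a single respondent would see in this block."""
          if block.branch_paradigm == SAMPLE:
              # A sampling block shows exactly one of its question variants.
              return [random.choice(block.questions)]
          q_list = random.sample(block.questions, len(block.questions))  # shuffle
          for sub in block.subblocks:
              q_list.extend(get_all_questions_for_block(sub))
          return q_list

      def classify_block_paradigm(block):
          """Classify a block as NONE, ONE, or SAMPLE, bottom-up."""
          with_maps = [q for q in block.questions if q.branch_map]
          if block.questions and len(with_maps) == len(block.questions):
              # Every question branches: all-NULL targets mean sampling;
              # real targets mean a true branch question.
              targets = with_maps[0].branch_map.values()
              return SAMPLE if all(t is None for t in targets) else ONE
          if len(with_maps) == 1:
              return ONE
          # No branch question here; ONE propagates up from subblocks.
          for sub in block.subblocks:
              if classify_block_paradigm(sub) == ONE:
                  return ONE
          return NONE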

  • No Duplicates We check the features of each question to ensure that there are no duplicate questions. Duplicates can creep in if surveys are constructed programmatically.
  • Compactness Checks whether we have any gaps in the order of blocks. Block ordering begins at 1. This check is linear in the total number of blocks. (This check should be deprecated, due to randomized blocks.)
  • No branching to or from top level randomized blocks We can think of the survey as a container for blocks, and blocks as containers for questions and other blocks. The top-level blocks are those immediately contained by the survey. While we permit branching from randomizable blocks that are sub-blocks of some containing block, we do not allow branching to or from top-level randomized blocks. The check for whether we have branched from a top-level block is linear in the number of top-level blocks. The check for whether we try branching to a top-level randomizable block is linear in the number of questions.
  • Branch map uniformity If a block has more than one branch question, then all of its questions must be branch questions. The options and their targets are expected to be aligned. In the notation we used in a previous post, for a survey csv $$S$$, if one question, $$q_1$$, in the block spans indices $$i,\ldots,j$$ and another question $$q_2$$ spans indices $$k,\ldots,l$$, then we expect $$S[BLOCK][i] = S[BLOCK][k] \wedge \ldots \wedge S[BLOCK][j] = S[BLOCK][l]$$.
  • Exclusive Branching Branching is only permitted on questions where EXCLUSIVE is true.

The above are required for correctness of the input program. We also report back the following information, which can be used to estimate some of the dynamic behavior of the survey:

  • Maximum Entropy
    For a survey of $$n$$ questions each having $$m_i$$ responses, we can trivially obtain a gross upper bound on the entropy on the survey : $$n \log_2 (\text{max}(\lbrace m_i : 1 \leq i \leq n \rbrace))$$.
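
    A quick sketch of the bound (the response counts here are invented):

      import math

      m = [4, 4, 2, 5, 3]                 # m[i]: response options for question i
      bound = len(m) * math.log2(max(m))  # 5 * log2(5), about 11.6 bits
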
  • Path Lengths
    • Minimum Path Length Branching in surveys is typically related to some kind of division in the underlying population. In the previous post, we showed how branching could be used to run two versions of a survey at the same time. More often branching is used to restrict questions by relevance. It will be appropriate for some subpopulations to see certain questions, but not others.

      When surveys have sufficient branching, it may be possible for some respondents to answer far fewer questions than the survey designer intended — they may “short circuit” the survey. Sometimes this is by design; if we are running a survey but are only interested in curly-haired respondents, we have no way to screen the population over the web. We may design the survey so that answering “no” to “Do you have curly hair?” sends the respondent straight to the end. In other cases, this is not the intended effect and is either a typographical error or a case of poor survey design.

      We can compute minimum path length using a greedy algorithm:

      function minPathLength
        input : survey
        output : minimum path length through the survey
      begin
        size <- 0
        randomizableBlocks <- randomizable top level blocks
        staticBlocks <- static top level blocks
        branchDestination <- stores block that we will branch to
        for block b in randomizableBlocks
        begin
          size <- size + length(getAllQuestionsForBlock(b))
        end
        for block b in staticBlocks
        begin
          paradigm <- b's branch paradigm
          if branchDestination is initialized but b is not branchDestination
            continue
          fi
          size <- size + length(getAllQuestionsForBlock(b))
          if branchDestination is initialized and set to b
            unset branchDestination
          fi
          if paradigm = ONE
            branchMapDestinations <- b's branch question's branch map values
            possibleBlockDestinations <- sort branchMapDestinations ascending
            branchDestination <- last(possibleBlockDestinations)
          fi
        end
        return size
      end

    • Maximum Path Length Maximum path length can be used to estimate breakoff. This information may also be of interest to the survey designer — surveys that are too long may require additional analysis for inattentive and lazy responses.

      Maximum path length is computed using almost exactly the same algorithm as min path length, except we choose the first block in the list of branch targets, rather than the last.

      We verified these algorithms empirically by simulating 1,000 random respondents and taking the minimum (respectively, maximum) observed path length.
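
      Both extremes fit a single Python sketch of the greedy walk (again illustrative, not the SurveyMan source; it reuses get_all_questions_for_block and the ONE constant from the classification sketch above, and assumes the survey object exposes its randomizable and static top-level blocks, each block carrying a position index):

        def path_length(survey, choose):
            """Walk the top-level blocks; `choose` picks among branch targets."""
            # Randomizable top-level blocks are always executed.
            size = sum(len(get_all_questions_for_block(b))
                       for b in survey.randomizable_blocks)
            destination = None
            for b in survey.static_blocks:            # in survey order
                if destination is not None and b is not destination:
                    continue                          # jumped over by a branch
                size += len(get_all_questions_for_block(b))
                if b is destination:
                    destination = None
                if b.branch_paradigm == ONE:
                    targets = sorted(b.branch_question.branch_map.values(),
                                     key=lambda blk: blk.index)
                    destination = choose(targets)
            return size

        def min_path_length(survey):
            return path_length(survey, lambda ts: ts[-1])  # farthest target

        def max_path_length(survey):
            return path_length(survey, lambda ts: ts[0])   # nearest target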

    • Average Path Length Backends such as AMT require the requester to provide a time limit on surveys and a payment. While we currently cannot compute the optimal payment in SurveyMan, we can use the average path length through the survey to estimate the time it would take to complete, and from that compute the baseline payment.

      The average path length is computed empirically. We have implemented a random respondent that chooses paths on the basis of positional preferences. One of the profiles implemented is a Uniform respondent; this profile simply selects one of the options with uniform probability over the total possible options. We run 5,000 iterations of this respondent to compute the average path length.

      Note that average path length is the average over possible paths; the true average path length will depend upon the preferences of the underlying population.
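
      Reusing the path_length walk from the sketch above, the empirical estimate is a short Monte Carlo loop; choosing uniformly among the per-option branch targets mirrors an option-level Uniform respondent:

        import random

        def average_path_length(survey, iterations=5000):
            # One target per option, so uniform choice over the target list
            # is uniform choice over the branch question's options.
            uniform = lambda targets: random.choice(targets)
            total = sum(path_length(survey, uniform) for _ in range(iterations))
            return total / iterations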

Creating Sophisticated Online Surveys: A Case Study

As discussed in yesterday’s post, the SurveyMan language allows sophisticated survey and experimental design. Since we’re about to launch the full version of Sara Kingsley’s wage survey, I thought I’d step through the process of designing the branching and blocking components.

The survey in question is completely flat (i.e. no blocking, no branching) and contains a variety of ordered and unordered radio (exclusive) and checkbox (not exclusive) questions. We would like to be able to run a version that is completely randomized and a version that’s totally static and based on conventional survey methodology. Ideally we’d like to be able to run both surveys under the same set of conditions.

We could run both versions of the survey at the same time on AMT. However, we’d run into trouble with people who try to take it twice. To solve this problem, we could run one version, issue “reverse qualifications” a la Automan to the people who answered the first one, and then run the second version. This would entail some extra coding on the side; support for longitudinal studies and repeated independent experiments would require keeping a database of past participants and flagging them as uniquely qualified or disqualified for a survey, and this feature is not currently supported in SurveyMan. What we’d really like to do, though, is run the two surveys concurrently, in the same HIT.

Fortunately for us, we can implement both versions of the survey as a single survey using the underlying language.

Let’s start by looking at what the two versions would look like if we were to run them as separate surveys:
[figure: twoVersions2]
The survey on the left groups all of the questions into one block. The survey on the right places each question into its own block. In order to make the survey on the right truly static, we would also need to set the RANDOMIZE column to false.*

As we’ll see in my future blog post on the runtime system, the evaluation strategy forces all paths in the survey to join. If there isn’t a semantically meaningful way to join, this can be done trivially by creating a final block that contains an instructional “question” (e.g. “Thank you for taking the time to complete our survey”) and a submit button.

Now, in order to represent two separate paths, we will need to place the ordered survey questions inside a container block:
[figure: twoVersions3]

The three orange blocks are top-level: let the fully randomized version be Block 1, the fully static version be Block 2, and the joining “question” be Block 3. If we were simply to annotate them according to this scheme, every respondent would end up taking both versions of the survey. We need some way of ensuring that the two versions represent parallel paths through the survey, such that the expected number of respondents along each path is 50% of the sample size.

Let’s work backwards from Block 3. Assume that the respondent is already on one of the parallel paths. We can ensure that Block 3 follows immediately from both Block 1 and Block 2 by using the BRANCH column. We add a branch to Block 3 for every option in the last question of the fully static version, and do the same for that question’s equivalent in the fully randomized version (both of which are now contained in different blocks in our current survey). Clearly we will branch to Block 3 after answering the last question in the fully static version. Because we require that the respondent answer every question in a block before evaluating a branch, the respondent will also answer every question in the randomized version before branching to Block 3.

[figure: twoVersions4]

Finally, we need to be able to randomly assign respondents to each of the paths. At first, I thought this would be a cute example of randomized blocks, where we could change Block 1 to Block _1. However, this would not give us an even split in the sample population. There are three possible orderings of the blocks: [_1 2 3], [2 _1 3], [2 3 _1]. 2/3 of the population would then see Block 2 first.**

We could instead overwrite the setFirstQuestion method of the SurveyMan Javascript object in custom.js to randomly assign the first question in the appropriate block.

We could also just add what we’ll call a fork question at the beginning of the survey, asking them to choose between two options:

[figure: twoVersions5]
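
For illustration, here is a sketch of how those fork rows might be generated; the column names QUESTION and OPTIONS, the block renumbering, and the option text are all assumptions (only the BLOCK and BRANCH columns are described in these posts):

    import csv, sys

    # Assumed renumbering once the fork is prepended: fork = block 1,
    # randomized version = block 2, static version = block 3, join = block 4.
    rows = [
        ("Which survey would you like to take?", "Option A", "1", "2"),
        ("Which survey would you like to take?", "Option B", "1", "3"),
    ]

    writer = csv.writer(sys.stdout)
    writer.writerow(["QUESTION", "OPTIONS", "BLOCK", "BRANCH"])
    writer.writerows(rows)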

* Looks like I finally found a use-case for that RANDOMIZE column — the experimental control! Maybe I shouldn’t deprecate after all…
** As an aside, I want to point out that this particular example illustrates a case for Presley’s suggestion that we interpret the available indices for randomization over the indices of the blocks marked as randomizable. Here we would have marked both Block 1 and Block 2 as randomizable. I’m still not sure whether this would be overly constrained and will discuss this further in a future blog post.