On Keeping the Survey a DAG

A topic that came up during my SurveyMan lab talk in October was our lack of support for looping questions. Yuriy raised the objection that there will be cases where we want to repeat a question, such as when collecting an employment history. We argued that, since we were emulating paper surveys (at the time), the survey author could provide an upper bound on the number of entries and ask the respondent whether they wanted to add another entry for a category. A concern I had was that, since we are interested in the role of survey length in the quality of responses, and since we allow breakoff, a loop over a question makes it much more difficult to tell whether the question itself or the length of the survey is the problem. Where previously we treated each question as a random variable, we would now need to model a repeating question as a sum of an unknown number of random variables.

[Figure: The probability model of a survey with a loop differs from the model of a survey without one. Note that while both random variables corresponding to the responses to question Q2 may be modeled by the same distribution, they will have different parameters.]
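To make the modeling burden concrete, here is one way to write it down (a sketch, not the model we actually use): if the i-th pass through a looped question is a random variable X_i and N is the (random) number of passes, the looped question contributes

S = \sum_{i=1}^{N} X_i

rather than a single X. Even under the simplifying assumption that the X_i are i.i.d. and independent of N (which, as the figure notes, need not hold, since successive passes may have different parameters), Wald’s identity \mathbb{E}[S] = \mathbb{E}[N]\,\mathbb{E}[X_1] already requires a model for N, the very quantity we cannot bound.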

This issue came up again during the OBT talk. The expanded version of Topsl that appeared in the PLT Redex book described a semantics for a survey that was allowed to have these kinds of repeated questions.

We do not think it is appropriate to model such questions as loops. Unbounded loops are what give a language the power to express arbitrary computable functions. Since the kinds of questions these loops are modeling are more accurately described as having a finite but unknown length, we do not want to encode the ability to loop forever.

Aside from this semantic difference, we see another problem with the potentially perpetual loop. Consider the use case for such a question: in the lab talk, it was Yuriy’s suggestion that we allow people to enter an employment history of unknown length; in the case of Topsl, it was self-reporting relationship history. If a respondent’s employment or relationship history is very long, they may be tempted to under-report the number of instances. This might be curtailed if the respondent is required to first answer* a question that asks for the number of jobs or relationships they** have had. Then responses in the loop could be correlated with the previous question, or the length of the loop could be bounded. In our setting, where we do not allow respondents to skip questions, the former would need to be implemented if we were to allow loops at all.

Alternatively, instead of presenting each response to what is semantically the same question as if it were a separate question, we could first ask for the number of jobs or relationships, and then ask a single followup question on a page that takes the response to the previous question and displays that many text boxes. We would still bound the total number of responses, but instead of presenting each entry as a separate question, we would present them all as a single question.
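A rough sketch of what we have in mind (the dictionary format, field names, and MAX_ENTRIES bound below are all illustrative, not SurveyMan’s actual schema):

# Sketch: build one followup "question" whose number of freetext fields is
# determined by the respondent's answer to a preceding count question.
MAX_ENTRIES = 10  # hard upper bound, so the survey stays finite

def followup_question(count_response):
    n = min(int(count_response), MAX_ENTRIES)
    return {
        "id": "employment_history",
        "text": "Please list your previous employers.",
        "freetext_fields": ["employer_%d" % (i + 1) for i in range(n)],
    }

# A respondent who reported 3 previous jobs sees a single page with three
# text boxes, rather than looping through three separate questions.
print(followup_question(3))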

In the analysis of a survey we ran, we found statistically significant breakoff at the freetext question. We’d like to test whether freetext questions in general are correlated with high breakoff. If this is the case, we believe it provides further evidence that “loop questions” are better implemented using our single-page approach.
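As a sketch of the kind of test we have in mind (the counts below are invented for illustration, and a chi-squared test of independence stands in for whatever test turns out to be appropriate):

# Sketch: test whether breakoff is independent of question type.
# Rows: freetext questions, all other questions.
# Columns: respondents who broke off at the question, respondents who continued.
# These counts are made up; real counts would come from our response data.
from scipy.stats import chi2_contingency

table = [[40, 160],
         [55, 1145]]
chi2, p, dof, expected = chi2_contingency(table)
print("chi2 = %.2f, p = %.4f" % (chi2, p))  # a small p suggests breakoff depends on question type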

* I just wanted to note that I love splitting infinitives.
** While I’m at it, I also support gender-neutral pronouns. Political grammar FTW!

Observational Studies, Surveys, Quasi-Experiments, and Experiments

Across the sciences, researchers use a spectrum of tools or “instruments” to collect information and then make inferences about human preferences and behavior. These tools vary in the degree of control the researcher traditionally has had over the conditions of data collection. Surveys are an instance of such an instrument. Though widely used across social science, business, and even in computer science as user studies, surveys are known to have bugs. Although there are many tools for designing web surveys, few address known problems in survey design.

These instruments have also traditionally varied in their media and in the conditions under which they are administered. Some tools we consider are:

Observational studies

Allowing no control over how data are gathered, observational studies are analogous to data mining — if the information is not readily available, the researcher simply cannot get it.

Surveys

The next best approach is to run a survey. Surveys have similar intent as observational studies, in that they are not meant to have an impact on the subject(s) being studied. However, surveys are known to have flaws that bias results. These flaws are typically related to the language of individual survey questions and the structure and control flow of the survey instrument itself.

True Experiments

If a researcher is in the position of having a high degree of control over all variables of the experiment, they can randomly assign treatments and perform what is known as a “true experiment”. These experiments require little modeling, since the researcher can simply use hypothesis testing to distinguish between effect and noise.
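For instance, the core of a true experiment fits in a few lines (the outcomes here are simulated, and a two-sample t-test stands in for whatever test suits the data):

# Sketch: random assignment of a treatment, followed by a hypothesis test.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n = 200
is_treated = np.zeros(n, dtype=bool)
is_treated[rng.choice(n, size=n // 2, replace=False)] = True  # random assignment

outcomes = rng.normal(size=n)   # simulated baseline outcomes
outcomes[is_treated] += 0.3     # simulated treatment effect

t, p = ttest_ind(outcomes[is_treated], outcomes[~is_treated])
print("t = %.2f, p = %.4f" % (t, p))  # randomization lets us read p as effect vs. noise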

Quasi-Experiments

Quasi-experiments are similar to true experiments: they are still concerned with understanding causality, but they relax some of the requirements of true experiments, most notably the random assignment of treatments.

In the past, there has been little fluidity between these four approaches to data collection, since the media used to implement each was dramatically different. However, with the proliferation of data on the web and the ease of issuing questionnaires on platforms such as Facebook, SurveyMonkey, and Mechanical Turk, the implementations of these studies of human preferences and behavior have come to share many core features.

Despite similarities between these tools, quality control techniques for experiments have been largely absent from the design and deployment of surveys. There has been an outpouring of web tools and services for designing and hosting web surveys, aimed at non-programmers. While there are some tools and services available for experiments, they tend to be domain-specific and targeted to niche populations of researchers. The robust statistical approaches used in experimental design should inform survey design, and the general, programmatic approaches to web survey design should be available for experimental design.

Surveys : A History

What *is* a survey?

Everyone has seen a survey — we’ve all had the customer satisfaction pop-up appear on a webpage, or been asked by a college student working for a PIRG to answer some questions about the environment. We tend to think of surveys as a series of questions designed to gauge opinion on a topic. Sometimes the answers are drawn from a pre-specified list of options (e.g. the so-called Likert scales); sometimes the response is free-form.

What distinguishes surveys from other, similar “instruments” is that surveys (a) typically return a distribution of valid responses and (b) are observational. Or rather, surveys are supposed to be observational: they are meant to reveal preferences, underlying assumptions, behaviors, etc., and not to sway the respondent to answer one way or another. There are some similar-looking instruments that are not meant to be observational. Some of these fall under the umbrella of what’s called an “experiment” in the statistics literature.

Why surveys?

Perhaps in the future we won’t have a need for surveys anymore — all of our data will be floating around on the web, free to anyone who wants to analyze it. If, after all, surveys really are just observational studies, we should be able to just apply some clustering, learn a model, do some k-fold validation, etc.

There are many problems with attempting to just use data available in the wild. First, though we may be in the era of “big data,” there are plenty of cases where the specific data you want are sparse. A worse situation is when the sparsity can be characterized by a Zipfian distribution – depending on how you set up your study and what your prior information is, it’s possible that you will never sample from the tail of this distribution and may never know that it is sparse. This leads us to a second problem with simply mining data: we cannot control the conditions under which the data are obtained. When conducting a survey, researchers typically use probability sampling (the popularity of convenience sampling for web surveys will be discussed in a later post). This allows them to adequately estimate the denominator and to estimate the error due to people opting out of the survey (so-called “unit nonresponse”).
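To illustrate the Zipf point (the exponent and sample size below are chosen arbitrarily):

# Sketch: when category frequencies follow a Zipf distribution, a modest
# sample can miss the tail entirely, and nothing in the sample reveals
# how sparse that tail is.
import numpy as np

rng = np.random.default_rng(1)
samples = rng.zipf(a=2.0, size=1000)  # Zipf-distributed category indices
print("distinct categories observed:", len(np.unique(samples)))
print("largest category index observed:", samples.max())
# Categories beyond the observed maximum exist in the population but were
# never sampled.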

Finally, conducting a survey explicitly, rather than mining data, gives the researcher a more complete view of the context of the responses. As we will discuss later, understanding context is critical, since context can lead to unpredictable responses.

A brief history of survey modes

While simple surveys such as a census have been around for thousands of years, the customer satisfaction or political survey of today is a more recent development. Market research and political forecasting are products of capitalism and require access to resources to conduct and make use of surveys. Survey research is intrinsically tied to the technologies used to conduct that research and to the statistical methodologies that are available and understood at the time the survey is conducted. Before mail service, a survey would have to have been conducted in person. Although random sampling is a very old idea, it was not until Laplace that tight bounds were calculated on the number of samples needed to estimate a parameter of a population.

Centralized mail service helped lower the cost of conducting surveys. The response time for surveys dropped with the widespread adoption of telephones. Mail and telephone surveys dominated survey modes in the latter half of the 20th century. Since landline telephones are associated with an address, these now-traditional survey modes relied on accurate demographic information. The introduction of the World Wide Web and the increasingly widespread use of cellular phones prompted survey designers to reconsider traditional instruments in favor of ones better suited to growing technologies.

A 2002 paper from the RAND Corporation describes growing interest among researchers in using the Web to conduct surveys. The paper addresses the assertion that internet surveys have higher response rates than traditional mail or telephone surveys. The authors found this to not be the case, except for technologically savvy populations (e.g. employees of Bell Labs). However, they noted that the web is only going to become more pervasive, and they recommended that survey designers keep the web in mind.

The RAND paper describes web surveys not as web forms, but simply as paper surveys distributed online. The surveys were sent over email, in a model that exactly mirrors mail surveys. The authors noted that spam could become an issue over time, and suggested that the populations with higher response rates to emailed surveys may simply have been those with a lower junk-to-relevant ratio for email than for snail mail.

The view of web surveys from twelve years ago predated “web 2.0”. It also predated the widespread use of cellular phones. Six years ago Pew Research published an article on the growing proportion of cell-phone-only households. They found that this population still only comprised a small proportion of the total US population, and so for polls that targeted the entire US population, unit nonresponse from cell phone users could be explained by typical error estimates. However, cell phone users were found to have a distinct population profile from the total US population. Therefore, any stratified sampling needed to take cell phone users more seriously.

Why Web Surveys?

As technology changes, the mode used to collect survey information changes. Clearly the rising use of smart phones makes web surveys increasingly attractive to researchers.

On top of the obvious appeal of being able to reach more people, web surveys afford researchers unique advantages that other modes do not allow, or that are prohibitively expensive to implement. Web surveys do not require people to administer them; while many organizations use automated calling services for phone surveys, there is still an associated cost for the service, as well as growing discontent with robocalls (okay, that still supports argument one).

Web surveys allow for rapid design modification, cheap pilot studies, and (what we believe to be most important) the ability to control for known problems in survey design. What problems could there be, other than not being able to reach people? Consider the dominant view of survey design:

The goal is to present a uniform stimulus to respondents so that their responses are comparable. Research showing that small changes in question wording or order can substantially affect responses has reinforced the assumption that questions must be asked exactly as worded, and in the same order, to produce comparable data. (Martin 2006)

We believe that this view — this static view — of survey design leads to overly complicated models and cumbersome statistical analyses that arise solely from only being able to perform post-hoc data analysis. Our view is that, since there are so many variables that may affect the outcome of a particular survey response, we should not try to control for everything, since controlling for everything is impossible. Instead, we randomize aspects of the instrument over a population, promote a “debug phase” akin to pilot studies, and encourage easy replication experiments. If we can reduce known biases in survey design to noise, we can perform more robust analyses. Now what do more robust analyses give us? Better science! Who wouldn’t want that?
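To make “randomize aspects of the instrument” concrete, here is a toy sketch of per-respondent randomization of question and answer-option order (not SurveyMan’s actual implementation, and it deliberately ignores questions whose options are ordered):

# Sketch: per-respondent randomization of question order and answer-option
# order, so that order effects average out to noise over the population.
import random

QUESTIONS = [
    ("q1", ["red", "green", "blue"]),
    ("q2", ["cat", "dog", "fish", "none"]),
    ("q3", ["walk", "bike", "drive", "transit"]),
]

def randomized_instrument(respondent_id):
    rng = random.Random(respondent_id)  # deterministic per respondent, for reproducibility
    qs = [(qid, rng.sample(opts, len(opts))) for qid, opts in QUESTIONS]
    rng.shuffle(qs)  # randomize question order as well as option order
    return qs

print(randomized_instrument(42))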