Dear Internet Diary,
This past weekend, we presented the SurveyMan work for the first time, at the Off the Beaten Track workshop at POPL. I first want to say that PLASMA seriously represented. We had talks in each of the sessions. Though I didn’t have the chance to see Charlie’s talk on Causal Profiling, Dan said it definitely engendered discussion and that people in the audience were “nodding vigorously” in response to the work. Dimitar presented Data Debugging, which people clearly found provocative.
I was surprised by the audience’s response to my talk. I know Emery had said that the people he talked to were excited about this space, but sometimes that’s hard to believe when you’re a grad student chugging away at the implementation and theory behind the work. It was invigorating to be able to describe what we’ve done so far and hear enthusiastic feedback. In all my practice talks, I had focused on the language itself, but for OBT, at the behest of my colleagues, I took the debugging angle instead. Most of the people in the audience had used surveys for their research and were quite familiar with these problems. While language designers have tried to tackle surveys before, they frequently come from the perspective of embedding them in a language *they* already use. Our approach instead leverages tools that our target audience already uses. We limit the expressivity of the language and make statistical guarantees, which is what our users care about most.
I got a few really interesting questions about system features. Someone made the point that bias cannot be entirely removed through redundancy: we can’t know whether we’ve found enough ways of expressing a question to control for the different underlying interpretations. In response, I suggested that we could borrow approaches from cross-language models to determine whether a set of questions is categorically the same. The idea is that if a set of questions produces the same distribution of responses, its members are sufficiently similar. Of course, this approach neglects the non-local effects of question wording. Whether those effects can be controlled through question order randomization is something I’ll have to think about more.
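One way to make the “same distribution of responses” criterion concrete is a chi-squared test of homogeneity across the wording variants of a question. What follows is a minimal, hypothetical sketch, not how SurveyMan actually does it: the function names, the dict-of-counts input format, and the 0.05 threshold are all assumptions. It also only addresses the local question; it says nothing about the non-local wording effects mentioned above, and failing to reject homogeneity is weak evidence of equivalence, not proof of it.

```python
import math
from typing import Dict, Tuple

# Hypothetical sketch, NOT SurveyMan's API. Input maps each wording variant
# to {answer option: number of respondents who chose it}.
Counts = Dict[str, Dict[str, int]]


def chi2_sf(x: float, dof: int) -> float:
    """P(X > x) for a chi-squared random variable with positive integer dof."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        # Even dof: finite series e^-y * sum_{i < dof/2} y^i / i!.
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd dof: start from the 1-dof case, erfc(sqrt(y)), then add the
    # recurrence terms y^(j - 1/2) e^-y / Gamma(j + 1/2) for j = 1..(dof-1)/2.
    total = math.erfc(math.sqrt(y))
    for j in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def homogeneity_test(counts: Counts) -> Tuple[float, int, float]:
    """Chi-squared test that all variants share one response distribution.

    Returns (statistic, degrees of freedom, p-value).
    """
    variants = sorted(counts)
    options = sorted({opt for row in counts.values() for opt in row})
    table = [[counts[v].get(o, 0) for o in options] for v in variants]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("no responses to compare")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if every variant had the pooled distribution.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    dof = (len(variants) - 1) * (len(options) - 1)
    p_value = chi2_sf(stat, dof) if dof > 0 else 1.0
    return stat, dof, p_value


def sufficiently_similar(counts: Counts, alpha: float = 0.05) -> bool:
    # Not rejecting homogeneity is NOT proof of equivalence; with few
    # respondents the test has little power to detect real differences.
    _, _, p_value = homogeneity_test(counts)
    return p_value >= alpha


if __name__ == "__main__":
    wordings = {
        "variant_a": {"agree": 40, "neutral": 30, "disagree": 30},
        "variant_b": {"agree": 42, "neutral": 28, "disagree": 30},
    }
    print(homogeneity_test(wordings)[1], sufficiently_similar(wordings))  # prints: 2 True
```

A permutation test would avoid the chi-squared approximation (which gets shaky when expected cell counts are small), at the cost of randomness and more computation; the closed-form version keeps the sketch dependency-free.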
As a followup question, I was also asked if we could reverse-engineer the distributions we get from the variants to identify different concepts. This was definitely not something I had considered before. I wasn’t sure we would, in practice, have sufficient variants and responses to produce meaningful results, but it’s something to consider as future work.
Many of the other questions I got were about features of the system that I did not highlight. For example, I did not go into any detail about the language and its control flow. I was also asked whether we were considering adding clustering and other automated, domain-independent analyses, which I am working on right now. Quite a few of these concerns are addressed by our preference for breakoff over item nonresponse. There was also an interesting ethics question about using our system to manipulate results. Of course, SurveyMan requires active participation from the survey designer; the idea is not to prevent the end user from adding bias, but to illuminate its presence.
“As a followup question, I was also asked if we could reverse-engineer the distributions we get from the variants to identify different concepts. This was definitely not something I had considered before. I wasn’t sure we would, in practice, have sufficient variants and responses to produce meaningful results, but it’s something to consider as future work.”
Maybe we should think about this as a possible data science application of SurveyMan. It’s still a way to get training labels, but there’s a distinction between tapping into respondents’ conscious knowledge and tapping into the way they’re biased by question wording, question order, etc. SurveyMan is good for the former because it controls for bias better than other survey systems, and it’s good for the latter because the ability to test those things is built right in.
Yeah, I think this sort of thing is ripe for knowledge discovery. I would have to look more into the literature to say anything worthwhile about it. I know Emery’s been discussing some of this work with David Jensen, who works on causality, and he’s quite interested in what we’re doing.
The above suggestion also dovetails with what Dan plans to work on over the summer. A key feature that the MSR people want is the ability to search through the space of hypotheses, generate new hypotheses automatically, and test them automatically. Dan was taking his AutoMan work in this direction, trying to do open-ended search. It’s definitely a hard problem, and one where you easily get stuck in a basin of doom. Generating hypotheses by clustering survey responses (using the extra columns as features), or learning latent variables across questions that are allegedly semantically equivalent, seems like it would be a huge win for knowledge discovery and, by extension, data science.