Comps reading, Interactive IR, Pandora for PubMed... and more!
This comps reading is brought to you separately because it is directly relevant to an interesting conversation happening on FriendFeed (what luck!).
First, FriendFeed.
In this string, led by Mr. Gunn, we have comments on how new-article alerts should take what you already know, by looking at a collection you give them (possibly from your bibliographic manager, like EndNote, BibTeX, or RefWorks), and then suggest others based not on full content but on human-assigned metadata, like Pandora. (An important part of Pandora, IMHO, is being able to tune it by skipping some items: because there are different facets in the metadata, you might want results to be related in one facet and not another... anyhoo...)
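To make the facet-tuning idea concrete, here is a minimal Python sketch. It assumes each article carries human-assigned metadata grouped into facets; the facet names `topic` and `method`, the Jaccard overlap, and the `rate` parameter are all hypothetical choices of mine, not how Pandora actually works. Skipping an item shrinks the weight of whichever facets it matched on, so results can stay related in one facet but not another.

```python
# Hypothetical sketch of facet-weighted, Pandora-style tuning (not Pandora's real algorithm).


def jaccard(a, b):
    """Overlap between two sets of metadata terms; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def facet_overlaps(profile, item):
    """Per-facet Jaccard overlap between the user's profile and a candidate item."""
    return {facet: jaccard(terms, item.get(facet, set())) for facet, terms in profile.items()}


def score(profile, weights, item):
    """Weighted sum of the per-facet overlaps."""
    return sum(weights[f] * s for f, s in facet_overlaps(profile, item).items())


def skip(profile, weights, item, rate=0.5):
    """User skipped this item: shrink each facet's weight in proportion to how much it matched."""
    for facet, overlap in facet_overlaps(profile, item).items():
        weights[facet] *= 1 - rate * overlap
    return weights


profile = {"topic": {"information retrieval", "query expansion"}, "method": {"user study"}}
weights = {"topic": 1.0, "method": 1.0}
item = {"topic": {"bibliometrics"}, "method": {"user study"}}

print(score(profile, weights, item))  # 1.0
skip(profile, weights, item)          # only the "method" facet matched
print(weights)                        # {'topic': 1.0, 'method': 0.5}
print(score(profile, weights, item))  # 0.5
```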
In this string, based on an older blog post by Martin Fenner but just picked up again by Andrew Perry (and liked by Richard Akerman), we talk a little more about how people find articles, suggesting filtering by papers that you or others have read.
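Filtering by what others read can be sketched as simple co-readership counting: papers that show up in the libraries of people who read what you read float to the top. This is a hypothetical illustration; the reader names, paper IDs, and the overlap weighting are made up, and real services may do something quite different.

```python
from collections import Counter


def co_read_suggestions(libraries, my_papers, top_n=5):
    """Suggest papers read by people whose libraries overlap with mine.

    libraries: dict mapping a reader to the set of paper IDs they have read.
    Each unseen paper scores the sum, over readers who have it, of how many
    papers that reader shares with me.
    """
    counts = Counter()
    for papers in libraries.values():
        shared = len(papers & my_papers)
        if shared == 0:
            continue  # no common reading, so this reader tells us nothing
        for paper in papers - my_papers:
            counts[paper] += shared
    return [paper for paper, _ in counts.most_common(top_n)]


libraries = {
    "reader_a": {"p1", "p2", "p3"},
    "reader_b": {"p2", "p4"},
    "reader_c": {"p5"},
}
print(co_read_suggestions(libraries, {"p1", "p2"}))  # ['p3', 'p4']
```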
Now, by happy coincidence, one of this morning's comps readings:
Kelly, D., & Fu, X. (2007). Eliciting better information need descriptions from users of information search systems. Information Processing & Management, 43(1), 30-46. DOI: 10.1016/j.ipm.2006.03.006 (I can't immediately find a free e-print, but you can at least read the abstract on ScienceDirect.)
Given that
1) users have difficulty articulating their information needs (think Belkin's anomalous states of knowledge)
2) users tend to use really short queries because
a) they don't necessarily know what to put in (see 1), and
b) the interfaces encourage them to do so
3) longer queries usually result in better retrieval performance
there is a serious mismatch.
Various studies have addressed this mismatch in a couple of different ways:
1) query expansion (for the non-IR folks out there: the system adds terms to the search)
a) automatic: the system expands your search using a thesaurus, a spell checker, or terms found in the top-matching results (pseudo-relevance feedback)
b) interactive: the system asks the user which terms to use and sometimes where to get additional terms
2) polyrepresentation (Ingwersen, 1996): this tries to imitate what a good reference librarian does, using multiple representations of the information need, including representations of the user's
a) prior knowledge
b) goals or why the user wants the information
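For the non-IR folks, the automatic flavor of query expansion that borrows terms from the top-matching results (pseudo-relevance feedback) can be sketched in a few lines. This is a toy version under stated assumptions: scoring is a bare count of query-term occurrences, the stopword list is tiny, and `k` (feedback documents) and `m` (added terms) are arbitrary; it is not the method used by Kelly and Fu.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "on", "is", "from", "with"}


def tokenize(text):
    """Lowercase word tokens with a (tiny, assumed) stopword list removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def rank(query_terms, docs):
    """Toy retrieval: order docs by how many query-term occurrences they contain."""
    scored = []
    for doc in docs:
        tf = Counter(tokenize(doc))
        scored.append((sum(tf[t] for t in query_terms), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0]


def expand_query(query, docs, k=2, m=2):
    """Pseudo-relevance feedback: assume the top-k results are relevant and
    append the m most frequent terms from them that aren't already in the query."""
    terms = tokenize(query)
    feedback = Counter()
    for doc in rank(terms, docs)[:k]:
        feedback.update(t for t in tokenize(doc) if t not in terms)
    return terms + [t for t, _ in feedback.most_common(m)]


docs = [
    "query expansion with relevance feedback",
    "relevance feedback improves query performance",
    "pandora music recommendation",
]
print(expand_query("query expansion", docs))  # ['query', 'expansion', 'relevance', 'feedback']
```

The interactive flavor would stop before the last step and show the user the candidate terms instead of adding them automatically.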
As Kelly and Fu put it, the idea is that users know a lot more about their information need than they give the system. Part of this goes back to Taylor (1968; of course, I always go back to Taylor, 1968!) and his four levels of information need: visceral, conscious, formalized, and compromised. The point is that users have a model of what the IR system can do, and they pose their queries accordingly: they use different terms, shorter queries, etc.
This article presents part of their work in the TREC 2004 High Accuracy Retrieval from Documents (HARD) track, so there is definitely some experimental-design weirdness. They take the standard TREC topic information and then go back to the user to ask for (q2) what the user already knows, (q3) why the user wants to know, and (q4) some additional keywords. Consult the paper for the various issues arising from the way TREC works, but the upshot is that adding all three (q2, q3, and q4) together was by far the best. q2 was the best single addition, pseudo-relevance feedback wasn't so hot at all for these queries and this corpus, and longer queries did much better than shorter ones. (Oh, and "better" means higher mean average precision, with relevance based on binary judgments from the 13 people who wrote the topics/questions.)
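Since "better" here means mean average precision over binary relevance judgments, a small sketch of the standard calculation may help: average precision for one query is the sum of precision@k at each rank k where a relevant document appears, divided by the total number of relevant documents (so relevant documents that are never retrieved count as zero), and MAP is the mean of that over all queries. The document IDs below are made up.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query with binary relevance judgments."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / k  # precision at this relevant document's rank
    return precision_sum / len(relevant_ids)  # unretrieved relevant docs contribute 0


def mean_average_precision(runs):
    """MAP: mean of AP over queries; runs is a list of (ranking, relevant set) pairs."""
    return sum(average_precision(ranking, rel) for ranking, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # hits at ranks 1 and 3: AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6", "d9"}),              # hit at rank 2, d9 never retrieved: AP = (1/2) / 2
]
print(round(mean_average_precision(runs), 3))  # 0.542
```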
Now we bring it all together.
Why not let users identify a collection of documents stored in their reference manager as what they already know? Ask users what they want to know... then, as the alert comes in from week to week, let them tune it like Pandora. The system should also tune itself based on items saved from the alert, which become things the user knows...
Full text is one way, but we could actually just use MeSH terms, abstracts, and titles...
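Here is a rough sketch of that whole loop under some loud assumptions: records are dicts with hypothetical `title`, `abstract`, and `mesh` fields (no full text); the profile is the centroid of the reference-manager collection plus the user's statement of what they want to know; alert items are ranked by cosine similarity; and saves and skips feed a Rocchio-style update. The Rocchio update, its `beta`/`gamma` values, and the double weight on MeSH headings are my own swap-ins for illustration; nothing here comes from Kelly and Fu or any existing alert service.

```python
import math
import re
from collections import Counter


def vectorize(record):
    """Sparse term vector from title + abstract words, plus MeSH headings as whole tokens."""
    text = record.get("title", "") + " " + record.get("abstract", "")
    vec = Counter(re.findall(r"[a-z]+", text.lower()))
    for heading in record.get("mesh", []):
        vec["mesh:" + heading.lower()] += 2  # assumption: curated MeSH terms count double
    return vec


def add_scaled(target, vec, weight):
    """target += weight * vec, in place (sparse vectors as dicts)."""
    for term, value in vec.items():
        target[term] = target.get(term, 0.0) + weight * value


def cosine(a, b):
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(known, need_text):
    """Centroid of what the user already knows, plus what they want to know."""
    profile = {}
    for rec in known:
        add_scaled(profile, vectorize(rec), 1.0 / len(known))
    add_scaled(profile, vectorize({"title": need_text}), 1.0)
    return profile


def rank_alert(profile, alert):
    """Order this week's alert items by similarity to the profile."""
    return sorted(alert, key=lambda rec: cosine(profile, vectorize(rec)), reverse=True)


def rocchio_update(profile, saved, skipped, beta=0.75, gamma=0.25):
    """Rocchio-style feedback (alpha fixed at 1): toward saved items, away from skipped ones."""
    for rec in saved:
        add_scaled(profile, vectorize(rec), beta / len(saved))
    for rec in skipped:
        add_scaled(profile, vectorize(rec), -gamma / len(skipped))
    for term in [t for t, v in profile.items() if v <= 0]:
        del profile[term]  # drop terms pushed to zero or below
    return profile


known = [{"title": "Query expansion for information retrieval",
          "mesh": ["Information Storage and Retrieval"]}]
alert = [
    {"title": "Protein folding dynamics", "mesh": ["Protein Folding"]},
    {"title": "Interactive query expansion in PubMed",
     "mesh": ["Information Storage and Retrieval", "PubMed"]},
]
profile = build_profile(known, "eliciting information needs from users")
print([rec["title"] for rec in rank_alert(profile, alert)])
# ['Interactive query expansion in PubMed', 'Protein folding dynamics']
rocchio_update(profile, saved=[alert[1]], skipped=[alert[0]])
```

In a real version, saved items would also be appended to `known`, since they become things the user knows.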
Is anyone already doing this? Why not?