Christina's LIS Rant: Comps readings this week

Sunday, April 26, 2009

Joho, H., & Jose, J. M. (2006). Slicing and dicing the information space using local contexts. IIiX: Proceedings of the 1st international conference on Information interaction in context, Copenhagen, Denmark. 66-74. (available from: http://eprints.gla.ac.uk/3521/1/slice_and_dice..pdf)
In this article they test a couple of different things about the information interaction in search. They look at having a workspace in the interface and pseudo-facets by co-occurrence (not the typical clustering). There were several tasks of low and high complexity - defined as how much information is given and needed about an imposed task. Participants were much happier with the workspace than the baseline layout and they also did better at identifying relevant pages using the workspace for complex tasks.

Wacholder, N., & Liu, L. (2008). Assessing term effectiveness in the interactive information access process. Information Processing & Management, 44(3), 1022-1031.
Started reading this and then I took a detour to quickly read through: Wacholder, N., & Liu, L. (2006). User preference: A measure of query-term quality. Journal of the American Society for Information Science and Technology, 57(12), 1566-1580. doi:10.1002/asi.20315 - that article describes the experimental setup.
I'm just having a really hard time telling the difference between these two articles. I guess the JASIST article is about what the user prefers and the IP&M article is about how effective the terms are at retrieving the correct result. The setup is that there's an electronic edition of a book. The investigators create a bunch of questions that can be answered with it. They have 3 indexes - the back-of-the-book index and two ways of doing noun phrases. One way keeps 2 phrases if they have the second term in common and the other (head sorting) keeps a phrase if the same word appears as the head of 2 or more phrases. They had questions that were easier or harder and created a test interface to show the query terms to the user. The user selects one and can see a bit of the text, which they can cut and paste or type into the answer block. Users preferred human terms - not surprising. The head sorting terms had a slight edge on the human terms for effectiveness, with the TEC terms (the other noun-phrase method) not doing nearly as well.
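
To make the head-sorting idea concrete for myself, here is a minimal sketch in Python. It is not the authors' implementation: the function name, the sample phrases, and the shortcut of treating the last word of a phrase as its head are all assumptions for illustration.

```python
from collections import defaultdict

def head_sorted_index(noun_phrases, min_phrases=2):
    """Group noun phrases by head word and keep only heads shared by
    at least `min_phrases` distinct phrases.

    Assumption: the head of an English noun phrase is its last word.
    """
    by_head = defaultdict(set)
    for phrase in noun_phrases:
        words = phrase.lower().split()
        if not words:
            continue  # skip empty strings
        by_head[words[-1]].add(" ".join(words))
    return {
        head: sorted(phrases)
        for head, phrases in by_head.items()
        if len(phrases) >= min_phrases
    }

# Hypothetical phrases, not taken from the study's book.
phrases = [
    "reference interview",
    "exit interview",
    "query term",
    "index term",
    "back-of-book index",
]
index = head_sorted_index(phrases)
```

Here "back-of-book index" is dropped because no other phrase shares the head "index", while the "interview" and "term" groups are kept.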

White, M. D. (1998). Questions in reference interviews. Journal of Documentation, 54, 443-465.
Looked at 12 pre-search interviews (recall that in this time period, when you wanted to do a literature search using an electronic database, you filled out a form, then made an appointment with a librarian, and then she mailed you a set of citations - or you picked them up a few days later). These interviews would be after the librarian had reviewed the form but before she'd done any real searching. Out of these 12 interviews, there were 600 questions (from both sides), apparently using a common set of rules as to what is a question... None of this seems earth-shattering now. Oh well.

Lee, J. H., Renear, A., & Smith, L. C. (2006). Known-Item Search: Variations on a Concept. Proceedings 69th Annual Meeting of the American Society for Information Science and Technology (ASIST), Austin, TX, 43. Also available from: http://eprints.rclis.org/8353/
We always talk about known item search, but everyone defines it differently...

Green, R. (1995). Topical Relevance Relationships I. Why Topic Matching Fails. Journal of the American Society for Information Science, 46(9), 646-653.
There's a gap between the ideal goals of an information retrieval system and operational system design. Ideally, relevance, in a strong sense, means that the document/information retrieved helps the user with his or her information need. To get this done in systems, we make some assumptions. Namely, that the need can be represented by terms, that documents can be represented by terms, and that the system can retrieve documents based on input terms. So the weaker version of relevance that we use is matching term to term. But there are lots of things that are helpful or relevant that don't match term for term - like things that are up or down the hierarchy (you search for EM radiation, and the microwave thing is not returned even though microwaves are a specific type of EM radiation). She then goes wayyy into linguistics stuff (as is her specialty) about types of relationships...
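
The EM radiation example can be sketched in a few lines of Python. This is only a toy under stated assumptions: the hierarchy, the documents, and the function names are made up, and it covers just the "narrower term" case, not the full range of relationships Green discusses.

```python
# Hypothetical toy hierarchy: term -> its broader (parent) term.
# Assumed to contain no cycles.
BROADER = {
    "microwaves": "electromagnetic radiation",
    "x-rays": "electromagnetic radiation",
    "electromagnetic radiation": "radiation",
}

# Hypothetical documents, each represented by a set of index terms.
DOCS = {
    "doc1": {"electromagnetic radiation"},
    "doc2": {"microwaves"},
    "doc3": {"sound waves"},
}

def exact_match(query, docs):
    """Term-to-term matching: a document matches only if it carries the query term."""
    return sorted(d for d, terms in docs.items() if query in terms)

def is_same_or_narrower(term, query, broader):
    """Walk up the hierarchy from `term`; True if we reach `query`."""
    while term is not None:
        if term == query:
            return True
        term = broader.get(term)
    return False

def hierarchical_match(query, docs, broader):
    """Also match documents indexed with any term narrower than the query."""
    return sorted(
        d for d, terms in docs.items()
        if any(is_same_or_narrower(t, query, broader) for t in terms)
    )

exact_hits = exact_match("electromagnetic radiation", DOCS)
expanded_hits = hierarchical_match("electromagnetic radiation", DOCS, BROADER)
```

With this toy data the exact match finds only doc1, while the hierarchical version also picks up doc2, the microwave document.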

Huang, X., & Soergel, D. (2006). An evidence perspective on topical relevance types and its implications for exploratory and task-based retrieval. Information Research, 12(1), paper 281. Retrieved from http://informationr.net/ir/12-1/paper281.html
This article follows closely on the previous (if not temporally then topically - ha!). The authors used relevance assessments from the MALACH project to further define various topical relevance types. The MALACH project has oral histories from Holocaust survivors. Graduate students in history assessed segments for matching with given topics and then provided their reasoning for doing so.
Direct - says precisely what the person asked
Indirect - provides evidence so that you can infer the answer
types within these
- generic - at the point but missing a piece
- backward inference or abduction - you have the result or a later event and you can infer what happened before
- forward inference or deduction - you have a preceding event or cause and you can infer what happened afterward
- from cases
Context - provides context for the topic like environmental, social, or cultural setting
Comparison - provides similar information about another person, or another time, or another place

So you can see how these are all very important and how a good exploratory search would help with this. As it is now, you have to manually figure out all of the various things to look for - even if the system perfectly matches your query terms, it's not enough! Also, they discuss why this matters if you're trying to build an argument: you need different types of evidence at different stages. Good stuff (and not just 'cause colleague and advisor as authors)
(so there's a situation at work, where I've been trying to bring some folks into this point of view - they can only see direct match - but I contend that a new/good info retrieval system should do more)

Wang, P., & Soergel, D. (1998). A Cognitive Model of Document Use during a Research Project. Study I. Document Selection. Journal of the American Society for Information Science, 49(2), 115-133.
This was based on Wang's dissertation work - while she worked at a campus library for agricultural economics, she did searches for users using DIALOG. For this bunch of users, she had them read aloud and think aloud while they went through the results she retrieved to pick out the ones they wanted in full text. She recorded this and then coded it. From that she pulled out what document elements they looked at and how they selected documents. I mostly talk about this study in terms of pointing out the document elements that are important (like Engineering Village is spot on with the author and affiliation first), but the decision theory stuff is interesting too. In addition to topicality, their criteria include recency, authority, relationship to author (went to school with him), citedness, novelty, level, requisites (need to read Japanese), availability, discipline, expected quality, reading time...

I figured while I'm in the relevance section - onward! (with all the Cooper, Wilson, and Kemp stuff... I'm not sure I get it so much.. I'm really not about tricky arguments or nuance ... as in the Patrick O'Brian novels, I go straight at 'em - even when I read one of these and get completely unscrewed - 5 minutes later I'm confused again)

Cooper, W. S. (1971). A Definition of Relevance for Information Retrieval. Information Storage and Retrieval, 7(1), 19-37. DOI: 10.1016/0020-0271(71)90024-6
This PDF might be corrupted on ScienceDirect... I'll have to check from another machine - (no, it's fine from work). In the meantime I had to - dramatic sigh - get this out of my binder from the information retrieval doctoral seminar. Logical relevance has to do with topic appropriateness. It is the relationship between stored information and information need. Information need is a "psychological state" that is not directly observable - we hope to express it in words, but that's not the same thing. The query is a first approximation of a representation of an information need. The request is what the system actually gets (is this sounding a bit like Taylor '68?). So when he's doing his own definition, he looks at a very limited situation - a system that answers yes or no questions. (Here's where I get into trouble.) He defines a premiss set for a component statement of an information need as a group of system statements from which the component statement follows as a logical consequence (minimal means as small as possible). A statement is "logically relevant to (a representation of) an information need iff it is a member of some minimal premiss set." He later goes on to say that for topical information needs, you can create a component statement tree and get to something similar to Xiaoli & Dagobert's indirect topical relevance. Interestingly, his definition specifically doesn't include things like credibility and utility where other versions of relevance do, even though he's maybe only developing topical relevance.

Wilson, P. (1973). Situational relevance. Information Storage and Retrieval, 9, 457-471. doi:10.1016/0020-0271(73)90096-X
Wilson also notes the difference between psychological relevance - what someone does do, or does perceive to be relevant - and a broader view of logical relevance - something can be relevant whether or not the person noticed it. Wilson is interested in logical relevance. Within logical relevance, there's a narrower logical relevance (elsewhere direct) and evidential relevance. Something is evidentially relevant if it strengthens or adds to the argument/case. Situational relevance deals with things that are of concern or things that matter, not just things you're mildly interested in. Something is situationally relevant if, when put together with your entire stock of knowledge, it is logically or evidentially relevant to some question of concern. Something is directly relevant if it's relevant to something in the concern set and indirectly situationally relevant if it's relevant to something that isn't part of the concern set. Wilson's situational relevance is time sensitive and person sensitive - what is of concern depends on who you ask. Within all this there are preferences, degree, practicality, etc.

Kemp, D. A. (1974). Relevance, Pertinence, and Information System Development. Information Storage and Retrieval, 10, 37-47.
In which we lead back to Kuhn again (all roads lead back to Kuhn and Ziman if you travel them far enough :) Kemp defines pertinence as a subjective measure of utility for the actual person with the information need, while relevance is something that can be judged more objectively, by others who can compare the expressed information request with the documents retrieved. He compares this to public vs. private knowledge (Ziman, and Foskett), denotation vs. connotation, semantics vs. pragmatics. Along the way, he provides a definition of informal vs. formal communication - but this is really much more complex now. His definition of informal is that it "does not result in the creation of a permanent record, or if it does, then that record is not available for general consultation" (p.40). Of course our informal communication may last well after you'd like it to and is certainly retrievable! His view is that pertinence is ephemeral - but I guess now we would say that it's situated.

Kwasnik, B. H. (1999). The Role of Classification in Knowledge Representation and Discovery. Library Trends, 48(1), 22.
(btw the scans of this in both EbscoHost and Proquest aren't so hot - they're legible, but a little rough) This is a classic article for a reason... like this paragraph

The process of knowledge discovery and creation in science has traditionally followed the path of systematic exploration, observation, description, analysis, and synthesis and testing of phenomena and facts, all conducted within the communication framework of a particular research community with its accepted methodology and set of techniques. We know the process is not entirely rational but often is sparked and then fueled by insight, hunches, and leaps of faith (Bronowski, 1978). Moreover, research is always conducted within a particular political and cultural reality (Olson, 1998). Each researcher and, on a larger scale, each research community at various points must gather up the disparate pieces and in some way communicate what is known, expressing it in such a way as to be useful for further discovery and understanding. A variety of formats exist for the expression of knowledge--e.g., theories, models, formulas, descriptive reportage of many sorts, and polemical essays.

Just sums up all of scholarly communication in a few sentences. "Classification is the meaningful clustering of experience" - and it can be used in a formative way while making new knowledge and to build theories. Then she describes different classification schemes:
Hierarchies have these properties: inclusiveness, species/differentia (luckily she translates that for us - is-a relationships), inheritance, transitivity, systematic and predictable rules for association and distinction, mutual exclusivity, and necessary and sufficient criteria. People like hierarchical systems because they're pretty comprehensive, they're economical because of inheritance and all, they allow for inferences, etc. But these don't always work because of multiple hierarchies, multiple relationships, transitivity breaking down, our lack of comprehensive knowledge, and other reasons.
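
Inheritance and transitivity are easier to see in a tiny sketch. The Python below is my own illustration, not from the article: the is-a chain, the properties, and the function names are all made up, and it assumes a single parent per class (which is exactly the assumption that multiple hierarchies break).

```python
# Hypothetical single-parent is-a hierarchy: class -> superclass.
IS_A = {"poodle": "dog", "dog": "mammal", "mammal": "animal"}

# Hypothetical properties attached at each level.
PROPERTIES = {
    "animal": {"alive"},
    "mammal": {"warm-blooded"},
    "dog": {"barks"},
    "poodle": {"curly coat"},
}

def ancestors(term):
    """Follow is-a links upward; transitivity means every class on the chain counts."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain

def inherited_properties(term):
    """A class gets its own properties plus everything defined above it."""
    props = set(PROPERTIES.get(term, set()))
    for ancestor in ancestors(term):
        props |= PROPERTIES.get(ancestor, set())
    return props

poodle_ancestors = ancestors("poodle")
poodle_props = inherited_properties("poodle")
```

The economy she mentions shows up here: "warm-blooded" is recorded once, at "mammal", and everything below picks it up for free.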

Trees go through that splitting but there's not that inheritance of properties. Her examples include part-whole relationships as well as a tree like general - colonel - lt colonel... - private. Trees are good because you can figure out relationships, but they're kind of rigid and handle only one dimension.

Paradigms are matrices showing the intersection of two attributes (really?). Hm.

Facet analysis - choose and develop facets, analyze stuff using the different facets, develop a citation order. These are friendly and flexible once you get going, but deciding on facets is difficult and then there might not be any relationships between the facets.

With all of these things, things get disrupted when perspective changes, or the science changes, or there are too many things that don't fit neatly into the scheme. The article stops kind of suddenly - but this really ties back to Bowker and Star who are much more comprehensive (well it's a book after all!) in how all of this ties into culture, but less detailed about how classifications work.

Thus completes the relevance section... back to diffusion of innovations (see separate post on Rogers). These articles were originally assigned by M.D. White, who was a guest speaker at our doctoral seminar. One of her advisees did her dissertation on the diffusion of electronic journals, good stuff. Dr. White was on my qualifying "event" committee, but she has since retired so no luck in having her on my next couple.

Fichman, R. G., & Kemerer, C. F. (1999). The illusory diffusion of innovation: An examination of assimilation gaps. Information Systems Research, 10(3), 255-275.

The point of this article is that for corporate IT innovations, there's a real difference between acquisition and deployment; that is, many companies purchase technologies that they never deploy. If you measure adoption by the number of companies that have purchased, then you'll miss rejection and discontinuance, which are actually very prevalent. This difference between cumulative acquisition and cumulative deployment is the assimilation gap. If you think of the typical S-curve, there are two of them: the higher one (higher cumulative number acquired) is acquisition and the lower one is deployment, and the area between the two curves is the gap. You can draw a vertical line at any time t and see the difference. The problem is that you have censoring - some firms still have not deployed at the end of the observation window. The authors use survival analysis for this, which enables them to use the data even with censoring, to look at median times to deployment, and to make statistical inferences about the relative sizes of two gaps.
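
To see why censoring matters, here is a minimal sketch of a Kaplan-Meier style estimate in Python. I'm assuming Kaplan-Meier as the illustration; the article's actual survival models may differ. The firm data is invented, and the function names are mine.

```python
def kaplan_meier(observations):
    """Estimate the share of acquirers that have NOT yet deployed over time.

    observations: list of (years_since_acquisition, deployed) pairs, where
    deployed=False means the firm was censored (still not deployed when
    observation stopped). Returns [(time, survival)] at each deployment time.
    """
    n_at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, dep in observations if time == t and dep)
        leaving = sum(1 for time, _ in observations if time == t)
        if events:
            survival *= 1 - events / n_at_risk
            curve.append((t, survival))
        # Deployed and censored firms both leave the at-risk set after t.
        n_at_risk -= leaving
    return curve

def median_time(curve):
    """First time at which at most half of the firms remain undeployed."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the median is never reached within the observation window

# Hypothetical data: 8 acquirers, 4 deployments, 4 censored.
firms = [(1, True), (2, True), (2, False), (3, True),
         (4, False), (5, True), (6, False), (6, False)]
curve = kaplan_meier(firms)
median_years = median_time(curve)
```

The censored firms still count in the at-risk denominator up to the point where they drop out of observation, which is how the method uses the incomplete data instead of throwing it away.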

They suggest that reasons for this gap for software innovations in firms might be increasing returns to adoption and knowledge barriers to adoption. Returns to adoption means that the more other organizations have already adopted, the more useful the innovation will be. Reasons for this include network effects, learning from the experiences of others, general knowledge in the industry about the innovation, economies of scale, and industry infrastructure to support the innovation (p. 260). Managers might hedge their bets for innovations that haven't caught on yet - purchase them, but wait to see what others do before deploying. Sometimes technology that is immature is oversold - and this only becomes clear after purchase. Knowledge barriers can be managerial as well as technological. It might not be clear how to go about the deployment process.

The authors did a survey of 1,500 medium to large firms (>500 employees) located using an advertiser directory from ComputerWorld. At these companies the respondents were mid-level IT managers with some software development tools installed at their site. They had 608 usable responses - but they ended up using only 384 because they wanted larger firms (>5 IT staff) who were assumed to be more capable of supporting these software innovations. Acquisition - first purchase of first instance; Deployment - 25% of new projects using it. For one tool there was a very small gap, but for another it was pretty large. They came up with median times to deploy and also what percentage of acquirers will probably never deploy (for one innovation 43%!). They compared these to a baseline from a Weibull distribution (in which 75% deploy in 4 years).
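
For the Weibull baseline, the only number I have is "75% deploy in 4 years." A rough sketch in Python of what that pins down: the shape parameter is NOT given in my notes, so the value used below is purely an assumption, and the function names are mine.

```python
import math

def weibull_cdf(t, shape, scale):
    """Fraction of acquirers that have deployed by time t under a Weibull model."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def scale_for_target(shape, t_target=4.0, fraction=0.75):
    """Solve 1 - exp(-(t_target / scale) ** shape) = fraction for scale."""
    return t_target / (-math.log(1.0 - fraction)) ** (1.0 / shape)

assumed_shape = 2.0  # assumption: not taken from the article
scale = scale_for_target(assumed_shape)
deployed_by_4_years = weibull_cdf(4.0, assumed_shape, scale)
```

Whatever shape you assume, the scale gets adjusted so that the curve passes through 75% at year 4; the shape only changes how quickly deployment ramps up before and after that point.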

Answers to the survey questions supported the idea that complexity and the technologies being oversold really contributed to this gap. An alternate explanation is that different people in the organization make the acquisition and deployment decisions.

(I'm going to stop now and start on next week's... more diffusion to come)

Labels: comps

Posted at 12:15 PM