Notes on Searching the Live Web by Hodder
Mary Hodder, lecture to UC Berkeley's SIMS 141 class, 11/22/2005, available in
RealMedia (.rm) format. (accessed 12/27/2005)
Live web - blogs, wikis, etc.; a subset of the web
She lists - BlogPulse, Sphere, Technorati, Bloglines, IceRocket, PubSub
Difference between static web and live web searching
- return of results (PageRank/relevance vs. reverse chronological), emphasis on live
- link searching vs. keyword searching (not immediately obvious where search terms appear)
- engines identify blogs by the underlying structure that common blogging software produces (so not everything retrieved is a blog, and not every blog is retrieved)
- things drop off the front page ("aged") in live web search vs. Google, which keeps an archive (slower to crawl, relevance most important, deeper search)
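The ordering difference and the "aging" of results can be sketched in a few lines of Python. This is only an illustration: the hits, their relevance scores, the 30-day cutoff, and the function names are all made up for the example and are not taken from the lecture or from any real engine.

```python
from datetime import datetime

# Hypothetical search hits: (title, relevance score, publication time).
hits = [
    ("Old but thorough post", 0.92, datetime(2005, 3, 1, 9, 0)),
    ("Fresh passing mention", 0.35, datetime(2005, 11, 22, 14, 30)),
    ("Recent detailed post", 0.80, datetime(2005, 11, 20, 8, 15)),
]

def static_web_order(results):
    """Order by relevance score, highest first (static-web style)."""
    return sorted(results, key=lambda r: r[1], reverse=True)

def live_web_order(results):
    """Order by publication time, newest first (live-web style)."""
    return sorted(results, key=lambda r: r[2], reverse=True)

def front_page(results, now, max_age_days):
    """Newest-first results, dropping anything older than the cutoff ("aged")."""
    return [r for r in live_web_order(results) if (now - r[2]).days <= max_age_days]

static_titles = [r[0] for r in static_web_order(hits)]
live_titles = [r[0] for r in live_web_order(hits)]
# The 30-day cutoff is an assumption chosen only for illustration.
recent_titles = [r[0] for r in front_page(hits, datetime(2005, 11, 23), 30)]
```

The same three hits come back in opposite orders under the two schemes, and the oldest (most relevant) post disappears entirely once the age cutoff is applied.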
Metrics of blog search
- links: Technorati (last 6 months), PubSub (not explicitly reported in search results), Bloglines (forever); the counting window differs from site to site, which is confusing
- number of blogs searched ... Bloglines gives # of articles, others # of feeds
- what are you actually searching? (see her Venn diagram at 18:24)
  - blogs no feed (~15%?)
  - blogs w/feeds
  - feeds not blogs
- number of RSS subscribers (in Bloglines, or for the feed via FeedBurner) -- using one or the other alone to gauge influence or reach is inadequate; people try to extrapolate from both figures, using knowledge of the subject area and how techie its audience is
- her proposed metrics (see her blog; 22 different metrics; search only across smaller communities, not the whole blogosphere)
  - blog to blog links (not blogroll, because made on the fly)
  - post to post links
  - blogroll (decision to make a link a semipermanent part of the template, a different relationship)
  - comments
  - blogserver records
  - incoming traffic (people who read from bookmarks, find from web searches)
- re-order search results by "authority" -- number of links received. Sphere will allow ordering by relevance
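Why link counts differ from site to site can be shown with a small sketch: the same link data gives different "authority" figures depending on the counting window. Everything here is assumed for illustration, including the link records, the choice to count distinct linking sources, and the use of 182 days to stand in for "six months". The notes do not describe how any service actually counts.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical link records: (source blog, target blog, date the link was seen).
links = [
    ("a.example", "x.example", datetime(2005, 1, 10)),
    ("b.example", "x.example", datetime(2005, 10, 5)),
    ("c.example", "y.example", datetime(2005, 11, 1)),
    ("a.example", "y.example", datetime(2005, 11, 15)),
    ("a.example", "y.example", datetime(2005, 11, 18)),  # repeat from same source
]

def inbound_counts(records, now, window=None):
    """Count distinct linking sources per target, optionally within a time window."""
    seen = set()
    for source, target, when in records:
        if window is None or now - when <= window:
            seen.add((source, target))
    return Counter(target for _, target in seen)

now = datetime(2005, 11, 22)
all_time = inbound_counts(links, now)                             # "forever" style
six_months = inbound_counts(links, now, window=timedelta(days=182))  # windowed style

# Re-order by "authority" (number of inbound links) under the windowed count.
by_authority = sorted(six_months, key=six_months.get, reverse=True)
```

With these records the two blogs tie on the all-time count, but the windowed count drops the January link and ranks y.example first.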
Splogs
- 13k blogs in an hour
- Google doesn't work as hard as it might to get rid of them, because of advertising dollars
What's needed
- (stop comparing everything to Google and static web search from 1997)
- sophisticated interfaces
- topic browsing
- sophisticated weighting tools (more than just inbound link counts)
- adjustments to static web search to fine tune it
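One way to read "more than just inbound link counts" is a score that combines several signals. The sketch below is a guess at what such a weighting tool might look like, not something described in the lecture: the signal names echo metrics mentioned in these notes, while the numbers and weights are invented.

```python
# Hypothetical per-blog signals, named after metrics mentioned in the notes.
signals = {
    "x.example": {"post_links": 40, "blogroll_links": 2, "comments": 5},
    "y.example": {"post_links": 25, "blogroll_links": 10, "comments": 60},
}

# Illustrative weights only; the notes do not specify any values.
weights = {"post_links": 1.0, "blogroll_links": 2.0, "comments": 0.5}

def weighted_score(blog_signals, weights):
    """Weighted sum of the available signals; missing signals count as zero."""
    return sum(w * blog_signals.get(name, 0) for name, w in weights.items())

# Ranking by inbound post links alone vs. by the combined score.
by_links = sorted(signals, key=lambda b: signals[b]["post_links"], reverse=True)
by_weighted = sorted(
    signals, key=lambda b: weighted_score(signals[b], weights), reverse=True
)
```

Here the blog with fewer inbound post links overtakes the other once blogroll links and comments are weighed in, which is the kind of adjustment a user-facing weighting tool would expose.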
Her project to tag w/identity
In response to questions...
Another way to help liveblog search:
- microformats (Technorati approach, using rel= attributes)
- structured blogging (PubSub approach)
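As a rough illustration of the rel= idea, the sketch below pulls tags out of a post by looking for links marked rel="tag" and taking the last path segment of each href as the tag name, which is how the rel-tag microformat is commonly described. The sample HTML, the example.com URLs, and the class name are made up; this uses only Python's standard html.parser and is not how any particular search engine is known to do it.

```python
from html.parser import HTMLParser

class RelTagParser(HTMLParser):
    """Collect tag names from <a> elements whose rel attribute includes 'tag'."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attr.get("rel") or "").split()
        href = attr.get("href")
        if "tag" in rel_values and href:
            # Assumption: the tag name is the last path segment of the URL.
            self.tags.append(href.rstrip("/").rsplit("/", 1)[-1])

# Hypothetical post markup with one tagged link and one ordinary link.
post_html = (
    '<p>Notes on <a href="http://example.com/tag/search" rel="tag">search</a> '
    'and a <a href="http://example.com/about">plain link</a>.</p>'
)

parser = RelTagParser()
parser.feed(post_html)
found_tags = parser.tags
```

Only the link carrying rel="tag" is picked up, so an engine reading the markup gets an explicit topic label instead of having to infer one from the text.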
Problem with commingling blog results with static web results: her example is looking for a bank's location -- blog posts about the bank are not helpful there.