Accessing the scientific literature through images, a rant
I went to a presentation on CSA's Illustrata, and I've read pieces by Sandusky and Tenopir of UTK on its development and evaluation... it seems like a worthwhile endeavor and a very useful tool for the subjects I'm primarily concerned with.
While browsing my feeds just now, I saw a mention of Marti Hearst's project, Biotext. Biotext indexes 150 PubMed Central journals. You can search the abstracts or the captions and show the results in a grid format. I'm actually somewhat disappointed, because I know of Hearst's work with Flamenco and faceted presentation of search results, yet in this obvious place to use facets, they don't. Illustrata shows the descriptors and all that, and you can run searches on them...
What's even more disheartening is that the Biotext folks seem to have no awareness of Illustrata... I read one of the articles linked from the "about" page and scanned the other. There is this footnote:
Recently a commercial offering by a company called CSA Illustrata was brought to our attention; it claims to use figures and tables in search in some manner, but detailed information is not freely available.
A company called CSA, wtf? OK, for environmental science they're like the best A&I service (not to mention materials and aerospace, but that's not pertinent to this post). It's not like they're new or unheard of... it's also not like the folks at Berkeley couldn't learn more about it... maybe by, oh, I don't know, reading the white paper, going to a presentation, signing up for a free trial, or talking to their librarian? Instead they mostly talk about TREC stuff... great.
Oh, and I do love this bit:
"Recently, online full text of bioscience journal articles has become ubiquitous, eliminating one barrier. The intellectual property restriction is under attack, and we are optimistic that it will be nearly entirely diffused in a few years"
(I have the Beach Boys' "Wouldn't It Be Nice" going through my head)... Gee, I hope the actual life scientists appreciate that there are still a ton of things not available OA... like the Nature stuff, for example?
My thing is that I think this is really important, and we could get some good work done if we built on each other's work. You know, cumulating as if we're doing science instead of ignoring each other as if we're computer scientists (sorry, I don't really mean to offend my one CS major reader, hi John!, but really, they did just invent taxonomies, knowledge representation, and information retrieval, you know).
I'd also like to see this in materials science or mechanical or aerospace engineering. Wouldn't it be great to search the computational fluid dynamics or finite element images? Or the micrographs, or the pictures of failures...