Christina's LIS Rant
Friday, October 31, 2008
  ASIST2008: Final Sessions
This last batch of sessions was disappointing - in each session, some of the presenters didn't show up. I started in From open source to open libraries; I had already heard all but the first speaker, and his was the presentation I was looking forward to:

Knowledge Sharing and Management in Open Access e-Resources & Communities (KM, DL)

Thomas Krichel, LIU and RePEc
(slides at: http://openlib.org/home/krichel/presentations/columbus_2008-10-29.ppt)

archives with articles in one location but services can be widely used. RePEc is based on 900+ archives. 630k item dataset
… talking at the speed of light no way I can do notes…

some measures of users of services
some measures of de-robotified web impact through LogEc

author registration success as a measure of community impact
- matched registry and an independent list of top 1000 economists and 79% registered (some of these 1000 are dead, so never get to 100%)
- impact on fee
1) always have been free working papers
2) publication delay is 2-3 years if not 10 by going down journal ladder so useless for state of art

impact a lot
- more efficient for working paper
- preserved working paper culture
- better citation/use for working papers
- preservation for working papers

remaining nuisance – libraries: we’re supporting the toll-based journals, so as long as we do, they will continue to exist

keys to success
- need an extraordinary individual who can do a tremendous amount of work on “surplus” time
- small group of volunteers
- own server
- disseminate as widely as possible

[ditched that session and headed to the geographical information session - but only 2 out of 5 presenters showed, and I missed all but the last few words of Dr Buckland's, apparently to do with some aspect of http://ecai.org/ ...]

Hit just the very end of Buckland (Berkeley)
Finding and Providing Context Online
internet allows you to search ineffectively in multiple different vocabularies

[ditched next speaker because I think he decided that the way to fill 45 minutes with a 10 minute talk was to spend 5 minutes on each very obvious bullet point]

Ended up in something *completely* different: Waddling around the Digital Iceberg: Use of Virtual Spaces and Environments by children, preteens, and teens.

Apparently there were only 2 speakers of all of the speakers? BUT I got to hear all of Eric Meyers (the guy from UW not the one from OII) and all about club penguin and learning... really fit well with the second plenary, which I didn't blog but was fabulous. So it ended on a good note.


  ASIST2008: Trends of Schools and the Fields
Thematic Maps of 19 iSchools
Chaomei Chen, Drexel

Instead of starting with journals or with a topic, paper, or individual, he did an affiliation search for the iSchools. Not perfect, because of ISI's coverage but also because of school name changes and problems with how affiliations are reported/recorded

CiteSpace – available on his home page
- showing some pictures from previous work on terrorism to show the transition from one cluster to the next and the path and high betweenness node/pivot point.

(go back and look at google earth map on flu diffusion – he skipped)

Showed Drexel-
gatekeepers in purple
showed all 19 ischools, co-citation graph

finds a co-citation cluster, takes the articles citing that cluster, and does noun phrase (extraction? identification?--?) to name the cluster
(citespace has even more features than when I last used it, cool!)

Departmental Websites and Female Student Recruitment: What do IT program websites reveal about woman friendliness
Kristen Hanks (Doctoral Student in Social Informatics at Indiana University)
Part of or as a result of Gender and IT Education larger project.
gender and IT research has 3 phases
1) proving there is an issue
2) understanding the issue
3) coming up with ways to address and ways to test those steps

She’s looking at understanding the issue. Questions re subtle gender cues and differences between applied vs. CS/Computer Engineering. “Applied” includes informatics, information science, instructional technology, information systems

Web pages are significant in recruitment (Abrahamson)
Ease of use vs. perceived usefulness (I thought there were later studies than this?)
women are more sensitive to
- visual/non-verbal cues
- message claims
- manipulative intent

“comprehensive information processors” (Meyers-Levy & Maheswaran, 1991)

Content analysis of 104 pages from 16 departments
applied more women but cs more intimate and higher percentage than represented in enrollment
applied – more quotes from male
CS – more accomplishments, more female, used whole names


  ASIST2008: The Office
(uh, oops, couple more to post)

The Office
“a place of work used for non-manual work” – OED
introduce the office as a useful concept for information research
- history
- theory
- information seeking behavior
- classification
- computer desktop

office as power, as aesthetics, organizing metaphor

Jenna Hartel, Toronto
Theresa Dirndorfer Anderson, University of Technology Sydney
SooYoung Rieh, Wisconsin
William Jones, Washington
Barbara Kwasnik, Syracuse

Propst – “the action office”
Now cubicle culture

offices as “innovation junctions” – discrete innovations for info production, dissemination, and storage lurched information work forward, sociotechnological system

less well studied – most study in CSCW… also look at session on Materiality yesterday by Olaf Sundin looking at the impact of place

Hartel – her dissertation on cooks, their information stores were like offices …
Anderson - “office as a state of mind” or cloud
Kwasnik- effect of time on org of
Rieh – talk more, read more, think more, grade more, organize less – information seeking and use at home
Jones – office anywhere – computer desktop “what is the office in an era of nomadic computing”

TDA: “when is the office” aka: when two or more computers are gathered there must be wifi (and hopefully electrical outlets)
digital environments are
- multisided
- multilayered
- fluid
- ever shifting

-“perils of dichotomizing” human and machine (?)
Randall et al 2005 – sensitizing concepts in ethnographic research of work practice

when is infrastructure? only as a relational property, not necessarily as thing (Bateson 1978)
ethnography of infrastructure – when is the office?
infrastructure is part of human organization
infrastructure inversion (Bowker 1994) – foreground infrastructure
info systems as political creations
the same technology that fosters a sense of community can also restrict access (Weingarten and Overbey)
embedded background work with highly visible public performance (Star and Strauss)
information life cycle (Harper 2000) – use documents as artifacts, as the “red thread” to follow through a system

[my question is an office where you work alone or is it people together? or both]

BHK: Time and the office
office was the situation, but it was what made people decide to classify things one way or another
looked for “enduring reciprocities” – things that go together
looked at:
- situation and document attributes
- value of document
- cognitive state

lots of explicit mentions of time (some implicit)
- tense
- duration
- continuation
- frequency
- speed
- age

(this is from her dissertation research, so that’s available as are the articles that came from it)

SYR: Home office and influence on information seeking and use
Pew (Sept 2008) – 45% do at least some work from home, 56% of “networked workers” do, 20% do so constantly

mixed blessing – flexible hours, but some break from work is needed

her 2004 JASIST article – ethnographic study looking at people’s home computer environments
at home – different social roles, much broader range of topics, looking for more unfamiliar objects, more search engine use – while at work people may go to the same pages and do the same things
one person household – computer in center
family – not in family hanging-out space
computer designed for single users – as a work tool

not big blocks of time – shorter intervals between other activities, less intensive
“success” meant progress in 5 minutes, not completion of task in the short intervals
at home, there’s no one to help you with searches unlike at work where there is technical support and colleagues

finding information for other people, really wanted to discuss searches while they were doing them, because they were unable to replicate later or re-find information (didn’t really use bookmarks so much) – recording sharing and disseminating – also lack of coordination among family members (why did you close my window, I was doing something)

WJ: who needs an office?
do we need to work together to work together?
example of his book – illustrator and designer – working together never or almost never co-located
a lot of stuff a la Clark & Brennan

BK – is f2f being eroded by people checking e-mail during meetings? – feels like “molasses”: you ask a question and there's a delay in the answer – they recently had a faculty planning day and turned off wireless - for the most part it worked beautifully with very efficient discussion – until people sneaked a peek at their blackberries under the table

q from Dawn P-M regarding getting into other people’s stuff – see Emilee Rader’s work (Michigan). also see Sonja Talja – collaborative information retrieval chapter, too


Wednesday, October 29, 2008
  ASIST2008: Notes on Posters
I went to poster session 3 before it was “open” so I could hit the DRM session at the same time.
Poster notes:
K. McCain (Drexel) – tri-citation and PFnets to look at Eugene Garfield’s citation image (this way of looking at a pretty dense network and de-composing seems to work really well at showing the connections)

B. St Jean et al (Michigan) -- Institutional Repository, interesting big study, this poster was about the phone interview part.

A. Gruzd (UIUC) – using this text analytics tool to resolve referent name ambiguity in, for example, forum threads… (bob said that mary went to school – figuring out that bob is user BobR and mary is user QuiteContrary) He’s got this available on a web site: http://textanalytics.net/

Earlier poster sessions:
A couple interesting things on the information uses of museum artifacts by children as well as retrieval using an ontological map of cultural objects – they’re still trying to build a test collection from what I can tell

A lot on FRBR – like at which level will users accept a substitute (different manifestation? different expression?) and like Joe Hourcle’s work on scientific data – this seems really promising so I’d like to hear more feedback on it.

A few things on PIM – like college students and then also other users by Japzon of Drexel -- how many different places they keep their files and images – no explicit care to preserve for long term although consideration of reuse… trust in the servers/services that they use online to always keep the things.

Lots of things on tags, nothing terribly surprising, but work that needs to be done

The poster: Using Wikipedia To Make Web Pages More Readable – basically it looks up a term you highlight in Wikipedia – but uses the context of use to find only one article that is most relevant for quick information.

Very interesting work from some folks at U Washington on information design for homeless youth.

Somewhat disappointed in the information available on a study of data in STEM


Tuesday, October 28, 2008
  ASIST2008: Bibliometrics
Author Co-citation Analysis
See their more complete paper already available in early view at JASIST (DOI:10.1002/asi.20910 or maybe there’s another one?)

limitations – retrospective, and usually contributions only as first author, fixed when articles are published

extend bibliographic coupling from document to author, so can change over time if one or other of the authors is still publishing
- current trends
- evolution

author bibliographic coupling frequency: a measure of the relatedness of authors
- defined as the number of ref the two authors’ oeuvres share
- calculated as the overlap btwn the weighted ref sets of two oeuvres
- factor analysis as method of revealing underlying structure of interrelationships
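
A rough sketch of the coupling frequency as I noted it down, in Python. My assumptions, not the authors' definition: the "weight" of a reference is the number of the author's papers that cite it, and the "overlap" is the smaller of the two weights summed over shared references. The paper may well define the weighting differently. The author and reference names are made up.

```python
from collections import Counter

def reference_profile(papers):
    """Weighted reference set for one author's oeuvre:
    how many of the author's papers cite each reference."""
    profile = Counter()
    for refs in papers:
        profile.update(set(refs))  # count a reference once per paper
    return profile

def coupling_frequency(papers_a, papers_b):
    """Overlap of two weighted reference sets (assumed: sum of the
    smaller weight for every reference both oeuvres share)."""
    a = reference_profile(papers_a)
    b = reference_profile(papers_b)
    return sum((a & b).values())  # Counter & Counter keeps the minimum counts

# Hypothetical oeuvres: each inner list is one paper's reference list.
author_x = [["r1", "r2"], ["r1", "r3"]]
author_y = [["r1", "r4"], ["r3"]]

print(coupling_frequency(author_x, author_y))
```

Because the profiles are rebuilt from whatever the authors have published so far, the number changes as soon as either author publishes again, which is the point made above about being able to follow current trends.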

factors extracted by principal component analysis (PCA)
model fit in ABCA was a lot lower than ACA – in ABCA look at very broad range of papers and topics, finds those common with other authors

ABCA realistic view of state
ACA better view of external/internal influences

Reactive tendencies of Bibliometric Indicators
Frandsen and Nicolaisen
Use of alphabetization when listing author names in journal articles – suggest adding as a negative steering effect of bibliometric indicators
- reflexivity – author level – submit to high prestige journals; journal – attempts to inflate impact factors (doping);
steering effects (performativity of measures)
Glanzel – positive steering effects – motivate researchers to collaborate or publish
negative – exaggerated collaboration, inflation, salami slicing, citation cliques, self-citations, editors suggest citations from journal where submitted.
Weingart 2005
can depend on disciplines
- credit practices vary by and within disciplines
- value placed on being first author
- not explicit about ordering of authors – learned and just done automatically
Laband, 2002 – 89% of articles in econ are alpha

[at the sts global grad student conference, the keynote from the guy from Ecole des Mines de Paris spoke about a London School of Econ article on the performativity of research indicators… very interesting area I should definitely dig into more]
Info science dropping in alphabetized multiauthorships – statistically significant negative coefficients
Econ rising – statistically significant positive coefficients
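
To make the measure concrete for myself: a small Python sketch of the share of multi-author papers whose bylines are in alphabetical order. This is my own illustration, not the authors' code; the surnames are invented, and the chance baseline assumes distinct surnames in a random order (so an n-author byline is alphabetical with probability 1/n!).

```python
from math import factorial

def is_alphabetical(surnames):
    """True if the byline is already in alphabetical order (case-insensitive)."""
    keys = [s.casefold() for s in surnames]
    return keys == sorted(keys)

def alphabetized_share(papers):
    """Share of multi-author papers with an alphabetical byline."""
    multi = [p for p in papers if len(p) > 1]
    if not multi:
        return 0.0
    return sum(is_alphabetical(p) for p in multi) / len(multi)

def chance_share(papers):
    """Share expected if author order were random (assumes distinct surnames)."""
    multi = [p for p in papers if len(p) > 1]
    if not multi:
        return 0.0
    return sum(1 / factorial(len(p)) for p in multi) / len(multi)

# Hypothetical bylines; the single-author paper is ignored.
papers = [["Adams", "Baker"], ["Zhou", "Lee"], ["Chen", "Diaz", "Evans"], ["Solo"]]

print(alphabetized_share(papers), chance_share(papers))
```

Comparing the observed share with the chance share is one simple way to separate deliberate alphabetization from bylines that happen to fall in order.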

deviating from the norm of alpha in econ is a strong signal – hmm.
metrics relying on first author can distort, but we could give equal credit to all authors – but this would not be fair in information science

questions: MK – did you take out the number of times that the primary author has a name that is earlier in the alphabet
DW – look at corresponding author ? – good future thing
other – what about co-authors that rotate

[my question about future work – would be about in BMJ and other journals where they say exactly who did what]

Indicators of Structural Change and Interdisciplinarity: Dynamic animations of Journal Maps
Loet Leydesdorff
computer program available on his web site to solve some of these problems
he used this for journals but can also use for ACA

relationships among journals, individuals,…. all change at the same time, how can you control…?

static analysis – graph analysis vs factor analysis
dynamic problem – both factors and factor loadings of variables can change at the same time
system of partial differential equations

comparative stats based on journal maps
entropy stats > algorithmic solutions (but no visualizations, so hard to understand)

interdisciplinarity operationalized as betweenness centrality in the vector space
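
For reference, a minimal Python sketch of betweenness centrality (Brandes' algorithm) on a toy graph. This is a simplification on my part: it treats the map as an unweighted, undirected graph, whereas the talk computed betweenness in the cosine-normalized vector space, which I don't reproduce here. The journal names are placeholders.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph
    given as {node: set_of_neighbours}."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                     # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # every undirected pair was counted once from each endpoint
    return {v: c / 2 for v, c in score.items()}

# Hypothetical map: two tight clusters bridged by one journal.
journal_map = {
    "PsychA": {"PsychB", "CogSci"},
    "PsychB": {"PsychA", "CogSci"},
    "EduA": {"EduB", "CogSci"},
    "EduB": {"EduA", "CogSci"},
    "CogSci": {"PsychA", "PsychB", "EduA", "EduB"},
}

print(betweenness(journal_map))
```

The bridging journal is the only node that sits on shortest paths between the two clusters, so it is the only one with a non-zero score.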

2-d we can use Kruskal’s stress (MDS), Kamada & Kawai

majorant function (Gansner ...), dynamic stress minimization (Schank 2008) – like K&K but adds a stress function – so can do 3-d

instead of using the square co-occurrence matrix, he prefers to use the 2-mode author–doc matrix and then take the cosine
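
A small Python sketch of what "2-mode matrix, then cosine" amounts to, as I understand it. The matrix is invented: rows are authors, columns are documents, and a 1 means the author appears in that document. How the real matrix is built (and for journals rather than authors) is not covered by my notes.

```python
from math import sqrt

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cosine_matrix(rows):
    """Pairwise cosines between the rows of a 2-mode (row x column) matrix."""
    names = list(rows)
    return {(a, b): cosine(rows[a], rows[b]) for a in names for b in names}

# Hypothetical author x document occurrence matrix.
author_doc = {
    "author_a": [1, 1, 0, 0],
    "author_b": [1, 1, 1, 0],
    "author_c": [0, 0, 0, 1],
}

similarity = cosine_matrix(author_doc)
print(similarity[("author_a", "author_b")])
```

Working from the rectangular matrix keeps the full occurrence profile of each author, instead of collapsing everything to pairwise co-occurrence counts first.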

place of social networks journal wrt other fields – so show animation over multiple years
[ok, so this animation is way cool]

interestingly 2004 was sort of an accident, before and after, journal was in sociology or mathematical sociology

nanotechnology journal
early on mostly applied physics – some chemistry off and on
then tightly chem. and phys and bio shows up… interesting.

cog sci
in between psych and educational research, high betweenness centrality, excursions to other sciences like cs (comp linguistics)

- excursions from disciplinary basins of attraction
- no interdisciplinarity but bi-disciplinarity
- no interfaces btwn social and natural sciences stabilized – they come and go from year to year

[oh – this is Visone he’s talking about .. I downloaded that… but never really ran it through its paces]

question: about the stability issue – what can we attribute this to? maybe special issues – yes in part –
of course these were journals picked to not be stable.


  ASIST2008: Digital Rights Management or Digital Restrictions Management?
Digital Rights Management or Digital Restrictions Management? (Panel)
Tuesday, October 28, 2008

Kristin Eschenfelder (U Wisconsin-Madison)
Kevin L. Smith (Duke, http://library.duke.edu/blogs/scholcomm/)
Bill Burger (VP Marketing, Copyright Clearance Center)
John Sullivan (Free Software Foundation)

Rafal Kasprowski (moderator)

KE: Can I e-mail this? The use restrictions found in licensed digital resources
Setting the context in library land
assumption: libraries always opposed to DRM
concerned due to compatibility and interoperability – example proprietary readers that are downloaded
second major concern, digital preservation
third, patron dissatisfaction
fourth, ideological barrier – in opposition to idea of libraries to promote maximal access and use

but: in archives and special collections – lots of interest in using DRM
expand access, but control who sees – if culturally sensitive, or nesting locations of spotted owls, or copyright issues

what counts as DRM, anyway?
hard technology – strictly control or disallow direct or subsequent use actions (saving, printing)
soft technology – interface or server side configurations to discourage but there are browser or other workarounds (see her articles in College & Research Libraries)
policy – license terms internal policies, terms of use, law
cultural law – cultural views about who can access and use information

Soft TPM types
TPM – technological protection measures
TPM by extent of use – maybe server fraud something systems? batch downloading alerts and blocking
TPM by frustration – chunking – like NetLibrary where you can only see a page at a time and it is really inconvenient to print or download
TPM by obfuscation – interface does not advertise use functionality
TPM by omission – vendor or software interface doesn’t have the option, but you can do it through the browser or OS (like right click, or control p)
TPM by decomposition – like in health and medical resources, materials only available in html, so much harder to save, e-mail, and transfer.
TPM by threat – (like JSTOR), declarations in EULA or pop-up

KLS: Mitigating the effects of DMCA anti-circumvention rules
DMCA does purport to define DRM: “technological measures that effectively control access”

self-help: take measures to prevent or address or assert rights without resorting to court
so DMCA is self-help, but getting the court to enforce the self-help.

BB (Copyright Clearance Center): part of the international Reproduction Rights Organization – so they have bilateral agreements to license foreign titles and exchange royalties
btwn rights holders and the users of content
make it easy for users to do the right thing
CCC makes no use of DRM – their model is about spontaneous pay-per-use, and DRM frustrates their customers

had some positives; here are his negatives:
- huge hassle and inconvenience
- easily hacked
- punishes your best customers
- a crutch used to avoid adopting more progressive and adaptable business models
- anti-competitive
- slows technical innovation (example: DAT, digital audio tape)
- undermines your fair use rights

JS: Free Software Foundation
more like free speech rather than free beer (or as free-range would say, free kittens)
like GNU Linux
freedom to
- run the program
- study the program/ look at the software
- improve the program
- make and share copies and modified versions (derivative works)

free software != open source (open source movement isn’t so concerned with ethical principles of freedom)

examples: Apache, Handbrake (copying DVDs)
treat drm as an ethical and social problem rather than just as unfortunate business decision
their campaign is DefectiveByDesign.org, FSF and Civic Actions, launched in May 2006 – they call it digital restrictions management; targeted Apple, Microsoft, Amazon, Sony, Netflix, Warner, EMI, Boston Public Library (?)

don’t group things as IP – not physical property – talk about copyright, DMCA – these are different things and the laws were written with different purposes than laws for physical property

these drm systems force users to install proprietary software that phones home to remote server to check rights – so this is spyware and poses security issues.

Apple’s iPhone, and perhaps their view for the computer later, is to vet and only allow programs they approve, covered by their DRM – so they blocked apps to use the iPhone as a wireless hotspot or to use VoIP instead of cellphone – walled garden where they completely control what is on your device, and this can include no free software

the issue with DRM over public domain works – example of the constitution on the kindle – can’t share it… and pdf of Alice in Wonderland that can’t be read aloud.

Crap – the session just got cut off – over time… crap, crap crap.. no questions. speakers just completely ran out.

They should never, never, never allow more than 3 speakers for a single session


Monday, October 27, 2008
  ASIST2008: Tagging as a Communication Device
Tagging as a Communication Device: does every tag cloud have a silver lining
[v.v. incomplete notes; take at face value and don't attribute malice to poor typing!]

Heather D Pfeiffer (New Mexico State U)
Emma Tonkin (UKOLN, U Bath)
Mark R. Lindner (UIUC)
Margaret E. I. Kipp (LIU)
David R Millen (IBM TJ Watson Research, Cambridge, MA)

Tagging as metadata: ontological architecture of tags
Knowledge in language-
knowledge is communicated through syntax (symbols), semantics (meaning), and pragmatics (context of usage)

terms are just at the surface, meaning comes from relationships
so can see tags as just syntactic terms that we need to apply semantic information to.

asks – do we have a change in language since 1600 (um, yeah, how is that controversial?? what? I either didn’t get her talk or I don’t see why it’s novel)

Ten minutes of language development
(funny example from “true names” map)
essentialist idea – id concepts, label concepts, id relationships – building a strawman, this is easy
- assumes perfect accuracy in id and labeling
- …
place vs. space, somewhere or with meaning
position vs. location (where you are, where you think you are), physical context, context awareness
concepts – positions, labels – names for positions, agents negotiate labels for shared concepts
sharing – joint or shared attention to a concept
Grounded naming game (Steels, Vogt)
joint attention is co-location
with disagreement – resolve who feels stronger, probabilistic decay; voting/majority wins;
these require that you have gone there and discussed it.
can you describe this via a transition matrix – what is the prob that we come to consensus,
is perfect accuracy (she prob means agreement) possible, or desirable?
mechanism to handle change – necessary
if nothing ever changes, then nothing interesting happens – too active and you lose all consistency – need something just right
- some level of variation
- looking at measuring degrees of similarity, measures of variation useful for mapping (example how different is ASIST language from ACM from IEEE language?)… apply this to tagging..
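
The transition-matrix question above can be made concrete with a toy model. This is my own sketch, not the grounded naming game from the talk: I assume n agents, two competing labels A and B, and at each step a randomly chosen hearer copies the label of a randomly chosen other agent. The state is the number of agents using A, and states 0 and n are consensus.

```python
def transition_matrix(n):
    """Markov chain over i = number of agents using label A (0..n), n >= 2.
    Assumed rule: a random hearer copies a random other agent's label."""
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        # probability that hearer and speaker hold different labels,
        # in one particular direction (A copies B, or B copies A)
        switch = (i / n) * ((n - i) / (n - 1))
        if i < n:
            P[i][i + 1] = switch      # a B-user adopts A
        if i > 0:
            P[i][i - 1] = switch      # an A-user adopts B
        P[i][i] = 1.0 - 2 * switch    # nothing changes
    return P

def evolve(P, start, steps):
    """Distribution over states after `steps` rounds, starting in state `start`."""
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(len(P)))
                for j in range(len(P))]
    return dist

# Five agents, two of whom start out using label A.
dist = evolve(transition_matrix(5), start=2, steps=2000)
print("P(everyone ends on A) ~", round(dist[5], 6))
print("P(everyone ends on B) ~", round(dist[0], 6))
```

In this toy version the group always drifts into consensus eventually, and the chance that a label wins is just its starting share, so "who feels stronger" or majority rules would show up as changes to the off-diagonal entries of the matrix.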

(her stuff is always excellent and always too brief!)

Patterns of Collaborative tagging in a large organization
del.icio.us – success 2003
corporate versions 2005
- dog ear (his and others) – to market via Lotus
- onomi (2007 – Mitre)

behind the firewall
- can link up to enterprise search

use – goals: find, refind, explore
community browsing
personal search
explicit search

graph # action events vs. percent click through (some evidence of perceived potential utility)
“my tags” – lots of click throughs

examination of groups within IBM
- they compared the use of different groups – which were more similar than different
- unique posters 50% - use as a return on system – contributors/users is that the same across groups
- tags per bookmark- lots more tags for intranet vs internet -- why? they think possibly more heterogeneity within intranet so more things are needed to disambiguate [the explanation I would float would be that a lot of content providers who would use more tags to make sure their stuff is found]
- few, but stable number private tags
- fewer tags for private information

social tagging roles: publishers, evangelists, leaders
interviews with 33 taggers
tags more broadly – in corporate directory, blogs, wikis
- community building
- community seeker
- evangelist – raise visibility of something – to get people who sub to tag to notice your new content, or to be known for a topic
- publisher – to drive traffic to a resource – part of the day job
- small team leader – sharing resources amongst team, tags used by convention, team leaders less active taggers

tag use - intentional and not
tag similarity in use within enterprise – tags you would expect to cluster together, don’t
e-mailed people who used these tags – to ask why word a and not similar word b
- you’re right – it’s arbitrary
- no – heteronyms – they’re different
- composition/decomposition
- preferred usage, a standard in some sense
- to be found by desired audience

list of his related work
“social snippets in expertise search” (Shami, Ehrlich, and Millen 2008)

ML (hope he puts notes up, really cool)
Integrating tagging: tagging as integration
in tagging research – very little explicit discussion of view of linguistics
tend to use segregational accounts using classical views

signs product of communication process not vice versa

time key factor - communication integrates past, present, future
constraints – biomechanical (psychological, physiological ), macrosocial, circumstantial (context, activities involved)
coordinating these activities and integrating these activities make communication possible, failures can happen in each
macrosocial – proficiency, practice, conformity (unconscious alignment) – community

Sen et al – tag classes - factual, subjective, personal – movielens – integrated
Kipp – Bopp and Starr – basic function – authors vs. user vs. indexers – assign different terms (he says they are integrating different functions even if same purpose)
context – there is always a task, and time is always an issue, even if not specific enough to warrant a tag
PIM research can also be restated in contextual terms

tagging as integration – individual
communicating with myself in the present, taking into account how I have done so in the past, with the expectation of integrating it for future use

community- more macrosocial constraints
individuals in community will be more or less proficient – will conform more or less

Communication in Tagging, Collaborative Classification Practices in Social Bookmarking Tools

what constitutes a synonym isn’t the same for all people – some might be ok with collapsing down while others see important differences btwn, for example, cinema and movies
placing a document in context – but not the same context for author and user.

Tagging as PIM
Tagging as communication – not subject tags in particular, like funny, to read… emotional reaction, reviewing function
Corrado and Moulaison ISKO 2008 - difference when tagging for community and tagging for themselves – no real difference

del.icio.us – majority are subject related but
time and task

communication – disagreement in aboutness – can this be more of a discussion of aboutness

in some ways, non-subject tags show more commitment [wonder if she’s looked at the one where you can check read or not]

tags are really time sensitive – engagement with resources


Sunday, October 26, 2008
  ASIST2008: eResearch Crosses the Pond
NOTE: take this for what it is - stream of consciousness bits and pieces

eResearch Crosses the Pond
3:30, Sunday 10/26/08

Christine Borgman - UCLA

Jenny Fry – Loughborough U

Clifford Lynch- Coalition for Networked Information

Eric T. Meyer – Oxford Internet Institute, Univ Oxford

Carole Palmer – UIUC

Intro, ETM:
what does e-research mean?
all of the ICT resources, data, tools, digitized resources to enable distributed collaboration and research – the grid, esocial science, escience, ehumanities, digitizing humanities

example, bridging the pond – her book, written in Oxford
scholar’s reflections on data – how do these vary btwn US and UK?
- what are my data?
- with whom can I share?
- who is interested?
- release – who, authority, expectations…
- who owns?

5 different projects – collaborative research on data in cyberinfrastructure
CENS – biologists, robotics, engineers – huge gridded systems
who is the owner of the dataset – haven’t thought about it, don’t know
will release data only in specific states
will release upon request
will release to non-conflicting
will release with embargo

data practices vary with disciplines and countries and funding source, specialty, individual, research methods, status of researcher, availability of repositories, local policies – cross-national, where does the data reside?

Measuring the benefits of data sharing: the challenges
JF (and collaborators, not speaking) – larger project funded by JISC
- Lyon 2007 cost-benefit analysis of data curation and preservation
- Beagrie et al 2008 costs of data repositories are an order of magnitude greater than those for e-print repositories
- lags cost to impact make it difficult
- disciplinary-sensitive methods
costs are well described, both direct and indirect, but benefits are somewhat more difficult
case studies ebi and qualidata …
ukda – uk data archive – social sciences data… funding source requires offering data to … 30% are rejected due to privacy, etc.
EBI – exponential growth “somewhere between enormous and terrifying”
vs qualidata – 48 datasets accepted in 2007-8
real effort for business case, many cases of funding pulled (astrogrid), but still back of the envelope calculations for usage - they frequently don’t have good data on use/usage
paradigm shift – change workflow based on easy access to these data centers
future work: help figure out what is needed to make a business case, and do some centrally, figure out how to do this in a disciplinary sensitive way

evolving strategies for supporting eresearch – cyberinfrastructure. this term e-research shorthand for a number of phenomena – systematic use of IT in broadest sense to enhance scholarly practice, scholarly inquiry, scholarly communication… so this probably goes back to the 1960s but can really only pinpoint when gov’t recognized and started to fund – 1980s super computers and allocating time… latest round high performance computing and collaboration environments and emphasis on data (sharing, preservation…)…this is the most unique new dimension…
UK ahead, US 2003 –
how funding works US vs UK.. UK funding/funded? councils – in a disciplinary way, and some important private councils… in IT there is JISC funded by top slicing the higher education budget
compare to US with NSF, NIH, etc., all balkanized by discipline and agency and different views of how it should/could/will work. much more diverse
UK: more visibility of national need for national strategy for data preservation/curation (I might have gotten what he said wrong)
US is very fragmented: biotech info, planetary science info at NASA, but environmental data everywhere… is this a key part of the mission for research libraries? can this make individual universities more competitive in attracting scholars, winning grants… showing up in institutional leadership (top down vs. bottom up in UK)

example GeoVue – borough planners – Google Earth – data from Ordnance Survey for virtual London. In the US gov’t data can’t be owned; in the UK gov’t data is Crown Copyright - Ordnance Survey said no, won’t license for this. Guardian – “free our data” campaign… citizens paid for the data, why can’t they use it?
example 2 WWW of Humanities Project, transatlantic digitization… their project to make internet archive easier to use for research,

high impact information
variation in curation requirements across domains
value of collections of collections

data visualization moving research forward
difficulty in getting protocol and instrumentation information
hypothesis testing system – but used very rarely for that, used for other purposes
less exploratory searching for high impact

perspectives of users vs depositors for a multiscale neuroscience mostly image data repository
scientists data workflows and trying to develop system requirement for managing data sets in IRs – working with liaison librarians

profiling complexity and differences – the number of transformations to data to make it what to keep

study of data and archiving – 60% “archive data”, 59% expect to keep more than 10 years.. few off-campus backups, issues with migration and preservation

alignment with work being done elsewhere (using same instrument?), also using what learned in cultural heritage collections – preparing for long term analytical potential
collections as more than a sum of their parts - “building contextual mass”
flat representation
diminished intentionality – purpose of collections, relationships btwn

q: Diane Sonnenwald – long term - over time for new purposes from people from different disciplines
CB: curational longevity, but how do you get at data from a different perspective than originally intended – this is an old retrieval problem
(just came from chemical info meeting) open ontologies – keep alive, keep categorizing
cross walks and gateways
we’re still understanding the problem
CP: we’re very interested in this – how do you represent what might be done later? keep door open.. we really know little about retrieval of datasets
CL: it’s impt to underscore how interdisciplinary the reuse might be -- like climate change and biodiversity, migrations of peoples, interdisciplinary syntheses building on availability of data that are new – but these questions are quite hard – funding and describing and preserving… people who created dataset have fully mined it so it isn’t their priority to preserve it.. they’d rather pay for building a new dataset… so this work is prospective and speculative
JF: this seems like a question that ETM is encountering with internet archive project
[I was actually thinking that – about pulling century-old data on the dates birds are sighted and flowers open to look at climate change]
ETM: yes…. what are the roles of information professionals? some of these people don’t realize there are specialists who look at these things and try to solve these problems in their own way
CB: we’ve dropped the ball on this – this is incredibly important and very few of us have any of this in our curricula… and it’s very domain specific… need domain specialists who can be data brokers
CP: panel on Wednesday on data curation education
CL: pushing this too late in the education process… k-12 more emphasis on what data do we need and how do we marshal data to make an argument… operating in a data rich environment and making sense of it. problem of documenting the context of the data enough for reuse
JF: domain scientists – example linguists have been training students to deal with these large datasets (?)… she gets asked if the disciplines will continue to exist but still call back to the disciplines to …?
CL: role of institutions vs role of funding agencies in preserving data (I think)

Q:(missed name): mentioning NOAA and LTEM – so there are these big repositories that have niches but scientists outside of these areas don’t know about other areas – at Syracuse doing things like scientific information literacy
CL: certainly big pools – how can we knit these together to facilitate use across different sources… people stay in one pool once they learn to deal with it… also worried about fragility of funding
CB: fragility of funding see Lord & McDonald report. Lots of repositories but many different levels of cataloging… see for example water data and came up with 10k different variables... lots of effort, but difficulty with communities
ETM: public meme that all of this is easy – ridiculous Chris Anderson article about the end of science – these are all things we need to address
CP: this is really hard but we need to start and go in with eyes open
Q Howard White (Drexel): big long term effort for social science data archives – 1974 -his dissertation was on the relationship of librarians to social science data archives – is that history being taken into account?
CB: yes
CL: at least some – for sustainability example of ICPSR is used… but those datasets are somewhat small, with complex codebooks, can’t just reach in there and use that data… in universities there are these things like social science data labs who have long dealt with this
Q Catherine Blake (UNC?) : quantifying benefits – how do you describe new types of research before you can even get access to the data sets?
JF: can’t answer this with just one project, example : ebi integrating data from different investigations… they are working on figuring out a series of questions that might be asked – but some of this is normative … but don’t know an alternative approach
CB: claim around why data are impt for cyber infrastructure – if we have enough, then we can ask new questions (not just faster) so if that is possible then the ROI is a different calculation… start with understanding documenting for reuse now, understand that then think about new uses.


  ASIST2008: Opening Session
1pm, Sunday, October 26, 2008
Genevieve Bell
Intel Corporation (October 2008)

(introduced by Brenda Dervin, responses by Howard Rheingold and Andrew Keen)

Technology and social impacts. Relationship of people with information – people society structure and meaning – transformation…

Internet – embodies the place where it was invented… ways technology got created encoded values but not everyone’s value. notions of participation, structure of information.

“internet goes feral” – moves from PCs to other devices, the technology and what it enables/prevents… things are moving forward and backward – immersive experiences on PCs, but more simple factual (my word) searches on mobile devices

internet > us & uk, pc > tv, mostly women 25-45, not the typical “early adopter” of technology

users may use the internet and take advantage of quick access to information without ever touching a computer, having power, or being literate

new internet users – US will forever not be the majority. Not Anglophone, not understanding Western metaphors, languages,… more cross cultural communication… example, the way Chinese users publish and escape censors is to find another word that sounds similar, and use the image for that… v. difficult to search for.

Different system of knowledge/information – knowledge through piety? through work? through study? through experience?

infrastructure – fat pipe – speed down vs. speed up… if equal, greater civic participation, if unequal more about consumption…. what about countries where you have to pay for speed and quantity to download, or capped internet (Australia, UK)… irony that you don’t know what you’re going to get. “killer app” BBC backfile slowed internet in the UK 30-40%, very compelling everyone downloading video. In Africa – internet you visit, not something you bring into your home – other things in internet café, social experience…. everyone doesn’t have the same kind of internet

regulating the internet: Indonesia – very well wired, effort to bridge digital divide, more mosques than telephones, so decision to use mosques (program never thrived)… we think of libraries for internet access, but in other cultures, where?

Who are stakeholders? in New Zealand, spectrum allocation and aboriginal peoples; religious leaders in Cairo… linking internet/technology usage to good citizenship, modernity by leaders in Singapore, South Korea.

Gov’t control/filtering of internet. We imagine the internet flows freely, but there is more control rather than less as we go.

socio-technical concerns.
old: privacy, trust, security, risk
new: reliability, access, reputation, participation, authenticity, authorship, ownership, surveillance, cultural health (digital literacy, dumbing down, distinctiveness)

there is no fixed notion of “the web”
new challenges and questions
- make sense of users, non-users, former users (not due to cost)
- disconnection & switching-off
- recognize emergent socio-technical concerns and alternate knowledge practices.

H.R. – response
liberty is a problem we don’t see – some people don’t miss it.
knowledge of cultural values
important impact of internet – whatever it is – is lowering threshold to collective action – doesn’t see a change in this
find an answer to any question by posing it correctly, and how can you trust it – reputation is more important
networked society vs. internet society – networked society has been around forever and there have been barriers and pipes for information

A.K. – response
why would people buy more chips? why would Intel win based on what she’s learned? apparently not bcs. lady in Indonesia has outsourced internet
given her “lack of conclusions” and that she is a “senior insider” where is this going? where is technology
don’t think she told the whole story – only the good looking half, he’d like to talk about the other half.
internet as philosophy, internet as ideology, internet as theory
From Counterculture to Cyberculture by Fred Turner on ideology of internet…
internet driven by counter culture group, it is a series of ideological statements of technology liberating people, idea of free market..
this is a challenge to hierarchy, religion, authority
the internet is not the real world – it is an idealized version – grants complete freedom, does away with barriers… the real world is increasingly ugly… world of profound inequality and injustice,… crux of digital revolution is teaching us an idealized way of operating in “a” world, but we have the reverse, the real world…
all of GB’s challenges go under the heading “individualism”
he’s worried about digital fascism – we’re in a time like the 1920s when people reacted against industrialism which changed society and community (I think he said) – he’s not worried about us, the professionals and insiders, but the 2 billion new users in the next x years – their disappointment with the lack of the internet world to connect with their world.

GB – response to response from AK
always hears “that’s nice sweetie, but what do you do at Intel?”
she thinks when next 2B people encounter the internet, unlikely that the ideology of the internet is sustainable… places where next 2B people come from don’t have the fetish of the individual, more collectivist cultures, and reputation of village or family or lineage. There may be the types of conflicts you imagine but different resolutions.

terms of the origin of the internet, won’t necessarily be the terms under which it carries forward.

- we’ve made the internet seem special in this talk, but it is another instantiation of technologies making changes in society

HR – response to both
watch deterministic language, need discussions of human agency… it’s not about the technology of the printing press, more about whole set of practices about teaching reading… what about appropriation of technologies for different uses…

Audience questions –
Dan Russell from Google – how people conceive of the internet (he thinks about how people come up with questions for google)… how do people think about “the internet”?
GB – technologies were complete symbol sets, you can’t be a person without x, internet became like other objects (example of tradition of burning paper objects for ancestors in after life), Indonesia, mobile phones = modernity, symbolic resonance – a story you want to tell about a better you. also about infrastructure, instrumentalist, … kiosk – people come up and can do one of 5 things, queries sit on the kiosk til it’s picked up carted off to connection, downloaded answers, and then brought back with answers… this version of immediacy is different. imagined as repository like library, temple, or my grandfather’s shed, you would find things eventually but not immediately or all at once

Jenna Hartel from U Toronto – internet center of your research – but the world is bigger than the internet, so it is dangerous to think of the internet as the center of human experience…
GB – internet might be the least important thing, fetish made of new technologies, interested in the persistences as well as the changes

AK – internet is an old-fashioned term, need something else to describe this ubiquitous media
students are relying more and more solely on the internet
GB – no it’s not.. many places around the world, many learning trajectories where there’s no internet at all – it won’t always be, if you want to learn the answer go to the internet… in fact, most of the places she’s been, you don’t go to the internet for answers you go to the wise person
HR -- what is it that makes us human, we use symbolic communication to organize collective action (paraphrase)… symbolic communication is easily reprogrammed…. changing the way we transmit these symbols…

Gary Marchionini, UNC – notions of identity – personal identity – digital diaphragms/condoms/ sanctuaries on the internet… are there things happening now that can help us take control of our identities
HR – the degree to which we can’t control how our identities are on the internet
GB – we hold multiple e-mail addresses and we give them to different people depending on use… other places one e-mail for the household… we think of all of these accounts and things being about us individually, we present a particular identity, but there are multiple offline identities… we’ve always maintained multiple fragmented selves, and technology has mapped on to this
AK – not convincing that these traditions are represented in this new technology, it’s different and revolutionary


Wednesday, October 22, 2008
  Norms, Counter-Norms, Ideology?
Dr Free-Ride, in basic concepts posts (1, 2, 3), describes the Mertonian scientific norms as well as anti-norms found in empirical studies by Mitroff.

I have a bunch of Merton and one Mitroff[*] on my comps readings and one of my readers (thanks, Ken) suggested Mulkay[**] - which actually links these two and then goes further - maybe into a bit of controversy. Definitely worth talking about.

Mertonian norms - from RK Merton[***] - are communism (communalism), universalism, disinterestedness, and organized skepticism.
Often mentioned with these are rationality, humility, and emotional neutrality [**].

Mitroff spent a bunch of time talking to and studying the Apollo scientists and showed with evidence in the form of quotes and well-crafted explanation that indeed the scientists sometimes report the scientific norms, but in everyday work, practice counter-norms. Now these counter-norms are not evil, but necessary. Mulkay provides several examples.
So there are norms and counter norms both there at the same time in science. Mulkay asks, "To what extent are these norms institutionalized?" Institutionalized means that they are positively connected to the reward system of science. Well, the reward system of science revolves around formal scholarly communication - publish peer-reviewed journal articles, get promoted, grad students, a better lab... But, for the most part, Mertonian norms don't come into play when selecting, reading, or citing an article. Most of the personality is taken out of them. They do play in getting to the position in which you can do the science (disadvantages due to race, gender, class, prestige of the institution are examples Mulkay gives).

According to Mulkay, the difference can be attributed in part to how the data was gathered for the two views. Merton and others studied the work of and talked to "great scientists" - these are the people who have the most invested in making their lab look good. Mitroff talked to and watched practicing scientists.

Here comes the somewhat controversial part. Mulkay says that scientists have, over history, intentionally provided the incomplete and misleading view of science to further their own interests - they use a "vocabulary of justification" to promote what becomes an ideology.

He gets to this point by looking at a bit of history. Post-Civil War (US) scientists started to argue for the autonomy and independence of science. Scientists say: Science is self-regulating because we have these norms and all, non-scientists can't really understand and judge value. By the first world war, there was little popularization, and scientists were quite isolated. Post war, there was a need to get more support for science - and funding - from the government so there was a move to paint science as the "source of national progress" (isn't it?). The view of scientists that was advertised was as virtuous and all good things like humble, patient, altruistic...

So he makes the point that there are these "vocabularies of justification which are used to evaluate, justify, and describe the professional actions of scientists, but which are not institutionalized within the scientific community such that conformity is maintained" and that scientists use this "occupational ideology" to maintain some autonomy and freedom from governmental control.

I think ideology might go a bit far - I think scientists are brought up with the Mertonian norms but are pretty pragmatic when work needs to be done. No doubt modeled on their advisor and shaped by bad experiences, they do keep certain secrets, and save time (rightfully so) by judging work on the impact factor of the journal, prestige of the affiliation, or past history of the researcher. Dr. F-R talks a bit more about malfeasance or intentional bad things - but there are a lot of things which don't fit into the Mertonian norms but are just fine within the actual social norms of the local social circle of scientists.


[*] Mitroff, I. I. (1974). Norms and counter-norms in a select group of the Apollo moon scientists: a case study of the ambivalence of scientists. American Sociological Review, 39(4), 579-595.
[**] Mulkay, M. J. (1976). Norms and ideology in science. Social Science Information, 15(4-5), 637-656. DOI:10.1177/053901847601500406
[***] Merton, R. K. (1973). The normative structure of science. In N. W. Storer (Ed.), The sociology of science: theoretical and empirical investigations (pp. 267-278). Chicago: University of Chicago Press. (Original work published 1942).

Update: comps tag added


Saturday, October 11, 2008
  Community detection in co-authorship networks
Wow - this paper uses a lot of the same techniques (and the same algorithm) I use in my IEEE eScience conference paper (details and pre-print to follow, probably not 'til the beginning of November though).

Rodriguez, M. A., & Pepe, A. (2008). On the relationship between the structural and socioacademic communities of a coauthorship network. Journal of Informetrics, 2(3), 195-201. DOI:10.1016/j.joi.2008.04.002

They are looking at a large multi-institution, multi-disciplinary NSF research center. They want to know if the communities - areas that are more connected to each other than to the rest of the network - detected in the co-authorship network using several standard algorithms correspond to any of these characteristics of the authors
I would like to have seen some other characteristics - but this is what was available (actually, come to think of it, how did they know country of origin? not stated - ew... seems problematic).
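
To make "more connected to each other than to the rest of the network" a bit more concrete, here is a minimal sketch of modularity, one common score for how well a partition captures that idea. This is an illustration only: I am not claiming it is the measure or algorithm the paper used, and the toy graph, the author names, and the function name `modularity` are all made up for the example.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (L_c / m) - (d_c / (2 * m)) ** 2, where
    m is the number of edges, L_c the number of edges inside c, and d_c
    the summed degree of the nodes in c.
    """
    m = len(edges)
    internal = {}      # community -> count of edges with both ends inside it
    total_degree = {}  # community -> sum of node degrees
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in total_degree.items()
    )


# Hypothetical co-authorship graph: two tight trios joined by one paper.
edges = [
    ("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
    ("dan", "eve"), ("eve", "fay"), ("dan", "fay"),
    ("cat", "dan"),
]
split = {"ann": 0, "bob": 0, "cat": 0, "dan": 1, "eve": 1, "fay": 1}
lumped = {name: 0 for name in split}

q_split = modularity(edges, split)    # the two trios as separate communities
q_lumped = modularity(edges, lumped)  # everyone in one community
```

Splitting the toy graph into its two trios scores higher than lumping everyone together, which is the intuition behind calling each trio a community.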

So then they did a contingency table and found the chi-squared. Turns out that department and affiliation are the only statistically significant characteristics -- that seems pretty obvious. I'm sort of glad the country of origin isn't. I think the characteristics seem a bit weak, but I like the general idea of the article. I'd like to see more things like gender, and a more granular representation of their discipline (so biology isn't enough, but what type of biology or maybe what lab or research group, too).
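
For anyone who wants to see what the contingency table step looks like, here is a minimal sketch of the Pearson chi-squared statistic in plain Python. The counts are invented (rows are detected communities, columns are departments), not the paper's data, and `chi_squared` is just my own helper name.

```python
def chi_squared(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    Expected counts assume rows and columns are independent:
    expected[i][j] = row_total[i] * col_total[j] / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts of authors: rows = communities, columns = departments.
counts = [
    [12, 1, 2],
    [2, 10, 1],
    [1, 2, 9],
]
statistic, dof = chi_squared(counts)
```

The statistic is then compared with the chi-squared critical value for that many degrees of freedom (about 9.49 for 4 degrees of freedom at the 0.05 level); a larger statistic suggests community membership and the characteristic are not independent.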

