ASIST2008: eResearch Crosses the Pond
NOTE: take this for what it is - stream of consciousness bits and pieces
eResearch Crosses the Pond
3:30, Sunday 10/26/08
Christine Borgman – UCLA
Jenny Fry – Loughborough University
Clifford Lynch – Coalition for Networked Information
Eric T. Meyer – Oxford Internet Institute, University of Oxford
Carole Palmer – UIUC
what does e-research mean?
all of the ICT resources, data, tools, digitized resources to enable distributed collaboration and research – the grid, e-social science, e-science, e-humanities, digitizing the humanities
example of bridging the pond – her book, written at Oxford
scholars' reflections on data – how do these vary between the US and UK?
- what are my data?
- with whom can I share?
- who is interested?
- release – who, authority, expectations…
- who owns?
5 different projects – collaborative research on data in cyberinfrastructure
CENS – biologists, robotics, engineers – huge gridded systems
who is the owner of the dataset? – haven't thought about it, don't know
- will release data only in specific states
- will release upon request
- will release to non-conflicting
- will release with embargo
data practices vary with discipline, country, funding source, specialty, individual, research methods, status of researcher, availability of repositories, and local policies – for cross-national work, where does the data reside?
Measuring the benefits of data sharing: the challenges
JF (and collaborators, not speaking) – larger project funded by JISC
- Lyon 2007 cost-benefit analysis of data curation and preservation
- Beagrie et al 2008 costs of data repositories are an order of magnitude greater than those for e-print repositories
- lags between cost and impact make measurement difficult
- need for discipline-sensitive methods
costs are well described, both direct and indirect, but benefits are somewhat more difficult to quantify
case studies: EBI and Qualidata …
UKDA – UK Data Archive – social sciences data… funding source requires offering data to … 30% are rejected due to privacy, etc.
EBI – exponential growth “somewhere between enormous and terrifying”
vs. Qualidata – 48 datasets accepted in 2007–08
real effort to make the business case, and many cases of funding pulled (AstroGrid), but usage estimates are still back-of-the-envelope calculations – they frequently don't have good data on use/usage
paradigm shift – change workflow based on easy access to these data centers
future work: help figure out what is needed to make a business case, do some of it centrally, and figure out how to do this in a discipline-sensitive way
evolving strategies for supporting e-research – cyberinfrastructure. the term e-research is shorthand for a number of phenomena – the systematic use of IT in the broadest sense to enhance scholarly practice, scholarly inquiry, scholarly communication… so this probably goes back to the 1960s, but we can really only pinpoint when government recognized and started to fund it – 1980s supercomputers and allocating time… the latest round: high-performance computing, collaboration environments, and an emphasis on data (sharing, preservation…) – this is the most distinctive new dimension…
UK ahead, US 2003 –
how funding works, US vs. UK… UK funding councils – organized in a disciplinary way, plus some important private councils… for IT there is JISC, funded by top-slicing the higher education budget
compare to the US with NSF, NIH, etc. – all balkanized by discipline and agency, with different views of how it should/could/will work; much more diverse
UK has more visibility of the national need for a national strategy for data preservation/curation (I might have gotten what he said wrong)
the US is very fragmented: biotech info here, planetary science info at NASA, but environmental data everywhere… is this a key part of the mission for research libraries? can it make individual universities more competitive in attracting scholars and winning grants? it's showing up in institutional leadership (top down vs. bottom up in the UK)
example: GeoVUE – borough planners – Google Earth – data from Ordnance Survey for Virtual London. in the US, federal government data can't be copyrighted; in the UK, government data is Crown Copyright – Ordnance Survey said no, won't license it for this. Guardian "Free Our Data" campaign… citizens paid for the data, why can't they use it?
example 2: World Wide Web of Humanities project, transatlantic digitization… their project to make the Internet Archive easier to use for research
- high-impact information
- variation in curation requirements across domains
- value of collections of collections
- data visualization moving research forward
- difficulty in getting protocol and instrumentation information
- hypothesis testing system – but used very rarely for that, used for other purposes
- less exploratory searching for high impact
perspectives of users vs. depositors for a multiscale neuroscience (mostly image) data repository
scientists' data workflows, and trying to develop system requirements for managing datasets in IRs – working with liaison librarians
profiling complexity and differences – the number of transformations applied to the data, and what to keep
study of data and archiving – 60% "archive data", 59% expect to keep data more than 10 years… few off-campus backups; issues with migration and preservation
alignment with work being done elsewhere (using same instrument?), also using what learned in cultural heritage collections – preparing for long term analytical potential
collections as more than a sum of their parts - “building contextual mass”
diminished intentionality – purpose of collections, relationships between…
Q: Diane Sonnenwald – long term: reuse over time, for new purposes, by people from different disciplines
CB: curatorial longevity – but how do you get at data from a different perspective than originally intended? this is an old retrieval problem
(just came from chemical info meeting) open ontologies – keep alive, keep categorizing
cross walks and gateways
we’re still understanding the problem
CP: we're very interested in this – how do you represent what might be done later? keep the door open… we really know little about retrieval of datasets
CL: it's important to underscore how interdisciplinary the reuse might be – like climate change and biodiversity, migrations of peoples, interdisciplinary syntheses building on newly available data – but these questions are quite hard – funding and describing and preserving… the people who created a dataset have fully mined it, so preserving it isn't their priority… they'd rather pay for building a new dataset… so this work is prospective and speculative
JF: this seems like a question that ETM is encountering with internet archive project
[I was actually thinking that – about pulling century-old data on when birds appear and flowers open to look at climate change]
ETM: yes…. what are the roles of information professionals? some of these people don’t realize there are specialists who look at these things and try to solve these problems in their own way
CB: we’ve dropped the ball on this – this is incredibly important and very few of us have any of this in our curricula… and it’s very domain specific… need domain specialists who can be data brokers
CP: panel on Wednesday on data curation education
CL: we push this too late in the education process… K-12 needs more emphasis on what data we need and how we marshal data to make an argument… operating in a data-rich environment and making sense of it. also the problem of documenting the context of the data enough for reuse
JF: domain scientists – example: linguists have been training students to deal with these large datasets (?)… she gets asked if the disciplines will continue to exist, but still call back to the disciplines to …?
CL: role of institutions vs role of funding agencies in preserving data (I think)
Q (missed name): mentioning NOAA and LTEM – there are these big repositories that have niches, but scientists outside of these areas don't know about the other areas – at Syracuse doing things like scientific information literacy
CL: certainly big pools – how can we knit these together to facilitate use across different sources? people stay in one pool once they learn to deal with it… also worried about the fragility of funding
CB: on fragility of funding, see the Lord & Macdonald report. lots of repositories, but many different levels of cataloging… looked at water data, for example, and came up with 10k different variables… lots of effort, but difficulty with communities
ETM: there's a public meme that all of this is easy – the ridiculous Chris Anderson article about the end of science [Wired's "The End of Theory"] – these are all things we need to address
CP: this is really hard but we need to start and go in with eyes open
Q Howard White (Drexel): big long-term effort for social science data archives, going back to 1974 – his dissertation was on the relationship of librarians to social science data archives – is that history being taken into account?
CL: at least some – ICPSR is used as a sustainability example… but those datasets are somewhat small, with complex codebooks; you can't just reach in and use that data… in universities there are things like social science data labs that have long dealt with this
Q Catherine Blake (UNC?) : quantifying benefits – how do you describe new types of research before you can even get access to the data sets?
JF: can't answer this with just one project. example: EBI integrating data from different investigations… they are working on figuring out a series of questions that might be asked – but some of this is normative… but don't know an alternative approach
CB: the claim around why data are important for cyberinfrastructure – if we have enough, then we can ask new questions (not just the same questions faster), so if that is possible the ROI is a different calculation… start with understanding and documenting for reuse now; understand that, then think about new uses.