ASIST: Managing and Disseminating Scientific Data and Information
Managing and Disseminating Scientific Data and Information: A Technical Discussion
Tuesday, November 2, 2005
Brad Hemminger
Astronomy data growth is huge. Each image is 16k x 16k pixels, adding up to about 1 terabyte of information a night. Lots of observatories and facilities to share the data. Likewise with huge datasets in genetics and biomed.
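A rough back-of-the-envelope check on those figures, as a sketch only. The talk did not give a pixel depth, so the 2 bytes per pixel below is an assumption, as is reading "16k" as 16,384 and "1 terabyte" as 10^12 bytes.

```python
# Assumptions (not from the talk): "16k" = 16,384 pixels per side,
# 2 bytes (16 bits) per pixel, and 1 terabyte = 10**12 bytes.
SIDE_PIXELS = 16 * 1024
BYTES_PER_PIXEL = 2
NIGHTLY_BYTES = 10**12


def image_bytes(side=SIDE_PIXELS, bytes_per_pixel=BYTES_PER_PIXEL):
    """Uncompressed size of one square image, in bytes."""
    return side * side * bytes_per_pixel


def images_per_night(nightly_bytes=NIGHTLY_BYTES):
    """How many whole images fit in one night's data volume."""
    return nightly_bytes // image_bytes()


print(image_bytes())       # 536870912 bytes, i.e. 512 MiB per image
print(images_per_night())  # 1862 whole images in 1 TB
```

Under those assumptions, a terabyte a night works out to a bit under two thousand full-size images.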
Digital data collections are a catalyst for the democratization of research and education
Archive and integrate or
- Lose collections over time
- Can’t build on past work
Types of challenges
Technical (easier)
Knowledge - sharing (different languages) and storage
Critical steps
Overall semantically interoperable framework. Standards for communicating knowledge. Common public repositories (like GenBank)
More challenges
- Change the behavior of researchers (!?!)
- Mandating by funding agency
- Requirement for publication
- Local education
- Privacy and security (IRB)
- Indexing, data mining
Jon Jablonski, GIS Data Management
People expect Google Earth-type CSI magic from GIS information work. There’s an active user community building apps on Google Earth. In reality, using the data is much more complicated.
GIS vector vs. raster.
Managed by many different agencies: all levels of government, data librarians, map librarians. (a guy in a closet with a copy of ArcMap)
Data management: 1 PC, filing cabinets, file based repositories, geo-databases, clearing houses
CDs are cataloged as edited anthologies w/out a table of contents note. The wetlands layer is one of 55 on the CD.
Repositories – sometimes brief, irregular, inconsistent metadata (a counterexample is CUGIR, which is well organized)
Vs. Databases – tables with all of the various features, you can clip out the part you want
Vs. Clearinghouses (like geodata.gov) – you wind up only being able to see a screenshot (instead of live data) or you have to pay for the data
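To make the "clip out the part you want" idea concrete, here is a minimal sketch of selecting features from a table by bounding box and layer. The feature rows, field names, and the `clip` helper are all hypothetical illustrations, not anything shown in the talk, and a real geo-database would do this with spatial queries rather than a list filter.

```python
# Hypothetical feature table: each row is one point feature tagged with a layer.
FEATURES = [
    {"id": 1, "layer": "wetlands", "lon": -78.9, "lat": 35.9},
    {"id": 2, "layer": "wetlands", "lon": -80.8, "lat": 35.2},
    {"id": 3, "layer": "roads", "lon": -78.6, "lat": 35.8},
]


def clip(features, min_lon, min_lat, max_lon, max_lat, layer=None):
    """Return features inside the bounding box, optionally limited to one layer."""
    return [
        f for f in features
        if min_lon <= f["lon"] <= max_lon
        and min_lat <= f["lat"] <= max_lat
        and (layer is None or f["layer"] == layer)
    ]


selected = clip(FEATURES, -79.5, 35.5, -78.0, 36.5, layer="wetlands")
print([f["id"] for f in selected])  # [1]
```

The contrast with the CD case is that the wetlands layer can be pulled out on its own, and only for the area of interest.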
Needs
- Access control
- Layer-level description
- Hierarchical browsing access
- Scalable for aerial photography
- Google-able
Wait for Google Earth hackers to fix the problem?
Bonnie Carroll, National Science Board Study Starts The Ball Rolling on Effective Management of Scientific Data in the US: A Policy and Practical Report
Long-lived data report (long-lived means that technology impacts its use)
To provide a framework for analysis, definitions for discussion
Research (project) vs. resource (community) vs. reference collections (global)
Mandated data collection maintenance – how paid for? Overhead? Direct cost?
Data collections need curation, peer review – data scientists
Citation of datasets
Who owns the data? Can you compete for the curation of the set?
Joint memo OSTP/OMB: New emphasis on data collections and measurement of R&D investment impact (? Interesting)
Management of scientific collections (like piles of dead birds and butterflies at the Smithsonian)
GEOSS – Global Earth Observation System of Systems, remote sensing data
(note: I made a mess of trying to ask this question, but there were some major issues here. First, there are lots of complications in data sharing in science. See Birnholtz’ dissertation online at CREW. Second, the high-level stuff Carroll talked about was very interesting, but way too much for the forum. Too fast, too many acronyms, too much.)