IEEE eScience: Opening keynote
really bare bones notes
Rich Wolski, Keynote Speaker
Building Science Clouds Using Commodity, Open-Source Software Components
Start with what’s happening in the commercial clouds and work toward the scientific world
A lot of people are getting excited about distributed computing – commercial entities are going ahead, both big and small
virtualization < web services < SLAs – users: need to automate as much as possible, and need to make it clear to users what they are getting when they rent space in the cloud
public cloud vs. private cloud
What can be done in a cloud? – workflows are great, but there are things that don’t work well in grids
what extensions or mods are required for scientific applications, data assimilation, multiplayer gaming (latency constraints)
How do clouds interact with other systems – like mobile devices which are already in/on their own cloud
open source cloud: simple, extensible, widely available and popular technologies, easy to install and maintain
examples: U Chicago's Nimbus, built on GT4 and Globus (from grid computing), but not looking for when grids act like clouds
Enomalism now called ECP (startup with open source), difficult and pretty opaque
They’re building Eucalyptus (Eucalyptus.cs.ucsb.edu): Elastic Utility Computing Architecture Linking Your Programs To Useful Systems
- linux (like Amazon)
How to know if it’s a cloud? Try the things on it that you can do on Amazon Web Services (e.g. through the interface) and see if it can do them (no strict definition of “cloud”; they just knew a cloud when they saw one: AWS).
- software overlay so it didn’t really monkey with the underlying infrastructure (with some grid things you had to blow away your operating system, install a whole bunch of libraries, and it was really difficult to know what it was doing and to support it)
A driver of this was to save money – researchers wanted to work in the commercial cloud, but it’s really expensive. If nothing else, they can use this to debug before moving into the commercial cloud.
Goal is more to democratize – not to replace Amazon and Google services at all, but to let other people try things. You won’t have their data centers, though; you’ll still need to have the hardware (and other things).
Interface is based on Amazon’s published WSDL, EC2 command-line tools, REST interface
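To illustrate what an EC2-compatible interface buys you, here is a minimal sketch (not from the talk) of building a signed EC2-style Query API request with only the Python standard library. It uses the Signature Version 2 scheme that AWS documented for its Query API; whether a given Eucalyptus release accepts this exact scheme is an assumption, as are the example hostname `cloud.example.org`, the port and path, the `Version` value, and the placeholder keys. The point is only that the same client code can target either service by swapping the endpoint.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlsplit


def signed_query_url(endpoint, access_key, secret_key, params):
    """Build an EC2-style Query API URL signed with HMAC-SHA256 (Signature Version 2)."""
    parts = urlsplit(endpoint)
    query = dict(params)
    query.update({
        "AWSAccessKeyId": access_key,
        "SignatureMethod": "HmacSHA256",
        "SignatureVersion": "2",
    })
    # Canonical query string: parameters sorted by name, percent-encoded.
    canonical = "&".join(
        f"{quote(k, safe='-_.~')}={quote(str(v), safe='-_.~')}"
        for k, v in sorted(query.items())
    )
    path = parts.path or "/"
    # The signature covers the HTTP verb, host, path, and canonical query string.
    string_to_sign = "\n".join(["GET", parts.netloc.lower(), path, canonical])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha256).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="-_.~")
    return f"{parts.scheme}://{parts.netloc}{path}?{canonical}&Signature={signature}"


# Hypothetical request parameters; a real call would use the current time.
params = {
    "Action": "DescribeInstances",
    "Version": "2009-04-04",              # assumed API version string
    "Timestamp": "2009-01-01T00:00:00Z",  # fixed for a reproducible example
}

# Same code, two endpoints: the public service and a hypothetical private cloud.
for endpoint in (
    "https://ec2.amazonaws.com/",
    "http://cloud.example.org:8773/services/Eucalyptus",  # assumed private endpoint
):
    print(signed_query_url(endpoint, "EXAMPLE_KEY", "EXAMPLE_SECRET", params))
```

The signature differs between the two URLs because the host and path are part of the signed string, but nothing else in the client has to change.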
sys admin – cloud admin, toolset for user accounting and cloud management
Security – they figured it out (WS-Security, SSH key generation, etc.), but it makes installation and sequencing a bit complicated – refer to his slides
Performance – if Xen is installed right, they haven’t seen the performance hits people are concerned about with virtualization
You can play with the Eucalyptus Public Cloud – with limitations
Benchmarking tests show it’s running very fast and responding like the Amazon services
Part of VGrADS – an NSF project – Linked Environments for Atmospheric Discovery – real time use of Doppler radar for local/regional forecasting
… lots of details
clouds and grids are distinct
Clouds:
- full private cluster is provisioned
- individual can only get a tiny fraction of the total resource pool
- no support for cloud federation
- opaque wrt resources
Grids:
- built so that a single user can get most of, or the entire, resource pool for a single project
- federation as first principle
- resources are exposed
Questions from the audience:
- TeraGrid – I guess there's currently a 3-week queue? Maybe use this to get projects in faster?
- SLAs – how to actually monitor and enforce them?