IEEE eScience: Dan Reed's Keynote
Keynote:
Daniel A. Reed
Microsoft Research
Cloud Seeding: Watering Research Flowers
(slides will not be shared per his request)
3 pillars of discovery: theory, experiment, and computational e-Science (ok, so observational sciences don’t provide new discoveries?)
(ew viscous flow in disposable diapers)
truisms – bulk computing is almost free, but software and power aren’t
ubiquitous sensors – but lagging in data fusion
moving lots of data is still hard
people are expensive – robust software is extremely labor intensive
scientific challenges are complex and social engineering is not our forte – increasingly social engineering is the limiting factor in our success (pesky users)
political/technical approaches must change or we risk solving irrelevant problems
how do we develop effective programming tools for the average Jane, who is using at best Perl, MATLAB, etc.? (typical scientist or engineer vs. savvy computer scientist)
social implications of the data deluge
- hypothesis driven (data was pretty scarce in the past)
- exploratory –“what correlations can I glean from everyone’s data?”
- this requires different tools and techniques
- massive multidisciplinary data rising rapidly at an unprecedented scale
clash of cosmologies – article saying that astronomy was going from observation to mere measurement
scientific computing is along for the ride instead of in the driver’s seat – we use GPUs because the commercial market is driving processor innovation, and the average desktop machine doesn’t really need to be any faster for typical office or home tasks… so we have to use the tools developed for the commercial market
use these commercial cores and parallelism – new approaches are needed to really take advantage of the parallelism
cloud application frameworks (slide from Dennis Gannon)
OS virtualization > software as a service > parallel frameworks (this is a triangle)
Hadoop over EC2 is 2/3 of the way toward OS virt from parallel
GFS, BigTable, MapReduce at parallel frameworks
Amazon S3/EC2 at OS virt
Microsoft mesh at Software as a service
Microsoft Azure Services Platform – cloud services
services platform (hosted SharePoint, CRM, SQL, .NET….)
(I was hearing about software as a service ages ago so I guess now it’s becoming more mainstream)
Data center costs for cloud computing (physical plant and power issues are huge)
land 2%
core and shell costs 9%
architectural 7%
mechanical and electrical – the rest (about 82%)
energy and supporting infrastructure cost about 4x the cost of the 1U server
researchers have different perceptions about the cost of ownership because they don’t see the power bills, only the server and people costs
15 MW is what you need to build a cloud computing facility (15 yr amortization)
servers: $2k each, 50,000 of them
commercial power: $0.07/kWh
security/administration: 15 people @ $100k/yr
$3M/month related to power
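The figures above can be turned into a rough back-of-the-envelope calculation. This is only a sketch: the 30-day month and the assumption that the facility draws its full 15 MW around the clock are mine, not from the talk, and the function names are made up for illustration.

```python
# Figures quoted in the notes above
FACILITY_POWER_KW = 15_000          # 15 MW facility
SERVER_COUNT = 50_000
SERVER_PRICE_USD = 2_000
POWER_PRICE_USD_PER_KWH = 0.07
STAFF_COUNT = 15
STAFF_SALARY_USD_PER_YEAR = 100_000

# Assumptions for illustration only (not from the talk)
HOURS_PER_MONTH = 24 * 30           # a 30-day month
LOAD_FACTOR = 1.0                   # facility draws its full 15 MW all the time


def server_capital_usd():
    """Up-front cost of buying all the servers."""
    return SERVER_COUNT * SERVER_PRICE_USD


def monthly_energy_usd(load_factor=LOAD_FACTOR):
    """Raw electricity bill for one month at the quoted commercial rate."""
    return FACILITY_POWER_KW * load_factor * HOURS_PER_MONTH * POWER_PRICE_USD_PER_KWH


def monthly_staff_usd():
    """Security/administration payroll per month."""
    return STAFF_COUNT * STAFF_SALARY_USD_PER_YEAR / 12


if __name__ == "__main__":
    print(f"server capital:  ${server_capital_usd():,.0f}")
    print(f"energy / month:  ${monthly_energy_usd():,.0f}")
    print(f"staff / month:   ${monthly_staff_usd():,.0f}")
```

Under those assumptions the servers alone are $100M of capital, raw electricity is roughly $0.76M/month, and the 15 admins are $125k/month. The $3M/month figure from the slide isn't broken down in my notes, so the sketch doesn't try to reproduce it.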
instruments and infrastructure
- from desktop to lab level to organization level maybe state level then national level and regional level
building blocks of cyberinfrastructure – this is his slide from 10 years ago
in the past 10 years
- commodity clusters – proliferation of inexpensive hardware, race for MachoFLOPS, broad base for enabling software, low level programming changes
- grids and distributed services – multidisciplinary collaborations, less broad base for enabling software
research money vs. production, maintenance, ongoing reliable work
a teraflop is no longer news; it’s done in the lab with Linux clusters
security – PII, HIPAA – research machines still have to be secure and patched and this is a real cost (not just an inconvenience)
business
- capital is cheap, labor is expensive, costs are explicit
academia, govt
- capital is hard, labor is seemingly cheap (students!), costs are implicit
funding is at best flat
infrastructure inefficiency reduces research funding – need to become more efficient
so this is his argument to go to cloud computing
- elasticity
- economies of scale
- efficiency
- cost clarity
- pay as you go
- support
- geodistribution (security)
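To make the elasticity and pay-as-you-go points concrete, here is a toy comparison of owning enough machines for peak load vs. renting only what each hour needs. Everything in it is hypothetical: the demand curve, the price, and the simplification that an owned server-hour costs the same as a rented one are mine, not from the talk.

```python
# Hypothetical demand, in servers needed per hour (not from the talk)
DEMAND = [10, 10, 20, 80, 100, 30, 10, 10]

# Hypothetical price per server-hour, assumed equal for both models
RATE_USD = 1.0


def owned_cost(demand, rate):
    """Fixed capacity sized for the peak hour, paid for every hour."""
    return max(demand) * len(demand) * rate


def elastic_cost(demand, rate):
    """Pay as you go: pay only for the servers each hour actually needs."""
    return sum(demand) * rate


def utilization(demand):
    """Fraction of the peak-sized capacity that is actually used."""
    return sum(demand) / (max(demand) * len(demand))


if __name__ == "__main__":
    print("owned (sized for peak):", owned_cost(DEMAND, RATE_USD))
    print("elastic (pay as you go):", elastic_cost(DEMAND, RATE_USD))
    print("utilization of owned capacity:", utilization(DEMAND))
```

The gap between the two numbers is the idle capacity you pay for when you size for peak. In practice a rented server-hour would cost more than an owned one, so the real comparison depends on how bursty the workload is.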
Labels: IEEEeScience08