Cloud Computing Introduction
Computing in the Clouds

Summary

Think-Pair-Share: According to Aaron Weiss, what are the different shapes the Cloud can take? What are the implications of these different shapes?

Computing in the Clouds provides a high-level overview of what Cloud computing is. It does so by discussing the different shapes the Cloud can take from the user's point of view.

Questions

What are some key terms that characterize the Cloud?
commodity, distributed, elasticity, scalability, rent, out-sourcing
Shapes of Cloud

1. Data Center: In this shape, the cloud is an architecture that eschews expensive high-end hardware for many commodity machines. Utilization is maximized through the use of wide-spread virtualization.

2. Distributed Computing: In this shape, the cloud allows us to tackle large-scale problems by utilizing hundreds to thousands of machines simultaneously. Unfortunately, harnessing all these resources is difficult, and we need specialized programming models, frameworks, and tools.

3. Utility Grid: In this shape, the cloud is an elastic and scalable resource pool. There are trade-offs involved with this arrangement.

4. Software as a Service: In this shape, the cloud is a way of delivering software and services over the internet. Rather than storing data locally, users store their data on the cloud service.
Data Center

Large centralized computer systems are not new. In the beginning there was the mainframe. Eventually, this gave way to the minicomputer, and then the microcomputer (aka personal computer).

"I think there's a world market for maybe five computers." -- Thomas Watson, IBM Chairman, 1943

Pictures: IBM System 360, DEC PDP-11, Apple II

According to the article, networking and the growing power of data centers have led us back to the idea of centralized computing.

Questions

What economic and technological factors led to decentralized computing?
What economic and technological factors are leading us back to centralized computing?

Computing technology often goes in cycles (what is old is new again). That is, what is acceptable in one era is deprecated in the next and then becomes acceptable again.

Questions

What are some other examples of cyclic technology trends in computer science? Garbage collection, out-of-order execution, interpreted languages, portability.
What factors influence these shifts? Economics, hardware trends, human capital.
Modern Data Center

Today's data centers are unlike yesterday's mainframes or data centers:

Old: Expensive high-performance enterprise hardware. Vertical Scaling: scale up by improving the performance of a single node.

New: Lots of inexpensive commodity hardware. Horizontal Scaling: scale out by adding more machines.

The new data center involves lots of machines (upwards of a million) working together. This poses challenges:

1. Connection and hardware failures are common.

2. Because of the large number of machines, energy costs are high (50% of data center costs, 1.5% of world electricity).

Pictures: Google Data Center, Notre Dame Greenhouse
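To see why failures become routine when scaling out, the short Python sketch below estimates how likely it is that at least one machine fails in a given time window. It assumes machines fail independently and with the same probability, which is a simplification; the function names and the failure probabilities in the usage note are hypothetical and do not come from the article.

```python
def prob_any_failure(machines: int, p_fail: float) -> float:
    """Probability that at least one of `machines` machines fails.

    Assumes (hypothetically) that each machine fails independently
    with probability `p_fail` during the time window of interest.
    """
    return 1.0 - (1.0 - p_fail) ** machines


def expected_failures(machines: int, p_fail: float) -> float:
    """Expected number of failed machines under the same assumption."""
    return machines * p_fail


if __name__ == "__main__":
    # Illustrative values only: a 0.1% chance of failure per machine.
    for n in (1, 1_000, 1_000_000):
        print(n, prob_any_failure(n, 0.001), expected_failures(n, 0.001))
```

With a single high-end node, a failure is a rare event. With a thousand commodity nodes at the same per-machine rate, some failure in the window is more likely than not, so software has to treat failure as the normal case.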
Virtualization

The new data center relies heavily on virtualization to maximize utilization.

A virtual machine provides a virtual hardware interface to the operating system such that the guest OS interacts with virtual devices rather than physical ones.

VMs provide sandboxing, that is, isolation from other VMs on the same hardware. Each VM is an independent machine with its own copy of the OS.

By executing many VMs on one machine, administrators can improve utilization, since most VMs are generally idle.

The data center requires some coordination layer to manage resources.

Distributed Computing

Questions

What is distributed computing?
What are some examples of distributed computing?
What makes distributed programming difficult?
Distributed computing: Many autonomous and independent machines working together to accomplish a common goal.

Coordinating and mapping tasks to system resources is a complex problem.

Developing algorithms and implementing applications that execute on such distributed systems is difficult (See: Eight Fallacies of Distributed Computing).

To tackle these problems we need new systems:

HDFS: distributed storage
MapReduce: scalable distributed data processing
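As a rough illustration of the MapReduce programming model mentioned above, here is a minimal single-process word count in Python. It only sketches the three phases (map, shuffle, reduce); it is not Hadoop's API, and the function names are hypothetical. A real framework would run the map and reduce calls on many machines and handle failures and data movement.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cloud", "the grid"]))
```

Because each map call looks at only one document and each reduce call looks at only one key, the work can be split across machines without the programmer writing any coordination code, which is the appeal of the model.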
Utility Grid

Building your own data center is costly: real estate, hardware, power, cooling, and maintenance.

What if you need 99% of the computing capacity only 10% of the time? -> lots of underutilization.

Amazon faced this problem, so they decided to rent their excess capacity to third parties -> Amazon EC2.

With Amazon EC2, websites such as Reddit, Dropbox, and Netflix can use the resources provided by Amazon to scale up and down as required, without having to maintain their own infrastructure.

The cloud allows companies to rent and utilize computing resources on demand, that is, only when they need them:

Elasticity: resources grow and shrink as demand requires
Scalability: improve performance by adding more resources
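The underutilization argument can be made concrete with a small Python sketch that compares provisioning for peak demand against renting only what each hour requires. All numbers, names, and the single flat price are assumptions made for illustration; real providers price owned and rented capacity very differently, and this is not how EC2 billing works in detail.

```python
import math
from typing import Sequence


def instances_needed(demand: float, capacity_per_instance: float) -> int:
    """Smallest number of instances that covers `demand` (at least one)."""
    return max(1, math.ceil(demand / capacity_per_instance))


def fixed_cost(hourly_demand: Sequence[float],
               capacity_per_instance: float,
               price_per_instance_hour: float) -> float:
    """Cost when capacity is sized for the peak hour and kept all the time."""
    peak = max(instances_needed(d, capacity_per_instance) for d in hourly_demand)
    return peak * len(hourly_demand) * price_per_instance_hour


def elastic_cost(hourly_demand: Sequence[float],
                 capacity_per_instance: float,
                 price_per_instance_hour: float) -> float:
    """Cost when capacity grows and shrinks with demand every hour."""
    total = sum(instances_needed(d, capacity_per_instance) for d in hourly_demand)
    return total * price_per_instance_hour


if __name__ == "__main__":
    # Hypothetical workload: quiet for nine hours, one busy hour.
    demand = [100] * 9 + [1000]
    print(fixed_cost(demand, 100, 1.0), elastic_cost(demand, 100, 1.0))
```

In this made-up workload the peak-sized deployment pays for ten instances in every hour, while the elastic one pays for ten instances only during the busy hour.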
Software as a Service

Rather than having a local application, users utilize remote software (i.e., web sites) as if they were services.

Put most of the heavy computation on remote servers and consume the content on local light-weight devices.

"The network is the computer." -- John Gage, Sun Microsystems

Questions

What are the advantages and disadvantages of this model?
Is this model feasible?
Is this model inevitable?

Obstacles

The Cloud is here and likely to stay, but issues remain:

1. Network: The US market is not really ready to handle the network load.

2. Privacy: Can we trust Cloud providers with our data?

3. Lock-in: There are no standards for interoperability between Cloud vendors.

Conclusion

The Cloud is amorphous, or perhaps multi-modal. It means many things to different people.