GC3: Grid Computing Competence Center Large Scale Computing Infrastructures Lab 5: libcloud Sergio Maffioletti <sergio.maffioletti@gc3.uzh.ch> GC3: Grid Computing Competence Center, University of Zurich http://www.gc3.uzh.ch/ November 15, 2012
What will we cover today? 1. Basics of Libcloud 2. Compute and Storage APIs 3. Libcloud in action on OpenStack and on AWS
Apache Libcloud: a unified interface to the cloud Why Libcloud? The choice of Libcloud is purely pragmatic: it is written in Python and offers a standardized interface to a large collection of cloud stacks. Many other libraries are available.
Before we get started
For practical assignments and exercises we will use resources from AWS and FutureGrid.
If possible, we will run the exercises from our individual laptops (clients).
It is recommended to create a dedicated virtualenv.
AWS: you should have received credential information individually.
FutureGrid: you should all have registered on https://portal.futuregrid.org
Libcloud setup
$ virtualenv --no-site-packages virtualenvs/lsci2012
$ source virtualenvs/lsci2012/bin/activate
$ easy_install apache-libcloud
FutureGrid setup
You need to copy credential information from the FutureGrid login node:
$ scp -r <futuregrid username>@india.futuregrid.org:openstack/ virtualenvs/lsci2012/etc
What is Libcloud? "Libcloud is a Python library which abstracts differences between cloud provider APIs and allows users to manage their cloud resources (servers, storage, load-balancers) using a unified and easy to use interface." — Tomaz Muraus (@KamiSLO)
Why Libcloud? Different APIs Different response formats (XML, JSON, text) Different authentication methods
Libcloud APIs Two main packages/utilities that we will use: Compute Storage
libcloud: Compute
The Compute component is the oldest one and allows you to manage cloud and virtual servers offered by different providers.
Provider: a cloud vendor.
Driver: encapsulates common provider operations.
Node: represents a running virtual appliance.
libcloud: Compute
NodeSize: represents the node hardware configuration, usually the amount of available RAM, bandwidth, CPU speed and disk size.
NodeImage: represents an operating system image.
NodeLocation: represents a server's physical location.
NodeState: represents a node state. Standard states are: running, rebooting, terminated, pending and unknown.
Compute: APIs
list_images()
list_sizes()
list_locations()
list_nodes()
create_node()
destroy_node()
reboot_node()
deploy_node()
ex_* - provider-specific functionality
Run the following example
from libcloud.compute.types import Provider as ComputeProvider
from libcloud.compute.providers import get_driver as get_compute_driver

driver = get_compute_driver(ComputeProvider.EC2)
help(driver.create_node)
Compute: AWS example
import os
from libcloud.compute.types import Provider as ComputeProvider
from libcloud.compute.providers import get_driver as get_compute_driver

# Read access details from your <SURNAME>.cred file
AWSAccessKeyId = ""
AWSSecretKey = ""

driver = get_compute_driver(ComputeProvider.EC2)
ec2_compute = driver(AWSAccessKeyId, AWSSecretKey, secure=False)

images = ec2_compute.list_images()
nodes = ec2_compute.list_nodes()
sizes = ec2_compute.list_sizes()
Compute: the OpenStack example
# Read access details from $HOME/virtualenvs/lsci2012/etc/novarc
# Use EC2_URL for getting the values of FG_HOST and FG_PORT
EC2_ACCESS_KEY = ""
EC2_SECRET_KEY = ""
FG_HOST = ""
FG_PORT = 0
FG_SERVICE_PATH = "/services/Cloud"

driver = get_compute_driver(ComputeProvider.EUCALYPTUS)
fg_compute = driver(EC2_ACCESS_KEY, EC2_SECRET_KEY, secure=False, host=FG_HOST, port=FG_PORT, path=FG_SERVICE_PATH)

images = fg_compute.list_images()
nodes = fg_compute.list_nodes()
sizes = fg_compute.list_sizes()
An example of a horizontal library
Libcloud is a horizontal library: it encompasses many features supported by many providers, but it does not guarantee that all providers support all features.
Try this: fg_compute.list_locations()
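Because not every driver implements every call, libcloud raises NotImplementedError when a provider lacks a feature. A minimal sketch of probing for this follows; `provider_supports` and `DummyDriver` are our own illustrative helpers, not part of libcloud:

```python
def provider_supports(driver, method_name, *args):
    """Return True if driver.<method_name>(*args) is implemented.

    Only NotImplementedError is swallowed; any other error (network,
    auth) propagates, so False really means 'feature not available'.
    """
    try:
        getattr(driver, method_name)(*args)
    except NotImplementedError:
        return False
    return True


class DummyDriver:
    """Stand-in driver, used here only to demonstrate the helper."""

    def list_nodes(self):
        return []

    def list_locations(self):
        raise NotImplementedError("not supported by this provider")


if __name__ == "__main__":
    d = DummyDriver()
    print(provider_supports(d, "list_nodes"))      # True
    print(provider_supports(d, "list_locations"))  # False
```

With a real driver you would pass e.g. `fg_compute` instead of the dummy.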
Prepare keypairs
In many EC2-compliant cloud stacks, for the instance to be actually usable via ssh, you also need to pass the ex_keyname parameter and set it to a keypair name that exists in your account for that region.
Libcloud provides a way to create or import a keypair programmatically.
Prepare keypairs
Generate an RSA keypair (e.g. with ssh-keygen)
keyname = 'lsci2012_<SURNAME>'
ec2_compute.ex_import_keypair(keyname, '/home/sergio/.ssh/lsci2012_rsa.pub')
fg_compute.ex_import_keypair(keyname, '/home/sergio/.ssh/lsci2012_rsa.pub')
ec2_compute.ex_describe_keypairs(keyname)
fg_compute.ex_describe_keypairs(keyname)
Compute: Inspect node details
# Select NodeImage to start...
ec2_image
<NodeImage: id=ami-90a21cf9, name=195067550827/lsci2012_mhc-coev_base, driver=Amazon EC2 (us-east-1) ...>
fg_image
<NodeImage: id=ami-0000004d, name=lsci2012/lsci2012_mhc-coev_20121015.qcow2.manifest.xml, driver=Eucalyptus ...>
ec2_size
<NodeSize: id=m1.small, name=Small Instance, ram=1740, disk=160, bandwidth=None, price=0.085, driver=Amazon EC2 (us-east-1) ...>
ec2_location
<ExEC2AvailabilityZone: name=us-east-1a, zone_state=available, region_name=us-east-1>
Compute: Start nodes
running_nodes = []
node = ec2_compute.create_node(name='lsci2012', image=ec2_image, size=ec2_size, location=ec2_location, ex_keyname=keyname, ex_securitygroup="lsci2012")
running_nodes.append(node)
node = fg_compute.create_node(name='lsci2012', image=fg_image, size=fg_size, ex_keyname=keyname)
running_nodes.append(node)
Inspecting running nodes' status
At the moment it is not possible to update a node's status directly from the Node object; you need to re-create the node list from the NodeDriver.
ec2_nodes = ec2_compute.list_nodes()
fg_nodes = fg_compute.list_nodes()
node = fg_nodes[0]
node
<Node: uuid=dc8a7b4c8c438381a543894b033644adc84bc26d, name=i-000001ca, state=0, public_ip=['149.165.158.6'], provider=Eucalyptus ...>
node.driver.NODE_STATE_MAP
{'terminated': 2, 'running': 0, 'shutting-down': 2, 'pending': 3}
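The re-listing described above can be wrapped in a small polling loop. `wait_for_node` below is our own helper sketch, not a libcloud API; it assumes the numeric state values shown in the state map (0 = running):

```python
import time


def wait_for_node(driver, node_uuid, wanted_state=0, timeout=300, poll=10):
    """Re-list nodes from the driver until the node with node_uuid
    reaches wanted_state, then return it.

    There is no node.refresh() in this libcloud version, so we simply
    query driver.list_nodes() again on every iteration.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        for node in driver.list_nodes():
            if node.uuid == node_uuid and node.state == wanted_state:
                return node
        time.sleep(poll)
    raise RuntimeError("node %s not in state %s after %s seconds"
                       % (node_uuid, wanted_state, timeout))
```

Typical usage (with the drivers from the previous slides): `node = wait_for_node(fg_compute, node.uuid)`.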
Accessing a running node
Once the node is running and a public IP has been assigned, we can log in using the defined keypair:
$ ssh -i ~/.ssh/lsci2012 ubuntu@<node's public IP>
Exercise 01a
1. Start an AWS instance using the AMI ami-90a21cf9 (an LSCI2012 image containing the Matlab binary from the MHC_COEV project)
2. Wait until the node has booted and all services have started
3. Check the node's public IP address
4. Log in via SSH
5. Start the MHC_COEV program interactively:
$ MHC_COEV=/share/apps/mhc_coev/mhc_coev-040711
$ $MHC_COEV 1
Exercise 01b
1. Start the FutureGrid instance using the AMI ami-0000004d (an LSCI2012 image containing the Matlab binary from the MHC_COEV project)
2. Wait until the node has booted and all services have started
3. Check the node's public IP address
4. Log in via SSH
5. Start the MHC_COEV program interactively:
$ MHC_COEV=/apps/mhc_coev/mhc_coev-040711
$ $MHC_COEV 1
Note: on the FutureGrid AMI, you probably need to run as root
Bootstrapping nodes with user data
Instances can automatically run scripts passed via user data.
The libcloud driver has an extra ex_userdata parameter for create_node; the user data needs to be passed as content:
userdata_content = open(userdata_file).read()
node = ec2_compute.create_node(name="lsci2012_lab05", image=ec2_image, size=ec2_size, location=location, ex_keyname=keyname, ex_securitygroup="lsci2012", ex_userdata=userdata_content)
Exercise 02
Start the same AMI as in Exercise 01 (both variants a and b), passing the MHC_COEV invocation command as part of ex_userdata
Storage
The Storage API allows you to manage cloud storage services such as Amazon S3, Rackspace CloudFiles, Google Storage and others.
Storage
Object: represents an object, or so-called BLOB.
Container: represents a container which can hold multiple objects. You can think of it as a folder on a file system; the difference is that containers cannot be nested. Some APIs also call it a Bucket.
Storage
list_containers()
list_container_objects()
get_container()
get_object()
download_object()
download_object_as_stream()
upload_object()
upload_object_via_stream()
delete_object()
delete_container()
Storage
from libcloud.storage.providers import get_driver as get_storage_driver
from libcloud.storage.types import Provider as StorageProvider

s3_driver = get_storage_driver(StorageProvider.S3)
s3_storage = s3_driver(AWSAccessKeyId, AWSSecretKey, secure=False)

# Create a container
my_container = s3_storage.create_container('lsci2012_<SURNAME>')
<Container: name=lsci2012, provider=Amazon S3 (standard)>

# Retrieve container object from StorageDriver
container = s3_storage.list_containers()[0]
[<Container: name=lsci2012, provider=Amazon S3 (standard)>]
Storage
# Upload data to the selected container
my_container.upload_object('lsci2012_lab05_<SURNAME>.txt', object_name='lab05_<SURNAME>.txt')
my_container.list_objects()
[<Object: name=lsci2012_lab05_maffioletti.txt, size=4395859, hash=ab0f3076193959ffe1b249d47522af5d, provider=Amazon S3 (standard) ...>]
Storage
>>> obj = my_container.list_objects()[0]
>>> obj
<Object: name=lsci2012_lab05_maffioletti.txt, size=31, hash=e320fe515ef6f7c5c1cfb346804bc062, provider=Amazon S3 (standard) ...>
>>> obj.delete()
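The slides only show uploads; for downloads, download_object_as_stream() yields the object's content in chunks. A small sketch for saving such a stream to a local file follows; `save_stream` is our own helper, and the commented usage assumes the `s3_storage` driver and `obj` from the previous slides:

```python
def save_stream(chunks, path):
    """Write an iterable of byte chunks (such as the iterator returned
    by download_object_as_stream) to a local file.

    Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            written += len(chunk)
    return written


# Hypothetical usage against a real driver (needs credentials):
# stream = s3_storage.download_object_as_stream(obj, chunk_size=1024 * 1024)
# save_stream(stream, '/tmp/lab05.txt')
```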
Exercise 3
Create 6 input files, each of them containing 1 different sequence of parameters:
475 10 0.001 1 0 0.5
475 10 0.001 1 1 0.5
475 10 0.001 2 0 0.5
475 10 0.001 2 1 0.5
475 10 0.001 3 0 0.5
475 10 0.001 3 1 0.5
Upload the files to an S3 bucket using your container (e.g. lsci2012_maffioletti/mhc_coev_input01)
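The six files can be generated programmatically and then uploaded. A sketch, assuming the `mhc_coev_inputNN` naming from the example bucket path; the commented upload uses the `my_container` object from the earlier Storage slides and needs real credentials:

```python
import os

# The six parameter sequences from the exercise.
PARAM_SETS = [
    "475 10 0.001 1 0 0.5",
    "475 10 0.001 1 1 0.5",
    "475 10 0.001 2 0 0.5",
    "475 10 0.001 2 1 0.5",
    "475 10 0.001 3 0 0.5",
    "475 10 0.001 3 1 0.5",
]


def write_input_files(outdir, params=PARAM_SETS):
    """Write one mhc_coev_inputNN file per parameter sequence and
    return the list of created file paths."""
    paths = []
    for i, line in enumerate(params, start=1):
        path = os.path.join(outdir, "mhc_coev_input%02d" % i)
        with open(path, "w") as f:
            f.write(line + "\n")
        paths.append(path)
    return paths


# Hypothetical upload of each file to your container (needs credentials):
# for path in write_input_files("."):
#     my_container.upload_object(path, object_name=os.path.basename(path))
```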
S3Curl: S3 Authentication Tool for Curl Curl is a popular command-line tool for interacting with HTTP services. S3Curl is a Perl script that calculates the proper signature, then calls Curl with the appropriate arguments.
S3Curl: S3 Authentication Tool for Curl
Deploy S3Curl on your laptop: http://s3.amazonaws.com/doc/s3-example-code/s3-curl.zip
Note: it requires HMAC-SHA1
Authentication setup
Create a .s3curl file using the id and key from the respective AWS and FutureGrid access details:
%awsSecretAccessKeys = (
    # personal account
    personal => {
        id => '',
        key => '',
    },
);
Deploy S3Curl on a FutureGrid/AWS node
Log on to your running instance via ssh. Then, as root:
# aptitude update
# aptitude install unzip
# unzip s3-curl.zip
# chmod +x s3-curl/s3curl.pl
# aptitude install libdigest-hmac-perl
Exercise 4 1. Start the same AMI as in Exercise 1 2. Upload the .s3curl file 3. Install S3Curl 4. Try to access the S3 bucket lsci2012_<SURNAME> from there (list its content)
Exercise 5 1. Same as Exercise 4, but done through user data 2. Remember: the user data script is executed as root
Exercise 6 Start the same AMI as in Exercise 1, then have the instance automatically: 1. Download 1 input file from S3 2. Start MHC_COEV using data from the downloaded input file
Exercise 7: putting it all together
1. Start 6 instances, each of them with a reference to one of the input files
2. Write the results back to the S3 bucket. Result files are: full_workspace_*_latest_work.mat
3. Use the container with your SURNAME as placeholder
Note: Do *NOT* transfer all .mat files back.
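One way to wire the pieces together is to generate a per-input user-data script and pass it via ex_userdata, as on the earlier bootstrapping slide. `make_userdata` below is a hypothetical sketch: the s3curl location, the /share/apps binary path and the way parameters are handed to MHC_COEV are assumptions based on the earlier slides; adapt them to your image.

```python
def make_userdata(bucket, input_name):
    """Build a boot-time shell script (user data) that fetches one
    input file from S3 with s3curl and starts MHC_COEV on it.

    All paths below are assumptions from earlier slides, not verified
    against the actual AMI.
    """
    return """#!/bin/bash
# Runs as root at first boot.
cd /root
# Fetch the assigned input file from the S3 bucket.
./s3-curl/s3curl.pl --id personal -- \\
    http://s3.amazonaws.com/%(bucket)s/%(input)s > /root/%(input)s
# Start MHC_COEV with the parameters read from the input file
# (assumed calling convention).
/share/apps/mhc_coev/mhc_coev-040711 $(cat /root/%(input)s)
""" % {"bucket": bucket, "input": input_name}


# Hypothetical use with the compute driver from the earlier slides:
# node = ec2_compute.create_node(name="lsci2012_lab05", image=ec2_image,
#                                size=ec2_size, ex_keyname=keyname,
#                                ex_userdata=make_userdata(
#                                    "lsci2012_<SURNAME>",
#                                    "mhc_coev_input01"))
```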
Readings http://libcloud.apache.org http://libcloud.apache.org/supported_providers.html