GC3: Grid Computing Competence Center
UZH Experiences with OpenStack
What we did, what went well, what went wrong.
Antonio Messina <antonio.messina@uzh.ch>
29 April 2013
Setting up
Hardware configuration at UZH
- 25 mixed blades + 2 service machines.
- Around 200 cores in total.
- 12 TB for Cinder (block storage, similar to EBS).
- 12 TB for Swift (object storage, similar to S3).
- Gigabit network.
Software configuration
- Compute nodes installed via PXE/debseed with Ubuntu 12.04.
- OpenStack Folsom (late 2012).
- Deployment and configuration automated through CFEngine.
- No over-committing of CPUs or memory (it does not really make sense for HPC).
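To make the "no over-committing" choice concrete, here is a minimal sketch of the arithmetic behind an allocation ratio. The function names are hypothetical and this is not Nova code. It assumes the scheduler caps virtual CPUs at physical cores times a ratio, which is how the Folsom-era `cpu_allocation_ratio` option is understood to work (16.0 being the commonly cited default).

```python
def schedulable_vcpus(physical_cores, cpu_allocation_ratio):
    """Upper bound on vCPUs that may be handed out for a given ratio."""
    return int(physical_cores * cpu_allocation_ratio)


def fits(requested_vcpus, used_vcpus, physical_cores, cpu_allocation_ratio=1.0):
    """True if a new instance still fits under the allocation limit."""
    limit = schedulable_vcpus(physical_cores, cpu_allocation_ratio)
    return used_vcpus + requested_vcpus <= limit


if __name__ == "__main__":
    cores = 200  # "around 200 cores" from the hardware slide
    # Ratio 1.0: one vCPU per physical core, i.e. no over-commit.
    print("no over-commit:", schedulable_vcpus(cores, 1.0))
    # Ratio 16.0: the same hardware would accept sixteen times as many vCPUs.
    print("ratio 16.0:", schedulable_vcpus(cores, 16.0))
```

With a ratio of 1.0 every vCPU maps to a dedicated core, which is what CPU-bound HPC jobs expect.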
Network configuration
- No shared storage (no need to migrate instances).
- Swift storage (for users and for storing VM images).
- nova-network instead of quantum (quantum lacks features and is unstable).
- Flat networking (because of network constraints at UZH).
- Automatic assignment of public IPs.
Deployment phase
Deploying OpenStack in a production environment is not straightforward:
- Hard to understand The Big Picture.
- Many services involved.
- Configuration not easy to automate/replicate [1].
- Documentation does not always help (see next slides).
- @UZH: 1 month for a very basic setup.
[1] You often need to issue commands and parse their output; a configuration file would be easier to automate.
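The footnote about parsing command output can be illustrated with a small sketch. It assumes the command prints an ASCII table with `+---+` borders and `|`-separated cells; the sample text and column names below are made up for illustration and are not real output from any OpenStack command.

```python
from typing import Dict, List


def parse_cli_table(text: str) -> List[Dict[str, str]]:
    """Turn an ASCII table into a list of {column: value} dictionaries."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and blank lines
        # Drop the outer pipes, then split the remaining cells.
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        if header is None:
            header = cells  # the first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Hypothetical sample output, for illustration only.
SAMPLE = """
+----+--------+--------+
| ID | Name   | Status |
+----+--------+--------+
| 1  | node-a | ACTIVE |
| 2  | node-b | ERROR  |
+----+--------+--------+
"""

if __name__ == "__main__":
    for row in parse_cli_table(SAMPLE):
        print(row["Name"], row["Status"])
```

Glue code like this has to be written and maintained for every command whose output drives a later step, which is why a declarative configuration file would be easier to automate.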
What we didn't like
Official documentation and community guides
- Cover the trivial case, which is usually not the one you want to implement in production.
- Many HOWTOs, no decent Reference Guide or Administrator Manual.
- Often out of date or incorrect!
- The recently published OpenStack Operations Guide helps a bit.
Projects - current implementation
- Each user can belong to multiple projects.
- Each user can have different roles in different projects.
- Images, instances, swift containers and objects always belong to one and only one project.
- Images, instances, swift containers and objects can be either public or private to the project.
- Sharing is done only through projects.
Projects - why this is bad (sharing)
- A user can act as a member of only one project at a time:
  - you cannot use an image from project A and run it in project B.
  - you cannot access a *volume* from project A when running as a member of project B.
- You cannot share something with just one user.
Projects - why this is bad (security)
Security is clearly not a top priority for the OpenStack development team.
- Each member of a project can:
  - terminate everybody's instances.
  - delete everybody's images.
- Users cannot change their own password (for security reasons!?!)
- If you have the admin role on a project, you are the administrator OF THE WHOLE OPENSTACK INSTALLATION!
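The permission rules described on this slide can be summarised in a toy model. This is only a sketch of the behaviour as reported here, not OpenStack code; the `Cloud` class, its methods, and the user and project names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cloud:
    """Toy model of the project/role behaviour described on the slide."""
    members: dict = field(default_factory=dict)    # project -> {user: role}
    instances: dict = field(default_factory=dict)  # instance -> (project, owner)

    def add_member(self, project, user, role="member"):
        self.members.setdefault(project, {})[user] = role

    def boot(self, project, user, name):
        if user not in self.members.get(project, {}):
            raise PermissionError(f"{user} is not a member of {project}")
        self.instances[name] = (project, user)

    def is_global_admin(self, user):
        # Admin on ANY project behaves as admin of the whole installation.
        return any(roles.get(user) == "admin" for roles in self.members.values())

    def can_terminate(self, user, name):
        project, _owner = self.instances[name]
        # Ownership is never checked: project membership is enough.
        return self.is_global_admin(user) or user in self.members.get(project, {})


if __name__ == "__main__":
    cloud = Cloud()
    cloud.add_member("A", "alice")
    cloud.add_member("A", "bob")
    cloud.add_member("B", "carol", role="admin")
    cloud.boot("A", "alice", "vm1")
    print(cloud.can_terminate("bob", "vm1"))    # fellow member of project A
    print(cloud.can_terminate("carol", "vm1"))  # admin of project B only
```

Note that `can_terminate` never looks at the instance owner, which is exactly the complaint: there is no per-user protection inside a project.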
Security (networking) (1/2)
(talking about nova-network, not quantum)
- By default, security groups only protect you from machines on different networks.
- You cannot change the security group of a VM while it's running [2].
- The FlatDHCP network driver is the easiest to set up but does not support any network separation between projects.
[2] You can change the rules of the chosen security group, but this also affects the other instances in that group.
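The side effect of editing a group's rules can be sketched as follows. This is a toy model under the assumption that instances hold a reference to their security group rather than a copy of its rules; the class names and the rule tuple layout are hypothetical.

```python
class SecurityGroup:
    """A named set of (protocol, port, source CIDR) allow rules."""

    def __init__(self, name):
        self.name = name
        self.rules = set()

    def allow(self, protocol, port, cidr):
        self.rules.add((protocol, port, cidr))


class Instance:
    """A VM whose security group is fixed when it is started."""

    def __init__(self, name, group):
        self.name = name
        self._group = group  # cannot be swapped while the VM is running

    def is_open(self, protocol, port):
        return any(p == protocol and q == port for p, q, _ in self._group.rules)


if __name__ == "__main__":
    web = SecurityGroup("web")
    vm_a = Instance("vm-a", web)
    vm_b = Instance("vm-b", web)
    # Opening SSH "for vm-a" really edits the shared group...
    web.allow("tcp", 22, "0.0.0.0/0")
    # ...so vm-b is now reachable on port 22 as well.
    print(vm_a.is_open("tcp", 22), vm_b.is_open("tcp", 22))
```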
Security (networking) (2/2)
- The VLAN network driver allows network separation, but at the cost of increased complexity:
  - must create one VLAN for each project on all the switches.
  - must create a network for each project.
  - steps are hard to automate! (they need specific support on the switches)
- Quantum should solve some of these issues, but the complexity is even greater!
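The bookkeeping half of the VLAN setup (one VLAN ID and one network per project) is easy to script; the hard part, configuring the switches, is not covered here. The sketch below is an illustration only: the base range `10.0.0.0/16`, the starting VLAN ID 100, and the project names are assumptions, not the UZH setup.

```python
import ipaddress


def plan_project_networks(projects, base_cidr="10.0.0.0/16", first_vlan=100, prefix=24):
    """Assign one VLAN ID and one subnet to each project, in sorted order."""
    names = sorted(set(projects))
    subnets = ipaddress.ip_network(base_cidr).subnets(new_prefix=prefix)
    plan = {}
    for offset, (name, subnet) in enumerate(zip(names, subnets)):
        vlan = first_vlan + offset
        if vlan > 4094:  # highest usable 802.1Q VLAN ID
            raise ValueError("ran out of VLAN IDs")
        plan[name] = {"vlan": vlan, "cidr": str(subnet)}
    if len(plan) < len(names):
        raise ValueError("base network too small for all projects")
    return plan


if __name__ == "__main__":
    # Hypothetical project names.
    for name, net in plan_project_networks(["physics", "biology"]).items():
        print(name, net["vlan"], net["cidr"])
```

Each entry of the plan would still have to be applied twice: once as a network in OpenStack and once as a VLAN on every switch.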
Other security concerns
- In the past, many nasty bugs were found in various OpenStack components.
- Weak authentication for services (passwords instead of SSL certificates).
- Files containing sensitive passwords are usually world-readable.
- By default, API services do not use SSL certificates (this should be a requirement).
- Just to name one: glance stores its swift login and password with each image URL in the internal database.
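The world-readable files issue is simple to audit. Here is a minimal sketch for POSIX systems using only the standard library; the path in the usage example is an assumption about where a configuration file might live, not a statement about any particular package.

```python
import os
import stat


def is_world_readable(path):
    """True if the 'other' read bit is set on the file's mode."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def world_readable_files(paths):
    """Return the existing regular files among `paths` that anyone can read."""
    return [p for p in paths if os.path.isfile(p) and is_world_readable(p)]


if __name__ == "__main__":
    # Hypothetical location; adjust to the files that hold passwords.
    candidates = ["/etc/nova/nova.conf"]
    for path in world_readable_files(candidates):
        print("world-readable:", path)
```

A file that fails the check can be restricted with `os.chmod(path, 0o640)` once its owner and group are set appropriately for the service.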
What we actually liked
What we like (1/2)
- The basic workflow works well and is reliable:
  - start/stop machines
  - create/delete images
  - create/delete snapshots
  - associate public IPs
  - security groups (for public IPs)
- nova-network with a basic setup works without issues.
- Scaling of VMs.
What we like (2/2)
- The web interface is minimal but easy to use.
- Powerful command-line tools.
- Decent EC2 API compatibility: very important for producing tools that can work with both Amazon and OpenStack.
- Responsive community.
Future work
Future work
- Implement high availability for central services.
- Test alternative storage systems:
  - different use cases need different storage.
  - thinking of moving from swift to Ceph.
- Take a look at quantum (but not too closely).
- DO NOT update to Grizzly (yet).
It's nice to have the latest shiny features, but it's even better to have a working, reliable system.
Questions?