Tuesday, 17 February 2015

Delegation of rights

At CERN, we have 1st and 2nd level support teams to run the computer centre infrastructure. These groups provide 24x7 coverage for problems and initial problem diagnosis to determine which 3rd line support team needs to be called in the event of a critical problem. Typical operations required are
  • Stop/Start/Reboot server
  • Inspect console
When we ran application services on physical servers, these activities could be performed using a number of different technologies
  • KVM switches
  • IPMI for remote management
  • Power buttons and the console trolley
With a virtual infrastructure, the applications are now running on virtual machines within a project. These operations are not available by default for the 1st and 2nd level teams since only the members of the project can perform these commands. On the other hand, the project administrator rights contain other operations (such as delete or rebuild servers) which are not needed by these teams.

To address this, we have defined an OpenStack policy for the projects concerned. This is an opt-in process so that the project administrator needs to decide whether these delegated rights should be made available (either at project creation or later).

Define operator role

The first step is to define a new role, operator, for the projects concerned. This can be done through the GUI (http://docs.openstack.org/user-guide-admin/content/section_dashboard_admin_manage_roles.html) or via the CLI (http://docs.openstack.org/user-guide-admin/content/admin_cli_manage_projects_users.html). In CERN's case, we include it in the project creation workflow.

On a default configuration,

$ keystone role-list
|                id                |      name     |
| ef8afe7ea1864b97994451fbe949f8c9 | ResellerAdmin |
| 8fc0ca6ef49a448d930593e65fc528e8 | SwiftOperator |
| 9fe2ff9ee4384b1894a90878d3e92bab |    _member_   |
| 172d0175306249d087f9a31d31ce053a |     admin     |

A new role operator needs to be defined, using the steps from the documentation

 $ keystone role-create --name operator
| Property |              Value               |
|    id    | e97375051a0e4bdeaf703f5a90892996 |
|   name   |             operator             |

and the new role will then appear in the keystone role-list.

Now add a new user operator1

$ keystone user-create --name operator1 --pass operatorpass
| Property |              Value               |
|  email   |                                  |
| enabled  |               True               |
|    id    | f93a50c12c164f329ee15d4d5b0e7999 |
|   name   |            operator1             |
| username |            operator1             |

and assign the operator1 account to the role

$ keystone user-role-add --user operator1 --role operator  --tenant demo
$ keystone user-role-list --tenant demo --user operator1

A similar role, accounting, is defined for the CERN accounting system: it allows read-only access to data about instances so that accounting reports can be produced without needing OpenStack admin rights.

To map which users are given this role, we use the Keystone v3 functions available through the OpenStack unified CLI.

$ openstack role add --group operatorGroup --role operator  --tenant demo

Using a group operatorGroup, we are able to define the members in Active Directory and then have those users updated automatically with consistent role sets. The users can also be added explicitly

$ openstack role add --user operator1 --role operator  --tenant demo

Update nova policy

The key file is policy.json in /etc/nova, which defines the roles and what they can do. The rules have two parts: first, a set of groupings which give a human-readable name to a complex rule, such as defining a member as someone who has neither the accounting nor the operator role:

    "context_is_admin":  "role:admin",
    "context_is_member": "not role:accounting and not role:operator",
    "admin_or_owner":  "is_admin:True or (project_id:%(project_id)s and rule:context_is_member)",
    "default": "rule:admin_or_owner",
    "default_or_operator": "is_admin:True or (project_id:%(project_id)s and not role:accounting)",

The particular rules are relatively self-descriptive.
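As an illustration only, the groupings above can be mirrored as plain Python predicates over a request context (the real evaluation is done by the oslo.policy engine; function and parameter names here are our own):

```python
# Illustrative sketch of the rule groupings defined above, NOT oslo.policy.

def context_is_member(roles):
    # "not role:accounting and not role:operator"
    return "accounting" not in roles and "operator" not in roles

def admin_or_owner(is_admin, target_project, token_project, roles):
    # "is_admin:True or (project_id:%(project_id)s and rule:context_is_member)"
    return is_admin or (target_project == token_project and context_is_member(roles))

def default_or_operator(is_admin, target_project, token_project, roles):
    # "is_admin:True or (project_id:%(project_id)s and not role:accounting)"
    return is_admin or (target_project == token_project and "accounting" not in roles)

# An operator in the project passes default_or_operator (reboot, console, ...)
assert default_or_operator(False, "demo", "demo", ["operator"])
# ... but not admin_or_owner, which guards the destructive "default" actions.
assert not admin_or_owner(False, "demo", "demo", ["operator"])
```

This shows why the operator role can reboot but not delete: delete-style actions fall under the "default" rule, which excludes operators via context_is_member.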

The actions can then be defined using these terms

    "compute:get_all": "rule:default_or_operator",
    "compute:get_all_tenants": "rule:default_or_operator",
    "compute:reboot": "rule:default_or_operator",
    "compute:get_vnc_console": "rule:default_or_operator",
    "compute_extension:console_output": "rule:default_or_operator",
    "compute_extension:consoles": "rule:default_or_operator",

With this, a user group can be defined to allow stop/start/reboot/console while not being able to perform the more destructive operations such as delete.

Thursday, 5 February 2015

Choosing the right image

Over the past 18 months of production at CERN, we have provided a number of standard images for the end users to use when creating virtual machines.
  • Linux
    • Scientific Linux CERN 5
    • Scientific Linux CERN 6
    • CERN CentOS 7
  • Windows
    • Windows 7
    • Windows 8
    • Windows Server 2008
    • Windows Server 2012
To accelerate deployment of new VMs, we also often have
  • Base images which are the minimum subset of packages on which users can build their custom virtual machines using features such as cloud-init or Puppet to install additional packages
  • Extra images which contain common additional packages and accelerate the delivery of a working environment. These profiles are often close to a desktop like PC-on-demand. Installing a full Office 2013 suite on to a new virtual machine can take over one hour so preparing this in advance saves time for the users.
However, with each of these images and the additional software, the image contents need to be kept up to date.
  • Known security issues should be resolved within the images rather than relying on the installation of new software after the VM has booted
  • Installation of updates slows down the instantiation of virtual machines
The images themselves are therefore rebuilt on a regular basis and published to the community as public images. The old images, however, should not be deleted as they are needed in the event of a resize or live migration (see https://review.openstack.org/#/c/90321/). Images cannot be replaced in Glance since this would lead to inconsistencies on the hypervisors.

As a result, the number of images in the catalog increases on a regular basis. For the web based end user, this can make navigating the Horizon GUI panel for the images difficult and increase the risk that an out of date image is selected.

The approach that we have taken is to build on image properties (http://docs.openstack.org/cli-reference/content/chapter_cli-glance-property.html), which allow the image maintainer to tag images with attributes. We use the following from the standard list

  • architecture => "x86_64", "i686"
  • os_distro => "slc", "windows"
  • os_distro_major => "6", "5"
  • os_distro_minor => "4"
  • os_edition => "<Base|Extra>" which set of additional packages were installed into the image.
  • release_date => "2014-05-02T13:02:00" gives the date at which the image was made available to the public users
  • custom_name => "A custom name" allows a text string to override the default name (see below)
  • upstream_provider => URL gives a URL to contact in the event of problems with the image. This is useful where the image is supplied by a 3rd party and the standard support lines should not be used.

With these properties, several of which are additional fields that we defined ourselves, the latest images can be selected and a subset presented for the end user to choose.

The algorithm used is as follows with the sorting sequence

  1. os_distro ASC
  2. os_distro_major DESC
  3. os_distro_minor DESC
  4. architecture DESC
  5. release_date DESC
Images which are from previous releases (i.e. where os_distro, os_distro_major and architecture are the same) are only shown if the 'All' tab is selected.
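A minimal sketch of this selection follows, using the field names from the property list above (the actual Horizon code differs; note also that real code must compare version components numerically, while single-digit strings suffice for this example):

```python
from operator import itemgetter

# Example image records carrying the Glance properties described above.
images = [
    {"name": "SLC6 2014-09-10", "os_distro": "slc", "os_distro_major": "6",
     "os_distro_minor": "5", "architecture": "x86_64",
     "release_date": "2014-09-10T09:00:00"},
    {"name": "SLC6 2014-05-02", "os_distro": "slc", "os_distro_major": "6",
     "os_distro_minor": "4", "architecture": "x86_64",
     "release_date": "2014-05-02T13:02:00"},
    {"name": "SLC5 2014-01-15", "os_distro": "slc", "os_distro_major": "5",
     "os_distro_minor": "9", "architecture": "x86_64",
     "release_date": "2014-01-15T10:00:00"},
]

# Stable sorts applied from the least significant key to the most
# significant one yield: os_distro ASC, then the remaining keys DESC.
for key in ("release_date", "architecture", "os_distro_minor", "os_distro_major"):
    images.sort(key=itemgetter(key), reverse=True)
images.sort(key=itemgetter("os_distro"))

# Only the newest image per (os_distro, os_distro_major, architecture)
# is shown unless the 'All' tab is selected.
latest, seen = [], set()
for img in images:
    k = (img["os_distro"], img["os_distro_major"], img["architecture"])
    if k not in seen:
        seen.add(k)
        latest.append(img)

print([i["name"] for i in latest])
# → ['SLC6 2014-09-10', 'SLC5 2014-01-15']
```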

The code is in preparation to be proposed as an upstream patch. For the moment, it can be found in the CERN github repository (https://github.com/cernops/horizon)

Tuesday, 27 January 2015

Exceeding tracked connections

As we increase the capacity of the CERN OpenStack cloud, we've noticed a few cases of an interesting problem where hypervisors lose network connectivity. These hypervisors are KVM based running Scientific Linux CERN 6. The cloud itself is running Icehouse using Nova network.

Connection tracking refers to the ability to maintain state information about a connection in memory tables, such as source and destination IP address and port number pairs (known as socket pairs), protocol types, connection state and timeouts. Firewalls that do this are known as stateful. Stateful firewalling is inherently more secure than its "stateless" counterpart, simple packet filtering.
More details are available at [1].

On busy hypervisors, in the syslog file, we have messages such as

Jan  4 23:14:44 hypervisor kernel: nf_conntrack: table full, dropping packet.

Searching around the internet, we found references to a number of documents [2][3] discussing the limit.

It appears that the default algorithm is pretty simple. For a 64 bit hypervisor,
  • If RAM < 1 GB, the maximum conntrack is set to RAM in bytes  / 32768
  • Otherwise, set to 65536
Our typical hypervisors contain 48GB of memory and 24 cores so a busy server handling physics distributed data access can easily use 1000s of connections, especially if sockets are not being closed correctly. With several instances of these servers on a hypervisor, it is easy to reach the 65536 limit and start to drop new connections.
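The default sizing rule above can be sketched in a few lines (a simplification of the kernel's heuristic, which also depends on hash sizing):

```python
# Sketch of the default nf_conntrack_max heuristic on a 64-bit kernel:
# RAM in bytes / 32768, capped at 65536.
def default_conntrack_max(ram_bytes):
    return min(ram_bytes // 32768, 65536)

GIB = 1024 ** 3
assert default_conntrack_max(512 * 1024 ** 2) == 16384  # host with < 1 GB RAM
assert default_conntrack_max(48 * GIB) == 65536         # typical CERN hypervisor
```

So a 48 GB hypervisor gets the same 65536-entry cap as a 1 GB machine, which explains why busy multi-VM hosts hit the limit.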

To keep an eye on the usage, the current and maximum values can be checked using sysctl.

The current usage can be checked using

# sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 6650

The maximum value can be found as follows

# sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 65536

To avoid overload on the hypervisors, the hypervisor conntrack value can be increased and should then be set to the sum of the connections expected from each virtual machine. This can be done using the /etc/sysctl.d directory or with an appropriate configuration management tool.

Note that you'll need to set both net.netfilter.nf_conntrack_max and net.nf_conntrack_max. For the CERN OpenStack cloud we have increased the values from 64k to 512k.
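As a sketch, a sysctl drop-in file could carry the new limits (the file name and exact values are site choices; 524288 corresponds to the 512k mentioned above):

```
# /etc/sysctl.d/91-conntrack.conf -- illustrative values
net.netfilter.nf_conntrack_max = 524288
net.nf_conntrack_max = 524288
```

The values are applied at boot, or immediately with sysctl -p /etc/sysctl.d/91-conntrack.conf.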

Thursday, 8 January 2015

Using bitnami images with OpenStack

At CERN, we generally use puppet to configure our production services using modules from puppetforge to quickly set up the appropriate parameters and services.

However, it is often interesting to try out a new software package for a quick investigation. In the past, people have done this with Bitnami on test systems or their laptops, installing the operating system and then the Bitnami application packages.

With an OpenStack cloud, deploying Bitnami configurations can be achieved even more quickly. We are running OpenStack Icehouse with KVM and Hyper-V hypervisors.

The steps are as follows
  • Download the cloud image from Bitnami
  • Load the image into Glance
  • Deploy the image
  • Check the console for messages using Horizon
  • Use the application!
Since the operating system comes with the image, it also avoids issues with pre-requisites or unexpected configurations.

Getting the images from Bitnami

Bitnami provides installers which can be run on operating systems that have been previously installed but also cloud images which include the appropriate operating systems in the virtual machine image.

There are a number of supported public clouds such as Amazon and Azure, but also private cloud images for VMware and VirtualBox. For this example, we use the VirtualBox images as there is a single image file.

A wide variety of appliances are available. For this case, we show the use of Wordpress (of course :-). The Wordpress virtual machine private cloud list gives download links for a zip file containing the appliance image of Ubuntu, the middleware and Wordpress:

  • An OVF metadata file
  • A VMDK disk image
We only use the VMDK file.

Loading the VMDK into Glance

A VMDK file is a disk image, like a qcow2 file for QEMU/KVM. Since KVM also supports VMDK, it is possible to load it directly into Glance. Alternatively, it can be converted to qcow2 using qemu-img if needed.

glance image-create --name wordpress-bitnami-vmdk  --file bitnami-wordpress-4.1-0-ubuntu-14.04-OVF-disk1.vmdk --disk-format vmdk --container-format=bare 

This creates the entry into Glance so that new VMs can be created.

Creating a new VM

The VM can then be instantiated from this image. 

nova boot --flavor m1.medium --image wordpress-bitnami-vmdk --key-name keypair hostname

The keypair and hostname arguments should be replaced by your preferred key pair name and VM host name.

Check console

Using the graphical interface, you can see the details of the deployment. Since ssh is not enabled by default, you can't log in. Once booted, the console displays instructions on how to access the application.

If you wish to log in to the Linux shell, check the account details at the application page on Bitnami.

Use the application

The application can be accessed using the web URL shown on the console. This allows you to investigate the potential of the application with a few simple OpenStack and web commands, giving a working application instance in a few minutes.

Note: it is important that the application is kept up to date. Ensure you follow the Bitnami updates and ensure appropriate security of the server.

Thursday, 6 November 2014

Our cloud in Icehouse

This is going to be a very simple blog to write.

When we upgraded the CERN private cloud to Havana, we wrote a blog post giving some of the details of the approach taken and the problems we encountered.

The high level approach we took this time was the same: component by component. The first components were upgraded during August, with Nova and Horizon last of all in October.

The difference this time is that no significant problems were encountered during the production upgrade.

The most time consuming upgrade was Nova. As last time, we took a very conservative approach and disabled the API access to the cloud during the upgrade. With offline backups and the database migration steps taking several hours given the 1000s of VMs and hypervisors, the API unavailability was around six hours in total. All VMs continued to run during this period without issues.

Additional Functions

With the basic functional upgrade, we are delivering the following additional functions to the CERN end users. These are all based on the OpenStack Icehouse release functions and we'll try to provide more details on these areas in future blogs.
  • Kerberos and X.509 authentication
  • Single Sign On login using CERN's Active Directory/ADFS service and the SAML federation functions from Icehouse
  • Unified openstack client for consistent command lines across the components
  • Windows console access with RDP
  • IPv6 support for VMs
  • Horizon new look and feel based on RDO styling
  • Delegated rights allowing the operators and system administrators to perform certain limited activities on VMs, such as reboot and console viewing, so they can provide out of working hours support.


Along with the CERN team, many thanks should go to the OpenStack community for delivering a smooth upgrade across Ceilometer, Cinder, Glance, Keystone, Nova and Horizon.

Tuesday, 21 October 2014

Kerberos and Single Sign On with OpenStack

External Authentication with Keystone

One of the most commonly requested features by the CERN cloud user community is support for authentication using Kerberos on the command line and single-sign on with the OpenStack dashboard.

In our Windows and Linux environment, we run Active Directory to provide authentication services. During the Essex cycle of OpenStack, we added support for getting authentication based on Active Directory passwords. However, this had several drawbacks:
  • When using the command line clients, the users had the choice of storing their password in environment variables, such as with the local openrc script, or re-typing their password with each OpenStack command. Passwords in environment variables have significant security risks since they are passed to any sub-command and can be read by the system administrator of the server you are on.
  • When logging in with the web interface, the users were entering their password into the dashboard. Most of CERN's applications use a single sign on package with Active Directory Federation Services (ADFS). Recent problems such as Heartbleed show the risks of entering passwords into web applications.
The following describes how we configured this functionality.


With our upgrade to Icehouse completed last week and the new v3 identity API, Keystone now supports several authentication mechanisms through plugins. By default, password, token and external authentication are provided. Other authentication methods such as Kerberos or X.509 can be used with a suitable Apache configuration and the external plugin provided in Keystone. Unfortunately, when enabling these methods in Apache, there is no way to make them optional so that the client can choose the most appropriate one.

Also, when checking which projects they can access, clients normally perform two operations against Keystone: one to retrieve the token, and a second, using that token, to retrieve the project list. Even if the API version is specified in the environment variables, the second call always uses the service catalog, so if the catalog advertises version 2 while we are using version 3, the API call raises an exception.


In this case we need a solution that allows us to use Kerberos, X.509 or another authentication mechanism in a transparent and backwards-compatible way, so we can offer both APIs and let users choose whichever is most appropriate for their workflow. This will allow us to migrate services from one API version to the next with no downtime.

In order to allow external authentication for our clients, we need to cover two parts: the client side, to determine which auth plugin to use, and the server side, to allow multiple auth methods and API versions at once.

Server Solution

In order to have different entry points under the same API, we need a load balancer; in this particular case we use HAProxy. From this load balancer we call two different sets of backend machines, one for version 2 of the API and the other for version 3. In the load balancer, we inspect the version in the URL the client is connecting to and redirect it to the appropriate set. Each backend runs Keystone under Apache and is connected to the same database, so tokens can be validated no matter which API version the client uses. The only difference between the backend sets is the catalog: the identity service entry differs, pointing the client to the version available in each set. For this purpose we use a templated catalog.

This solves the multi-version issue of the OpenStack environment, but it does not yet allow Kerberos or X.509. As these methods cannot be made optional, we need a different entry point for each authentication plugin: one for standard OpenStack authentication (password, token), one for Kerberos and one for X.509. There is no issue with the catalog if we enable these methods; all of them can be registered in the service catalog like normal OpenStack authentication, because any subsequent call on the system uses token-based authentication.
In the Apache v3 backend we therefore have URLs defined for /main (password and token), /krb (Kerberos) and /x509 (X.509):


If you post an authentication request to the Kerberos URL, a valid Kerberos token is required; if none is sent, a challenge is initiated. After validating it, Apache sets the authenticated principal as REMOTE_USER. For client certificate authentication, you use the X.509 URL, which requires a valid certificate; in this case the certificate DN is used as REMOTE_USER. Once this variable is set, Keystone can take over and check the user in the Keystone database.
There is a small caveat: we cannot offload SSL client authentication onto HAProxy, so the client needs to connect directly to the configured backends on a different port (8443). For X.509 authentication we therefore use 'https://mykeystone:8443/x509/v3'

Client Solution

For the client side, the plugin mechanism is only available in the common CLI (python-openstackclient) and not in the rest of the toolset (nova, glance, cinder, ...). There is no upstream code yet that implements the plugin selection, so as a short-term implementation based on our current architecture, we select the plugin from the OS_AUTH_URL. The final upstream implementation will almost certainly differ here, by using a parameter or by discovering the available auth plugins; in that case the client implementation may change, but it is likely to remain close to this initial version.

In openstackclient/common/clientmanager.py
        if 'krb' in auth_url and ver_prefix == 'v3':
            LOG.debug('Using kerberos auth %s', ver_prefix)
            self.auth = v3_auth_kerberos.Kerberos(
                ...)  # constructor arguments elided
        elif 'x509' in auth_url and ver_prefix == 'v3':
            LOG.debug('Using x509 auth %s', ver_prefix)
            self.auth = v3_auth_x509.X509(
                ...)  # constructor arguments elided
        elif self._url:
            ...

HAproxy configuration

global
  chroot  /var/lib/haproxy
  group  haproxy
  log  mysyslogserver local0
  maxconn  8000
  pidfile  /var/run/haproxy.pid
  stats  socket /var/lib/haproxy/stats
  tune.ssl.default-dh-param  2048
  user  haproxy

defaults
  log  global
  maxconn  8000
  mode  http
  option  redispatch
  option  http-server-close
  option  contstats
  retries  3
  stats  enable
  timeout  http-request 10s
  timeout  queue 1m
  timeout  connect 10s
  timeout  client 1m
  timeout  server 1m
  timeout  check 10s

frontend cloud_identity_api_production
  bind ssl no-sslv3 crt /etc/haproxy/cert.pem verify none
  acl  v2_acl_admin url_beg /admin/v2
  acl  v2_acl_main url_beg /main/v2
  default_backend  cloud_identity_api_v3_production
  timeout  http-request 5m
  timeout  client 5m
  use_backend  cloud_identity_api_v2_production if v2_acl_admin
  use_backend  cloud_identity_api_v2_production if v2_acl_main

frontend cloud_identity_api_x509_production
  bind ssl no-sslv3 crt /etc/haproxy/cert.pem ca-file /etc/haproxy/ca.pem verify required
  default_backend  cloud_identity_api_v3_production
  rspadd  Strict-Transport-Security:\ max-age=15768000
  timeout  http-request 5m
  timeout  client 5m
  use_backend  cloud_identity_api_v3_production if { ssl_fc_has_crt }

backend cloud_identity_api_v2_production
  balance  roundrobin
  stick  on src
  stick-table  type ip size 20k peers cloud_identity_frontend_production
  timeout  server 5m
  timeout  queue 5m
  timeout  connect 5m
  server cci-keystone-bck01 check ssl verify none
  server cci-keystone-bck02 check ssl verify none
  server p01001453s11625 check ssl verify none

backend cloud_identity_api_v3_production
  balance  roundrobin
  http-request  set-header X-SSL-Client-CN %{+Q}[ssl_c_s_dn(cn)]
  stick  on src
  stick-table  type ip size 20k peers cloud_identity_frontend_production
  timeout  server 5m
  timeout  queue 5m
  timeout  connect 5m
  server cci-keystone-bck03 check ssl verify none
  server cci-keystone-bck04 check ssl verify none
  server cci-keystone-bck05 check ssl verify none
  server cci-keystone-bck06 check ssl verify none

listen stats
  stats  uri /
  stats  auth haproxy:toto1TOTO$

peers cloud_identity_frontend_production
  peer cci-keystone-load01.cern.ch
  peer cci-keystone-load02.cern.ch
  peer p01001464675431.cern.ch
Apache configuration

WSGISocketPrefix /var/run/wsgi

Listen 443

<VirtualHost *:443>
  ServerName keystone.cern.ch
  DocumentRoot /var/www/cgi-bin/keystone
  LimitRequestFieldSize 65535

  SSLEngine On
  SSLCertificateFile      /etc/keystone/ssl/certs/hostcert.pem
  SSLCertificateKeyFile   /etc/keystone/ssl/keys/hostkey.pem
  SSLCertificateChainFile /etc/keystone/ssl/certs/ca.pem
  SSLCACertificateFile    /etc/keystone/ssl/certs/ca.pem
  SSLVerifyClient         none
  SSLOptions              +StdEnvVars
  SSLVerifyDepth          10
  SSLUserName             SSL_CLIENT_S_DN_CN
  SSLProtocol             all -SSLv2 -SSLv3

  SSLHonorCipherOrder     on
  Header add Strict-Transport-Security "max-age=15768000"

  WSGIDaemonProcess keystone user=keystone group=keystone processes=2 threads=2
  WSGIProcessGroup keystone

  WSGIScriptAlias /admin /var/www/cgi-bin/keystone/admin
  <Location "/admin">
    SSLVerifyClient       none
  </Location>

  WSGIScriptAlias /main /var/www/cgi-bin/keystone/main
  <Location "/main">
    SSLVerifyClient       none
  </Location>

  WSGIScriptAlias /krb /var/www/cgi-bin/keystone/main

  <Location "/krb">
    SSLVerifyClient       none
  </Location>

  <Location "/krb/v3/auth/tokens">
    SSLVerifyClient       none
    AuthType              Kerberos
    AuthName              "Kerberos Login"
    KrbMethodNegotiate    On
    KrbMethodK5Passwd     Off
    KrbServiceName        Any
    KrbAuthRealms         CERN.CH
    Krb5KeyTab            /etc/httpd/http.keytab
    KrbVerifyKDC          Off
    KrbLocalUserMapping   On
    KrbAuthoritative      On
    Require valid-user
  </Location>

  WSGIScriptAlias /x509 /var/www/cgi-bin/keystone/main

  <Location "/x509">
    Order allow,deny
    Allow from all
  </Location>

  WSGIScriptAliasMatch ^(/main/v3/OS-FEDERATION/identity_providers/.*?/protocols/.*?/auth)$ /var/www/cgi-bin/keystone/main/$1

  <LocationMatch /main/v3/OS-FEDERATION/identity_providers/.*?/protocols/saml2/auth>
    ShibRequestSetting requireSession 1
    AuthType shibboleth
    ShibRequireSession On
    ShibRequireAll On
    ShibExportAssertion Off
    Require valid-user
  </LocationMatch>

  <LocationMatch /main/v3/OS-FEDERATION/websso>
    ShibRequestSetting requireSession 1
    AuthType shibboleth
    ShibRequireSession On
    ShibRequireAll On
    ShibExportAssertion Off
    Require valid-user
  </LocationMatch>

  <Location /Shibboleth.sso>
    SetHandler shib
  </Location>

  <Directory /var/www/cgi-bin/keystone>
    Options FollowSymLinks
    AllowOverride All
    Order allow,deny
    Allow from all
  </Directory>
</VirtualHost>

The code of python-openstackclient, as well as the python-keystoneclient that we are using for this implementation, is available at:

We will be working with the community in the Paris summit to find the best way to integrate this functionality into the standard OpenStack release.


The main author is Jose Castro Leon with help from Marek Denis.

Many thanks to the Keystone core team for their help and advice on the implementation.

Saturday, 19 July 2014

OpenStack plays Tetris : Stacking and Spreading a full private cloud

At CERN, we're running a large scale private cloud which is providing compute resources for physicists analysing the data from the Large Hadron Collider. With 100s of VMs created per day, the OpenStack scheduler has to perform a Tetris-like job, fitting the different flavors of VMs onto the available hypervisors.

As we increase the number of VMs that we're running on the CERN cloud, we see the impact of a number of configuration choices made early on in the cloud deployment. One key choice is how to schedule VMs across a pool of hypervisors.

We provide our users with a mixture of flavors for their VMs (for details, see http://openstack-in-production.blogspot.fr/2013/08/flavors-english-perspective.html).

During the past year in production, we have seen a steady growth in the number of instances to nearly 7,000.

At the same time, we're seeing an increasing elastic load as the user community explores potential ways of using clouds for physics.

Given that CERN has a fixed resource pool and the budget available is defined and fixed, the underlying capacity is not elastic and we are now starting to encounter scenarios where the private cloud can become full. Users see this as errors when they request VMs and no hypervisor with sufficient free capacity can be located.

This situation occurs more frequently for the large VMs. Physics programs can make use of multiple cores to process physics events in parallel and our batch system (which runs on VMs) benefits from a smaller number of hosts. This accounts for a significant number of large core VMs.

The problem occurs as the cloud approaches being full. Using the default OpenStack configuration (known as 'spread'), VMs are evenly distributed across the hypervisors. If the cloud is running at low utilisation, this is an attractive configuration as CPU and I/O load are also spread and little hardware is left idle.

However, as the utilisation of the cloud increases, the resources free on each hypervisor are reduced evenly. To take a simple case, consider a cloud with two compute nodes of 24 cores each, handling a variety of flavors. If there are requests for two 1-core VMs followed by one 24-core flavor, the alternative approaches can be simulated.

In a spread configuration,
  • The first VM request lands on hypervisor A leaving A with 23 cores available and B with 24 cores
  • The second VM request arrives and following the policy to spread the usage, this is scheduled to hypervisor B, leaving A and B with 23 cores available.
  • The request for one 24 core flavor arrives and no hypervisor can satisfy it despite there being 46 cores available and only 4% of the cloud used.
In the stacked configuration,

  • The first VM request lands on hypervisor A leaving A with 23 cores available and B with 24 cores
  • The second VM request arrives and following the policy to stack the usage, this is scheduled to hypervisor A, leaving A with 22 cores and B with 24 cores available.
  • The request for one 24 core flavor arrives and is satisfied by B
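The two walkthroughs above can be reproduced with a toy scheduler (a sketch only; the real Nova scheduler combines filters and weighers, and these function names are our own):

```python
# Toy placement: pick the hypervisor with the least (stacked) or the
# most (spread) free cores that can still fit the request.
def place(free_cores, vm_cores, stacked):
    candidates = [h for h, free in free_cores.items() if free >= vm_cores]
    if not candidates:
        return None  # no hypervisor can satisfy the request
    chosen = (min if stacked else max)(candidates, key=lambda h: free_cores[h])
    free_cores[chosen] -= vm_cores
    return chosen

for stacked in (False, True):
    free = {"A": 24, "B": 24}
    place(free, 1, stacked)           # first 1-core VM
    place(free, 1, stacked)           # second 1-core VM
    result = place(free, 24, stacked) # the 24-core request
    print("stacked" if stacked else "spread", "->", result)
# spread -> None   (46 cores free overall, but no single host has 24)
# stacked -> B
```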
A stacked configuration is achieved by making the RAM weight negative (i.e. preferring hypervisors with less free RAM), which has the effect of packing the VMs. This is done through a nova.conf setting as follows
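The option is the scheduler's RAM weight multiplier, which defaults to 1.0 (spread); a negative value packs. The exact value below is a tuning choice, not a recommendation:

```
# /etc/nova/nova.conf
[DEFAULT]
ram_weight_multiplier = -1.0
```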


When a cloud is initially being set up, the question of maximum packing does not often come up in the early days. However, once the cloud has workload running under spread, it can be disruptive to move to stacked since the existing VMs will not be moved to match the new policy.

Thus, it is important as part of the cloud planning to reflect on the best approach for each different cloud use case and avoid more complex resource rebalancing at a later date.


  • OpenStack configuration reference for scheduling at http://docs.openstack.org/trunk/config-reference/content/section_compute-scheduler.html