Following on from previous upgrades, CERN migrated the OpenStack cloud to Kilo during September to November. Along with the bug fixes, we are planning on exploiting the significant number of new features, especially as related to performance tuning. The overall cloud architecture was covered at the Tokyo OpenStack summit video https://www.openstack.org/summit/tokyo-2015/videos/presentation/unveiling-cern-cloud-architecture.
As the LHC continues to run 24x7, these upgrades were done while the cloud was running and virtual machines were untouched.
Previous upgrades have been described as below
- Juno - http://openstack-in-production.blogspot.fr/2015/05/our-cloud-in-juno.html
- Icehouse - http://openstack-in-production.blogspot.fr/2014/11/our-cloud-in-icehouse.html
- Havana - http://openstack-in-production.blogspot.fr/2014/02/our-cloud-in-havana.html
- Cinder - we encountered the bug https://bugs.launchpad.net/cinder/+bug/1455726 which led to a foreign key error. The cause appears to be related to UTF8. The patch (https://review.openstack.org/#/c/183814/) was not completed so did not get included into the release. More details at the thread at http://lists.openstack.org/pipermail/openstack/2015-August/013601.html.
- Keystone - one of the configuration parameters for caches had changed syntax and this was not reflected in the configuration generated by Puppet. The symptoms were high load on the Keystone servers since caching was not enabled.
- Glance - given the rolling upgrade on Glance, we took advantage of having virtualised the majority of the Glance server pool. This allows new resources to be brought online with a Juno configuration and the old ones deleted.
- Following the upgrade, we had an outage of the metadata service for the OpenStack specific metadata. The EC2 metadata works fine. This appears to be a cells related issue.
- The VM resize functions are giving errors during the execution. We're tracking this with the upstream developers.
- We wanted to use the latest Nova NUMA features. We encountered a problem with cells and this feature, although it worked well in a non-cells cloud. This is being tracked in https://bugs.launchpad.net/nova/+bug/1517006. We will use the new features for performance optimisation once these problems are resolved.
Catching these cases with cells early is part of the work for the scope of the the Cell V2 project at https://wiki.openstack.org/wiki/Nova-Cells-v2 to which we are contributing along with the BARC centre in Mumbai so that the cells configuration becomes the default (with only a single cell) and the upstream test cases are enhanced to validate the multi cell configuration.
As some of the hypervisors are still running Scientific Linux 6, we used the approach from GoDaddy to package the components using software collections. Details are available at https://github.com/krislindgren/openstack-venv-cent6. We used this for nova and ceilometer which are the agents installed on the hypervisors. The controllers were upgraded to CentOS 7 as part of the upgrade to Kilo.
Overall, getting to Kilo enables new features and includes bug fixes to reduce administration effort. Keeping up with new releases requires careful planning and sharing upstream activities such as the Puppet modules but has proven to be the best approach. With many of the CERN OpenStack team in the summit in Tokyo, we did not complete the upgrade before Liberty was released but this has been completed soon afterwards.
With the Kilo base in production, we are now ready to start work on the Nova network to Neutron migration, deployment of the new EC2 API project and enabling Magnum for container native applications.