Tuesday, 5 May 2015

Purging Nova databases in a cell environment

CERN cloud infrastructure is running in production since July 2013 and during almost 2 years more than 1,000,000 production VMs were created in the infrastructure.

During the last few months, we have had an average of 11,000 VMs running with a creation/deletion rate between 100 and 200 VMs per hour.

Number of VMs created (cumulative)

Difference between VMs created and deleted per month (cumulative)

The information of all these instances is stored in the database and remains when the instances are deleted because nova uses “soft” delete (mark the records as deleted without removing them). As a consequence, the database size grows overtime being more difficult to manage and increasing operations time.

At CERN, we have the policy to preserve deleted instances information for 3 months in the database before removing them.

Nova has the functionality to move the deleted instances to “shadow” tables (nova-manage db archive_deleted_rows [--max_rows <number>]). This functionality can 
remove all deleted entries in the main tables or a maximum number of rows can be specified. However, a row doesn’t mean an instance because some tables have several entries for the same instance. Also, in a cloud that uses cells running the “archive_deleted_rows” with “max_rows” defined will not keep the top and children cells in sync.

In order to remove the database entries of deleted instances in a cell environment with a grace period we developed a small tool that is available at:

We start removing deleted instances from the top database defining a date until the rows should be deleted.
python cern-db-purge --date "2015-02-01 00:00:00" --config nova.conf

Cascading doesn’t work in nova database so the script checks if the instances were deleted before the specified date and removes all the rows associated with them in the different tables. Also, it saves the uuid and some more information about an instance in a file. We decided to not delete the instances from top and children at same time to have more operational control during the interventions. 

This file is used to remove the deleted instances from the children cells. 
python cern-db-purge --file "delete_these_instances.txt" --cell 'top_cell!child_cell_01' --config nova.conf

The script goes through all instances in the file for that specific child cell and after some consistency checks removes all the rows related with the instances.
In this way we make sure that if an instance is removed from the top cell is also deleted from the children cells.

Depending in the database size the tool can take several hours to run.

This tool needs access to nova database and was tested with Icehouse release. The database endpoint should be defined in the configuration file (it can be nova.conf). Since it reads and updates the database, the administrator should be extremely careful when using it.