Hyperthreading in the cloud

The cloud at CERN is used for a variety of purposes: running personal VMs for development and testing, bulk throughput computing to analyse the data from the Large Hadron Collider, and long-running services for the experiments and the organisation.

The configuration of many of the hypervisors is carefully tuned to maximise compute throughput, i.e. getting as much compute work done in a given time as possible rather than optimising the performance of individual jobs. Nearly all of the workloads are also embarrassingly parallel, i.e. each unit of compute can run without needing to communicate with other jobs. A few workloads, such as QCD, need classical High Performance Computing, but these run on dedicated clusters with an Infiniband interconnect rather than the 1Gbit/s or 10Gbit/s Ethernet of the typical hypervisor.

CERN has a public procurement procedure which awards tenders to the bid with the lowest price for a given throughput, provided it complies with the specifications. The typical CERN hardware configuration is a dual-socket server and must have at least 2GB of memory per core.

Intel processors provide simultaneous multithreading (SMT), marketed as Hyper-Threading, which runs two hardware threads on each physical core. From the operating system's perspective, this appears as double the number of cores compared to a non-SMT configuration. Switching SMT on or off requires a BIOS parameter change, so the areas of the cloud which are SMT on or off have to be defined statically, in advance, with appropriate capacity planning.

A further benefit of an SMT off configuration, in addition to higher per-core performance, is that the memory per core doubles. A server with 64GB of memory which presents 32 cores with SMT on has 2GB per core. Dropping to 16 cores by switching SMT off gives 4GB per core, which can be useful for some workloads.
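
The memory-per-core arithmetic can be written down as a small sketch. The function name and parameters are illustrative only; the figures are the ones from the example server above (16 physical cores, 64GB of memory, two hardware threads per core).

```python
def memory_per_core_gb(total_memory_gb: float, physical_cores: int,
                       smt_enabled: bool, threads_per_core: int = 2) -> float:
    """Memory available per core as seen by the operating system.

    With SMT on, each physical core is presented as `threads_per_core` cores,
    so the same memory is shared between twice as many visible cores.
    """
    visible_cores = physical_cores * threads_per_core if smt_enabled else physical_cores
    return total_memory_gb / visible_cores


# Example server from the text: 16 physical cores, 64GB of memory.
print(memory_per_core_gb(64, 16, smt_enabled=True))
print(memory_per_core_gb(64, 16, smt_enabled=False))
```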

Setting the BIOS parameters for a subset of the hypervisors causes several difficulties:

With older BIOSes, this is a manual operation. On the most recent hardware, tools are available so that the change can be made with a Linux program and a reboot.
A motherboard replacement requires the operation to be repeated. This can be overlooked as part of the standard repair activities.
Capacity planning requires the allocation of appropriate blocks of servers. At CERN, we use OpenStack cells to allow the cloud to scale to our needs, with each cell having a unique hardware configuration such as a particular processor/memory combination, so dedicated cells need to be created for the SMT off machines. When the capacity of these cells is exceeded, unused resources elsewhere in the cloud cannot trivially be used; further administrative reconfiguration is required.
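
One way to catch a BIOS setting lost after a motherboard replacement is to compare the SMT state reported by the kernel with the state expected for the cell. The following is a minimal sketch, not the procedure used at CERN. It assumes a Linux kernel recent enough to expose /sys/devices/system/cpu/smt/active; the function names and the idea of an "expected" state per host are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumption: the kernel exposes the SMT state here ("1" = active, "0" = not active).
SMT_ACTIVE_PATH = Path("/sys/devices/system/cpu/smt/active")


def smt_is_active(path: Path = SMT_ACTIVE_PATH) -> Optional[bool]:
    """Return True/False for the SMT state, or None if it cannot be read."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return None


def check_smt(expected_active: bool, path: Path = SMT_ACTIVE_PATH) -> str:
    """Compare the reported SMT state with the state expected for this host."""
    actual = smt_is_active(path)
    if actual is None:
        return "unknown: SMT state not exposed by this kernel"
    if actual == expected_active:
        return "ok"
    return f"mismatch: expected SMT {'on' if expected_active else 'off'}"


if __name__ == "__main__":
    # Hypothetical use on a hypervisor that belongs to an SMT off cell.
    print(check_smt(expected_active=False))
```

A check like this could be run by whatever monitoring is already in place after a repair; it only reports the state and does not change any BIOS setting.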

The reference benchmark for High Energy Physics is HEPSpec06 (HS06), a subset of the SPEC CPU2006 benchmarks which matches the typical instruction mix of the workloads. Running it in parallel on each of the cores in a machine measures the throughput provided by a given configuration.

SMT	VM configuration	Throughput HS06
On	2 VMs each 16 cores	351
On	4 VMs each 8 cores	355
Off	1 VM of 16 cores	284.5

Thus, the total throughput of the server with SMT off is significantly lower (284.5 compared to 351) but the individual core performance is higher (284.5/16=17.8 compared to 351/32=11.0). Where some steps of an experiment workflow are serialised, this higher single-core performance is a significant gain, but it comes at the operational cost described above.
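
The comparison above can be reproduced from the table with a few lines of arithmetic. This sketch uses only the measured HS06 totals and core counts quoted above; the variable and function names are illustrative. With these figures, the SMT off server delivers roughly 19% less total throughput but about 62% more per core.

```python
# (SMT setting, VM configuration, cores in use, total HS06) from the table above.
results = [
    ("On", "2 VMs each 16 cores", 32, 351.0),
    ("On", "4 VMs each 8 cores", 32, 355.0),
    ("Off", "1 VM of 16 cores", 16, 284.5),
]


def hs06_per_core(total_hs06: float, cores: int) -> float:
    """Average HS06 delivered by each core in use."""
    return total_hs06 / cores


for smt, config, cores, total in results:
    print(f"SMT {smt:<3} {config:<22} total={total:6.1f} per core={hs06_per_core(total, cores):.1f}")

# Compare SMT off with the first SMT on row, as in the text.
smt_on_total, smt_on_cores = results[0][3], results[0][2]
smt_off_total, smt_off_cores = results[2][3], results[2][2]

total_ratio = smt_off_total / smt_on_total
per_core_ratio = (hs06_per_core(smt_off_total, smt_off_cores)
                  / hs06_per_core(smt_on_total, smt_on_cores))

print(f"Total throughput, SMT off relative to SMT on: {total_ratio:.2f}")
print(f"Per-core throughput, SMT off relative to SMT on: {per_core_ratio:.2f}")
```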

To find a cheaper approach, the recently added NUMA support for flavors in OpenStack was used. The hypervisors were configured with SMT on, but a flavor was created which uses only half of the cores on the server with 4GB/core. The hypervisors are thus under-committed on cores but fully committed on memory, which prevents another VM being allocated to the unused cores. In our configuration, this was done by adding numa_nodes=2 to the flavor (the hw:numa_nodes extra spec in Nova); the NUMA-aware scheduler then does the appropriate allocation.
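
The sizing of such a flavor can be sketched as follows. This is an illustration under stated assumptions rather than the configuration actually deployed: the flavor name is hypothetical, the example server is the 32-thread, 64GB machine used earlier, and the sketch ignores any memory reserved for the hypervisor itself, which a real deployment would need to subtract.

```python
from dataclasses import dataclass


@dataclass
class Hypervisor:
    hardware_threads: int  # cores visible to the OS with SMT on
    memory_gb: int


def half_core_flavor(hv: Hypervisor, name: str) -> dict:
    """Flavor using half of the visible cores but all of the memory.

    Committing all of the memory means no second VM can be scheduled onto
    the cores that are deliberately left unused.
    """
    return {
        "name": name,
        "vcpus": hv.hardware_threads // 2,
        "ram_mb": hv.memory_gb * 1024,
        "extra_specs": {"hw:numa_nodes": "2"},
    }


def as_cli(flavor: dict) -> str:
    """Render the flavor as an `openstack flavor create` command line."""
    props = " ".join(f"--property {k}={v}" for k, v in flavor["extra_specs"].items())
    return (f"openstack flavor create --vcpus {flavor['vcpus']} "
            f"--ram {flavor['ram_mb']} {props} {flavor['name']}")


# Hypothetical flavor name; example server from the text.
flavor = half_core_flavor(Hypervisor(hardware_threads=32, memory_gb=64), "smt-half.example")
print(as_cli(flavor))
print(flavor["ram_mb"] / 1024 / flavor["vcpus"], "GB per vCPU")
```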

This configuration was benchmarked and compared with the SMT on and SMT off configurations.

SMT	VM configuration	Throughput HS06
On	2 VMs each 16 cores	351
Off	1 VM of 16 cores	284.5
On	1 VM of 16 cores with numa_nodes=2	283

The new flavor shows similar characteristics to the SMT off configuration without requiring the BIOS setting change, and can therefore be deployed without dedicated cells for a particular hardware configuration. The Linux and OpenStack schedulers appear to distribute the cores appropriately across the processors.
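
How close the two 16-core configurations are can be quantified with a short calculation on the figures from the table above. The names below are illustrative; with these measurements the NUMA flavor is within about 0.5% of the SMT off result.

```python
# Total HS06 for the two 16-core configurations in the table above.
SMT_OFF_HS06 = 284.5
NUMA_FLAVOR_HS06 = 283.0


def relative_difference_percent(value: float, baseline: float) -> float:
    """Difference of `value` from `baseline`, as a percentage of the baseline."""
    return (value - baseline) / baseline * 100.0


diff = relative_difference_percent(NUMA_FLAVOR_HS06, SMT_OFF_HS06)
print(f"NUMA flavor relative to SMT off: {diff:+.2f}%")
```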


OpenStack in Production - Archives

Thursday 29 September 2016
