XtratuM hypervisor redesign for LEON4 multicore processor E. Carrascosa, M. Masmano, P. Balbastre and A. Crespo, Universidad Politécnica de Valencia, Spain
Outline Motivation/Introduction XtratuM hypervisor XtratuM SMP Evaluation Conclusions
European aerospace sector Quite a conservative sector. However, systems have increasingly powerful processors and larger memories: the same workload that once required hundreds of processors now requires just a few. So why not use less hardware to run the same software? Isolation between execution components must be guaranteed. Hence the interest in adopting a TSP-based architecture principle, already used in the avionics sector (IMA).
Time & Space Partitioning (TSP) (I) A TSP-based architecture defines a set of isolated execution environments, guarantees temporal and spatial isolation of the components of the system, and enables modules with a low criticality level to be consolidated with high-criticality ones on the same computer.
Time & Space Partitioning (TSP) (II) Software (several technological solutions): hypervisor, partitioning kernel, micro-kernel. The ARINC-653 standard (avionics) defines the interface for building TSP systems (partitioning kernel).
XtratuM hypervisor (I) Bare-metal, open-source hypervisor designed to meet safety-critical real-time requirements. Uses paravirtualization techniques. Supported architectures: x86, LEON2/3/4 (SPARCv8), ARM Cortex-R4F. Strong temporal isolation: fixed cyclic scheduler. Strong spatial isolation: partitions are executed in user mode and do not share physical memory pages.
XtratuM hypervisor (II) Robust communication mechanisms (ARINC sampling/queuing ports). Robust error management via a health monitor. Devices can be directly managed by partitions; shared devices can be organised in an IOServer. All resources are allocated via an XML configuration table. Tracing facilities.
Hypercalls example
High-level hypercalls:
xm_s32_t XM_create_queuing_port(char *portName, xm_u32_t maxNoMsgs, xm_u32_t maxMsgSize, xm_u32_t direction);
xm_s32_t XM_hm_status(xmHmStatus_t *hmStatusPtr);
xm_s32_t XM_trace_event(xm_u32_t bitmask, xmTraceEvent_t *event);
Low-level hypercalls:
void XM_sparc_set_psr(xm_u32_t psr);
void XM_sparc_flush_cache(void);
void XM_sparc_flush_regwin(void);
void XM_rett(void);
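To illustrate the calling pattern a partition might follow, here is a minimal sketch around the XM_create_queuing_port prototype from the slide. The stub body, the typedefs, and the XM_SOURCE_PORT constant are stand-ins added for illustration so the pattern compiles outside the hypervisor; they are not the real XtratuM implementation.

```c
#include <stddef.h>

/* Illustrative stand-ins for XtratuM's integer types. */
typedef int xm_s32_t;
typedef unsigned int xm_u32_t;

/* Assumed direction constant, for illustration only. */
#define XM_SOURCE_PORT 1

/* Mock of the high-level hypercall: the real body traps into the
 * hypervisor; here we only validate arguments and hand out a
 * non-negative port descriptor, mimicking the success/error contract. */
xm_s32_t XM_create_queuing_port(char *portName, xm_u32_t maxNoMsgs,
                                xm_u32_t maxMsgSize, xm_u32_t direction) {
    static xm_s32_t nextDesc = 0;
    (void)direction;
    if (portName == NULL || maxNoMsgs == 0 || maxMsgSize == 0)
        return -1;              /* negative value signals an error */
    return nextDesc++;          /* port descriptor on success */
}

/* Typical partition-side initialisation: open a queuing port once,
 * then use the returned descriptor for send/receive hypercalls. */
xm_s32_t open_telemetry_port(void) {
    return XM_create_queuing_port("telemetry", 16, 128, XM_SOURCE_PORT);
}
```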
XML configuration file
<SystemDescription version="1.0.0">
  [...]
  <ProcessorTable>
    <Processor id="0" frequency="80Mhz">
      <CyclicPlanTable>
        <Plan id="0" majorFrame="2000ms">
          <Slot id="0" start="0ms" duration="1000ms" partitionId="0"/>
        </Plan>
      </CyclicPlanTable>
    </Processor>
  </ProcessorTable>
  [...]
  <PartitionTable>
    <Partition id="0" name="partition1" flags="system" console="uart">
      <PhysicalMemoryAreas>
        <Area start="0x40180000" size="256KB" mappedAt="0x40000000"/>
      </PhysicalMemoryAreas>
    </Partition>
  </PartitionTable>
</SystemDescription>
Configuration & Deployment
XtratuM Roadmap (2004-2014). CNES: Centre National d'études Spatiales; ESA: European Space Agency; AstriumST: Astrium Transportation (Airbus). [Timeline: LKM and x86 prototypes, followed by LEON2 (CNES) and LEON3 (ESA) prototypes; preindustrialised x86, LEON2 (CNES) and LEON3 (ESA); certifiable (ECSS) LEON2/3 (CNES/ESA); then LEON4-SMP (ESA), x86 SMP, ARM Cortex (AstriumST) and LEON3-SMP (CNES).]
XtratuM-SMP Project System impact of distributed multicore systems. Participants: Astrium-SAT, UPV. Funded by ESA. Aims: support SMP, support the IOMMU, and test the NGMP processor. Board: FPGA LEON4 NGMP.
LEON4-NGMP processor Fault-tolerant, synthesizable VHDL. Quad-core 32-bit LEON4 (SPARCv8). MMU. Two shared FPUs. 4x4KB instruction/data L1 caches. 256KB L2 cache. System frequency: 50MHz. Developed by Aeroflex Gaisler (funded by ESA).
SMP model extensions (I) Booting: CPU0 sets up the system; CPU1-3 set up their local info. Scheduler synchronisation. Devices: a global clock (one clock for the whole system) and a local timer per CPU. IPIs for synchronisation.
SMP model extensions (II) Partition model extension: inclusion of the VCPU concept, with the number of VCPUs restricted by the configuration. It mimics the underlying CPU behaviour: all VCPUs are halted except VCPU0, which is in charge of starting up the others. VCPUs are local to partitions. New hypercalls to manage VCPUs: XM_get_vcpuid(), XM_reset_vcpu(), XM_halt_vcpu(). The behaviour of some hypercalls (XM_halt_partition, ...) has been updated to affect all VCPUs within the partition.
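The per-partition VCPU lifecycle described above can be sketched as a small state machine. This is a toy model added for illustration: the names (vcpu_state, NVCPUS, start_vcpu, halt_partition) are hypothetical, and the functions only mirror the roles of XM_reset_vcpu() and the updated XM_halt_partition(), not their real implementations.

```c
/* Toy model: after a partition reset, all VCPUs are halted except
 * VCPU0, which is in charge of starting the others; halting the
 * partition affects every VCPU in it. */
enum vcpu_state { VCPU_HALTED, VCPU_RUNNING };

#define NVCPUS 4
enum vcpu_state vcpu[NVCPUS];

void partition_reset(void) {
    for (int i = 0; i < NVCPUS; i++)
        vcpu[i] = VCPU_HALTED;
    vcpu[0] = VCPU_RUNNING;     /* only VCPU0 comes up running */
}

/* Mirrors the role of XM_reset_vcpu(): VCPU0 brings up a sibling. */
int start_vcpu(int id) {
    if (id <= 0 || id >= NVCPUS)
        return -1;              /* VCPU0 is already up; id must be valid */
    vcpu[id] = VCPU_RUNNING;
    return 0;
}

/* Mirrors the updated XM_halt_partition(): halts all VCPUs. */
void halt_partition(void) {
    for (int i = 0; i < NVCPUS; i++)
        vcpu[i] = VCPU_HALTED;
}
```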
SMP model extensions (III) [Figure: three partitions (Partition0, Partition1, Partition2), each with a VCPU0 and a VCPU1, running on top of XtratuM, which maps them onto two physical cores, CPU0 and CPU1.]
SMP model extensions (IV) Local cyclic scheduler: each CPU has its own plan, and VCPUs are referenced in the plan, e.g. <Slot start="0ms" duration="200ms" partitionId="0" vCpuId="0"/>. Same MAF for all the plans; the plans are synchronised at the start of each hyperperiod. A VCPU cannot be allocated to more than one CPU. [Figure: over two MAFs, CPU0 cycles through the (partitionId, vCpuId) slots (0,0), (1,0), (0,1), while CPU1 runs (1,1).]
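The per-CPU plan lookup can be sketched as a simple table walk over a shared major frame. This is an illustrative model, not XtratuM's code: the Slot layout, the 600 ms MAF, and the example CPU0 plan (taken from the slide's (partition, VCPU) sequence with assumed 200 ms slots) are all hypothetical.

```c
/* Toy per-CPU cyclic plan: slots are (start, duration, partition, vcpu)
 * within a major frame (MAF). All CPUs share the same MAF and
 * resynchronise their plans at the start of each hyperperiod. */
typedef struct {
    unsigned start, duration;   /* milliseconds within the MAF */
    int partitionId, vcpuId;
} Slot;

#define MAF 600u                /* assumed 600 ms major frame */

/* CPU0 plan from the slide's example: (0,0), (1,0), (0,1). */
const Slot cpu0_plan[3] = {
    {  0, 200, 0, 0},
    {200, 200, 1, 0},
    {400, 200, 0, 1}
};

/* Returns the index of the slot active at absolute time t (ms),
 * or -1 if the CPU is idle at that instant. */
int slot_at(const Slot *plan, int nslots, unsigned t) {
    unsigned off = t % MAF;     /* the plan repeats every MAF */
    for (int i = 0; i < nslots; i++)
        if (off >= plan[i].start && off < plan[i].start + plan[i].duration)
            return i;
    return -1;
}
```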
SMP model extensions (V) Local fixed-priority scheduler: it can be used instead of the cyclic scheduler. Each VCPU is configured with a priority; a VCPU can be preempted by a higher-priority one. Included to experiment with alternative scheduling policies; improves interrupt responsiveness.
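The selection rule of such a scheduler reduces to picking the highest-priority ready VCPU, which also captures preemption: whenever a higher-priority VCPU becomes ready, the pick changes. A minimal sketch, with hypothetical names (Vcpu, pick_vcpu) and the assumption that a larger number means higher priority:

```c
/* Toy fixed-priority pick: among ready VCPUs, the highest priority
 * wins. Re-running the pick after readiness changes models preemption
 * of the current VCPU by a newly ready, higher-priority one. */
typedef struct {
    int priority;               /* larger number = higher priority */
    int ready;                  /* nonzero if eligible to run */
} Vcpu;

/* Example sets: in the second one, the priority-5 VCPU has become
 * ready and therefore preempts the priority-3 one. */
const Vcpu no_hi[3]   = { {1, 1}, {5, 0}, {3, 1} };
const Vcpu with_hi[3] = { {1, 1}, {5, 1}, {3, 1} };

/* Returns the index of the VCPU to run, or -1 if none is ready. */
int pick_vcpu(const Vcpu *v, int n) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (v[i].ready && (best < 0 || v[i].priority > v[best].priority))
            best = i;
    return best;
}
```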
SMP model extensions (VI) IOMMU support (an ESA requirement): provides I/O isolation at bus level (DMA or other master devices). The IOMMU tables are statically configured as part of the XML configuration file and cannot be changed during execution.
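Conceptually, the statically configured tables grant each bus master a fixed set of memory windows, and a DMA transfer is legal only if it falls entirely inside one of them. The sketch below is an illustrative model: the Window structure, the dma_allowed check, and the example addresses are all hypothetical, not XtratuM's actual IOMMU table format.

```c
/* Toy model of statically configured IOMMU tables: each bus master is
 * granted a fixed list of allowed DMA windows at configuration time. */
typedef struct {
    unsigned base, size;        /* physical window [base, base+size) */
} Window;

#define NWIN 2
/* Example windows granted to one DMA-capable master. */
const Window master0[NWIN] = {
    {0x40100000u, 0x10000u},
    {0x40300000u, 0x04000u}
};

/* A transfer is allowed only if it fits entirely inside one window;
 * anything else would breach another partition's spatial isolation. */
int dma_allowed(const Window *w, int n, unsigned addr, unsigned len) {
    for (int i = 0; i < n; i++)
        if (addr >= w[i].base && addr + len <= w[i].base + w[i].size)
            return 1;
    return 0;
}
```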
Evaluation (I) Aim: measure the impact of the hypervisor layer. Benchmarks: Dhrystone and CoreMark. Assumptions: one partition, one CPU. Measured overhead: hypercalls.
Evaluation (II) Aim: measure the impact of the hypervisor layer and the context switch. Benchmarks: Dhrystone and CoreMark. Assumptions: one partition, one CPU, different slot durations. Measured overhead: hypercalls + context switch.
Evaluation (III) Aim: measure the impact of the hypervisor layer, the context switch and several CPUs. Benchmarks: Dhrystone and CoreMark. Assumptions: several partitions, several CPUs. Measured overhead: hypercalls + context switch + CPU interactions.
Conclusions XtratuM has been adapted to support SMP architectures. Partitions also implement SMP via the VCPU concept; new hypercalls were required. The IOMMU is supported: partitions can use DMA. Board evaluation shows little overhead caused by XtratuM internals (context switches, hypercalls) and almost parallel access to the L2 cache content (128-bit bus width).