GENIVI Lifecycle Webcast 30th January 2014




GENIVI Lifecycle Webcast, 30th January 2014
29-Jan-14
David Yates, Continental Automotive GmbH, Lifecycle topic owner and SysArch member
Dashboard image reproduced with the permission of Visteon and 3M Corporation.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Scope of Presentation
The aim of this presentation is to provide an overview of the Lifecycle architecture within GENIVI, detailing where we believe the automotive world requires extensions to existing open source solutions.
The following topics will be covered:
- Welcome & Introduction
- Lifecycle Domain Overview
- Component Overview
- Startup/Shutdown Concept
- Introduction to NSM (session management)
- Introduction to Resource Management
- Roadmap
- Call to action
- Location of further information (AMM presentations)

Lifecycle Domain Overview
[Diagram: the Lifecycle Domain and its interfaces]
Interfaces shown:
- Set events (session information)
- Request state change
- Get node states
- Node state change notification
- Set last user context
- Get internal supply/thermal states
- Supply state change notification
- Thermal state change notification
Areas inside the Lifecycle Domain: Supply, Thermal, Node State, Power, Boot, Resource
Functions named in the diagram: startup/shutdown management, resource limitation, run time observation
Also shown: Persistency, Log & Trace

Lifecycle
[Diagram: component overview]
Legend: Manifest Package, Product Component, Platform Component
Areas: Supply, Node State, Boot, Thermal, Power, Resource
Components shown:
- Node State Machine
- Node State Manager
- Node Startup Controller
- Supply Manager
- Thermal Manager
- systemd
- cgroup service
- Power Event Collector
- Node Resource Mgr
- Node Health Monitor

Lifecycle Concept
[Diagram: lifecycle concept, labels as far as recoverable]
Plug-ins: for ADC, PMIC; for sensors, devices; for wakeup reason, node / vehicle network; for application-specific observing and recovering
Supply: state chart; reaction on conditions (turn off display, drives, mute audio, ...)
Thermal: state chart; reaction on conditions (turn on fan, reduce audio volume, ...)
Power: state change notifications (*1)
Events (*1): Good, Poor, Bad, Clamp State, Button WU, Bus WU, Vehicle Network
Node State: state chart; Set LUC (Last-User-Context); session handling (Phone, Diag, SWL, ...); node state change protocol; shutdown management
Boot: boot config; boot HMI, Phone, SWL/Update, Diagnostics
Resource: node observing for CPU load, memory, application crash
Other labels: Ctrls, Node Resource config

Startup/Shutdown
Boot Management takes care of startup. Node State Management takes care of shutdown.
Why do we have this split?
systemd stops and unloads all components as part of its shutdown concept. If a shutdown is cancelled, it takes a lot of time to make those components functional again. An IVI system must be able to resume operation without losing any context and without the need for a reboot.
Therefore Node State Management only calls registered consumers in the shutdown phase. This event notification drives the components into a stable state and ensures that everything needed for the next startup has been stored.
With this approach components are not actually shut down, although a real shutdown is required in certain exceptional cases such as the flash file system. The shutdown management concept therefore additionally uses the systemd shutdown concept where appropriate for legacy or critical components.
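A minimal sketch of the notify-instead-of-unload idea described above. All class and method names are hypothetical and are not the Node State Manager API; the order in which consumers are called is left out here.

```python
from typing import Callable, List

# Illustrative only: a consumer is any callable that receives
# "shutdown" or "runup". These names are assumptions of this sketch.
Consumer = Callable[[str], None]


class ShutdownRegistry:
    """Notifies registered consumers instead of stopping them."""

    def __init__(self) -> None:
        self._consumers: List[Consumer] = []
        self.shutdown_pending = False

    def register(self, consumer: Consumer) -> None:
        self._consumers.append(consumer)

    def request_shutdown(self) -> None:
        # Consumers store what they need for the next startup,
        # but their processes stay loaded.
        self.shutdown_pending = True
        for consumer in self._consumers:
            consumer("shutdown")

    def cancel_shutdown(self) -> None:
        # Nothing was unloaded, so resuming is just another notification.
        if not self.shutdown_pending:
            return
        self.shutdown_pending = False
        for consumer in self._consumers:
            consumer("runup")


if __name__ == "__main__":
    registry = ShutdownRegistry()
    registry.register(lambda event: print("navigation received", event))
    registry.request_shutdown()
    registry.cancel_shutdown()
```

The point of the sketch is that a cancelled shutdown costs one extra callback per consumer rather than a full restart of each component.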

Node State Manager: Shutdown preparation in Startup Phase
[Diagram: startup sequence, labels in the order they appear]
Before systemd: kernel, initrd
systemd (runlevel replacement, with GENIVI extensions):
- Mandatory targets (Base System & Early Features)
- Start NSM via systemd
- BASE_RUNNING (during Node Startup Controller init)
- focussed.target (last user context)
- LUC_RUNNING
- unfocussed.target(s)
- FULLY_RUNNING
- FULLY_OPERATIONAL
- lazy.target
Markers A, B, C ... J appear along the sequence.
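The four node states named on this slide can be modelled as an ordered sequence. The sketch below assumes that startup only ever moves one step forward through that sequence; the enum and function names are illustrative, not taken from the NSM code.

```python
from enum import Enum
from typing import Optional


class NodeState(Enum):
    BASE_RUNNING = "BASE_RUNNING"
    LUC_RUNNING = "LUC_RUNNING"
    FULLY_RUNNING = "FULLY_RUNNING"
    FULLY_OPERATIONAL = "FULLY_OPERATIONAL"


# Order as listed on the slide.
STARTUP_SEQUENCE = [
    NodeState.BASE_RUNNING,
    NodeState.LUC_RUNNING,
    NodeState.FULLY_RUNNING,
    NodeState.FULLY_OPERATIONAL,
]


def next_state(current: NodeState) -> Optional[NodeState]:
    """Return the following startup state, or None at the end."""
    index = STARTUP_SEQUENCE.index(current)
    if index + 1 < len(STARTUP_SEQUENCE):
        return STARTUP_SEQUENCE[index + 1]
    return None


def is_valid_startup_transition(old: NodeState, new: NodeState) -> bool:
    # Assumption of this sketch: no state may be skipped during startup.
    return next_state(old) is new
```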

Node State Manager: Shutdown Execution
[Diagram: shutdown sequence, labels in the order they appear]
- Consumer J
- Consumer I
- Writing LUC
- Consumer H
- Node Startup Controller / systemd: app1.service
- Consumer G
- Consumer F
- Consumer E
- Node Startup Controller / systemd: app2.service
- Consumer D
- Writing LUC
- Consumer C
- Consumer B
- Consumer A
- Node Startup Controller / systemd: Shutdown.target (flash file systems)
Enables:
1. Shutdown activities can be triggered without unloading the components.
2. Legacy components can be shut down in their traditional way.
3. Full flexibility on where to integrate systemd based shutdown units.
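The sequence above mixes consumer notifications with systemd units. A small sketch of such a plan follows, assuming that shutdown simply walks the registration order backwards (the slide shows consumers J down to A, which suggests this but does not state it). The `Step` type and the shortened example list are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    kind: str    # "notify" for an NSM consumer, "stop_unit" for a systemd unit
    target: str


def shutdown_plan(registration_order: List[Step]) -> List[Step]:
    """Return the steps in reverse registration order."""
    return list(reversed(registration_order))


# Shortened example; unit and consumer names follow the slide.
registered = [
    Step("stop_unit", "Shutdown.target"),
    Step("notify", "Consumer A"),
    Step("notify", "Consumer B"),
    Step("stop_unit", "app2.service"),
    Step("notify", "Consumer J"),
]

if __name__ == "__main__":
    for step in shutdown_plan(registered):
        print(step.kind, step.target)
```

Because a systemd unit is just another entry in the list, a legacy component can be placed at any position between consumer notifications.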

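NSM Session
[Diagram: session handling between applications, the Node State Manager and the Node State Machine]
Applications shown: Phone, SWL, Audio, HMI, Navigation (Navi)
Node State Manager data: Node State, PhoneSession, SWLSession, ...
Node State Machine input (events/data): Node Session State, PhoneSession, SWLSession, ...
Interactions shown: Set method, Signal, Lifecycle Requests, Shutdown, Request system restart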
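A rough sketch of the session idea: applications set a named session state and interested parties are notified of changes. The state values, class name and the use made of the sessions are assumptions of this sketch and are not the NSM interface.

```python
from typing import Callable, Dict, List

# Hypothetical listener signature: (session name, new state).
SessionListener = Callable[[str, str], None]

VALID_STATES = ("active", "inactive")


class SessionRegistry:
    def __init__(self) -> None:
        self._states: Dict[str, str] = {}
        self._listeners: List[SessionListener] = []

    def subscribe(self, listener: SessionListener) -> None:
        self._listeners.append(listener)

    def set_state(self, name: str, state: str) -> None:
        if state not in VALID_STATES:
            raise ValueError(f"unknown session state: {state}")
        if self._states.get(name) == state:
            return  # no change, no signal
        self._states[name] = state
        for listener in self._listeners:
            listener(name, state)

    def get_state(self, name: str) -> str:
        # Sessions that were never set are treated as inactive.
        return self._states.get(name, "inactive")

    def any_active(self) -> bool:
        return "active" in self._states.values()
```

A product-specific state machine could, for example, use `any_active()` to keep the node up while a PhoneSession is still running; that policy is an illustration, not something the slide specifies.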

Resource Management - Goals
Resource management contains the functionality to ensure that the node runs in a stable and defined manner. To do this, it will monitor and limit different aspects of SW component behavior, including system resources (e.g. CPU load and memory) and critical run-time observation.
Resource allocation will be configurable on a component basis through the use of cgroups.
Health management will provide a configurable escalation strategy defining actions to be taken in the case of system failures.
Note: security handling for resources (i.e. restricted access to resources) is not included.

Health
Health management will ensure that the node runs in a stable and defined manner. To do this it is planned to have the following multi-layered observation system and escalation strategy:
[Diagram: observation and escalation layers, labels as far as recoverable]
- Applications: read/write data (Persistence); notify alive
- Recovery Apps / Recovery Client: execute recovery; delete app data; request app restart; request node restart
- systemd: start/restart applications and recovery apps
- NHM: monitoring of userland; request node restart; notify alive and monitor node status
- NSM: notify alive
- /dev/watchdog: forward NHM heartbeat externally or to internal HW watchdog

Concepts for the System Health - NHM
The Node Health Monitor will work in conjunction with systemd to monitor component failures in the system. It will be responsible for:
- Monitoring systemd to automatically record and track application failures
- Providing an interface with which components can register failures when not using the systemd watchdog mechanism
- Maintaining failure statistics over multiple lifecycles for the system and components
  - the service name will be used to identify and track component failures
  - statistics on the number of failures in a number of lifecycles will be maintained (e.g. 3 failures in the last 32 lifecycles)
- Monitoring the wakeup and shutdown events to catch unexpected system restarts
- Providing an interface for components to read system and component error counts
- Providing an interface for recovery applications to request a node restart
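The "3 failures in the last 32 lifecycles" statistic can be kept as a sliding window per service name. The following is a minimal in-memory sketch under that assumption; the class and method names are hypothetical, and a real monitor would have to persist the history across lifecycles.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailureStats:
    """Tracks, per service name, in which recent lifecycles it failed."""

    def __init__(self, window: int = 32) -> None:
        # Each history holds one flag per lifecycle, newest last.
        self._history: Dict[str, Deque[bool]] = defaultdict(
            lambda: deque([False], maxlen=window)
        )

    def start_lifecycle(self) -> None:
        # Open a new, so far failure-free slot for every known service;
        # the oldest slot drops out once the window is full.
        for history in self._history.values():
            history.append(False)

    def record_failure(self, service: str) -> None:
        self._history[service][-1] = True

    def failure_count(self, service: str) -> int:
        """Number of lifecycles in the window in which the service failed."""
        return sum(self._history[service])
```

This variant counts lifecycles with at least one failure; counting every individual failure would need an integer per slot instead of a flag.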

Concepts for the System Health - NHM (cont.)
Additionally the Node Health Monitor will test a number of product-defined criteria with the aim of ensuring that userland is stable and functional. For instance it will be able to validate that:
- there is enough free system memory
- the CPU is not reporting an excessively high load for a sustained period
- defined files are accessible
- defined processes are still running
- communication is possible (D-Bus)
- a user-defined process can be executed with an expected result
If the NHM believes that there is an issue with userland, it will be able to initiate a system restart.
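Product-defined criteria like these can be expressed as a table of named checks. The sketch below covers three of them using only the Python standard library; the helper names, the example path and the thresholds are assumptions, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os
import subprocess
import sys
from typing import Callable, Dict, List

Check = Callable[[], bool]


def file_accessible(path: str) -> Check:
    return lambda: os.access(path, os.R_OK)


def load_below(limit: float) -> Check:
    # Uses the 1-minute load average as a stand-in for "sustained" load.
    return lambda: os.getloadavg()[0] < limit


def command_returns(argv: List[str], expected: int = 0,
                    timeout: float = 5.0) -> Check:
    def check() -> bool:
        try:
            result = subprocess.run(argv, capture_output=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            return False
        return result.returncode == expected
    return check


def failed_checks(checks: Dict[str, Check]) -> List[str]:
    """Run every check and return the names of those that failed."""
    return [name for name, check in checks.items() if not check()]


if __name__ == "__main__":
    checks = {
        "interpreter runs": command_returns([sys.executable, "-c", "pass"]),
        "config readable": file_accessible("/nonexistent/example.conf"),
    }
    print(failed_checks(checks))
```

A non-empty result would be the trigger for the escalation described on the slide, for example a request for a system restart.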

Concepts for the System Health - Node Wdog
It is proposed to use, when supported, a low-level HW watchdog to validate that systemd is running correctly.
A typical watchdog implementation is able to initiate an emergency shutdown process when it believes that a failure has occurred:
- idle init, so nothing new can be started
- kill all processes
- write a reboot record to wtmp
- turn off accounting
- turn off quota
- turn off swap
- unmount all mounted partitions
NOTE: In this scenario a normal system shutdown will not be completed, therefore cached persistent data from that lifecycle will be lost.

Concepts for the System Health - systemd
systemd provides watchdog functionality for monitoring and restarting failing services in the system, and for sending heartbeats itself to a HW watchdog.
Within a service unit file it is possible to configure systemd to expect a heartbeat from the service within a particular time interval (WatchdogSec=). If this heartbeat is not received, settings in the application's unit file direct how systemd behaves. Typically this will result in the application being automatically restarted (Restart=).
The problem is that this can result in a cyclic restart scenario, with only limited options (StartLimitInterval=, StartLimitBurst=) to influence the restart behavior.
Therefore, it is proposed that recovery applications are started automatically by systemd (OnFailure=) in case of failures.
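On the service side, the heartbeat expected by WatchdogSec= is a small datagram sent to the socket that systemd names in the NOTIFY_SOCKET environment variable. The sketch below shows that protocol in plain Python; the function names are this sketch's own, and choosing half of WATCHDOG_USEC as the ping interval follows the common recommendation rather than anything stated on the slide.

```python
import os
import socket
from typing import Optional


def sd_notify(message: str) -> bool:
    """Send a notification such as "READY=1" or "WATCHDOG=1" to systemd.

    Returns False when the process is not supervised by systemd.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # A leading "@" denotes an abstract namespace socket.
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), address)
    return True


def heartbeat_interval_seconds() -> Optional[float]:
    """Half of the configured watchdog timeout, or None if disabled."""
    usec = os.environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    return int(usec) / 1_000_000 / 2
```

A service would call `sd_notify("WATCHDOG=1")` from its main loop at the returned interval, and only while it considers itself healthy.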

Concepts for the System Health - Recovery Client
A Recovery Client is a component that is executed when a failure has been detected in the system. There can be a one-to-one relationship between apps and recovery clients, or one client can handle multiple apps. It should contain enough functionality to be able to:
- request the error status count from the NHM, providing the name of the failing service file
- attempt recovery based on the error count, for instance:
  - if a file system fails to mount, the recovery action could be to format the file system and request a node restart
  - if an application has failed multiple times, we may want to delete that application's persistency data and restart the application
  - when possible, request that the SW is uninstalled or rolled back
- request systemd to restart the application
- request a node restart via the NHM
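The error-count-based recovery described above amounts to an escalation table. A minimal sketch follows; the ordering of the actions and all threshold values are assumptions chosen for illustration, since the slide leaves the strategy to the product.

```python
from enum import Enum


class RecoveryAction(Enum):
    RESTART_APPLICATION = "request systemd to restart the application"
    RESET_PERSISTENT_DATA = "delete persistency data and restart the application"
    ROLL_BACK_SOFTWARE = "request that the SW is uninstalled or rolled back"
    RESTART_NODE = "request a node restart via the NHM"


def choose_action(failure_count: int,
                  data_reset_at: int = 3,
                  rollback_at: int = 5,
                  node_restart_at: int = 8) -> RecoveryAction:
    """Pick a recovery action from the failure count reported by the NHM.

    The thresholds are hypothetical defaults, not GENIVI-defined values.
    """
    if failure_count >= node_restart_at:
        return RecoveryAction.RESTART_NODE
    if failure_count >= rollback_at:
        return RecoveryAction.ROLL_BACK_SOFTWARE
    if failure_count >= data_reset_at:
        return RecoveryAction.RESET_PERSISTENT_DATA
    return RecoveryAction.RESTART_APPLICATION
```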

Resource
[Diagram: resource management, labels as far as recoverable]
Areas: Resource, Node State Mgmt
Components: systemd, cgroups, Node Resource Manager, Node State Manager, Application Component, Supply Control Logic
Responsibilities shown:
- Starts services
- Configure cgroups
- Control system resources
- Report/handle resource allocation errors
- Monitor system resources
- Kill resource abusers
- Evaluate node restart requests
- Handle node restart requests
Other label: P3

Example cgroup configuration (CPU)
[Diagram: cgroup hierarchy with CPU settings]
Groups and settings:
- ROOT: unlimited
- AUTOMOTIVE: cpu.shares = 50, runtime = 100, period = 1000
- APPS: cpu.shares = 20, runtime = 500, period = 2000
- BGND: cpu.shares = 1, ...
Components shown: Radio, NAV, Speech, Weather, 3rd party APPS, Media, Phone, Browser, Diagnostics, Safety Cameras, Positioning, Comm Stacks, Background tasks, Infrastructure Services, SW Loading, Vehicle Network, PDC, Kernel
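To make the example numbers concrete: cpu.shares values are relative weights between sibling groups when the CPU is contended. The sketch below computes the resulting fractions, assuming AUTOMOTIVE, APPS and BGND are siblings and all busy at once; it also assumes that runtime/period describes a bandwidth budget per period, which the slide does not spell out.

```python
from typing import Dict


def contended_cpu_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Fraction of CPU time each sibling group gets when all are busy."""
    total = sum(shares.values())
    return {group: value / total for group, value in shares.items()}


def bandwidth_cap(runtime: int, period: int) -> float:
    """Upper bound on CPU time as a fraction of each period (assumed meaning)."""
    return runtime / period


if __name__ == "__main__":
    fractions = contended_cpu_fractions({"AUTOMOTIVE": 50, "APPS": 20, "BGND": 1})
    for group, fraction in fractions.items():
        print(f"{group}: {fraction:.1%}")
    print("AUTOMOTIVE cap:", bandwidth_cap(100, 1000))
    print("APPS cap:", bandwidth_cap(500, 2000))
```

With these weights the background group receives only 1/71 of the CPU under full contention, while it can still use idle time freely when the other groups are not busy.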

Example cgroup configuration (Memory)
[Diagram: cgroup hierarchy with memory settings]
Groups and settings:
- ROOT: unlimited
- APPS: memory.limit_in_bytes = 200M, ...
- BGND: memory.limit_in_bytes = 10M
Components shown: Radio, NAV, Speech, Weather, 3rd party APPS, Media, Phone, Comm Stacks, PDC, Browser, Diagnostics, Safety Cameras, Positioning, Background tasks, Infrastructure Services, SW Loading, Vehicle Network, Kernel
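The limits above use the suffix notation accepted by memory.limit_in_bytes, where K, M and G are powers of 1024. A small sketch for turning such a value into bytes and comparing it with a measured usage; the function names are hypothetical and the usage figure would come from the cgroup's own accounting.

```python
_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_limit(text: str) -> int:
    """Convert a value such as "200M" or "4096" into bytes."""
    text = text.strip()
    factor = _SUFFIXES.get(text[-1:].upper())
    if factor is None:
        return int(text)
    return int(text[:-1]) * factor


def exceeds_limit(usage_bytes: int, limit: str) -> bool:
    return usage_bytes > parse_limit(limit)


if __name__ == "__main__":
    print(parse_limit("200M"))
    print(exceeds_limit(12 * 1024 * 1024, "10M"))
```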

Lifecycle Roadmap
[Table: components against the columns Gemini, Horizon and Roadmap. The per-column entries (specific / abstract / placeholder) cannot be assigned to cells from this transcript.]
- systemd, cgroup service: adopted components from the OSS community
- Node Startup Controller: owned component, funded by GENIVI, implemented by Codethink
- Node State Manager: owned component, implemented by Continental
- Node State Machine: product-specific extension to the Node State Manager
- Node Resource Mgr: owned component, to be implemented by Continental
- Node Health Monitor: owned component, implemented by Continental

Call to action
We hope today's presentation has interested you in what we are working on within GENIVI and in the Open Source Software that we have already released and plan to release in the future.
The components described today have been defined and created within the GENIVI consortium as Open Source Software under the MPLv2 licence. For that reason the code is freely available in a public git repository outside of GENIVI.
If you are interested in the components and can see other potential uses in your domains, then please check out the links on the following slides.
We are very open to input and requirements from all interested parties, so please ask questions and get involved.
http://projects.genivi.org/node-state-manager/about

Call to action (continued)
For those already working inside GENIVI who wish to contribute directly in the System Infrastructure group: we are always looking for more participants and have many ongoing topics which might interest you:
- Persistency
- User
- SW
- IPC
- Diagnostics
Please feel free to check out the GENIVI Wiki page, where you can find more information about the above topics and how to participate in our weekly telephone conference calls.
https://collab.genivi.org/wiki/display/genivi/system+infrastructure+expert+group

Further Information
If you are interested in further information regarding the GENIVI Lifecycle concept, you can find it in the GENIVI Wiki and on the public project page:
https://collab.genivi.org/wiki/display/genivi/sysinfraeglifecycledef (restricted)
http://projects.genivi.org/node-state-manager/about (open)
All presentations of the concepts can be found using the "Lifecycle Presentations" link.
The code for the Node State Manager, the Node Startup Controller and the Node Health Monitor can be found in the GENIVI git:
http://git.projects.genivi.org/?p=lifecycle/node-state-manager.git
http://git.projects.genivi.org/?p=lifecycle/node-startup-controller.git
http://git.projects.genivi.org/?p=lifecycle/node-health-monitor.git
You can also contact me directly (David.Yates@continental-corporation.com).
Copyright GENIVI Alliance 2012