Quality Assurance for Stable Server Operation



Similar documents
Fujitsu s Activities for Quality Assurance

Fujitsu s Approach to Cloud-related Information Security

On-Demand Virtual System Service

Engineering Cloud: Flexible and Integrated Development Environment

Information Technology Engineers Examination. Network Specialist Examination. (Level 4) Syllabus. Details of Knowledge and Skills Required for

High-Speed Thin Client Technology for Mobile Environment: Mobile RVEC

Cloud Computing Based on Service- Oriented Platform

Thermal Power Plant Asset Management with Asset-centric Data Model

Security Technology for Smartphones

Media Cloud Service with Optimized Video Processing and Platform

Approach to Energy Management for Companies

Operation Efficiency Improvements for IT Infrastructure through Runbook Automation Technology

System Management and Operation for Cloud Computing Systems

Development and Runtime Platform and High-speed Processing Technology for Data Utilization

The Evolution of Load Testing. Why Gomez 360 o Web Load Testing Is a

Business Plan in 2015 of Organization for Cross-regional Coordination of Transmission Operators, Japan

Disaster Recovery Feature of Symfoware DBMS

Social Innovation through Utilization of Big Data

JOURNAL OF OBJECT TECHNOLOGY

Remote Copy Technology of ETERNUS6000 and ETERNUS3000 Disk Arrays

Mechanical Design Platform on Engineering Cloud

Network Virtualization for Large-Scale Data Centers

Network Sensing Network Monitoring and Diagnosis Technologies

General Design Information Management System: PLEMIA

Integration of PRIMECLUSTER and Mission- Critical IA Server PRIMEQUEST

The Analyzing Method of Root Causes for Software Problems

Virtual Platforms Addressing challenges in telecom product development

Fujitsu s Approach to Platform as a Service (PaaS)

Provisioning Technology for Automation

Online Failure Prediction in Cloud Datacenters

CUSTOMER GUIDE. Support Services

Camber Quality Assurance (QA) Approach

IT Infrastructure of Data Center Services Based on ITIL

Accelerate your mission with GTSI Integration Services

NETWORK MONITORING & ALERTING SERVICES SERVICE DEFINITION

Network Solution for Achieving Large-Scale, High-Availability VoIP Services

Overview of Next-Generation Green Data Center

Simplify Your Windows Server Migration

Application Performance Testing Basics

SoMA. Automated testing system of camera algorithms. Sofica Ltd

Prospect of ICT Utilization at Core Clinical Research Hospitals

Revolutionizing System Support in the Age of Electronic Medical Records

SaaS and PaaS of Engineering Cloud

Virtualization s Evolution

How To Improve User Interface Design In An Ema System

Improving. Summary. gathered from. research, and. Burnout of. Whitepaper

TWX-21 Business System Cloud for Global Corporations

Quality Management Manual for Patent Examination. (Quality Manual)

Fujitsu s Activities for Universal Design

How to Plan a Successful Load Testing Programme for today s websites

Three Stages for SOA and Service Governance

Security Architectures for Cloud Computing

Development, Acquisition, Implementation, and Maintenance of Application Systems

Server Platform Optimized for Data Centers

Fujitsu s Approach to Online Medical Billing Mechanism

Cloud-based Electronic Medical Record System for Small and Medium Hospitals: HOPE Cloud Chart

Big Data Collection and Utilization for Operational Support of Smarter Social Infrastructure

Symfoware Server Reliable and Scalable Data Management

DEDICATED TO EMBEDDED SOLUTIONS

Solution Concept for Public Libraries: One Library

Software Testing. Knowledge Base. Rajat Kumar Bal. Introduction

Information Security Services

ITIL V3 Foundation Certification - Sample Exam 1

Sample Exam Foundation Level Syllabus. Mobile Tester

R3: Windows Server 2008 Administration. Course Overview. Course Outline. Course Length: 4 Day

Power Supply and Demand Control Technologies for Smart Cities

WWT View Point. Journey to the Private Cloud: Take the First Steps with FlexPod

B&B ELECTRONICS WHITE PAPER. Managed Ethernet Switches - Key Features for a Powerful Industrial Network

Fujitsu s Approach to Cloud Computing

Network Virtualization Platform (NVP) Incident Reports

Fujitsu s Approach to Hybrid Cloud Systems

SOFTWARE DEVELOPMENT STANDARD FOR SPACECRAFT

Structural and Thermal Fluid Simulation and CAD System Linking

TestScape. On-line, test data management and root cause analysis system. On-line Visibility. Ease of Use. Modular and Scalable.

Use of modern telephone network for time transfer: An innovation

Making the Business and IT Case for Dedicated Hosting

Transcription:

Quality Assurance for Stable Server Operation Masafumi Matsuo Yuji Uchiyama Yuichi Kurita Server products are the core elements of mission-critical systems in society and business and are required to operate stably at all times. Furthermore, as technology continues to advance rapidly and the environment surrounding server products changes, the quality that customers demand is changing. In addition to providing high functionality and high performance, servers must support virtualization and energy-saving functions and the migration to Cloud services. In the light of these changes, quality assurance for assuring stable operation in server products is also evolving by optimizing development processes and developing new evaluation techniques. This paper introduces server-product development processes and evaluation techniques based on the policy of ensuring product quality right from the start and quality management in pursuit of quality, cost, and delivery. 1. Introduction As the backbone of mission-critical systems in society and the corporate world, servers must provide stable operating quality. Ensuring server quality is an important requirement for achieving a level of server operation acceptable to customers. At Fujitsu, we work to achieve high quality in our server products and provide stable operation in the customer s system through an extensive design review, an uncompromising system evaluation, and validation testing, all based on the concept of ensuring product quality right from the start. We also manage quality information obtained in all development processes and information about post-delivery operating quality in the customer s system and use this data as a basis for quality management. In this paper, we begin by introducing development processes that support quality assurance and evaluation methods that assure stable operation in recognition of market needs, both of which reflect Fujitsu s passion for ensuring product quality right from the start. We then introduce quality management at Fujitsu from design quality to operating quality based on the ISO9001 standard and describe the features of the manufacturing process at Fujitsu IT Products Ltd. (FJIT), a Fujitsu Group manufacturing plant in Japan focused on monozukuri (innovative manufacturing). 2. Development processes supporting quality assurance At Fujitsu, quality is built into the server new-product development process according to a system of internal standards, as shown in Figure 1. The results of each development phase are reviewed in assessment conferences and review meetings, and a transition to the next development phase is allowed only if those results satisfy established standards. This operation system is based on the concept of ensuring product quality right from the start. 164 FUJITSU Sci. Tech. J., Vol. 47, No. 2, pp. 164 170 (April 2011)

Design review Development plan review meeting Basic design review meeting Design testing study meeting Enter-testing assessment conference Enterproduction assessment conference Shipping assessment conference Planning Basic design Specific design Design testing Validation testing Commercial production and shipping support Development planning Concept design Architect team and development process monitor Figure 1 Outline of development processes. Here, the design phase has the most impact on quality, and quality assurance methods in this phase can be broadly divided into two types. The first applies know-how gained in previously developed products to the design-review process in new product development and strives to prevent the recurrence of problems experienced in the past. The second type uses logic simulators and server-structure simulators incorporating advanced techniques with the aim of improving design quality. The development processes from planning to design testing are conducted under the supervision of the design department, while those from validation testing onward are handled by the quality assurance department, which checks the quality of new products from the customer s viewpoint. The characteristic elements of these development processes are summarized below. 2.1 Architect team and development process monitor To enhance the quality of the entire system in addition to that of individual components, an architect team is formed from members of each component-development group. The team s tasks range from reaching consensus on specifications and planning a development process to actual process management. The position of development process monitor has also been introduced inside the quality assurance department, which checks the process phase transitions from a third-party perspective, so that the validity of the development process can be judged from both inside and outside viewpoints. 2.2 High quality right from the start In the waterfall style of development, incorporating high quality in the design phase is an important point in providing the customer with a stable system. In the past, the design testing phase went only as far as performing function checks on a component-by-component basis. This approach could result in development delays due, for example, to fatal logic changes when performing validation testing at the system level after entering the testing phase. For this reason, the development department and quality assurance department decided to work together on emphasizing system quality right from the FUJITSU Sci. Tech. J., Vol. 47, No. 2 (April 2011) 165

development phase. 2.2.1 Design quality improvement by proactive evaluation To improve design quality, validation items concerned with high load contention note 1) and accelerated margin tests note 2) have been moved up to design testing, thereby providing a mechanism for detecting critical problems caused by operation timing and fluctuations. In addition, to stabilize the quality of commercial production as early as possible, the production process is launched at the test-machine manufacturing step and manufacturing quality is examined closely. This enables latent problems in the commercial production process to be resolved beforehand. To improve the evaluation accuracy, the test coverage in each test phase is quantified and diagnosis rates are calculated to provide feedback to design testing. These measures provide a mechanism for detecting critical failures at an early stage so that appropriate measures can be taken. 2.2.2 Process improvement in serverstructure evaluation In the past, server-structure evaluation in the quality assurance department consisted of checking equipment in the validation testing phase. However, a manufacturing problem discovered at that time could require some time to resolve, causing shipping to be delayed. In response to this problem, the support department, commercial-production department, and quality assurance department now join forces from the specific design stage to perform thorough cross-checking using pilot models and note 1) For example, high load contention tests increase and decrease the frequency of memory access from both the input/output ports and central processing unit. note 2) Accelerated margin tests change the operation speed of a semiconductor device by varying the voltage, temperature, frequency, etc. virtual simulators that use Fujitsu s Virtual Product Simulator tool. The aim here is to eliminate manufacturing problems early. These activities enable an integrity level equal to the production model to be achieved in the validation testing phase, thereby achieving stable quality. 3. Evaluation methods for assuring stable operation This section describes evaluation methods for assuring stable operation in servers. An essential requirement of a mission-critical server is stable operation with no interruption of the customer s business processes. To meet this requirement, many measures for ensuring reliability are implemented in hardware and software at the design stage. These measures are evaluated in an environment equivalent to actual operation in the customer s system. The following introduces these characteristic evaluation methods. 3.1 System RAS evaluation for customer s operating environment Past validation testing involved product specification reviews and testing based on internal standards. However, this approach did not take into account the customer s actual operating environment, which led to various problems. The majority of these problems involved defective recovery operations in hardware and software originating in the timing of failed parts. In the customer s operating environment, software can behave in unforeseen ways leading to system crashes. This presented a quality assurance problem with regard to recovery operations after the failure of a part linking hardware and software. To deal with this problem, Fujitsu created a verification tool that links recovery operations and hardware/software and can thus evaluate system reliability, availability, and serviceability (RAS) in the customer s assumed operating environment. This tool automatically and repeatedly generates hardware false failures and 166 FUJITSU Sci. Tech. J., Vol. 47, No. 2 (April 2011)

Storage system BBC tester IP router Industry standard server Server All pins are automatically clipped by a robot. Pseudo-failures are generated by 0-V clipping of all physical pins and exhaustive RAS testing is performed for part failures. BBC: Black box clip IP: Internet protocol Figure 2 Outline of BBC tester. then checks resulting system operation to isolate problems that depend on the timing of failures. In addition, problems related to failures that occur only rarely in the customer s system, such as short circuits between signals, are also checked using Fujitsu s Black Box Clip tester, which automatically and repeatedly generates failures of this type (Figure 2). This tester clips physical signal pins in hardware to 0 V to generate pseudo-failures and evaluate their impact on the system from the RAS perspective. This evaluation is performed exhaustively for all pins in the pursuit of high quality across the entire system. Server Software configuration Patches Operating system version number Hardware combinations Partitioning Memory Input/output 3.2 System-wide compatibility evaluation System evaluation is not limited to the abovementioned system RAS evaluation. It also targets compatibility between hardware and software by extracting configurations (partitioning, memory, and input/output) envisioned by product specifications from a hardware perspective and considering combinations of operating system versions and patches from a software perspective (Figure 3). Stable operation is also assured by combining Figure 3 Evaluation of compatibility between hardware and software. benchmark tools and data processing to perform system evaluations under high-load, low-load, and high-load-contention operating conditions. 3.3 Application of automation technology In the past, testing was limited to fixed system configurations, which meant that sufficient testing could not be performed with FUJITSU Sci. Tech. J., Vol. 47, No. 2 (April 2011) 167

regard to timing, completeness, etc. To solve this problem, Fujitsu is combining a function for dynamically changing the system configuration (dynamic reconfiguration) with automation technology to perform detailed configuration testing. In this way, problems such as memory leaks, which in the past were difficult to detect until they became obvious, can be detected at an early stage through constant monitoring by an automated tool. 4. Quality management To continuously provide products that satisfy customer needs, Fujitsu is expanding quality improvement activities at every phase of product development, commercial production, and customer service on the basis of the quality assurance concepts in quality management systems (ISO9001). 4.1 Design quality management Design quality is built into upstream development processes. In other words, the results output from the planning, basic design, and specific design phases determine product quality. To raise the degree of completion in each of these design phases, Fujitsu not only performs progress and problem checking, as is generally done, but also draws up a list of development risks that can be envisioned beforehand in the planning phase while also assessing risk conditions at review meetings and assessment conferences held at the time of phase transitions. The review meetings and assessment conferences decide whether to allow a phase transition to occur by performing examinations according to assessment standards established for each phase and by assessing the impact of the risk on subsequent processes. In addition, processing monitoring note 3) by a third party note 3) Processing monitoring monitors and objectively evaluates whether the rules and procedures established in the development processes are being observed. enables problems arising in each development phase to be reported periodically to the project leader so that the risk of development-process delays and quality degradation can be reduced at an early stage. 4.2 Commercial-production quality management In the commercial-production stage, manufacturing quality is not the only matter of concern. It is also important to achieve a balance of quality, cost, and delivery (QCD). On the basis of this idea, Fujitsu is working to construct optimal manufacturing lines in pursuit of QCD and proactively introduce new test processes after the design review stage. Fujitsu is also establishing checkpoints in the stage preceding shipping assessment and introducing processes for observing the characteristics of manufacturing systems that can maintain stable shipping at production plants. After commercial production begins, quality targets are set for each manufacturing process, and the plan do check act (PDCA) cycle is repeated in quality improvement activities to maintain quality and pursue QCD. The quality information obtained in these activities is shared by various business departments and the quality assurance department with the aim of achieving companywide dissemination of quality information. This quality information is also being used as input for improving design quality in subsequent server models. 4.3 Operation quality management To supply products with exceptional quality and maintain stable operation in delivered products, Fujitsu sets quality targets for each product and promotes field quality management. These quality targets are determined by surveying quality trends throughout the industry with an eye to exceeding the quality levels of other companies. Target values are subject to 168 FUJITSU Sci. Tech. J., Vol. 47, No. 2 (April 2011)

periodic review to improve quality even further. Operation quality is maintained mainly by data evaluation based on statistical methods and recurrence prevention activities based on thorough root cause analysis. Data evaluation based on statistical methods borrows ideas from reliability engineering to analyze trends in early life failure, random failure, and abrasion failure for each model and component/unit. The results are used to predict future quality trends and take preventive actions as soon as possible. If a peculiar trend is observed from trend analysis or failure cause, an expert team conducts a thorough hunt for the root cause and oversees activities aimed at halting the spread of damage and preventing recurrence of the failure. If a design factor is involved, the root cause for both a built-in cause and failure-occurrence cause is clarified using the 5 Whys method of analysis and the PDCA cycle is repeated to provide feedback to the development processes. Operating quality data is shared among the development, quality assurance, and support departments as well as plant departments at regular quality meetings in an effort to improve quality. 5. Focus on monozukuri Fujitsu s core server products are manufactured by FJIT, whose business policy is to improve customer satisfaction through the pursuit of QCD. To this end, FJIT interacts closely with the Fujitsu development department and quality assurance department and works constantly to achieve optimal monozukuri while repeating the PDCA cycle to make daily improvements with a QCD balance in mind. Fujitsu has recently initiated companywide production-innovation activities by introducing the Toyota Production System to enhance monozukuri from a customer-centric perspective. Prime activities here are continuous flow processing focused on one-piece-at-atime production and streamlining by process combining and process mixing. 6. Future issues Recent improvements in semiconductor technology have been accompanied by higher levels of integration to the point where logic circuits that in the past could be achieved only using several chips can now be consolidated on a single chip. This development makes for smaller and lighter equipment but also magnifies the risk since a single logic failure can now have a big impact on the development process through, for example, the need for time-consuming fixes. This risk makes logic design simulation and testing that can cover new technologies all the more important in design reviews. There is also a demand for power-saving equipment as Cloud services that use largescale systems in data centers expand. However, while energy-saving modes exist for reducing power when equipment is not being used, they also present new problems related to operation timing. This situation calls for the development of evaluation techniques from a new perspective. 7. Conclusion This paper described Fujitsu s efforts to develop quality assurance techniques to support stable operation in its server products. These efforts seek to fulfill departmental missions through optimal QCD while improving quality assurance in response to ever-changing market trends and customer needs. Looking forward, Fujitsu will pursue evaluation techniques from new perspectives in the face of market trends toward Cloud computing and will work to develop optimal quality management from the perspective of the Cloud customer. FUJITSU Sci. Tech. J., Vol. 47, No. 2 (April 2011) 169

Masafumi Matsuo Fujitsu Ltd. Mr. Matsuo is engaged in the evaluation and quality management of enterprise servers. Yuichi Kurita Fujitsu Ltd. Mr. Kurita is engaged in the quality management of enterprise servers. Yuji Uchiyama Fujitsu Ltd. Mr. Uchiyama is engaged in the evaluation of enterprise servers. 170 FUJITSU Sci. Tech. J., Vol. 47, No. 2 (April 2011)