Oak Ridge National Laboratory Computing and Computational Sciences Directorate. File System Administration and Monitoring
|
|
|
- Felicity Hunt
- 10 years ago
- Views:
Transcription
1 Oak Ridge National Laboratory Computing and Computational Sciences Directorate File System Administration and Monitoring Jesse Hanley Rick Mohr Jeffrey Rossiter Sarp Oral Michael Brim Jason Hill Neena Imam * Joshua Lothian (MELT) ORNL is managed by UT-Battelle for the US Department of Energy
2 Outline Starting/stopping a Lustre file system Mounting/unmounting clients Quotas and usage reports Purging Survey of monitoring tools
3 Starting a Lustre file system The file system should be mounted in the following order (for normal bringup): MGT (Lustre will also mount the MDT automatically if the file system has a combined MGT/MDT) All OSTs All MDTs Run any server-side tunings After this, the file system is up and clients can begin mounting Mount clients and run any client-side tunings The commands for mounting share a similar syntax mount -t lustre $DEVICE No need to start a service or perform a modprobe
4 Mount by label / path Information about a target is encoded into the label [root@testfs- oss3 ~]# dumpe2fs - h /dev/mapper/testfs- l28 grep "^Filesystem volume name" Filesystem volume name: testfs- OST These labels also appear under /dev/disk/by-label/ If not using multipathing, this label can be used to mount by label: testfs- mds1# mount - t lustre - L testfs- MDT0000 /mnt/mdt Also avoid using this method when using snapshots If using multipathing, instead use the entry in /dev/ mapper/. This can be set up in bindings to provide a meaningful name. testfs- mds1# mount - t lustre - L /dev/mapper/testfs- lun0
5 Mounting Strategies These mounts can be stored in fstab. Include noauto param file system will not automatically mount at boot Include _netdev param file system will not mount if network layer has not started These targets could then be mounted using: mount - t lustre - a This process lends itself to automation
6 Client Mounting To mount the file system on a client, run the following command: mount -t lustre MGS_node:/fsname /mount_point, e.g., mount -t lustre @o2ib:/testfs /mnt/ test_filesystem As seen above, the mount point does not have to map to the file system name. After the client is mounted, run any tunings
7 Stopping a Lustre file system Shutting down a Lustre file system involves reversing the previous procedure. Unmounting all block devices on a host stops the Lustre software. First, unmount the clients On each client, run: umount -a -t lustre #This unmounts all Lustre file systems umount /mount/point #This unmounts a specific file system Then, unmount all MDT(s) On the MDS, run: umount /mdt/mount_point (e.g., /mnt/mdt from the previous example) Finally, unmount all OST(s) On each OSS, run: umount -t lustre a
8 Quotas For persistent storage, Lustre supports user and group quotas. Quota support includes soft and hard limits. As of Lustre 2.4, usage accounting information is always available, even when quotas are not enforced. The Quota Master Target (QMT) runs on the same node as the MDT0 and allocates/releases quota space. Due to how quota space is managed, and that the smallest allocable chuck is 1MB (for OSTs) or 1024 inodes (for MDTs), a quota exceeded error can be returned even when OSTs/MDTs have space/inodes.
9 Usage Reports As previously mentioned, accounting information is always available (unless explicitly disabled). This information can provide a quick overview of user/ group usage: Non root users can only view the usage for their user and group(s) # lfs quota -u myuser /lustre/mntpoint For more detailed usage, the file system monitoring software Robinhood provides a database that can be directly queried for metadata information. Robinhood also includes special du and find commands that use this database.
10 Purging A common use case for Lustre is as a scratch file system, where files are not intended for long term storage. In this case, purging older files makes sense. Policies will vary per site, but for example, a site may want to remove files that have not been accessed nor modified in the past 30 days.
11 Purging Tools An administrator could use a variety of methods in order to purge data. The simplest version includes a find (or lfs find) to list files older than x days, then remove them. Ex: lfs find /lustre/mountpoint -mtime +30 -type f This would find files that have a modification time stamp older than 30 days A more advanced technique is to use software like Lester to read data directly from a MDT.
12 Handling Full OSTs One of the most common issues with a Lustre file system is an OST that is close to, or is, full. To view OST usage, run the lfs df command. An example of viewing a high usage OST [root@mgmt ~]# lfs df /lustre/testfs sort - rnk5 head - n 5 testfs- OST00dd_UUID % /lustre/testfs[ost:221] Once the index of the OST is found, running lfs quota with the I argument will provide the usage on that OST. for user in $(users); do lfs quota -u $user -I 221 /lustre/testfs; done
13 Handling Full OSTs (cont.) Typically, an OST imbalance that results in a filled OST is due to a single user with improperly striped files. The user can be contacted and asked to remove/ restripe the file, or the file can be removed by an administrator in order to regain use of the OST. It is often useful to check for running processes (tar commands, etc) that might be creating these files. When trying to locate the file causing the issue, it s often useful to look at recently modified files
14 Monitoring There are some things that are important to monitoring on Lustre servers. These include things like high load and memory usage.
15 Monitoring General Software Nagios Nagios is a general purpose monitoring solution. A system can be set up with host and service checks. There is native support for host-down checks and various service checks, including file system utilization. Nagios is highly extensible, allowing for custom checks This could include checking the contents of the /proc/fs/lustre/ health_check file. It s an industry standard and has proven to scale to hundreds of checks Open source (GPL) Supports paging on alerts and reports. Includes a multi-user web interface
16 Monitoring General Software Ganglia Gathers system metrics (load, memory, disk utilization, ) and stores the values in RRD files. Benefits to RRD (fixed size) vs downsides (data rolloff) Provides a web interface for these metrics over time (past 2hr, 4hr, day, week, ) Ability to group hosts together In combination with collectl, can provide usage metrics for Infiniband traffic and Lustre metrics
17 Monitoring General Software Splunk Operational Intelligence Aggregates machine data, logs, and other user-defined sources From this data, users can run queries. These queries can be scheduled or turned into reports, alerts, or dashboards for Splunk s web interface Tiered licensing based on indexed data, including a free version. Useful for generating alerts on Lustre bugs within syslog There are open source alternatives such as ELK stack.
18 Monitoring General Software Robinhood Policy Engine File system management software that keeps a copy of metadata in a database Provides find and du clones that query this database to return information faster. Designed to support millions of files and petabytes of data Policy based purging support Customizable alerts Additional functionality added for Lustre file systems
19 Monitoring Lustre tools Lustre provides information on a low level about the state of the file system This information lives under /proc/ For example, to check if any OSTs on an OSS are degraded, check the contents of the files located at /proc/fs/lustre/obdfilter/*/degraded Another example would be to check if checksums are enabled. On a client, run: cat /proc/fs/lustre/osc/*/checksums More details can be found in the Lustre manual
20 Monitoring Lustre tools Lustre also provides a set of tools The lctl {get,set}_param functions display the contents or set the contents of files under /proc lctl get_param osc.*.checksums lctl set_param osc.*.checksums=0 This command allows for fuzzy matches The lfs command can check the health of the servers within the file system: [root@mgmt ~]# lfs check servers testfs- MDT0000- mdc- ffff880e0088d000: active testfs- OST0000- osc- ffff880e0088d000: active The lfs command has several other possible parameters
21 Monitoring Lustre tools The llstat and llobdstat commands provide a watch-like interface for the various stats files llobdstat: /proc/fs/lustre/obdfilter/<ostname>/stats llstat: /proc/fs/lustre/mds/mds/mdt/stats, etc. Appropriate files are listed in the llstat man page Example: [root@sultan- mds1 lustre]# llstat - i 2 lwp/sultan- MDT0000- lwp- MDT0000/stats /usr/bin/llstat: STATS on 06/08/15 lwp/sultan- MDT0000- lwp- MDT0000/stats on @o2ib1 snapshot_time req_waittime req_active mds_connect obd_ping lwp/sultan- MDT0000- lwp Name Cur.Count Cur.Rate #Events Unit last min avg max stddev req_waittime [usec] req_active [reqs] mds_connect [usec] obd_ping [usec] ^C
22 Monitoring Lustre Specific Software LMT The Lustre Monitoring Tool (LMT) is a Python-based, distributed system that provides a top-like display of activity on server-side nodes LMT uses cerebro (software similar to Ganglia) to pull statistics from the /proc/ file system into a MySQL database lltop/xltop A former TACC staff member, John Hammond, created several monitoring tools for Lustre file systems lltop - Lustre load monitor with batch scheduler integration xltop - continuous Lustre load monitor
23 Monitoring Lustre Specific Software Sandia National Laboratories created OVIS to monitor and analyze how applications use resources This software aims to monitor more than just the Lustre layer of an application pap156.pdf OVIS must run client-side, where most of the other monitoring tools presented here are run from the Lustre servers Michael Brim and Joshua Lothian from ORNL created Monitoring Extreme-scale Lustre Toolkit (MELT) This software uses a tree-based infrastructure to scale out Aimed at being less resource intensive than solutions like collectl
24 Summary How to start and stop a Lustre file system Steps to automate these procedures Software, both general and specialized, to monitor a file system
25 Acknowledgements This work was supported by the United States Department of Defense (DoD) and used resources of the DoD-HPC at Oak Ridge National Laboratory.
Oak Ridge National Laboratory Computing and Computational Sciences Directorate. Lustre Crash Dumps And Log Files
Oak Ridge National Laboratory Computing and Computational Sciences Directorate Lustre Crash Dumps And Log Files Jesse Hanley Rick Mohr Sarp Oral Michael Brim Nathan Grodowitz Gregory Koenig Jason Hill
Jason Hill HPC Operations Group ORNL Cray User s Group 2011, Fairbanks, AK 05-25-2011
Determining health of Lustre filesystems at scale Jason Hill HPC Operations Group ORNL Cray User s Group 2011, Fairbanks, AK 05-25-2011 Overview Overview of architectures Lustre health and importance Storage
Determining the health of Lustre filesystems at scale
Determining the health of Lustre filesystems at scale Jason J. Hill, Dustin B. Leverman, Scott M. Koch, David A. Dillow Oak Ridge Leadership Computing Facility Oak Ridge National Laboratory {hilljj,leverman,smkoch,dillowda}@ornl.gov
Monitoring Tools for Large Scale Systems
Monitoring Tools for Large Scale Systems Ross Miller, Jason Hill, David A. Dillow, Raghul Gunasekaran, Galen Shipman, Don Maxwell Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
Cray Lustre File System Monitoring
Cray Lustre File System Monitoring esfsmon Jeff Keopp OSIO/ES Systems Cray Inc. St. Paul, MN USA [email protected] Harold Longley OSIO Cray Inc. St. Paul, MN USA [email protected] Abstract The Cray Data Management
LUSTRE USAGE MONITORING What the &#%@ are users doing with my filesystem?
LUSTRE USAGE MONITORING What the &#%@ are users doing with my filesystem? Kilian CAVALOTTI, Thomas LEIBOVICI CEA/DAM LAD 13 SEPTEMBER 16-17, 2013 CEA 25 AVRIL 2013 PAGE 1 MOTIVATION Lustre monitoring is
Lustre tools for ldiskfs investigation and lightweight I/O statistics
Lustre tools for ldiskfs investigation and lightweight I/O statistics Roland Laifer STEINBUCH CENTRE FOR COMPUTING - SCC KIT University of the State Roland of Baden-Württemberg Laifer Lustre and tools
How To Write A Libranthus 2.5.3.3 (Libranthus) On Libranus 2.4.3/Libranus 3.5 (Librenthus) (Libre) (For Linux) (
LUSTRE/HSM BINDING IS THERE! LAD'13 Aurélien Degrémont SEPTEMBER, 17th 2013 CEA 10 AVRIL 2012 PAGE 1 AGENDA Presentation Architecture Components Examples Project status LAD'13
New and Improved Lustre Performance Monitoring Tool. Torben Kling Petersen, PhD Principal Engineer. Chris Bloxham Principal Architect
New and Improved Lustre Performance Monitoring Tool Torben Kling Petersen, PhD Principal Engineer Chris Bloxham Principal Architect Lustre monitoring Performance Granular Aggregated Components Subsystem
High Performance Computing OpenStack Options. September 22, 2015
High Performance Computing OpenStack PRESENTATION TITLE GOES HERE Options September 22, 2015 Today s Presenters Glyn Bowden, SNIA Cloud Storage Initiative Board HP Helion Professional Services Alex McDonald,
Lustre & Cluster. - monitoring the whole thing Erich Focht
Lustre & Cluster - monitoring the whole thing Erich Focht NEC HPC Europe LAD 2014, Reims, September 22-23, 2014 1 Overview Introduction LXFS Lustre in a Data Center IBviz: Infiniband Fabric visualization
Monitoring the Lustre* file system to maintain optimal performance. Gabriele Paciucci, Andrew Uselton
Monitoring the Lustre* file system to maintain optimal performance Gabriele Paciucci, Andrew Uselton Outline Lustre* metrics Monitoring tools Analytics and presentation Conclusion and Q&A 2 Why Monitor
LLNL Production Plans and Best Practices
LLNL Production Plans and Best Practices Lustre User Group, 2014. Miami, FL April 8, 2014 LLNL-PRES-652453 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore
Lustre * Filesystem for Cloud and Hadoop *
OpenFabrics Software User Group Workshop Lustre * Filesystem for Cloud and Hadoop * Robert Read, Intel Lustre * for Cloud and Hadoop * Brief Lustre History and Overview Using Lustre with Hadoop Intel Cloud
April 8th - 10th, 2014 LUG14 LUG14. Lustre Log Analyzer. Kalpak Shah. DataDirect Networks. ddn.com. 2014 DataDirect Networks. All Rights Reserved.
April 8th - 10th, 2014 LUG14 LUG14 Lustre Log Analyzer Kalpak Shah DataDirect Networks Lustre Log Analysis Requirements Need scripts to parse Lustre debug logs Only way to effectively use the logs for
New Storage System Solutions
New Storage System Solutions Craig Prescott Research Computing May 2, 2013 Outline } Existing storage systems } Requirements and Solutions } Lustre } /scratch/lfs } Questions? Existing Storage Systems
Practices on Lustre File-level RAID
Practices on Lustre File-level RAID Qi Chen [email protected] Jiangnan Institute of Computing Technology Agenda Background motivations practices on client-driven file-level RAID Server-driven file-level
UNIX - FILE SYSTEM BASICS
http://www.tutorialspoint.com/unix/unix-file-system.htm UNIX - FILE SYSTEM BASICS Copyright tutorialspoint.com A file system is a logical collection of files on a partition or disk. A partition is a container
Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution
Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution Jonathan Halstuch, COO, RackTop Systems [email protected] Big Data Invasion We hear so much on Big Data and
Application Performance for High Performance Computing Environments
Application Performance for High Performance Computing Environments Leveraging the strengths of Computationally intensive applications With high performance scale out file serving In data storage modules
Commoditisation of the High-End Research Storage Market with the Dell MD3460 & Intel Enterprise Edition Lustre
Commoditisation of the High-End Research Storage Market with the Dell MD3460 & Intel Enterprise Edition Lustre University of Cambridge, UIS, HPC Service Authors: Wojciech Turek, Paul Calleja, John Taylor
Lustre Monitoring with OpenTSDB
Lustre Monitoring with OpenTSDB 2015/9/22 DataDirect Networks Japan, Inc. Shuichi Ihara 2 Lustre Monitoring Background Lustre is a black box Users and Administrators want to know what s going on? Find
Lustre performance monitoring and trouble shooting
Lustre performance monitoring and trouble shooting March, 2015 Patrick Fitzhenry and Ian Costello 2012 DataDirect Networks. All Rights Reserved. 1 Agenda EXAScaler (Lustre) Monitoring NCI test kit hardware
High Availability and Backup Strategies for the Lustre MDS Server
HA and Backup Methods for Lustre Hepix May '08 [email protected] 1 High Availability and Backup Strategies for the Lustre MDS Server Spring 2008 Karin Miers / GSI HA and Backup Methods for Lustre Hepix May
Cognos8 Deployment Best Practices for Performance/Scalability. Barnaby Cole Practice Lead, Technical Services
Cognos8 Deployment Best Practices for Performance/Scalability Barnaby Cole Practice Lead, Technical Services Agenda > Cognos 8 Architecture Overview > Cognos 8 Components > Load Balancing > Deployment
How Comcast Built An Open Source Content Delivery Network National Engineering & Technical Operations
How Comcast Built An Open Source Content Delivery Network National Engineering & Technical Operations Jan van Doorn Distinguished Engineer VSS CDN Engineering 1 What is a CDN? 2 Content Router get customer
The Use of Flash in Large-Scale Storage Systems. [email protected]
The Use of Flash in Large-Scale Storage Systems [email protected] 1 Seagate s Flash! Seagate acquired LSI s Flash Components division May 2014 Selling multiple formats / capacities today Nytro
高 通 量 科 学 计 算 集 群 及 Lustre 文 件 系 统. High Throughput Scientific Computing Clusters And Lustre Filesystem In Tsinghua University
高 通 量 科 学 计 算 集 群 及 Lustre 文 件 系 统 High Throughput Scientific Computing Clusters And Lustre Filesystem In Tsinghua University 清 华 信 息 科 学 与 技 术 国 家 实 验 室 ( 筹 ) 公 共 平 台 与 技 术 部 清 华 大 学 科 学 与 工 程 计 算 实 验
Installing MooseFS Step by Step Tutorial
Installing MooseFS Step by Step Tutorial Michał Borychowski MooseFS Support Manager [email protected] march 2010 Gemius SA Overview... 3 MooseFS install process on dedicated machines... 3 Master server
Incremental Backup Script. Jason Healy, Director of Networks and Systems
Incremental Backup Script Jason Healy, Director of Networks and Systems Last Updated Mar 18, 2008 2 Contents 1 Incremental Backup Script 5 1.1 Introduction.............................. 5 1.2 Design Issues.............................
SUSE Manager in the Public Cloud. SUSE Manager Server in the Public Cloud
SUSE Manager in the Public Cloud SUSE Manager Server in the Public Cloud Contents 1 Instance Requirements... 2 2 Setup... 3 3 Registration of Cloned Systems... 6 SUSE Manager delivers best-in-class Linux
STeP-IN SUMMIT 2013. June 18 21, 2013 at Bangalore, INDIA. Performance Testing of an IAAS Cloud Software (A CloudStack Use Case)
10 th International Conference on Software Testing June 18 21, 2013 at Bangalore, INDIA by Sowmya Krishnan, Senior Software QA Engineer, Citrix Copyright: STeP-IN Forum and Quality Solutions for Information
Current Status of FEFS for the K computer
Current Status of FEFS for the K computer Shinji Sumimoto Fujitsu Limited Apr.24 2012 LUG2012@Austin Outline RIKEN and Fujitsu are jointly developing the K computer * Development continues with system
Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc [email protected]
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc [email protected] What s Hadoop Framework for running applications on large clusters of commodity hardware Scale: petabytes of data
Lessons learned from parallel file system operation
Lessons learned from parallel file system operation Roland Laifer STEINBUCH CENTRE FOR COMPUTING - SCC KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association
Data Movement and Storage. Drew Dolgert and previous contributors
Data Movement and Storage Drew Dolgert and previous contributors Data Intensive Computing Location Viewing Manipulation Storage Movement Sharing Interpretation $HOME $WORK $SCRATCH 72 is a Lot, Right?
NetApp High-Performance Computing Solution for Lustre: Solution Guide
Technical Report NetApp High-Performance Computing Solution for Lustre: Solution Guide Robert Lai, NetApp August 2012 TR-3997 TABLE OF CONTENTS 1 Introduction... 5 1.1 NetApp HPC Solution for Lustre Introduction...5
A Shared File System on SAS Grid Manger in a Cloud Environment
A Shared File System on SAS Grid Manger in a Cloud Environment ABSTRACT Ying-ping (Marie) Zhang, Intel Corporation, Chandler, Arizona Margaret Carver, SAS Institute, Cary, North Carolina Gabriele Paciucci,
Managing MySQL Scale Through Consolidation
Hello Managing MySQL Scale Through Consolidation Percona Live 04/15/15 Chris Merz, @merzdba DB Systems Architect, SolidFire Enterprise Scale MySQL Challenges Many MySQL instances (10s-100s-1000s) Often
Splunk for VMware Virtualization. Marco Bizzantino [email protected] Vmug - 05/10/2011
Splunk for VMware Virtualization Marco Bizzantino [email protected] Vmug - 05/10/2011 Collect, index, organize, correlate to gain visibility to all IT data Using Splunk you can identify problems,
PTC System Monitor Solution Training
PTC System Monitor Solution Training Patrick Kulenkamp June 2012 Agenda What is PTC System Monitor (PSM)? How does it work? Terminology PSM Configuration The PTC Integrity Implementation Drilling Down
Tiered Adaptive Storage
Tiered Adaptive Storage An engineered system for large scale archives, tiered storage, and beyond Harriet G. Coverston Versity Software, Inc. St. Paul, MN USA [email protected] Scott Donoho,
LMT Lustre Monitoring Tools
LMT Lustre Monitoring Tools April 13, 2011 Christopher Morrone, P. O. Box 808, Livermore, CA 94551 This work performed under the auspices of the U.S. Department of Energy by under Contract DE-AC52-07NA27344
IBRIX Fusion 3.1 Release Notes
Release Date April 2009 Version IBRIX Fusion Version 3.1 Release 46 Compatibility New Features Version 3.1 CLI Changes RHEL 5 Update 3 is supported for Segment Servers and IBRIX Clients RHEL 5 Update 2
ORACLE OPS CENTER: PROVISIONING AND PATCH AUTOMATION PACK
ORACLE OPS CENTER: PROVISIONING AND PATCH AUTOMATION PACK KEY FEATURES PROVISION FROM BARE- METAL TO PRODUCTION QUICKLY AND EFFICIENTLY Controlled discovery with active control of your hardware Automatically
SAP HANA Disaster Recovery with Asynchronous Storage Replication Using Snap Creator and SnapMirror
Technical Report SAP HANA Disaster Recovery with Asynchronous Storage Replication Using Snap Creator and SnapMirror Nils Bauer, NetApp March 2014 TR-4279 The document describes the setup of a disaster
Performance Comparison of Intel Enterprise Edition for Lustre* software and HDFS for MapReduce Applications
Performance Comparison of Intel Enterprise Edition for Lustre software and HDFS for MapReduce Applications Rekha Singhal, Gabriele Pacciucci and Mukesh Gangadhar 2 Hadoop Introduc-on Open source MapReduce
Highly-Available Distributed Storage. UF HPC Center Research Computing University of Florida
Highly-Available Distributed Storage UF HPC Center Research Computing University of Florida Storage is Boring Slow, troublesome, albatross around the neck of high-performance computing UF Research Computing
Tools and strategies to monitor the ATLAS online computing farm
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Tools and strategies to monitor the ATLAS online computing farm S. Ballestrero 1,2, F. Brasolin 3, G. L. Dârlea 1,4, I. Dumitru 4, D. A. Scannicchio 5, M. S. Twomey
Hadoop Distributed File System. Dhruba Borthakur June, 2007
Hadoop Distributed File System Dhruba Borthakur June, 2007 Goals of HDFS Very Large Distributed File System 10K nodes, 100 million files, 10 PB Assumes Commodity Hardware Files are replicated to handle
High Performance, Open Source, Dell Lustre Storage System. Dell /Cambridge HPC Solution Centre. Wojciech Turek, Paul Calleja July 2010.
High Performance, Open Source, Dell Lustre Storage System Dell /Cambridge HPC Solution Centre Wojciech Turek, Paul Calleja July 2010 Dell Abstract The following paper was produced by the Dell Cambridge
A Shared File System on SAS Grid Manager * in a Cloud Environment
white paper A Shared File System on SAS Grid Manager * in a Cloud Environment Abstract The performance of a shared file system is a critical component of implementing SAS Grid Manager* in a cloud environment.
Maintaining Non-Stop Services with Multi Layer Monitoring
Maintaining Non-Stop Services with Multi Layer Monitoring Lahav Savir System Architect and CEO of Emind Systems [email protected] www.emindsys.com The approach Non-stop applications can t leave on their
Linux Filesystem Comparisons
Linux Filesystem Comparisons Jerry Feldman Boston Linux and Unix Presentation prepared in LibreOffice Impress Boston Linux and Unix 12/17/2014 Background My Background. I've worked as a computer programmer/software
Panasas at the RCF. Fall 2005 Robert Petkus RHIC/USATLAS Computing Facility Brookhaven National Laboratory. Robert Petkus Panasas at the RCF
Panasas at the RCF HEPiX at SLAC Fall 2005 Robert Petkus RHIC/USATLAS Computing Facility Brookhaven National Laboratory Centralized File Service Single, facility-wide namespace for files. Uniform, facility-wide
America s Most Wanted a metric to detect persistently faulty machines in Hadoop
America s Most Wanted a metric to detect persistently faulty machines in Hadoop Dhruba Borthakur and Andrew Ryan dhruba,[email protected] Presented at IFIP Workshop on Failure Diagnosis, Chicago June
Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela
Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance
Uncovering degraded application performance with LWM 2. Aamer Shah, Chih-Song Kuo, Lucas Theisen, Felix Wolf November 17, 2014
Uncovering degraded application performance with LWM 2 Aamer Shah, Chih-Song Kuo, Lucas Theisen, Felix Wolf November 17, 214 Motivation: Performance degradation Internal factors: Inefficient use of hardware
Do it Yourself System Administration
Do it Yourself System Administration Due to a heavy call volume, we are unable to answer your call at this time. Please remain on the line as calls will be answered in the order they were received. We
Building Views and Charts in Requests Introduction to Answers views and charts Creating and editing charts Performing common view tasks
Oracle Business Intelligence Enterprise Edition (OBIEE) Training: Working with Oracle Business Intelligence Answers Introduction to Oracle BI Answers Working with requests in Oracle BI Answers Using advanced
Sun Storage Perspective & Lustre Architecture. Dr. Peter Braam VP Sun Microsystems
Sun Storage Perspective & Lustre Architecture Dr. Peter Braam VP Sun Microsystems Agenda Future of Storage Sun s vision Lustre - vendor neutral architecture roadmap Sun s view on storage introduction The
Lustre* is designed to achieve the maximum performance and scalability for POSIX applications that need outstanding streamed I/O.
Reference Architecture Designing High-Performance Storage Tiers Designing High-Performance Storage Tiers Intel Enterprise Edition for Lustre* software and Intel Non-Volatile Memory Express (NVMe) Storage
Evolution from the Traditional Data Center to Exalogic: An Operational Perspective
An Oracle White Paper July, 2012 Evolution from the Traditional Data Center to Exalogic: 1 Disclaimer The following is intended to outline our general product capabilities. It is intended for information
Investigation of storage options for scientific computing on Grid and Cloud facilities
Investigation of storage options for scientific computing on Grid and Cloud facilities Overview Context Test Bed Lustre Evaluation Standard benchmarks Application-based benchmark HEPiX Storage Group report
Holistic Performance Analysis of J2EE Applications
Holistic Performance Analysis of J2EE Applications By Madhu Tanikella In order to identify and resolve performance problems of enterprise Java Applications and reduce the time-to-market, performance analysis
LSN 10 Linux Overview
LSN 10 Linux Overview ECT362 Operating Systems Department of Engineering Technology LSN 10 Linux Overview Linux Contemporary open source implementation of UNIX available for free on the Internet Introduced
Linux System Administration on Red Hat
Linux System Administration on Red Hat Kenneth Ingham September 29, 2009 1 Course overview This class is for people who are familiar with Linux or Unix systems as a user (i.e., they know file manipulation,
RHCSA 7RHCE Red Haf Linux Certification Practice
RHCSA 7RHCE Red Haf Linux Certification Practice Exams with Virtual Machines (Exams EX200 & EX300) "IcGraw-Hill is an independent entity from Red Hat, Inc., and is not affiliated with Red Hat, Inc. in
8/15/2014. Best Practices @OLCF (and more) General Information. Staying Informed. Staying Informed. Staying Informed-System Status
Best Practices @OLCF (and more) Bill Renaud OLCF User Support General Information This presentation covers some helpful information for users of OLCF Staying informed Aspects of system usage that may differ
MYASIA CLOUD SERVER. Defining next generation of global storage grid User Guide AUG 2010, version 1.1
MYASIA CLOUD SERVER Defining next generation of global storage grid User Guide AUG 2010, version 1.1 Table of Content 1. Introduction.. 3 2. Getting Started... 4 3. Introduction to Cloud Server Management
Monitoring Remedy with BMC Solutions
Monitoring Remedy with BMC Solutions Overview How does BMC Software monitor Remedy with our own solutions? The challenge is many fold with a solution like Remedy and this does not only apply to Remedy,
Understanding Server Configuration Parameters and Their Effect on Server Statistics
Understanding Server Configuration Parameters and Their Effect on Server Statistics Technical Note V2.0, 3 April 2012 2012 Active Endpoints Inc. ActiveVOS is a trademark of Active Endpoints, Inc. All other
Creating a Cray System Management Workstation (SMW) Bootable Backup Drive
Creating a Cray System Management Workstation (SMW) Bootable Backup Drive This technical note provides the procedures to create a System Management Workstation (SMW) bootable backup drive. The purpose
Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] [email protected]
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] [email protected] Hadoop, Why? Need to process huge datasets on large clusters of computers
February, 2015 Bill Loewe
February, 2015 Bill Loewe Agenda System Metadata, a growing issue Parallel System - Lustre Overview Metadata and Distributed Namespace Test setup and implementation for metadata testing Scaling Metadata
SCF/FEF Evaluation of Nagios and Zabbix Monitoring Systems. Ed Simmonds and Jason Harrington 7/20/2009
SCF/FEF Evaluation of Nagios and Zabbix Monitoring Systems Ed Simmonds and Jason Harrington 7/20/2009 Introduction For FEF, a monitoring system must be capable of monitoring thousands of servers and tens
Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems
Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Rekha Singhal and Gabriele Pacciucci * Other names and brands may be claimed as the property of others. Lustre File
NERSC File Systems and How to Use Them
NERSC File Systems and How to Use Them David Turner! NERSC User Services Group! Joint Facilities User Forum on Data- Intensive Computing! June 18, 2014 The compute and storage systems 2014 Hopper: 1.3PF,
Restoring a Suse Linux Enterprise Server 9 64 Bit on Dissimilar Hardware with CBMR for Linux 1.02
Cristie Bare Machine Recovery Restoring a Suse Linux Enterprise Server 9 64 Bit on Dissimilar Hardware with CBMR for Linux 1.02 This documentation shows how to restore or migrate a Linux system on dissimilar
ROCANA WHITEPAPER How to Investigate an Infrastructure Performance Problem
ROCANA WHITEPAPER How to Investigate an Infrastructure Performance Problem INTRODUCTION As IT infrastructure has grown more complex, IT administrators and operators have struggled to retain control. Gone
The Hadoop Distributed File System
The Hadoop Distributed File System The Hadoop Distributed File System, Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, Yahoo, 2010 Agenda Topic 1: Introduction Topic 2: Architecture
Bigdata High Availability (HA) Architecture
Bigdata High Availability (HA) Architecture Introduction This whitepaper describes an HA architecture based on a shared nothing design. Each node uses commodity hardware and has its own local resources
Lustre failover experience
Lustre failover experience Lustre Administrators and Developers Workshop Paris 1 September 25, 2012 TOC Who we are Our Lustre experience: the environment Deployment Benchmarks What's next 2 Who we are
Computing forensics: a live analysis
April 18th, 2005 1 2 3 Objectives Evidence acquisition Recovery and examination of suspect digital evidence (think Warrick Brown on CSI) Hardware: servers, workstations, laptops, PDAs, mobiles, cameras
Hadoop Cluster Applications
Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday
B.Sc (Computer Science) Database Management Systems UNIT-V
1 B.Sc (Computer Science) Database Management Systems UNIT-V Business Intelligence? Business intelligence is a term used to describe a comprehensive cohesive and integrated set of tools and process used
Log Analysis with the ELK Stack (Elasticsearch, Logstash and Kibana) Gary Smith, Pacific Northwest National Laboratory
Log Analysis with the ELK Stack (Elasticsearch, Logstash and Kibana) Gary Smith, Pacific Northwest National Laboratory A Little Context! The Five Golden Principles of Security! Know your system! Principle
Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] June 3 rd, 2008
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] June 3 rd, 2008 Who Am I? Hadoop Developer Core contributor since Hadoop s infancy Focussed
The Lustre File System. Eric Barton Lead Engineer, Lustre Group Sun Microsystems
The Lustre File System Eric Barton Lead Engineer, Lustre Group Sun Microsystems 1 Lustre Today What is Lustre Deployments Community Topics Lustre Development Industry Trends Scalability Improvements Lustre
Drupal Performance Tuning
Drupal Performance Tuning By Jeremy Zerr Website: http://www.jeremyzerr.com @jrzerr http://www.linkedin.com/in/jrzerr Overview Basics of Web App Systems Architecture General Web
Red Hat Certifications: Red Hat Certified System Administrator (RHCSA)
Red Hat Certifications: Red Hat Certified System Administrator (RHCSA) Overview Red Hat is pleased to announce a new addition to its line of performance-based certifications Red Hat Certified System Administrator
Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH
Real-time Data Analytics mit Elasticsearch Bernhard Pflugfelder inovex GmbH Bernhard Pflugfelder Big Data Engineer @ inovex Fields of interest: search analytics big data bi Working with: Lucene Solr Elasticsearch
<Insert Picture Here> Private Cloud with Fusion Middleware
Private Cloud with Fusion Middleware Duško Vukmanović Principal Sales Consultant, Oracle [email protected] The following is intended to outline our general product direction.
Optimization of QoS for Cloud-Based Services through Elasticity and Network Awareness
Master Thesis: Optimization of QoS for Cloud-Based Services through Elasticity and Network Awareness Alexander Fedulov 1 Agenda BonFIRE Project overview Motivation General System Architecture Monitoring
Automating Healthcare Claim Processing
Automating Healthcare Claim Processing How Splunk Software Helps to Manage and Control Both Processes and Costs CUSTOMER PROFILE Splunk customer profiles are a collection of innovative, in-depth use cases
