BRUCE PERENS OPEN SOURCE SERIES www.phptr.com/perens



Self-Service Linux

Java Application Development on Linux, Carl Albing and Michael Schwarz
C++ GUI Programming with Qt 3, Jasmin Blanchette and Mark Summerfield
Managing Linux Systems with Webmin: System Administration and Module Development, Jamie Cameron
Understanding the Linux Virtual Memory Manager, Mel Gorman
PHP 5 Power Programming, Andi Gutmans, Stig Bakken, and Derick Rethans
Linux Quick Fix Notebook, Peter Harrison
Implementing CIFS: The Common Internet File System, Christopher Hertel
Open Source Security Tools: A Practical Guide to Security Applications, Tony Howlett
Apache Jakarta Commons: Reusable Java Components, Will Iverson
Embedded Software Development with eCos, Anthony Massa
Rapid Application Development with Mozilla, Nigel McFarlane
Subversion Version Control: Using the Subversion Version Control System in Development Projects, William Nagel
Intrusion Detection with SNORT: Advanced IDS Techniques Using SNORT, Apache, MySQL, PHP, and ACID, Rafeeq Ur Rehman
Cross-Platform GUI Programming with wxWidgets, Julian Smart and Kevin Hock with Stefan Csomor
Samba-3 by Example, Second Edition: Practical Exercises to Successful Deployment, John H. Terpstra
The Official Samba-3 HOWTO and Reference Guide, Second Edition, John H. Terpstra and Jelmer R. Vernooij, Editors
Self-Service Linux: Mastering the Art of Problem Determination, Mark Wilding and Dan Behman

Self-Service Linux
Mastering the Art of Problem Determination
Mark Wilding and Dan Behman

PRENTICE HALL
Professional Technical Reference
Upper Saddle River, NJ
Boston, Indianapolis, San Francisco, New York, Toronto, Montreal, London, Munich, Paris, Madrid, Capetown, Sydney, Tokyo, Singapore, Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:

U.S. Corporate and Government Sales
(800) 382-3419
corpsales@pearsontechgroup.com

For sales outside the U.S., please contact:

International Sales
international@pearsoned.com

Visit us on the Web: www.phptr.com

Library of Congress Number: 2005927150

Copyright 2006 Pearson Education, Inc.

This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).

ISBN 0-13-147751-X

Text printed in the United States on recycled paper at R.R. Donnelley in Crawfordsville, Indiana.
First printing, September, 2005

I would like to dedicate this book to my wife, Caryna, whose relentless nagging and badgering forced me to continue working on this book when nothing else could. Just kidding... Without Caryna's support and understanding, I could never have written this book. Not only did she help me find time to write, she also spent countless hours formatting the entire book for production. I would also like to dedicate this book to my two sons, Rhys and Dylan, whose boundless energy acted as inspiration throughout the writing of this book.
Mark Wilding

Without the enduring love and patience of my wife Kim, this laborious project would have halted long ago. I dedicate this book to her, as well as to my beautiful son Nicholas, my family, and all of the Botzangs and Mayos.
Dan Behman

Contents
Preface
Chapter 1: Best Practices and Initial Investigation
Chapter 2: strace and System Call Tracing Explained
Chapter 3: The /proc Filesystem
Chapter 4: Compiling
Chapter 5: The Stack
Chapter 6: The GNU Debugger (GDB)
Chapter 7: Linux System Crashes and Hangs
Chapter 8: Kernel Debugging with KDB
Chapter 9: ELF: Executable and Linking Format
A: The Toolbox
B: Data Collection Script
Index

Contents
Preface
1 Best Practices and Initial Investigation
1.1 Introduction
1.2 Getting Your System(s) Ready for Effective Problem Determination
1.3 The Four Phases of Investigation
1.3.1 Phase #1: Initial Investigation Using Your Own Skills
1.3.2 Phase #2: Searching the Internet Effectively
1.3.3 Phase #3: Begin Deeper Investigation (Good Problem Investigation Practices)
1.3.4 Phase #4: Getting Help or New Ideas
1.4 Technical Investigation
1.4.1 Symptom Versus Cause
1.5 Troubleshooting Commercial Products
1.6 Conclusion
2 strace and System Call Tracing Explained
2.1 Introduction
2.2 What Is strace?
2.2.1 More Information from the Kernel Side
2.2.2 When to Use It
2.2.3 Simple Example
2.2.4 Same Program Built Statically
2.3 Important strace Options
2.3.1 Following Child Processes
2.3.2 Timing System Call Activity
2.3.3 Verbose Mode
2.3.4 Tracing a Running Process
2.4 Effects and Issues of Using strace
2.4.1 strace and EINTR
2.5 Real Debugging Examples
2.5.1 Reducing Start Up Time by Fixing LD_LIBRARY_PATH
2.5.2 The PATH Environment Variable
2.5.3 stracing inetd or xinetd (the Super Server)
2.5.4 Communication Errors
2.5.5 Investigating a Hang Using strace
2.5.6 Reverse Engineering (How the strace Tool Itself Works)
2.6 System Call Tracing Examples
2.6.1 Sample Code
2.6.2 The System Call Tracing Code Explained
2.7 Conclusion
3 The /proc Filesystem
3.1 Introduction
3.2 Process Information
3.2.1 /proc/self
3.2.2 /proc/<pid> in More Detail
3.2.3 /proc/<pid>/cmdline
3.2.4 /proc/<pid>/environ
3.2.5 /proc/<pid>/mem
3.2.6 /proc/<pid>/fd
3.2.7 /proc/<pid>/mapped_base
3.3 Kernel Information and Manipulation
3.3.1 /proc/cmdline
3.3.2 /proc/config.gz or /proc/sys/config.gz
3.3.3 /proc/cpufreq
3.3.4 /proc/cpuinfo
3.3.5 /proc/devices
3.3.6 /proc/kcore
3.3.7 /proc/locks
3.3.8 /proc/meminfo
3.3.9 /proc/mm
3.3.10 /proc/modules
3.3.11 /proc/net
3.3.12 /proc/partitions
3.3.13 /proc/pci
3.3.14 /proc/slabinfo
3.4 System Information and Manipulation
3.4.1 /proc/sys/fs
3.4.2 /proc/sys/kernel
3.4.3 /proc/sys/vm
3.5 Conclusion
4 Compiling
4.1 Introduction
4.2 The GNU Compiler Collection
4.2.1 A Brief History of GCC
4.2.2 GCC Version Compatibility
4.3 Other Compilers
4.4 Compiling the Linux Kernel
4.4.1 Obtaining the Kernel Source
4.4.2 Architecture Specific Source
4.4.3 Working with Kernel Source Compile Errors
4.4.4 General Compilation Problems
4.5 Assembly Listings
4.5.1 Purpose of Assembly Listings
4.5.2 Generating Assembly Listings
4.5.3 Reading and Understanding an Assembly Listing
4.6 Compiler Optimizations
4.7 Conclusion
5 The Stack
5.1 Introduction
5.2 A Real-World Analogy
5.3 Stacks in x86 and x86-64 Architectures
5.4 What Is a Stack Frame?
5.5 How Does the Stack Work?
5.5.1 The BP and SP Registers
5.5.2 Function Calling Conventions
5.6 Referencing and Modifying Data on the Stack
5.7 Viewing the Raw Stack in a Debugger
5.8 Examining the Raw Stack in Detail
5.8.1 Homegrown Stack Traceback Function
5.9 Conclusion
6 The GNU Debugger (GDB)
6.1 Introduction
6.2 When to Use a Debugger
6.3 Command Line Editing
6.4 Controlling a Process with GDB
6.4.1 Running a Program Off the Command Line with GDB
6.4.2 Attaching to a Running Process
6.4.3 Use a Core File
6.5 Examining Data, Memory, and Registers
6.5.1 Memory Map
6.5.2 Stack
6.5.3 Examining Memory and Variables
6.5.4 Register Dump
6.6 Execution
6.6.1 The Basic Commands
6.6.2 Settings for Execution Control Commands
6.6.3 Breakpoints
6.6.4 Watchpoints
6.6.5 Display Expression on Stop
6.6.6 Working with Shared Libraries
6.7 Source Code
6.8 Assembly Language
6.9 Tips and Tricks
6.9.1 Attaching to a Process Revisited
6.9.2 Finding the Address of Variables and Functions
6.9.3 Viewing Structures in Executables without Debug Symbols
6.9.4 Understanding and Dealing with Endian-ness
6.10 Working with C++
6.10.1 Global Constructors and Destructors
6.10.2 Inline Functions
6.10.3 Exceptions
6.11 Threads
6.11.1 Running Out of Stack Space
6.12 Data Display Debugger (DDD)
6.12.1 The Data Display Window
6.12.2 Source Code Window
6.12.3 Machine Language Window
6.12.4 GDB Console Window
6.13 Conclusion
7 Linux System Crashes and Hangs
7.1 Introduction
7.2 Gathering Information
7.2.1 Syslog Explained
7.2.2 Setting up a Serial Console
7.2.3 Connecting the Serial Null-Modem Cable
7.2.4 Enabling the Serial Console at Startup
7.2.5 Using SysRq Kernel Magic
7.2.6 Oops Reports
7.2.7 Adding a Manual Kernel Trap
7.2.8 Examining an Oops Report
7.2.9 Determining the Failing Line of Code
7.2.10 Kernel Oopses and Hardware
7.2.11 Setting up cscope to Index Kernel Sources
7.3 Conclusion
8 Kernel Debugging with KDB
8.1 Introduction
8.2 Enabling KDB
8.3 Using KDB
8.3.1 Activating KDB
8.3.2 Resuming Normal Execution
8.3.3 Basic Commands
8.4 Conclusion
9 ELF: Executable and Linking Format
9.1 Introduction
9.2 Concepts and Definitions
9.2.1 Symbol
9.2.2 Object Files, Shared Libraries, Executables, and Core Files
9.2.3 Linking
9.2.4 Run Time Linking
9.2.5 Program Interpreter / Run Time Linker
9.3 ELF Header
9.4 Overview of Segments and Sections
9.5 Segments and the Program Header Table
9.5.1 Text and Data Segments
9.6 Sections and the Section Header Table
9.6.1 String Table Format
9.6.2 Symbol Table Format
9.6.3 Section Names and Types
9.7 Relocation and Position Independent Code (PIC)
9.7.1 PIC vs. Non-PIC
9.7.2 Relocation and Position Independent Code
9.7.3 Relocation and Linking
9.8 Stripping an ELF Object
9.9 Program Interpreter
9.9.1 Link Map
9.10 Symbol Resolution
9.11 Use of Weak Symbols for Problem Investigations
9.12 Advanced Interception Using Global Offset Table
9.13 Source Files
9.14 ELF APIs
9.15 Other Information
9.16 Conclusion
A The Toolbox
A.1 Introduction
A.2 Process Information and Debugging
A.2.1 Tool: GDB
A.2.2 Tool: ps
A.2.3 Tool: strace (system call tracer)
A.2.4 Tool: /proc filesystem
A.2.5 Tool: DDD (Data Display Debugger)
A.2.6 Tool: lsof (List Open Files)
A.2.7 Tool: ltrace (library call tracer)
A.2.8 Tool: time
A.2.9 Tool: top
A.2.10 Tool: pstree
A.3 Network
A.3.1 Tool: traceroute
A.3.2 File: /etc/hosts
A.3.3 File: /etc/services
A.3.4 Tool: netstat
A.3.5 Tool: ping
A.3.6 Tool: telnet
A.3.7 Tool: host/nslookup
A.3.8 Tool: ethtool
A.3.9 Tool: ethereal
A.3.10 File: /etc/nsswitch.conf
A.3.11 File: /etc/resolv.conf
A.4 System Information
A.4.1 Tool: vmstat
A.4.2 Tool: iostat
A.4.3 Tool: nfsstat
A.4.4 Tool: sar
A.4.5 Tool: syslogd
A.4.6 Tool: dmesg
A.4.7 Tool: mpstat
A.4.8 Tool: procinfo
A.4.9 Tool: xosview
A.5 Files and Object Files
A.5.1 Tool: file
A.5.2 Tool: ldd
A.5.3 Tool: nm
A.5.4 Tool: objdump
A.5.5 Tool: od
A.5.6 Tool: stat
A.5.7 Tool: readelf
A.5.8 Tool: strings
A.6 Kernel
A.6.1 Tool: KDB
A.6.2 Tool: KGDB
A.6.3 Tool: ksymoops
A.7 Miscellaneous
A.7.1 Tool: VMware Workstation
A.7.2 Tool: VNC Server
A.7.3 Tool: VNC Viewer
B Data Collection Script
B.1 Overview
B.1.1 -thorough
B.1.2 -perf, -hang <pid>, -trap, -error <cmd>
B.2 Running the Script
B.3 The Script Source
B.4 Disclaimer
Index

About the Authors

Mark Wilding is a senior developer at IBM who currently specializes in serviceability technologies, UNIX, and Linux. With over 15 years of experience writing software, Mark has extensive expertise in operating systems, networks, C/C++ development, serviceability, quality engineering, and computer hardware.

Dan Behman is a member of the DB2 UDB for Linux Platform Exploitation development team at the Toronto IBM Software Lab. He has over 10 years of experience with Linux and has been involved in porting and enabling DB2 UDB on the latest architectures that Linux supports, including x86-64, zSeries, and POWER platforms.

Preface

Linux is the ultimate choice for home and business users. It is powerful, as stable as any commercial operating system, secure, and best of all, it is open source. One of the biggest deciding factors in whether to use Linux at home or for your business can be service and support. Because Linux is developed by thousands of volunteers from around the world, it is not always clear whom to turn to when something goes wrong.

In the true spirit of Linux, there is a slightly different approach to support than the commercial norm. After all, Linux represents an unparalleled community of experts, it includes industry-leading problem determination tools, and of course, the product itself includes the source code. These resources are in addition to the professional Linux support services that are available from companies such as IBM and from the various Linux vendors such as Red Hat and SUSE. Making the most of these additional resources is called self-service and is the main topic covered by this book.

Self-service on Linux means different things to different people. For those who use Linux at home, it means a more enjoyable Linux experience. For those who use Linux at work, being able to quickly and effectively diagnose problems on Linux can increase their value as employees as well as their marketability. For corporate leaders deciding whether to adopt Linux as part of the corporate strategy, self-service for Linux means reduced operation costs and increased Return on Investment (ROI) for any Linux adoption strategy. Regardless of what type of Linux user you are, it is important to make the most of your Linux experience and investment.

WHAT IS THIS BOOK ABOUT?

In a nutshell, this book is about effectively and efficiently diagnosing problems that occur in the Linux environment. It covers good investigation practices and how to use the information and resources on the Internet, and then dives into detailed descriptions of how to use the most important problem determination tools that Linux has to offer.

Chapter 1 is a crash course on effective problem determination practices, which will help you to diagnose problems like an expert. It covers where and how to look for information on the Internet as well as how to start investigating common types of problems.

Chapter 2 covers strace, which is arguably the most frequently used problem determination tool in Linux. This chapter includes both practical usage information and details about how strace works. It also includes source code for a simple strace tool and details about how the underlying functionality works with the kernel through the ptrace interface.

Chapter 3 is about the /proc filesystem, which contains a wealth of information about the hardware, kernel, and processes that are running on the system. The purpose of this chapter is to point out and examine some of the more advanced features and tricks primarily related to problem determination and system diagnosis. For example, the chapter covers how to use the SysRq Kernel Magic hotkey with /proc/sys/kernel/sysrq.

Chapter 4 provides detailed information about compiling. Why does a book about debugging on Linux include a chapter about compiling? Well, the beginning of this preface mentioned that diagnosing problems in Linux is different from diagnosing them in commercial environments. The main reason is that the source code is freely available for all of the open source tools and for the operating system itself. This chapter provides vital information whether you need to recompile an open source application with debug information (as is often the case), generate an assembly language listing for a tough problem (for example, to find the line of code for a trap), or deal with a problem while recompiling the Linux kernel itself.

Chapter 5 covers intimate details about the stack, one of the most important and fundamental concepts of a computer system. Besides explaining all the gory details about the structure of a stack (which is pretty much required knowledge for any Linux expert), the chapter also includes and explains source code that readers can use to generate stack traces from within their own tools and applications. The code examples not only illustrate how the stack works but can also save real time and debugging effort when included as part of an application's debugging facilities.

Chapter 6 takes an in-depth look at debugging applications with the GNU Debugger (GDB) and includes an overview of the Data Display Debugger (DDD) graphical user interface. Linux has an advantage over most other operating systems in that it includes a feature-rich debugger, GDB, for free. Debuggers can be used to debug many types of problems, and given that GDB is free, it is well worth the effort to understand its basic as well as its more advanced features. This chapter covers hard-to-find details about debugging C++ applications and threaded applications, as well as numerous best practices. Have you ever spawned an xterm to attach to a process with GDB? This chapter will show you how and why!

Chapter 7 provides a detailed overview of system crashes and hangs. With proprietary operating systems (OSs), a system crash or hang almost certainly requires you to call the OS vendor for help. However, with Linux, the end user can debug a kernel problem on his or her own, or at least identify key information to search for known problems. If you do need to get an expert involved, knowing what to collect will help you to get the right data quickly for a fast diagnosis. This chapter describes everything from how to attach a serial console to how to find the line of code for a kernel trap (an "oops"). For example, the chapter provides step-by-step details for how to manually add a trap in the kernel and then debug it to find the resulting line of code.

Chapter 8 covers more details about debugging the kernel with the kernel debugger, KDB. The chapter covers how to configure and enable KDB on your system as well as some practical commands that most Linux users can use without being kernel experts. For example, this chapter shows you how to find out what a process is doing from within the kernel, which can be particularly useful if the process is hung and cannot be killed.

Chapter 9 is a detailed, head-on look at the Executable and Linking Format (ELF). The details behind ELF are often ignored or just assumed to work. This is unfortunate because a thorough understanding of ELF can lead to a whole new world of debugging techniques. This chapter covers intimate but practical details of the underlying ELF file format as well as tips and tricks that few people know. There is even sample code and step-by-step instructions for how to override functions using LD_PRELOAD and how to use the global offset table and the GDB debugger to intercept functions manually and redirect them to debug versions.

Appendix A is a toolbox that outlines the most useful tools, facilities, and files on Linux. For each tool, there is a description of when it is useful and where to get the latest copy.

Appendix B includes a production-ready data collection script that is especially useful for mission-critical systems or for those who remotely support customers on Linux. The data collection script alone can save many hours or even days when debugging a remote problem.

Note: The source code used in this book can be found at http://www.phptr.com/title/013147751x.

Note: A code continuation character appears at the beginning of code lines that have wrapped down from the line above it.

Lastly, as we wrote this book, it became clear to us that we were covering the right information. Reviewers often commented that they were able to use the information immediately to solve real problems: not problems that might come up in the future or that had happened in the past, but real problems that people were actually struggling with when they reviewed the chapters. We also found ourselves referring to the content of the book to help solve problems as they came up. We hope you find it as useful as it has been to those who have read it thus far.

WHO IS THIS BOOK FOR?

This book has useful information for any Linux user but is certainly geared more toward the Linux professional. This includes Linux power users, Linux administrators, developers who write software for Linux, and staff who support products on Linux. Readers who casually use Linux at home will also benefit, as long as they either have a basic understanding of Linux or are at least willing to learn more about it (the latter being most important).

Ultimately, as Linux increases in popularity, many seasoned experts are facing the challenge of translating their knowledge and experience to the Linux platform. Many are already experts with one or more operating systems but lack specific knowledge about the various command-line incantations or about how to apply their knowledge to Linux.

This book will help such experts to quickly adapt their existing skill set and apply it effectively on Linux.

This power-packed book contains real industry experience on many topics and very hard-to-find information. Without a doubt, it is a must-have for any developer, tester, support analyst, or anyone who uses Linux.

ACKNOWLEDGMENTS

Anyone who has written a book will agree that it takes an enormous amount of effort. Yes, there is a lot of work for the authors, but without the many key people behind the scenes, writing a book would be nearly impossible. We would like to thank all of the people who reviewed, supported, contributed to, or otherwise made this book possible.

First, we would like to thank the reviewers for their time, patience, and valuable feedback. Besides catching typos, grammatical errors, and technical omissions, in many cases the reviewers allowed us to see other vantage points, which in turn helped to make the content more well-rounded and complete. In particular, we would like to thank Richard Moore, for reviewing the technical content of many chapters; Robert Haskins, for being so thorough with his reviews and comments; Mel Gorman, for his valuable feedback on the ELF (Executable and Linking Format) chapter; Scott Dier, for his many valuable comments; Jan Kritter, for reviewing pretty much the entire book; and Joyce Coleman, Ananth Narayan, Pascale Stephenson, Ben Elliston, Hien Nguyen, Jim Keniston, as well as the IBM Linux Technology Center, for their valuable feedback.

We would also like to thank the excellent engineers from SUSE for helping to answer many deep technical questions, especially Andi Kleen, Frank Balzer, and Michael Matz.

We would especially like to thank our wives and families for their support and encouragement and for giving us the time to work on this project. Without their support, this book would never have gotten past the casual conversation we had about possibly writing one many months ago. We truly appreciate the sacrifices that they have made to allow us to finish this book.

Last of all, we would like to thank the Open Source Community as a whole. The open source movement is a truly remarkable phenomenon that has raised and will continue to raise the bar for computing at home and in commercial environments. Our thanks to the Open Source Community is not specifically for this book but rather for their tireless dedication and technical prowess that make Linux and all open source products a reality. It is our hope that the content in this book will encourage others to adopt, use, or support open source products and of course Linux. Every little bit helps.

Thanks for reading this book.

OTHER

The history and evolution of the Linux operating system is fascinating and certainly still being written, with new twists popping up all the time. Linux itself comprises only the kernel of the whole operating system. Granted, this is the single most important part, but everything else surrounding the Linux kernel is made up mostly of GNU free software. There are two major things that GNU software and the Linux kernel have in common. The first is that the source code for both is freely accessible. The second is that they have been developed, and continue to be developed, by many thousands of volunteers throughout the world, all connecting and sharing ideas and work through the Internet. Many refer to this collaboration of people and resources as the Open Source Community.

The Open Source Community is much like a distributed development team with skills and experience spanning many different areas of computer science. The source code that is written by the Open Source Community is available for anyone and everyone to see. Not only can this make problem determination easier, but having such a large and diverse group of people looking at the code can also reduce the number of defects and improve the security of the source code. Open source software is open to innovations as much as criticism, both of which help to improve the quality and functionality of the software.

One of the most common concerns about adopting Linux is service and support. However, Linux has the Open Source Community, a wide range of freely available problem determination tools, the source code, and the Internet itself as a source of information, including numerous sites and newsgroups dedicated to Linux. It is important for every Linux user to understand the resources and tools that are available to help them diagnose problems. That is the purpose of this book. It is not intended to be a replacement for a support contract, nor does it require one. If you have one, this book is an enhancement that will be sure to help you make the most of your existing support contract.

CHAPTER 1 Best Practices and Initial Investigation

1.1 INTRODUCTION

Your boss is screaming, your customers are screaming, you're screaming... Whatever the situation, there is a problem, and you need to solve it. Remember those old classic MUD games? For those who don't, a Multi-User Dungeon, or MUD, was the earliest incarnation of the online video game. Users played the game through a completely non-graphical text interface that described the surroundings and options available to the player and then prompted the user with what to do next.

You are alone in a dark cubicle. To the North is your boss's office, to the West is your Team Lead's cubicle, to the East is a window opening out to a five-floor drop, and to the South is a kitchenette containing a freshly brewed pot of coffee. You stare at your computer screen in bewilderment as the phone rings for the fifth time in as many minutes, indicating that your users are unable to connect to their server.

Command>

What will you do? Will you run toward the East and dive through the open window? Will you go grab a hot cup of coffee to ensure you stay alert for the long night ahead? A common thing to do in these MUD games was to examine your surroundings further, usually with the look command.

Command> look

Your cubicle is a mess of papers and old coffee cups. The message waiting light on your phone is burnt out from flashing for so many months. Your email inbox is overflowing with unanswered emails. On top of the mess is the brand new book you ordered entitled Self-Service Linux. You need a shower.

Command> read book Self-Service Linux

You still need a shower.

This tongue-in-cheek MUD analogy aside, what can this book really do for you? This book includes chapters loaded with useful information to help you diagnose problems quickly and effectively. This first chapter covers best practices for problem determination and points to the more in-depth information found in the chapters throughout this book. The first step is to ensure that your Linux system(s) are configured for effective problem determination.

1.2 GETTING YOUR SYSTEM(S) READY FOR EFFECTIVE PROBLEM DETERMINATION

The Linux problem determination tools and facilities are free, which raises the question: Why not install them? Without these tools, a simple problem can turn into a long and painful ordeal that can affect a business and/or your personal time. Before reading through the rest of the book, take some time to make sure the following tools are installed on your system(s). These tools are just waiting to make your life easier and/or your business more productive:

strace: The strace tool traces system calls, the special functions through which a process interacts with the operating system. You can use it for many types of problems, especially those that relate to the operating system.
ltrace: The ltrace tool traces the library functions that a process calls. This is similar to strace, but because the called functions are library functions rather than only system calls, the output can provide more detail.
lsof: The lsof tool lists all of the open files on the operating system (OS). When a process opens a file, the OS returns a numeric file descriptor for the process to use. lsof lists every open file on the OS along with the ID of the process that has it open and the corresponding file descriptor.
top: This tool lists the top processes running on the system. By default, it sorts them by the amount of CPU each process is currently consuming.
traceroute/tcptraceroute: These tools can be used to trace a network route (or at least one direction of it).
ping: Ping simply checks whether a remote system can respond. Sometimes firewalls block the network packets ping uses, but it is still very useful.
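As a quick way to act on the advice to check that these tools are installed, the following Python sketch reports which of them cannot be found on the PATH. It is only a convenience check and assumes each tool is invoked by its usual command name; the names PD_TOOLS and missing_tools are hypothetical, chosen for this example.

```python
import shutil

# Command names of the problem determination tools listed above.
# PD_TOOLS is a hypothetical name used only for this sketch.
PD_TOOLS = ("strace", "ltrace", "lsof", "top", "traceroute", "tcptraceroute", "ping")


def missing_tools(tools=PD_TOOLS):
    """Return the names in `tools` that shutil.which() cannot find on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Any tool the script reports can then be installed with your distribution's package manager. Because shutil.which() searches only the directories in PATH, a tool installed in /usr/sbin or /sbin may be reported as missing for a non-root user on some distributions, so check those directories before reinstalling anything.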