Self-Service Linux
BRUCE PERENS OPEN SOURCE SERIES

Java Application Development on Linux
    Carl Albing and Michael Schwarz
C++ GUI Programming with Qt 3
    Jasmin Blanchette and Mark Summerfield
Managing Linux Systems with Webmin: System Administration and Module Development
    Jamie Cameron
Understanding the Linux Virtual Memory Manager
    Mel Gorman
PHP 5 Power Programming
    Andi Gutmans, Stig Bakken, and Derick Rethans
Linux Quick Fix Notebook
    Peter Harrison
Implementing CIFS: The Common Internet File System
    Christopher Hertel
Open Source Security Tools: A Practical Guide to Security Applications
    Tony Howlett
Apache Jakarta Commons: Reusable Java Components
    Will Iverson
Embedded Software Development with eCos
    Anthony Massa
Rapid Application Development with Mozilla
    Nigel McFarlane
Subversion Version Control: Using the Subversion Version Control System in Development Projects
    William Nagel
Intrusion Detection with SNORT: Advanced IDS Techniques Using SNORT, Apache, MySQL, PHP, and ACID
    Rafeeq Ur Rehman
Cross-Platform GUI Programming with wxWidgets
    Julian Smart and Kevin Hock with Stefan Csomor
Samba-3 by Example, Second Edition: Practical Exercises to Successful Deployment
    John H. Terpstra
The Official Samba-3 HOWTO and Reference Guide, Second Edition
    John H. Terpstra and Jelmer R. Vernooij, Editors
Self-Service Linux: Mastering the Art of Problem Determination
    Mark Wilding and Dan Behman
Self-Service Linux
Mastering the Art of Problem Determination

Mark Wilding and Dan Behman

PRENTICE HALL
Professional Technical Reference
Upper Saddle River, NJ; Boston; Indianapolis; San Francisco; New York; Toronto; Montreal; London; Munich; Paris; Madrid; Capetown; Sydney; Tokyo; Singapore; Mexico City
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:

U.S. Corporate and Government Sales
(800)

For sales outside the U.S., please contact:

International Sales

Visit us on the Web:

Library of Congress Number:

Copyright 2006 Pearson Education, Inc.

This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at

ISBN X

Text printed in the United States on recycled paper at R.R. Donnelley in Crawfordsville, Indiana.

First printing, September, 2005
I would like to dedicate this book to my wife, Caryna, whose relentless nagging and badgering forced me to continue working on this book when nothing else could. Just kidding... Without Caryna's support and understanding, I could never have written this book. Not only did she help me find time to write, she also spent countless hours formatting the entire book for production. I would also like to dedicate this book to my two sons, Rhys and Dylan, whose boundless energy acted as inspiration throughout the writing of this book.

Mark Wilding

Without the enduring love and patience of my wife Kim, this laborious project would have halted long ago. I dedicate this book to her, as well as to my beautiful son Nicholas, my family, and all of the Botzangs and Mayos.

Dan Behman
Contents

Preface
Chapter 1: Best Practices and Initial Investigation
Chapter 2: strace and System Call Tracing Explained
Chapter 3: The /proc Filesystem
Chapter 4: Compiling
Chapter 5: The Stack
Chapter 6: The GNU Debugger (GDB)
Chapter 7: Linux System Crashes and Hangs
Chapter 8: Kernel Debugging with KDB
Chapter 9: ELF: Executable and Linking Format
A: The Toolbox
B: Data Collection Script
Index
Contents

Preface

1 Best Practices and Initial Investigation
    Introduction
    Getting Your System(s) Ready for Effective Problem Determination
    The Four Phases of Investigation
    Phase #1: Initial Investigation Using Your Own Skills
    Phase #2: Searching the Internet Effectively
    Phase #3: Begin Deeper Investigation (Good Problem Investigation Practices)
    Phase #4: Getting Help or New Ideas
    Technical Investigation
    Symptom Versus Cause
    Troubleshooting Commercial Products
    Conclusion

2 strace and System Call Tracing Explained
    Introduction
    What Is strace?
    More Information from the Kernel Side
    When to Use It
    Simple Example
    Same Program Built Statically
    Important strace Options
    Following Child Processes
    Timing System Call Activity
    Verbose Mode
    Tracing a Running Process
    Effects and Issues of Using strace
    strace and EINTR
    Real Debugging Examples
    Reducing Start Up Time by Fixing LD_LIBRARY_PATH
    The PATH Environment Variable
    stracing inetd or xinetd (the Super Server)
    Communication Errors
    Investigating a Hang Using strace
    Reverse Engineering (How the strace Tool Itself Works)
    System Call Tracing Examples
    Sample Code
    The System Call Tracing Code Explained
    Conclusion

3 The /proc Filesystem
    Introduction
    Process Information
    /proc/self
    /proc/<pid> in More Detail
    /proc/<pid>/cmdline
    /proc/<pid>/environ
    /proc/<pid>/mem
    /proc/<pid>/fd
    /proc/<pid>/mapped_base
    Kernel Information and Manipulation
    /proc/cmdline
    /proc/config.gz or /proc/sys/config.gz
    /proc/cpufreq
    /proc/cpuinfo
    /proc/devices
    /proc/kcore
    /proc/locks
    /proc/meminfo
    /proc/mm
    /proc/modules
    /proc/net
    /proc/partitions
    /proc/pci
    /proc/slabinfo
    3.4 System Information and Manipulation
    /proc/sys/fs
    /proc/sys/kernel
    /proc/sys/vm
    Conclusion

4 Compiling
    Introduction
    The GNU Compiler Collection
    A Brief History of GCC
    GCC Version Compatibility
    Other Compilers
    Compiling the Linux Kernel
    Obtaining the Kernel Source
    Architecture Specific Source
    Working with Kernel Source Compile Errors
    General Compilation Problems
    Assembly Listings
    Purpose of Assembly Listings
    Generating Assembly Listings
    Reading and Understanding an Assembly Listing
    Compiler Optimizations
    Conclusion

5 The Stack
    Introduction
    A Real-World Analogy
    Stacks in x86 and x86-64 Architectures
    What Is a Stack Frame?
    How Does the Stack Work?
    The BP and SP Registers
    Function Calling Conventions
    Referencing and Modifying Data on the Stack
    Viewing the Raw Stack in a Debugger
    Examining the Raw Stack in Detail
    Homegrown Stack Traceback Function
    Conclusion

6 The GNU Debugger (GDB)
    Introduction
    When to Use a Debugger
    Command Line Editing
    6.4 Controlling a Process with GDB
    Running a Program Off the Command Line with GDB
    Attaching to a Running Process
    Use a Core File
    Examining Data, Memory, and Registers
    Memory Map
    Stack
    Examining Memory and Variables
    Register Dump
    Execution
    The Basic Commands
    Settings for Execution Control Commands
    Breakpoints
    Watchpoints
    Display Expression on Stop
    Working with Shared Libraries
    Source Code
    Assembly Language
    Tips and Tricks
    Attaching to a Process Revisited
    Finding the Address of Variables and Functions
    Viewing Structures in Executables without Debug Symbols
    Understanding and Dealing with Endian-ness
    Working with C++
    Global Constructors and Destructors
    Inline Functions
    Exceptions
    Threads
    Running Out of Stack Space
    Data Display Debugger (DDD)
    The Data Display Window
    Source Code Window
    Machine Language Window
    GDB Console Window
    Conclusion

7 Linux System Crashes and Hangs
    Introduction
    Gathering Information
    Syslog Explained
    Setting up a Serial Console
    Connecting the Serial Null-Modem Cable
    Enabling the Serial Console at Startup
    Using SysRq Kernel Magic
    Oops Reports
    Adding a Manual Kernel Trap
    Examining an Oops Report
    Determining the Failing Line of Code
    Kernel Oopses and Hardware
    Setting up cscope to Index Kernel Sources
    Conclusion

8 Kernel Debugging with KDB
    Introduction
    Enabling KDB
    Using KDB
    Activating KDB
    Resuming Normal Execution
    Basic Commands
    Conclusion

9 ELF: Executable and Linking Format
    Introduction
    Concepts and Definitions
    Symbol
    Object Files, Shared Libraries, Executables, and Core Files
    Linking
    Run Time Linking
    Program Interpreter / Run Time Linker
    ELF Header
    Overview of Segments and Sections
    Segments and the Program Header Table
    Text and Data Segments
    Sections and the Section Header Table
    String Table Format
    Symbol Table Format
    Section Names and Types
    Relocation and Position Independent Code (PIC)
    PIC vs. non-PIC
    Relocation and Position Independent Code
    Relocation and Linking
    Stripping an ELF Object
    9.9 Program Interpreter
    Link Map
    Symbol Resolution
    Use of Weak Symbols for Problem Investigations
    Advanced Interception Using Global Offset Table
    Source Files
    ELF APIs
    Other Information
    Conclusion

A The Toolbox
    A.1 Introduction
    A.2 Process Information and Debugging
        A.2.1 Tool: GDB
        A.2.2 Tool: ps
        A.2.3 Tool: strace (system call tracer)
        A.2.4 Tool: /proc filesystem
        A.2.5 Tool: DDD (Data Display Debugger)
        A.2.6 Tool: lsof (List Open Files)
        A.2.7 Tool: ltrace (library call tracer)
        A.2.8 Tool: time
        A.2.9 Tool: top
        A.2.10 Tool: pstree
    A.3 Network
        A.3.1 Tool: traceroute
        A.3.2 File: /etc/hosts
        A.3.3 File: /etc/services
        A.3.4 Tool: netstat
        A.3.5 Tool: ping
        A.3.6 Tool: telnet
        A.3.7 Tool: host/nslookup
        A.3.8 Tool: ethtool
        A.3.9 Tool: ethereal
        A.3.10 File: /etc/nsswitch.conf
        A.3.11 File: /etc/resolv.conf
    A.4 System Information
        A.4.1 Tool: vmstat
        A.4.2 Tool: iostat
        A.4.3 Tool: nfsstat
        A.4.4 Tool: sar
        A.4.5 Tool: syslogd
        A.4.6 Tool: dmesg
        A.4.7 Tool: mpstat
        A.4.8 Tool: procinfo
        A.4.9 Tool: xosview
    A.5 Files and Object Files
        A.5.1 Tool: file
        A.5.2 Tool: ldd
        A.5.3 Tool: nm
        A.5.4 Tool: objdump
        A.5.5 Tool: od
        A.5.6 Tool: stat
        A.5.7 Tool: readelf
        A.5.8 Tool: strings
    A.6 Kernel
        A.6.1 Tool: KDB
        A.6.2 Tool: KGDB
        A.6.3 Tool: ksymoops
    A.7 Miscellaneous
        A.7.1 Tool: VMware Workstation
        A.7.2 Tool: VNC Server
        A.7.3 Tool: VNC Viewer

B Data Collection Script
    B.1 Overview
        B.1.1 -thorough
        B.1.2 -perf, -hang <pid>, -trap, -error <cmd>
    B.2 Running the Script
    B.3 The Script Source
    B.4 Disclaimer

Index
About the Authors

Mark Wilding is a senior developer at IBM who currently specializes in serviceability technologies, UNIX, and Linux. With over 15 years of experience writing software, Mark has extensive expertise in operating systems, networks, C/C++ development, serviceability, quality engineering, and computer hardware.

Dan Behman is a member of the DB2 UDB for Linux Platform Exploitation development team at the Toronto IBM Software Lab. He has over 10 years of experience with Linux, and has been involved in porting and enabling DB2 UDB on the latest architectures that Linux supports, including x86-64, zSeries, and POWER platforms.
Preface

Linux is the ultimate choice for home and business users. It is powerful, as stable as any commercial operating system, secure, and best of all, it is open source. One of the biggest deciding factors for whether to use Linux at home or for your business can be service and support. Because Linux is developed by thousands of volunteers from around the world, it is not always clear who to turn to when something goes wrong.

In the true spirit of Linux, there is a slightly different approach to support than the commercial norm. After all, Linux represents an unparalleled community of experts, it includes industry-leading problem determination tools, and of course, the product itself includes the source code. These resources are in addition to the professional Linux support services that are available from companies, such as IBM, and the various Linux vendors, such as Red Hat and SUSE. Making the most of these additional resources is called self-service and is the main topic covered by this book.

Self-service on Linux means different things to different people. For those who use Linux at home, it means a more enjoyable Linux experience. For those who use Linux at work, being able to quickly and effectively diagnose problems on Linux can increase their value as employees as well as their marketability. For corporate leaders deciding whether to adopt Linux as part of the corporate strategy, self-service for Linux means reduced operation costs and increased Return on Investment (ROI) for any Linux adoption strategy. Regardless of what type of Linux user you are, it is important to make the most of your Linux experience and investment.

WHAT IS THIS BOOK ABOUT?

In a nutshell, this book is about effectively and efficiently diagnosing problems that occur in the Linux environment. It covers good investigation practices, how to use the information and resources on the Internet, and then dives right into detail describing how to use the most important problem determination tools that Linux has to offer.

Chapter 1 is like a crash course on effective problem determination practices, which will help you to diagnose problems like an expert. It covers where and how to look for information on the Internet as well as how to start investigating common types of problems.

Chapter 2 covers strace, which is arguably the most frequently used problem determination tool in Linux. This chapter includes practical usage information as well as details about how strace works. It also includes source code for a simple strace tool and details about how the underlying functionality works with the kernel through the ptrace interface.

Chapter 3 is about the /proc filesystem, which contains a wealth of information about the hardware, kernel, and processes that are running on the system. The purpose of this chapter is to point out and examine some of the more advanced features and tricks primarily related to problem determination and system diagnosis. For example, the chapter covers how to use the SysRq Kernel Magic hotkey with /proc/sys/kernel/sysrq.

Chapter 4 provides detailed information about compiling.
Why does a book about debugging on Linux include a chapter about compiling? Well, the beginning of this preface mentioned that diagnosing problems on Linux differs from diagnosing them in commercial environments. The main reason is that the source code is freely available for all of the open source tools and the operating system itself. This chapter provides vital information whether you need to recompile an open source application with debug information (as is often the case), whether you need to generate an assembly language listing for a tough problem (that is, to find the line of code for a trap), or whether you run into a problem while recompiling the Linux kernel itself.

Chapter 5 covers intimate details about the stack, one of the most important and fundamental concepts of a computer system. Besides explaining all the gory details about the structure of a stack (which is pretty much required knowledge for any Linux expert), the chapter also includes and explains source code that can be used by the readers to generate stack traces from within their own tools and applications. The code examples are not only useful to illustrate how the stack works but they can save real time and debugging effort when included as part of an application's debugging facilities.

Chapter 6 takes an in-depth and detailed look at debugging applications with the GNU Debugger (GDB) and includes an overview of the Data Display Debugger (DDD) graphical user interface. Linux has an advantage over most other operating systems in that it includes a feature-rich debugger, GDB, for free. Debuggers can be used to debug many types of problems, and given that GDB is free, it is well worth the effort to understand the basic as well as the more advanced features. This chapter covers hard-to-find details about debugging C++ applications, threaded applications, as well as numerous best practices. Have you ever spawned an xterm to attach to a process with GDB? This chapter will show you how and why!

Chapter 7 provides a detailed overview of system crashes and hangs. With proprietary operating systems (OSs), a system crash or hang almost certainly requires you to call the OS vendor for help. However, with Linux, the end user can debug a kernel problem on his or her own or at least identify key information to search for known problems. If you do need to get an expert involved, knowing what to collect will help you to get the right data quickly for a fast diagnosis. This chapter describes everything from how to attach a serial console to how to find the line of code for a kernel trap (an "oops").
For example, the chapter provides step-by-step details for how to manually add a trap in the kernel and then debug it to find the resulting line of code.

Chapter 8 covers more details about debugging the kernel or debugging with the kernel debugger, kdb. The chapter covers how to configure and enable kdb on your system as well as some practical commands that most Linux users can use without being a kernel expert. For example, this chapter shows you how to find out what a process is doing from within the kernel, which can be particularly useful if the process is hung and not killable.

Chapter 9 is a detailed, head-on look at Executable and Linking Format (ELF). The details behind ELF are often ignored or just assumed to work. This is really unfortunate because a thorough understanding of ELF can lead to a whole new world of debugging techniques. This chapter covers intimate but practical details of the underlying ELF file format as well as tips and tricks that few people know. There is even sample code and step-by-step instructions for how to override functions using LD_PRELOAD and how to use the global offset table and the GDB debugger to intercept functions manually and redirect them to debug versions.

Appendix A is a toolbox that outlines the most useful tools, facilities, and files on Linux. For each tool, there is a description of when it is useful and where to get the latest copy.

Appendix B includes a production-ready data collection script that is especially useful for mission-critical systems or those who remotely support customers on Linux. The data collection script alone can save many hours or even days for debugging a remote problem.

Note: The source code used in this book can be found at

Note: A code continuation character,, appears at the beginning of code lines that have wrapped down from the line above it.

Lastly, as we wrote this book it became clear to us that we were covering the right information. Reviewers often commented about how they were able to use the information immediately to solve real problems, not the problems that may come in the future or may have happened in the past, but real problems that people were actually struggling with when they reviewed the chapters. We also found ourselves referring to the content of the book to help solve problems as they came up. We hope you find it as useful as it has been to those who have read it thus far.

WHO IS THIS BOOK FOR?

This book has useful information for any Linux user but is certainly geared more toward the Linux professional. This includes Linux power users, Linux administrators, developers who write software for Linux, and support staff who support products on Linux. Readers who casually use Linux at home will benefit also, as long as they either have a basic understanding of Linux or are at least willing to learn more about it, the latter being most important.

Ultimately, as Linux increases in popularity, there are many seasoned experts who are facing the challenge of translating their knowledge and experience to the Linux platform. Many are already experts with one or more operating systems except that they lack specific knowledge about the various command line incantations or ways to interpret their knowledge for Linux.
This book will help such experts to quickly adapt their existing skill set and apply it effectively on Linux.

This power-packed book contains real industry experience on many topics and very hard-to-find information. Without a doubt, it is a must-have for any developer, tester, support analyst, or anyone who uses Linux.

ACKNOWLEDGMENTS

Anyone who has written a book will agree that it takes an enormous amount of effort. Yes, there is a lot of work for the authors, but without the many key people behind the scenes, writing a book would be nearly impossible. We would like to thank all of the people who reviewed, supported, contributed, or otherwise made this book possible.

First, we would like to thank the reviewers for their time, patience, and valuable feedback. Besides the typos, grammatical errors, and technical omissions, in many cases the reviewers allowed us to see other vantage points, which in turn helped to make the content more well-rounded and complete. In particular, we would like to thank Richard Moore, for reviewing the technical content of many chapters; Robert Haskins, for being so thorough with his reviews and comments; Mel Gorman, for his valuable feedback on the ELF (Executable and Linking Format) chapter; Scott Dier, for his many valuable comments; Jan Kritter, for reviewing pretty much the entire book; and Joyce Coleman, Ananth Narayan, Pascale Stephenson, Ben Elliston, Hien Nguyen, Jim Keniston, as well as the IBM Linux Technology Center, for their valuable feedback. We would also like to thank the excellent engineers from SUSE for helping to answer many deep technical questions, especially Andi Kleen, Frank Balzer, and Michael Matz.

We would especially like to thank our wives and families for the support, encouragement, and giving us the time to work on this project. Without their support, this book would have never gotten past the casual conversation we had about possibly writing one many months ago.
We truly appreciate the sacrifices that they have made to allow us to finish this book. Last of all, we would like to thank the Open Source Community as a whole. The open source movement is a truly remarkable phenomenon that has and will continue to raise the bar for computing at home or for commercial environments. Our thanks to the Open Source Community is not specifically for this book but rather for their tireless dedication and technical prowess that make Linux and all open source products a reality. It is our hope that the content in this book will encourage others to adopt, use or support open source products and of course Linux. Every little bit helps. Thanks for reading this book.

OTHER

The history and evolution of the Linux operating system is fascinating and certainly still being written with new twists popping up all the time. Linux itself comprises only the kernel of the whole operating system. Granted, this is the single most important part, but everything else surrounding the Linux kernel is made up mostly of GNU free software. There are two major things that GNU software and the Linux kernel have in common. The first is that the source code for both is freely accessible. The second is that they have been developed and continue to be developed by many thousands of volunteers throughout the world, all connecting and sharing ideas and work through the Internet. Many refer to this collaboration of people and resources as the Open Source Community.

The Open Source Community is much like a distributed development team with skills and experience spanning many different areas of computer science. The source code that is written by the Open Source Community is available for anyone and everyone to see. Not only can this make problem determination easier, but having such a large and diverse group of people looking at the code can also reduce the number of defects and improve the security of the source code. Open source software is open to innovations as much as criticism, both helping to improve the quality and functionality of the software.

One of the most common concerns about adopting Linux is service and support. However, Linux has the Open Source Community, a wide range of freely available problem determination tools, the source code, and the Internet itself as a source of information including numerous sites and newsgroups dedicated to Linux. It is important for every Linux user to understand the resources and tools that are available to help them diagnose problems. That is the purpose of this book. It is not intended to be a replacement for a support contract, nor does it require one.
If you have one, this book is an enhancement that will be sure to help you make the most of your existing support contract.

CHAPTER 1
Best Practices and Initial Investigation

1.1 INTRODUCTION

Your boss is screaming, your customers are screaming, you're screaming. Whatever the situation, there is a problem, and you need to solve it.

Remember those old classic MUD games? For those who don't, a Multi-User Dungeon or MUD was the earliest incarnation of the online video game. Users played the game through a completely non-graphical text interface that described the surroundings and options available to the player and then prompted the user with what to do next.

    You are alone in a dark cubicle. To the North is your boss's office, to the West is your Team Lead's cubicle, to the East is a window opening out to a five-floor drop, and to the South is a kitchenette containing a freshly brewed pot of coffee. You stare at your computer screen in bewilderment as the phone rings for the fifth time in as many minutes indicating that your users are unable to connect to their server.

    Command>

What will you do? Will you run toward the East and dive through the open window? Will you go grab a hot cup of coffee to ensure you stay alert for the long night ahead? A common thing to do in these MUD games was to examine your surroundings further, usually done by the look command.

    Command> look

    Your cubicle is a mess of papers and old coffee cups. The message waiting light on your phone is burnt out from flashing for so many months. Your inbox is overflowing with unanswered e-mails. On top of the mess is the brand new book you ordered entitled "Self-Service Linux." You need a shower.

    Command> read book "Self-Service Linux"

    You still need a shower.

This tongue-in-cheek MUD analogy aside, what can this book really do for you? This book includes chapters that are loaded with useful information to help you diagnose problems quickly and effectively. This first chapter covers best practices for problem determination and points to the more in-depth information found in the chapters throughout this book. The first step is to ensure that your Linux system(s) are configured for effective problem determination.

1.2 GETTING YOUR SYSTEM(S) READY FOR EFFECTIVE PROBLEM DETERMINATION

The Linux problem determination tools and facilities are free, which raises the question: Why not install them? Without these tools, a simple problem can turn into a long and painful ordeal that can affect a business and/or your personal time. Before reading through the rest of the book, take some time to make sure the following tools are installed on your system(s). These tools are just waiting to make your life easier and/or your business more productive:

strace: The strace tool traces system calls, the special functions through which a process interacts with the operating system. You can use this for many types of problems, especially those that relate to the operating system.

ltrace: The ltrace tool traces the library functions that a process calls. This is similar to strace, but the called functions provide more detail.

lsof: The lsof tool lists all of the open files on the operating system (OS). When a file is open, the OS returns a numeric file descriptor to the process to use. This tool lists all of the open files on the OS with their respective process IDs and file descriptors.

top: This tool lists the top processes that are running on the system. By default it sorts by the amount of CPU currently being consumed by each process.

traceroute/tcptraceroute: These tools can be used to trace a network route (or at least one direction of it).

ping: Ping simply checks whether a remote system can respond. Sometimes firewalls block the network packets ping uses, but it is still very useful.
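
A quick way to see which of the tools named so far are already installed is to look each one up on the command search path. The following is only a minimal sketch in Python and is not part of the book's own source code. The helper name find_tools and the TOOLS list are made up for this illustration, and the sketch assumes each tool is installed under the same name used above, which can vary between distributions.

```python
import shutil

# Tool names taken from the descriptions above. This list and the
# helper below are illustrative assumptions, not the book's script.
TOOLS = ["strace", "ltrace", "lsof", "top",
         "traceroute", "tcptraceroute", "ping"]


def find_tools(names):
    """Map each tool name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool so missing ones stand out.
    for name, path in find_tools(TOOLS).items():
        print(f"{name:15} {path or 'NOT FOUND'}")
```

A tool reported as NOT FOUND may still be installed in a directory outside the current user's PATH (for example, a directory searched only for root), so a missing entry is a hint to check rather than proof that the package is absent.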