Maintenance and Technical Support
Technical Support Competence Center

z/OS UNIX System Services Dumps - Dump Debugging for Dummies

Matthias Korn
z/OS Virtual Frontend / UNIX System Services, EMEA Level 2
IBM Deutschland GmbH
korn@de.ibm.com

99th z/OS Guide, Lahnstein, 16 March 2011
2011 IBM Corporation
What are we talking about today?
- The two categories of dumps
- How to capture an unformatted dump
- IPCS: a powerful tool to read unformatted dumps
- IPCS: first steps to navigate
- IPCS: next steps to navigate
- IPCS: useful general commands to gather information
- BPXI070E at shutdown: finding the root using a SLIP dump
- HIPER APAR OA34226: what does a dump show in this case?
- OMVS Debug HTML Update

z/OS UNIX System Services Dump Debugging, 15 Mar 2011
The two categories of dumps
There are two categories of dumps:
- Formatted dumps: SYSABEND, SYSUDUMP, and SNAP dumps
- Unformatted dumps: SVC dumps, SYSMDUMP abend dumps, and stand-alone dumps
How to capture an unformatted dump
- System abends (e.g. ABENDEC6, ABEND0C4, ABEND878)
  - dump captured by recovery routines
- SLIP (e.g. a reason code SLIP trap under USS)
  - SLIP processing gets control when the defined conditions are met
  - SLIP schedules an SVC dump and captures trace records
- Dynamic dump (e.g. console dump)
  - dump captured via the DUMP command
  - no trigger necessary
  - used for persistent situations and comparisons
How to capture an unformatted dump (cont.)
- SADUMP program
  - stand-alone dump program, loaded as part of a restart
  - SADUMP is captured in hang / loop situations
- SYSMDUMP DD card
  - dump captured in connection with LE runtime options such as TER(UADUMP) and ABT(ABEND), i.e. TERMTHDACT and ABTERMENC, plus TRAP(ON)
  - dump data set attributes: RECFM=FBS, LRECL=4160
IPCS: a powerful tool to read unformatted dumps
- problem-state, key 8 program running in the TSO/E user's address space
- operates in interactive and batch environments
- a TSO/E command processor is the base of IPCS
  - the TSO/E 'IPCS' command activates the IPCS command processor
  - all commands that perform IPCS functions are subcommands of the IPCS command
- for interactive use, IPCS uses ISPF dialog support to run as a full-screen application
  - this application uses the IPCS command processor
IPCS (cont.)
- helps you to
  - format and read component traces and GTF traces
  - format and analyze unformatted dumps
  - format and display control blocks
- is able to identify
  - jobs with error return codes
  - resource contentions
  - control block overlays
IPCS: first steps to navigate
- What kind of dump do we have?
- What was the dump written for?
- Which SLIP trap caused the dump to be captured?
IPCS Primary Option Menu
IPCS: selecting the source
IPCS STATUS (IP ST)
IPCS LIST SLIPTRAP (IP L SLIPTRAP)
IPCS: next steps to navigate
- Which address spaces have been dumped?
- What are the corresponding job names?
- Has the dump been written completely, or is it partial?
IPCS CBF RTCT (IP CBF RTCT), then F ASTB in the output
IPCS SELECT ALL
IPCS LIST E0. LENGTH(16) BLOCK(0)
- Lists the SDRSN control block (SDUMP partial dump reason code).
- If all requested bytes are X'00', the dump is complete.
- Otherwise, review the SDRSN control block in z/OS MVS Data Areas, Volume 5 (MCSCSA - SNAPX), for the actual reason.
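The completeness check described here amounts to testing whether the 16 listed bytes are all zero. A minimal Python sketch, assuming the hex words have been copied by hand from the LIST output; the function name and the sample strings are hypothetical and not part of IPCS:

```python
def dump_is_complete(hex_words):
    """Return True if every byte of the listed 16-byte SDRSN area is X'00'."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes from LIST E0. LENGTH(16)")
    # any() is False only when all bytes are zero
    return not any(raw)


# Hypothetical sample values, not taken from a real dump
print(dump_is_complete("00000000 00000000 00000000 00000000"))
print(dump_is_complete("00000000 40000000 00000000 00000000"))
```

A non-zero result only says that the dump is partial; which bit means what still has to be looked up in the SDRSN mapping.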
IPCS: useful general commands
- Which trace data are available?
- Does any resource contention exist?
- How much real storage is available / in use?
- Which events (abends) have been logged?
- What can be determined about OMVS?
IPCS VERBX MTRACE
The MTRACE verb exit displays the master trace table, which corresponds to the syslog of your image. Its status can be displayed with the 'D TRACE' operator command and changed with the 'TRACE MT' operator command.
IPCS SYSTRACE ASID(1) TIME(LOCAL)
The SYSTRACE IPCS command displays the system trace table and formats system trace entries for each address space. Its status can be displayed with the 'D TRACE' operator command and changed with the 'TRACE ST' operator command.
IPCS ANALYZE RESOURCE
Shows contention for system resources such as OMVS latches.
IPCS RSMDATA SUMMARY
Shows real storage definitions and utilization.
IPCS VERBX LOGDATA
Shows the in-storage logrec buffers. It invokes the EREP program to format the logrec records.
IPCS OMVSDATA
Formats OMVS-relevant information about processes, threads, files, and file systems managed by OMVS and serviced by HFS, zFS, NFS, and TFS. The dump needs to contain the OMVS address space and the OMVS data spaces.
Options:
- IP OMVSDATA
- IP OMVSDATA PROCESS
- IP OMVSDATA FILE
- IP OMVSDATA STORAGE
- IP OMVSDATA IPC
- IP OMVSDATA COMMUNICATION
Report types:
- SUMMARY
- DETAIL
- EXCEPTION
IPCS OMVSDATA PROCESS
Displays a UNIX System Services process summary report including PID, associated user ID, ASID, parent process ID, and status (e.g. zombie).
IPCS OMVSDATA PROCESS DETAIL
Displays a detailed report about each process dubbed to UNIX System Services, including its threads (TCBs), active system calls, open file descriptors, and sent / received sysplex work.
IPCS OMVSDATA PROCESS DETAIL (cont.)
IPCS OMVSDATA FILE
Displays a report of all mounted file systems known to the system on which the dump was taken, including file system name, mount point, latch number, and the token to the internal control blocks representing the file system.
IPCS OMVSDATA FILE DETAIL
Displays a report of all active files in the system. An active file is either open or has recently been referenced. The 'File Serial Number' and the 'Device Number' uniquely identify a file (directory, regular file, character special, FIFO, symbolic link).
IPCS OMVSDATA STORAGE
Displays a report of all active cell pools in use by z/OS UNIX. The report contains information about common storage and data space resident cell pools as well as private storage resident cell pools.
IPCS CTRACE COMP(SYSOMVS) FULL LOCAL
Formats the OMVS component trace. The trace data reside in the SYSZBPX1 data space, so the OMVS data spaces must always be included in a dump. The trace is always active in at least MINIMUM mode. For OMVS-related problems it is always recommended to activate the trace explicitly. For details see the USS Diagnosis HTML file.
IPCS CTRACE QUERY(SYSOMVS) FULL LOCAL
Displays the status of the OMVS ctrace at the time the dump was captured.
BPXI070E at shutdown: using a SLIP dump
Symptoms:
*BPXI066E OMVS SHUTDOWN COULD NOT MOVE OR UNMOUNT ALL FILE SYSTEMS
BPXM054I FILE SYSTEM OMVS.ETC.MSYX FAILED TO UNMOUNT. RET CODE = 00000072, RSN CODE = 058800AA
BPXM054I FILE SYSTEM SYS1.ROOT.MSYX.OMVSSIDA FAILED TO UNMOUNT. RET CODE = 00000072, RSN CODE = 058800AA
*195 BPXI070E USE SETOMVS ON ANOTHER SYSTEM TO MOVE NEEDED FILE SYSTEMS, THEN REPLY WITH ANY KEY TO CONTINUE SHUTDOWN
BPXI070E at shutdown (cont.)
TSO BPXMTEXT 058800AA
BPXFSUMT 03/05/08
JRFsParentFs: The file system has file systems mounted on it.
Action: An unmount request can be honored only if there are no file systems mounted anywhere on the requested file system. Use the F BPXOINIT,FILESYS=DISPLAY,ALL command for a shared file system configuration or the D OMVS,FILE command for a non-shared file system configuration to determine which file systems are mounted on the requested file system. Unmount them before retrying this request.
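When reading such codes by hand it can help to split the 4-byte reason code into its two halfwords. The sketch below assumes the usual z/OS UNIX layout, in which the high-order halfword qualifies the low-order halfword that carries the reason proper (here X'00AA', reported by BPXMTEXT as JRFsParentFs). The function name is hypothetical, and BPXMTEXT remains the authoritative decoder:

```python
def split_reason_code(rsn_hex):
    """Split a 4-byte reason code (8 hex digits) into (qualifier, reason)."""
    value = int(rsn_hex, 16)
    return value >> 16, value & 0xFFFF


qualifier, reason = split_reason_code("058800AA")
print(f"qualifier=X'{qualifier:04X}' reason=X'{reason:04X}'")

# The return code in BPXM054I is shown in hexadecimal as well
print(int("00000072", 16))
```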
BPXI070E at shutdown (cont.)
SLIP SET,IF,A=SYNCSVCD,RANGE=(10?+8C?+F0?+1F4?),
DATA=(13R??+1B0,EQ,058800AA),DSPNAME=('OMVS'.*),
SDATA=(ALLNUC,PSA,CSA,LPA,TRT,SQA,LSQA,RGN,SUM),
JL=OMVS,AL=(H,P,S,CU),END
IPCS LIST SLIPTRAP (IP L SLIPTRAP)
IPCS OMVSDATA FILE, then F OMVS.ETC.MSYX in the output
IPCS OMVSDATA FILE, then F '/MSYX/etc' in the output
BPXI070E at shutdown: conclusions
- File system SYS1.ROOT.MSYX.OMVSSIDA, mounted at /MSYX, failed to unmount because OMVS.ETC.MSYX was still mounted at /MSYX/etc
  - both file systems are owned by system number 02
- OMVS.ETC.MSYX failed to unmount because of:
  - OMVS.CRON.MSYX mounted at /MSYX/etc/cron
  - OMVS.SPOOL.CRONLOG.MSYX mounted at /MSYX/etc/spool/cron/cronlog
  - OMVS.SPOOL.MSYX mounted at /MSYX/etc/spool
  - all 3 file systems are remotely owned by system 04
- Who are systems 02 and 04?
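The reasoning on this slide can be reproduced mechanically: a file system cannot be unmounted while anything is mounted below its mount point. The following Python sketch uses the mount points and owner numbers from this slide as a hand-typed table. The table layout and function names are hypothetical and not an OMVSDATA output format:

```python
from pathlib import PurePosixPath

# file system name -> (mount point, owning system number), as read from the dump
MOUNTS = {
    "SYS1.ROOT.MSYX.OMVSSIDA": ("/MSYX", "02"),
    "OMVS.ETC.MSYX": ("/MSYX/etc", "02"),
    "OMVS.CRON.MSYX": ("/MSYX/etc/cron", "04"),
    "OMVS.SPOOL.CRONLOG.MSYX": ("/MSYX/etc/spool/cron/cronlog", "04"),
    "OMVS.SPOOL.MSYX": ("/MSYX/etc/spool", "04"),
}


def blockers(fs_name):
    """Return the file systems mounted anywhere below fs_name's mount point."""
    parent = PurePosixPath(MOUNTS[fs_name][0])
    return sorted(
        name
        for name, (mount_point, _owner) in MOUNTS.items()
        if parent in PurePosixPath(mount_point).parents
    )


def foreign_owned_blockers(fs_name):
    """Return the blockers whose owner differs from the owner of fs_name."""
    owner = MOUNTS[fs_name][1]
    return [name for name in blockers(fs_name) if MOUNTS[name][1] != owner]


print(blockers("OMVS.ETC.MSYX"))
print(foreign_owned_blockers("OMVS.ETC.MSYX"))
```

Comparing path components rather than string prefixes avoids treating a mount point such as /MSYX/etcetera as a child of /MSYX/etc.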
IPCS BPXWNXMB
- Formats the NXMB control block, which represents the OMVS XCF group members table
- Checks whether the system is a member of a shared file system environment
- Returns information about all members (state, system name, and number) as well as the active BPXMCDS couple data set definitions
IPCS BPXWNXMB
BPXI070E at shutdown: conclusions
- The file systems
  - OMVS.CRON.MSYX mounted at /MSYX/etc/cron
  - OMVS.SPOOL.CRONLOG.MSYX mounted at /MSYX/etc/spool/cron/cronlog
  - OMVS.SPOOL.MSYX mounted at /MSYX/etc/spool
  are remotely owned by system MSYS, while their parent file system is owned by system MSYX.
- For an unknown reason the ownership has changed.
- Questions:
  - When did the change occur?
  - What are the AUTOMOVE settings for these 3 file systems?
BPXI070E at shutdown: conclusions
- Answers:
  - An internal control block contains a time stamp of the last change of the file system's owner.
  - The SLIP trap matched at shutdown at 06:57:05.980519 local time.
  - The last owner change happened at 06:56:48.120832 local time on the same day.
  - These file systems are mounted with AUTOMOVE=Y, while the parent is mounted with AUTOMOVE=U.
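The two time stamps show how close the owner change was to the failing unmount. A small Python sketch of the subtraction, assuming both values are local times on the same day as stated on this slide (the variable names are illustrative):

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"

# Time stamps as quoted on the slide, same day, local time
owner_change = datetime.strptime("06:56:48.120832", FMT)
slip_match = datetime.strptime("06:57:05.980519", FMT)

gap = slip_match - owner_change
print(f"owner changed {gap.total_seconds():.6f} seconds before the SLIP trap matched")
```

The gap of roughly 18 seconds places the ownership change inside the shutdown sequence itself.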
HIPER APAR OA34226
ORPHANED PPRA SIGNAL LATCHES *MASTER* MEMTERM ABEND0C4 BPXPRTRM SYS.BPX.AP00.PRTB1.PPRA.LSN
- Shutdown of a system (SYS1) in a shared file system environment
- Latch contention on a different system (SYS2)
- Reinitialization of SYS1 into the shared file system environment impossible due to latch contention on SYS2
- SYS2 performed MemberGoneRecovery for SYS1
  - contention on the mount latch due to an orphaned PPRA latch
  - the 'D OMVS,W' command shows only mount latch activity
  - a dump is necessary
HIPER APAR OA34226: D OMVS,W
BPXO063I 01.46.02 DISPLAY OMVS 886
OMVS     0010 ACTIVE          OMVS=(A0,00,R0,A1)
MOUNT LATCH ACTIVITY:
   USER     ASID   TCB        REASON             AGE
   -------------------------------------------------------------
 HOLDER:
   OMVS     0010   009FC3E8   MemberGone Rcvry   00.00.15
     IS DOING: BRLM Wait   <----------- misleading!
     FILE SYSTEM: OESYS.WILY.PRODPLEX.INTRO810.ZFS
 WAITER(S):
   OMVS     0010   009A0160   FileSys Unmount    00.00.03
HIPER APAR OA34226: D GRS,C
D GRS,C
ISG343I 05.30.00 GRS STATUS
LATCH SET NAME: SYS.BPX.AP00.PRTB1.PPRA.LSN
CREATOR JOBNAME: OMVS     CREATOR ASID: 0010
  LATCH NUMBER: 2056
    REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED
    *MASTER*   0001  EXCLUSIVE  OWN       009DBE88  Y    16:53:59
    OMVS       0010  SHARED     WAIT      009FC3E8  Y    03:44:12
LATCH SET NAME: SYS.BPX.A000.FSLIT.FILESYS.LSN
CREATOR JOBNAME: OMVS     CREATOR ASID: 0010
  LATCH NUMBER: 2
    REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED
    OMVS       0010  EXCLUSIVE  OWN       009FC3E8  Y    03:44:12
    OMVS       0010  EXCLUSIVE  WAIT      009A0160  Y    03:44:00
    OMVS       0010  EXCLUSIVE  WAIT      009D04E0  Y    03:38:42
HIPER APAR OA34226: what does the dump show?
IPCS ANALYZE RESOURCE
RESOURCE #0012:
  NAME=SYS.BPX.A000.FSLIT.FILESYS.LSN ASID=0010 Latch#=2
RESOURCE #0012 IS HELD BY:
  JOBNAME=OMVS ASID=0010 TCB=009FC3E8
  DATA=EXCLUSIVE RETADDR=BD24A324 REQID=001000003D011540
RESOURCE #0012 IS REQUIRED BY:
  JOBNAME=OMVS ASID=0010 TCB=009A0160
  DATA=EXCLUSIVE RETADDR=BD28CD70 REQID=001000001976B8D0
HIPER APAR OA34226: what does the dump show?
IPCS ANALYZE RESOURCE (cont.)
RESOURCE #0011:
  NAME=SYS.BPX.AP00.PRTB1.PPRA.LSN ASID=0010 Latch#=2056
RESOURCE #0011 IS HELD BY:
  JOBNAME=*MASTER* ASID=0001 TCB=009DBE88
  DATA=EXCLUSIVE RETADDR=BD421A06 REQID=01E4080841AE2300
RESOURCE #0011 IS REQUIRED BY:
  JOBNAME=OMVS ASID=0010 TCB=009FC3E8
  DATA=SHARED RETADDR=BD421B3A REQID=001000003D011540
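The two ANALYZE RESOURCE entries form a wait chain: the unmount task waits for the mount latch, whose holder in turn waits for the PPRA latch. Following such a chain to its end can be sketched in Python as below. The dictionaries are typed in by hand from the report (only the first waiter of each latch is included), and their layout and the function name are hypothetical:

```python
# latch (latch set name, latch number) -> holder (jobname, ASID, TCB)
HELD_BY = {
    ("SYS.BPX.A000.FSLIT.FILESYS.LSN", 2): ("OMVS", "0010", "009FC3E8"),
    ("SYS.BPX.AP00.PRTB1.PPRA.LSN", 2056): ("*MASTER*", "0001", "009DBE88"),
}

# waiter (ASID, TCB) -> latch it is waiting for
WAITS_FOR = {
    ("0010", "009A0160"): ("SYS.BPX.A000.FSLIT.FILESYS.LSN", 2),
    ("0010", "009FC3E8"): ("SYS.BPX.AP00.PRTB1.PPRA.LSN", 2056),
}


def root_holder(asid, tcb):
    """Follow the wait chain from a waiting task to the task that waits for nothing."""
    seen = set()
    jobname = None
    while (asid, tcb) in WAITS_FOR:
        if (asid, tcb) in seen:
            raise RuntimeError("wait chain loops back on itself (deadlock)")
        seen.add((asid, tcb))
        jobname, asid, tcb = HELD_BY[WAITS_FOR[(asid, tcb)]]
    return jobname, asid, tcb


print(root_holder("0010", "009A0160"))
```

The chain ends at the *MASTER* task that holds the PPRA latch, which is the task to examine next.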
HIPER APAR OA34226: what does the dump show?
- A latch is represented by an LQE (Latch Queue Element) within a latch set (LSET).
- LSET and LQE reside in the creator's private storage (OMVS).
- The LQE contains a time stamp of when the latch was obtained.
HIPER APAR OA34226: what does the dump show?
HIPER APAR OA34226: what does the dump show?
- IPCS LTOD formats TOD (time-of-day) clock stamps
- IPCS LTOD C7486B4077C11780
- Shows that the latch was obtained on 4 February 2011, while the contention was reported and the dump taken on the 20th.
- Why wasn't it released? What happened to the holder?
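Outside IPCS, the same conversion can be approximated by hand. The sketch below relies on the z/Architecture TOD clock format, in which bit 51 of the 64-bit value increments once per microsecond and the epoch is 1 January 1900, 00:00 UTC. It ignores leap seconds, and the function name is hypothetical; LTOD remains the reference:

```python
from datetime import datetime, timedelta, timezone

# TOD clock epoch: 1900-01-01 00:00:00 UTC
TOD_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def tod_to_datetime(tod_hex):
    """Convert a 64-bit TOD clock value (16 hex digits) to a UTC datetime."""
    tod = int(tod_hex, 16)
    # Dropping the low-order 12 bits leaves the count of microseconds
    microseconds = tod >> 12
    return TOD_EPOCH + timedelta(microseconds=microseconds)


print(tod_to_datetime("C7486B4077C11780"))
```

For the LQE time stamp above this yields a date of 4 February 2011 (UTC), which agrees with the LTOD result quoted on this slide.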
HIPER APAR OA34226: what does the dump show?
CTRACE COMP(SYSOMVS) LOCAL FULL OPTIONS((EXCEPTION))
- gathers exception information that is written to a separate ctrace buffer
- the OMVS ctrace does not need to be switched on
- shows that the TCB in the *MASTER* address space abended at the time when the latch was obtained
- OMVS recovery routines did not release the latch
- the latch got into an orphaned state
HIPER APAR OA34226: what does the dump show?
CTRACE COMP(SYSOMVS) LOCAL FULL OPTIONS((EXCEPTION)), then F '02/04/2011' and F 9DBE88 in the output
HIPER APAR OA34226: conclusions
- USS recovery routine BPXPRTRM was redesigned to ensure latches are released if it abends itself during recovery / memory / process termination
- a dump is always necessary to decide whether the latch is orphaned
- a latch purge tool is available and can be sent out on demand
  - can avoid an IPL
  - CALLRTM can be tried as well
  - cannot be made generally available for data integrity reasons
- new message BPXM123E is issued if a latch is held by a single task for more than 5 minutes (starting with z/OS 1.12)
  - in this special case it would point to the PPRA latch held before the contention caused by MemberGoneRecovery at a scheduled IPL
Almost done...
- Any wishes regarding topics for the next Guide?
- Any concerns / questions?
Thank you for your attention!