[LU-12461] Contribute epython scripts to aid crash dump analysis Created: 20/Jun/19  Updated: 31/Jul/20  Resolved: 14/Feb/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: New Feature Priority: Minor
Reporter: Ann Koehler (Inactive) Assignee: Ann Koehler (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-13676 script to show unique backtraces from... Resolved

 Description   

PyKdump is an open source framework that supports Python scripting within the Linux crash tool. Cray has used it productively for some time and would like to contribute the scripts it has developed for extracting Lustre structs from memory dumps. The scripts were written primarily for client dumps but can easily be extended to support server dumps. Clients and servers differ only in some of their Lustre data structures; there's nothing inherently different about the scripting.

Installation instructions for PyKdump are available at:
https://sourceforge.net/p/pykdump/wiki/Home/

The scripts being provided were written for Python 2.7. The site above includes documentation for converting to Python 3.3.

The following is a summary of the scripts being contributed:

  • cfs_hashes.py Displays summary of cfs_hash tables.
  • cfs_hnodes.py Displays the specified Lustre hash table.
  • debug_flags.py Prints Lustre libcfs_debug flags as strings.
  • dk.py Dumps and sorts the Lustre dk logs.
  • jiffies2date.py Prints the date and time of a given jiffies timestamp.
  • ldlm_dumplocks.py Lists granted and waiting locks by namespace/resource.
  • ldlm_lockflags.py Prints string identifiers for specified LDLM flags.
  • lu_object.py Prints contents of an lu_object.
  • lustre_opcode.py Maps Lustre rpc opcodes to string identifiers.
  • obd_devs.py Displays the contents of global 'obd_devs'.
  • ptlrpc.py Displays the RPC queues of the Lustre ptlrpcd daemons.
  • rpc_stats.py Dumps the client_obd structure given by client argument.
  • sbi_ptrs.py Prints Lustre structs associated with inode.
  • uniqueStacktrace.py Prints stack traces for each task.
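
As an illustration of the kind of conversion a script such as jiffies2date.py performs, here is a minimal sketch. It is not the contributed script: the function name, its parameters, and the default tick rate of 1000 are assumptions made for this example, and it is written for Python 3 rather than the Python 2.7 used by the scripts. It works from the difference between the given timestamp and the jiffies value at dump time, and it ignores counter wraparound.

```python
from datetime import datetime, timedelta, timezone


def jiffies_to_datetime(jiffies, dump_jiffies, dump_time, hz=1000):
    """Convert a jiffies timestamp to a wall-clock datetime.

    All names here are hypothetical, chosen for illustration only.
    jiffies      -- the timestamp to convert
    dump_jiffies -- the jiffies counter value when the dump was taken
    dump_time    -- wall-clock time of the dump (timezone-aware datetime)
    hz           -- kernel tick rate; 1000 is an assumption, check the
                    kernel configuration of the dumped system
    """
    # Ticks elapsed between the timestamp and the dump, as seconds.
    # Negative when the timestamp is older than the dump.
    delta_seconds = (jiffies - dump_jiffies) / hz
    return dump_time + timedelta(seconds=delta_seconds)


# Made-up values: a timestamp 30000 ticks (30 s at HZ=1000) before the dump.
dump_time = datetime(2019, 6, 20, 12, 0, 0, tzinfo=timezone.utc)
stamp = jiffies_to_datetime(4295000000, 4295030000, dump_time, hz=1000)
print(stamp.isoformat())  # 2019-06-20T11:59:30+00:00
```

Using the difference from the dump-time counter avoids needing to know the kernel's initial jiffies offset.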

The scripts require symbols from the Lustre and LNet modules to be loaded
(mod command in crash). A script is invoked with the command
"epython <script_name>" followed by any parameters. To get usage information
for a particular script, enter the following at the crash prompt:
epython <script_name> -h



 Comments   
Comment by Gerrit Updater [ 20/Jun/19 ]

Ann Koehler (amk@cray.com) uploaded a new patch: https://review.whamcloud.com/35282
Subject: LU-12461 contrib: Add epython scripts for crash dump analysis
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 17c2675594f2f5b27c40afe4b26154eea448de9f

Comment by Oleg Drokin [ 22/Jun/19 ]

Thanks for contributing these!

Now a question: what do you mean server and client have different data structures? This should not really be happening, so can you please elaborate?

Comment by Ann Koehler (Inactive) [ 24/Jun/19 ]

For example, a ptlrpc_request contains a union with info specific to a client request and separate info specific to the server. The ptlrpc script assumes that the ptlrpc_request contains the rq_cli fields not the rq_srv fields, so if you use the script on a struct with rq_srv fields the script will print garbage for some of the fields.
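
The effect described above can be sketched with Python's ctypes module. Only the member names rq_cli and rq_srv come from the comment; the struct names and every field inside them are hypothetical stand-ins, not the real Lustre layout. The sketch fills in the server-side view and then reads the same memory through the client-side view.

```python
import ctypes


class CliPart(ctypes.Structure):
    # Hypothetical client-side fields (NOT the real rq_cli layout).
    _fields_ = [("xid", ctypes.c_uint64),
                ("resend_count", ctypes.c_uint32)]


class SrvPart(ctypes.Structure):
    # Hypothetical server-side fields (NOT the real rq_srv layout).
    _fields_ = [("arrival_sec", ctypes.c_uint32),
                ("arrival_usec", ctypes.c_uint32),
                ("hp", ctypes.c_uint32)]


class RequestUnion(ctypes.Union):
    # Both views share the same bytes, as in a C union.
    _fields_ = [("rq_cli", CliPart),
                ("rq_srv", SrvPart)]


req = RequestUnion()

# Populate the request as a server would.
req.rq_srv.arrival_sec = 1561000000
req.rq_srv.arrival_usec = 250000
req.rq_srv.hp = 1

# Read it back as a client-only script would. The "xid" is really the two
# arrival fields fused together, and "resend_count" is really the hp field.
print("xid          =", req.rq_cli.xid)
print("resend_count =", req.rq_cli.resend_count)
```

The values printed are well-formed integers, which is why a script that assumes the wrong half of the union produces plausible-looking garbage rather than an error.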

Another example is the cfs_hashes script. It only summarizes hash tables that are defined on clients. It doesn't show hash tables that are defined only on servers. Could easily be extended. I just don't do a lot of analysis of server dumps so never had a need to do so.

Comment by Gerrit Updater [ 14/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35282/
Subject: LU-12461 contrib: Add epython scripts for crash dump analysis
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4249b02f5c4c8a14faa0b88479b8eac75b212617

Comment by Peter Jones [ 14/Feb/20 ]

Landed for 2.14
