[LU-914] Client panic on ptlrpc_free_req() Created: 12/Dec/11  Updated: 07/Apr/12  Resolved: 05/Apr/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Marek Magrys
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Environment:

Lustre 2.1RC2 on servers, mix of 2.1 and 2.1RC2 on clients.
OS is SL5, with many kernels including 2.6.18-274.12.1.el5


Attachments: Text File panics.txt    
Severity: 3
Rank (Obsolete): 6506

 Description   

Some of our clients die with:
5645:0:(client.c:2089:__ptlrpc_free_req()) LBUG

We do not have a reproducer and probably never will, as the LBUG is triggered by many different kinds of binaries. I've attached some panic logs from three clients; we have had at least 5 crashes caused by this bug so far.
Could you please have a look?



 Comments   
Comment by Oleg Drokin [ 08/Feb/12 ]

hm, is this still an issue for you?

Can you reproduce with dlmtrace and dentry debug levels added and collect a lustre debug log please?

Comment by Marek Magrys [ 08/Feb/12 ]

The issue hasn't hit us for a long time now (since we opened the ticket), so I guess you can close it for now, and if it strikes again I'll ask for it to be reopened. We don't have any reproducer for this, so for now I think we cannot do anything here.

Comment by Andreas Dilger [ 05/Apr/12 ]

Closing per last comment that it cannot be reproduced.

Comment by Marek Magrys [ 06/Apr/12 ]

Today it struck again on our login node:
Apr 6 12:18:00 ui kernel: LustreError: 4401:0:(client.c:2106:__ptlrpc_free_req()) ASSERTION(!request->rq_replay) failed: req ffff81014444fc00
Apr 6 12:18:00 ui kernel: LustreError: 4401:0:(client.c:2106:__ptlrpc_free_req()) LBUG

Servers are on 2.1, clients on 2.1.1, both Scientific Linux 5.

One of our users claims that his 'grep' might have caused the crash, which would be more than odd. However, I'm not sure whether you should reopen this bug, as we still don't have any reproducer.

Comment by Oleg Drokin [ 06/Apr/12 ]

Do you have kernel crashdumping installed and set up? Can you print the request content and backtrace?

Comment by Marek Magrys [ 07/Apr/12 ]

No, we don't, but we'll enable crashdumps on our login node. I don't have any detailed logs, as for some reason there's no /tmp/lustre-log file. I will try to set some more verbose debugging options here. Do you have any tips on how to obtain as much information as possible without heavily affecting performance?

Comment by Oleg Drokin [ 07/Apr/12 ]

Unfortunately, extensive debugging will slow things down.

But having reliably working crashdumps is always a very good idea and will not affect your performance (other than eating a bit of RAM for the crash kernel image).
