[LU-6089] qsd_handler.c:1139:qsd_op_adjust()) ASSERTION( qqi ) failed Created: 07/Jan/15  Updated: 09/Oct/21  Resolved: 09/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Issue Links:
Related
is related to LU-5242 Test hang sanity test_132, test_133: ... Resolved
is related to LU-5331 qsd_handler.c:1139:qsd_op_adjust()) A... Resolved
Severity: 3
Rank (Obsolete): 16950

 Description   

Had this crash happen on the tip of master as of yesterday, while running test_132 of sanity.sh:

Lustre: 3948:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1420636700/real 1420636700]  req@ffff8800290ba380 x1489642625567436/t0(0) o250->MGC192.168.20.154@tcp@0@lo:26/25 lens 400/544 e 0 to 1 dl 1420636706 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 3948:0:(client.c:1942:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
LustreError: 39:0:(qsd_handler.c:1139:qsd_op_adjust()) ASSERTION( qqi ) failed: 
LustreError: 39:0:(qsd_handler.c:1139:qsd_op_adjust()) LBUG
Pid: 39, comm: kswapd0

Call Trace:
[<ffffffffa0779895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0779e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0f25028>] qsd_op_adjust+0x478/0x580 [lquota]
[<ffffffffa100a597>] osd_object_delete+0x217/0x2f0 [osd_ldiskfs]
[<ffffffffa091e0c1>] lu_object_free+0x81/0x1a0 [obdclass]
[<ffffffffa091f167>] lu_site_purge+0x2e7/0x4e0 [obdclass]
[<ffffffffa091f4e8>] lu_cache_shrink+0x188/0x310 [obdclass]
[<ffffffff81138dba>] shrink_slab+0x12a/0x1a0
[<ffffffff8113c0da>] balance_pgdat+0x59a/0x820
[<ffffffff8113c494>] kswapd+0x134/0x3b0

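For context, the assertion fires in the quota slave code when an object is deleted from the OSD and its quota usage is adjusted, but the per-type quota info (qqi) that the entry points back to is no longer there. Because the call arrives via lu_cache_shrink, the deletion runs under memory pressure in an arbitrary task (kswapd here), which is consistent with a race against quota slave teardown during unmount. A minimal user-space sketch of that pattern follows; the structure and function names (quota_entry, qtype_fini, op_adjust, etc.) are invented for illustration and are not the actual lquota source.

/*
 * Illustrative sketch only: mimics an assertion on a back-pointer (qqi)
 * that can trip if per-type quota info is torn down while cached objects
 * are still being purged.  Names are hypothetical, not the lquota code.
 */
#include <assert.h>
#include <stdlib.h>

struct qtype_info {                  /* stands in for the per-type info */
	int usage;
};

struct quota_entry {                 /* stands in for a quota entry */
	struct qtype_info *lqe_qqi;  /* back-pointer set at setup time */
};

/* unmount/teardown path: releases the per-type info */
static void qtype_fini(struct quota_entry *lqe)
{
	free(lqe->lqe_qqi);
	lqe->lqe_qqi = NULL;
}

/* delete path (called from cache purge): adjusts usage for the entry */
static void op_adjust(struct quota_entry *lqe)
{
	struct qtype_info *qqi = lqe->lqe_qqi;

	assert(qqi != NULL);         /* analogous to ASSERTION( qqi ) failed */
	qqi->usage--;
}

int main(void)
{
	struct quota_entry lqe;

	lqe.lqe_qqi = calloc(1, sizeof(*lqe.lqe_qqi));
	lqe.lqe_qqi->usage = 1;

	op_adjust(&lqe);             /* purge before teardown: fine */

	qtype_fini(&lqe);            /* teardown wins the race ... */
	op_adjust(&lqe);             /* ... and the later purge aborts,
	                              * mirroring the LBUG above */
	return 0;
}

Run in the order shown, the second op_adjust() aborts on the assert, which is the user-space analogue of the LBUG in the trace.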

 Comments   
Comment by Andreas Dilger [ 07/Jan/15 ]

This has the same symptoms as LU-5331, but I'm running the latest master (v2_6_92_0-14-g37145b3 + http://review.whamcloud.com/12515). I'll try reverting that patch to see if it fixes the problem.

Comment by Andreas Dilger [ 05/Feb/15 ]

Hit a very similar crash in osd_object_delete->qsd_op_adjust() when unmounting during sanity.sh test_65j, and I am again testing http://review.whamcloud.com/12515 from LU-5242. It looks like I restarted another test after my previous run passed sanity.sh, but I don't recall for sure whether I had reverted the patch at that time. I'll need to test without this patch again.

Comment by Andreas Dilger [ 06/Feb/15 ]

I reverted the 12515 patch, and while I've observed the original LU-5242 problem of not being able to unmount/remount the filesystem quickly, I haven't had any crashes, whereas I previously hit this 2-of-2 times running sanity.sh with the patch applied.

Comment by Andreas Dilger [ 11/Feb/15 ]

I've now hit this once while testing http://review.whamcloud.com/11258 on master instead of the 12515 patch. The crash hit a thread doing memory reclaim while an unmount was in progress, though that thread was not itself involved in the unmount:

Lustre: Failing over testfs-OST0000
Lustre: server umount testfs-OST0000 complete
Lustre: Failing over testfs-OST0001
general protection fault: 0000 [#1] SMP 
Pid: 2170, comm: java Tainted: P---------------    2.6.32-431.29.2.el6_lustre.g36cd22b.x86_64
RIP: 0010:[<ffffffffa07a4cf6>] [<ffffffffa07a4cf6>] qsd_op_adjust+0xb6/0x580 [lquota]
Process java (pid: 2170, threadinfo ffff8800d00ce000, task ffff880037c61540)
Call Trace:
osd_object_delete+0x217/0x2f0 [osd_ldiskfs]
lu_object_free+0x81/0x1a0 [obdclass]
lu_site_purge+0x2e7/0x4e0 [obdclass]
lu_cache_shrink+0x188/0x310 [obdclass]
shrink_slab+0x12a/0x1a0
do_try_to_free_pages+0x3f7/0x610
try_to_free_pages+0x92/0x120
__alloc_pages_nodemask+0x47e/0x8d0
kmem_getpages+0x62/0x170
fallback_alloc+0x1ba/0x270
____cache_alloc_node+0x99/0x160
user_path_parent+0x31/0x80
sys_renameat+0xb8/0x3a0
sys_rename+0x1b/0x20

I've also gone back and retested the latest version of 12515 and not hit this problem in sanity as I had twice before.
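As an aside on the second trace: any task that allocates memory can enter direct reclaim and invoke the registered lu_cache shrinker, so the object-delete and quota-adjust path can end up running in a completely unrelated process (a java thread doing a rename here) while the OST is being torn down. A purely hypothetical defensive variant of the sketch in the description, which skips the adjustment when the back-pointer is already gone instead of asserting, is shown below; whether silently dropping the adjustment is actually safe for quota accounting is not established by this ticket.

/* Hypothetical defensive variant (illustration only, not a proposed patch):
 * bail out instead of asserting when the per-type info is already released.
 * Types mirror the sketch in the description above. */
#include <stdio.h>

struct qtype_info { int usage; };
struct quota_entry { struct qtype_info *lqe_qqi; };

static void op_adjust_safe(struct quota_entry *lqe)
{
	struct qtype_info *qqi = lqe->lqe_qqi;

	if (qqi == NULL) {
		/* quota slave already shut down; nothing left to adjust */
		fprintf(stderr, "qqi already released, skipping adjust\n");
		return;
	}
	qqi->usage--;
}

int main(void)
{
	struct quota_entry lqe = { .lqe_qqi = NULL };

	op_adjust_safe(&lqe);   /* prints the warning instead of crashing */
	return 0;
}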
