[LU-6089] qsd_handler.c:1139:qsd_op_adjust()) ASSERTION( qqi ) failed Created: 07/Jan/15 Updated: 09/Oct/21 Resolved: 09/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 16950 | ||||||||||||
| Description |
|
Had this crash happen on the tip of master as of yesterday running test 132 of sanity.sh:

Lustre: 3948:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1420636700/real 1420636700] req@ffff8800290ba380 x1489642625567436/t0(0) o250->MGC192.168.20.154@tcp@0@lo:26/25 lens 400/544 e 0 to 1 dl 1420636706 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 3948:0:(client.c:1942:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
LustreError: 39:0:(qsd_handler.c:1139:qsd_op_adjust()) ASSERTION( qqi ) failed:
LustreError: 39:0:(qsd_handler.c:1139:qsd_op_adjust()) LBUG
Pid: 39, comm: kswapd0
Call Trace:
[<ffffffffa0779895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0779e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0f25028>] qsd_op_adjust+0x478/0x580 [lquota]
[<ffffffffa100a597>] osd_object_delete+0x217/0x2f0 [osd_ldiskfs]
[<ffffffffa091e0c1>] lu_object_free+0x81/0x1a0 [obdclass]
[<ffffffffa091f167>] lu_site_purge+0x2e7/0x4e0 [obdclass]
[<ffffffffa091f4e8>] lu_cache_shrink+0x188/0x310 [obdclass]
[<ffffffff81138dba>] shrink_slab+0x12a/0x1a0
[<ffffffff8113c0da>] balance_pgdat+0x59a/0x820
[<ffffffff8113c494>] kswapd+0x134/0x3b0 |
| Comments |
| Comment by Andreas Dilger [ 07/Jan/15 ] |
|
This has the same symptoms as |
| Comment by Andreas Dilger [ 05/Feb/15 ] |
|
Hit a very similar crash again in osd_object_delete->qsd_op_adjust() when unmounting in sanity.sh test_65j and I am again testing http://review.whamcloud.com/12515 from |
| Comment by Andreas Dilger [ 06/Feb/15 ] |
|
I reverted the 12515 patch, and while I've observed the original |
| Comment by Andreas Dilger [ 11/Feb/15 ] |
|
I've now hit this once while testing http://review.whamcloud.com/11258 on master instead of the 12515 patch, in a thread doing memory reclaim during an unmount operation, though it wasn't a thread involved in the unmount process:

Lustre: Failing over testfs-OST0000
Lustre: server umount testfs-OST0000 complete
Lustre: Failing over testfs-OST0001
general protection fault: 0000 [#1] SMP
Pid: 2170, comm: java Tainted: P--------------- 2.6.32-431.29.2.el6_lustre.g36cd22b.x86_6
RIP: 0010:[<ffffffffa07a4cf6>] [<ffffffffa07a4cf6>] qsd_op_adjust+0xb6/0x580 [lquota]
Process java (pid: 2170, threadinfo ffff8800d00ce000, task ffff880037c61540)
Call Trace:
osd_object_delete+0x217/0x2f0 [osd_ldiskfs]
lu_object_free+0x81/0x1a0 [obdclass]
lu_site_purge+0x2e7/0x4e0 [obdclass]
lu_cache_shrink+0x188/0x310 [obdclass]
shrink_slab+0x12a/0x1a0
do_try_to_free_pages+0x3f7/0x610
try_to_free_pages+0x92/0x120
__alloc_pages_nodemask+0x47e/0x8d0
kmem_getpages+0x62/0x170
fallback_alloc+0x1ba/0x270
____cache_alloc_node+0x99/0x160
user_path_parent+0x31/0x80
sys_renameat+0xb8/0x3a0
sys_rename+0x1b/0x20

I've also gone back and retested the latest version of 12515 and not hit this problem in sanity as I had twice before. |
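Both traces reach qsd_op_adjust() from the cache shrinker (lu_cache_shrink -> lu_site_purge -> osd_object_delete) rather than from the unmount path itself. The sketch below is only an illustration of that general hazard, assuming the per-type quota state can be torn down while a reclaim-driven caller still holds a reference to the instance. It is not Lustre code and not a proposed fix. Every name in it (`quota_instance`, `quota_type_info`, `quota_instance_teardown`, `quota_adjust_checked`, `MAXQUOTAS_SKETCH`) is hypothetical, and the root cause of this ticket was never confirmed.

```c
#include <stddef.h>

/* Hypothetical: number of quota types (e.g. user and group). */
#define MAXQUOTAS_SKETCH 2

/* Hypothetical stand-in for per-quota-type state. */
struct quota_type_info {
	long pending_adjustments;
};

/* Hypothetical stand-in for a quota instance owning per-type state. */
struct quota_instance {
	struct quota_type_info *type_info[MAXQUOTAS_SKETCH];
};

/* Models teardown at unmount: the per-type pointers are cleared. */
static void quota_instance_teardown(struct quota_instance *qi)
{
	int i;

	for (i = 0; i < MAXQUOTAS_SKETCH; i++)
		qi->type_info[i] = NULL;
}

/*
 * Models an adjust call arriving from a late caller such as a shrinker.
 * Returns 0 on success, or -1 when the per-type state is already gone,
 * which is the point where an assertion on the pointer would fire.
 */
static int quota_adjust_checked(struct quota_instance *qi, int type)
{
	struct quota_type_info *qti;

	if (qi == NULL || type < 0 || type >= MAXQUOTAS_SKETCH)
		return -1;

	qti = qi->type_info[type];
	if (qti == NULL)
		return -1;

	qti->pending_adjustments++;
	return 0;
}
```

A NULL check of this kind would only cover the assertion case in the first trace. The general protection fault in the second trace is more consistent with a stale, non-NULL pointer, which would need reference counting or ordering between teardown and the shrinker instead.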