Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.1.5
-
None
-
4
-
11317
Description
MDS hung with the following errors. There is some evidence that it happed right after setting a users quota to zero. Then after reboot/recover it hung again.
— First set of errors messages
<3>LustreError: 8304:0:(ldlm_lib.c:2467:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-2)
<3>LustreError: 8304:0:(ldlm_lib.c:2467:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-2)
<3>LustreError: 8304:0:(ldlm_lib.c:2467:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-2)
<3>LustreError: 8304:0:(ldlm_lib.c:2467:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-2)
<3>LustreError: 8304:0:(ldlm_lib.c:2467:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-2)
---- SOME STACK TRACES ----
Stack traceback for pid 8262
0xffff883f8749cae0 8262 2 1 15 R 0xffff883f8749d180 ldlm_cb_26
sp ip Function (args)
======================= <nmi>
======================= <normal>
0xffff883f9c8cf900 0xffffffff8151f04c _spin_lock+0x1c (0xffffffff8204c3a0)
0xffff883f9c8cf908 0xffffffff8135ce06 serial8250_console_write+0x156 (unknown, 0xffffffff81dc7012, unknown)
0xffff883f9c8cf958 0xffffffff81069745 __call_console_drivers+0x75 (unknown, unknown)
0xffff883f9c8cf988 0xffffffff810697aa _call_console_drivers+0x4a (unknown, unknown, unknown)
0xffff883f9c8cf9a8 0xffffffff81069d78 release_console_sem+0x108
0xffff883f9c8cf9e8 0xffffffff8106a478 vprintk+0x248 (unknown, unknown)
0xffff883f9c8cfa88 0xffffffff8151c151 printk+0x41
0xffff883f9c8cfae8 0xffffffffa04f8105 [libcfs]cfs_print_to_console+0x75 (unknown, unknown, 0xffff883fd42b9182, unknown, unknown, 0xffffffffa07d7180)
0xffff883f9c8cfb18 0xffffffffa05038de [libcfs]libcfs_debug_vmsg2+0x5de (0xffffffffa083de20, 0xffffffffa07f1c0e, 0xffff883f9c8cfc90, 0x0, 0x3, 0xffff8839c7b1cc00)
0xffff883f9c8cfc88 0xffffffffa0503ea1 [libcfs]libcfs_debug_msg+0x41 (unknown, unknown, 0xfffffffe, 0xfffffffe, 0x3, 0xffff8839c7b1cc00)
0xffff883f9c8cfce8 0xffffffffa075e253 [ptlrpc]target_handle_dqacq_callback+0x6e3 (0xffff881e0ef11400)
0xffff883f9c8cfd38 0xffffffffa076df87 [ptlrpc]ldlm_callback_handler+0xa17 (0xffff881e0ef11400)
0xffff883f9c8cfdc8 0xffffffffa079fb4e [ptlrpc]ptlrpc_main+0xc4e (unknown)
0xffff883f9c8cff48 0xffffffff8100c0ca child_rip+0xa (unknown, unknown)
Enter <q> to end, <cr> to continue:
— second set of errors after recover ----
Lustre: MDS mdd_obd-nbp8-MDT0000: nbp8-OST001a_UUID now active, resetting orphans
LustreError: 6186:0:(ldlm_lib.c:2467:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-2)
LustreError: 7771:0:(ldlm_lib.c:2467:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-2)
LustreError: 6185:0:(ldlm_lib.c:2467:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-2)
LustreError: 7798:0:(ldlm_lib.c:2467:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-2)
— Stack trace after reboot/recover -----
Stack traceback for pid 8063
0xffff883fc6e92aa0 8063 2 1 8 R 0xffff883fc6e93140 qmaster_recovd
sp ip Function (args)
0xffff883fc6ef7a18 0xffffffff8100c313 kdb_interrupt+0x13 (0xffffffffa05ada40, 0x0, 0x4e3a, 0x0, 0x0, 0xffff883f8a5bd940)
0xffff883fc6ef7a78 0xffffffff8151f04e _spin_lock+0x1e (0xffffffffa05ada40)
0xffff883fc6ef7aa8 0xffffffffa0582d76 [lnet]LNetMDBind+0x1d6 (unknown, 0xffff883f9310c978, unknown, unknown, unknown, unknown, 0xffff883fc2e91e00, 0x100000130, 0xc6ef7ba0)
0xffff883fc6ef7af8 0xffffffffa0788478 [ptlrpc]ptl_send_buf+0x128 (0xffff883f9310c978, 0xffff883fc2e91e00, unknown, unknown, 0xffff883f9310c980, 0xffff881e28d19b40, 0xffff883f0000001c, 0x52700f3927367, 0x1500000000)
0xffff883fc6ef7ba8 0xffffffffa078b774 [ptlrpc]ptl_send_rpc+0x544 (0xffff883f9310c800, unknown)
0xffff883fc6ef7c68 0xffffffffa0781993 [ptlrpc]ptlrpc_send_new_req+0x433 (0xffff883f9310c800)
0xffff883fc6ef7cd8 0xffffffffa078553a [ptlrpc]ptlrpc_set_wait+0x6a (0xffff883f8bcdd8c0)
0xffff883fc6ef7d88 0xffffffffa0785cfc [ptlrpc]ptlrpc_queue_wait+0x7c (0xffff883f9310c800)
0xffff883fc6ef7da8 0xffffffffa09c6011 [lquota]client_quota_ctl+0x151 (unknown, unknown, 0xffff883fa47df5c0)
0xffff883fc6ef7de8 0xffffffffa09c6af7 [lquota]lov_quota_ctl+0x427 (unknown, unknown, 0xffff883fa47df5c0)
0xffff883fc6ef7e48 0xffffffffa09cd9d9 [lquota]obd_quotactl+0xd9 (0xffff881e9c387400, 0xffff883fa47df5c0)
0xffff883fc6ef7e78 0xffffffffa09d4ade [lquota]qmaster_recovery_main+0x8ae (unknown)
0xffff883fc6ef7f48 0xffffffff8100c0ca child_rip+0xa (unknown, unknown)
[3]kdb> btp 7664
Stack traceback for pid 7664
0xffff883fcdcad500 7664 2 1 27 R 0xffff883fcdcadba0 qmaster_recovd
sp ip Function (args)
0xffff883fcdcaf998 0xffffffff8100c313 kdb_interrupt+0x13 (0xffffffffa05ada40, 0xffff883fc646e400, 0x4e3a, 0x0, 0x0, 0x130)
0xffff883fcdcaf9f8 0xffffffff8151f051 _spin_lock+0x21 (0xffffffffa05ada40)
0xffff883fcdcafa28 0xffffffffa05889f9 [lnet]lnet_send+0x59 (0x500000a971b3c, 0xffff883fc646e400)
0xffff883fcdcafa98 0xffffffffa0589f04 [lnet]LNetPut+0x2d4 (0x500000a971b3c, unknown, unknown, 0x500000a971b49, 0x3039, unknown, 0x52700f3927364, 0x100000000, 0x0)
0xffff883fcdcafaf8 0xffffffffa078852a [ptlrpc]ptl_send_buf+0x1da (0xffff883fd7360978, 0xffff883f8e6a3c00, unknown, unknown, 0xffff883fd7360980, 0xffff881e28eca740, 0xffff883f0000001c, 0x52700f3927364, 0x1500000000)
0xffff883fcdcafba8 0xffffffffa078b774 [ptlrpc]ptl_send_rpc+0x544 (0xffff883fd7360800, unknown)
0xffff883fcdcafc68 0xffffffffa0781993 [ptlrpc]ptlrpc_send_new_req+0x433 (0xffff883fd7360800)
0xffff883fcdcafcd8 0xffffffffa078553a [ptlrpc]ptlrpc_set_wait+0x6a (0xffff883f8a8b26c0)
0xffff883fcdcafd88 0xffffffffa0785cfc [ptlrpc]ptlrpc_queue_wait+0x7c (0xffff883fd7360800)
0xffff883fcdcafda8 0xffffffffa09c6011 [lquota]client_quota_ctl+0x151 (unknown, unknown, 0xffff883f8e8b3f40)
0xffff883fcdcafde8 0xffffffffa09c6af7 [lquota]lov_quota_ctl+0x427 (unknown, unknown, 0xffff883f8e8b3f40)
0xffff883fcdcafe48 0xffffffffa09cd9d9 [lquota]obd_quotactl+0xd9 (0xffff881e9c387400, 0xffff883f8e8b3f40)
0xffff883fcdcafe78 0xffffffffa09d4ade [lquota]qmaster_recovery_main+0x8ae (unknown)
0xffff883fcdcaff48 0xffffffff8100c0ca child_rip+0xa (unknown, unknown)