Details
Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.4.3
Labels: None
Environment: client and server version lustre2.4.3-7nas
Severity: 3
Rank (Obsolete): 16510
Description
?DUP of LU-10?
1. The client gets an obd_ping failure.
2. The client is evicted.
3. The client can't reconnect and is stuck in ptlrpc_invalidate_import.
Nov 12 04:32:30 pfe22 kernel: [69643.816473] LustreError: 11-0: nbp9-OST0075-osc-ffff880a28404400: Communicating with 10.151.26.11@o2ib, operation obd_ping failed with -107.
Nov 12 04:32:30 pfe22 kernel: [69643.854158] Lustre: nbp9-OST0075-osc-ffff880a28404400: Connection to nbp9-OST0075 (at 10.151.26.11@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Nov 12 04:32:30 pfe22 kernel: [69643.922758] LustreError: 167-0: nbp9-OST0075-osc-ffff880a28404400: This client was evicted by nbp9-OST0075; in progress operations using this service will fail.
Nov 12 04:34:11 pfe22 kernel: [69743.757653] LustreError: 92481:0:(import.c:324:ptlrpc_invalidate_import()) nbp9-OST0075_UUID: rc = -110 waiting for callback (1 != 0)
Nov 12 04:34:11 pfe22 kernel: [69743.793518] LustreError: 92481:0:(import.c:350:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff8802204cec00 x1484496199485132/t0(0) o4->nbp9-OST0075-osc-ffff880a28404400@10.151.26.11@o2ib:6/4 lens 488/448 e 0 to 0 dl 1415726912 ref 2 fl Rpc:RE/0/ffffffff rc -5/-1
Nov 12 04:34:11 pfe22 kernel: [69743.866993] LustreError: 92481:0:(import.c:366:ptlrpc_invalidate_import()) nbp9-OST0075_UUID: RPCs in "Unregistering" phase found (0). Network is sluggish? Waiting them to error out.
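For readers triaging similar logs, the return codes above are negated Linux errno values. The following is a minimal sketch, not part of the original report, that pulls those codes out of a log line and names them. The table `LINUX_ERRNO`, the pattern `RC_PATTERN`, and the function `decode_return_codes` are hypothetical helpers; the table hard-codes only the Linux values seen in this log (5, 107, 110).

```python
import re

# Linux errno values seen in the log above; kernel return codes are negated.
# The numbers are hard-coded because errno values differ across platforms.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Matches fragments such as "failed with -107", "rc = -110" and "rc -5/-1".
RC_PATTERN = re.compile(r"(?:failed with|\brc =?)\s*(-\d+)")


def decode_return_codes(line):
    """Return (code, name, description) for each return code found in a log line."""
    decoded = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1))
        name, text = LINUX_ERRNO.get(-code, ("UNKNOWN", "not in table"))
        decoded.append((code, name, text))
    return decoded
```

Read this way, the obd_ping failure is ENOTCONN (-107), the wait in ptlrpc_invalidate_import timed out with ETIMEDOUT (-110), and the request still on the sending list carries EIO (-5).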
We find ldlm_bl_ threads stuck in D state. Both traces below show them blocked in mutex_lock() called from cl_lock_mutex_get():
Stack traceback for pid 8080
0xffff8802569b2540 8080 2 0 1 D 0xffff8802569b2bb0 ldlm_bl_04
 [<ffffffff8146fb6b>] thread_return+0x0/0x295
 [<ffffffff81470c58>] __mutex_lock_slowpath+0xf8/0x150
 [<ffffffff814706ea>] mutex_lock+0x1a/0x40
 [<ffffffffa08bd28e>] cl_lock_mutex_get+0x6e/0xc0 [obdclass]
 [<ffffffffa0b8988e>] osc_dlm_blocking_ast0+0x5e/0x210 [osc]
 [<ffffffffa0b89a8c>] osc_ldlm_blocking_ast+0x4c/0x100 [osc]
 [<ffffffffa09e30a0>] ldlm_handle_bl_callback+0xc0/0x420 [ptlrpc]
 [<ffffffffa09e3609>] ldlm_bl_thread_main+0x209/0x430 [ptlrpc]
 [<ffffffff8147ade4>] kernel_thread_helper+0x4/0x10
[0]kdb> btp 8090
Stack traceback for pid 8090
0xffff88024c9a6640 8090 2 0 6 D 0xffff88024c9a6cb0 ldlm_bl_12
 [<ffffffff8146fb6b>] thread_return+0x0/0x295
 [<ffffffff81470c58>] __mutex_lock_slowpath+0xf8/0x150
 [<ffffffff814706ea>] mutex_lock+0x1a/0x40
 [<ffffffffa08bd28e>] cl_lock_mutex_get+0x6e/0xc0 [obdclass]
 [<ffffffffa0b8988e>] osc_dlm_blocking_ast0+0x5e/0x210 [osc]
 [<ffffffffa0b89a8c>] osc_ldlm_blocking_ast+0x4c/0x100 [osc]
 [<ffffffffa09e30a0>] ldlm_handle_bl_callback+0xc0/0x420 [ptlrpc]
 [<ffffffffa09e3609>] ldlm_bl_thread_main+0x209/0x430 [ptlrpc]
 [<ffffffff8147ade4>] kernel_thread_helper+0x4/0x10
[0]kdb> go
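When many threads are stuck, it can help to confirm mechanically that they share one call path. The sketch below is an illustration only, not a tool used in this ticket. It assumes kdb `btp` output in the textual shape shown above, and the names `PID_HEADER`, `FRAME`, and `group_by_stack` are hypothetical.

```python
import re
from collections import defaultdict

# Header that kdb prints before each task's backtrace.
PID_HEADER = re.compile(r"Stack traceback for pid (\d+)")
# One frame: "[<address>] symbol+0xoff/0xsize", optionally followed by "[module]".
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def group_by_stack(kdb_text):
    """Map each distinct stack (tuple of symbol names) to the pids that show it."""
    groups = defaultdict(list)
    # Splitting on a pattern with one capture group yields
    # [preamble, pid, body, pid, body, ...].
    parts = PID_HEADER.split(kdb_text)
    for pid, body in zip(parts[1::2], parts[2::2]):
        stack = tuple(m.group(1) for m in FRAME.finditer(body))
        groups[stack].append(int(pid))
    return dict(groups)
```

Applied to the two traces above, this would place pids 8080 and 8090 in a single group, since both stacks run from `ldlm_bl_thread_main` down to `__mutex_lock_slowpath`.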