Details
-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
Lustre 2.10.8
-
None
-
Clients: 2.12.0, CentOS 7.6
-
3
Description
LBUG today on oak-MDT0000; we have never seen this one before. Some large data transfers using dsync have been running on Sherlock (2.12.0 clients), which may or may not be related.
[4954375.921845] LustreError: 15102:0:(tgt_handler.c:628:process_req_last_xid()) @@@ Unexpected xid 5d6425ffe4140 vs. last_xid 5d6425ffe418f req@ffffa1597f41f200 x1642955450237248/t0(0) o101->98bbe778-4f70-8a89-d80e-d6a8120c693b@10.8.2.23@o2ib6:663/0 lens 736/0 e 0 to 0 dl 1567111883 ref 1 fl Interpret:/2/ffffffff rc 0/-1
[4954542.487326] LustreError: 15290:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[4954542.517377] LustreError: 15290:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 3754300 previous similar messages
[4954874.316190] LustreError: 15347:0:(lod_object.c:3919:lod_ah_init()) ASSERTION( !lod_obj_is_striped(child) ) failed:
[4954874.351112] LustreError: 15347:0:(lod_object.c:3919:lod_ah_init()) LBUG
[4954874.373452] Pid: 15347, comm: mdt01_049 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Mon Oct 8 11:21:37 PDT 2018
[4954874.406359] Call Trace:
[4954874.414973] [<ffffffffc08af7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[4954874.437035] [<ffffffffc08af87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[4954874.459664] [<ffffffffc135a89f>] lod_ah_init+0x23f/0xde0 [lod]
[4954874.479751] [<ffffffffc13d306b>] mdd_object_make_hint+0xcb/0x190 [mdd]
[4954874.502388] [<ffffffffc13bed50>] mdd_create_data+0x330/0x730 [mdd]
[4954874.523606] [<ffffffffc129140c>] mdt_mfd_open+0xc5c/0xe70 [mdt]
[4954874.544523] [<ffffffffc1291b9b>] mdt_finish_open+0x57b/0x690 [mdt]
[4954874.565743] [<ffffffffc1293478>] mdt_reint_open+0x17c8/0x3190 [mdt]
[4954874.587229] [<ffffffffc1288cb3>] mdt_reint_rec+0x83/0x210 [mdt]
[4954874.607567] [<ffffffffc126a19b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
[4954874.630197] [<ffffffffc126a6c2>] mdt_intent_reint+0x162/0x430 [mdt]
[4954874.651677] [<ffffffffc126d4cb>] mdt_intent_opc+0x1eb/0xaf0 [mdt]
[4954874.672619] [<ffffffffc1275d68>] mdt_intent_policy+0x138/0x320 [mdt]
[4954874.694668] [<ffffffffc0be82dd>] ldlm_lock_enqueue+0x38d/0x980 [ptlrpc]
[4954874.719320] [<ffffffffc0c11c03>] ldlm_handle_enqueue0+0xa83/0x1670 [ptlrpc]
[4954874.743104] [<ffffffffc0c977f2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[4954874.764026] [<ffffffffc0c9b72a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
[4954874.787245] [<ffffffffc0c4404b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[4954874.813872] [<ffffffffc0c47792>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[4954874.835628] [<ffffffff8babdf21>] kthread+0xd1/0xe0
[4954874.852252] [<ffffffff8c1255f7>] ret_from_fork_nospec_end+0x0/0x39
[4954874.873448] [<ffffffffffffffff>] 0xffffffffffffffff
[4954874.890366] Kernel panic - not syncing: LBUG
I have a crash dump if you're interested. MDT failover went smoothly, so this is not a big deal:
Aug 29 14:04:49 oak-md1-s1 kernel: Lustre: oak-MDT0000: Recovery over after 0:55, of 1464 clients 1464 recovered and 0 were evicted.