[LU-11252] MDS kernel panic when trying to umount Created: 14/Aug/18  Updated: 15/Nov/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.5, Lustre 2.10.7
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: llnl, soak
Environment:

2.10.5-RC1 DNE


Issue Links:
Related
is related to LU-10055 mdt_fill_lvbo() message spew on MDS c... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During MDS failover, MDS1 (soak-9) crashed while unmounting MDT0000.
The problem seems to be related to LU-10390 and LU-10635.

soak.log

2018-08-14 12:43:14,259:fsmgmt.fsmgmt:INFO     soaked-MDT0000 in status 'RECOVERING'.
2018-08-14 12:43:14,259:fsmgmt.fsmgmt:INFO     Next recovery check in 15s...
2018-08-14 12:43:29,697:fsmgmt.fsmgmt:DEBUG    Recovery Result Record: {'soak-9': {'soaked-MDT0001': 'COMPLETE', 'soaked-MDT0000': 'COMPLETE'}}
2018-08-14 12:43:29,697:fsmgmt.fsmgmt:INFO     Node soak-9: 'soaked-MDT0000' recovery completed
2018-08-14 12:43:29,697:fsmgmt.fsmgmt:INFO     Failing back soaked-MDT0000 ...
2018-08-14 12:43:29,697:fsmgmt.fsmgmt:INFO     Unmounting soaked-MDT0000 on soak-9 ...

soak-9 console log

[ 8931.176903] LustreError: Skipped 91 previous similar messages
[ 8952.016266] LDISKFS-fs (dm-3): recovery complete
[ 8952.024971] LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,user_xattr,no_mbcache,nodelalloc
[ 8953.758389] Lustre: MGS: Connection restored to 192.168.1.109@o2ib (at 0@lo)
[ 8954.257224] Lustre: soaked-MDT0000: Imperative Recovery not enabled, recovery window 300-900
[ 8954.313828] Lustre: soaked-MDT0000: in recovery but waiting for the first client to connect
[ 8955.666497] Lustre: soaked-MDT0000: Will be in recovery for at least 5:00, or until 29 clients reconnect
[ 8979.238342] Lustre: Evicted from MGS (at 192.168.1.109@o2ib) after server handle changed from 0x5be05c011300c6fc to 0x33b9ca6b616b9f53
[ 9019.048297] Lustre: MGS: Connection restored to 192.168.1.111@o2ib (at 192.168.1.111@o2ib)
[ 9019.061014] Lustre: Skipped 72 previous similar messages
[ 9029.239386] LustreError: 167-0: soaked-MDT0000-lwp-MDT0001: This client was evicted by soaked-MDT0000; in progress operations using this service will fail.
[ 9056.462524] Lustre: soaked-MDT0000: Recovery over after 1:40, of 29 clients 29 recovered and 0 were evicted.
[ 9057.179759] LustreError: 4059:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 416 actual 344.
[ 9057.191940] LustreError: 4059:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 5 previous similar messages
[ 9063.287830] Lustre: Failing over soaked-MDT0000
[ 9066.966848] LustreError: 4060:0:(ldlm_lockd.c:1415:ldlm_handle_enqueue0()) ### lock on destroyed export ffff98fe2e643c00 ns: mdt-soaked-MDT0000_UUID lock: ffff9902823fa800/0x33b9ca6b6194c08e lrc: 1/0,0 mode: --/CR res: [0x20000c78a:0x11d30:0x0].0x0 bits 0x8 rrc: 6 type: IBT flags: 0x54801000000000 nid: 192.168.1.135@o2ib remote: 0x6ee64381e8b43dff expref: 7 pid: 4060 timeout: 0 lvb_type: 3
[ 9066.966990] LustreError: 4067:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 752 actual 416.
[ 9066.966993] LustreError: 4067:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 9 previous similar messages
[ 9066.979183] LustreError: 4081:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff98fe00f51500 x1608769364838608/t0(0) o105->soaked-MDT0000@192.168.1.111@o2ib:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[ 9066.990686] LustreError: 11324:0:(osp_precreate.c:642:osp_precreate_send()) soaked-OST0003-osc-MDT0000: can't precreate: rc = -5
[ 9066.990693] LustreError: 11324:0:(osp_precreate.c:1289:osp_precreate_thread()) soaked-OST0003-osc-MDT0000: cannot precreate objects: rc = -5
[ 9067.005697] Lustre: soaked-MDT0000: Not available for connect from 192.168.1.144@o2ib (stopping)
[ 9067.005698] Lustre: soaked-MDT0000: Not available for connect from 192.168.1.121@o2ib (stopping)
[ 9067.130024] LustreError: 4060:0:(ldlm_lockd.c:1415:ldlm_handle_enqueue0()) Skipped 13 previous similar messages
[ 9067.143747] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[ 9067.154313] IP: [<ffffffffc12a814d>] ldlm_handle_conflict_lock+0x3d/0x330 [ptlrpc]
[ 9067.164557] PGD 0
[ 9067.168351] Oops: 0000 [#1] SMP
[ 9067.173431] Modules linked in: mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) dm_round_robin zfs(POE) zunicode(POE) zavl(POE) icp(POE) sb_edac intel_powerclamp coretemp zcommon(POE) znvpair(POE) spl(OE) intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev ipmi_ssif ipmi_si ipmi_devintf iTCO_wdt iTCO_vendor_support sg ipmi_msghandler pcspkr mei_me lpc_ich mei ioatdma i2c_i801 wmi shpchp dm_multipath dm_mod auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 mlx4_ib(OE) ib_core(OE) sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect ahci igb isci sysimgblt fb_sys_fops ptp mlx4_core(OE) mpt2sas ttm libsas libahci pps_core crct10dif_pclmul drm dca crct10dif_common raid_class libata crc32c_intel i2c_algo_bit mlx_compat(OE) i2c_core scsi_transport_sas devlink
[ 9067.305855] CPU: 0 PID: 11591 Comm: ldlm_bl_08 Tainted: P           OE  ------------   3.10.0-862.9.1.el7_lustre.x86_64 #1
[ 9067.319633] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 9067.333638] task: ffff98fe43e43f40 ti: ffff98fe2cf8c000 task.ti: ffff98fe2cf8c000
[ 9067.343460] RIP: 0010:[<ffffffffc12a814d>]  [<ffffffffc12a814d>] ldlm_handle_conflict_lock+0x3d/0x330 [ptlrpc]
[ 9067.356095] RSP: 0018:ffff98fe2cf8fbc0  EFLAGS: 00010246
[ 9067.363466] RAX: 0000000000000001 RBX: ffff990230721600 RCX: 0000000000000000
[ 9067.372905] RDX: ffff98fe2cf8fc18 RSI: ffff98fe2cf8fc80 RDI: ffff990230721600
[ 9067.382288] RBP: ffff98fe2cf8fbf0 R08: ffff98fe2cf8fcd0 R09: ffff98feae059740
[ 9067.391697] R10: ffff990230721600 R11: 000000020000c7a7 R12: ffff98fe2cf8fc18
[ 9067.401046] R13: ffff98fe2cf8fc80 R14: ffff98fe2cf8fc18 R15: 0000000000000000
[ 9067.410382] FS:  0000000000000000(0000) GS:ffff98feae000000(0000) knlGS:0000000000000000
[ 9067.420839] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9067.428646] CR2: 000000000000001c CR3: 00000006e360e000 CR4: 00000000000607f0
[ 9067.437961] Call Trace:
[ 9067.442072]  [<ffffffffc12d955d>] ldlm_process_inodebits_lock+0xfd/0x400 [ptlrpc]
[ 9067.451841]  [<ffffffffc12d9460>] ? ldlm_inodebits_compat_queue+0x390/0x390 [ptlrpc]
[ 9067.461888]  [<ffffffffc12a79ed>] ldlm_reprocess_queue+0x13d/0x2a0 [ptlrpc]
[ 9067.471035]  [<ffffffffc12a858d>] __ldlm_reprocess_all+0x14d/0x3a0 [ptlrpc]
[ 9067.480154]  [<ffffffffc12a8b46>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
[ 9067.488864]  [<ffffffffc0aa0c50>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[ 9067.498240]  [<ffffffffc12a8b20>] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
[ 9067.507403]  [<ffffffffc12a8b20>] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
[ 9067.516516]  [<ffffffffc0aa3fe5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[ 9067.525834]  [<ffffffffc12a8b8c>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[ 9067.535543]  [<ffffffffc12a983c>] ldlm_export_cancel_locks+0x11c/0x130 [ptlrpc]
[ 9067.544910]  [<ffffffffc12d2c08>] ldlm_bl_thread_main+0x4c8/0x700 [ptlrpc]
[ 9067.553787]  [<ffffffffc12d2740>] ? ldlm_handle_bl_callback+0x410/0x410 [ptlrpc]
[ 9067.563206]  [<ffffffffac8bb621>] kthread+0xd1/0xe0
[ 9067.569815]  [<ffffffffac8bb550>] ? insert_kthread_work+0x40/0x40
[ 9067.577760]  [<ffffffffacf205f7>] ret_from_fork_nospec_begin+0x21/0x21
[ 9067.586159]  [<ffffffffac8bb550>] ? insert_kthread_work+0x40/0x40
[ 9067.594055] Code: 49 89 f5 41 54 53 48 89 fb 48 83 ec 08 f6 05 26 14 81 ff 01 48 89 4d d0 4c 8b 7f 48 74 0d f6 05 1b 14 81 ff 01 0f 85 63 01 00 00 <41> 8b 47 1c 85 c0 0f 84 6b 02 00 00 48 8d 43 60 48 39 43 60 0f
[ 9067.618124] RIP  [<ffffffffc12a814d>] ldlm_handle_conflict_lock+0x3d/0x330 [ptlrpc]
[ 9067.627815]  RSP <ffff98fe2cf8fbc0>
[ 9067.632829] CR2: 000000000000001c
[ 9067.640738] Lustre: soaked-MDT0000: Not available for connect from 192.168.1.107@o2ib (stopping)
[ 9067.652220] Lustre: Skipped 12 previous similar messages
[ 9067.725141] ---[ end trace 2c0ee3d783e754cb ]---
[ 9067.800713] Kernel panic - not syncing: Fatal exception
[ 9067.807687] Kernel Offset: 0x2b800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
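The `Code:` line brackets the faulting instruction (`<41>`). Decoding the x86-64 bytes by hand, `4c 8b 7f 48` is `mov r15, [rdi+0x48]`, a pointer loaded from offset 0x48 of the function's first argument, and the faulting `41 8b 47 1c` is `mov eax, [r15+0x1c]`. R15 is 0 in the register dump, so the read lands at 0 + 0x1c, which is exactly CR2. The C sketch below only illustrates that arithmetic: the struct and field names are hypothetical and padded to reproduce the two offsets, not taken from the Lustre sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts: names and padding are invented so that the field
 * offsets match the ones decoded from the oops, not copied from Lustre. */
struct hyp_resource {
	char pad[0x1c];
	int32_t hyp_count;             /* read by "mov eax, [r15+0x1c]" */
};

struct hyp_lock {
	char pad[0x48];
	struct hyp_resource *hyp_res;  /* loaded by "mov r15, [rdi+0x48]" */
};

/* Compile-time checks that the padding gives the decoded offsets. */
static_assert(offsetof(struct hyp_resource, hyp_count) == 0x1c, "count at 0x1c");
static_assert(offsetof(struct hyp_lock, hyp_res) == 0x48, "res pointer at 0x48");

/* Address the CPU would read for res->hyp_count, given the base address
 * of res. Pure integer arithmetic: no pointer is dereferenced here. */
static uintptr_t fault_addr(uintptr_t res_base)
{
	return res_base + offsetof(struct hyp_resource, hyp_count);
}
```

Here `fault_addr(0)` evaluates to 0x1c, the same value as CR2. A small faulting address like this usually indicates a field read through a NULL structure pointer rather than a wild pointer.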


 Comments   
Comment by Andreas Dilger [ 14/Aug/18 ]

I see there is an error being printed in mdt_lvbo_fill() that is fixed in master via patch https://review.whamcloud.com/30004 "LU-10055 mdt: use max_mdsize in reply for layout intent" that we might consider backporting to b2_10.

Comment by Sarah Liu [ 12/Mar/19 ]

Hit a similar issue while testing 2.10.7-rc1, after about 36 hours of running:

[ 5959.580924] Lustre: soaked-MDT0000: Received LWP connection from 0@lo, removing former export from 192.168.1.109@o2ib
[ 5959.582722] Lustre: soaked-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[ 5959.647889] LustreError: 13409:0:(tgt_lastrcvd.c:1360:tgt_last_rcvd_update()) soaked-MDT0000: trying to overwrite bigger transno:on-disk: 549760023477, new: 549760023476 replay: 0. See LU-617.
[ 5959.763769] Lustre: soaked-MDT0001: in recovery but waiting for the first client to connect
[ 5965.084661] Lustre: soaked-MDT0001: Will be in recovery for at least 2:30, or until 28 clients reconnect
[ 5965.095348] Lustre: soaked-MDT0001: Connection restored to 131ca863-02f6-898f-5cde-3df7edf12ce4 (at 192.168.1.131@o2ib)
[ 5965.107412] Lustre: Skipped 1 previous similar message
[ 5969.305876] Lustre: soaked-MDT0001: Connection restored to 5b571358-830f-2cb5-a308-7105d98c7dc4 (at 192.168.1.116@o2ib)
[ 5969.317960] Lustre: Skipped 8 previous similar messages
[ 5973.500188] LNet: 12359:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 192.168.1.109@o2ib: 3 seconds
[ 5978.159050] Lustre: soaked-MDT0001: Connection restored to c484a96c-d785-2400-a4e6-0e5ebd24971a (at 192.168.1.121@o2ib)
[ 5978.171136] Lustre: Skipped 10 previous similar messages
[ 5999.071901] Lustre: soaked-MDT0001: Connection restored to 192.168.1.106@o2ib (at 192.168.1.106@o2ib)
[ 5999.082230] Lustre: Skipped 12 previous similar messages
[ 6001.949346] Lustre: soaked-MDT0001: Recovery over after 0:37, of 28 clients 28 recovered and 0 were evicted.
[ 6008.249648] Lustre: Failing over soaked-MDT0001
[ 6009.117114] Lustre: soaked-MDT0001: Not available for connect from 192.168.1.135@o2ib (stopping)
[ 6013.073130] Lustre: soaked-MDT0001: Not available for connect from 192.168.1.107@o2ib (stopping)
[ 6013.073132] Lustre: soaked-MDT0001: Not available for connect from 192.168.1.107@o2ib (stopping)
[ 6013.092844] Lustre: Skipped 2 previous similar messages
[ 6013.802207] LustreError: 15467:0:(ldlm_lockd.c:1395:ldlm_handle_enqueue0()) ### lock on destroyed export ffff990a44e51c00 ns: mdt-soaked-MDT0001_UUID lock: ffff990a4715aa00/0x5488417a94e2d76 lrc: 1/0,0 mode: --/CR res: [0x240048072:0x16717:0x0].0x0 bits 0x8 rrc: 15 type: IBT flags: 0x54801000000000 nid: 192.168.1.135@o2ib remote: 0xcc4d628820495bf5 expref: 5 pid: 15467 timeout: 0 lvb_type: 3
[ 6013.802461] LustreError: 13269:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff99062d036c00 x1627394594877232/t0(0) o900->soaked-MDT0000-lwp-MDT0001@0@lo:29/10 lens 264/248 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
[ 6013.802473] LustreError: 13269:0:(lod_lov.c:831:lod_gen_component_ea()) soaked-MDT0001-mdtlov: Can not locate [0x440000401:0x14824d2:0x0]: rc = -5
[ 6013.879523] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 6013.888300] IP: [<ffffffffc0d45493>] ldlm_process_inodebits_lock+0x63/0x400 [ptlrpc]
[ 6013.897027] PGD 0 
[ 6013.899287] Oops: 0000 [#1] SMP 
[ 6013.902922] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm dm_round_robin irqbypass ipmi_ssif iTCO_wdt iTCO_vendor_support crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si i2c_i801 cryptd ipmi_devintf ioatdma mei_me ipmi_msghandler sg joydev lpc_ich mei pcspkr wmi dm_multipath dm_mod auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_ib(OE) ib_core(OE) mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb ttm mlx4_core(OE) ahci isci libahci crct10dif_pclmul crct10dif_common mpt2sas ptp libsas crc32c_intel drm devlink pps_core raid_class libata dca scsi_transport_sas mlx_compat(OE) drm_panel_orientation_quirks i2c_algo_bit
[ 6014.014529] CPU: 2 PID: 12442 Comm: ldlm_bl_02 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.1.3.el7_lustre.x86_64 #1
[ 6014.028230] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 6014.040758] task: ffff990a68394100 ti: ffff9902ec374000 task.ti: ffff9902ec374000
[ 6014.049118] RIP: 0010:[<ffffffffc0d45493>]  [<ffffffffc0d45493>] ldlm_process_inodebits_lock+0x63/0x400 [ptlrpc]
[ 6014.060525] RSP: 0018:ffff9902ec377c00  EFLAGS: 00010287
[ 6014.066452] RAX: 0000000000000030 RBX: ffff990a626a7600 RCX: ffff9902ec377c7c
[ 6014.074417] RDX: 0000000000000002 RSI: ffff9902ec377c80 RDI: ffff990a626a7600
[ 6014.082380] RBP: ffff9902ec377c58 R08: ffff9902ec377cd0 R09: ffff99066e25b780
[ 6014.090342] R10: ffff990a626a7600 R11: 0000000240048071 R12: ffff990a68e7c280
[ 6014.098305] R13: 0000000000000000 R14: ffff9902ec377c80 R15: ffff990a626a7660
[ 6014.106273] FS:  0000000000000000(0000) GS:ffff99066e080000(0000) knlGS:0000000000000000
[ 6014.115303] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6014.121716] CR2: 0000000000000030 CR3: 0000000454410000 CR4: 00000000000607e0
[ 6014.129694] Call Trace:
[ 6014.132461]  [<ffffffffc0d45430>] ? ldlm_inodebits_compat_queue+0x390/0x390 [ptlrpc]
[ 6014.141138]  [<ffffffffc0d13a0d>] ldlm_reprocess_queue+0x13d/0x2a0 [ptlrpc]
[ 6014.148943]  [<ffffffffc0d145ad>] __ldlm_reprocess_all+0x14d/0x3a0 [ptlrpc]
[ 6014.156750]  [<ffffffffc0d14b66>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
[ 6014.164152]  [<ffffffffc09ead10>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[ 6014.172232]  [<ffffffffc0d14b40>] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
[ 6014.180128]  [<ffffffffc0d14b40>] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
[ 6014.187993]  [<ffffffffc09ee0a5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[ 6014.196084]  [<ffffffffc0d14bac>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[ 6014.204573]  [<ffffffffc0d1585c>] ldlm_export_cancel_locks+0x11c/0x130 [ptlrpc]
[ 6014.212762]  [<ffffffffc0d3ebd8>] ldlm_bl_thread_main+0x4c8/0x700 [ptlrpc]
[ 6014.220440]  [<ffffffff936d67b0>] ? wake_up_state+0x20/0x20
[ 6014.226696]  [<ffffffffc0d3e710>] ? ldlm_handle_bl_callback+0x410/0x410 [ptlrpc]
[ 6014.234975]  [<ffffffff936c1c31>] kthread+0xd1/0xe0
[ 6014.240420]  [<ffffffff936c1b60>] ? insert_kthread_work+0x40/0x40
[ 6014.247247]  [<ffffffff93d74c37>] ret_from_fork_nospec_begin+0x21/0x21
[ 6014.254533]  [<ffffffff936c1b60>] ? insert_kthread_work+0x40/0x40
[ 6014.261335] Code: ff 01 4c 8b 6f 48 74 0d f6 05 eb e0 cb ff 01 0f 85 d3 01 00 00 8b 83 98 00 00 00 39 83 9c 00 00 00 0f 84 2a 03 00 00 49 8d 45 30 <49> 39 45 30 0f 85 ea 02 00 00 41 8b 45 1c 4c 8d 65 c0 4c 89 65 
[ 6014.283100] RIP  [<ffffffffc0d45493>] ldlm_process_inodebits_lock+0x63/0x400 [ptlrpc]
[ 6014.291880]  RSP <ffff9902ec377c00>
[ 6014.295777] CR2: 0000000000000030
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
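This second oops faults in a different function, but the `Code:` bytes have the same shape: `4c 8b 6f 48` loads R13 from offset 0x48 of the first argument (R13 is 0 in the dump), `49 8d 45 30` is `lea rax, [r13+0x30]` (hence RAX = 0x30), and the faulting `49 39 45 30` is `cmp [r13+0x30], rax`. Comparing a list head's `next` pointer with the head's own address is the pattern a `list_empty()` check typically compiles to, so this looks like an emptiness test on a list embedded at offset 0x30 of a structure reached through a NULL pointer. The sketch uses a simplified userspace stand-in for the kernel's `struct list_head`; the container struct and its field name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty circular list points back at its own head. */
static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical container: name and padding are invented so that the
 * embedded list sits at offset 0x30, as decoded from the oops. */
struct hyp_resource {
	char pad[0x30];
	struct list_head hyp_queue;
};

static_assert(offsetof(struct hyp_resource, hyp_queue) == 0x30, "queue at 0x30");

/* Address of res->hyp_queue for a given base address of res. With a NULL
 * base this is 0x30: the value placed in RAX by "lea rax, [r13+0x30]" and
 * then dereferenced by "cmp [r13+0x30], rax". No pointer is dereferenced. */
static uintptr_t queue_addr(uintptr_t res_base)
{
	return res_base + offsetof(struct hyp_resource, hyp_queue);
}
```

On a valid, initialised resource `list_empty()` simply returns true; with a NULL resource pointer, the same check reads address 0x30, matching CR2 here.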

Comment by Olaf Faaland [ 15/Nov/19 ]

Encountered the same BUG in Lustre 2.10.8 during shutdown of MDT0000 backed by ZFS.

https://github.com/LLNL/lustre
tag 2.10.8_4.chaos

Lustre: Failing over ls1-MDT0000            
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffffc1434ce3>] ldlm_process_inodebits_lock+0x63/0x400 [ptlrpc] 
PGD 0                                                                    
Oops: 0000 [#1] SMP                                                      
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) rpcrdma ib_iser mlx5_ib iTCO_wdt iTCO_vendor_support sch_fq_codel sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi joydev mlx5_core kvm dm_round_robin mlxfw devlink irqbypass pcspkr lpc_ich i2c_i801 ioatdma ses enclosure sg ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter pcc_cpufreq acpi_cpufreq zfs(POE) zunicode(POE) zavl(POE) icp(POE) binfmt_misc zcommon(POE) znvpair(POE) spl(OE) msr_safe(OE) ib_ipoib rdma_ucm ib_uverbs ib_umad iw_cxgb4 rdma_cm iw_cm ib_cm iw_cxgb3 ib_core ip_tables nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache overlay(T) ext4 mbcache jbd2 dm_service_time sd_mod crc_t10dif crct10dif_generic be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs mgag200 i2c_algo_bit 8021q garp drm_kms_helper mrp syscopyarea stp sysfillrect sysimgblt llc fb_sys_fops crct10dif_pclmul crct10dif_common crc32_pclmul ttm crc32c_intel ixgbe(OE) dca ghash_clmulni_intel mxm_wmi drm mpt3sas ahci aesni_intel libahci lrw gf128mul glue_helper ablk_helper ptp raid_class drm_panel_orientation_quirks libata cryptd scsi_transport_sas pps_core dm_multipath wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
[358180.891888] Lustre: ls1-MDT0000: Not available for connect from 192.168.135.201@o2ib27 (stopping)
Lustre: Skipped 40 previous similar messages
CPU: 6 PID: 10992 Comm: ldlm_bl_06 Kdump: loaded Tainted: P           OE  ------------ T 3.10.0-1062.1.1.1chaos.ch6.x86_64 #1
Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0024.021320181901 02/13/2018
task: ffff929300a15230 ti: ffff928f15db4000 task.ti: ffff928f15db4000
RIP: 0010:[<ffffffffc1434ce3>]  [<ffffffffc1434ce3>] ldlm_process_inodebits_lock+0x63/0x400 [ptlrpc]
RSP: 0018:ffff928f15db7bf8  EFLAGS: 00010297
RAX: 0000000000000030 RBX: ffff925228d03800 RCX: ffff928f15db7c74
RDX: 0000000000000002 RSI: ffff928f15db7c78 RDI: ffff925228d03800
RBP: ffff928f15db7c50 R08: ffff928f15db7cd0 R09: ffff92527fbdb840
R10: ffff925228d03800 R11: 0000000580009c04 R12: ffff928f158eba80
R13: 0000000000000000 R14: ffff928f15db7c78 R15: ffff925228d03860
FS:  0000000000000000(0000) GS:ffff92527fb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 0000003f1674a000 CR4: 00000000003607e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 [<ffffffffc1434c80>] ? ldlm_inodebits_compat_queue+0x390/0x390 [ptlrpc]
 [<ffffffffc1402a9d>] ldlm_reprocess_queue+0x13d/0x2a0 [ptlrpc]
 [<ffffffffc1403666>] __ldlm_reprocess_all+0x166/0x3c0 [ptlrpc]
 [<ffffffffc1403c26>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
 [<ffffffffc110bf33>] cfs_hash_for_each_relax+0x263/0x480 [libcfs]
 [<ffffffffc1403c00>] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
 [<ffffffffc1403c00>] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
 [<ffffffffc110f325>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
 [<ffffffffc1403c6c>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
 [<ffffffffc140491c>] ldlm_export_cancel_locks+0x11c/0x130 [ptlrpc]
 [<ffffffffc142e418>] ldlm_bl_thread_main+0x4d8/0x710 [ptlrpc]
 [<ffffffff984e1420>] ? wake_up_state+0x20/0x20
 [<ffffffffc142df40>] ? ldlm_handle_bl_callback+0x410/0x410 [ptlrpc]
 [<ffffffff984cb2f1>] kthread+0xd1/0xe0
 [<ffffffff984cb220>] ? insert_kthread_work+0x40/0x40
 [<ffffffff98bb9f77>] ret_from_fork_nospec_begin+0x21/0x21
 [<ffffffff984cb220>] ? insert_kthread_work+0x40/0x40
Generated at Sat Feb 10 02:42:16 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.