Details
- Type: Bug
- Resolution: Unresolved
- Priority: Minor
- Fix Version/s: None
- Affects Version/s: Lustre 2.10.2
- Labels: None
- Environment: CentOS 7.4
- Severity: 3
- Rank (Obsolete): 9223372036854775807
Description
We had never seen this before, but it has already happened twice with the latest 2.10.2 version: the MGS crashes while it is being stopped:
[77225.855547] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[77225.864304] IP: [<ffffffffc0ba48ce>] ldlm_process_plain_lock+0x6e/0xb30 [ptlrpc]
[77225.872614] PGD 0
[77225.874864] Oops: 0000 [#1] SMP
[77225.878480] Modules linked in: mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) vfat fat uas usb_storage mpt2sas mptctl mptbase dell_rbu rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core intel_powerclamp dm_service_time coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper iTCO_wdt ablk_helper iTCO_vendor_support cryptd dm_round_robin pcspkr mxm_wmi dcdbas sg ipmi_si ipmi_devintf ipmi_msghandler mei_me mei lpc_ich shpchp acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace dm_multipath sunrpc dm_mod ip_tables ext4 mbcache
[77225.957975] jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_en i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx4_core drm tg3 ahci libahci crct10dif_pclmul mpt3sas crct10dif_common raid_class ptp crc32c_intel libata megaraid_sas i2c_core devlink scsi_transport_sas pps_core
[77225.986851] CPU: 23 PID: 27105 Comm: ldlm_bl_14 Tainted: G OE ------------ 3.10.0-693.2.2.el7_lustre.pl1.x86_64 #1
[77225.999661] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.3.4 11/08/2016
[77226.008010] task: ffff88103323cf10 ti: ffff881012ab8000 task.ti: ffff881012ab8000
[77226.016359] RIP: 0010:[<ffffffffc0ba48ce>] [<ffffffffc0ba48ce>] ldlm_process_plain_lock+0x6e/0xb30 [ptlrpc]
[77226.027355] RSP: 0018:ffff881012abbbe0 EFLAGS: 00010287
[77226.033280] RAX: 0000000000000000 RBX: ffff881011f7d400 RCX: ffff881012abbc7c
[77226.041240] RDX: 0000000000000002 RSI: ffff881012abbc80 RDI: ffff881011f7d400
[77226.049201] RBP: ffff881012abbc58 R08: ffff881012abbcd0 R09: ffff88103d0d7880
[77226.057162] R10: ffff881011f7d400 R11: 7fffffffffffffff R12: ffff880168287540
[77226.065123] R13: 0000000000000002 R14: ffff881012abbcd0 R15: ffff881011f7d460
[77226.073085] FS: 0000000000000000(0000) GS:ffff88203c8c0000(0000) knlGS:0000000000000000
[77226.082111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[77226.088521] CR2: 000000000000001c CR3: 00000000019f2000 CR4: 00000000001407e0
[77226.096482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[77226.104443] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[77226.112403] Stack:
[77226.114642] ffff881012abbc7c ffff881012abbcd0 ffff881012abbc80 0000000000000000
[77226.122930] ffff880168287520 0000001000000001 ffff880100000010 ffff881012abbc18
[77226.131219] ffff881012abbc18 00000000bd734aeb 0000000000000002 ffff880168287540
[77226.139507] Call Trace:
[77226.142252] [<ffffffffc0ba4860>] ? ldlm_errno2error+0x60/0x60 [ptlrpc]
[77226.149649] [<ffffffffc0b8f9db>] ldlm_reprocess_queue+0x13b/0x2a0 [ptlrpc]
[77226.157434] [<ffffffffc0b9057d>] __ldlm_reprocess_all+0x14d/0x3a0 [ptlrpc]
[77226.165220] [<ffffffffc0b90b30>] ldlm_reprocess_res+0x20/0x30 [ptlrpc]
[77226.172611] [<ffffffffc0866bef>] cfs_hash_for_each_relax+0x21f/0x400 [libcfs]
[77226.180687] [<ffffffffc0b90b10>] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
[77226.188571] [<ffffffffc0b90b10>] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
[77226.196441] [<ffffffffc0869d95>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[77226.204518] [<ffffffffc0b90b7c>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[77226.212983] [<ffffffffc0b917bc>] ldlm_export_cancel_locks+0x11c/0x130 [ptlrpc]
[77226.221162] [<ffffffffc0bbada8>] ldlm_bl_thread_main+0x4c8/0x700 [ptlrpc]
[77226.228836] [<ffffffff816a8fad>] ? __schedule+0x39d/0x8b0
[77226.234977] [<ffffffffc0bba8e0>] ? ldlm_handle_bl_callback+0x410/0x410 [ptlrpc]
[77226.243232] [<ffffffff810b098f>] kthread+0xcf/0xe0
[77226.248672] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[77226.255472] [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
[77226.261494] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[77226.268292] Code: 89 45 a0 74 0d f6 05 b3 ac cd ff 01 0f 85 34 06 00 00 8b 83 98 00 00 00 39 83 9c 00 00 00 89 45 b8 0f 84 57 09 00 00 48 8b 45 a0 <8b> 40 1c 85 c0 0f 84 7a 09 00 00 48 8b 4d a0 48 89 c8 48 83 c0
[77226.289871] RIP [<ffffffffc0ba48ce>] ldlm_process_plain_lock+0x6e/0xb30 [ptlrpc]
[77226.298248] RSP <ffff881012abbbe0>
[77226.302135] CR2: 000000000000001c
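For triage, the key fields of an oops like the one above can be pulled out mechanically. What follows is only a minimal Python sketch and not part of the original report: the function name `summarize_oops` and the shortened `sample` string are illustrative assumptions. It extracts the faulting address, the symbol and module from the `IP:` line, and the call-trace frames that are not marked with `?` (the `?` entries are addresses found on the stack that the unwinder could not confirm).

```python
import re

def summarize_oops(text):
    """Extract fault address, faulting symbol, and unmarked call-trace frames."""
    # Address from the "NULL pointer dereference at ..." line.
    addr = re.search(r"NULL pointer dereference at ([0-9a-f]+)", text)
    # Symbol, offset and optional module from the "IP:" line.
    # "RIP: 0010:[<...>]" does not match because of the segment prefix.
    ip = re.search(
        r"IP: \[<[0-9a-f]+>\] ([\w.]+)\+(0x[0-9a-f]+)/0x[0-9a-f]+(?: \[(\w+)\])?",
        text,
    )
    # Only look between "Call Trace:" and "Code:" so the trailing RIP line
    # is not mistaken for a frame.
    trace = ""
    if "Call Trace:" in text:
        trace = text.split("Call Trace:", 1)[1].split("Code:", 1)[0]
    entries = re.findall(
        r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+", trace
    )
    return {
        "fault_addr": int(addr.group(1), 16) if addr else None,
        "symbol": ip.group(1) if ip else None,
        "offset": int(ip.group(2), 16) if ip else None,
        "module": ip.group(3) if ip else None,
        # Drop frames flagged with "?".
        "frames": [sym for marked, sym in entries if not marked],
    }

# Shortened excerpt of the log above, used only to exercise the function.
sample = (
    "[77225.855547] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c\n"
    "[77225.864304] IP: [<ffffffffc0ba48ce>] ldlm_process_plain_lock+0x6e/0xb30 [ptlrpc]\n"
    "[77226.139507] Call Trace:\n"
    "[77226.142252] [<ffffffffc0ba4860>] ? ldlm_errno2error+0x60/0x60 [ptlrpc]\n"
    "[77226.149649] [<ffffffffc0b8f9db>] ldlm_reprocess_queue+0x13b/0x2a0 [ptlrpc]\n"
    "[77226.157434] [<ffffffffc0b9057d>] __ldlm_reprocess_all+0x14d/0x3a0 [ptlrpc]\n"
    "[77226.268292] Code: 89 45 a0\n"
    "[77226.289871] RIP [<ffffffffc0ba48ce>] ldlm_process_plain_lock+0x6e/0xb30 [ptlrpc]\n"
)
summary = summarize_oops(sample)
```

The patterns are not line-based, so the same function should also work on a log whose newlines were lost when it was pasted. In this trace the fault address 0x1c together with RAX = 0 and the faulting instruction bytes `8b 40 1c` (a 32-bit load from offset 0x1c off RAX) indicates a field read through a NULL pointer inside `ldlm_process_plain_lock`.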
Best,
Stephane
Issue Links
- is related to: LU-10635 MGS kernel panic when configuring nodemaps and filesets (Resolved)