Details
Bug
Resolution: Fixed
Critical
Lustre 2.12.0, Lustre 2.12.2, Lustre 2.12.4, Lustre 2.12.5
3
9223372036854775807
Description
tag-2.11.55
MDS crash
[15650.670434] device-mapper: multipath: Failing path 8:96.
[15650.765276] BUG: unable to handle kernel NULL pointer dereference at (null)
[15650.775741] IP: [< (null)>] (null)
[15650.783081] PGD 0
[15650.786948] Oops: 0010 [#1] SMP
[15650.792218] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) dm_round_robin zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt joydev pcspkr ipmi_ssif iTCO_vendor_support sg ipmi_si ipmi_devintf shpchp ipmi_msghandler i2c_i801 mei_me ioatdma mei lpc_ich wmi dm_multipath dm_mod auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_ib(OE) ib_core(OE) mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb isci ahci ptp mlx4_core(OE) mpt3sas libsas libahci pps_core dca crct10dif_pclmul devlink i2c_algo_bit crct10dif_common raid_class crc32c_intel libata i2c_core mlx_compat(OE) scsi_transport_sas
[15650.934649] CPU: 14 PID: 9491 Comm: mdt_rdpg01_008 Tainted: P OE ------------ 3.10.0-862.9.1.el7_lustre.x86_64 #1
[15650.952002] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[15650.966961] task: ffff8be6a1253f40 ti: ffff8be68a044000 task.ti: ffff8be68a044000
[15650.977791] RIP: 0010:[<0000000000000000>] [< (null)>] (null)
[15650.988693] RSP: 0018:ffff8be68a047b58 EFLAGS: 00010246
[15650.997167] RAX: 0000000000000000 RBX: ffff8be68b820000 RCX: 0000000000000002
[15651.007733] RDX: ffffffffc164c7b0 RSI: ffff8be68a047b60 RDI: ffff8be68b820008
[15651.018326] RBP: ffff8be68a047b98 R08: 0000000000000004 R09: 0000000000000000
[15651.028930] R10: 0000000000000001 R11: 00000000007fffff R12: ffff8be26f9fab00
[15651.039547] R13: ffff8be279a448a0 R14: ffff8be68a160000 R15: ffff8be68b820008
[15651.050168] FS: 0000000000000000(0000) GS:ffff8be6ad980000(0000) knlGS:0000000000000000
[15651.061880] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15651.070980] CR2: 0000000000000000 CR3: 000000042c3b6000 CR4: 00000000000607e0
[15651.081660] Call Trace:
[15651.087091] [<ffffffffc164ac3e>] ? osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs]
[15651.098058] [<ffffffffc164ae17>] osd_it_ea_load+0x37/0x100 [osd_ldiskfs]
[15651.108370] [<ffffffffc188eb47>] lod_it_load+0x27/0x90 [lod]
[15651.117554] [<ffffffffc0f48808>] dt_index_walk+0xf8/0x430 [obdclass]
[15651.127457] [<ffffffffc1915080>] ? mdd_object_lock+0xe0/0xe0 [mdd]
[15651.137132] [<ffffffffc1916d9f>] mdd_readpage+0x25f/0x5a0 [mdd]
[15651.146553] [<ffffffffc1782bda>] mdt_readpage+0x63a/0x880 [mdt]
[15651.155992] [<ffffffffc11e82ca>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[15651.166379] [<ffffffffc11c02e1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[15651.177493] [<ffffffffc0dfcbde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[15651.188033] [<ffffffffc118b48b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[15651.199251] [<ffffffffc1188315>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[15651.209399] [<ffffffff83ccf682>] ? default_wake_function+0x12/0x20
[15651.218931] [<ffffffff83cc52ab>] ? __wake_up_common+0x5b/0x90
[15651.228026] [<ffffffffc118ecc4>] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[15651.237575] [<ffffffffc118e1b0>] ? ptlrpc_register_service+0xe90/0xe90 [ptlrpc]
[15651.248365] [<ffffffff83cbb621>] kthread+0xd1/0xe0
[15651.256344] [<ffffffff83cbb550>] ? insert_kthread_work+0x40/0x40
[15651.265688] [<ffffffff843205f7>] ret_from_fork_nospec_begin+0x21/0x21
[15651.275475] [<ffffffff83cbb550>] ? insert_kthread_work+0x40/0x40
[15651.284736] Code: Bad RIP value.
[15651.290946] RIP [< (null)>] (null)
[15651.299236] RSP <ffff8be68a047b58>
[15651.305543] CR2: 0000000000000000
[15651.315778] ---[ end trace 4ae4238c00f9aeec ]---
[15651.336386] Kernel panic - not syncing: Fatal exception
[15651.344613] Kernel Offset: 0x2c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[15651.369289] ------------[ cut here ]------------
[15651.376397] WARNING: CPU: 14 PID: 9491 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x65/0x70
[15651.388915] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) dm_round_robin zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt joydev pcspkr ipmi_ssif iTCO_vendor_support sg ipmi_si ipmi_devintf shpchp ipmi_msghandler i2c_i801 mei_me ioatdma mei lpc_ich wmi dm_multipath dm_mod auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_ib(OE) ib_core(OE) mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb isci ahci ptp mlx4_core(OE) mpt3sas libsas libahci pps_core dca crct10dif_pclmul devlink i2c_algo_bit crct10dif_common raid_class crc32c_intel libata i2c_core mlx_compat(OE) scsi_transport_sas
[15651.529620] CPU: 14 PID: 9491 Comm: mdt_rdpg01_008 Tainted: P D OE ------------ 3.10.0-862.9.1.el7_lustre.x86_64 #1
[15651.546472] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[15651.561156] Call Trace:
[15651.566023] <IRQ> [<ffffffff8430e84e>] dump_stack+0x19/0x1b
[15651.574646] [<ffffffff83c91e18>] __warn+0xd8/0x100
[15651.582224] [<ffffffff83c91f5d>] warn_slowpath_null+0x1d/0x20
[15651.590851] [<ffffffff83c54e95>] native_smp_send_reschedule+0x65/0x70
[15651.600279] [<ffffffff83cddf81>] trigger_load_balance+0x191/0x280
[15651.609280] [<ffffffff83ccdc0a>] scheduler_tick+0x10a/0x150
[15651.617702] [<ffffffff83d01c10>] ? tick_sched_do_timer+0x50/0x50
[15651.626619] [<ffffffff83ca4f65>] update_process_times+0x65/0x80
[15651.635416] [<ffffffff83d01a10>] tick_sched_handle+0x30/0x70
[15651.643916] [<ffffffff83d01c49>] tick_sched_timer+0x39/0x80
[15651.652315] [<ffffffff83cbf7e6>] __hrtimer_run_queues+0xd6/0x260
[15651.661210] [<ffffffff83cbfd7f>] hrtimer_interrupt+0xaf/0x1d0
[15651.669814] [<ffffffff83c5847b>] local_apic_timer_interrupt+0x3b/0x60
[15651.679184] [<ffffffff84325063>] smp_apic_timer_interrupt+0x43/0x60
[15651.688352] [<ffffffff843217b2>] apic_timer_interrupt+0x162/0x170
[15651.697316] <EOI> [<ffffffff84308c3d>] ? panic+0x1d5/0x21f
[15651.705715] [<ffffffff84308ba1>] ? panic+0x139/0x21f
[15651.713430] [<ffffffff84318745>] oops_end+0xc5/0xe0
[15651.721020] [<ffffffff8430807e>] no_context+0x285/0x2a8
[15651.728984] [<ffffffff84308115>] __bad_area_nosemaphore+0x74/0x1d1
[15651.738014] [<ffffffff84308286>] bad_area_nosemaphore+0x14/0x16
[15651.746760] [<ffffffff8431b6e0>] __do_page_fault+0x330/0x4f0
[15651.755199] [<ffffffff8431b8d5>] do_page_fault+0x35/0x90
[15651.763264] [<ffffffff84317758>] page_fault+0x28/0x30
[15651.771013] [<ffffffffc164c7b0>] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs]
[15651.781105] [<ffffffffc164ac3e>] ? osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs]
[15651.791402] [<ffffffffc164ae17>] osd_it_ea_load+0x37/0x100 [osd_ldiskfs]
[15651.801028] [<ffffffffc188eb47>] lod_it_load+0x27/0x90 [lod]
[15651.809517] [<ffffffffc0f48808>] dt_index_walk+0xf8/0x430 [obdclass]
[15651.818761] [<ffffffffc1915080>] ? mdd_object_lock+0xe0/0xe0 [mdd]
[15651.827808] [<ffffffffc1916d9f>] mdd_readpage+0x25f/0x5a0 [mdd]
[15651.836533] [<ffffffffc1782bda>] mdt_readpage+0x63a/0x880 [mdt]
[15651.845269] [<ffffffffc11e82ca>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[15651.854937] [<ffffffffc11c02e1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[15651.865302] [<ffffffffc0dfcbde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[15651.875084] [<ffffffffc118b48b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[15651.885522] [<ffffffffc1188315>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[15651.894881] [<ffffffff83ccf682>] ? default_wake_function+0x12/0x20
[15651.903621] [<ffffffff83cc52ab>] ? __wake_up_common+0x5b/0x90
[15651.911880] [<ffffffffc118ecc4>] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
[15651.920585] [<ffffffffc118e1b0>] ? ptlrpc_register_service+0xe90/0xe90 [ptlrpc]
[15651.930460] [<ffffffff83cbb621>] kthread+0xd1/0xe0
[15651.937468] [<ffffffff83cbb550>] ? insert_kthread_work+0x40/0x40
[15651.945794] [<ffffffff843205f7>] ret_from_fork_nospec_begin+0x21/0x21
[15651.954564] [<ffffffff83cbb550>] ? insert_kthread_work+0x40/0x40
[15651.962806] ---[ end trace 4ae4238c00f9aeed ]---