Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.8.0
-
3.10.0-327.28.2.1chaos.ch6.x86_64
-
3
-
9223372036854775807
Description
Ran an lfsck namespace with -C and got the following LBUG on multiple MDTs.
2016-09-22 10:04:23 [493341.943717] LustreError: 127771:0:(lfsck_namespace.c:4452:lfsck_namespace_double_scan()) ASSERTION( list_empty(&lad->lad_req_list) ) failed: 2016-09-22 10:04:23 [493341.958848] LustreError: 127771:0:(lfsck_namespace.c:4452:lfsck_namespace_double_scan()) LBUG 2016-09-22 10:04:23 [493341.968781] Pid: 127771, comm: lfsck
Have the following call stack on two MDTs.
2016-09-22 10:03:52 Sep 22 10:03:52 [493315.464373] Kernel panic - not syncing: LBUG 2016-09-22 10:03:52 jet6 kernel: [49[493315.470430] CPU: 2 PID: 111809 Comm: lfsck Tainted: P OE ------------ 3.10.0-327.28.2.1chaos.ch6.x86_64 #1 2016-09-22 10:03:52 3315.297027] Lus[493315.484175] Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 2016-09-22 10:03:52 treError: 111809[493315.497715] ffffffffa079be0f 0000000055805053 ffff882757e4fc78 ffffffff8164cae7 2016-09-22 10:03:52 :0:(lfsck_namesp[493315.507701] ffff882757e4fcf8 ffffffff81645adf ffffffff00000008 ffff882757e4fd08 2016-09-22 10:03:52 ace.c:4452:lfsck[493315.517684] ffff882757e4fca8 0000000055805053 ffffffffa1070e70 0000000000000246 2016-09-22 10:03:52 _namespace_doubl[493315.527666] Call Trace: 2016-09-22 10:03:52 e_scan()) ASSERT[493315.532060] [<ffffffff8164cae7>] dump_stack+0x19/0x1b 2016-09-22 10:03:52 ION( list_empty([493315.539478] [<ffffffff81645adf>] panic+0xd8/0x1e7 2016-09-22 10:03:52 &lad->lad_req_li[493315.546501] [<ffffffffa077fdeb>] lbug_with_loc+0xab/0xc0 [libcfs] 2016-09-22 10:03:52 st) ) failed: 2016-09-22 10:03:52 [493315.555082] [<ffffffffa102c2a6>] lfsck_namespace_double_scan+0x106/0x140 [lfsck] 2016-09-22 10:03:52 Sep 22 10:03:52 [493315.565122] [<ffffffffa10234f9>] lfsck_double_scan+0x59/0x200 [lfsck] 2016-09-22 10:03:52 jet6 kernel: [49[493315.574086] [<ffffffffa0d88fc4>] ? osd_zfs_otable_it_fini+0x64/0x110 [osd_zfs] 2016-09-22 10:03:52 3315.311863] Lus[493315.583931] [<ffffffffa0d88fc4>] ? osd_zfs_otable_it_fini+0x64/0x110 [osd_zfs] 2016-09-22 10:03:52 treError: 111809[493315.593765] [<ffffffff811c8bad>] ? kfree+0x12d/0x170 2016-09-22 10:03:52 :0:(lfsck_namesp[493315.601075] [<ffffffffa1028044>] lfsck_master_engine+0x434/0x1310 [lfsck] 2016-09-22 10:03:52 ace.c:4452:lfsck[493315.610415] [<ffffffff81015588>] ? __switch_to+0xf8/0x4d0 2016-09-22 10:03:52 _namespace_doubl[493315.618212] [<ffffffff810bd4f0>] ? wake_up_state+0x20/0x20 2016-09-22 10:03:52 e_scan()) LBUG 2016-09-22 10:03:52 [493315.626108] [<ffffffffa1027c10>] ? lfsck_master_oit_engine+0x1430/0x1430 [lfsck] 2016-09-22 10:03:52 [493315.636145] [<ffffffff810a99bf>] kthread+0xcf/0xe0 2016-09-22 10:03:52 [493315.642238] [<ffffffff810a98f0>] ? kthread_create_on_node+0x140/0x140 2016-09-22 10:03:52 [493315.650187] [<ffffffff8165d9d8>] ret_from_fork+0x58/0x90 2016-09-22 10:03:52 [493315.656864] [<ffffffff810a98f0>] ? kthread_create_on_node+0x140/0x140 2016-09-22 10:03:52 [493315.711916] drm_kms_helper: panic occurred, switching back to text console 2016-09-22 10:03:52 [493315.720378] ------------[ cut here ]------------ 2016-09-22 10:03:52 [493315.726202] WARNING: at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5f/0x70() 2016-09-22 10:03:52 [493315.735902] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) rpcsec_gss_krb5 ko2iblnd(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) nfsv3 iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl kvm mlx5_ib pcspkr mlx5_core sb_edac lpc_ich edac_core mfd_core mei_me mei zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) ses enclosure ipmi_devintf spl(OE) zlib_deflate sg i2c_i801 ioatdma shpchp ipmi_si ipmi_msghandler acpi_power_meter acpi_cpufreq binfmt_misc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr nfsd nfs_acl ip_tables auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache dm_round_robin sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32_pclmul mgag200 crc32c_intel syscopyarea sysfillrect sysimgblt ghash_clmulni_intel i2c_algo_bit drm_kms_helper mxm_wmi ttm aesni_intel ixgbe lrw gf128mul ahci drm dca glue_helper mpt3sas libahci ptp i2c_core ablk_helper cryptd libata raid_class pps_core scsi_transport_sas mdio wmi sunrpc dm_mirror dm_region_hash dm_log scsi_transport_iscsi dm_multipath dm_mod 2016-09-22 10:03:52 [493315.859970] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P OE ------------ 3.10.0-327.28.2.1chaos.ch6.x86_64 #1 2016-09-22 10:03:52 [493315.872734] Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 2016-09-22 10:03:53 [493315.885407] 0000000000000000 bcf7d7e5812e0014 ffff883f7e683d78 ffffffff8164cae7 2016-09-22 10:03:53 [493315.894536] ffff883f7e683db0 ffffffff8107d6d0 0000000000000000 ffff883f7e6967c0 2016-09-22 10:03:53 [493315.903668] 000000011d5cacb8 ffff883f7e6167c0 0000000000000002 ffff883f7e683dc0 2016-09-22 10:03:53 [493315.912796] Call Trace: 2016-09-22 10:03:53 [493315.916347] <IRQ> [<ffffffff8164cae7>] dump_stack+0x19/0x1b 2016-09-22 10:03:53 [493315.923621] [<ffffffff8107d6d0>] warn_slowpath_common+0x70/0xb0 2016-09-22 10:03:53 [493315.931168] [<ffffffff8107d81a>] warn_slowpath_null+0x1a/0x20 2016-09-22 10:03:53 [493315.938512] [<ffffffff81048fdf>] native_smp_send_reschedule+0x5f/0x70 2016-09-22 10:03:53 [493315.946646] [<ffffffff810cb04d>] trigger_load_balance+0x18d/0x250 2016-09-22 10:03:53 [493315.954390] [<ffffffff810bbdd3>] scheduler_tick+0x103/0x150 2016-09-22 10:03:53 [493315.961553] [<ffffffff810e5800>] ? tick_sched_handle.isra.14+0x60/0x60 2016-09-22 10:03:53 [493315.969775] [<ffffffff81091a06>] update_process_times+0x66/0x80 2016-09-22 10:03:53 [493315.977304] [<ffffffff810e57c5>] tick_sched_handle.isra.14+0x25/0x60 2016-09-22 10:03:53 [493315.985310] [<ffffffff810e5841>] tick_sched_timer+0x41/0x70 2016-09-22 10:03:53 [493315.992432] [<ffffffff810adeda>] __hrtimer_run_queues+0xea/0x2c0 2016-09-22 10:03:53 [493316.000042] [<ffffffff810ae4e0>] hrtimer_interrupt+0xb0/0x1e0 2016-09-22 10:03:53 [493316.007351] [<ffffffff8104be47>] local_apic_timer_interrupt+0x37/0x60 2016-09-22 10:03:53 [493316.015442] [<ffffffff8166000f>] smp_apic_timer_interrupt+0x3f/0x60 2016-09-22 10:03:53 [493316.023338] [<ffffffff8165e6dd>] apic_timer_interrupt+0x6d/0x80 2016-09-22 10:03:53 [493316.030848] <EOI> [<ffffffff810dd69c>] ? ktime_get+0x4c/0xd0 2016-09-22 10:03:53 [493316.038194] [<ffffffff810b8da6>] ? finish_task_switch+0x56/0x180 2016-09-22 10:03:53 [493316.045803] [<ffffffff81651df0>] __schedule+0x2e0/0x940 2016-09-22 10:03:53 [493316.052533] [<ffffffff81653709>] schedule_preempt_disabled+0x39/0x90 2016-09-22 10:03:53 [493316.060533] [<ffffffff810db1f4>] cpu_startup_entry+0x184/0x2d0 2016-09-22 10:03:53 [493316.067949] [<ffffffff81049eea>] start_secondary+0x1ca/0x240 2016-09-22 10:03:53 [493316.075162] ---[ end trace 28897805122ddeee ]---
Filesystem info:
16 MDS, 4 OSS, running ZFS 0.7.0-0.3llnl and lustre 2.8.0 on a RHEL 7.2 based operating system (3.10.0-327.28.2.1chaos.ch6.x86_64).
Also worth noting, once we have a directory with files that exhibit this "bad address" error, the directory cannot be removed.
Let me know if you need more info.