Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8647

lfsck_namespace_double_scan()) ASSERTION( list_empty(&lad->lad_req_list) ) failed

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.9.0
    • Lustre 2.8.0
    • 3.10.0-327.28.2.1chaos.ch6.x86_64
    • 3
    • 9223372036854775807

    Description

      Ran an lfsck namespace with -C and got the following LBUG on multiple MDTs.

      2016-09-22 10:04:23 [493341.943717] LustreError: 127771:0:(lfsck_namespace.c:4452:lfsck_namespace_double_scan()) ASSERTION( list_empty(&lad->lad_req_list) ) failed: 
      2016-09-22 10:04:23 [493341.958848] LustreError: 127771:0:(lfsck_namespace.c:4452:lfsck_namespace_double_scan()) LBUG
      2016-09-22 10:04:23 [493341.968781] Pid: 127771, comm: lfsck
      

      Have the following call stack on two MDTs.

      2016-09-22 10:03:52 Sep 22 10:03:52 [493315.464373] Kernel panic - not syncing: LBUG
      2016-09-22 10:03:52 jet6 kernel: [49[493315.470430] CPU: 2 PID: 111809 Comm: lfsck Tainted: P           OE  ------------   3.10.0-327.28.2.1chaos.ch6.x86_64 #1
      2016-09-22 10:03:52 3315.297027] Lus[493315.484175] Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
      2016-09-22 10:03:52 treError: 111809[493315.497715]  ffffffffa079be0f 0000000055805053 ffff882757e4fc78 ffffffff8164cae7
      2016-09-22 10:03:52 :0:(lfsck_namesp[493315.507701]  ffff882757e4fcf8 ffffffff81645adf ffffffff00000008 ffff882757e4fd08
      2016-09-22 10:03:52 ace.c:4452:lfsck[493315.517684]  ffff882757e4fca8 0000000055805053 ffffffffa1070e70 0000000000000246
      2016-09-22 10:03:52 _namespace_doubl[493315.527666] Call Trace:
      2016-09-22 10:03:52 e_scan()) ASSERT[493315.532060]  [<ffffffff8164cae7>] dump_stack+0x19/0x1b
      2016-09-22 10:03:52 ION( list_empty([493315.539478]  [<ffffffff81645adf>] panic+0xd8/0x1e7
      2016-09-22 10:03:52 &lad->lad_req_li[493315.546501]  [<ffffffffa077fdeb>] lbug_with_loc+0xab/0xc0 [libcfs]
      2016-09-22 10:03:52 st) ) failed: 
      2016-09-22 10:03:52 [493315.555082]  [<ffffffffa102c2a6>] lfsck_namespace_double_scan+0x106/0x140 [lfsck]
      2016-09-22 10:03:52 Sep 22 10:03:52 [493315.565122]  [<ffffffffa10234f9>] lfsck_double_scan+0x59/0x200 [lfsck]
      2016-09-22 10:03:52 jet6 kernel: [49[493315.574086]  [<ffffffffa0d88fc4>] ? osd_zfs_otable_it_fini+0x64/0x110 [osd_zfs]
      2016-09-22 10:03:52 3315.311863] Lus[493315.583931]  [<ffffffffa0d88fc4>] ? osd_zfs_otable_it_fini+0x64/0x110 [osd_zfs]
      2016-09-22 10:03:52 treError: 111809[493315.593765]  [<ffffffff811c8bad>] ? kfree+0x12d/0x170
      2016-09-22 10:03:52 :0:(lfsck_namesp[493315.601075]  [<ffffffffa1028044>] lfsck_master_engine+0x434/0x1310 [lfsck]
      2016-09-22 10:03:52 ace.c:4452:lfsck[493315.610415]  [<ffffffff81015588>] ? __switch_to+0xf8/0x4d0
      2016-09-22 10:03:52 _namespace_doubl[493315.618212]  [<ffffffff810bd4f0>] ? wake_up_state+0x20/0x20
      2016-09-22 10:03:52 e_scan()) LBUG
      2016-09-22 10:03:52 [493315.626108]  [<ffffffffa1027c10>] ? lfsck_master_oit_engine+0x1430/0x1430 [lfsck]
      2016-09-22 10:03:52 [493315.636145]  [<ffffffff810a99bf>] kthread+0xcf/0xe0
      2016-09-22 10:03:52 [493315.642238]  [<ffffffff810a98f0>] ? kthread_create_on_node+0x140/0x140
      2016-09-22 10:03:52 [493315.650187]  [<ffffffff8165d9d8>] ret_from_fork+0x58/0x90
      2016-09-22 10:03:52 [493315.656864]  [<ffffffff810a98f0>] ? kthread_create_on_node+0x140/0x140
      2016-09-22 10:03:52 [493315.711916] drm_kms_helper: panic occurred, switching back to text console
      2016-09-22 10:03:52 [493315.720378] ------------[ cut here ]------------
      2016-09-22 10:03:52 [493315.726202] WARNING: at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5f/0x70()
      2016-09-22 10:03:52 [493315.735902] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) rpcsec_gss_krb5 ko2iblnd(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) nfsv3 iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl kvm mlx5_ib pcspkr mlx5_core sb_edac lpc_ich edac_core mfd_core mei_me mei zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) ses enclosure ipmi_devintf spl(OE) zlib_deflate sg i2c_i801 ioatdma shpchp ipmi_si ipmi_msghandler acpi_power_meter acpi_cpufreq binfmt_misc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr nfsd nfs_acl ip_tables auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache dm_round_robin sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32_pclmul mgag200 crc32c_intel syscopyarea sysfillrect sysimgblt ghash_clmulni_intel i2c_algo_bit drm_kms_helper mxm_wmi ttm aesni_intel ixgbe lrw gf128mul ahci drm dca glue_helper mpt3sas libahci ptp i2c_core ablk_helper cryptd libata raid_class pps_core scsi_transport_sas mdio wmi sunrpc dm_mirror dm_region_hash dm_log scsi_transport_iscsi dm_multipath dm_mod
      2016-09-22 10:03:52 [493315.859970] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P           OE  ------------   3.10.0-327.28.2.1chaos.ch6.x86_64 #1
      2016-09-22 10:03:52 [493315.872734] Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
      2016-09-22 10:03:53 [493315.885407]  0000000000000000 bcf7d7e5812e0014 ffff883f7e683d78 ffffffff8164cae7
      2016-09-22 10:03:53 [493315.894536]  ffff883f7e683db0 ffffffff8107d6d0 0000000000000000 ffff883f7e6967c0
      2016-09-22 10:03:53 [493315.903668]  000000011d5cacb8 ffff883f7e6167c0 0000000000000002 ffff883f7e683dc0
      2016-09-22 10:03:53 [493315.912796] Call Trace:
      2016-09-22 10:03:53 [493315.916347]  <IRQ>  [<ffffffff8164cae7>] dump_stack+0x19/0x1b
      2016-09-22 10:03:53 [493315.923621]  [<ffffffff8107d6d0>] warn_slowpath_common+0x70/0xb0
      2016-09-22 10:03:53 [493315.931168]  [<ffffffff8107d81a>] warn_slowpath_null+0x1a/0x20
      2016-09-22 10:03:53 [493315.938512]  [<ffffffff81048fdf>] native_smp_send_reschedule+0x5f/0x70
      2016-09-22 10:03:53 [493315.946646]  [<ffffffff810cb04d>] trigger_load_balance+0x18d/0x250
      2016-09-22 10:03:53 [493315.954390]  [<ffffffff810bbdd3>] scheduler_tick+0x103/0x150
      2016-09-22 10:03:53 [493315.961553]  [<ffffffff810e5800>] ? tick_sched_handle.isra.14+0x60/0x60
      2016-09-22 10:03:53 [493315.969775]  [<ffffffff81091a06>] update_process_times+0x66/0x80
      2016-09-22 10:03:53 [493315.977304]  [<ffffffff810e57c5>] tick_sched_handle.isra.14+0x25/0x60
      2016-09-22 10:03:53 [493315.985310]  [<ffffffff810e5841>] tick_sched_timer+0x41/0x70
      2016-09-22 10:03:53 [493315.992432]  [<ffffffff810adeda>] __hrtimer_run_queues+0xea/0x2c0
      2016-09-22 10:03:53 [493316.000042]  [<ffffffff810ae4e0>] hrtimer_interrupt+0xb0/0x1e0
      2016-09-22 10:03:53 [493316.007351]  [<ffffffff8104be47>] local_apic_timer_interrupt+0x37/0x60
      2016-09-22 10:03:53 [493316.015442]  [<ffffffff8166000f>] smp_apic_timer_interrupt+0x3f/0x60
      2016-09-22 10:03:53 [493316.023338]  [<ffffffff8165e6dd>] apic_timer_interrupt+0x6d/0x80
      2016-09-22 10:03:53 [493316.030848]  <EOI>  [<ffffffff810dd69c>] ? ktime_get+0x4c/0xd0
      2016-09-22 10:03:53 [493316.038194]  [<ffffffff810b8da6>] ? finish_task_switch+0x56/0x180
      2016-09-22 10:03:53 [493316.045803]  [<ffffffff81651df0>] __schedule+0x2e0/0x940
      2016-09-22 10:03:53 [493316.052533]  [<ffffffff81653709>] schedule_preempt_disabled+0x39/0x90
      2016-09-22 10:03:53 [493316.060533]  [<ffffffff810db1f4>] cpu_startup_entry+0x184/0x2d0
      2016-09-22 10:03:53 [493316.067949]  [<ffffffff81049eea>] start_secondary+0x1ca/0x240
      2016-09-22 10:03:53 [493316.075162] ---[ end trace 28897805122ddeee ]---
      

      Filesystem info:
      16 MDS, 4 OSS, running ZFS 0.7.0-0.3llnl and lustre 2.8.0 on a RHEL 7.2 based operating system (3.10.0-327.28.2.1chaos.ch6.x86_64).

      Also worth noting, once we have a directory with files that exhibit this "bad address" error, the directory cannot be removed.

      Let me know if you need more info.

      Attachments

        Issue Links

          Activity

            People

              yong.fan nasf (Inactive)
              dinatale2 Giuseppe Di Natale (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: