[LU-8647] lfsck_namespace_double_scan()) ASSERTION( list_empty(&lad->lad_req_list) ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0
    • Environment: 3.10.0-327.28.2.1chaos.ch6.x86_64
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      Ran a namespace LFSCK with -C and hit the following LBUG on multiple MDTs.

      2016-09-22 10:04:23 [493341.943717] LustreError: 127771:0:(lfsck_namespace.c:4452:lfsck_namespace_double_scan()) ASSERTION( list_empty(&lad->lad_req_list) ) failed: 
      2016-09-22 10:04:23 [493341.958848] LustreError: 127771:0:(lfsck_namespace.c:4452:lfsck_namespace_double_scan()) LBUG
      2016-09-22 10:04:23 [493341.968781] Pid: 127771, comm: lfsck
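
      For readers unfamiliar with the failed check: in the Linux kernel list API, list_empty() is true only when a list head points back to itself. The userspace sketch below mimics that check. The list helpers are reimplemented here for illustration, and struct assistant_data with its req_list field is a hypothetical stand-in for lad and lad_req_list, not the Lustre definition. It shows only the condition the assertion tests (entries still queued on the list), not why they were still queued in this case.

```c
#include <assert.h>

/* Minimal userspace copy of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialized head points to itself in both directions. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Same test the kernel uses: the list is empty if head->next is the head. */
static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Link entry n in just before the head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink entry n and leave it pointing at itself. */
static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n;
    n->prev = n;
}

/* Hypothetical stand-in for the structure behind `lad` (assumption). */
struct assistant_data {
    struct list_head req_list;
};
```

      With these helpers, a freshly initialized req_list passes list_empty(); after list_add_tail() queues one entry the check fails, and it passes again only once list_del() has removed every entry. The LBUG above corresponds to the middle state being observed at a point where the code expected the list to be drained.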
      

      We have the following call stack on two MDTs.

      2016-09-22 10:03:52 Sep 22 10:03:52 jet6 kernel: [493315.297027] LustreError: 111809:0:(lfsck_namespace.c:4452:lfsck_namespace_double_scan()) ASSERTION( list_empty(&lad->lad_req_list) ) failed: 
      2016-09-22 10:03:52 Sep 22 10:03:52 jet6 kernel: [493315.311863] LustreError: 111809:0:(lfsck_namespace.c:4452:lfsck_namespace_double_scan()) LBUG
      2016-09-22 10:03:52 [493315.464373] Kernel panic - not syncing: LBUG
      2016-09-22 10:03:52 [493315.470430] CPU: 2 PID: 111809 Comm: lfsck Tainted: P           OE  ------------   3.10.0-327.28.2.1chaos.ch6.x86_64 #1
      2016-09-22 10:03:52 [493315.484175] Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
      2016-09-22 10:03:52 [493315.497715]  ffffffffa079be0f 0000000055805053 ffff882757e4fc78 ffffffff8164cae7
      2016-09-22 10:03:52 [493315.507701]  ffff882757e4fcf8 ffffffff81645adf ffffffff00000008 ffff882757e4fd08
      2016-09-22 10:03:52 [493315.517684]  ffff882757e4fca8 0000000055805053 ffffffffa1070e70 0000000000000246
      2016-09-22 10:03:52 [493315.527666] Call Trace:
      2016-09-22 10:03:52 [493315.532060]  [<ffffffff8164cae7>] dump_stack+0x19/0x1b
      2016-09-22 10:03:52 [493315.539478]  [<ffffffff81645adf>] panic+0xd8/0x1e7
      2016-09-22 10:03:52 [493315.546501]  [<ffffffffa077fdeb>] lbug_with_loc+0xab/0xc0 [libcfs]
      2016-09-22 10:03:52 [493315.555082]  [<ffffffffa102c2a6>] lfsck_namespace_double_scan+0x106/0x140 [lfsck]
      2016-09-22 10:03:52 [493315.565122]  [<ffffffffa10234f9>] lfsck_double_scan+0x59/0x200 [lfsck]
      2016-09-22 10:03:52 [493315.574086]  [<ffffffffa0d88fc4>] ? osd_zfs_otable_it_fini+0x64/0x110 [osd_zfs]
      2016-09-22 10:03:52 [493315.583931]  [<ffffffffa0d88fc4>] ? osd_zfs_otable_it_fini+0x64/0x110 [osd_zfs]
      2016-09-22 10:03:52 [493315.593765]  [<ffffffff811c8bad>] ? kfree+0x12d/0x170
      2016-09-22 10:03:52 [493315.601075]  [<ffffffffa1028044>] lfsck_master_engine+0x434/0x1310 [lfsck]
      2016-09-22 10:03:52 [493315.610415]  [<ffffffff81015588>] ? __switch_to+0xf8/0x4d0
      2016-09-22 10:03:52 [493315.618212]  [<ffffffff810bd4f0>] ? wake_up_state+0x20/0x20
      2016-09-22 10:03:52 [493315.626108]  [<ffffffffa1027c10>] ? lfsck_master_oit_engine+0x1430/0x1430 [lfsck]
      2016-09-22 10:03:52 [493315.636145]  [<ffffffff810a99bf>] kthread+0xcf/0xe0
      2016-09-22 10:03:52 [493315.642238]  [<ffffffff810a98f0>] ? kthread_create_on_node+0x140/0x140
      2016-09-22 10:03:52 [493315.650187]  [<ffffffff8165d9d8>] ret_from_fork+0x58/0x90
      2016-09-22 10:03:52 [493315.656864]  [<ffffffff810a98f0>] ? kthread_create_on_node+0x140/0x140
      2016-09-22 10:03:52 [493315.711916] drm_kms_helper: panic occurred, switching back to text console
      2016-09-22 10:03:52 [493315.720378] ------------[ cut here ]------------
      2016-09-22 10:03:52 [493315.726202] WARNING: at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5f/0x70()
      2016-09-22 10:03:52 [493315.735902] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) rpcsec_gss_krb5 ko2iblnd(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) nfsv3 iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl kvm mlx5_ib pcspkr mlx5_core sb_edac lpc_ich edac_core mfd_core mei_me mei zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) ses enclosure ipmi_devintf spl(OE) zlib_deflate sg i2c_i801 ioatdma shpchp ipmi_si ipmi_msghandler acpi_power_meter acpi_cpufreq binfmt_misc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr nfsd nfs_acl ip_tables auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache dm_round_robin sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32_pclmul mgag200 crc32c_intel syscopyarea sysfillrect sysimgblt ghash_clmulni_intel i2c_algo_bit drm_kms_helper mxm_wmi ttm aesni_intel ixgbe lrw gf128mul ahci drm dca glue_helper mpt3sas libahci ptp i2c_core ablk_helper cryptd libata raid_class pps_core scsi_transport_sas mdio wmi sunrpc dm_mirror dm_region_hash dm_log scsi_transport_iscsi dm_multipath dm_mod
      2016-09-22 10:03:52 [493315.859970] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P           OE  ------------   3.10.0-327.28.2.1chaos.ch6.x86_64 #1
      2016-09-22 10:03:52 [493315.872734] Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
      2016-09-22 10:03:53 [493315.885407]  0000000000000000 bcf7d7e5812e0014 ffff883f7e683d78 ffffffff8164cae7
      2016-09-22 10:03:53 [493315.894536]  ffff883f7e683db0 ffffffff8107d6d0 0000000000000000 ffff883f7e6967c0
      2016-09-22 10:03:53 [493315.903668]  000000011d5cacb8 ffff883f7e6167c0 0000000000000002 ffff883f7e683dc0
      2016-09-22 10:03:53 [493315.912796] Call Trace:
      2016-09-22 10:03:53 [493315.916347]  <IRQ>  [<ffffffff8164cae7>] dump_stack+0x19/0x1b
      2016-09-22 10:03:53 [493315.923621]  [<ffffffff8107d6d0>] warn_slowpath_common+0x70/0xb0
      2016-09-22 10:03:53 [493315.931168]  [<ffffffff8107d81a>] warn_slowpath_null+0x1a/0x20
      2016-09-22 10:03:53 [493315.938512]  [<ffffffff81048fdf>] native_smp_send_reschedule+0x5f/0x70
      2016-09-22 10:03:53 [493315.946646]  [<ffffffff810cb04d>] trigger_load_balance+0x18d/0x250
      2016-09-22 10:03:53 [493315.954390]  [<ffffffff810bbdd3>] scheduler_tick+0x103/0x150
      2016-09-22 10:03:53 [493315.961553]  [<ffffffff810e5800>] ? tick_sched_handle.isra.14+0x60/0x60
      2016-09-22 10:03:53 [493315.969775]  [<ffffffff81091a06>] update_process_times+0x66/0x80
      2016-09-22 10:03:53 [493315.977304]  [<ffffffff810e57c5>] tick_sched_handle.isra.14+0x25/0x60
      2016-09-22 10:03:53 [493315.985310]  [<ffffffff810e5841>] tick_sched_timer+0x41/0x70
      2016-09-22 10:03:53 [493315.992432]  [<ffffffff810adeda>] __hrtimer_run_queues+0xea/0x2c0
      2016-09-22 10:03:53 [493316.000042]  [<ffffffff810ae4e0>] hrtimer_interrupt+0xb0/0x1e0
      2016-09-22 10:03:53 [493316.007351]  [<ffffffff8104be47>] local_apic_timer_interrupt+0x37/0x60
      2016-09-22 10:03:53 [493316.015442]  [<ffffffff8166000f>] smp_apic_timer_interrupt+0x3f/0x60
      2016-09-22 10:03:53 [493316.023338]  [<ffffffff8165e6dd>] apic_timer_interrupt+0x6d/0x80
      2016-09-22 10:03:53 [493316.030848]  <EOI>  [<ffffffff810dd69c>] ? ktime_get+0x4c/0xd0
      2016-09-22 10:03:53 [493316.038194]  [<ffffffff810b8da6>] ? finish_task_switch+0x56/0x180
      2016-09-22 10:03:53 [493316.045803]  [<ffffffff81651df0>] __schedule+0x2e0/0x940
      2016-09-22 10:03:53 [493316.052533]  [<ffffffff81653709>] schedule_preempt_disabled+0x39/0x90
      2016-09-22 10:03:53 [493316.060533]  [<ffffffff810db1f4>] cpu_startup_entry+0x184/0x2d0
      2016-09-22 10:03:53 [493316.067949]  [<ffffffff81049eea>] start_secondary+0x1ca/0x240
      2016-09-22 10:03:53 [493316.075162] ---[ end trace 28897805122ddeee ]---
      

      Filesystem info:
      16 MDS, 4 OSS, running ZFS 0.7.0-0.3llnl and Lustre 2.8.0 on a RHEL 7.2-based operating system (kernel 3.10.0-327.28.2.1chaos.ch6.x86_64).

      Also worth noting: once a directory contains files that exhibit this "bad address" error, the directory cannot be removed.

      Let me know if you need more info.

          Activity

            cliffw Cliff White (Inactive) made changes -
            Link New: This issue is related to LU-10134 [ LU-10134 ]
            mdiep Minh Diep made changes -
            Link Original: This issue is related to JFC-17 [ JFC-17 ]
            mdiep Minh Diep made changes -
            Link Original: This issue is related to LDEV-341 [ LDEV-341 ]
            jamesanunez James Nunez (Inactive) made changes -
            Link New: This issue is related to LU-8863 [ LU-8863 ]
            ofaaland Olaf Faaland made changes -
            Labels Original: llnl topllnl New: llnl
            pjones Peter Jones made changes -
            Link Original: This issue is related to JFC-10 [ JFC-10 ]
            pjones Peter Jones made changes -
            Link Original: This issue is related to JFC-19 [ JFC-19 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to JFC-17 [ JFC-17 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to LDEV-341 [ LDEV-341 ]
            yong.fan nasf (Inactive) made changes -
            Fix Version/s New: Lustre 2.9.0 [ 11891 ]
            Resolution New: Fixed [ 1 ]
            Status Original: In Progress [ 3 ] New: Resolved [ 5 ]

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: dinatale2 Giuseppe Di Natale (Inactive)
              Votes: 0
              Watchers: 5
