Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9488

soft lockup in osd_inode_iteration()

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.10.0
    • Lustre 2.10.0
    • None
    • Soak test cluster
    • 3
    • 9223372036854775807

    Description

      Restarted testing with latest master, 3577
      Competed two failovers with lfsck turned off.
      Restarted with lfsck turned on
      soak-5 (OSS) completed failover:

      2017-05-09 20:08:27
      ,327:fsmgmt.fsmgmt:INFO     oss_failover completed, running lfsck
      

      MDS reported a single error:

      May  9 20:08:39 soak-8 kernel: LustreError: 5550:0:(lfsck_lib.c:2680:lfsck_load_one_trace_file()) soaked-MDT0000-osd: unlink lfsck sub trace file lfsck_namespace_00: rc = 0
      

      Before soak hits timeout, MDS has wedged:

      May  9 20:12:41 soak-8 kernel: NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [OI_scrub:5551]
      May  9 20:12:41 soak-8 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate 8021q garp mrp stp llc rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd dm_round_robin ipmi_ssif sb_edac ipmi_devintf ntb sg iTCO_wdt ioatdma shpchp edac_core mei_me iTCO_vendor_support mei lpc_ich ipmi_si pcspkr i2c_i801
      May  9 20:12:41 soak-8 kernel: ipmi_msghandler wmi nfsd dm_multipath dm_mod nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_en mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops isci igb ttm ahci crct10dif_pclmul crct10dif_common ptp libsas crc32c_intel libahci pps_core drm mlx4_core mpt2sas libata dca raid_class i2c_algo_bit scsi_transport_sas devlink i2c_core fjes
      May  9 20:12:41 soak-8 kernel: CPU: 6 PID: 5551 Comm: OI_scrub Tainted: P           OE  ------------   3.10.0-514.16.1.el7_lustre.x86_64 #1
      May  9 20:12:41 soak-8 kernel: Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      May  9 20:12:41 soak-8 kernel: task: ffff88083fde6dd0 ti: ffff880703600000 task.ti: ffff880703600000
      May  9 20:12:41 soak-8 kernel: RIP: 0010:[<ffffffffa121d1d9>]  [<ffffffffa121d1d9>] osd_inode_iteration+0x489/0xcc0 [osd_ldiskfs]
      May  9 20:12:41 soak-8 kernel: RSP: 0018:ffff880703603d18  EFLAGS: 00000293
      May  9 20:12:41 soak-8 kernel: RAX: 0000000000000004 RBX: 0000000023f30a01 RCX: 0000000000000000
      May  9 20:12:41 soak-8 kernel: RDX: ffff880703603d78 RSI: ffff8800b2a36000 RDI: ffff8803162f6000
      May  9 20:12:41 soak-8 kernel: RBP: ffff880703603df0 R08: ffff880703603d57 R09: 0000000000000004
      May  9 20:12:41 soak-8 kernel: R10: 0000000023f30a01 R11: ffffea000c8fcc00 R12: 0000000023f30a01
      May  9 20:12:41 soak-8 kernel: R13: ffff880703603d08 R14: 0000000023f30a01 R15: ffff880703603d08
      May  9 20:12:41 soak-8 kernel: FS:  0000000000000000(0000) GS:ffff88042e180000(0000) knlGS:0000000000000000
      May  9 20:12:41 soak-8 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      May  9 20:12:41 soak-8 kernel: CR2: 00007f64d55202e0 CR3: 00000000019be000 CR4: 00000000000407e0
      May  9 20:12:41 soak-8 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      May  9 20:12:41 soak-8 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      May  9 20:12:41 soak-8 kernel: Stack:
      May  9 20:12:41 soak-8 kernel: ffffffffa121b990 ffffffffa1217a20 ffff8800b2a36000 00000000810d354f
      May  9 20:12:41 soak-8 kernel: ffff8803162f6000 ffff8800b2a37468 0000000020000000 010000000000000c
      May  9 20:12:41 soak-8 kernel: 0000000000000000 0000000000000000 ffff8800b2a36000 0000000000000000
      May  9 20:12:41 soak-8 kernel: Call Trace:
      May  9 20:12:41 soak-8 kernel: [<ffffffffa121b990>] ? osd_ios_ROOT_scan+0x300/0x300 [osd_ldiskfs]
      May  9 20:12:41 soak-8 kernel: [<ffffffffa1217a20>] ? osd_preload_next+0xb0/0xb0 [osd_ldiskfs]
      May  9 20:12:41 soak-8 kernel: [<ffffffffa121e370>] osd_scrub_main+0x960/0xf30 [osd_ldiskfs]
      May  9 20:12:41 soak-8 kernel: [<ffffffff810c54c0>] ? wake_up_state+0x20/0x20
      May  9 20:12:41 soak-8 kernel: [<ffffffffa121da10>] ? osd_inode_iteration+0xcc0/0xcc0 [osd_ldiskfs]
      May  9 20:12:41 soak-8 kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0
      May  9 20:12:41 soak-8 kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
      May  9 20:12:41 soak-8 kernel: [<ffffffff81697318>] ret_from_fork+0x58/0x90
      May  9 20:12:41 soak-8 kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
      May  9 20:12:41 soak-8 kernel: Code: 00 e8 7c eb 97 ff e9 0f fc ff ff 0f 1f 80 00 00 00 00 45 89 e9 4c 8d 85 67 ff ff ff 48 8b 4d a8 48 8d 55 88 48 8b b5 38 ff ff ff <48> 8b bd 48 ff ff ff 48 8b 85 28 ff ff ff ff d0 85 c0 41 89 c5
      

      lfsck hit 600 second timeout, abort attempted:

      2017-05-09 20:18:46,982:fsmgmt.fsmgmt:ERROR    lfsck still running after 600s, aborting
      2017-05-09 20:18:46,983:fsmgmt.fsmgmt:INFO     executing cmd: lctl lfsck_stop -M soaked-MDT0000
      

      One the MDS hits the lockup, it hits it over and over and within minutes is doing nothing other that hitting the lockup.
      At this point, decided not to wait for the crash, dumped stacks, them forced a crash dump.
      Crash dump is available on soak

      Attachments

        Issue Links

          Activity

            People

              yong.fan nasf (Inactive)
              cliffw Cliff White (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: