Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9040

Soft lockup on CPU during lfsck

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.10.0
    • Lustre 2.9.0, Lustre 2.10.0
    • Soak test cluster. Lustre 2.9.0 GA Release
    • 3
    • 9223372036854775807

    Description

      This looks like a few bugs we've seen before, but not enough like any specific one.

      • Started soak
      • Started and completed router delay fault
      • LFSCK triggered.
      • LFSCK dones not complete, runs > 1905s
        MGS/MDS0 has soft lockup on CPU.
        Jan 19 14:06:24 lola-8 kernel: BUG: soft lockup - CPU#9 stuck for 67s! [OI_scrub:19279]
        Jan 19 14:06:24 lola-8 kernel: Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ldiskfs(U) jbd2 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm scsi_dh_rdac dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) joydev sb_edac edac_core lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ext3 jbd mbcache sd_mod crc_t10dif ahci isci libsas wmi mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en ptp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
        Jan 19 14:06:24 lola-8 kernel: CPU 9
        Jan 19 14:06:24 lola-8 kernel: Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ldiskfs(U) jbd2 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm scsi_dh_rdac dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) joydev sb_edac edac_core lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ext3 jbd mbcache sd_mod crc_t10dif ahci isci libsas wmi mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en ptp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
        Jan 19 14:06:24 lola-8 kernel:
        Jan 19 14:06:24 lola-8 kernel: Pid: 19279, comm: OI_scrub Tainted: P           --L------------    2.6.32-573.26.1.el6_lustre.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
        Jan 19 14:06:24 lola-8 kernel: RIP: 0010:[<ffffffffa10d1e2f>]  [<ffffffffa10d1e2f>] osd_scrub_exec+0xf/0x1dc0 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: RSP: 0018:ffff88049472bc20  EFLAGS: 00000282
        Jan 19 14:06:24 lola-8 kernel: RAX: 0000000000000004 RBX: ffff88049472bd00 RCX: ffff8803ac48ced0
        Jan 19 14:06:24 lola-8 kernel: RDX: ffff88049472bda0 RSI: ffff880216078000 RDI: ffff88038a74d000
        Jan 19 14:06:24 lola-8 kernel: RBP: ffffffff8100bc0e R08: ffff88049472bdcf R09: 0000000000000004
        Jan 19 14:06:24 lola-8 kernel: R10: ffff8803ac48cec0 R11: ffff8803ac48ced0 R12: ffffffff81fe6920
        Jan 19 14:06:24 lola-8 kernel: R13: ffff8802a5151210 R14: ffff88038a74d000 R15: ffffffffa08549a8
        Jan 19 14:06:24 lola-8 kernel: FS:  0000000000000000(0000) GS:ffff88044e420000(0000) knlGS:0000000000000000
        Jan 19 14:06:24 lola-8 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
        Jan 19 14:06:24 lola-8 kernel: CR2: 0000003903673b90 CR3: 0000000001a8d000 CR4: 00000000000407e0
        Jan 19 14:06:24 lola-8 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        Jan 19 14:06:24 lola-8 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Jan 19 14:06:24 lola-8 kernel: Process OI_scrub (pid: 19279, threadinfo ffff880494728000, task ffff880802c30ab0)
        Jan 19 14:06:24 lola-8 kernel: Stack:
        Jan 19 14:06:24 lola-8 kernel: ffff880216078000 ffff88038a74d000 ffffffffffffff02 ffffffffa10d3c9a
        Jan 19 14:06:24 lola-8 kernel: <d> 0000000000000010 0000000000000202 ffff88049472bc60 0000000000000018
        Jan 19 14:06:24 lola-8 kernel: <d> ffff8803ac48cee0 0000000000000001 ffff8802160790e0 ffff880216079000
        Jan 19 14:06:24 lola-8 kernel: Call Trace:
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d3c9a>] ? osd_scrub_next+0xba/0x4b0 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d6ec9>] ? osd_inode_iteration+0x4c9/0xd80 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d1e20>] ? osd_scrub_exec+0x0/0x1dc0 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d3be0>] ? osd_scrub_next+0x0/0x4b0 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d8905>] ? osd_scrub_main+0x885/0xec0 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff81067650>] ? default_wake_function+0x0/0x20
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d8080>] ? osd_scrub_main+0x0/0xec0 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff810a138e>] ? kthread+0x9e/0xc0
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
        Jan 19 14:06:24 lola-8 kernel: Code: 00 00 00 e8 54 fe 7b ff e9 14 ff ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 81 ec e0 00 00 00 48 89 5d d8 <4c> 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 0f 1f 44 00 00
        an 19 14:06:24 lola-8 kernel: Code: 00 00 00 e8 54 fe 7b ff e9 14 ff ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 81 ec e0 00 00 00 48 89 5d d8 <4c> 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 0f 1f 44 00 00
        Jan 19 14:06:24 lola-8 kernel: Call Trace:
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d3c9a>] ? osd_scrub_next+0xba/0x4b0 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d6ec9>] ? osd_inode_iteration+0x4c9/0xd80 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d1e20>] ? osd_scrub_exec+0x0/0x1dc0 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d3be0>] ? osd_scrub_next+0x0/0x4b0 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d8905>] ? osd_scrub_main+0x885/0xec0 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff81067650>] ? default_wake_function+0x0/0x20
        Jan 19 14:06:24 lola-8 kernel: [<ffffffffa10d8080>] ? osd_scrub_main+0x0/0xec0 [osd_ldiskfs]
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff810a138e>] ? kthread+0x9e/0xc0
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
        Jan 19 14:06:24 lola-8 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
        

        Attempting to get a stack dump, but not going well. System is available is anyone wishes to have a look

      Attachments

        1. vmcore-dmesg.txt
          512 kB
        2. lustre-log.1486672102.6086.txt.gz
          11.50 MB
        3. lu-9040-soak-8.console.log.txt.gz
          218 kB
        4. lola-8.lfsck.hang.stacks.txt
          1.13 MB
        5. lola7.dmesg
          238 kB
        6. lola-7.console.20170208.txt.gz
          6.29 MB
        7. lfsck_namespace.txt
          2 kB

        Issue Links

          Activity

            People

              yong.fan nasf (Inactive)
              cliffw Cliff White (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: