Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.13.0, Lustre 2.12.3
    • Upstream
    • Red Hat 7.7 on VMware
      Red Hat 7.7 on HPE ProLiant DL380 Gen10
      Red Hat 7.7 on HPE Synergy 480 Gen10

    Description

      After successfully creating packages for Red Hat 7.7 (e.g. lustre-2.12.57_35_g55a7e2d-1.el7.x86_64.rpm), I get CPU soft lockups when trying to create an MGS with an LDISKFS backend:

      NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [mkfs.lustre:31220]

      More details from log:

      Sep  6 10:41:00 mgs1 kernel: Call Trace:
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9bd73365>] queued_spin_lock_slowpath+0xb/0xf
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9bd81ad0>] _raw_spin_lock+0x20/0x30
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9b865e2e>] igrab+0x1e/0x60
      Sep  6 10:41:00 mgs1 kernel: [<ffffffffc06bd88b>] ldiskfs_quota_off+0x3b/0x130 [ldiskfs]
      Sep  6 10:41:00 mgs1 kernel: [<ffffffffc06c091d>] ldiskfs_put_super+0x4d/0x400 [ldiskfs]
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9b84b13d>] generic_shutdown_super+0x6d/0x100
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9b84b5b7>] kill_block_super+0x27/0x70
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9b84b91e>] deactivate_locked_super+0x4e/0x70
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9b84c0a6>] deactivate_super+0x46/0x60
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9b86abff>] cleanup_mnt+0x3f/0x80
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9b86ac92>] __cleanup_mnt+0x12/0x20
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9b6c1c0b>] task_work_run+0xbb/0xe0
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9b62cc65>] do_notify_resume+0xa5/0xc0
      Sep  6 10:41:00 mgs1 kernel: [<ffffffff9bd8c23b>] int_signal+0x12/0x17
      Sep  6 10:41:00 mgs1 kernel: Code: 47 fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 66 90 b9 01 00 00 00 8b 17 85 d2 74 0d 83 fa 03 74 08 f3 90 <8b> 17 85 d2 75 f3 89 d0 f0 0f b1 0f 39 c2 75 e3 5d 66 90 c3 0f

      I also tried to go for an MDS/MGS pair on the DL380, but mkfs.lustre got stuck the same way as seen on VMware.

        Activity

          [LU-12755] CPU soft lockup on mkfs.lustre
          pjones Peter Jones added a comment -

          Great. Thanks for confirming kazinczy


          kazinczy Tamas Kazinczy (Inactive) added a comment -

          I can confirm that mkfs.lustre for patchless LDISKFS from b2_12 works fine for me now.

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36270/
          Subject: LU-12755 ldiskfs: fix project quota unpon unpatched kernel
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set:
          Commit: 820e374624a584ec0c0a326ec96bf0abeb50cf40

          pjones Peter Jones added a comment -

          Landed for 2.13


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36203/
          Subject: LU-12755 ldiskfs: fix project quota unpon unpatched kernel
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: d780f15a2d63c8bde5ae6345aed85b4b44904fb5

          gerrit Gerrit Updater added a comment -

          Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36270
          Subject: LU-12755 ldiskfs: fix project quota unpon unpatched kernel
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set: 1
          Commit: e56b4bc0970275ce6413883794bd52d2dfa7b164

          wshilong Wang Shilong (Inactive) added a comment -

          Li Xi,

          I worked around that issue by setting the PROJ active flag temporarily and then calling dquot_initialize() so that it calls ext4_get_projid().
          I just tested the fixed version, and it seems to work:

          +	sb_dqopt(sb)->flags |= dquot_state_flag(DQUOT_USAGE_ENABLED, PRJQUOTA);
          +	dquot_initialize(root);
          +	sb_dqopt(sb)->flags &= ~dquot_state_flag(DQUOT_USAGE_ENABLED, PRJQUOTA);

          After adding some printk debugging, the detection seems to work.
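The set/call/clear pattern in the three added lines above can be illustrated outside the kernel. The following is only a minimal userspace sketch, not the code from the patch: `quota_opts`, `state_flag()`, `mock_dquot_initialize()`, and `probe_project_quota()` are hypothetical stand-ins, and the flag layout is a simplified assumption rather than the kernel's actual definition. It models an initializer that consults the project-ID hook only while the project quota type is marked active, which is why the flag is raised just for the duration of the call.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel quota types and flags. */
enum { USRQUOTA = 0, GRPQUOTA = 1, PRJQUOTA = 2 };
#define USAGE_ENABLED 0x01u

struct quota_opts {
    unsigned int flags;          /* one "usage enabled" bit per quota type */
};

struct probe_result {
    bool projid_hook_called;     /* set when the project-ID hook was reached */
};

/* Assumed layout: shift the base flag by the quota type index. */
static unsigned int state_flag(unsigned int base, int type)
{
    return base << type;
}

/*
 * Mock initializer: like the behaviour described in the comment above,
 * it reaches the project-ID hook only if project quota is marked active.
 */
static void mock_dquot_initialize(const struct quota_opts *opts,
                                  struct probe_result *res)
{
    if (opts->flags & state_flag(USAGE_ENABLED, PRJQUOTA))
        res->projid_hook_called = true;
}

/*
 * Temporarily mark project quota active, run the initializer so the
 * hook is exercised, then clear the bit again (mirroring the |= and
 * &= ~ pair in the snippet above).
 */
static bool probe_project_quota(struct quota_opts *opts)
{
    struct probe_result res = { false };

    opts->flags |= state_flag(USAGE_ENABLED, PRJQUOTA);
    mock_dquot_initialize(opts, &res);
    opts->flags &= ~state_flag(USAGE_ENABLED, PRJQUOTA);

    return res.projid_hook_called;
}
```

Note that, as in the original three lines, the bit is cleared unconditionally rather than restored to its previous value, so this sketch assumes project quota was not already enabled when the probe runs.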
          lixi_wc Li Xi added a comment -

          Shilong, I assume you mean the updated patch https://review.whamcloud.com/#/c/36203/

          Any reason why ext4_get_projid() is called before ext4_enable_quotas()? I thought ext4_enable_quotas() is always called before anyone calls ext4_get_projid().

          yujian Jian Yu added a comment -

          Sure, Shilong.


          wshilong Wang Shilong (Inactive) added a comment -

          The tricky way seems to work with limited testing. Yu Jian, could you continue testing the refreshed patch:

          1) Build and install ldiskfs on a patched kernel, then switch to an unpatched kernel.
          2) Build and install ldiskfs on an unpatched kernel, then switch to a patched kernel.

          This is to make sure that mounting and unmounting Lustre works.

          People

            yujian Jian Yu
            kazinczy Tamas Kazinczy (Inactive)