
High OSS load due to possible deadlock w/ ofd_create_hdl and ofd_quotactl backtraces

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.3
    • Labels: None
    • Environment: lustre-2.12.3_4_g142b4d4-1.el7.x86_64, kernel 3.10.0-957.27.2.el7_lustre.pl2.x86_64, MOFED 4.7
    • Severity: 3

    Description

      We found an OSS on Fir (2.12.3_4) heavily loaded and with hung threads this morning; it started during the night. We took a crash dump, which I uploaded to the FTP as vmcore_fir-io8-s1_2019-12-12-07-42-58, along with the kernel debuginfo packages (kernel-debuginfo-3.10.0-957.27.2.el7_lustre.pl2.x86_64.rpm and kernel-debuginfo-common-x86_64-3.10.0-957.27.2.el7_lustre.pl2.x86_64.rpm).
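
      For reference, the hung threads in the dump can be walked with the crash utility against the matching debuginfo; a minimal sketch (the vmlinux location is the usual debuginfo path, the PID comes from the console log below):

      # open the dump with the debug vmlinux matching the running kernel
      crash /usr/lib/debug/lib/modules/3.10.0-957.27.2.el7_lustre.pl2.x86_64/vmlinux \
            vmcore_fir-io8-s1_2019-12-12-07-42-58

      # inside crash: list uninterruptible (hung) tasks and dump their stacks
      crash> ps -m | grep UN
      crash> foreach UN bt
      crash> bt 33911      # the ll_ost03_106 thread from the console log below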

      The last messages on the console before the crash dump were:

      [264018.187097] LNet: Service thread pid 33911 was inactive for 1200.94s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [264018.204204] LNet: Skipped 1 previous similar message
      [264018.209269] Pid: 33911, comm: ll_ost03_106 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
      [264018.219804] Call Trace:
      [264018.222354]  [<ffffffffc155deb3>] ofd_create_hdl+0xcb3/0x20e0 [ofd]
      [264018.228754]  [<ffffffffc0f5036a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [264018.235831]  [<ffffffffc0ef724b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [264018.243637]  [<ffffffffc0efabac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
      [264018.250067]  [<ffffffffa7ac2e81>] kthread+0xd1/0xe0
      [264018.255066]  [<ffffffffa8177c24>] ret_from_fork_nospec_begin+0xe/0x21
      [264018.261642]  [<ffffffffffffffff>] 0xffffffffffffffff
      [264018.266754] LustreError: dumping log to /tmp/lustre-log.1576165038.33911
      [264126.221225] LNet: Service thread pid 34617 was inactive for 200.36s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [264126.238268] Pid: 34617, comm: ll_ost03_111 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
      [264126.248785] Call Trace:
      [264126.251360]  [<ffffffffc02c3085>] wait_transaction_locked+0x85/0xd0 [jbd2]
      [264126.258376]  [<ffffffffc02c3368>] add_transaction_credits+0x268/0x2f0 [jbd2]
      [264126.265544]  [<ffffffffc02c35e1>] start_this_handle+0x1a1/0x430 [jbd2]
      [264126.272211]  [<ffffffffc02c3a93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
      [264126.278951]  [<ffffffffc131b309>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
      [264126.286491]  [<ffffffffc1312ad3>] ldiskfs_acquire_dquot+0x53/0xb0 [ldiskfs]
      [264126.293573]  [<ffffffffa7cb026a>] dqget+0x3fa/0x450
      [264126.298588]  [<ffffffffa7cb1074>] dquot_get_dqblk+0x14/0x1f0
      [264126.304371]  [<ffffffffc143c6d5>] osd_acct_index_lookup+0x235/0x480 [osd_ldiskfs]
      [264126.311997]  [<ffffffffc13a380d>] lquotactl_slv+0x27d/0x9d0 [lquota]
      [264126.318488]  [<ffffffffc1556a3c>] ofd_quotactl+0x13c/0x380 [ofd]
      [264126.324630]  [<ffffffffc0f5036a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [264126.331681]  [<ffffffffc0ef724b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [264126.339496]  [<ffffffffc0efabac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
      [264126.345912]  [<ffffffffa7ac2e81>] kthread+0xd1/0xe0
      [264126.350926]  [<ffffffffa8177c24>] ret_from_fork_nospec_begin+0xe/0x21
      [264126.357490]  [<ffffffffffffffff>] 0xffffffffffffffff
      [264126.362604] LustreError: dumping log to /tmp/lustre-log.1576165146.34617
      [264136.461417] Pid: 34429, comm: ll_ost03_110 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
      [264136.471940] Call Trace:
      [264136.474507]  [<ffffffffc02c3085>] wait_transaction_locked+0x85/0xd0 [jbd2]
      [264136.481514]  [<ffffffffc02c3368>] add_transaction_credits+0x268/0x2f0 [jbd2]
      [264136.488699]  [<ffffffffc02c35e1>] start_this_handle+0x1a1/0x430 [jbd2]
      [264136.495351]  [<ffffffffc02c3a93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
      [264136.502103]  [<ffffffffc131b309>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
      [264136.509629]  [<ffffffffc1312ad3>] ldiskfs_acquire_dquot+0x53/0xb0 [ldiskfs]
      [264136.516725]  [<ffffffffa7cb026a>] dqget+0x3fa/0x450
      [264136.521728]  [<ffffffffa7cb1074>] dquot_get_dqblk+0x14/0x1f0
      [264136.527514]  [<ffffffffc143c6d5>] osd_acct_index_lookup+0x235/0x480 [osd_ldiskfs]
      [264136.535127]  [<ffffffffc13a380d>] lquotactl_slv+0x27d/0x9d0 [lquota]
      [264136.541618]  [<ffffffffc1556a3c>] ofd_quotactl+0x13c/0x380 [ofd]
      [264136.547746]  [<ffffffffc0f5036a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [264136.554806]  [<ffffffffc0ef724b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [264136.562610]  [<ffffffffc0efabac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
      [264136.569039]  [<ffffffffa7ac2e81>] kthread+0xd1/0xe0
      [264136.574042]  [<ffffffffa8177c24>] ret_from_fork_nospec_begin+0xe/0x21
      [264136.580632]  [<ffffffffffffffff>] 0xffffffffffffffff
      [264136.585737] LustreError: dumping log to /tmp/lustre-log.1576165156.34429
      [264349.590511] SysRq : Trigger a crash
      

      Attaching:

      We restarted the OSS and a few OSTs couldn't complete their recovery (stuck at 0s) but a manual abort_recovery did work.
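
      For the record, a sketch of the kind of commands involved in checking and aborting recovery (the OST name is illustrative):

      # watch recovery progress for the OSTs on this OSS
      lctl get_param obdfilter.*.recovery_status

      # abort recovery on a target stuck at 0s (device name is an example)
      lctl --device fir-OST0020 abort_recovery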

      Thanks,
      Stephane

      Attachments

        Activity

          [LU-13072] High OSS load due to possible deadlock w/ ofd_create_hdl and ofd_quotactl backtraces
          ys Yang Sheng added a comment -

          Please reopen it if this is hit again.


          sthiell Stephane Thiell added a comment -

          Hi YangSheng,

          Thanks! Next time I'll try to collect disk I/O info; that's a great idea indeed. But so far, this OST has been running fine since this event.

          Best,

          Stephane

          ys Yang Sheng added a comment -

          Hi, Stephane,

          I have investigated all of the threads carefully and cannot find any clue of a deadlock. I suggest you collect disk I/O info (e.g. with iostat) when this issue is hit again, then compare it with normal data (collected without the issue) to determine whether it is a storage problem.
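
          For example, something along these lines could be captured both while the problem is occurring and during normal operation (the output file name is illustrative):

          # extended per-device statistics, 10-second samples with timestamps
          # compare await/%util for the OST's md device against a healthy period
          iostat -x -t -m 10 > iostat_during_issue.log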

          Thanks,
          YangSheng


          sthiell Stephane Thiell added a comment -

          Hi YangSheng,

          Thank you very much for your time on this, I really appreciate it. After the reboot, all OSTs came back online and everything was fine on all targets, even on /dev/md2. I wonder if this could be some kind of mdraid deadlock, or perhaps a very transient slowdown of a raid6 array.

          Stephane

          ys Yang Sheng added a comment -

          Hi, Stephane,

          After investigating, I think this issue is likely related to a storage problem. All of the ofd_create_hdl threads were blocked in wait_transaction_locked, and the ofd_quotactl threads were blocked in wait_transaction_locked as well. They are all waiting to start a new transaction. The running transaction was blocked by:

          PID: 67875  TASK: ffff891241b65140  CPU: 9   COMMAND: "ll_ost_io01_019"
           #0 [ffff891241e2f8e8] __schedule at ffffffffa816aa72
           #1 [ffff891241e2f978] schedule at ffffffffa816af19
           #2 [ffff891241e2f988] osd_trans_stop at ffffffffc140fe75 [osd_ldiskfs]
           #3 [ffff891241e2fa40] ofd_trans_stop at ffffffffc1569c75 [ofd]
           #4 [ffff891241e2fa50] ofd_commitrw_write at ffffffffc1570c34 [ofd]
           #5 [ffff891241e2fad8] ofd_commitrw at ffffffffc157510c [ofd]
           #6 [ffff891241e2fb58] tgt_brw_write at ffffffffc0f5460b [ptlrpc]
           #7 [ffff891241e2fcd0] tgt_request_handle at ffffffffc0f5036a [ptlrpc]
           #8 [ffff891241e2fd58] ptlrpc_server_handle_request at ffffffffc0ef724b [ptlrpc]
           #9 [ffff891241e2fdf8] ptlrpc_main at ffffffffc0efabac [ptlrpc]
          #10 [ffff891241e2fec8] kthread at ffffffffa7ac2e81
          #11 [ffff891241e2ff50] ret_from_fork_nospec_begin at ffffffffa8177c24
          

          That thread was waiting for some data to be written out. It is in turn waiting on a few threads like:

          PID: 67874  TASK: ffff891241b64100  CPU: 42  COMMAND: "ll_ost_io02_019"
           #0 [ffff891241e2b4a8] __schedule at ffffffffa816aa72
           #1 [ffff891241e2b538] schedule at ffffffffa816af19
           #2 [ffff891241e2b548] bitmap_startwrite at ffffffffa7f9cc65
           #3 [ffff891241e2b5c8] add_stripe_bio at ffffffffc1383471 [raid456]
           #4 [ffff891241e2b628] raid5_make_request at ffffffffc1388db4 [raid456]
           #5 [ffff891241e2b710] md_handle_request at ffffffffa7f8fa30
           #6 [ffff891241e2b780] md_make_request at ffffffffa7f8fb99
           #7 [ffff891241e2b7a8] generic_make_request at ffffffffa7d46e37
           #8 [ffff891241e2b800] submit_bio at ffffffffa7d470e0
           #9 [ffff891241e2b858] osd_submit_bio at ffffffffc14255ac [osd_ldiskfs]
          #10 [ffff891241e2b868] osd_do_bio at ffffffffc1427925 [osd_ldiskfs]
          #11 [ffff891241e2b9d0] osd_write_commit at ffffffffc142820c [osd_ldiskfs]
          #12 [ffff891241e2ba50] ofd_commitrw_write at ffffffffc157121e [ofd]
          #13 [ffff891241e2bad8] ofd_commitrw at ffffffffc157510c [ofd]
          #14 [ffff891241e2bb58] tgt_brw_write at ffffffffc0f5460b [ptlrpc]
          #15 [ffff891241e2bcd0] tgt_request_handle at ffffffffc0f5036a [ptlrpc]
          #16 [ffff891241e2bd58] ptlrpc_server_handle_request at ffffffffc0ef724b [ptlrpc]
          #17 [ffff891241e2bdf8] ptlrpc_main at ffffffffc0efabac [ptlrpc]
          #18 [ffff891241e2bec8] kthread at ffffffffa7ac2e81
          #19 [ffff891241e2bf50] ret_from_fork_nospec_begin at ffffffffa8177c24
          

          They were running on /dev/md2, which looks like a software raid6 device. I suggest you investigate the write performance of this device to check whether it is slower than the others.
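
          A minimal sketch of how /dev/md2 could be compared against the other arrays on the same OSS (standard md/sysstat tools, nothing Lustre-specific):

          # raid geometry, member health, and any rebuild/resync activity
          cat /proc/mdstat
          mdadm --detail /dev/md2

          # compare write latency and utilization across arrays (w_await, %util columns)
          iostat -x -m 5

          # raid5/6 stripe cache size also affects write throughput
          cat /sys/block/md2/md/stripe_cache_size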

          Thanks,
          YangSheng

          green Oleg Drokin added a comment -
          PID: 66168  TASK: ffff8912f5c68000  CPU: 0   COMMAND: "jbd2/md2-8"
           #0 [ffff8912f34cfc08] __schedule at ffffffffa816aa72
           #1 [ffff8912f34cfc98] schedule at ffffffffa816af19
           #2 [ffff8912f34cfca8] jbd2_journal_commit_transaction at ffffffffc02c62bc [jbd2]
           #3 [ffff8912f34cfe48] kjournald2 at ffffffffc02cce89 [jbd2]
           #4 [ffff8912f34cfec8] kthread at ffffffffa7ac2e81
           #5 [ffff8912f34cff50] ret_from_fork_nospec_begin at ffffffffa8177c24
          

          So jbd2 is committing a transaction and it is taking a long time, so nobody else can start a new transaction for a long time. I also see many other threads waiting on IO, including debugfs (which I presume reads from the same device?), but also many OST IO threads. If there is a lot of IO contention, it is somewhat understandable that IO is slow, including slow transaction commits, which impacts things like starting new transactions.
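
          If this recurs, the journal's own statistics give a quick read on how long commits are taking; a small sketch (the jbd2 instance name is assumed from the kjournald2 thread name above):

          # per-journal statistics, including average transaction commit time
          cat /proc/fs/jbd2/md2-8/info

          # in the crash session, the same journal thread's stack can be re-checked with
          crash> bt 66168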

          pjones Peter Jones added a comment -

          Yang Sheng

          Could you please advise?

          Thanks

          Peter


          People

            Assignee: ys Yang Sheng
            Reporter: sthiell Stephane Thiell
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: