Fallocate causes transaction deadlock

    Description

      PID: 74368  TASK: ffff9600eaeac740  CPU: 9   COMMAND: "ll_ost_io02_069"
       #0 [ffffa3f1a7a57830] __schedule at ffffffff9034e1d4
       #1 [ffffa3f1a7a578c8] schedule at ffffffff9034e648
       #2 [ffffa3f1a7a578d8] rwsem_down_read_slowpath at ffffffff903511d0
       #3 [ffffa3f1a7a57978] osd_read_lock at ffffffffc1a3379d [osd_ldiskfs]
                                      <--     rc = dt_trans_start_local(env, ofd->ofd_osd , th);
                                              ofd_read_lock(env, ofd_obj);
       #4 [ffffa3f1a7a57998] ofd_write_attr_set at ffffffffc186b6cc [ofd]
       #5 [ffffa3f1a7a57a00] ofd_commitrw_write at ffffffffc186c812 [ofd]
       #6 [ffffa3f1a7a57aa0] ofd_commitrw at ffffffffc18721f1 [ofd]
       #7 [ffffa3f1a7a57b60] finish_wait at ffffffff8fb2e5ac
       #8 [ffffa3f1a7a57bd8] tgt_brw_write at ffffffffc1255544 [ptlrpc]
      
      PID: 73559  TASK: ffff9601653a97c0  CPU: 11  COMMAND: "ll_ost02_046"
       #0 [ffffa3f1a0817970] __schedule at ffffffff9034e1d4
       #1 [ffffa3f1a0817a08] schedule at ffffffff9034e648
       #2 [ffffa3f1a0817a18] wait_transaction_locked at ffffffffc0ad2089 [jbd2]
       #3 [ffffa3f1a0817a68] add_transaction_credits at ffffffffc0ad21c4 [jbd2]
       #4 [ffffa3f1a0817ac0] start_this_handle at ffffffffc0ad250a [jbd2]
       #5 [ffffa3f1a0817b40] jbd2__journal_restart at ffffffffc0ad2ad0 [jbd2]
       #6 [ffffa3f1a0817b80] osd_fallocate_preallocate at ffffffffc1a5b6d2 [osd_ldiskfs]
       #7 [ffffa3f1a0817c18] osd_fallocate at ffffffffc1a5b98d [osd_ldiskfs]
                              <--     ofd_trans_start(env, ofd, fo, th);
                                      ofd_write_lock(env, fo);
       #8 [ffffa3f1a0817c50] ofd_object_fallocate at ffffffffc18682f9 [ofd]
       #9 [ffffa3f1a0817cb8] ofd_fallocate_hdl at ffffffffc185912f [ofd]
      #10 [ffffa3f1a0817d50] tgt_request_handle at ffffffffc1256a53 [ptlrpc]

      The deadlock was introduced by:

       Commit:         93f700ca241a98630fc5ff19a041e35fbdbf0385
       Author:         Arshad Hussain <arshad.super@gmail.com>
       Committer:      Oleg Drokin <green@whamcloud.com>
       Author Date:    Thu 10 Sep 2020 02:18:13 AM EEST
       Committer Date: Thu 29 Oct 2020 06:28:42 AM EET
      
       LU-13765 osd-ldiskfs: Extend credit correctly for fallocate
      

          Activity

            [LU-15800] Fallocate causes transaction deadlock

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51702/
            Subject: LU-15800 ofd: take a read lock for fallocate
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 8299b3fd77ebcc372b5d929eaa08231fc703c431


            We also hit this OSS deadlock with Lustre 2.15.3 yesterday. The backtraces seem to match:

            PID: 61557  TASK: ffff919ef006a100  CPU: 39  COMMAND: "ll_ost_io00_078"
             #0 [ffff919cea263778] __schedule at ffffffff92db78d8
             #1 [ffff919cea2637e0] schedule at ffffffff92db7ca9
             #2 [ffff919cea2637f0] rwsem_down_read_failed at ffffffff92db9705
             #3 [ffff919cea263878] call_rwsem_down_read_failed at ffffffff929ae568
             #4 [ffff919cea2638c8] down_read at ffffffff92db7120
             #5 [ffff919cea2638e0] osd_read_lock at ffffffffc16d4e7c [osd_ldiskfs]
             #6 [ffff919cea263908] ofd_write_attr_set at ffffffffc1863129 [ofd]
             #7 [ffff919cea263978] ofd_commitrw_write at ffffffffc1863fd2 [ofd]
             #8 [ffff919cea263a30] ofd_commitrw at ffffffffc18698e0 [ofd]
             #9 [ffff919cea263ac0] tgt_brw_write at ffffffffc140c695 [ptlrpc]
            #10 [ffff919cea263ca8] tgt_request_handle at ffffffffc140f25f [ptlrpc]
            #11 [ffff919cea263d38] ptlrpc_server_handle_request at ffffffffc13b8aa3 [ptlrpc]
            #12 [ffff919cea263df0] ptlrpc_main at ffffffffc13ba734 [ptlrpc]
            #13 [ffff919cea263ec8] kthread at ffffffff926cb621
            #14 [ffff919cea263f50] ret_from_fork_nospec_begin at ffffffff92dc51dd
            
            
            PID: 40363  TASK: ffff915f2d945280  CPU: 10  COMMAND: "ll_ost00_123"
             #0 [ffff9159d062f8f0] __schedule at ffffffff92db78d8
             #1 [ffff9159d062f958] schedule at ffffffff92db7ca9
             #2 [ffff9159d062f968] wait_transaction_locked at ffffffffc03ca085 [jbd2]
             #3 [ffff9159d062f9c0] add_transaction_credits at ffffffffc03ca378 [jbd2]
             #4 [ffff9159d062fa20] start_this_handle at ffffffffc03ca601 [jbd2]
             #5 [ffff9159d062fab8] jbd2__journal_restart at ffffffffc03cacf2 [jbd2]
             #6 [ffff9159d062faf8] jbd2_journal_restart at ffffffffc03cad63 [jbd2]
             #7 [ffff9159d062fb08] osd_extend_restart_trans at ffffffffc1700d8c [osd_ldiskfs]
             #8 [ffff9159d062fb28] osd_fallocate at ffffffffc1702dc4 [osd_ldiskfs]
             #9 [ffff9159d062fbb0] ofd_object_fallocate at ffffffffc185fb4f [ofd]
            #10 [ffff9159d062fc18] ofd_fallocate_hdl at ffffffffc1848835 [ofd]
            #11 [ffff9159d062fca8] tgt_request_handle at ffffffffc140f25f [ptlrpc]
            #12 [ffff9159d062fd38] ptlrpc_server_handle_request at ffffffffc13b8aa3 [ptlrpc]
            #13 [ffff9159d062fdf0] ptlrpc_main at ffffffffc13ba734 [ptlrpc]
            #14 [ffff9159d062fec8] kthread at ffffffff926cb621
            #15 [ffff9159d062ff50] ret_from_fork_nospec_begin at ffffffff92dc51dd
            

            We will try the proposed patch (thanks!).

            (Comment above added by sthiell Stephane Thiell.)

            "Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51702
            Subject: LU-15800 ofd: take a read lock for fallocate
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: a29082fa9985ce97d3e02b8c1009161e54f11f9a

            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47268/
            Subject: LU-15800 ofd: take a read lock for fallocate
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5fae80066162ea637c8649f6439fc14e1d9a7cf8


            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47268
            Subject: LU-15800 ofd: take a read lock for fallocate
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ce637a38fd863f07c1e9a35f9a7c0731d858c23e

            pjones Peter Jones added a comment -

            Given that this issue existed in 2.14, I think that it should be OK to descope it from 2.15.0 and include it in a future 2.15.x maintenance release.


            People

              arshad512 Arshad Hussain
              askulysh Andriy Skulysh