[LU-17851] A long osd_fallocate_preallocate blocks other fs writers

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0

    Description

      A long osd_fallocate_preallocate() call:

      crash> bt 574060    
      PID: 574060   TASK: ffff8cef4b1c4740  CPU: 23   COMMAND: "ll_ost05_010"
       #0 [fffffe000049ee48] crash_nmi_callback at ffffffffbb254863
       #1 [fffffe000049ee50] nmi_handle at ffffffffbb224c83
       #2 [fffffe000049eea8] default_do_nmi at ffffffffbbb41f89
       #3 [fffffe000049eec8] do_nmi at ffffffffbb22518e
       #4 [fffffe000049eef0] end_repeat_nmi at ffffffffbbc015c4
          [exception RIP: ldiskfs_mb_regular_allocator+677]
          RIP: ffffffffc1d25b85  RSP: ffffb789670b38f0  RFLAGS: 00000202
          RAX: ffff8ccd88348428  RBX: 00000000000d7263  RCX: 0000000000000006
          RDX: 0000000000000010  RSI: 00000000002dd3df  RDI: 00000000000035c9
          RBP: 000000000000059d   R8: 00000000000d64fe   R9: ffff8cd28bd2f000
          R10: 0000000000000002  R11: ffff8cd28bd2d800  R12: ffff8cd28bd2f000
          R13: 00000000002dd3df  R14: ffff8cec5f7e56e8  R15: 00000000000d7000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      --- <NMI exception stack> ---
       #5 [ffffb789670b38f0] ldiskfs_mb_regular_allocator at ffffffffc1d25b85 [ldiskfs]
       #6 [ffffb789670b3988] ldiskfs_mb_new_blocks at ffffffffc1d26cd2 [ldiskfs]
       #7 [ffffb789670b3a40] ldiskfs_ext_map_blocks at ffffffffc1d5512c [ldiskfs]
       #8 [ffffb789670b3b08] ldiskfs_map_blocks at ffffffffc1d5b26d [ldiskfs]
       #9 [ffffb789670b3b78] osd_fallocate_preallocate at ffffffffc1df4eb8 [osd_ldiskfs]
      #10 [ffffb789670b3c08] osd_fallocate at ffffffffc1df546d [osd_ldiskfs]
      #11 [ffffb789670b3c40] ofd_object_fallocate at ffffffffc1c00326 [ofd]
      #12 [ffffb789670b3cc0] ofd_fallocate_hdl at ffffffffc1bed3bf [ofd]
      #13 [ffffb789670b3d50] tgt_request_handle at ffffffffc165b053 [ptlrpc]
      #14 [ffffb789670b3dd0] ptlrpc_server_handle_request at ffffffffc160aac3 [ptlrpc]
      #15 [ffffb789670b3e38] ptlrpc_main at ffffffffc160c578 [ptlrpc]
      #16 [ffffb789670b3f10] kthread at ffffffffbb3043a6
      #17 [ffffb789670b3f50] ret_from_fork at ffffffffbbc0023f
      crash> 
      

      blocks other fs writers because they want to open transaction handles, but the fallocate threads keep the current transaction open even though the transaction state is T_LOCKED and the transaction handle was created 8 minutes ago:

      crash> kmem -s ffff8cd560fdd968
      CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
      ffff8cd3dfc0c1c0       56        978      6424     88     4k  jbd2_journal_handle
        SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
        fffffd008583f740  ffff8cd560fdd000     1     73         26    47
        FREE / [ALLOCATED]
        [ffff8cd560fdd968]
      crash> jbd2_journal_handle
      struct jbd2_journal_handle {
          union {
              transaction_t *h_transaction;
              journal_t *h_journal;
          };
          handle_t *h_rsv_handle;
          int h_total_credits;
          int h_revoke_credits;
          int h_revoke_credits_requested;
          int h_ref;
          int h_err;
          unsigned int h_sync : 1;
          unsigned int h_jdata : 1;
          unsigned int h_reserved : 1;
          unsigned int h_aborted : 1;
          unsigned int h_type : 8;
          unsigned int h_line_no : 16;
          unsigned long h_start_jiffies;
          unsigned int h_requested_credits;
          unsigned int saved_alloc_context;
      }
      SIZE: 56
      crash> jbd2_journal_handle ffff8cd560fdd968
      struct jbd2_journal_handle {
        {
          h_transaction = 0xffff8cd8d415a100,
          h_journal = 0xffff8cd8d415a100
        },
        h_rsv_handle = 0x0,
        h_total_credits = 167,
        h_revoke_credits = 8,
        h_revoke_credits_requested = 8,
        h_ref = 1,
        h_err = 0,
        h_sync = 0,
        h_jdata = 0,
        h_reserved = 0,
        h_aborted = 0,
        h_type = 0,
        h_line_no = 1993,
        h_start_jiffies = 11051536247,
        h_requested_credits = 313,
        saved_alloc_context = 0
      }
      crash> ps -t 574060
      PID: 574060   TASK: ffff8cef4b1c4740  CPU: 23   COMMAND: "ll_ost05_010"
          RUN TIME: 78 days, 03:05:51
        START TIME: 7050951549099
             UTIME: 0
             STIME: 878360794630
      
      crash> jiffies
      jiffies = $1 = 11052069069
      crash> gdb p 11052069069 - 11051536247
      $2 = 532822
      crash> gdb p (11052069069 - 11051536247 ) / 1000 / 60
      $3 = 8
      crash> 
      crash> transaction_t 0xffff8cd8d415a100 | head    
      struct transaction_t {
        t_journal = 0xffff8cd28bd2a000,
        t_tid = 366403773,
        t_state = T_LOCKED,
        t_log_start = 0,
        t_nr_buffers = 669,
        t_reserved_list = 0x0,
        t_buffers = 0xffff8cde42cd3708,
        t_forget = 0x0,
        t_checkpoint_list = 0x0,
      crash> 
      

        Activity

          pjones Peter Jones added a comment -

          Merged for 2.16


          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55111/
          Subject: LU-17851 ldiskfs: restart long fallocate tx
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: f317b5c30e478fdecceea4bd07c85ff305e9d81d

          zam Alexander Zarochentsev added a comment -

          The patch https://review.whamcloud.com/c/fs/lustre-release/+/55111 (LU-17851 ldiskfs: restart long fallocate tx) looks to me like the preferable fix for Lustre, as it doesn't need any additional ldiskfs patches.

          gerrit Gerrit Updater added a comment -

          "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55112
          Subject: LU-17851 ldiskfs: ensure_credits to check tx state
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: a3f2075df9834c6efa916c9a19ae02ac3e924cfb

          gerrit Gerrit Updater added a comment -

          "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55111
          Subject: LU-17851 ldiskfs: restart long fallocate tx
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 2bfa7c9ea2d7703ec3c4b4bbd3d51380cd39a1b2

          People

            zam Alexander Zarochentsev
            zam Alexander Zarochentsev