Truncate for large objects can lead to a thread hung

    Description

      Truncating a large object can leave a service thread hung, with this call stack:

       Net: Service thread pid 1739 was inactive for 200.16s.
          The thread might be hung, or it might only be slow and will resume later.
          Dumping the stack trace for debugging purposes:
          __wait_on_buffer+0x2a/0x30
          ldiskfs_wait_block_bitmap+0xe0/0xf0 [ldiskfs]
          ldiskfs_read_block_bitmap+0x31/0x60 [ldiskfs]
          ldiskfs_free_blocks+0x329/0xbb0 [ldiskfs]
          ldiskfs_ext_remove_space+0x8a9/0x1150 [ldiskfs]
          ldiskfs_ext_truncate+0xb0/0xe0 [ldiskfs]
          ldiskfs_truncate+0x3b7/0x3f0 [ldiskfs]
          ldiskfs_evict_inode+0x58a/0x630 [ldiskfs]
          evict+0xb4/0x180
          iput+0xfc/0x190
          osd_object_delete+0x1f8/0x370 [osd_ldiskfs]
          lu_object_free.isra.30+0x68/0x170 [obdclass]
          lu_object_put+0xc5/0x3e0 [obdclass]
          ofd_destroy_by_fid+0x20e/0x500 [ofd]
          ofd_destroy_hdl+0x267/0x9f0 [ofd]
          tgt_request_handle+0xaee/0x15f0 [ptlrpc]
          ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
          ptlrpc_main+0xb34/0x1470 [ptlrpc]
          kthread+0xd1/0xe0
      

      As a solution, the truncate can be moved to a separate thread when the inode size is greater than 1 TB.
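
The proposal above can be illustrated with a small user-space sketch. This is not the Lustre patch: every name here (`LARGE_OBJECT_BYTES`, `struct object`, `struct truncate_queue`, `destroy_object`, `drain_queue`) is hypothetical, 1 TB is taken as 2^40 bytes, and a plain ring buffer stands in for the hand-off to a separate thread. It only shows the decision: small objects are truncated inline by the caller, large ones are queued so the service thread can return immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed threshold from the description: objects larger than 1 TB (2^40 bytes). */
#define LARGE_OBJECT_BYTES ((uint64_t)1 << 40)
#define QUEUE_CAPACITY 16

/* Hypothetical stand-in for an on-disk object. */
struct object {
    uint64_t size;      /* current size in bytes */
    bool     truncated; /* set once its blocks have been released */
};

/* Fixed-size ring buffer of objects waiting for a deferred truncate. */
struct truncate_queue {
    struct object *items[QUEUE_CAPACITY];
    size_t head;
    size_t count;
};

/* The slow part: in a real filesystem this walks extents and block bitmaps. */
void truncate_now(struct object *obj)
{
    obj->size = 0;
    obj->truncated = true;
}

/* Only objects strictly above the threshold are deferred. */
bool should_defer(const struct object *obj)
{
    return obj->size > LARGE_OBJECT_BYTES;
}

/*
 * Called from the request-handling path. Returns true if the truncate was
 * queued for later, false if it was done inline (small object or full queue).
 */
bool destroy_object(struct truncate_queue *q, struct object *obj)
{
    assert(q != NULL && obj != NULL);
    if (should_defer(obj) && q->count < QUEUE_CAPACITY) {
        size_t tail = (q->head + q->count) % QUEUE_CAPACITY;
        q->items[tail] = obj;
        q->count++;
        return true;
    }
    truncate_now(obj);
    return false;
}

/*
 * Body of the hypothetical worker: truncate everything that was queued.
 * Returns the number of objects processed.
 */
size_t drain_queue(struct truncate_queue *q)
{
    size_t done = 0;
    while (q->count > 0) {
        truncate_now(q->items[q->head]);
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
        done++;
    }
    return done;
}
```

In this sketch a caller would zero-initialize a `struct truncate_queue`, call `destroy_object()` for each object, and have the worker call `drain_queue()`. A real implementation would also need locking around the queue and would have to make sure the queue is drained before the filesystem is unmounted.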

       

          Activity

            [LU-16032] Truncate for large objects can lead to a thread hung

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57770
            Subject: LU-16032 osd: move unlink of large objects to separate thread
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: d389c311a69587007876549b1273c5e12e3a5878

            gerrit Gerrit Updater added a comment.
            adilger Andreas Dilger made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
            adilger Andreas Dilger made changes -
            Priority Original: Major [ 3 ] New: Critical [ 2 ]

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53218/
            Subject: LU-16032 tests: restore delay_unlink_mb in sanity/360
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: beaa2e03765655656f5a3befdb4b8d8cccfa60e8

            gerrit Gerrit Updater added a comment.

            This patch is crashing during unmount in some cases (probably when large files have been deleted). I've filed LU-17332 to track that issue.

            [26282.338565] Lustre: server umount lustre-OST0004 complete
            [26282.411017] ------------[ cut here ]------------
            [26282.412061] kernel BUG at fs/jbd2/transaction.c:378!
            [26282.413171] invalid opcode: 0000 [#1] SMP PTI
            [26282.414068] CPU: 1 PID: 784404 Comm: kworker/1:5 4.18.0-477.15.1.el8_lustre.x86_64 #1
            [26282.416473] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [26282.435639] Call Trace:
            [26282.438083]  jbd2__journal_start+0xee/0x1f0 [jbd2]
            [26282.439047]  jbd2_journal_start+0x19/0x20 [jbd2]
            [26282.439979]  flush_stashed_stats_work+0x36/0x90 [ldiskfs]
            [26282.441086]  process_one_work+0x1a7/0x360
            [26282.442753]  worker_thread+0x30/0x390
            [26282.444311]  kthread+0x134/0x150
            
            adilger Andreas Dilger added a comment.
            adilger Andreas Dilger made changes -
            Assignee Original: Artem Blagodarenko [ ablagodarenko ] New: Andreas Dilger [ adilger ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-17332 [ LU-17332 ]
            adilger Andreas Dilger made changes -
            Link Original: This issue is related to LU-17148 [ LU-17148 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-17148 [ LU-17148 ]
            adilger Andreas Dilger made changes -
            Resolution Original: Fixed [ 1 ]
            Status Original: Resolved [ 5 ] New: Reopened [ 4 ]

            People

              adilger Andreas Dilger
              ablagodarenko Artem Blagodarenko