
Truncate for large objects can lead to a thread hung


    Description

      Truncating large objects can cause a service thread to hang, with this call stack:

       Net: Service thread pid 1739 was inactive for 200.16s.
          The thread might be hung, or it might only be slow and will resume later.
          Dumping the stack trace for debugging purposes:
          __wait_on_buffer+0x2a/0x30
          ldiskfs_wait_block_bitmap+0xe0/0xf0 [ldiskfs]
          ldiskfs_read_block_bitmap+0x31/0x60 [ldiskfs]
          ldiskfs_free_blocks+0x329/0xbb0 [ldiskfs]
          ldiskfs_ext_remove_space+0x8a9/0x1150 [ldiskfs]
          ldiskfs_ext_truncate+0xb0/0xe0 [ldiskfs]
          ldiskfs_truncate+0x3b7/0x3f0 [ldiskfs]
          ldiskfs_evict_inode+0x58a/0x630 [ldiskfs]
          evict+0xb4/0x180
          iput+0xfc/0x190
          osd_object_delete+0x1f8/0x370 [osd_ldiskfs]
          lu_object_free.isra.30+0x68/0x170 [obdclass]
          lu_object_put+0xc5/0x3e0 [obdclass]
          ofd_destroy_by_fid+0x20e/0x500 [ofd]
          ofd_destroy_hdl+0x267/0x9f0 [ofd] 
       
          tgt_request_handle+0xaee/0x15f0 [ptlrpc]
          ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
          ptlrpc_main+0xb34/0x1470 [ptlrpc]
          kthread+0xd1/0xe0
      

      As a solution, the truncate can be moved to a separate thread if the inode size is > 1TB.
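
      The proposed approach can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the code from the patches: every name here (`model_backlog`, `model_object_delete`, `model_drain_backlog`) is hypothetical, the threshold is a parameter because the ticket mentions both 1TB and 1GB, and the "separate thread" is modelled as an explicit drain call so the example stays deterministic.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical model, not Lustre code: all names are illustrative. */
      #define MODEL_BACKLOG_MAX 16

      struct model_backlog {
      	uint64_t threshold;                /* defer objects larger than this */
      	uint64_t sizes[MODEL_BACKLOG_MAX]; /* sizes of objects awaiting truncate */
      	size_t   count;                    /* number of deferred objects */
      	uint64_t freed_inline;             /* bytes freed by the caller itself */
      	uint64_t freed_deferred;           /* bytes freed by the drain step */
      };

      /*
       * Called from the (modelled) service thread.
       * Returns 1 if the object was queued for later, 0 if it was freed inline.
       * A full queue falls back to inline freeing.
       */
      static int model_object_delete(struct model_backlog *b, uint64_t size)
      {
      	if (size > b->threshold && b->count < MODEL_BACKLOG_MAX) {
      		b->sizes[b->count++] = size;
      		return 1;
      	}
      	b->freed_inline += size;
      	return 0;
      }

      /*
       * Stands in for the separate thread: frees everything that was queued.
       * Returns the number of objects processed.
       */
      static size_t model_drain_backlog(struct model_backlog *b)
      {
      	size_t done = b->count;

      	for (size_t i = 0; i < b->count; i++)
      		b->freed_deferred += b->sizes[i];
      	b->count = 0;
      	return done;
      }
      ```

      In this model the service thread only pays for small objects; anything above the threshold costs it a queue insertion, and the expensive block freeing happens when the backlog is drained.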

       

          Activity

            [LU-16032] Truncate for large objects can lead to a thread hung

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57770
            Subject: LU-16032 osd: move unlink of large objects to separate thread
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: d389c311a69587007876549b1273c5e12e3a5878


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53218/
            Subject: LU-16032 tests: restore delay_unlink_mb in sanity/360
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: beaa2e03765655656f5a3befdb4b8d8cccfa60e8


            adilger Andreas Dilger added a comment -

            This patch is crashing during unmount in some cases (probably when large files have been deleted). I've filed LU-17332 to track that issue.

            [26282.338565] Lustre: server umount lustre-OST0004 complete
            [26282.411017] ------------[ cut here ]------------
            [26282.412061] kernel BUG at fs/jbd2/transaction.c:378!
            [26282.413171] invalid opcode: 0000 [#1] SMP PTI
            [26282.414068] CPU: 1 PID: 784404 Comm: kworker/1:5 4.18.0-477.15.1.el8_lustre.x86_64 #1
            [26282.416473] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [26282.435639] Call Trace:
            [26282.438083]  jbd2__journal_start+0xee/0x1f0 [jbd2]
            [26282.439047]  jbd2_journal_start+0x19/0x20 [jbd2]
            [26282.439979]  flush_stashed_stats_work+0x36/0x90 [ldiskfs]
            [26282.441086]  process_one_work+0x1a7/0x360
            [26282.442753]  worker_thread+0x30/0x390
            [26282.444311]  kthread+0x134/0x150

            adilger Andreas Dilger added a comment -

            Reopened to track fixes to the delayed iput change.

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53242
            Subject: LU-16032 osd-ldiskfs: track backlog of unlinked objects
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c1bab788fc3038aae79aa05d0de162aa1e703c7b

            adilger Andreas Dilger added a comment (edited) -

            It would probably also be useful if writing to force_sync printed the number of objects awaiting unlink to the console with CWARN() when there are any, and stayed silent otherwise. This would give an idea of how much unlink work is backlogged, since I think this might become a problem in real life as well.
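
            The "warn only when there is a backlog" behaviour suggested above might look roughly like the following userspace sketch. It is an illustration under assumptions, not the landed change: `model_force_sync_report` is a hypothetical name, the message text is invented, and `snprintf()` into a caller-supplied buffer stands in for the CWARN() console output.

            ```c
            #include <assert.h>
            #include <stdio.h>
            #include <string.h>

            /*
             * Hypothetical helper, not Lustre code.
             * Writes a warning into buf only when backlog is non-zero.
             * Returns the number of characters snprintf() produced;
             * 0 means "stay silent" and buf is left as an empty string.
             */
            static int model_force_sync_report(unsigned long backlog,
            				   char *buf, size_t len)
            {
            	if (len > 0)
            		buf[0] = '\0';
            	if (backlog == 0)
            		return 0;
            	return snprintf(buf, len, "%lu objects awaiting unlink\n", backlog);
            }
            ```

            Keeping the zero case silent means a routine force_sync leaves no noise in the console log, while a non-empty backlog becomes visible without a separate query.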

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53218
            Subject: LU-16032 tests: restore delay_unlink_mb in sanity/360
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8fa0580fd64fe7cbe969817ece87a161c517c4c3

            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47995/
            Subject: LU-16032 osd: move unlink of large objects to separate thread
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a772e90243ea0ff1de6ae9c67e1f6384c431d200


            "Artem Blagodarenko <ablagodarenko@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47995
            Subject: LU-16032 osd: execute truncate in separate thread for objects > 1GB
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: fff9a3c7e986f217979ee1f2ef17f91ae118ae02


            People

              adilger Andreas Dilger
              ablagodarenko Artem Blagodarenko