too many ldiskfs transaction credits for llog when unlinking overstriped files

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.14.0, Lustre 2.15.0
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      Removing widely overstriped files from an ldiskfs MDT causes an excessive number of transaction credits to be reserved, far above the transaction maximum (trans_max). This can be seen in the MDS console logs:

      Lustre: DEBUG MARKER: == sanity test 130g: FIEMAP (overstripe file) ========
      Lustre: 25401:0:(osd_handler.c:1934:osd_trans_start()) lustre-MDT0000: credits 54595 > trans_max 2592
      Lustre: 25401:0:(osd_handler.c:1863:osd_trans_dump_creds())   create: 800/6400/0, destroy: 1/4/0
      Lustre: 25401:0:(osd_handler.c:1870:osd_trans_dump_creds())   attr_set: 3/3/0, xattr_set: 804/148/0
      Lustre: 25401:0:(osd_handler.c:1880:osd_trans_dump_creds())   write: 4001/34410/0, punch: 0/0/0, quota 6/6/0
      Lustre: 25401:0:(osd_handler.c:1887:osd_trans_dump_creds())   insert: 801/13616/0, delete: 2/5/0
      Lustre: 25401:0:(osd_handler.c:1894:osd_trans_dump_creds())   ref_add: 1/1/0, ref_del: 2/2/0
      Pid: 25401, comm: mdt00_004 3.10.0-1160.36.2.el7_lustre.x86_64 #1 SMP Tue Aug 3 23:03:31 UTC 2021
      Call Trace:
      libcfs_call_trace+0x90/0xf0 [libcfs]
      libcfs_debug_dumpstack+0x26/0x30 [libcfs]
      osd_trans_start+0x4bb/0x4e0 [osd_ldiskfs]
      top_trans_start+0x702/0x940 [ptlrpc]
      lod_trans_start+0x34/0x40 [lod]
      mdd_trans_start+0x1a/0x20 [mdd]
      mdd_unlink+0x4ee/0xae0 [mdd]
      mdo_unlink+0x1b/0x1d [mdt]
      mdt_reint_unlink+0xb64/0x1890 [mdt]
      mdt_reint_rec+0x83/0x210 [mdt]
      mdt_reint_internal+0x720/0xaf0 [mdt]
      mdt_reint+0x67/0x140 [mdt]
      tgt_request_handle+0x7ea/0x1750 [ptlrpc]
      ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
      ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
      Lustre: 25401:0:(osd_internal.h:1325:osd_trans_exec_op()) lustre-MDT0000: opcode 7: before 2589 < left 34410, rollback = 7
      

      and

      Lustre: DEBUG MARKER: == sanity test 27Cd: test maximum stripe count ========
      Lustre: 12686:0:(osd_handler.c:1934:osd_trans_start()) lustre-MDT0003: credits 136195 > trans_max 2592
      Lustre: 12686:0:(osd_handler.c:1863:osd_trans_dump_creds())   create: 2000/16000/0, destroy: 1/4/0
      Lustre: 12686:0:(osd_handler.c:1870:osd_trans_dump_creds())   attr_set: 3/3/0, xattr_set: 2004/148/0
      Lustre: 12686:0:(osd_handler.c:1880:osd_trans_dump_creds())   write: 10001/86010/0, punch: 0/0/0, quota 6/6/0
      Lustre: 12686:0:(osd_handler.c:1887:osd_trans_dump_creds())   insert: 2001/34016/0, delete: 2/5/0
      Lustre: 12686:0:(osd_handler.c:1894:osd_trans_dump_creds())   ref_add: 1/1/0, ref_del: 2/2/0
      Pid: 12686, comm: mdt00_000 3.10.0-1160.36.2.el7_lustre.x86_64 #1 SMP Tue Aug 3 23:03:31 UTC 2021
      Call Trace:
      libcfs_call_trace+0x90/0xf0 [libcfs]
      libcfs_debug_dumpstack+0x26/0x30 [libcfs]
      osd_trans_start+0x4bb/0x4e0 [osd_ldiskfs]
      top_trans_start+0x702/0x940 [ptlrpc]
      lod_trans_start+0x34/0x40 [lod]
      mdd_trans_start+0x1a/0x20 [mdd]
      mdd_unlink+0x4ee/0xae0 [mdd]
      mdo_unlink+0x1b/0x1d [mdt]
      mdt_reint_unlink+0xb64/0x1890 [mdt]
      mdt_reint_rec+0x83/0x210 [mdt]
      mdt_reint_internal+0x720/0xaf0 [mdt]
      mdt_reint+0x67/0x140 [mdt]
      tgt_request_handle+0x7ea/0x1750 [ptlrpc]
      ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
      ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
      

      The same warning also appears in sanity test_130e and in sanity-pfl test_0b and test_1c, always during unlink.
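To see which operation dominates a reservation, the per-operation lines printed by osd_trans_dump_creds() can be summed. The sketch below is a minimal illustration, not part of Lustre: the helper names are hypothetical, and reading the three numbers as ops/credits/used is an assumption. That reading is supported by the middle fields of the test 130g log adding up to the reported 54595 credits.

```python
import re

# Per-operation lines copied from the sanity test 130g log above.
LOG_130G = """\
create: 800/6400/0, destroy: 1/4/0
attr_set: 3/3/0, xattr_set: 804/148/0
write: 4001/34410/0, punch: 0/0/0, quota 6/6/0
insert: 801/13616/0, delete: 2/5/0
ref_add: 1/1/0, ref_del: 2/2/0
"""

# Operation name, optional colon ("quota" is printed without one),
# then three slash-separated counters.
FIELD_RE = re.compile(r"(\w+):?\s+(\d+)/(\d+)/(\d+)")


def parse_creds(text):
    """Map operation name -> (ops, credits, used); field meaning is assumed."""
    return {
        name: (int(ops), int(credits), int(used))
        for name, ops, credits, used in FIELD_RE.findall(text)
    }


def total_credits(text):
    """Sum the middle (credits) field over all operations."""
    return sum(credits for _, credits, _ in parse_creds(text).values())


def largest_consumer(text):
    """Return the operation name with the most declared credits."""
    creds = parse_creds(text)
    return max(creds, key=lambda name: creds[name][1])


if __name__ == "__main__":
    print(total_credits(LOG_130G))     # 54595, matching the osd_trans_start() line
    print(largest_consumer(LOG_130G))  # write
```

Under this reading, the write declarations account for 34410 of the 54595 credits, which is consistent with the llog writes named in the ticket title being the main contributor.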

      The two examples shown are trying to reserve a whopping 213MiB and 532MiB of journal space, respectively (54595 and 136195 credits, assuming one 4KiB block per credit). Since the maximum xattr size for an overstriped file is 64KiB, this is far more than the unlink should need.
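The journal sizes quoted above follow from the credit counts if each credit reserves one journal block. A minimal sketch of that arithmetic, assuming a 4 KiB block size (the block size and the helper name are assumptions, not taken from the logs):

```python
BLOCK_SIZE = 4096  # assumption: one credit reserves one 4 KiB journal block
MIB = 1024 * 1024


def credits_to_mib(credits, block_size=BLOCK_SIZE):
    """Convert a transaction credit count into MiB of journal space."""
    return credits * block_size / MIB


if __name__ == "__main__":
    for label, credits in [
        ("sanity 130g", 54595),
        ("sanity 27Cd", 136195),
        ("trans_max", 2592),
    ]:
        print(f"{label}: {credits} credits = {credits_to_mib(credits):.1f} MiB")

    # For comparison, a 64 KiB xattr spans only this many 4 KiB blocks:
    print((64 * 1024) // BLOCK_SIZE)  # 16
```

With these assumptions the two requests come to roughly 213 MiB and 532 MiB, against a trans_max of about 10 MiB.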

          Activity

            [LU-14918] too many ldiskfs transaction credits for llog when unlinking overstriped files
            pjones Peter Jones added a comment - Landed for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49701/
            Subject: LU-14918 osd: don't declare similar zfs writes twice
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c1936c9d294d53ff39741e1b07ffc74f51fcddb6

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/45765/
            Subject: LU-14918 osd: don't declare similar ldiskfs writes twice
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 9e6225b2e7385cbb7be0474df01075fafc4966d5

            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49701
            Subject: LU-14918 osd: don't declare similar writes twice
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e4a86ff32d18b6598880df1ca19e16af5a8b781b

            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45765
            Subject: LU-14918 osd: don't declare similar writes twice
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 54aef59ac5bd349a3d42b71b3d0bdc9cda93066e

            adilger Andreas Dilger added a comment - Assign to Alex to balance 2.15 tickets.

            adilger Andreas Dilger added a comment - I'd still like to keep this under consideration for 2.15.0, since it is hit very often during usage and the transaction size (over 500MB) can hurt performance significantly.

            adilger Andreas Dilger added a comment - As for concurrent records, that would be mostly fine. As I wrote before, there may be some interleaving of records, but other threads will not write 65000 different records between each delete of the overstriped object, which is what it would take to trigger a new llog file each time.

            adilger Andreas Dilger added a comment - Mike, I think that was later fixed by Alex to remove the "specific declare" since that hurt performance and didn't save much space...

            tappro Mikhail Pershin added a comment - Also, as I remember, that was mostly due to ZFS, which needs the exact offset/len to be declared and complains if the write doesn't match. Since we can't know the exact offset at declaration time, we assume the range where it can go: the current chunk and the next one. We should also take care of the non-atomic 'declare'-'write' nature; I think there could be concurrent record additions in between, can't there?

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: adilger Andreas Dilger