[LU-17307] osd_dirent_count() keeps multiple threads busy

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.14.0, Lustre 2.15.0
    • Labels: None
    • Severity: 3

    Description

      When osd_attr_get() is accessing a directory for the first time, it calls osd_dirent_count() to iterate over the directory and fill in obj->oo_dirent_count:

       iterate_dir+0x70/0x140
       osd_ldiskfs_it_fill+0xbd/0x290 [osd_ldiskfs]
       osd_it_ea_next+0xc2/0x150 [osd_ldiskfs]
       osd_attr_get+0x4bc/0x730 [osd_ldiskfs]
       lod_attr_get+0x8b/0x170 [lod]
       mdd_la_get+0x70/0x200 [mdd]
       mdd_attr_get+0x38/0x100 [mdd]
       mdt_attr_get_complex+0x4dd/0x800 [mdt]
       mdt_getattr_internal+0x445/0x1590 [mdt]
       mdt_getattr_name_lock+0x74d/0x2640 [mdt]
       mdt_intent_getattr+0x2a5/0x470 [mdt]
       mdt_intent_opc+0x1ba/0xb40 [mdt]
       mdt_intent_policy+0x1a9/0x370 [mdt]
       ldlm_lock_enqueue+0x3d4/0xb00 [ptlrpc]
       ldlm_handle_enqueue0+0x8b6/0x16d0 [ptlrpc]
       tgt_enqueue+0x62/0x220 [ptlrpc]
       tgt_request_handle+0x8bf/0x18c0 [ptlrpc]
       ptlrpc_server_handle_request+0x253/0xc40 [ptlrpc]
       ptlrpc_main+0xc88/0x26c0 [ptlrpc]
       kthread+0xd1/0xe0
      

      However, there are several issues with this:

      • If the directory is very large (e.g. millions of entries), the iteration can take considerable time and blocks the MDS service thread until it finishes.
      • If multiple MDS threads access the same directory, every one of them tries to count its entries, so all of those threads are blocked at once.
      • The oo_dirent_count value is only needed for directory auto-split, which is not enabled today.

      The entry counting should only be done if directory auto-split is enabled, which avoids the overhead in the normal case. Since mdt_enable_dir_auto_split is controlled at the MDT level, the need for the count should be passed down to the OSD via an LA_DIRENT_COUNT valid flag. For osd-ldiskfs, the count would be skipped if this flag is not set. For osd-zfs, the count can optionally be returned anyway, since it is available for free.
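      The flag-gated lookup proposed above can be sketched in user-space C as follows. This is only an illustration of the idea, not the landed patch: every sketch_* name, the SKETCH_LA_DIRENT_COUNT bit value, and the "unset" sentinel are assumptions made for the example and are not the real Lustre definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real LA_DIRENT_COUNT value is not given in this ticket. */
#define SKETCH_LA_DIRENT_COUNT    (1ULL << 0)
/* Hypothetical "not counted yet" sentinel. */
#define SKETCH_DIRENT_COUNT_UNSET UINT64_MAX

struct sketch_dir {
	uint64_t nr_entries;   /* entries that a full iteration would find */
	uint64_t cached_count; /* stands in for obj->oo_dirent_count */
	uint64_t scans;        /* how many full iterations were performed */
};

struct sketch_attr {
	uint64_t valid;        /* which fields below were filled in */
	uint64_t dirent_count;
};

/* Stand-in for the expensive walk over every directory entry. */
static uint64_t sketch_dirent_count(struct sketch_dir *dir)
{
	uint64_t n = 0;

	for (uint64_t i = 0; i < dir->nr_entries; i++)
		n++;
	dir->scans++;
	return n;
}

/* Only iterate the directory when the caller asked for the count. */
static void sketch_attr_get(struct sketch_dir *dir, struct sketch_attr *attr,
			    uint64_t want)
{
	assert(dir != NULL && attr != NULL);

	attr->valid = 0;
	attr->dirent_count = SKETCH_DIRENT_COUNT_UNSET;

	/* Normal case: auto-split is off, so the caller does not set the flag. */
	if (!(want & SKETCH_LA_DIRENT_COUNT))
		return;

	/* First request with the flag set pays for the scan; later ones reuse it. */
	if (dir->cached_count == SKETCH_DIRENT_COUNT_UNSET)
		dir->cached_count = sketch_dirent_count(dir);

	attr->dirent_count = dir->cached_count;
	attr->valid |= SKETCH_LA_DIRENT_COUNT;
}
```

      In this sketch a caller that leaves the flag clear never triggers a scan, which corresponds to the case where auto-split is disabled at the MDT.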

      If directory auto-split is enabled, it would be much more efficient to have only one thread do the directory iteration. This should be controlled by a flag in the object, and the other threads can ignore oo_dirent_count (return "0" or "LU_DIRENT_COUNT_UNSET" or the current number of entries found). At worst this would defer the auto-split by a few entries, but that does not matter in the end, since the thread doing the counting will always perform the check itself.

          Activity

            hxing Xing Huang added a comment -

            2023-12-13: The fix patch landed to master branch.


            gerrit Gerrit Updater added a comment -

            "Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53229/
            Subject: LU-17307 mdt: get dirent count by request
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d0babe6bd8b38cb86875c5e3d92aee4c69986ce9

            hxing Xing Huang added a comment -

            2023-12-07: The fix patch is ready to land to master (temporarily not on master-next branch).

            gerrit Gerrit Updater added a comment -

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53229
            Subject: LU-17307 mdt: get dirent count by request
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 06aec8f6c410dcb053bc87d61ce334c23e58e1fd

            People

              laisiyao Lai Siyao
              adilger Andreas Dilger
              Votes: 0
              Watchers: 7
