[LU-17307] osd_dirent_count() keeps multiple threads busy Created: 21/Nov/23 Updated: 15/Dec/23 Resolved: 15/Dec/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0, Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When osd_attr_get() is accessing a directory for the first time, it calls osd_dirent_count() to iterate over the directory and fill in obj->oo_dirent_count:

    iterate_dir+0x70/0x140
    osd_ldiskfs_it_fill+0xbd/0x290 [osd_ldiskfs]
    osd_it_ea_next+0xc2/0x150 [osd_ldiskfs]
    osd_attr_get+0x4bc/0x730 [osd_ldiskfs]
    lod_attr_get+0x8b/0x170 [lod]
    mdd_la_get+0x70/0x200 [mdd]
    mdd_attr_get+0x38/0x100 [mdd]
    mdt_attr_get_complex+0x4dd/0x800 [mdt]
    mdt_getattr_internal+0x445/0x1590 [mdt]
    mdt_getattr_name_lock+0x74d/0x2640 [mdt]
    mdt_intent_getattr+0x2a5/0x470 [mdt]
    mdt_intent_opc+0x1ba/0xb40 [mdt]
    mdt_intent_policy+0x1a9/0x370 [mdt]
    ldlm_lock_enqueue+0x3d4/0xb00 [ptlrpc]
    ldlm_handle_enqueue0+0x8b6/0x16d0 [ptlrpc]
    tgt_enqueue+0x62/0x220 [ptlrpc]
    tgt_request_handle+0x8bf/0x18c0 [ptlrpc]
    ptlrpc_server_handle_request+0x253/0xc40 [ptlrpc]
    ptlrpc_main+0xc88/0x26c0 [ptlrpc]
    kthread+0xd1/0xe0

However, there are several issues with this:
- The entry counting should only be done if directory auto-split is enabled, to avoid the overhead in the normal case. Since mdt_enable_dir_auto_split is controlled at the MDT level, the need for the count should be passed down to the OSD via an LA_DIRENT_COUNT valid flag. For osd-ldiskfs, the count would be skipped if this flag is not set. For osd-zfs, the count can optionally be returned regardless, since it is available for free.
- If directory auto-split is enabled, it would be much more efficient to have only one thread do the directory iteration. This should be controlled by a flag in the object, and the other threads can ignore oo_dirent_count (return "0", "LU_DIRENT_COUNT_UNSET", or the current number of entries found). At worst this would defer the auto-split by a few entries, but that doesn't matter in the end since the thread doing the counting will always perform the check itself. |
| Comments |
| Comment by Gerrit Updater [ 24/Nov/23 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53229 |
| Comment by Xing Huang [ 07/Dec/23 ] |
|
2023-12-07: The fix patch is ready to land on master (temporarily not on the master-next branch). |
| Comment by Gerrit Updater [ 12/Dec/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53229/ |
| Comment by Xing Huang [ 13/Dec/23 ] |
|
2023-12-13: The fix patch landed on the master branch. |