Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.14.0, Lustre 2.15.0
-
None
-
3
Description
When osd_attr_get() accesses a directory for the first time, it calls osd_dirent_count() to iterate over the directory and fill in obj->oo_dirent_count:
iterate_dir+0x70/0x140
osd_ldiskfs_it_fill+0xbd/0x290 [osd_ldiskfs]
osd_it_ea_next+0xc2/0x150 [osd_ldiskfs]
osd_attr_get+0x4bc/0x730 [osd_ldiskfs]
lod_attr_get+0x8b/0x170 [lod]
mdd_la_get+0x70/0x200 [mdd]
mdd_attr_get+0x38/0x100 [mdd]
mdt_attr_get_complex+0x4dd/0x800 [mdt]
mdt_getattr_internal+0x445/0x1590 [mdt]
mdt_getattr_name_lock+0x74d/0x2640 [mdt]
mdt_intent_getattr+0x2a5/0x470 [mdt]
mdt_intent_opc+0x1ba/0xb40 [mdt]
mdt_intent_policy+0x1a9/0x370 [mdt]
ldlm_lock_enqueue+0x3d4/0xb00 [ptlrpc]
ldlm_handle_enqueue0+0x8b6/0x16d0 [ptlrpc]
tgt_enqueue+0x62/0x220 [ptlrpc]
tgt_request_handle+0x8bf/0x18c0 [ptlrpc]
ptlrpc_server_handle_request+0x253/0xc40 [ptlrpc]
ptlrpc_main+0xc88/0x26c0 [ptlrpc]
kthread+0xd1/0xe0
However, there are several issues with this:
- if the directory is very large (e.g. millions of entries), the iteration can take considerable time, and it blocks the MDS service thread until it finishes.
- if multiple MDS threads are accessing the same directory, every one of them will try to count the entries in the directory, so all of those threads are blocked.
- the oo_dirent_count value is only needed for directory auto-split, which is not enabled today.
The entry counting should only be done if directory auto-split is enabled, which avoids the overhead in the normal case. Since mdt_enable_dir_auto_split is controlled at the MDT level, the need for the count should be passed down to the OSD via an LA_DIRENT_COUNT valid flag. For osd-ldiskfs, the count would be skipped if this flag is not set. For osd-zfs, the count can optionally be returned anyway, since it is available for free.
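A minimal userspace sketch of the flag-gated count, not the actual osd-ldiskfs code. The struct layouts, the bit value chosen for LA_DIRENT_COUNT, the value of LU_DIRENT_COUNT_UNSET, and all sketch_* names are assumptions made for illustration. It also assumes the caller requests the count by setting the bit in la_valid before the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; the real definitions may differ. */
#define LA_DIRENT_COUNT        (1ULL << 0)
#define LU_DIRENT_COUNT_UNSET  UINT64_MAX

/* Hypothetical stand-in for the attribute structure passed to attr_get. */
struct sketch_attr {
	uint64_t la_valid;         /* bits the caller asks for (assumption) */
	uint64_t la_dirent_count;  /* filled in only when requested */
};

/* Hypothetical stand-in for the OSD object. */
struct sketch_obj {
	uint64_t oo_dirent_count;  /* cached count or LU_DIRENT_COUNT_UNSET */
	uint64_t nr_entries;       /* stands in for the on-disk directory */
	unsigned int scans;        /* number of full iterations performed */
};

/* Expensive path: "iterate" the directory once and cache the result. */
static uint64_t sketch_dirent_count(struct sketch_obj *obj)
{
	if (obj->oo_dirent_count == LU_DIRENT_COUNT_UNSET) {
		obj->scans++;
		obj->oo_dirent_count = obj->nr_entries;
	}
	return obj->oo_dirent_count;
}

/* Only pay for the iteration when the caller asked for the count. */
static void sketch_attr_get(struct sketch_obj *obj, struct sketch_attr *la)
{
	assert(obj != NULL && la != NULL);

	if (!(la->la_valid & LA_DIRENT_COUNT)) {
		/* Auto-split disabled: skip the scan entirely. */
		la->la_dirent_count = LU_DIRENT_COUNT_UNSET;
		return;
	}
	la->la_dirent_count = sketch_dirent_count(obj);
}
```

With this shape, a getattr that does not set LA_DIRENT_COUNT never touches the directory contents, so the cost is paid only when auto-split is enabled.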
If directory auto-split is enabled, it would be much more efficient to have only one thread do the directory iteration. This should be controlled by a flag in the object, and the other threads can ignore oo_dirent_count (returning "0", "LU_DIRENT_COUNT_UNSET", or the number of entries found so far). At worst this would defer the auto-split by a few entries, but that does not matter in the end, since the thread doing the counting will always perform the check itself.
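The single-counter idea could look roughly like the sketch below, written with C11 atomics in userspace rather than kernel primitives. The sketch_dir structure, its field names, and the choice to return LU_DIRENT_COUNT_UNSET to the other threads are assumptions; the ticket leaves open which of the three fallback values to use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel value for illustration only. */
#define LU_DIRENT_COUNT_UNSET  UINT64_MAX

/* Hypothetical per-object state; not the real osd_object layout. */
struct sketch_dir {
	atomic_flag counting;           /* set while one thread iterates */
	_Atomic uint64_t dirent_count;  /* cached count or UNSET */
	uint64_t nr_entries;            /* stands in for the directory */
	unsigned int scans;             /* only touched by the flag owner */
};

static void sketch_dir_init(struct sketch_dir *dir, uint64_t nr_entries)
{
	atomic_flag_clear(&dir->counting);
	atomic_init(&dir->dirent_count, LU_DIRENT_COUNT_UNSET);
	dir->nr_entries = nr_entries;
	dir->scans = 0;
}

/*
 * Return the cached count if present. Otherwise let exactly one thread
 * scan; any thread that finds the flag already set returns UNSET
 * immediately instead of blocking or starting a second scan.
 */
static uint64_t sketch_count_once(struct sketch_dir *dir)
{
	uint64_t count;

	assert(dir != NULL);

	count = atomic_load(&dir->dirent_count);
	if (count != LU_DIRENT_COUNT_UNSET)
		return count;

	if (atomic_flag_test_and_set(&dir->counting))
		return LU_DIRENT_COUNT_UNSET;  /* someone else is counting */

	/* Re-check: another thread may have finished between load and set. */
	count = atomic_load(&dir->dirent_count);
	if (count == LU_DIRENT_COUNT_UNSET) {
		dir->scans++;
		count = dir->nr_entries;   /* the expensive iteration */
		atomic_store(&dir->dirent_count, count);
	}
	atomic_flag_clear(&dir->counting);
	return count;
}
```

A caller that receives LU_DIRENT_COUNT_UNSET simply skips the split check for that request; the thread that owns the scan sees the real count and can make the decision.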