[LU-8092] racy striping & default striping cache in LOD Created: 03/May/16 Updated: 13/Oct/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Niu Yawei (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
The striping cache and the default striping cache in the LOD layer look racy:

1. Striping cache

lod_load_striping() is used to load the striping of a regular file (LOV) or a directory (LMA). A caller usually calls lod_load_striping() first to load the striping into the cache (see lod_object), then performs subsequent operations that rely on the cached striping. However, LOD has no mechanism to prevent the cache from being freed before those operations are done. Current Lustre relies on external means to avoid such a race:

I think we should reorganize the code so that LOD avoids this kind of race itself rather than relying on external means.

2. Default striping cache

In lod_ah_init(), the default striping (both default LOV and LMV) of the parent directory is loaded into the LOD cache (see lod_object), then propagated to the child file/directory being created (the child's lod_object). Because multiple lod_ah_init() calls against the same parent can run in parallel, and the parent's default striping can be changed at the same time, we probably need to protect the whole of lod_ah_init() with dt_read/write_lock(parent). |
| Comments |
| Comment by Alex Zhuravlev [ 03/May/16 ] |
|
As I have mentioned a few times, it makes sense to look at a different option: |
| Comment by Niu Yawei (Inactive) [ 03/May/16 ] |
Alex, do you mean that, for the second problem (default striping cache), lod_ah_init() can read and parse the default LOV/LMA EA cached in the OSD layer into a per-thread temporary buffer in LOD, then use that buffer without locking? |
| Comment by Alex Zhuravlev [ 03/May/16 ] |
|
Yes, that's a correct description. |
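A minimal user-space sketch of the per-thread buffer idea discussed above, for illustration only. Nothing here is the actual Lustre API: the `demo_*` names, the EA layout, and the mutex that makes the copy-out atomic are all assumptions, and C11 `_Thread_local` stands in for whatever per-thread storage LOD would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of a default-striping EA (not the real on-disk format). */
struct demo_ea {
	uint32_t stripe_count;
	uint32_t stripe_size;
};

/* Hypothetical stand-in for an object whose EA is cached by the OSD layer. */
struct demo_osd_object {
	pthread_mutex_t lock;                      /* makes get/set atomic */
	unsigned char   ea[sizeof(struct demo_ea)];
	size_t          ea_len;                    /* 0: no default striping */
};

/* Per-thread temporary buffer holding this thread's private snapshot. */
static _Thread_local struct demo_ea demo_thread_buf;

/* Initialize an object with no EA; returns 0 on success. */
int demo_osd_init(struct demo_osd_object *o)
{
	o->ea_len = 0;
	return pthread_mutex_init(&o->lock, NULL);
}

/* Replace the cached EA. */
void demo_osd_xattr_set(struct demo_osd_object *o, const struct demo_ea *ea)
{
	pthread_mutex_lock(&o->lock);
	memcpy(o->ea, ea, sizeof(*ea));
	o->ea_len = sizeof(*ea);
	pthread_mutex_unlock(&o->lock);
}

/* Copy the cached EA out; returns the number of bytes copied (0 if unset). */
size_t demo_osd_xattr_get(struct demo_osd_object *o, void *buf, size_t size)
{
	size_t len;

	pthread_mutex_lock(&o->lock);
	len = o->ea_len;
	assert(len <= size);
	memcpy(buf, o->ea, len);
	pthread_mutex_unlock(&o->lock);

	return len;
}

/*
 * Snapshot the parent's default striping into the per-thread buffer.
 * After this returns, the caller reads the snapshot without any locking.
 * A real implementation would also validate and byte-swap the EA here.
 * Returns NULL if the parent has no default striping.
 */
const struct demo_ea *demo_load_default(struct demo_osd_object *parent)
{
	size_t len = demo_osd_xattr_get(parent, &demo_thread_buf,
					sizeof(demo_thread_buf));

	return len == sizeof(demo_thread_buf) ? &demo_thread_buf : NULL;
}
```

The trade-off in this sketch is that a snapshot may be stale by the time it is used, but it is always internally consistent, and the parent is locked only for the duration of the copy.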
| Comment by Niu Yawei (Inactive) [ 04/May/16 ] |
|
The second problem has been addressed in the |