[LU-8092] racy striping & default striping cache in LOD

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      The striping cache and the default striping cache in the LOD layer look racy:

      1. striping cache

      lod_load_striping() loads the striping of a regular file (LOV) or a directory (LMV). A caller usually calls lod_load_striping() first to load the striping into the cache (see lod_object), then performs operations that rely on the cached striping. However, LOD has no mechanism to prevent the cache from being freed before those operations are done. Current Lustre relies on external means to avoid this race:

      • For a regular file, all operations that need striping information (create, setstripe, destroy, chown, lfsck), that is, those that load or free the striping cache, hold an LDLM ibits lock, which guarantees that no concurrent striping free happens on the object;
      • For a directory, the striping information (LMV) is immutable in most cases, so there won't be a concurrent striping free (LFSCK is an exception, so this is not 100% race free).

      I think we should reorganize the code to avoid this kind of race inside LOD rather than rely on external means.
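
      One possible shape for such an in-LOD mechanism is a pin count on the cached striping, so that a free is refused while any user still relies on it. The following is only a user-space sketch under that assumption, not Lustre code: struct striping_cache, cache_pin(), cache_unpin() and cache_free() are hypothetical names, and a pthread mutex stands in for whatever lock the real object would use.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the striping cached in a lod_object. */
struct striping_cache {
	pthread_mutex_t lock;    /* protects all fields below */
	int            *stripes; /* cached striping, NULL when not loaded */
	int             count;   /* number of cached stripes */
	int             pins;    /* users currently relying on the cache */
};

void cache_init(struct striping_cache *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->stripes = NULL;
	c->count = 0;
	c->pins = 0;
}

/* Load the striping if it is not cached yet, then pin it.
 * Returns 0 on success or -ENOMEM if the load fails. */
int cache_pin(struct striping_cache *c, int count)
{
	int rc = 0;

	pthread_mutex_lock(&c->lock);
	if (c->stripes == NULL) {
		c->stripes = calloc(count, sizeof(*c->stripes));
		if (c->stripes == NULL)
			rc = -ENOMEM;
		else
			c->count = count;
	}
	if (rc == 0)
		c->pins++;
	pthread_mutex_unlock(&c->lock);
	return rc;
}

/* Drop one pin taken by cache_pin(). */
void cache_unpin(struct striping_cache *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->pins > 0);
	c->pins--;
	pthread_mutex_unlock(&c->lock);
}

/* Free the cached striping unless somebody still has it pinned.
 * Returns 0 on success or -EBUSY while pins are outstanding. */
int cache_free(struct striping_cache *c)
{
	int rc = 0;

	pthread_mutex_lock(&c->lock);
	if (c->pins > 0) {
		rc = -EBUSY;
	} else {
		free(c->stripes);
		c->stripes = NULL;
		c->count = 0;
	}
	pthread_mutex_unlock(&c->lock);
	return rc;
}
```

      In this sketch a caller pins the cache before using it and unpins afterwards; a concurrent free sees -EBUSY and has to retry or defer instead of releasing memory that is still in use.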

      2. default striping cache

      In lod_ah_init(), the default striping (both default LOV and LMV) of the parent directory is loaded into the LOD cache (see lod_object), then propagated to the child file/directory being created (the child's lod_object). Because multiple lod_ah_init() calls against the same parent can run in parallel, and the parent's default striping can change at the same time, we probably need to protect the whole lod_ah_init() with dt_read/write_lock(parent).

          Activity

            niu Niu Yawei (Inactive) added a comment - The second problem has been addressed in LU-7660: http://review.whamcloud.com/#/c/19041 (in the way Alex suggested). I agree that the same approach can be applied to the first problem, but that requires many more code changes, so let's fix it later on top of the LU-7660 code base.

            bzzz Alex Zhuravlev added a comment - Yes, that's a correct description. This approach can probably be taken for the first problem as well; it would help to avoid another problem where one LOD object pins the same (or an even larger) number of OSP objects.

            niu Niu Yawei (Inactive) added a comment -
            OSD can cache the LOV EA so that getting the default from OSD on every creation doesn't imply a performance penalty. All the locking needed to protect EA access in OSD is already in place, so LOD wouldn't need anything. Plus, this would solve the cache invalidation issue we have with slave directories.

            Alex, do you mean that for the second problem (default striping cache), in lod_ah_init(), we can read and parse the default LOV/LMV EA cached in the OSD layer into a per-thread temporary buffer in LOD, then use the buffer without locking?

            bzzz Alex Zhuravlev added a comment - As I have mentioned a few times, it makes sense to look at different options:
            1) do not cache in LOD;
            2) at declare time, do not create the striping in the object; instead, create a temporary structure in LOD's thandle and instantiate it at execution.
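
            Option 2 can be pictured with a small user-space model: the declare step builds the striping in a per-transaction structure, and only the execute step attaches it to the object. Everything below is a hypothetical sketch (struct txn stands in for LOD's thandle, struct obj for the object; none of these names come from Lustre).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object: stripes stays NULL until a transaction executes. */
struct obj {
	int *stripes;
	int  count;
};

/* Hypothetical stand-in for LOD's thandle. */
struct txn {
	int *pending;       /* striping prepared at declare time */
	int  pending_count;
};

/* Declare: prepare the striping in the transaction only.
 * Returns 0, -EEXIST if already declared, or -ENOMEM. */
int declare_striping(struct txn *th, int count)
{
	int i;

	if (th->pending != NULL)
		return -EEXIST;
	th->pending = malloc(count * sizeof(*th->pending));
	if (th->pending == NULL)
		return -ENOMEM;
	for (i = 0; i < count; i++)
		th->pending[i] = i; /* placeholder stripe index */
	th->pending_count = count;
	return 0;
}

/* Execute: move the prepared striping from the transaction to the object.
 * Returns 0, -EINVAL if nothing was declared, or -EEXIST if the object
 * already has a striping. */
int execute_striping(struct txn *th, struct obj *o)
{
	if (th->pending == NULL)
		return -EINVAL;
	if (o->stripes != NULL)
		return -EEXIST;
	o->stripes = th->pending;
	o->count = th->pending_count;
	th->pending = NULL;
	th->pending_count = 0;
	return 0;
}

/* Stop: drop whatever was declared but never executed. */
void txn_stop(struct txn *th)
{
	free(th->pending);
	th->pending = NULL;
	th->pending_count = 0;
}
```

            With this split, a transaction that is declared but aborted leaves the object untouched, and the object never exposes a striping that has not actually been created.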

            People

              wc-triage WC Triage
              niu Niu Yawei (Inactive)