Lustre / LU-822

allow multiple Object Index files to be created

Details

    • New Feature
    • Resolution: Fixed
    • Minor
    • Lustre 2.2.0

    Description

      Per discussion in email, having multiple Object Index (OI) files significantly improves MDS performance. Initial discussion suggests that the single-OI performance problem is tied more closely to concurrent access to the OI than to the number of entries in it.

      While I agree it is a good idea to also investigate why the single OI has a concurrency problem, it also makes sense to be able to add multiple OIs for testing, and potentially for production use. Care must be taken to ensure that there are no compatibility issues introduced if the new OSD can handle multiple OIs, but the filesystem was formatted with only a single OI and upgraded, or if it was upgraded and then downgraded. Lustre does not support formatting the filesystem with a new version of Lustre and then downgrading to an older version.

      An OSD with multiple OI files will use the filenames oi.[0..OSD_OI_FID_NR-1]. We can start with OSD_OI_FID_NR=32 for now, but this should be flexible. Selecting oi.N with N = (SEQ % OSD_OI_FID_NR) will put FIDs from a single client in a single OI (for locality during single-threaded creates) and, because FID SEQ values are allocated sequentially, will distribute multiple clients across the OIs fairly uniformly to minimize contention.
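The mapping from a FID sequence to an OI file described above can be sketched in C roughly as follows. This is only an illustration: the helper names oi_index_for_seq() and oi_name_for_seq() are hypothetical and not taken from the Lustre source, and OSD_OI_FID_NR is fixed at the value of 32 suggested in this ticket.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed starting value from the description; meant to stay flexible. */
#define OSD_OI_FID_NR 32

/* Hypothetical helper: OI index N for a given FID sequence number. */
static unsigned int oi_index_for_seq(uint64_t seq)
{
	return (unsigned int)(seq % OSD_OI_FID_NR);
}

/* Hypothetical helper: write the OI filename "oi.N" for a sequence into buf. */
static int oi_name_for_seq(uint64_t seq, char *buf, size_t len)
{
	return snprintf(buf, len, "oi.%u", oi_index_for_seq(seq));
}
```

Since every FID from one client shares a SEQ value, all of them resolve to the same oi.N, while consecutively allocated SEQ values land in consecutive OI files.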

      I am aware that "oi.16" previously had a different meaning, namely that oi.16 was the size of the FID, not an OI index. However, the previous use of "oi.5" was a benchmark hack that only worked for the first 32k SEQ values ever allocated and was fortunately removed before Lustre 2.0 was released, so we are able to safely redefine what the ".16" suffix means.

      In order to address these issues, I would recommend proceeding as follows:

      • new 2.x filesystems: When the filesystem is mounted, the OSD should iterate over the root directory to locate oi.N files. If none are found (new filesystem, or oi.16 was deleted) then it should set the in-memory per-OSD values osd_oi_start = 0 and osd_oi_seq_mask = (OSD_OI_FID_NR - 1), create the files oi.[osd_oi_start..osd_oi_start+osd_oi_seq_mask], and use those for the OIs.
      • existing filesystems: When the filesystem is mounted and iteration over the root directory finds only oi.16 (the upgrade case), that file should be used as the only OI for that OSD, with osd_oi_start = 16 and osd_oi_seq_mask = 0. If multiple oi.N files exist, the OSD will continue to open all oi.N files and set osd_oi_start = 0 and osd_oi_seq_mask to the smallest (2^n - 1) that is >= the highest N found, to handle the case where OSD_OI_FID_NR is changed or made dynamic. We need OSD_OI_FID_NR >= 32 to ensure that it will always cover the range with oi.16 in it.
      • access to the OI in the rest of the code should use OI index N=((seq & osd_oi_seq_mask) + osd_oi_start)
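The three rules above could be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the actual OSD code: struct osd_oi_layout, osd_oi_layout_init(), osd_oi_index(), and OSD_OI_LEGACY are hypothetical names, the root-directory scan is assumed to have already produced a found[] array (found[i] is true if oi.i exists), and file creation and lfsck handling of missing files are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define OSD_OI_FID_NR 32	/* assumed default from the description */
#define OSD_OI_LEGACY 16	/* hypothetical name: index of the legacy oi.16 file */

/* Hypothetical container for the two in-memory per-OSD values. */
struct osd_oi_layout {
	unsigned int osd_oi_start;
	unsigned int osd_oi_seq_mask;
};

/* Choose start and mask from the oi.N files seen at mount time. */
static void osd_oi_layout_init(const bool *found, unsigned int nfound,
			       struct osd_oi_layout *l)
{
	unsigned int count = 0, highest = 0, mask = 0, i;

	for (i = 0; i < nfound; i++) {
		if (found[i]) {
			count++;
			highest = i;
		}
	}

	if (count == 0) {
		/* New filesystem: oi.0 .. oi.(OSD_OI_FID_NR - 1) get created. */
		l->osd_oi_start = 0;
		l->osd_oi_seq_mask = OSD_OI_FID_NR - 1;
	} else if (count == 1 && highest == OSD_OI_LEGACY) {
		/* Upgrade case: oi.16 is the only OI for this OSD. */
		l->osd_oi_start = OSD_OI_LEGACY;
		l->osd_oi_seq_mask = 0;
	} else {
		/* Multiple OIs: smallest (2^n - 1) that covers the highest N. */
		while (mask < highest)
			mask = (mask << 1) | 1;
		l->osd_oi_start = 0;
		l->osd_oi_seq_mask = mask;
	}
}

/* OI index N = ((seq & osd_oi_seq_mask) + osd_oi_start). */
static unsigned int osd_oi_index(const struct osd_oi_layout *l, uint64_t seq)
{
	return (unsigned int)(seq & l->osd_oi_seq_mask) + l->osd_oi_start;
}
```

With a mask of 0 and a start of 16, every sequence resolves to oi.16, so the rest of the code can use the same lookup on an upgraded filesystem as on a newly formatted one.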

      As part of lfsck Phase I, if any oi.N is missing (except in the special case of only oi.16 existing), it should be recreated and lfsck triggered to do a full OI scrub/rebuild (the OI count may be completely transparent to lfsck, I'm not sure yet).

            People

              liang Liang Zhen (Inactive)
              adilger Andreas Dilger