Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7225

change ladvise wire protocol for lockahead and future usage

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0
    • None
    • None
    • 9223372036854775807

    Description

      As a continuation of LU-4931 "New feature of giving server/storage side advice of accessing file" and LU-4865 "osd-zfs: increase object block size dynamically as object grows" it is useful to be able to specify the ZFS blocksize for a file directly from the client, so that the OSS doesn't have to guess at this itself.

      This can be done via the "ladvise" functionality. One option is to pass the blocksize (a power-of-two value) along with the RANDOM hint on a newly-created file. This could use 8 bits of lla_padding to store the log2 blocksize, giving a maximum blocksize up to 2^255 bytes.

      struct lu_ladvise {
      	__u64 lla_advice;
      	__u64 lla_start;
      	__u64 lla_end;
              __u8  lla_blockbits;
      	__u8  lla_padding1;
      	__u16 lla_padding2;
      	__u32 lla_padding3;
      };
      

      It isn't clear if there are valid use cases for non-power-of-two RANDOM IO hints at the server, which would require using a different encoding (e.g. some multiple of a blocksize), since that could also be handled at the client by sending the underlying power-of-two blocksize.

      Attachments

        Issue Links

          Activity

            [LU-7225] change ladvise wire protocol for lockahead and future usage
            pjones Peter Jones added a comment -

            Landed for 2.9

            pjones Peter Jones added a comment - Landed for 2.9

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20666/
            Subject: LU-7225 llite: ladvise protocol changes
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 88fcc76b8dcd6febf780ad0503d725283e3c45c4

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20666/ Subject: LU-7225 llite: ladvise protocol changes Project: fs/lustre-release Branch: master Current Patch Set: Commit: 88fcc76b8dcd6febf780ad0503d725283e3c45c4

            The above is just a sample, it's not even going to build.

            I will do the work to update lfs, etc, making this a full patch once we've at least mostly agreed on the struct changes.

            paf Patrick Farrell (Inactive) added a comment - - edited The above is just a sample, it's not even going to build. I will do the work to update lfs, etc, making this a full patch once we've at least mostly agreed on the struct changes.

            Patrick Farrell (paf@cray.com) uploaded a new patch: http://review.whamcloud.com/20666
            Subject: LU-7225 llite: ladvise struct proposal
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2e8e328ee1752b3a934d96acee488fb4aa94e5e6

            gerrit Gerrit Updater added a comment - Patrick Farrell (paf@cray.com) uploaded a new patch: http://review.whamcloud.com/20666 Subject: LU-7225 llite: ladvise struct proposal Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 2e8e328ee1752b3a934d96acee488fb4aa94e5e6

            I'll generate a patch with suggested protocol and interface changes. It's going to be the same as in the lock ahead + ladvise patch - which I'm close to posting - but I'd hope to land it first, then rebase the lock ahead patch.

            paf Patrick Farrell (Inactive) added a comment - I'll generate a patch with suggested protocol and interface changes. It's going to be the same as in the lock ahead + ladvise patch - which I'm close to posting - but I'd hope to land it first, then rebase the lock ahead patch.

            I don't think ladvise is under use by any customer now, since we are still testing it. However, any protocol level change should be done in a few weeks, because this feature will be officially released in the near future.

            lixi Li Xi (Inactive) added a comment - I don't think ladvise is under use by any customer now, since we are still testing it. However, any protocol level change should be done in a few weeks, because this feature will be officially released in the near future.

            I was originally thinking of combining "RANDOM" advice with the blocksize of that random IO for a particular range, but I'm not sure that makes sense anymore, since ladvise can combine multiple advices in a single RPC. On the one hand, it should be fine to have a BLOCKSIZE advice with the block size in "start", but that would preclude setting different block sizes for different file ranges (e.g. for PFL files with different storage types at different offsets). I'd be OK with the __u8 lla_blockbits as you propose, or we could just make this a __u8 or __u16 "lla_value16" that can be used by different advice types as needed.

            I guess the same question is true about your lla_flags? Is that generic for all the different advice types, or is it specific for the lockahead advice? If the latter, we might consider making an lla_value32 field that can be used by different advice types instead of reserving the field only for the one type? In general, flags are useful for a variety of things, so it might still make sense to keep it generic, but I'd at least like to know what this is used for.

            adilger Andreas Dilger added a comment - I was originally thinking of combining "RANDOM" advice with the blocksize of that random IO for a particular range, but I'm not sure that makes sense anymore, since ladvise can combine multiple advices in a single RPC. On the one hand, it should be fine to have a BLOCKSIZE advice with the block size in "start", but that would preclude setting different block sizes for different file ranges (e.g. for PFL files with different storage types at different offsets). I'd be OK with the __u8 lla_blockbits as you propose, or we could just make this a __u8 or __u16 "lla_value16" that can be used by different advice types as needed. I guess the same question is true about your lla_flags? Is that generic for all the different advice types, or is it specific for the lockahead advice? If the latter, we might consider making an lla_value32 field that can be used by different advice types instead of reserving the field only for the one type? In general, flags are useful for a variety of things, so it might still make sense to keep it generic, but I'd at least like to know what this is used for.

            Hmm - What about just putting the blockbits in "start"? I mean, for that advice type, just use start as the block bits. I still think we need flags, though.

            paf Patrick Farrell (Inactive) added a comment - - edited Hmm - What about just putting the blockbits in "start"? I mean, for that advice type, just use start as the block bits. I still think we need flags, though.

            Agreed, that does seem very large for the advice...

            Still need the flags (or at least, I feel strongly we should have them - I don't actually need them now, so could do without if necessary), so what about:

            struct lu_ladvise {
            	__u16 lla_advice;
                    __u8   lla_blockbits;
                    __u8   lla_padding1;
                    __u32 lla_flags;
            	__u64 lla_start;
            	__u64 lla_end;
            	__u64 lla_padding2;
            };
            

            Would that meet the alignment requirements you're describing? Or do we need to put the new things at the end (as you did with blockbits above)?

            paf Patrick Farrell (Inactive) added a comment - Agreed, that does seem very large for the advice... Still need the flags (or at least, I feel strongly we should have them - I don't actually need them now, so could do without if necessary), so what about: struct lu_ladvise { __u16 lla_advice; __u8 lla_blockbits; __u8 lla_padding1; __u32 lla_flags; __u64 lla_start; __u64 lla_end; __u64 lla_padding2; }; Would that meet the alignment requirements you're describing? Or do we need to put the new things at the end (as you did with blockbits above)?

            The other question is whether we need a __u64 for the advice type? I'd expect that this would be an enumeration of advice types, so having a __u8 or __u16 would be enough for the lu_advice field and the rest of that __u64 could be changed to other uses.

            In any case, changes to this structure need to be decided promptly, since feature freeze has passed, and we need to make any changes before the code freeze in 8 weeks.

            Li Xi, are there any sites already using the ladvise feature? Are any of them using PPC or other big-endian systems? If no big-endian machines are in use, then it should be safe to change the struct to keep lla_advice first, and re-use the later bytes:

            struct lu_ladvise {
            	__u16 lla_advice;
                    __u16 lla_paddingA;
                    __u32 lla_paddingB;
            	__u64 lla_start;
            	__u64 lla_end;
                    __u8  lla_blockbits;
            	__u8  lla_padding1;
            	__u16 lla_padding2;
            	__u32 lla_padding3;
            };
            
            adilger Andreas Dilger added a comment - The other question is whether we need a __u64 for the advice type? I'd expect that this would be an enumeration of advice types, so having a __u8 or __u16 would be enough for the lu_advice field and the rest of that __u64 could be changed to other uses. In any case, changes to this structure need to be decided promptly, since feature freeze has passed, and we need to make any changes before the code freeze in 8 weeks. Li Xi, are there any sites already using the ladvise feature? Are any of them using PPC or other big-endian systems? If no big-endian machines are in use, then it should be safe to change the struct to keep lla_advice first, and re-use the later bytes: struct lu_ladvise { __u16 lla_advice; __u16 lla_paddingA; __u32 lla_paddingB; __u64 lla_start; __u64 lla_end; __u8 lla_blockbits; __u8 lla_padding1; __u16 lla_padding2; __u32 lla_padding3; };

            People

              wc-triage WC Triage
              adilger Andreas Dilger
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: