Lustre / LU-12332

Add a liblustreapi call for IOC_MDC_GETFILEINFO

Details

    Description

      IOC_MDC_GETFILEINFO is a very convenient ioctl for retrieving metadata (stat + LOV EA v1) from the MDT. It avoids going to the OSTs when you don't need information that is stored there (such as file size).

      It is wrapped by:

       int get_lmd_info_fd(char *path, int parent_fd, int dir_fd,
                           void *lmdbuf, int lmdlen, enum get_lmd_info_type type)
      

      but this function is not exported, and it would be nice to also give it an llapi prefix.

          Activity

            The new llapi was added as part of LU-11367 (in 2.13.0 and 2.12.4).

            The ability to selectively get file attributes (without doing an OST RPC) is added with the statx() API in LU-10934 in 2.14.

            adilger Andreas Dilger added a comment

            > Because stat() will also send RPCs to OSTs and this is what we want to avoid, no?
            Perhaps better phrased as "the cost of a single RPC". My point is that a stat() call gets all the metadata that the MDS holds, both the MD in the inode and the layout, for the cost of a single ldlm_ibits_enqueue call to the MDS. At that point, all the info (except size) is available on the client, without an "open".

            If we filled in the stat size with some lazy data, and interrupted the stat on the client before it tried to get the size from the OSTs, we would have everything that llapi_get_lum_file_fd has, with a single RPC.

            Having said all that:
            > RPCs to OSTs and this is what we want to avoid
            Interestingly: no. The assumption has always been that "stats are slow because they have to go to OSTs", but that doesn't actually seem to be the case. We can get very high stat rates, even going to OSTs. The problem really seems to be RPCs to the MDS, and reducing the MDS RPC count is the actual win. In particular, directory open on the MDS is slow, maybe because of locks, so it's likely (although I haven't tested) that the ioctl is slower than the stat, even including OST RPCs.

            nrutman Nathan Rutman added a comment

            > why can't we get all the info we want for the cost of a single stat()?

             Because stat() will also send RPCs to OSTs and this is what we want to avoid, no?

            degremoa Aurelien Degremont (Inactive) added a comment - edited
            I don't want to hold up the landing of LU-11367, but it's not quite ideal yet.

            llapi_get_lum_file_fd requires an open parent directory FD to send the ioctl to. It looks like this has to be the direct parent given the strrchr call in get_lmd_info_fd, rather than a true "path relative to the parent fd", which might allow us to issue the ioctl on the FS root in all cases, and avoid opening the parent dirs.
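            To illustrate why a strrchr-based split ties the call to the direct parent, here is a minimal sketch. It is not the actual get_lmd_info_fd code; split_parent_and_name is a hypothetical helper. Only the final path component survives the split, so the directory FD that receives the ioctl has to be the directory that contains that component.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: split "path" at its last '/' into a parent
 * directory (copied into "parent") and a final component ("*name"
 * points into "path"). Returns 0 on success, -1 if "parent" is too small.
 */
int split_parent_and_name(const char *path, char *parent,
                          size_t parent_len, const char **name)
{
    assert(path != NULL && parent != NULL && name != NULL);

    const char *slash = strrchr(path, '/');

    if (slash == NULL) {
        /* No directory part: the parent is the current directory. */
        if (parent_len < 2)
            return -1;
        strcpy(parent, ".");
        *name = path;
        return 0;
    }

    /* Keep the leading '/' when the file sits directly under the root. */
    size_t dir_len = (slash == path) ? 1 : (size_t)(slash - path);

    if (dir_len >= parent_len)
        return -1;
    memcpy(parent, path, dir_len);
    parent[dir_len] = '\0';

    /* Everything after the last '/' is the name passed to the ioctl. */
    *name = slash + 1;
    return 0;
}
```

            For "/mnt/lustre/dir/file" this yields parent "/mnt/lustre/dir" and name "file"; an FD opened on "/mnt/lustre" could not be used with that name.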

            Since both stat() and getfattr() can be issued against the path with no opens, it seems like the info should be obtainable with even fewer RPCs. For example, llapi_get_lum_file could, instead of opening the parent dir and calling get_lum_file_fd, just call stat and getfattr lustre.lov, and avoid the open. Of course, stat and getfattr are two calls - but in reality we know that the stat() has already retrieved the layout info (because it needs to talk to the OSTs for size). So this is my long-winded way of asking: why can't we get all the info we want for the cost of a single stat()?

            nrutman Nathan Rutman added a comment - edited

            This ticket seems to be addressed by the LU-11367 patch:
            https://review.whamcloud.com/#/c/35167/

            int llapi_get_lum_file_fd(int dir_fd, const char *fname, __u64 *valid,
                                      lstatx_t *statx, struct lov_user_md *lum,
                                      size_t lumsize);

            nrutman Nathan Rutman added a comment

            Actually, I was also thinking about adding a call that is not exactly get_lmd_info_fd(), and taking this opportunity to improve it a little. Your proposal makes sense.

            Enabling this call to return more information in one RPC is interesting. Tools like Robinhood can benefit from a call that gives them everything (stats, layout, ...) in one RPC.

            So, whatever this call returns with one RPC: the more, the better.

            degremoa Aurelien Degremont (Inactive) added a comment

            People

              wc-triage WC Triage
              degremoa Aurelien Degremont (Inactive)