Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3381

lock enqueue fails for open(".lustre/fid", ...)

    XMLWordPrintable

Details

    • Bug
    • Resolution: Won't Fix
    • Major
    • Lustre 2.4.0
    • Lustre 2.4.0
    • Lustre 2.4 client

    Description

      While watching the performance reported by Robinhood while it was scanning a test filesystem with 24 million files I noticed something strange. Performance was gradually decreasing over time because the STAGE_GET_INFO_FS stage was taking longer and longer. That's not good.

      May 22 13:31:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   1 | Done:   0 | Total:  20898 | ms/op: 1.00
      May 22 13:32:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   0 | Done:   0 | Total:  54724 | ms/op: 0.49
      May 22 13:33:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   0 | Done:   0 | Total:  95518 | ms/op: 0.36
      May 22 13:34:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   0 | Done:   0 | Total: 133057 | ms/op: 0.31
      May 22 13:35:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   0 | Done:   0 | Total: 141859 | ms/op: 0.31
      May 22 13:36:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   0 | Done:   0 | Total: 173883 | ms/op: 0.29
      May 22 13:37:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   0 | Total: 210361 | ms/op: 0.58
      May 22 13:38:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   3 | Done:   0 | Total: 262009 | ms/op: 1.49
      May 22 13:39:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   6 | Done:   0 | Total: 316537 | ms/op: 2.07
      May 22 13:40:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   4 | Done:   1 | Total: 373078 | ms/op: 2.47
      May 22 13:41:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   0 | Total: 432204 | ms/op: 2.74
      May 22 13:42:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   6 | Done:   0 | Total: 492933 | ms/op: 2.93
      May 22 13:43:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   2 | Done:   0 | Total: 554324 | ms/op: 3.10
      May 22 13:44:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   7 | Done:   0 | Total: 612107 | ms/op: 3.25
      May 22 13:45:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   6 | Done:   0 | Total: 665969 | ms/op: 3.40
      May 22 13:46:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   0 | Total: 720286 | ms/op: 3.52
      May 22 13:47:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   6 | Done:   0 | Total: 776099 | ms/op: 3.62
      May 22 13:48:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   6 | Done:   1 | Total: 834779 | ms/op: 3.72
      May 22 13:49:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   0 | Total: 879068 | ms/op: 3.90
      May 22 13:50:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   4 | Done:   0 | Total: 932166 | ms/op: 4.01
      May 22 13:51:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   0 | Total: 986457 | ms/op: 4.11
      May 22 13:52:48 zeno66 robinhood[12124]: STATS |  2: STAGE_GET_INFO_FS    | Wait:     0 | Curr:   5 | Done:   1 | Total: 1014814 | ms/op: 4.31
      ...
      

      After a little investigation it was clear that the open("./lustre/fid", O_RDONLY) call was getting on average slower and slower. It could take as little as 0.1ms or as long as 15ms from what I observed. The subsequent GETSTRIPE ioctl() was always fast.

      The Lustre debug logs show that all the time is being spent in do_lookup() because there is never a valid dentry on the client. That means for every open we attempt to enqueue a lock for this special file. That lock enqueue fails on the MDT with ELDLM_LOCK_ABORTED but the open still seems to succeed. To make maters worse the client serials the enqueues because they are all IT_OPEN.

      00010000:00010000:0.0:1369264019.850032:0:11631:0:(ldlm_lockd.c:1426:ldlm_handle_enqueue0()) ### server-side enqueue handler, sending reply(err=301, rc=0) ns: mdt-zeno-MDT0000_UUID lock: ffff8802f011a000/0xed969b9b44bc7d27 lrc: 1/0,0 mode: --/CR res: 8589934594/1 bits 0x2 rrc: 2 type: IBT flags: 0x0 nid: 172.19.1.68@o2ib199 remote: 0xe4f855be29d7308d expref: 12 pid: 11631 timeout: 0 lvb_type: 0
      

      To get a handle on what was going on I wrote a trivial reproducer which just opens and closes a specified file repeatedly. If you run just a few iterations and grab the Lustre debug logs you can easily see the failing enqueues.

      I suspect if the enqueues were allowed to succeed this would be a non-issue since we would have a valid read lock on the client. However, I haven't tested that and it's not at all clear to me what you guys are planning to do with that .lustre directory. Perhaps you can propose a fix.

      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <errno.h>
      
      int
      main(int argc, char **argv)
      {
              long int start, last;
              int error, fd, i, j, i_max, j_max;
              char *path;
      
              start = last = time(0);
              path = argv[1];
              i_max = strtol(argv[2], NULL, 10);
              j_max = strtol(argv[3], NULL, 10);
      
              printf("file:  %s\n", path);
              printf("i_max: %d\n", i_max);
              printf("j_max: %d\n", j_max);
      
              for (i = 0; i < i_max; i++) {
                      printf("%4d / %4d - %ld sec total, %ld sec delta\n", i, i_max,
                          time(0) - start, time(0) - last);
                      last = time(0);
      
                      for (j = 0; j < j_max; j++) {
                              fd = open(path, O_RDONLY);
                              if (fd == -1) {
                                      printf("open() = %d\n", errno);
                                      return (1);
                              }
      
                              error = close(fd);
                              if (error) {
                                      printf("close() = %d\n", errno);
                                      return (1);
                              }
                      }
              }
      
              return (0);
      }
      

      Attachments

        Issue Links

          Activity

            People

              jhammond John Hammond
              behlendorf Brian Behlendorf
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: