Details
- Bug
- Resolution: Won't Fix
- Major
- Lustre 2.4.0
- Lustre 2.4 client
- 3
- 8372
Description
While watching the performance reported by Robinhood as it scanned a test filesystem with 24 million files, I noticed something strange. Performance was gradually decreasing over time because the STAGE_GET_INFO_FS stage was taking longer and longer. That's not good.
May 22 13:31:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 1 | Done: 0 | Total: 20898 | ms/op: 1.00
May 22 13:32:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 54724 | ms/op: 0.49
May 22 13:33:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 95518 | ms/op: 0.36
May 22 13:34:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 133057 | ms/op: 0.31
May 22 13:35:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 141859 | ms/op: 0.31
May 22 13:36:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 173883 | ms/op: 0.29
May 22 13:37:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 0 | Total: 210361 | ms/op: 0.58
May 22 13:38:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 3 | Done: 0 | Total: 262009 | ms/op: 1.49
May 22 13:39:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 6 | Done: 0 | Total: 316537 | ms/op: 2.07
May 22 13:40:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 4 | Done: 1 | Total: 373078 | ms/op: 2.47
May 22 13:41:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 0 | Total: 432204 | ms/op: 2.74
May 22 13:42:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 6 | Done: 0 | Total: 492933 | ms/op: 2.93
May 22 13:43:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 2 | Done: 0 | Total: 554324 | ms/op: 3.10
May 22 13:44:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 7 | Done: 0 | Total: 612107 | ms/op: 3.25
May 22 13:45:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 6 | Done: 0 | Total: 665969 | ms/op: 3.40
May 22 13:46:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 0 | Total: 720286 | ms/op: 3.52
May 22 13:47:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 6 | Done: 0 | Total: 776099 | ms/op: 3.62
May 22 13:48:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 6 | Done: 1 | Total: 834779 | ms/op: 3.72
May 22 13:49:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 0 | Total: 879068 | ms/op: 3.90
May 22 13:50:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 4 | Done: 0 | Total: 932166 | ms/op: 4.01
May 22 13:51:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 0 | Total: 986457 | ms/op: 4.11
May 22 13:52:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 1 | Total: 1014814 | ms/op: 4.31
...
After a little investigation it was clear that the open(".lustre/fid", O_RDONLY) call was getting slower and slower on average. From what I observed it could take as little as 0.1 ms or as long as 15 ms. The subsequent GETSTRIPE ioctl() was always fast.
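For reference, here is a minimal per-call timing sketch (an illustration only, not the tool that produced the numbers above); it times each open() of a given path with clock_gettime() and prints a running average, which makes this kind of gradual slowdown easy to see. The GETSTRIPE ioctl() that Robinhood issues after the open is not included.

/*
 * Illustration only: time each open() of the given path and report a
 * running average every 10000 calls.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6);
}

int
main(int argc, char **argv)
{
	double total_ms = 0.0, t0;
	long count, i;
	int fd;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <path> <count>\n", argv[0]);
		return (1);
	}

	count = strtol(argv[2], NULL, 10);

	for (i = 1; i <= count; i++) {
		t0 = now_ms();
		fd = open(argv[1], O_RDONLY);
		if (fd == -1) {
			printf("open() = %d\n", errno);
			return (1);
		}
		total_ms += now_ms() - t0;
		close(fd);

		if (i % 10000 == 0)
			printf("%ld opens, %.3f ms/open average\n",
			    i, total_ms / i);
	}

	return (0);
}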
The Lustre debug logs show that all the time is being spent in do_lookup() because there is never a valid dentry on the client. That means that for every open we attempt to enqueue a lock for this special file. That lock enqueue fails on the MDT with ELDLM_LOCK_ABORTED, but the open still seems to succeed. To make matters worse, the client serializes the enqueues because they are all IT_OPEN.
00010000:00010000:0.0:1369264019.850032:0:11631:0:(ldlm_lockd.c:1426:ldlm_handle_enqueue0()) ### server-side enqueue handler, sending reply(err=301, rc=0) ns: mdt-zeno-MDT0000_UUID lock: ffff8802f011a000/0xed969b9b44bc7d27 lrc: 1/0,0 mode: --/CR res: 8589934594/1 bits 0x2 rrc: 2 type: IBT flags: 0x0 nid: 172.19.1.68@o2ib199 remote: 0xe4f855be29d7308d expref: 12 pid: 11631 timeout: 0 lvb_type: 0
To get a handle on what was going on I wrote a trivial reproducer which just opens and closes a specified file repeatedly. If you run just a few iterations and grab the Lustre debug logs you can easily see the failing enqueues.
I suspect if the enqueues were allowed to succeed this would be a non-issue since we would have a valid read lock on the client. However, I haven't tested that and it's not at all clear to me what you guys are planning to do with that .lustre directory. Perhaps you can propose a fix.
/*
 * Trivial reproducer: repeatedly open() and close() the given file,
 * printing elapsed time after every block of j_max iterations.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

int
main(int argc, char **argv)
{
	long int start, last;
	int error, fd, i, j, i_max, j_max;
	char *path;

	start = last = time(0);
	path = argv[1];
	i_max = strtol(argv[2], NULL, 10);
	j_max = strtol(argv[3], NULL, 10);

	printf("file: %s\n", path);
	printf("i_max: %d\n", i_max);
	printf("j_max: %d\n", j_max);

	for (i = 0; i < i_max; i++) {
		printf("%4d / %4d - %ld sec total, %ld sec delta\n",
		    i, i_max, time(0) - start, time(0) - last);
		last = time(0);

		for (j = 0; j < j_max; j++) {
			fd = open(path, O_RDONLY);
			if (fd == -1) {
				printf("open() = %d\n", errno);
				return (1);
			}

			error = close(fd);
			if (error) {
				printf("close() = %d\n", errno);
				return (1);
			}
		}
	}

	return (0);
}
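The reproducer takes three arguments: the file to open, an outer iteration count (i_max), and an inner open/close count per iteration (j_max). An invocation would look something like ./openclose /mnt/lustre/.lustre/fid/<fid> 100 10000 (the binary name, mount point, and counts here are placeholders); the per-iteration delta it prints should grow as the opens slow down.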
Issue Links
- duplicates LU-2613: opening and closing file can generate 'unreclaimable slab' space (Resolved)