Details
- Type: Bug
- Resolution: Won't Fix
- Priority: Major
- Affects Version: Lustre 2.4.0
- Environment: Lustre 2.4 client
Description
While watching the performance reported by Robinhood while it was scanning a test filesystem with 24 million files I noticed something strange. Performance was gradually decreasing over time because the STAGE_GET_INFO_FS stage was taking longer and longer. That's not good.
May 22 13:31:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 1 | Done: 0 | Total: 20898 | ms/op: 1.00
May 22 13:32:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 54724 | ms/op: 0.49
May 22 13:33:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 95518 | ms/op: 0.36
May 22 13:34:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 133057 | ms/op: 0.31
May 22 13:35:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 141859 | ms/op: 0.31
May 22 13:36:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 173883 | ms/op: 0.29
May 22 13:37:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 0 | Total: 210361 | ms/op: 0.58
May 22 13:38:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 3 | Done: 0 | Total: 262009 | ms/op: 1.49
May 22 13:39:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 6 | Done: 0 | Total: 316537 | ms/op: 2.07
May 22 13:40:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 4 | Done: 1 | Total: 373078 | ms/op: 2.47
May 22 13:41:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 0 | Total: 432204 | ms/op: 2.74
May 22 13:42:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 6 | Done: 0 | Total: 492933 | ms/op: 2.93
May 22 13:43:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 2 | Done: 0 | Total: 554324 | ms/op: 3.10
May 22 13:44:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 7 | Done: 0 | Total: 612107 | ms/op: 3.25
May 22 13:45:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 6 | Done: 0 | Total: 665969 | ms/op: 3.40
May 22 13:46:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 0 | Total: 720286 | ms/op: 3.52
May 22 13:47:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 6 | Done: 0 | Total: 776099 | ms/op: 3.62
May 22 13:48:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 6 | Done: 1 | Total: 834779 | ms/op: 3.72
May 22 13:49:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 0 | Total: 879068 | ms/op: 3.90
May 22 13:50:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 4 | Done: 0 | Total: 932166 | ms/op: 4.01
May 22 13:51:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 0 | Total: 986457 | ms/op: 4.11
May 22 13:52:48 zeno66 robinhood[12124]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 5 | Done: 1 | Total: 1014814 | ms/op: 4.31
...
After a little investigation it was clear that the open("./lustre/fid", O_RDONLY) call was getting on average slower and slower. It could take as little as 0.1ms or as long as 15ms from what I observed. The subsequent GETSTRIPE ioctl() was always fast.
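For reference, here is a minimal sketch of how the per-open() latency could be measured directly. The mount point and iteration count are placeholders, and the clock_gettime() timing is my addition rather than anything from the original report; it only brackets the open() itself, since the subsequent calls were fast.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

/* Return the wall-clock cost of a single open() in microseconds,
 * or -1 if the open() fails. The close() is not timed. */
static long
open_us(const char *path)
{
    struct timespec t0, t1;
    int fd;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    fd = open(path, O_RDONLY);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (fd == -1)
        return (-1);
    close(fd);

    return ((t1.tv_sec - t0.tv_sec) * 1000000L +
            (t1.tv_nsec - t0.tv_nsec) / 1000L);
}

int
main(void)
{
    /* Placeholder path; the report opens the .lustre/fid directory
     * of the mounted filesystem. */
    const char *path = "/mnt/lustre/.lustre/fid";
    int i;

    for (i = 0; i < 10; i++)
        printf("open %2d: %ld us\n", i, open_us(path));

    return (0);
}
On older glibc, linking may require -lrt for clock_gettime().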
The Lustre debug logs show that all the time is being spent in do_lookup() because there is never a valid dentry on the client. That means for every open we attempt to enqueue a lock for this special file. That lock enqueue fails on the MDT with ELDLM_LOCK_ABORTED (err=301 in the log below), yet the open still appears to succeed. To make matters worse, the client serializes the enqueues because they are all IT_OPEN.
00010000:00010000:0.0:1369264019.850032:0:11631:0:(ldlm_lockd.c:1426:ldlm_handle_enqueue0()) ### server-side enqueue handler, sending reply(err=301, rc=0) ns: mdt-zeno-MDT0000_UUID lock: ffff8802f011a000/0xed969b9b44bc7d27 lrc: 1/0,0 mode: --/CR res: 8589934594/1 bits 0x2 rrc: 2 type: IBT flags: 0x0 nid: 172.19.1.68@o2ib199 remote: 0xe4f855be29d7308d expref: 12 pid: 11631 timeout: 0 lvb_type: 0
To get a handle on what was going on I wrote a trivial reproducer which just opens and closes a specified file repeatedly. If you run just a few iterations and grab the Lustre debug logs you can easily see the failing enqueues.
I suspect if the enqueues were allowed to succeed this would be a non-issue since we would have a valid read lock on the client. However, I haven't tested that and it's not at all clear to me what you guys are planning to do with that .lustre directory. Perhaps you can propose a fix.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <time.h>

int
main(int argc, char **argv)
{
    time_t start, last;
    int error, fd, i, j, i_max, j_max;
    char *path;

    if (argc != 4) {
        printf("usage: %s <path> <i_max> <j_max>\n", argv[0]);
        return (1);
    }

    start = last = time(0);
    path = argv[1];
    i_max = strtol(argv[2], NULL, 10);
    j_max = strtol(argv[3], NULL, 10);

    printf("file: %s\n", path);
    printf("i_max: %d\n", i_max);
    printf("j_max: %d\n", j_max);

    for (i = 0; i < i_max; i++) {
        /* Report cumulative and per-iteration wall clock time. */
        printf("%4d / %4d - %ld sec total, %ld sec delta\n", i, i_max,
               (long)(time(0) - start), (long)(time(0) - last));
        last = time(0);

        /* Each inner iteration is a bare open()/close() pair. */
        for (j = 0; j < j_max; j++) {
            fd = open(path, O_RDONLY);
            if (fd == -1) {
                printf("open() = %d\n", errno);
                return (1);
            }

            error = close(fd);
            if (error) {
                printf("close() = %d\n", errno);
                return (1);
            }
        }
    }

    return (0);
}
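As a rough usage sketch (the path, counts, and lctl invocations here are my assumptions, not from the original report): build it with an ordinary C compiler, e.g. gcc -o openclose openclose.c, point it at the .lustre/fid directory of a mounted client, and run a handful of iterations with the dlmtrace debug flag enabled (lctl set_param debug=+dlmtrace) so the failing enqueues show up when the debug log is dumped with lctl dk.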
Issue Links
- duplicates LU-2613: opening and closing file can generate 'unreclaimable slab' space (Resolved)