[LU-3849] Client cache directory information is not updated Created: 28/Aug/13 Updated: 16/Oct/13 Resolved: 11/Oct/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Sebastien Buisson (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9968 | ||||||||
| Description |
|
Hi, On a customer cluster, we noticed that Lustre client cache directory information is not updated when directory owner or access mode changes. If a user tries to change owner/group or directory mode and then looks for the change from another node, the change is not visible. Moreover, the more we look, the more the cache update is delayed. We were suspecting troubles related to This problem is easy to reproduce: Node 2 : add directory in node2 cache Node 1 : update directory Node 2 : look for the change Thanks, |
| Comments |
| Comment by Peter Jones [ 28/Aug/13 ] |
|
Lai Could you please advise on this one? Thanks Peter |
| Comment by Lai Siyao [ 29/Aug/13 ] |
|
This issue should have existed for quite some time:
But CR and PW mode are compatible, so the first lock is not canceled, and client get old stats. Changing setattr lock mode from PW to EX can fix this, I'll commit a fix later. |
| Comment by Alexey Lyashkov [ 29/Aug/13 ] |
|
dup of |
| Comment by Sebastien Buisson (Inactive) [ 29/Aug/13 ] |
|
Alex, You mean patch at http://review.whamcloud.com/6460 will fix this issue? |
| Comment by Alexey Lyashkov [ 29/Aug/13 ] |
|
i'm not a fully sure, but that patch disable matching a some locks's locally, a specially client will skip open lock if they need attributes for an inode (http://review.whamcloud.com/#/c/6460/3/lustre/mdc/mdc_locks.c). also readdir isn't create a lock to match to get an valid attributes, so i suggest to ask you to test that patch before Liang will create a new patch. |
| Comment by Patrick Valentin (Inactive) [ 30/Aug/13 ] |
|
I have reproduced this error on lustre 2.1.6 and lustre 2.4.0. I have backported this patch on lustre 2.4 and I will test it next week. I have made some additional tests, and I have noticed that the information printed by "ls" is correct if we specify the directory. But if we issue a chmod/chown/chgrp/mkdir/rmdir on "node 1", and then a "ls" on "node 2", without specifying a "directory name", and whatever option is specified, then the inode information printed does not reflect the last change made on "node 1". Moreover, if we issue a second "ls" command, but this time adding "." or "$PWD", the printed inode information is still bad. Looking at "ls" source code, it appears that the code starts with an "opendir()" and a loop of "readdir()" on the current directory, when no file or directory is specified. If a name is specified, then the command starts by "stat()" on this name. I wrote the following simple command, allowing to show that the culprit is readdir(). The command first issues an opendir() of "." and a loop of readdir()/printf() , but only if an argument is specified. #include <sys/types.h>
#include <errno.h>
#include <sys/stat.h>
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
main(int argc, char **argv)
{
struct stat dir_stat;
DIR *dirp;
struct dirent *next;
if (argc > 1) {
dirp = opendir(".");
if (! dirp) {
perror("opendir: ");
exit(1);
}
while (next = readdir(dirp)) {
printf("%s ", next->d_name);
}
printf("\n");
closedir(dirp);
}
if (stat(".", &dir_stat) < 0) {
perror("stat: ");
exit(1);
}
printf("st_nlink %d, st_mode 0%ho, st_uid 0x%hx, st_gid 0x%hx\n", dir_stat.st_nlink, dir_stat.st_mode, dir_stat.st_uid, dir_stat.st_gid);
exit(0);
}
To reproduce the error, you have to run the chmod/chown/chgrp/mkdir/rmdir commands on "node 1", and after each one, run "my_stat --" on "node 2". Now re-issue one of the commands on "node 1", and run "my_stat" without argument on "node 2". All the changes made previously are now visible. I have also checked that the problem described in for i in `seq 1 10`; do
mkdir d$i
my_stat --
done
mkdir last
my_stat
During the loop, the st_nlink is not correct (always 2, instead of 3, 4, 5, ...). |
| Comment by Lai Siyao [ 03/Sep/13 ] |
|
In my test I found another issue:
|
| Comment by Alexey Lyashkov [ 03/Sep/13 ] |
|
Lyang, problem is - ll_get_dir_page take a lock but don't take a intent body so don't update inode with info from server. > During the loop, the st_nlink is not correct (always 2, instead of 3, 4, 5, ...). |
| Comment by Patrick Valentin (Inactive) [ 05/Sep/13 ] |
|
Alexey, Lai, |
| Comment by Alexey Lyashkov [ 05/Sep/13 ] |
|
Thanks to testing! |
| Comment by Christopher Morrone [ 06/Sep/13 ] |
|
Alexey, the patch in question has a -1 code review which will need to be resolved (additional explanation, code fix, whatever is needed) before before landing can be expected. |
| Comment by Alexey Lyashkov [ 07/Sep/13 ] |
|
Christopher, I think we need to choose a different reviewer because questions is very strange and can't be implemented without MDT rewrite (a specially mdt_getattr_name_lock don't able to return lock without look to parent). |
| Comment by Nathaniel Clark [ 18/Sep/13 ] |
|
Alexey, Can you respond to Fan Young's comments in gerrit and maybe we can find a way to move forward with the patch? |
| Comment by Andreas Dilger [ 11/Oct/13 ] |
|
This problem is very similar to |