[LU-3026] Failure on test suite sanity-benchmark test_iozone Created: 25/Mar/13 Updated: 09/Apr/13 Resolved: 09/Apr/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB | ||
| Severity: | 3 |
| Rank (Obsolete): | 7389 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/898c8c02-94ba-11e2-93c6-52540035b04c. The sub-test test_iozone failed with the following error:
I will try to reproduce this issue manually to get more information |
| Comments |
| Comment by Keith Mannthey (Inactive) [ 26/Mar/13 ] |
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
random random bkwd record stride
KB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread
3845408 512
Sanity check failed. Do not deploy this filesystem in a production environment !
It seems iozone detected some stability issue in the FS. 5.00% of last 100 executions |
| Comment by Peter Jones [ 28/Mar/13 ] |
|
Minh, could you please see if you are able to reproduce this and provide more data? Thanks, Peter |
| Comment by Sarah Liu [ 28/Mar/13 ] |
|
Also seen in DNE testing (1 MDS / 2 MDTs). |
| Comment by Keith Mannthey (Inactive) [ 29/Mar/13 ] |
|
U-3060 ("After upgrade from 1.8.9 to 2.4, hit FAIL: iozone did not fail with EDQUOT") looks to be the same issue in a different context? |
| Comment by Minh Diep [ 29/Mar/13 ] |
|
Initially, this looks like |
| Comment by Minh Diep [ 30/Mar/13 ] |
|
I've done more runs and found that iozone only fails in the auster run, which uses sanityusr to run it. It passes when iozone is run as root. A look at the iozone source showed that ftruncate on a 0-byte file failed. I wrote the little program below to confirm this. If we run the executable as non-root, it fails with "permission denied". This seems like a layout lock issue.
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        int file_descriptor;
        char fn[] = "write.file";
        struct stat st;

        /* Mode 0: the file is created with no permission bits set. */
        if ((file_descriptor = open(fn, O_CREAT | O_WRONLY, 0)) < 0) {
                perror("open() error");
        } else {
                /* This is the call that fails for a non-root user. */
                if (ftruncate(file_descriptor, 0) != 0) {
                        perror("ftruncate() error");
                } else {
                        fstat(file_descriptor, &st);
                        printf("the file has %ld bytes\n", (long) st.st_size);
                }
                close(file_descriptor);
        }
        return 0;
}
|
| Comment by Jinshan Xiong (Inactive) [ 01/Apr/13 ] |
|
It turns out this is a permission-check problem on the MDT: __mdd_permission_internal() returns -EACCES because the file was created without its access mode being set correctly. We can fix this by bypassing the permission check on the MDT if the check already passed against the file handle on the client side. |
| Comment by Minh Diep [ 03/Apr/13 ] |
|
I have tracked this down to https://build.whamcloud.com/job/lustre-master/1334/, which is where iozone starts breaking. |
| Comment by Jinshan Xiong (Inactive) [ 03/Apr/13 ] |
|
The patch is here: http://review.whamcloud.com/5924. The problem was introduced by bb68e4c1. |
| Comment by Minh Diep [ 03/Apr/13 ] |
|
I verified that the patch worked on iozone |
| Comment by Alex Zhuravlev [ 05/Apr/13 ] |
|
> It turns out this is a problem of permission check on the MDT where __mdd_permission_internal() returns -EACCES because the file was created without access mode setting correctly.
"Was created without access mode setting correctly"? Please clarify this. |
| Comment by Jinshan Xiong (Inactive) [ 08/Apr/13 ] |
The reproducer program is above; please try it. |
| Comment by Alex Zhuravlev [ 08/Apr/13 ] |
|
well, I was rather asking for a better explanation.. |
| Comment by Jinshan Xiong (Inactive) [ 08/Apr/13 ] |
|
Ah, I thought you were questioning how the file was created. In the VFS, ftruncate() does its permission check against the file's f_mode; since the file was created with O_WRONLY, it passes without any problem. However, we do the permission check on the MDT against the inode mode, which is not set at all. Ideally we should pass along the file's mode, but I chose to set MDS_OWNEROVERRIDE, which seems easier. But I don't understand this code in mdd_fix_attr():
        if (la->la_valid & (LA_SIZE | LA_BLOCKS)) {
                if (!((flags & MDS_OWNEROVERRIDE) &&
                      (uc->uc_fsuid == tmp_la->la_uid)) &&
                    !(flags & MDS_PERM_BYPASS)) {
                        rc = mdd_permission_internal(env, obj, tmp_la, MAY_WRITE);
                        if (rc != 0)
                                RETURN(rc);
                }
        }
Why does it check permission only if size and blocks are going to be changed? |
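For readers following along, the bypass condition quoted above can be modelled as a small standalone predicate. This is only a sketch under stated assumptions, not the Lustre implementation: the SKETCH_* flag values, the function names, and the owner-only mode check are all hypothetical stand-ins for MDS_OWNEROVERRIDE, MDS_PERM_BYPASS, and mdd_permission_internal().

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical flag values for illustration only; the real
 * MDS_OWNEROVERRIDE and MDS_PERM_BYPASS are defined in the Lustre
 * headers and may differ. */
#define SKETCH_OWNEROVERRIDE 0x1u
#define SKETCH_PERM_BYPASS   0x2u

/* Simplified stand-in for the MDT write check: it looks only at the
 * owner write bit of the inode mode and ignores group, other, ACLs
 * and capabilities. */
int sketch_owner_may_write(mode_t inode_mode)
{
        return (inode_mode & S_IWUSR) ? 0 : -EACCES;
}

/* Mirrors the shape of the quoted condition: the mode check is skipped
 * when the caller owns the file and sets the owner-override flag, or
 * when the bypass flag is set. Otherwise the inode mode decides. */
int sketch_size_change_check(unsigned int flags, uid_t fsuid,
                             uid_t owner_uid, mode_t inode_mode)
{
        int owner_override = (flags & SKETCH_OWNEROVERRIDE) &&
                             fsuid == owner_uid;

        if (!owner_override && !(flags & SKETCH_PERM_BYPASS))
                return sketch_owner_may_write(inode_mode);
        return 0;
}
```

With an inode mode of 0 and no flags, the sketch returns -EACCES, which matches the failure seen by the reproducer. With the owner-override flag set and a matching uid, it returns 0.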
| Comment by Alex Zhuravlev [ 08/Apr/13 ] |
|
Are you saying iozone changes the file's rights or its own uid/gid once the file is open? If so, then the patch should be OK. Given that we do not pass the open handle to setattr, passing f_mode to the MDS is not any better than MDS_OWNEROVERRIDE, just more code. The code pasted above seems to be solving this specific problem: letting a process that passed the permission checks in open modify the file even if the actual rights have since changed. It would be good to mention this case in the code or in the patch. |
| Comment by Jinshan Xiong (Inactive) [ 08/Apr/13 ] |
No, the mode is not changed after the file is opened. The file is created with O_WRONLY but the mode is not set in open(2).
To be honest, I don't think it can pass the permission check in any case. However, changing rights after opening the file is a tricky case, so I'm okay with the current code. |
| Comment by Alex Zhuravlev [ 08/Apr/13 ] |
|
> No, there is no changing mode after file is opened. The file is created with O_WRONLY but mode is not set in open(2).
Hmm, then why are the current rights not enough to proceed? |
| Comment by Jinshan Xiong (Inactive) [ 08/Apr/13 ] |
|
The file is created w/o mode: [root@jupiter ~]# ls -l /mnt/lustre/ttt/write.file |
| Comment by Alex Zhuravlev [ 08/Apr/13 ] |
|
ah, nice. thanks. then sure, MDS_OWNEROVERRIDE sounds correct. |
| Comment by Alex Zhuravlev [ 08/Apr/13 ] |
|
oh, wait.. but this should apply only to ftruncate()? how do we make sure regular truncate() won't bypass the checks? |
| Comment by Jinshan Xiong (Inactive) [ 08/Apr/13 ] |
|
For truncate(), it won't pass the in-kernel permission check on the client side. |
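The difference between the two calls can be observed on an ordinary local filesystem. The following is a minimal sketch, assuming a writable directory and a hypothetical helper name (sketch_compare_truncate); it does not exercise Lustre itself. It creates a file with mode 0, then tries ftruncate() on the open descriptor and truncate() on the path, recording both results.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with mode 0, opened O_WRONLY,
 * then record the result of ftruncate() on the descriptor and of
 * truncate() on the path. Returns 0 if the file could be created,
 * -1 otherwise. The file is removed before returning. */
int sketch_compare_truncate(const char *path, int *ftruncate_rc,
                            int *truncate_rc, int *truncate_errno)
{
        int fd;

        /* Remove any stale copy so O_EXCL creates a fresh mode-0 file. */
        unlink(path);

        fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0);
        if (fd < 0)
                return -1;

        /* Checked against the descriptor's open mode (O_WRONLY). */
        *ftruncate_rc = ftruncate(fd, 0);

        /* Checked against the inode mode, which has no write bits. */
        errno = 0;
        *truncate_rc = truncate(path, 0);
        *truncate_errno = errno;

        close(fd);
        unlink(path);
        return 0;
}
```

When run as a non-root user, ftruncate() is expected to succeed while truncate() fails with EACCES. As root both calls succeed, because root bypasses the mode check.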
| Comment by Peter Jones [ 09/Apr/13 ] |
|
Landed for 2.4 |