[LU-3026] Failure on test suite sanity-benchmark test_iozone Created: 25/Mar/13  Updated: 09/Apr/13  Resolved: 09/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: HB

Severity: 3
Rank (Obsolete): 7389

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/898c8c02-94ba-11e2-93c6-52540035b04c.

The sub-test test_iozone failed with the following error:

iozone (1) failed

I will try to reproduce this issue manually to get more information



 Comments   
Comment by Keith Mannthey (Inactive) [ 26/Mar/13 ]
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         3845408     512

Sanity check failed. Do not deploy this filesystem in a production environment !

It seems iozone detected some stability issue in the FS. 5.00% of last 100 executions

Comment by Peter Jones [ 28/Mar/13 ]

Minh

Could you please see if you are able to reproduce this and provide more data?

Thanks

Peter

Comment by Sarah Liu [ 28/Mar/13 ]

Also seen in DNE testing (1 MDS / 2 MDTs):
https://maloo.whamcloud.com/test_sets/872b7398-9658-11e2-9abb-52540035b04c

Comment by Keith Mannthey (Inactive) [ 29/Mar/13 ]

LU-3060 After upgrade from 1.8.9 to 2.4, hit FAIL: iozone did not fail with EDQUOT

Looks to be the same issue in a different context?

Comment by Minh Diep [ 29/Mar/13 ]

Initially, this looks like LU-2909. I am going to reproduce this and apply the patch to see if it fixes the problem.

Comment by Minh Diep [ 30/Mar/13 ]

I've done more runs and found that iozone only fails in the auster run, which uses sanityusr to run it. It passes when iozone runs as root.

A look at the iozone source found that ftruncate on a 0-byte file failed. I wrote the little program below to confirm this issue. If we run the executable as non-root, it fails with "permission denied". This seems like a layout lock issue.

#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int  file_descriptor;
  char fn[] = "write.file";
  struct stat st;

  /* Create the file with mode 0: no permission bits are set on the inode. */
  if ((file_descriptor = open(fn, O_CREAT | O_WRONLY, 0)) < 0) {
    perror("open() error");
    return EXIT_FAILURE;
  }

  /* Truncate the 0-byte file through the descriptor; this is the call that
     fails with "permission denied" when run as non-root. */
  if (ftruncate(file_descriptor, 0) != 0)
    perror("ftruncate() error");
  else if (fstat(file_descriptor, &st) == 0)
    printf("the file has %ld bytes\n", (long) st.st_size);

  close(file_descriptor);
  return 0;
}


Comment by Jinshan Xiong (Inactive) [ 01/Apr/13 ]

It turns out this is a problem with the permission check on the MDT, where __mdd_permission_internal() returns -EACCES because the file was created without its access mode being set correctly.

We can fix this problem by bypassing the permission check on the MDT if the permission check has already passed on the client side via the file handle.

Comment by Minh Diep [ 03/Apr/13 ]

I have tracked this down: iozone started breaking with https://build.whamcloud.com/job/lustre-master/1334/

Comment by Jinshan Xiong (Inactive) [ 03/Apr/13 ]

The patch is here: http://review.whamcloud.com/5924. The problem was introduced by bb68e4c1.

Comment by Minh Diep [ 03/Apr/13 ]

I verified that the patch worked on iozone

Comment by Alex Zhuravlev [ 05/Apr/13 ]

> It turns out this is a problem of permission check on the MDT where __mdd_permission_internal() returns -EACCES because the file was created without access mode setting correctly.

was created without access mode setting correctly? please clarify on this.

Comment by Jinshan Xiong (Inactive) [ 08/Apr/13 ]

> was created without access mode setting correctly? please clarify on this.

The reproducer program is above; please try it.

Comment by Alex Zhuravlev [ 08/Apr/13 ]

well, I was rather asking for a better explanation..

Comment by Jinshan Xiong (Inactive) [ 08/Apr/13 ]

ah I thought you're questioning how the file was created.

In the VFS, ftruncate does its permission check against the file's f_mode; since the file was opened with O_WRONLY, it passes without any problem. However, on the MDT we do the permission check against the inode mode, which is not set at all.
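The client-side half of this can be observed on any POSIX filesystem. The following is a minimal sketch, assuming a local Linux filesystem rather than Lustre; the helper name `probe_mode0_file` and the `probe_result` struct are hypothetical and exist only for illustration. It creates a file with mode 0 opened O_WRONLY, then records the inode permission bits, the access mode held by the descriptor, and whether ftruncate() through the descriptor succeeds. It does not exercise the MDT path at all.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result record, used only in this sketch. */
struct probe_result {
    mode_t perm_bits;        /* permission bits stored in the inode */
    int    access_mode;      /* access mode recorded on the open descriptor */
    int    ftruncate_errno;  /* 0 if ftruncate() through the fd succeeded */
};

/* Returns 0 if the probe ran, -1 if the file could not be created or inspected. */
int probe_mode0_file(const char *path, struct probe_result *out)
{
    struct stat st;
    int flags;
    int fd;

    assert(path != NULL && out != NULL);

    unlink(path);  /* remove any stale copy; errors are ignored */
    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0);
    if (fd < 0)
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fstat(fd, &st) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    out->perm_bits = st.st_mode & 07777;      /* what an inode-mode check sees */
    out->access_mode = flags & O_ACCMODE;     /* what an f_mode-style check sees */
    out->ftruncate_errno = (ftruncate(fd, 0) == 0) ? 0 : errno;

    close(fd);
    unlink(path);
    return 0;
}
```

Called from a small main() with a scratch path, this is expected to report no permission bits, an O_WRONLY descriptor, and a successful ftruncate() on a local filesystem, which is the behaviour the MDT check did not match.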

Ideally we should pass along the file's mode, but I chose to set MDS_OWNEROVERRIDE, which seems easier. But I don't understand this code in mdd_fix_attr():

                if (la->la_valid & (LA_SIZE | LA_BLOCKS)) {
                        if (!((flags & MDS_OWNEROVERRIDE) &&
                              (uc->uc_fsuid == tmp_la->la_uid)) &&
                            !(flags & MDS_PERM_BYPASS)) {
                                rc = mdd_permission_internal(env, obj,
                                                             tmp_la, MAY_WRITE);
                                if (rc != 0)
                                        RETURN(rc);
                        }
                }

Why does it check permission only when size or blocks are going to be changed?
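The condition in the quoted branch can be restated as a standalone predicate. This is only a sketch of the boolean logic: the flag values below are hypothetical placeholders (the real definitions of MDS_OWNEROVERRIDE and MDS_PERM_BYPASS are not reproduced here), and `needs_write_check` is an illustrative name, not a Lustre function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch only; they stand in for
 * MDS_OWNEROVERRIDE and MDS_PERM_BYPASS. */
#define SKETCH_OWNEROVERRIDE 0x1u
#define SKETCH_PERM_BYPASS   0x2u

/* Returns true when the quoted branch would go on to call
 * mdd_permission_internal(..., MAY_WRITE) for a size/blocks change. */
bool needs_write_check(uint32_t flags, uint32_t fsuid, uint32_t owner_uid)
{
    /* Owner override only applies when the caller owns the file. */
    bool owner_override = (flags & SKETCH_OWNEROVERRIDE) && fsuid == owner_uid;
    bool bypass = (flags & SKETCH_PERM_BYPASS) != 0;

    return !owner_override && !bypass;
}

/* Small self-check covering the four interesting combinations. */
void needs_write_check_selftest(void)
{
    assert(needs_write_check(0, 500, 500));                      /* no flags: checked */
    assert(!needs_write_check(SKETCH_OWNEROVERRIDE, 500, 500));  /* owner: skipped */
    assert(needs_write_check(SKETCH_OWNEROVERRIDE, 500, 0));     /* not owner: checked */
    assert(!needs_write_check(SKETCH_PERM_BYPASS, 500, 0));      /* bypass: skipped */
}
```

Read this way, setting the owner-override flag skips the inode-mode check only for the file's owner, which matches the reproducer, where the same user creates and truncates the file.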

Comment by Alex Zhuravlev [ 08/Apr/13 ]

are you saying iozone changes the file's rights or its own uid/gid once the file is open? if so, then the patch should be OK.

given we do not pass openhandle to setattr, passing f_mode to MDS is not any better than MDS_OWNEROVERRIDE, but more code.

the code pasted above seems to be solving this specific problem: letting a process that passed the permission checks in open still modify the file even if the actual rights have changed.

would be good to mention the case in the code or in the patch.

Comment by Jinshan Xiong (Inactive) [ 08/Apr/13 ]

> are you saying iozone changes the file's rights or its own uid/gid once the file is open? if so, then the patch should be OK.

No, the mode is not changed after the file is opened. The file is created with O_WRONLY, but no mode is set in open(2).

> the code pasted above seems to be solving this specific problem: letting a process that passed the permission checks in open still modify the file even if the actual rights have changed.

To be honest, I don't think it can pass the permission check anyway. However, changing rights after opening the file is a tricky case, so I'm okay with the current code.

Comment by Alex Zhuravlev [ 08/Apr/13 ]

> No, there is no changing mode after file is opened. The file is created with O_WRONLY but mode is not set in open(2).

hmm, then why are the current rights not enough to proceed?

Comment by Jinshan Xiong (Inactive) [ 08/Apr/13 ]

The file is created w/o mode:

[root@jupiter ~]# ls -l /mnt/lustre/ttt/write.file
---------- 1 root root 0 Apr 6 09:06 /mnt/lustre/ttt/write.file

Comment by Alex Zhuravlev [ 08/Apr/13 ]

ah, nice. thanks. then sure, MDS_OWNEROVERRIDE sounds correct.

Comment by Alex Zhuravlev [ 08/Apr/13 ]

oh, wait.. but this should apply only to ftruncate()? how do we make sure regular truncate() won't bypass the checks?

Comment by Jinshan Xiong (Inactive) [ 08/Apr/13 ]

For truncate(), it won't pass the in-kernel permission check on the client side.
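The difference between the two calls can be checked with a short sketch, again assuming a local Linux filesystem rather than Lustre; `compare_truncate_calls` is a hypothetical helper name. It creates a mode-0 file opened O_WRONLY and records the errno from ftruncate() on the descriptor and from truncate() on the path. For an unprivileged user, the path-based call is expected to fail with EACCES because it is checked against the inode mode; a privileged user may see both succeed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Stores 0 or the errno of each call; returns -1 if the file cannot be created. */
int compare_truncate_calls(const char *path, int *fd_errno, int *path_errno)
{
    int fd;

    assert(path != NULL && fd_errno != NULL && path_errno != NULL);

    unlink(path);  /* remove any stale copy; errors are ignored */
    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0);
    if (fd < 0)
        return -1;

    /* Descriptor-based: permitted by the write access mode of the fd. */
    *fd_errno = (ftruncate(fd, 0) == 0) ? 0 : errno;

    /* Path-based: needs write permission on the inode, which has mode 0. */
    *path_errno = (truncate(path, 0) == 0) ? 0 : errno;

    close(fd);
    unlink(path);
    return 0;
}
```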

Comment by Peter Jones [ 09/Apr/13 ]

Landed for 2.4

Generated at Sat Feb 10 01:30:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.