[LU-307] Test failure on test suite parallel-scale ior Created: 11/May/11  Updated: 14/Jun/11  Resolved: 09/Jun/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4995

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/64e9dc92-7c07-11e0-b5bf-52540025f9af.

More information about this failure:
http://jira.whamcloud.com/browse/LU-163?focusedCommentId=13721&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_13721



 Comments   
Comment by nasf (Inactive) [ 17/May/11 ]

The IOR failure on NFSv3 against Lustre is related to the NFSv3 protocol. There are several ways to create OST objects:

1) Normal open_create with write mode; the file inherits the stripe attributes from its parent.
2) If the file was created with the "OPEN_DELAY_CREATE" flag, or opened without write mode, the OST objects are created at the next open with write mode, again inheriting the stripe attributes from the parent.
3) lfs setstripe directly.

In our IOR test on NFSv3 against Lustre, the test program first sets the stripe information on the parent directory with stripe_count "-1", which means child files should be distributed across all OSTs. It then creates the target file under that directory without write mode, so at that point the file's stripe information is NULL. After that, nfs_write triggers a separate Lustre open RPC with write mode, and the MDS tries to create the OST objects. Unfortunately, at that point the Lustre client does not tell the MDS which directory is the parent, so the MDS does not know where to get the stripe information for the target file. It has to fall back to the default mode, a single stripe on some OST, so the target file has only one stripe and the test then gets an ENOSPC error.

As for touching a file through an NFSv3 client not generating stripe information, I think that is normal: delayed create is one of our policies for accelerating open_create. Lustre does not promise when the OST objects will be created. We only need to guarantee that the OST objects have been created by the time they are used.
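The failure mode described above can be illustrated with a toy Python model. This is only a sketch under stated assumptions: the names `Dir`, `File`, `mds_create_objects` and `write`, the OST count, and the capacity numbers are all hypothetical and are not Lustre APIs, and the capacity check is a deliberate simplification of how ENOSPC arises. It shows a touched file having no stripe information, and a file whose parent is unknown at object-creation time getting a single stripe and running out of space sooner than a file striped across all OSTs.

```python
import errno
from dataclasses import dataclass
from typing import Optional

NUM_OSTS = 4          # hypothetical number of OSTs
OST_CAPACITY = 100    # hypothetical free space per OST, arbitrary units


@dataclass
class Dir:
    default_stripe_count: int = 1    # -1 means "stripe across all OSTs"


@dataclass
class File:
    stripe_count: Optional[int] = None   # None: no OST objects yet


def mds_create_objects(f: File, parent: Optional[Dir]) -> None:
    """Toy version of OST object creation at the first open for write."""
    if f.stripe_count is not None:
        return                        # objects already exist
    # Without a known parent the MDS falls back to a single stripe.
    count = 1 if parent is None else parent.default_stripe_count
    f.stripe_count = NUM_OSTS if count == -1 else count


def write(f: File, size: int) -> int:
    """Return 0, or -ENOSPC when the file's stripes cannot hold `size`."""
    assert f.stripe_count is not None, "objects must exist before writing"
    capacity = f.stripe_count * OST_CAPACITY
    return 0 if size <= capacity else -errno.ENOSPC


parent = Dir(default_stripe_count=-1)

touched = File()                                 # created without write mode
nfs_file = File()
mds_create_objects(nfs_file, parent=None)        # parent unknown to the MDS
native_file = File()
mds_create_objects(native_file, parent=parent)   # parent known

nfs_result = write(nfs_file, 250)
native_result = write(native_file, 250)
```

In this model `touched` keeps `stripe_count` of `None`, which corresponds to a file with no stripe info, while the only difference between `nfs_file` and `native_file` is whether the parent was available when the objects were created.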

Comment by nasf (Inactive) [ 17/May/11 ]

The patch is ready for inspection:

http://review.whamcloud.com/#change,557

Sarah, it passes my local test; you can verify it.

Comment by nasf (Inactive) [ 18/May/11 ]

There are two possible solutions:

1) The Lustre client parses the parent's FID out of the NFS_FH and sends that FID to the MDS at open time (when the OST objects will be created); the MDS can then get the default stripe attributes from the parent to create the OST objects. The shortcomings are:
1.1) We cannot guarantee that the parent FID sent to the MDS is still valid, because someone else may have renamed the file and unlinked the parent. But I think this is acceptable for NFS use cases; we can return ESTALE or similar to the NFS client.
1.2) Before the NFS client triggers the open (for create), the file may have been renamed or linked, so the parent may have changed, or the file may have several parents with different default stripe attributes. This is a common issue for both NFS use cases and the normal Lustre client.

2) When the MDS creates a regular file whose OST object creation is delayed, it stores the parent's default stripe attributes as extended attributes of the file. When the OST objects need to be created later, the MDS can get the stripe attributes from the file's own extended attributes without knowing which directory is its parent. This also resolves the issues with link/rename before the OST objects are created. The shortcomings are:
2.1) It introduces new extended attributes, which decreases the performance of some create operations (mknod, or open with the O_LOV_DELAY_CREATE flag).
2.2) It changes the on-disk format, so on an old-format disk, delayed OST object creation will still hit LU-307.
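The difference between the two options above can be sketched with a small in-memory model. Everything here is hypothetical and for illustration only: the dictionaries standing in for the MDS, the FID strings, the attribute name `default_stripe_count`, and the function names are assumptions, not Lustre code. The sketch shows option 1 failing with ESTALE once the parent is gone, while option 2 still finds the saved default on the file itself.

```python
import errno
from typing import Dict, Optional

# Hypothetical in-memory "MDS": directory FID -> default stripe count.
directories: Dict[str, int] = {"dir-A": -1}
# Hypothetical per-file extended attributes, used only by option 2.
file_xattrs: Dict[str, Dict[str, int]] = {}


def stripe_from_parent_fid(parent_fid: Optional[str]) -> int:
    """Option 1: the client sends the parent FID taken from the NFS file handle."""
    if parent_fid is None or parent_fid not in directories:
        # The parent was unlinked or the handle is outdated (shortcoming 1.1).
        raise OSError(errno.ESTALE, "stale parent FID")
    return directories[parent_fid]


def save_default_stripe(file_fid: str, parent_fid: str) -> None:
    """Option 2, create time: copy the parent's default into the file's own EA."""
    file_xattrs[file_fid] = {"default_stripe_count": directories[parent_fid]}


def stripe_from_file_ea(file_fid: str) -> int:
    """Option 2, first write: no parent lookup is needed."""
    return file_xattrs[file_fid]["default_stripe_count"]


# Create time: the parent still exists, so both options see stripe count -1.
save_default_stripe("file-1", "dir-A")
option1_before = stripe_from_parent_fid("dir-A")

# The file is renamed elsewhere and the old parent is removed before the first write.
del directories["dir-A"]
option2_after = stripe_from_file_ea("file-1")
try:
    stripe_from_parent_fid("dir-A")
    option1_errno = 0
except OSError as exc:
    option1_errno = exc.errno
```

The trade-off in the model mirrors the one listed above: option 2 survives rename and unlink, but only because extra per-file state was written at create time.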

Comparing the two solutions, 1) is easier, and I have made a patch for it:
http://review.whamcloud.com/#change,557

What are your opinions?

Comment by Di Wang [ 18/May/11 ]

I just checked the code. If you open a file with DELAY_CREATE, it assumes the user will call setstripe before actually writing; otherwise it returns an error (EBADF, actually).

But if it does a mknod for a regular file and then opens it with write (as nfs3+lustre does), it indeed does not take the parent's default stripe into account. I would prefer to go with 2), i.e. the MDS sets the default stripe into the EA of the "empty" regular file, and on a later write it creates the objects according to this default stripe, since that is the more "correct" way to go compared with 1).

Another option might be to just create the objects at mknod of a regular file. Does that break any rule?

Comment by nasf (Inactive) [ 18/May/11 ]

I also think 2) is the more "correct" solution, but I am not sure whether we need to resolve 2.1) and 2.2).

As for creating OST objects at mknod time, I do not think it is a good idea, although it is quite easy. Lustre has never created OST objects at mknod in former releases, and customers already know that. I am not sure whether some customers have built their systems on that assumption. If so, our fix would cause trouble for them.

Andreas, what is your suggestion?

Comment by Andreas Dilger [ 18/May/11 ]

The only reason that we didn't create objects at mknod time is to allow the file to be created and then opened with O_LOV_DELAY_CREATE so that ioctl(LL_IOC_LOV_SETSTRIPE) can be called on it. The lustre-patched tar also depends on files created with mknod() not having objects, so that setxattr() can restore the original file striping.

When the "layout lock" patch lands (hopefully one of the first major features to be landed for 2.2) it includes the ability to change the file striping after the file is initially created, as long as it is 0 bytes in size. That, in turn, would allow Lustre to allocate objects on a new file at mknod() time instead of open() time without preventing the layout from being changed, and without having to add further complexity and incompatibility to the protocol.

In the meantime, this is definitely NOT a new bug (it has existed as long as Lustre has been able to re-export via NFS, though I didn't know about it until now, and nobody has ever complained) so I definitely do not think it is a blocker for the 2.1 release. I think a temporary workaround for this test might be to use the "lustrestripecount" parameter to IOR (available if IOR is compiled with "-D_USE_LUSTRE") to have IOR set the new file striping itself.
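The zero-size rule described above for the planned layout lock work can be illustrated with a toy model. This is only a sketch of the stated rule (striping may be changed while the file is still 0 bytes); the class name, the method names, and the use of ValueError are hypothetical and do not reflect the actual Lustre implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyFile:
    stripe_count: int      # objects already allocated, e.g. at mknod() time
    size: int = 0          # bytes written so far

    def set_stripe(self, new_count: int) -> None:
        """Re-stripe the file; only allowed while it is still empty."""
        if self.size != 0:
            raise ValueError("layout can only be changed on a 0-byte file")
        self.stripe_count = new_count

    def write(self, nbytes: int) -> None:
        self.size += nbytes


f = ToyFile(stripe_count=1)   # default layout picked at creation
f.set_stripe(4)               # still empty, so the layout may change
f.write(4096)
try:
    f.set_stripe(2)           # rejected: the file now holds data
    restripe_rejected = False
except ValueError:
    restripe_rejected = True
```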

Comment by nasf (Inactive) [ 19/May/11 ]

Thanks, Andreas. It is clear now: we will not fix the mknod logic. As a temporary solution to make Lustre re-export via NFS work, I think my current patch (http://review.whamcloud.com/#change,557) is better than recompiling IOR, because the patch makes most NFS applications work, not only IOR. In addition, it does not change the protocol (neither on-wire nor on-disk) and is not too complex, since the parent's FID is already part of the NFS_FH.

Would you please inspect the patch and then decide whether it can be used?

Comment by nasf (Inactive) [ 29/May/11 ]

The latest patch to be verified:

http://review.whamcloud.com/#change,557 set 4

Comment by Sarah Liu [ 31/May/11 ]

I tried this patch, but the problem is still there with NFSv3. Or am I missing something about this patch?

cat /etc/exports
/mnt/lustre *(rw,all_squash,anonuid=500,anongid=500)

# client-15 is the Lustre client / NFS server
[root@client-15 ~]# lfs getstripe /mnt/lustre/test1
/mnt/lustre/test1
stripe_count: 2 stripe_size: 0 stripe_offset: -1

# client-18 is the NFS client
[root@client-18 ~]# touch /mnt/lustre/test1/f
[root@client-15 ~]# lfs getstripe /mnt/lustre/test1/f
/mnt/lustre/test1/f has no stripe info

Comment by nasf (Inactive) [ 31/May/11 ]

Currently, touching a file through an nfs3 client (re-exported by Lustre) does not create OST objects immediately; they are created at the next open for write. So the results above are expected. Please verify IOR through the NFS client.

Comment by Sarah Liu [ 31/May/11 ]

IOR passed on RHEL5/NFSv3 but failed on RHEL5/NFSv4

https://maloo.whamcloud.com/test_sets/90ebfc00-8bd2-11e0-aab9-52540025f9af
https://maloo.whamcloud.com/test_sets/657f4cda-8bd6-11e0-aab9-52540025f9af

Comment by nasf (Inactive) [ 06/Jun/11 ]

With the following patches applied, most of the test cases in parallel_scale against Lustre re-exported through NFS work well, including NFSv3/4 on RHEL5/6, except for the lock-related test (test7) in connectathon.

http://review.whamcloud.com/#change,557
http://review.whamcloud.com/#change,886
http://review.whamcloud.com/#change,892

Oleg, I think the patch you made for LU-104 (http://jira.whamcloud.com/browse/LU-104) should have fixed all lock-related issues in the parallel_scale connectathon test, but it is broken again; maybe you have more ideas about the failure.

Comment by Andreas Dilger [ 08/Jun/11 ]

Should we land a version of this onto b1_8, so that there are no compatibility issues with 1.8 clients on 2.x servers, or is it enough that we tell users to do NFS file serving from 2.x clients when they upgrade to 2.x servers?

Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/llite/file.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/mdt/mdt_handler.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_lov.c
  • lustre/llite/llite_internal.h
  • lustre/mdt/mdt_open.c
  • lustre/llite/llite_nfs.c
  • lustre/include/md_object.h
  • lustre/mdt/mdt_internal.h
  • lustre/llite/llite_lib.c
  • lustre/include/lustre_mds.h
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/mdt/mdt_open.c
  • lustre/mdt/mdt_internal.h
  • lustre/mdd/mdd_object.c
  • lustre/include/md_object.h
  • lustre/mdt/mdt_handler.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/mdd/mdd_lov.c
  • lustre/llite/llite_internal.h
  • lustre/include/lustre_mds.h
  • lustre/llite/llite_nfs.c
  • lustre/llite/llite_lib.c
  • lustre/llite/file.c
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/mdt/mdt_internal.h
  • lustre/llite/file.c
  • lustre/llite/llite_nfs.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_lov.c
  • lustre/llite/llite_lib.c
  • lustre/mdt/mdt_handler.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/include/lustre_mds.h
  • lustre/llite/llite_internal.h
  • lustre/include/md_object.h
  • lustre/mdt/mdt_open.c
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/include/md_object.h
  • lustre/llite/llite_internal.h
  • lustre/mdt/mdt_open.c
  • lustre/include/lustre_mds.h
  • lustre/include/linux/lustre_compat25.h
  • lustre/mdd/mdd_object.c
  • lustre/mdt/mdt_handler.c
  • lustre/mdt/mdt_internal.h
  • lustre/llite/llite_lib.c
  • lustre/llite/file.c
  • lustre/mdd/mdd_lov.c
  • lustre/llite/llite_nfs.c
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/llite/llite_lib.c
  • lustre/llite/file.c
  • lustre/llite/llite_internal.h
  • lustre/include/md_object.h
  • lustre/mdd/mdd_object.c
  • lustre/include/lustre_mds.h
  • lustre/mdt/mdt_open.c
  • lustre/mdd/mdd_lov.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/mdt/mdt_handler.c
  • lustre/llite/llite_nfs.c
  • lustre/mdt/mdt_internal.h
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/llite/llite_nfs.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/llite/llite_lib.c
  • lustre/llite/llite_internal.h
  • lustre/mdd/mdd_object.c
  • lustre/mdt/mdt_handler.c
  • lustre/llite/file.c
  • lustre/mdd/mdd_lov.c
  • lustre/include/md_object.h
  • lustre/mdt/mdt_internal.h
  • lustre/mdt/mdt_open.c
  • lustre/include/lustre_mds.h
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » i686,client,el5,ofa #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/llite/file.c
  • lustre/mdt/mdt_internal.h
  • lustre/mdd/mdd_object.c
  • lustre/include/lustre_mds.h
  • lustre/include/linux/lustre_compat25.h
  • lustre/include/md_object.h
  • lustre/llite/llite_internal.h
  • lustre/llite/llite_nfs.c
  • lustre/mdd/mdd_lov.c
  • lustre/mdt/mdt_open.c
  • lustre/mdt/mdt_handler.c
  • lustre/llite/llite_lib.c
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/mdd/mdd_lov.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/mdd/mdd_object.c
  • lustre/include/lustre_mds.h
  • lustre/llite/llite_nfs.c
  • lustre/llite/llite_internal.h
  • lustre/mdt/mdt_open.c
  • lustre/mdt/mdt_handler.c
  • lustre/mdt/mdt_internal.h
  • lustre/include/md_object.h
  • lustre/llite/llite_lib.c
  • lustre/llite/file.c
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/llite/llite_lib.c
  • lustre/mdt/mdt_internal.h
  • lustre/include/md_object.h
  • lustre/llite/llite_nfs.c
  • lustre/include/lustre_mds.h
  • lustre/mdd/mdd_lov.c
  • lustre/mdt/mdt_open.c
  • lustre/mdt/mdt_handler.c
  • lustre/mdd/mdd_object.c
  • lustre/llite/llite_internal.h
  • lustre/include/linux/lustre_compat25.h
  • lustre/llite/file.c
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/llite/llite_nfs.c
  • lustre/include/md_object.h
  • lustre/mdt/mdt_open.c
  • lustre/llite/llite_lib.c
  • lustre/llite/llite_internal.h
  • lustre/mdd/mdd_lov.c
  • lustre/include/lustre_mds.h
  • lustre/mdt/mdt_handler.c
  • lustre/llite/file.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/mdt/mdt_internal.h
  • lustre/mdd/mdd_object.c
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/llite/llite_internal.h
  • lustre/include/linux/lustre_compat25.h
  • lustre/mdt/mdt_open.c
  • lustre/mdt/mdt_internal.h
  • lustre/mdt/mdt_handler.c
  • lustre/include/lustre_mds.h
  • lustre/llite/llite_lib.c
  • lustre/llite/file.c
  • lustre/mdd/mdd_lov.c
  • lustre/include/md_object.h
  • lustre/mdd/mdd_object.c
  • lustre/llite/llite_nfs.c
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/mdt/mdt_internal.h
  • lustre/include/linux/lustre_compat25.h
  • lustre/mdd/mdd_lov.c
  • lustre/llite/file.c
  • lustre/llite/llite_lib.c
  • lustre/llite/llite_nfs.c
  • lustre/include/lustre_mds.h
  • lustre/mdt/mdt_handler.c
  • lustre/llite/llite_internal.h
  • lustre/mdt/mdt_open.c
  • lustre/include/md_object.h
  • lustre/mdd/mdd_object.c
Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-master » i686,server,el5,ofa #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/mdt/mdt_handler.c
  • lustre/llite/llite_nfs.c
  • lustre/include/md_object.h
  • lustre/include/lustre_mds.h
  • lustre/llite/llite_lib.c
  • lustre/mdt/mdt_internal.h
  • lustre/include/linux/lustre_compat25.h
  • lustre/mdt/mdt_open.c
  • lustre/llite/llite_internal.h
  • lustre/mdd/mdd_object.c
  • lustre/llite/file.c
  • lustre/mdd/mdd_lov.c
Comment by Peter Jones [ 09/Jun/11 ]

Andreas

I chatted directly with FanYong about this. At this stage I think that we will not land an equivalent 1.8.x patch, but we could revisit this for a future 1.8.x maintenance release if we find that there is sufficient demand for running 1.8.x clients with 2.x servers in conjunction with NFS re-exports.

Thanks for being vigilant and posing questions such as this

Peter

Comment by Build Master (Inactive) [ 09/Jun/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #159
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/include/md_object.h
  • lustre/mdt/mdt_open.c
  • lustre/llite/llite_internal.h
  • lustre/include/lustre_mds.h
  • lustre/llite/llite_nfs.c
  • lustre/mdt/mdt_internal.h
  • lustre/include/linux/lustre_compat25.h
  • lustre/llite/llite_lib.c
  • lustre/mdd/mdd_object.c
  • lustre/mdd/mdd_lov.c
  • lustre/llite/file.c
  • lustre/mdt/mdt_handler.c
Comment by Oleg Drokin [ 09/Jun/11 ]

After applying http://review.whamcloud.com/923 (LU-405) on top of the previously mentioned 3 patches, I cannot reproduce any NFS problems in my local testing.

Comment by Build Master (Inactive) [ 14/Jun/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #170
LU-307 Send parent FID from client to MDS on NFS open for striping info

Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
Files :

  • lustre/mdt/mdt_internal.h
  • lustre/llite/llite_lib.c
  • lustre/llite/llite_internal.h
  • lustre/include/md_object.h
  • lustre/include/linux/lustre_compat25.h
  • lustre/mdt/mdt_handler.c
  • lustre/llite/file.c
  • lustre/include/lustre_mds.h
  • lustre/mdt/mdt_open.c
  • lustre/mdd/mdd_lov.c
  • lustre/mdd/mdd_object.c
  • lustre/llite/llite_nfs.c