[LU-307] Test failure on test suite parallel-scale ior Created: 11/May/11 Updated: 14/Jun/11 Resolved: 09/Jun/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4995 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. It relates to the following test suite run: https://maloo.whamcloud.com/test_sets/64e9dc92-7c07-11e0-b5bf-52540025f9af. Here is more information on this failure: |
| Comments |
| Comment by nasf (Inactive) [ 17/May/11 ] |
|
The IOR failure on NFSv3 against Lustre is related to the NFSv3 protocol. There are several ways to create OST objects: 1) Normal open_create with write mode, which inherits the stripe attributes from the parent. In our IOR test on NFSv3 against Lustre, the test program first sets the stripe information on the parent directory with stripe_count "-1", which means child files should be striped across all OSTs. It then creates the target file under that directory without write mode, so at that time the file's stripe information is NULL. After that, nfs_write triggers a separate Lustre open RPC with write mode, and the MDS tries to create the related OST objects. Unfortunately, at that point the Lustre client does not tell the MDS which directory is the parent, so the MDS does not know where to get the stripe information for creating the OST objects of the target file. It has to fall back to the default mode, a single stripe on some OST, so the target file ends up with only one stripe and then hits the ENOSPC error. As for touching a file through an NFSv3 client not generating stripe information, I think that is normal: delayed create is one of our policies for accelerating open_create. Lustre does not promise when the OST objects will be created. We just need to guarantee that the related OST objects are created by the time they are used. |
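To make the sequence above concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described in this comment, not Lustre code: `ToyMDS`, `Directory`, `RegularFile` and their methods are hypothetical names invented for illustration, and OST selection is reduced to picking the first N indices.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ALL_OSTS = -1  # stripe_count of -1 means "stripe across every OST"


@dataclass
class Directory:
    default_stripe_count: int = 1


@dataclass
class RegularFile:
    # Empty until the first open-for-write (delayed create).
    ost_objects: List[int] = field(default_factory=list)


class ToyMDS:
    """Hypothetical stand-in for the MDS; not the real implementation."""

    def __init__(self, num_osts: int) -> None:
        self.num_osts = num_osts

    def create_without_objects(self) -> RegularFile:
        # Models a create without write mode: no stripe information yet.
        return RegularFile()

    def open_for_write(self, f: RegularFile,
                       parent: Optional[Directory] = None) -> RegularFile:
        if f.ost_objects:
            return f  # objects already exist, nothing to allocate
        if parent is None:
            count = 1  # parent unknown: fall back to a single stripe
        elif parent.default_stripe_count == ALL_OSTS:
            count = self.num_osts
        else:
            count = min(parent.default_stripe_count, self.num_osts)
        f.ost_objects = list(range(count))
        return f


mds = ToyMDS(num_osts=4)
parent = Directory(default_stripe_count=ALL_OSTS)

# NFSv3-style path: the later write-open does not carry the parent.
nfs_file = mds.open_for_write(mds.create_without_objects())
# Native-style path: the parent's default striping is available.
native_file = mds.open_for_write(mds.create_without_objects(), parent)

print(len(nfs_file.ost_objects), len(native_file.ost_objects))
```

In this model the NFSv3-style file ends up on a single OST while the other file spans all four, which is the imbalance that can lead to ENOSPC once that one OST fills.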
| Comment by nasf (Inactive) [ 17/May/11 ] |
|
The patch is ready for inspection: http://review.whamcloud.com/#change,557 Sarah, it passes my local test; you can verify it. |
| Comment by nasf (Inactive) [ 18/May/11 ] |
|
There are two possible solutions: 1) The Lustre client parses the parent's fid out of NFS_FH and passes that fid to the MDS on open (which will create the OST objects); the MDS can then get the default stripe attributes from the parent to create the OST objects. The shortcomings are: 2) When the MDS creates a regular file whose OST object creation is delayed, it stores the parent's default stripe attributes as the file's extended attributes. When the OST objects need to be created later, the MDS can get the stripe attributes from the file's extended attributes without knowing which directory is its parent. This also resolves the issues related to link/rename before the OST objects are created. The shortcomings are: Comparing the two solutions, 1) is easier, and I have made a patch for it. What are your opinions? |
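The difference between the two proposals can be sketched with a small Python model. This is an illustration under stated assumptions, not the patch itself: `Inode`, `open_write_solution1`, `create_solution2`, `open_write_solution2` and the `pending_stripe_count` key are all hypothetical names, and fids are plain integers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ALL_OSTS = -1
NUM_OSTS = 4  # assumed cluster size for this sketch


@dataclass
class Inode:
    fid: int
    default_stripe_count: Optional[int] = None  # only meaningful for directories
    xattrs: Dict[str, int] = field(default_factory=dict)
    ost_objects: List[int] = field(default_factory=list)


def _allocate(inode: Inode, stripe_count: int) -> None:
    count = NUM_OSTS if stripe_count == ALL_OSTS else min(stripe_count, NUM_OSTS)
    inode.ost_objects = list(range(count))


def open_write_solution1(inodes: Dict[int, Inode], file_fid: int,
                         parent_fid: int) -> Inode:
    """Solution 1: the client sends the parent fid (taken from the file handle)."""
    f = inodes[file_fid]
    if not f.ost_objects:
        _allocate(f, inodes[parent_fid].default_stripe_count or 1)
    return f


def create_solution2(inodes: Dict[int, Inode], file_fid: int,
                     parent_fid: int) -> Inode:
    """Solution 2: remember the parent's default striping on the file at create."""
    f = Inode(fid=file_fid)
    f.xattrs["pending_stripe_count"] = inodes[parent_fid].default_stripe_count or 1
    inodes[file_fid] = f
    return f


def open_write_solution2(inodes: Dict[int, Inode], file_fid: int) -> Inode:
    """Solution 2: the later open needs no parent, only the stored hint."""
    f = inodes[file_fid]
    if not f.ost_objects:
        _allocate(f, f.xattrs.pop("pending_stripe_count", 1))
    return f


inodes = {1: Inode(fid=1, default_stripe_count=ALL_OSTS)}  # striped parent dir

inodes[10] = Inode(fid=10)  # file created without objects
print(len(open_write_solution1(inodes, 10, parent_fid=1).ost_objects))

create_solution2(inodes, file_fid=11, parent_fid=1)
# A rename or link before the first write would not matter here:
# the hint is stored on the file itself.
print(len(open_write_solution2(inodes, 11).ost_objects))
```

Both paths allocate four objects in this model. The trade-off it shows is that solution 1 depends on the caller supplying the right parent at open time, while solution 2 carries the information with the file.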
| Comment by Di Wang [ 18/May/11 ] |
|
Just checked the code: if you open a file with DELAY_CREATE, it assumes the user will setstripe before actually writing; otherwise, it will return an error (EBADF, actually). But if it does mknod for a regular file and then opens it with write (as nfs3+lustre does), it indeed does not take the default stripe of the parent into account. I would prefer to go with 2), i.e. the MDS sets the default stripe into the EA of the "empty" regular file, and a later write then creates the objects according to this default stripe, since that is the more "correct" way to go compared with 1). Another option might be to just create the objects at mknod of a regular file. Does it break any rule? |
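The DELAY_CREATE contract described above (setstripe first, otherwise the write fails with EBADF) can be modelled in a few lines of Python. This is a toy illustration only: `DelayCreateFile` is a hypothetical class, and it does not call any Lustre interface.

```python
import errno
from typing import Optional


class DelayCreateFile:
    """Toy stand-in for a file opened with a delay-create flag (not Lustre code)."""

    def __init__(self) -> None:
        self.stripe_count: Optional[int] = None  # no layout until setstripe
        self.data = bytearray()

    def setstripe(self, stripe_count: int) -> None:
        self.stripe_count = stripe_count

    def write(self, payload: bytes) -> int:
        if self.stripe_count is None:
            # Mirrors the behaviour described in the comment above.
            raise OSError(errno.EBADF, "no layout set before write")
        self.data += payload
        return len(payload)


f = DelayCreateFile()
try:
    f.write(b"x")
except OSError as e:
    print("write before setstripe failed:", errno.errorcode[e.errno])

f.setstripe(2)
print("bytes written after setstripe:", f.write(b"abc"))
```

The mknod-then-open-for-write path used by NFSv3 is different from this one: there the write succeeds, but with a layout that ignores the parent's default striping.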
| Comment by nasf (Inactive) [ 18/May/11 ] |
|
I also think 2) is the more "correct" solution, but I am not sure whether we need to resolve 2.1) & 2.2). As for creating OST objects at mknod, I do not think it is a good idea, although it is quite easy, because Lustre has never created OST objects at mknod in former releases, which customers already know. I am not sure whether some customers have built their systems on that assumption. If so, our fix will cause trouble for them. Andreas, what's your suggestion? |
| Comment by Andreas Dilger [ 18/May/11 ] |
|
The only reason that we didn't create objects at mknod time is to allow the file to be created, then allow it to be opened with O_LOV_DELAY_CREATE so that ioctl(LL_IOC_LOV_SETSTRIPE) can be called on it. The lustre-patched tar also depends on files created with mknod() not having objects, so that setxattr() can restore the original file striping. When the "layout lock" patch lands (hopefully one of the first major features to be landed for 2.2) it will include the ability to change the file striping after the file is initially created, as long as it is 0 bytes in size. That, in turn, would allow Lustre to allocate objects on a new file at mknod() time instead of open() time without preventing the layout from being changed, and without having to add further complexity and incompatibility to the protocol. In the meantime, this is definitely NOT a new bug (it has existed as long as Lustre has been able to re-export via NFS, though I didn't know about it until now, and nobody has ever complained) so I definitely do not think it is a blocker for the 2.1 release. I think a temporary workaround for this test might be to use the "lustrestripecount" parameter to IOR (available if IOR is compiled with "-D_USE_LUSTRE") to have IOR set the new file striping itself. |
| Comment by nasf (Inactive) [ 19/May/11 ] |
|
Thanks Andreas. It is clear now: we will not fix the mknod logic. As for a temporary solution to make Lustre re-export via NFS work, I think my current patch (http://review.whamcloud.com/#change,557) is better than recompiling IOR, because the patch makes most NFS applications work, not only IOR. On the other hand, it does not change the protocol (neither on-wire nor on-disk) and is not too complex, since the parent's fid is already part of NFS_FH. Would you please inspect the patch and then decide whether it can be used? |
| Comment by nasf (Inactive) [ 29/May/11 ] |
|
The latest patch to be verified: |
| Comment by Sarah Liu [ 31/May/11 ] |
|
I tried this patch, but the problem is still there with NFSv3. Or am I missing something about this patch? cat /etc/exports #client-15 is lustre client/nfs server [root@client-18 ~]# touch /mnt/lustre/test1/f |
| Comment by nasf (Inactive) [ 31/May/11 ] |
|
Currently, touching a file through an nfs3 client (re-exported by Lustre) will not create OST objects at once; they will be created the next time the file is opened for write. So the above situation is expected. Please verify IOR through the nfs client. |
| Comment by Sarah Liu [ 31/May/11 ] |
|
IOR passed on RHEL5/NFSv3 but failed on RHEL5/NFSv4 https://maloo.whamcloud.com/test_sets/90ebfc00-8bd2-11e0-aab9-52540025f9af |
| Comment by nasf (Inactive) [ 06/Jun/11 ] |
|
With the following patches applied, most of the test cases in parallel_scale against lustre re-export through NFS work well, including NFSv3/4 against RHEL5/6, except for lock related test (test7) for connectathon. http://review.whamcloud.com/#change,557 Oleg, I think the patch you made for |
| Comment by Andreas Dilger [ 08/Jun/11 ] |
|
Should we land a version of this onto b1_8, so that there are no compatibility issues with 1.8 clients on 2.x servers, or is it enough that we tell users to do NFS file serving from 2.x clients when they upgrade to 2.x servers? |
| Comment by Build Master (Inactive) [ 08/Jun/11 ] |
|
Integrated in Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
|
| Comment by Peter Jones [ 09/Jun/11 ] |
|
Andreas, I chatted directly with FanYong about this. At this stage I think that we will not land an equivalent 1.8.x patch, but we could revisit this for a future 1.8.x maintenance release if we find that there is sufficient demand for running 1.8.x clients with 2.x servers in conjunction with NFS re-exports. Thanks for being vigilant and posing questions such as this. Peter |
| Comment by Build Master (Inactive) [ 09/Jun/11 ] |
|
Integrated in Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
|
| Comment by Oleg Drokin [ 09/Jun/11 ] |
|
After applying http://review.whamcloud.com/923 ( |
| Comment by Build Master (Inactive) [ 14/Jun/11 ] |
|
Integrated in Oleg Drokin : 255e37f1639fa4edec5b929228afe7c0e8b56724
|