[LU-65] Interop testing results for 1.8.5.54 clients with Lustre 2.0.59 servers Created: 08/Feb/11 Updated: 28/Jun/11 Resolved: 29/Mar/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | James A Simmons | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
4 OSS nodes, each with 7 OSTs, plus one MDS with one disk devoted to an MDT and another disk devoted to the MGS; all running Lustre 2.0.59. I have 4 clients running Lustre 1.8.5.54 |
||
| Attachments: |
|
| Severity: | 3 |
| Bugzilla ID: | 21367 |
| Epic: | interop, results, test |
| Rank (Obsolete): | 5209 |
| Description |
|
These are the results of running the b1_8 acc_sm test. I also listed the results at https://bugzilla.lustre.org/show_bug.cgi?id=21367.
sanity 64b - bug 22703
obdfilter-survey - locks the client up. No debug output.
LustreError: 13996:0:(mdt_handler.c:4521:mdt_init0()) CMD Operation not allowed in IOP mode
I will provide more info and logs very soon. |
| Comments |
| Comment by James A Simmons [ 08/Feb/11 ] |
|
Here are some lustre logs produced for the failed runs. |
| Comment by Peter Jones [ 08/Feb/11 ] |
|
Nasf, can you please look into this one? Thanks, Peter |
| Comment by James A Simmons [ 09/Feb/11 ] |
|
Debug log from OBDFilter test 1b from 1.8.X test suite on 1.8 client |
| Comment by James A Simmons [ 09/Feb/11 ] |
|
No debug logs for obdfilter-survey, but I do have a dmesg that could be of some interest. For some reason it appears the client can't communicate with the OSS running 2.X |
| Comment by James A Simmons [ 09/Feb/11 ] |
|
Also, for obdfilter-survey test 2b, on the OSS running Lustre 2.X I'm seeing in dmesg Lustre: DEBUG MARKER: == test 2b: Stripe F/S over the Network, async journal == 11:45:14 (1297269914) There is no lustre-OST008_UUID for the OSS. Where is it getting this info? |
| Comment by nasf (Inactive) [ 10/Feb/11 ] |
|
Patch for "sanity 72b - bug 24226" is in inspection: |
| Comment by James A Simmons [ 10/Feb/11 ] |
|
That patch falls short of behaving correctly. Try this. touch $DIR/$tfile While it runs, look at the file's permissions. You will notice the suid bits are still there. If you try this on an ext[3,4] file system, you will notice the suid bits are gone as soon as you start writing to the file. The current Lustre code handles the fixup of attr when the file is closed. It needs to be managed when the file is opened. I have a patch, but it needs more work, since there is the case of a suid file copied from a non-Lustre file system to a Lustre file system, where those suid bits must be preserved. The case for removing the suid bits is very specific. As soon as I have it working I will post a patch. |
| Comment by nasf (Inactive) [ 11/Feb/11 ] |
|
The SUID/SGID bits should be removed as soon as you start to write the file, which is the expected behavior. With the patch applied, Lustre 2.x behaves the same as ext3/4, not as you described; at least I cannot reproduce what you mentioned (that is, the SUID/SGID bits still existing during writes until the file is closed). Have you verified my patch? Or is your description based on test results against the old Lustre code? |
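The check discussed in the two comments above can be sketched as follows. This is a minimal sketch, not the sanity 72b test itself. It assumes a POSIX shell and GNU `stat` (`stat -c %a`); `has_setid_bits` and `show_setid_after_write` are hypothetical helper names. On Linux a writer holding CAP_FSETID (typically root) keeps the bits, so the write should be done as an unprivileged user.

```shell
# Hypothetical helper: print "yes" if an octal mode (e.g. 4755) has SUID or SGID set.
has_setid_bits() {
    # Prefix with 0 so the shell reads the mode as octal; 06000 = SUID | SGID.
    if [ $(( 0$1 & 06000 )) -ne 0 ]; then
        echo yes
    else
        echo no
    fi
}

# Hypothetical helper: set SUID on a file, write to it, and report the mode
# before and after the write. Assumes GNU stat; run as a non-root user.
show_setid_after_write() {
    file=$1
    touch "$file" && chmod 4755 "$file" || return 1
    before=$(stat -c %a "$file")
    echo data >> "$file"
    after=$(stat -c %a "$file")
    echo "before=$before after=$after setid_after_write=$(has_setid_bits "$after")"
}
```

On a file system that behaves like ext3/4, running `show_setid_after_write "$DIR/$tfile"` as an unprivileged user would be expected to report `setid_after_write=no`.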
| Comment by James A Simmons [ 11/Feb/11 ] |
|
I will give your patches a try Monday. Right now I'm running some tests for Oleg. |
| Comment by James A Simmons [ 14/Feb/11 ] |
|
Okay I did test your patch and it appears to work. |
| Comment by James A Simmons [ 14/Feb/11 ] |
|
I have a bunch of patches to fix various parts of the 2.X test suite. Should I open a different bug for those patches? |
| Comment by nasf (Inactive) [ 14/Feb/11 ] |
|
I think you can create some sub-tasks under this one; then it is easier to track. |
| Comment by Peter Jones [ 21/Feb/11 ] |
|
James, personally I think that it is easier to track issues when there is a 1:1 relationship between tickets and issues/fixes. Peter |
| Comment by nasf (Inactive) [ 01/Mar/11 ] |
|
> replay-single 65a - bug 19960
Sorry, I cannot reproduce this failure. Can you show me an easy way to reproduce it? I think it is a duplicate of bug 22560, which has been fixed on master and lustre-1.8.5. Would you like to verify it again? |
| Comment by James A Simmons [ 01/Mar/11 ] |
|
Just tried it. Also the test fails with 2.X clients with 2.X servers. |
| Comment by James A Simmons [ 01/Mar/11 ] |
|
I believe I found the problem for replay-single 65a. Please look at patch http://review.whamcloud.com/#change,284 |
| Comment by nasf (Inactive) [ 02/Mar/11 ] |
|
> sanity 64b - bug 22703
I have made a patch for it: |
| Comment by nasf (Inactive) [ 02/Mar/11 ] |
|
> sanity-quota - totally broken. Does not work at all. Locks up clients
Are there any logs from the sanity-quota interoperability test run that locked up the client? I ask because the sanity-quota interoperability test passed in my local environment. I also checked Bugzilla and found a recent test result: https://bugzilla.lustre.org/show_bug.cgi?id=24207#c4 That means sanity-quota interoperability works in a TCP environment but failed in the IB case because of bug 24055, and the patch for bug 24055 has landed. So would you please check whether that patch is applied in your test? On the other hand, I think the patch for bug 24055 is not enough; you also need the above patch for bug 22703. Thanks! |
| Comment by nasf (Inactive) [ 03/Mar/11 ] |
|
> config-sanity 55,56,57 - no bug report yet.
According to the MDS side log (lustre_conf-sanity_test_55.1297196706.gz), the system was not yet ready to accept the client (10.36.230.36@o2ib) connection. ========== That means the MDS returned "EAGAIN" to the client to tell it to retry later, which is a normal case. But in the log I cannot find any other communication between the client and the MDS after that, until the MDS reported the test_55 failure.
00000001:00000001:5.0:1297196705.949788:0:19367:0:(debug.c:439:libcfs_debug_mark_buffer()) ***************************************************
I need the client side log to investigate what happened on the client after the MDS returned "EAGAIN". Thanks! |
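To line up the MDS and client logs around the point where "EAGAIN" was returned, it can help to pull the timestamp and PID out of each debug log line. The following is a minimal sketch, assuming the colon-separated layout seen in the log line quoted above, with the fourth field read as seconds.microseconds and the sixth as the PID. That field interpretation is an assumption, not something confirmed in this ticket, and `log_time_and_pid` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<seconds.microseconds> <pid>" for one debug log line.
# Assumes colon-separated fields, with field 4 as the timestamp and field 6 as the PID.
log_time_and_pid() {
    printf '%s\n' "$1" | awk -F: '{ print $4, $6 }'
}
```

Applied to both the MDS and the client log, the extracted timestamps give a common key for finding what the client did after 1297196705.949788.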
| Comment by James A Simmons [ 03/Mar/11 ] |
|
For http://review.whamcloud.com/#change,286 the two sets of patches conflict. Is the second patch the only valid one? |
| Comment by nasf (Inactive) [ 03/Mar/11 ] |
|
> For http://review.whamcloud.com/#change,286 the two sets of patches conflict. Is the second patch the only valid one?
Yes, set 2 is the right one. |
| Comment by James A Simmons [ 07/Mar/11 ] |
|
Sorry I haven't been able to test. The build system is broken. /data/buildsystem/jsimmons-head/rpmbuild/BUILD/lustre-2.0.59/lustre/lvfs/fsfilt-ldiskfs.c: In function 'fsfilt_ldiskfs_fid2dentry': |
| Comment by nasf (Inactive) [ 22/Mar/11 ] |
|
The patch "http://review.whamcloud.com/#change,286" has landed, so I think you can test with the latest code. On the other hand, would you like to update your patch "http://review.whamcloud.com/#change,284" to make it more compatible? Thanks |
| Comment by Build Master (Inactive) [ 23/Mar/11 ] |
|
Integrated in James Simmons : 43d727e089f1a1cf237da4251dc2aa661de05a0b
|
| Comment by Brian Murrell (Inactive) [ 23/Mar/11 ] |
|
FWIW, your ubuntu reviews builds are failing due to |
| Comment by Peter Jones [ 29/Mar/11 ] |
|
Believed resolved. ORNL will reopen or open a new ticket if their reproducer still has issues |