[LU-1275] Lustre 2.1.1 REPLAY_SINGLE test_0a FAIL: Restart of mds failed Created: 30/Mar/12 Updated: 05/Mar/14 Resolved: 05/Mar/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.1, Lustre 1.8.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jay Lan (Inactive) | Assignee: | Minh Diep |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Server runs centos 6.2, ofed-1.5.4.1, Lustre 2.1.1. |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6096 |
| Description |
|
My acc-sm setup has been used successfully to test 1.8.5, 1.8.6, and 1.8.7. == test 0a: empty replay == 12:05:12 The /var/log/messages of the MGS/MDS node showed: |
| Comments |
| Comment by Peter Jones [ 30/Mar/12 ] |
|
Minh, could you please help with this one? Thanks, Peter |
| Comment by Jay Lan (Inactive) [ 03/Apr/12 ] |
|
Could you please help with this? I am going to spend time converting to auster for 2.1.1 server + 2.1.1 client,
| Comment by Minh Diep [ 03/Apr/12 ] |
|
ok, looking into this |
| Comment by Minh Diep [ 03/Apr/12 ] |
|
Can you show me the config file, or local.sh if you modified it? |
| Comment by Jay Lan (Inactive) [ 03/Apr/12 ] |
|
The command used in testing was:
The ncli_nas.v3 will be attached. |
| Comment by Jay Lan (Inactive) [ 03/Apr/12 ] |
|
I accidentally also attached nas.v3.sh. It was a wrapper. The end result was to |
| Comment by Minh Diep [ 03/Apr/12 ] |
|
thanks. Did you run this on a client that was running 1.8.6? |
| Comment by Jay Lan (Inactive) [ 03/Apr/12 ] |
|
Yes. It was started from service331, a client. All nodes (the MDS, 2 OSSes, and 2 clients) share the same configuration. |
| Comment by Minh Diep [ 03/Apr/12 ] |
|
I don't have a system to try it out on right now. Could you manually run "mount -t lustre -o errors=panic,acl /dev/sdb1 /mnt/mds" on the MDS to see if it works? |
| Comment by Jay Lan (Inactive) [ 03/Apr/12 ] |
|
I know for a fact that "mount -t lustre -o errors=panic,acl /dev/sdb1 /mnt/mds" works because the command has been executed so many times. However, that got me thinking. In fact, I ran acceptance-small.sh in a for-loop: for i in SANITY SANITYN REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE REPLAY_DUAL INSANITY SANITY_QUOTA LNET_SELFTEST MMP; do So, by the time REPLAY_SINGLE is executed, both SANITY and SANITYN have completed. That means it was not the same as starting from ground zero. So I rebooted all the machines and ran "mount -t lustre" to make sure it worked. Now, this wrapper worked when the Lustre server was 1.8.6 (or 1.8.7). Any suggestions for making it work when the server runs 2.1.1? |
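The loop described above can be sketched roughly as follows. The suite names come from the comment; the loop body is not shown in the ticket, so `run_suite` and `reset_state` are hypothetical placeholders that only print what they would do. The sketch is meant to show where a per-suite reset could go so that each suite starts from a known state, not how the real wrapper was written.

```shell
# Suite names are taken from the comment above; everything else is hypothetical.
SUITES="SANITY SANITYN REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE REPLAY_DUAL INSANITY SANITY_QUOTA LNET_SELFTEST MMP"

# Placeholder for the real per-suite invocation, which the ticket does not show.
run_suite() {
    echo "would run suite: $1"
}

# Placeholder for whatever returns the servers to a known state
# (for example a reformat or a reboot); also not shown in the ticket.
reset_state() {
    echo "would reset state before: $1"
}

for i in $SUITES; do
    reset_state "$i"
    run_suite "$i"
done
```

Replacing the two placeholder bodies with real commands is left open here, since the actual invocation depends on the local test setup.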
| Comment by Jay Lan (Inactive) [ 03/Apr/12 ] |
|
Since REPLAY_SINGLE runs successfully in a clean environment, you can close this ticket. I will figure out a way to work around my problem when testing with 2.x servers. Suggestions are welcome. |
| Comment by Minh Diep [ 03/Apr/12 ] |
|
I need to reproduce this in the lab and investigate the cause. In the meantime, please try this: add MDSDEV1=/dev/sdb1 to the config file to see if it makes any difference. If you don't mind reformatting the FS before every test, you could put export REFORMAT=true in the config file. I also suggest exploring the auster script, which has an option to send the results back to our Maloo result database. |
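As a rough sketch, the two settings suggested above might look like this in the config file. Only `MDSDEV1=/dev/sdb1` and `export REFORMAT=true` come from the comment; the explanatory comments and the assumption that both lines sit side by side in one shell-style config file are illustrative.

```shell
# Hypothetical excerpt from an acc-sm config file; only the two assignments
# below are taken from the suggestion in the ticket.
MDSDEV1=/dev/sdb1       # name the block device of the first MDS target explicitly
export REFORMAT=true    # ask the test framework to reformat before the run
```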
| Comment by Jay Lan (Inactive) [ 04/Apr/12 ] |
|
I have this line in my configuration file: Would it have the same effect as "export REFORMAT=true"? |
| Comment by Minh Diep [ 06/Apr/12 ] |
|
yes |
| Comment by Jay Lan (Inactive) [ 06/Apr/12 ] |
|
Attached two files, cut from /var/log/messages of the MDS server between the beginning and end MARKERs of test 0a. The *.FAIL file is from the run that failed, and the *.PASS file is from the run that passed. |
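Cutting a log between two markers, as described above, could be done along these lines. This is a minimal sketch: `extract_between` is a hypothetical helper name, and the exact text of the begin and end markers is an assumption that has to be matched to what actually appears in the log.

```shell
# Print every line from the first one containing the start text up to and
# including the first later line containing the end text.
# $1 = start text, $2 = end text, $3 = log file (all plain substrings).
extract_between() {
    awk -v first="$1" -v last="$2" '
        index($0, first)          { inside = 1 }   # start marker seen
        inside                    { print }        # copy lines while inside
        inside && index($0, last) { exit }         # stop after the end marker
    ' "$3"
}
```

Running it once against the log of a failing run and once against the log of a passing run gives two excerpts that can be compared with `diff`.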
| Comment by Jay Lan (Inactive) [ 06/Apr/12 ] |
|
On second thought, I do not feel comfortable declaring this a test issue (i.e., a problem with the test environment setup). It could also result from the MDS behaving differently in different situations, which would represent a real problem. We do not know enough to say either way. |
| Comment by John Fuchs-Chesney (Inactive) [ 05/Mar/14 ] |
|
Jay – is this still an issue of concern to you? |
| Comment by Jay Lan (Inactive) [ 05/Mar/14 ] |
|
Yes, please. No longer a problem. Thanks! |
| Comment by John Fuchs-Chesney (Inactive) [ 05/Mar/14 ] |
|
Thank you |
| Comment by John Fuchs-Chesney (Inactive) [ 05/Mar/14 ] |
|
Not clear if this was a test issue – but time has moved on and it is no longer a problem. |