[LU-1284] CONF_SANITY test_50[bef]: FAIL: test_50b import is not in DISCONN state Created: 04/Apr/12  Updated: 10/Apr/12  Resolved: 10/Apr/12

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.1, Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jay Lan (Inactive) Assignee: Yang Sheng
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Server: centos 6.2 (2.6.32-220.4.1.el6) with Lustre 2.1.1
Client: sles11sp1, running 1.8.6.81
MGS/MDS uses the same device. Two OSS'es. Two clients.


Attachments: CONF_SANITY.output, conf-sanity.log, debug, ncli_nas.v3.sh, ogdb-service331, test_logs.tgz
Severity: 3
Rank (Obsolete): 6125

 Description   

Execution of the ACC_SM CONF_SANITY test_50[bef] subtests failed.

  1. ONLY=50b ACC_SM_ONLY="CONF_SANITY" NAME=ncli_nas.v3 RCLIENTS="service332" sh acceptance-small.sh

The conf-sanity.log showed:
/usr/lib64/lustre/tests/conf-sanity.sh: FAIL: test_50b import is not in DISCONN state

When I ran all of the test 50 subtests, 50b, 50e, and 50f failed with the same error.

These output/debug files are attached:
CONF_SANITY.output
conf-sanity.log
debug
ogdb-service331
test_logs.tgz (of subdirs of test_logs/)

It seemed the statement below in test_50b() timed out:
wait_osc_import_state mds ost DISCONN
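For readers unfamiliar with this kind of wait, the timeout behaviour can be sketched as a generic polling loop. This is only an illustration under stated assumptions: `wait_for_state` is a hypothetical helper, not the `wait_osc_import_state` code from test-framework.sh, and the command it polls is supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper (NOT the test-framework.sh implementation):
# poll a state-reporting command until it prints the expected state
# or the timeout expires.
wait_for_state() {
    expected=$1      # state we are waiting for, e.g. DISCONN
    max_wait=$2      # timeout in seconds
    shift 2          # the remaining arguments form the command to poll
    elapsed=0
    while :; do
        current=$("$@" 2>/dev/null)
        if [ "$current" = "$expected" ]; then
            echo "reached $expected after $elapsed sec"
            return 0
        fi
        if [ "$elapsed" -ge "$max_wait" ]; then
            echo "can't reach $expected after $elapsed sec, have $current"
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

If the polled command keeps reporting a different state for the whole interval, the helper gives up with a non-zero status, which is the shape of the failure reported above.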

Tests 50b, 50e, and 50f failed on the same problem.



 Comments   
Comment by Jay Lan (Inactive) [ 04/Apr/12 ]

The configuration file.

Comment by Peter Jones [ 04/Apr/12 ]

Yangsheng

Could you please help with this one?

Thanks

Peter

Comment by Yang Sheng [ 05/Apr/12 ]

I think this issue was already fixed in LU-690. That fix landed in 1.8.7 and 2.1.1, so a 1.8.6 client does not include the change.

Comment by Jay Lan (Inactive) [ 05/Apr/12 ]

Hi Yang,

Our client is not a plain 1.8.6-wc1 release. The complete list of commits can be seen at https://github.com/jlan/lustre-nas/commits/nas-1.8.6
The LU-690 patch did make it into the build (tag 1.8.6-5nasC).

The server side does not have the patch, but it seems to me it does not matter.

Comment by Yang Sheng [ 06/Apr/12 ]

Hi Jay, are you sure the MDS side doesn't include the LU-690 change? This looks like the import the MDS uses to connect to OST1, so the test would fail this way if the MDS lacks the patch.

Lustre: DEBUG MARKER: rpc : @@@@@@ FAIL: can't put import for osc.lustre-OST0000-osc-MDT0000.ost_server_uuid into DISCONN state after 240 sec, have CONNECTING
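When comparing logs from several nodes, it can help to pull the fields out of a marker line like the one above. The following is a minimal sketch, assuming the message keeps exactly this wording; `parse_import_failure` is a hypothetical name, not part of the Lustre test suite.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and, for each
# "can't put import" failure, print
#   <import parameter> <wanted state> <timeout in sec> <actual state>
parse_import_failure() {
    sed -n "s/.*can't put import for \([^ ]*\) into \([A-Z_]*\) state after \([0-9]*\) sec, have \([A-Z_]*\).*/\1 \2 \3 \4/p"
}
```

Fed the line quoted above, it prints the import name followed by `DISCONN 240 CONNECTING`, showing at a glance that the MDS-side import was still CONNECTING when the wait expired.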

Comment by Jay Lan (Inactive) [ 06/Apr/12 ]

Yes, the MDS side does not have the LU-690 fix.

However, the LU-690 commit fixed lustre/tests/ost-pools.sh and lustre/tests/test-framework.sh. Both are test scripts, not Lustre code.

Since the test was started on a client, I thought the MDS/OSS did not read ost-pools.sh or test-framework.sh. Do they? If not, then the patch should not affect the MDS, I think?

I will patch the MDS and give it a try.

Comment by Jay Lan (Inactive) [ 06/Apr/12 ]

The LU-690 patch landed on the b1_8 and master branches only. When I tried to cherry-pick the fix from master into the b2_1 tree, the cherry-pick failed. The patch changes lustre/tests/ost-pools.sh and lustre/tests/test-framework.sh. The b2_1 version of ost-pools.sh is very different in test_1, and I cannot see how the change would fit into it.

Is it OK to port just the test-framework.sh changes to the b2_1 tree anyway?

Comment by Yang Sheng [ 06/Apr/12 ]

I think it is OK, since we just want to verify conf-sanity.

Comment by Jay Lan (Inactive) [ 06/Apr/12 ]

Yeah! 50b passed with LU-690 backported to the 2.1.1 servers.

Please explain to me how the test-framework.sh changes on the servers matter! I thought the client was in the driver's seat and that any action on the server side was taken through remote Lustre commands issued by the client. I thought the lustre-tests scripts on the server side were not executed... It seems that I was wrong?

Comment by Yang Sheng [ 06/Apr/12 ]

Hi Jay, as I understand it, any command issued via 'do_nodes node command...' runs on the remote node if 'node' is not the local node. PDSH must be defined for this to work. So if the remote machine has different test scripts, it really will cause problems.
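The dispatch described in this comment can be sketched roughly as follows. This is a simplified illustration, not the real do_nodes from test-framework.sh: `run_on_node` is a hypothetical name, and the sketch assumes PDSH holds a remote-shell command prefix that accepts a node name followed by a command string.

```shell
#!/bin/sh
# Hypothetical dispatcher (NOT the real do_nodes).
run_on_node() {
    node=$1
    shift
    if [ "$node" = "$(hostname)" ]; then
        # Local node: run the command with this node's own scripts.
        sh -c "$*"
    else
        # Remote node: the command is handed to the remote shell, so any
        # script it sources or calls is the REMOTE node's installed copy.
        if [ -z "$PDSH" ]; then
            echo "PDSH is not set, cannot reach $node" >&2
            return 1
        fi
        # PDSH is deliberately unquoted so a value such as
        # "pdsh -S -w" splits into the command and its options.
        $PDSH "$node" "$*"
    fi
}
```

The point of the sketch is the remote branch: a helper invoked there resolves against the test scripts installed on that node, which would explain why patching only the client was not enough.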

Comment by Jay Lan (Inactive) [ 06/Apr/12 ]

All test 50 subtests passed.

LU-690 needs to be backported to b2_1 and b2_2. Thanks for your help, Yang!
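Since the root cause here was a test-script mismatch between client and servers, a quick check before a run is to compare copies of test-framework.sh gathered from each node. A minimal sketch, assuming the copies have already been fetched to local paths; `same_script` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed (0) if two files have identical content,
# fail with 1 if they differ, and with 2 if either cannot be read.
same_script() {
    a=$(cksum < "$1") || return 2
    b=$(cksum < "$2") || return 2
    [ "$a" = "$b" ]
}
```

A non-zero result for the client and server copies would have flagged the missing LU-690 change before test_50b was run.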

Comment by Yang Sheng [ 10/Apr/12 ]

Closing as a duplicate of LU-690.
