[LU-8483] sanity.sh aborts after test 117 due to get_param error Created: 07/Aug/16  Updated: 08/Aug/16  Resolved: 08/Aug/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Doug Oucharek (Inactive) Assignee: Doug Oucharek (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Running sanity.sh on 1 client, 1 MDT, and 1 OST, I consistently find that it aborts just after test 117 due to a get_param error:

 SKIP: sanity test_117 skipping ALWAYS excluded test 117
error: get_param: param_path 'osc/lustrewt-OST*-osc-*/resend_count': No such file or directory
error: set_param: param_path 'osc/lustrewt-OST*-osc-*/resend_count': No such file or directory
sanity.sh returned 0
Finished at Sun Aug  7 12:05:37 PDT 2016 in 2657s
./auster: completed with rc 0

Even if test 117 is excluded (as above), the run still aborts. auster reports that it completed with rc 0, but that is misleading: an earlier test did fail, and the remaining tests were never run because of this abort.



 Comments   
Comment by Andreas Dilger [ 08/Aug/16 ]

This error message is generated by set_resend_count(), which is run between test_117() and test_118a(). It appears that the client is not mounted at this point for some reason, so console and/or test logs are needed to see which test is actually failing.
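
To check the "client is not mounted" suspicion before digging through logs, something like the following sketch could help. It is not part of the test framework. It assumes a Linux node where client mounts appear in /proc/mounts with filesystem type "lustre" and a device of the form <nid>:/<fsname>; the function names are made up for illustration.

```python
def lustre_client_mounts(mounts_text):
    """Return mount points of Lustre client mounts in /proc/mounts-style text."""
    mount_points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) < 3 or fields[2] != "lustre":
            continue
        # Assumption: a client mount's device looks like "<nid>:/<fsname>",
        # whereas a server target would list a block device instead.
        if ":/" in fields[0]:
            mount_points.append(fields[1])
    return mount_points


def client_is_mounted(mounts_path="/proc/mounts"):
    """True if at least one Lustre client mount is present on this node."""
    with open(mounts_path) as f:
        return bool(lustre_client_mounts(f.read()))
```

If client_is_mounted() returns False right before set_resend_count() runs, the missing osc parameters are a symptom of the lost mount rather than a problem in test 117 itself.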

Comment by Doug Oucharek (Inactive) [ 08/Aug/16 ]

Turns out the problem is this:

  • We are testing Multi-Rail, so the interfaces are known by multiple NIDs.
  • Autotest uses DNS names (I tried IP addresses, ran into some problems, and reverted to DNS names).
  • In our testing, we mounted the file system using a NID of the MDT that did not match the DNS name. That worked, but caused this problem later on.

So, the lesson is this: when Multi-Rail comes into play, we must make sure that the DNS name for each node resolves to its primary NID (preferably), or at least to the NID we use to mount the file system.
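
A quick sanity check for this could look like the sketch below. It is only an illustration, not something autotest does today. It assumes NIDs of the form <IPv4 address>@<network> (for example 10.0.0.5@tcp) and uses the standard library resolver; the helper names and the host name in the usage note are hypothetical.

```python
import socket


def nid_address(nid):
    """Return the address part of a NID such as '10.0.0.5@tcp'."""
    address, sep, _network = nid.partition("@")
    if not sep or not address:
        raise ValueError(f"not a NID: {nid!r}")
    return address


def hostname_matches_nid(hostname, mount_nid, resolve=None):
    """True if hostname resolves to the address used in mount_nid.

    resolve is an optional callable mapping a host name to a list of
    IPv4 address strings; by default the system resolver is used.
    """
    if resolve is None:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses)
        resolve = lambda name: socket.gethostbyname_ex(name)[2]
    return nid_address(mount_nid) in resolve(hostname)
```

For example, hostname_matches_nid("mds1", "10.0.0.5@tcp") should be True before the run starts if mds1 is the DNS name autotest uses and 10.0.0.5@tcp is the NID given to mount. A False result points to the mismatch described above.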

Generated at Sat Feb 10 02:17:57 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.