[LU-8483] sanity.sh aborts after test 117 due to get_param error Created: 07/Aug/16 Updated: 08/Aug/16 Resolved: 08/Aug/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Doug Oucharek (Inactive) | Assignee: | Doug Oucharek (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Running sanity.sh on 1 client, 1 MDT, and 1 OST, I consistently find that it fails just after test 117 with a get_param error:

    SKIP: sanity test_117 skipping ALWAYS excluded test 117
    error: get_param: param_path 'osc/lustrewt-OST*-osc-*/resend_count': No such file or directory
    error: set_param: param_path 'osc/lustrewt-OST*-osc-*/resend_count': No such file or directory
    sanity.sh returned 0
    Finished at Sun Aug 7 12:05:37 PDT 2016 in 2657s
    ./auster: completed with rc 0

It still aborts even when test 117 is excluded (as above). auster reports that it completed with rc 0, but that is misleading: an earlier test did fail, and the remaining tests were never run because of this abort. |
| Comments |
| Comment by Andreas Dilger [ 08/Aug/16 ] |
|
This error message is generated by set_resend_count(), which is run between test_117() and test_118a(). It appears that the client is not mounted at this point for some reason, so console and/or test logs are needed to see which test is actually failing. |
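A minimal sketch of a guard that would turn this condition into an explicit message, assuming a POSIX shell with awk and Lustre's `lctl` available on the client. The helper names `lustre_client_mounted` and `guarded_get_resend_count` are hypothetical and are not part of the Lustre test framework; the check only looks for a mount of type `lustre` in `/proc/mounts` before querying `osc.*.resend_count`.

```shell
#!/bin/sh
# Hypothetical helpers (not from Lustre's test-framework.sh).

# Succeed if the mount table (default /proc/mounts) lists a filesystem of
# type "lustre". The optional argument lets the check run against a test file.
# Coarse check: it cannot tell client mounts from server target mounts.
lustre_client_mounted() {
    awk '$3 == "lustre" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Fail with a clear message instead of letting get_param report
# "No such file or directory" when no client is mounted.
guarded_get_resend_count() {
    if ! lustre_client_mounted "$1"; then
        echo "error: no Lustre client mounted; osc resend_count unavailable" >&2
        return 1
    fi
    lctl get_param -n 'osc.*.resend_count'
}
```

Because it only scans the mount table, this does not replace the console or test logs; it just makes the "client not mounted" state visible at the point where set_resend_count() would otherwise fail obscurely.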
| Comment by Doug Oucharek (Inactive) [ 08/Aug/16 ] |
|
Turns out the problem is this:
So the lesson is this: when Multi-Rail comes into play, we must make sure that the DNS names for the nodes refer to the primary NID (preferably), or at least to the NID with which we are mounting the file system. |
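A minimal pre-flight check along these lines, assuming glibc's `getent` and IPv4 NIDs of the form `address@net` (for example `192.168.1.10@tcp`). The function names are hypothetical, and the expected NID has to be supplied by whoever runs the check, for instance by picking the primary NID out of `lctl list_nids` on that node.

```shell
#!/bin/sh
# Hypothetical helpers (not Lustre tools) for checking that a node's
# host name resolves to the address of the NID it is expected to use.

# Strip the "@<net>" suffix from a NID such as 192.168.1.10@tcp.
nid_to_ip() {
    printf '%s\n' "${1%%@*}"
}

# Succeed if the resolved address ($1) equals the IP part of the NID ($2).
host_ip_matches_nid() {
    [ "$1" = "$(nid_to_ip "$2")" ]
}

# First IPv4 address the resolver returns for a host name (glibc getent).
resolve_ipv4() {
    getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}
```

For example, `host_ip_matches_nid "$(resolve_ipv4 "$(hostname)")" 192.168.1.10@tcp` (with a placeholder NID) fails when the node's name resolves to a different interface, which is the Multi-Rail misconfiguration described above.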