[LU-2407] Interop 2.1.3<->2.4 Failure on test suite conf-sanity test_35a: conf_param: No such device Created: 29/Nov/12 Updated: 15/Aug/13 Resolved: 11/Jul/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.1, Lustre 2.5.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Nathaniel Clark |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server: 2.3 RHEL6 |
||
| Severity: | 3 |
| Rank (Obsolete): | 5713 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/013a4c58-3980-11e2-9fda-52540035b04c. The sub-test test_35a failed with the following error:
test log shows:
Set up a fake failnode for the MDS
CMD: client-26vm7 lctl get_param -n devices
CMD: client-26vm7 /usr/sbin/lctl conf_param lustre-MDT0000.failover.node= 127.0.0.2@tcp
client-26vm7: error: conf_param: No such device
conf-sanity test_35a: @@@@@@ FAIL: test_35a failed with 4

MDS dmesg:
Lustre: DEBUG MARKER: Set up a fake failnode for the MDS
Lustre: DEBUG MARKER: lctl get_param -n devices
Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre-MDT0000.failover.node= 127.0.0.2@tcp
LustreError: 11886:0:(mgs_llog.c:2684:mgs_write_log_param()) err -19 on param 'failover.node='
LustreError: 11886:0:(mgs_handler.c:1147:mgs_iocontrol()) setparam err -19
Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_35a: @@@@@@ FAIL: test_35a failed with 4 |
| Comments |
| Comment by Peter Jones [ 30/Nov/12 ] |
|
duplicate of |
| Comment by Sarah Liu [ 26/Dec/12 ] |
|
Hit the same issue when doing interop between 2.1.3 server and 2.4 client. The client build is lustre-master #1127 which should include the fix of https://maloo.whamcloud.com/test_sets/351a6028-4a86-11e2-8a7b-52540035b04c |
| Comment by Sarah Liu [ 14/Jan/13 ] |
|
another instance seen in 2.3.0 server vs 2.4 client: |
| Comment by Sarah Liu [ 15/Jan/13 ] |
|
also seen in 2.1.4 server vs 2.4 client: |
| Comment by Jian Yu [ 26/Jun/13 ] |
|
The issue was introduced by the change in http://review.whamcloud.com/4247, which added an extra space between "=" and "$(h2$NETTYPE $FAKENID)". As a result, the "lctl conf_param" command became: /usr/sbin/lctl conf_param lustre-MDT0000.failover.node= 127.0.0.2@tcp As we can see, the fake failover NID was never actually assigned to the "failover.node" parameter. So we need to create a patch on the master and b2_4 branches to fix this script issue. On the other hand, even though the test script was faulty, why did the same test pass on the master and b2_4 branches but fail on the master/b2_4<->b2_3/b2_1 interop combinations? To fix the interop issue, we need to either backport the error handling code from the master branch to the b2_3 and b2_1 branches, or wait for the test script fix to land on the master and b2_4 branches. |
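The effect of the stray space can be reproduced without a Lustre installation. The sketch below is only an illustration: count_args and show_args are hypothetical helpers that stand in for "lctl conf_param", and the literal NID from the test log replaces the "$(h2$NETTYPE $FAKENID)" expansion used in the real script. It shows only how the shell splits the command line, not how lctl or the MGS handles the resulting arguments.

```shell
#!/bin/bash
# Hypothetical stand-ins for "lctl conf_param" (not part of Lustre):
# count_args prints how many arguments it received,
# show_args prints each argument in brackets, one per line.
count_args() { echo "$#"; }
show_args() {
    local arg
    for arg in "$@"; do
        echo "[$arg]"
    done
}

# NID string as it appears in the test log (assumed already expanded).
FAKE_NID="127.0.0.2@tcp"

# With the extra space, the shell passes two separate words, so the
# first argument is "failover.node=" with an empty value.
broken_count=$(count_args lustre-MDT0000.failover.node= "$FAKE_NID")
show_args lustre-MDT0000.failover.node= "$FAKE_NID"

# Without the space, parameter and value arrive as a single argument.
fixed_count=$(count_args lustre-MDT0000.failover.node="$FAKE_NID")
show_args lustre-MDT0000.failover.node="$FAKE_NID"

echo "broken: $broken_count argument(s), fixed: $fixed_count argument(s)"
```

The empty value in the first case matches the MDS dmesg line above, which reports the parameter as 'failover.node=' with nothing after the "=".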
| Comment by Jian Yu [ 26/Jun/13 ] |
|
Patch for Lustre master branch to remove the extra space: http://review.whamcloud.com/6779. The patch also needs to be cherry-picked to Lustre b2_4 branch. |
| Comment by Nathaniel Clark [ 11/Jul/13 ] |
|
Patches merged to master |
| Comment by Jian Yu [ 11/Aug/13 ] |
Hi Oleg, |
| Comment by Jian Yu [ 15/Aug/13 ] |
|
Patch was cherry-picked to Lustre b2_4 branch. |