[LU-2407] Interop 2.1.3<->2.4 Failure on test suite conf-sanity test_35a: conf_param: No such device Created: 29/Nov/12  Updated: 15/Aug/13  Resolved: 11/Jul/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.1, Lustre 2.5.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: None
Environment:

server: 2.3 RHEL6
client: lustre master build #1065 RHEL6


Severity: 3
Rank (Obsolete): 5713

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/013a4c58-3980-11e2-9fda-52540035b04c.

The sub-test test_35a failed with the following error:

test_35a failed with 4

test log shows:

Set up a fake failnode for the MDS
CMD: client-26vm7 lctl get_param -n devices
CMD: client-26vm7 /usr/sbin/lctl conf_param lustre-MDT0000.failover.node= 127.0.0.2@tcp
client-26vm7: error: conf_param: No such device
 conf-sanity test_35a: @@@@@@ FAIL: test_35a failed with 4 

MDS dmesg:

Lustre: DEBUG MARKER: Set up a fake failnode for the MDS
Lustre: DEBUG MARKER: lctl get_param -n devices
Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre-MDT0000.failover.node= 127.0.0.2@tcp
LustreError: 11886:0:(mgs_llog.c:2684:mgs_write_log_param()) err -19 on param 'failover.node='
LustreError: 11886:0:(mgs_handler.c:1147:mgs_iocontrol()) setparam err -19
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_35a: @@@@@@ FAIL: test_35a failed with 4 


 Comments   
Comment by Peter Jones [ 30/Nov/12 ]

duplicate of LU-2406

Comment by Sarah Liu [ 26/Dec/12 ]

Hit the same issue in interop testing between a 2.1.3 server and a 2.4 client. The client build is lustre-master #1127, which should include the fix for LU-2406.

https://maloo.whamcloud.com/test_sets/351a6028-4a86-11e2-8a7b-52540035b04c

Comment by Sarah Liu [ 14/Jan/13 ]

another instance seen in 2.3.0 server vs 2.4 client:
https://maloo.whamcloud.com/test_sets/19d3d0ea-5b54-11e2-8985-52540035b04c

Comment by Sarah Liu [ 15/Jan/13 ]

also seen in 2.1.4 server vs 2.4 client:
https://maloo.whamcloud.com/test_sets/ae414df4-5f35-11e2-b507-52540035b04c

Comment by Jian Yu [ 26/Jun/13 ]

The issue was introduced by the change in http://review.whamcloud.com/4247, which added an extra space between "=" and "$(h2$NETTYPE $FAKENID)". As a result, the "lctl conf_param" command became:

/usr/sbin/lctl conf_param lustre-MDT0000.failover.node= 127.0.0.2@tcp

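As we can see, the fake failover NID was not actually assigned to the "failover.node" parameter: the parameter received an empty value and the NID was passed as a separate argument. So, we need to create a patch for the master and b2_4 branches to fix this script issue.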
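
The word splitting can be reproduced outside the test framework. The following is a minimal sketch, not the conf-sanity code itself: the values of NETTYPE and FAKENID, the body of h2tcp(), and the show_args() helper are assumptions made up for illustration, and the parameter line is assumed to be unquoted.

```shell
#!/bin/bash
# Hypothetical stand-ins: the real values come from the Lustre test framework.
NETTYPE=tcp
FAKENID=127.0.0.2
h2tcp() { echo "$1@tcp"; }      # assumed helper: appends the network suffix

# Print each argument on its own line, as the called program would see it.
show_args() { printf '[%s]\n' "$@"; }

# With the stray space, the NID is split off into a separate word.
broken_args=$(show_args lustre-MDT0000.failover.node= $(h2$NETTYPE $FAKENID))
# Without the space, the parameter and its value stay in one word.
fixed_args=$(show_args lustre-MDT0000.failover.node=$(h2$NETTYPE $FAKENID))

echo "with space:";    echo "$broken_args"
echo "without space:"; echo "$fixed_args"
```

In the first case the parameter word ends at "=", which matches the empty 'failover.node=' reported in the MDS dmesg.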

On the other hand, although the test script had this issue, why did the same test pass on the master and b2_4 branches but fail on the master/b2_4<->b2_3/b2_1 interop combinations?
This is because the master and b2_4 branches have the patch from http://review.whamcloud.com/3670, which improves the error handling code in mgs_modify() and mgs_write_log_failnid_internal() so that setting an empty value for "failover.node" means removing all failover NIDs. If "failover.node" was already empty, mgs_modify() returns 1, which means no modification was done. On the b2_3 and b2_1 branches, which lack the patch, setting an empty value makes mgs_modify() return -ENODEV (the "err -19" in the MDS dmesg above) if "failover.node" had no value before.
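
The difference between the branches can be illustrated with a toy model. This is only a sketch of the return-value convention described above, not Lustre code: clear_failnode() and its "patched"/"unpatched" argument are hypothetical, and the return value 0 for the case where a NID was actually removed is an assumption that the comment above does not state.

```shell
#!/bin/bash
ENODEV=19   # Linux errno for "No such device"

# Toy model of setting "failover.node" to an empty value on the MGS.
# $1: "patched" (master, b2_4) or "unpatched" (b2_3, b2_1)
# $2: current value of failover.node (may be empty)
# Prints the modelled return code.
clear_failnode() {
    local branch=$1 current=$2
    if [ -n "$current" ]; then
        echo 0              # assumption: an existing NID was removed
    elif [ "$branch" = patched ]; then
        echo 1              # nothing to modify; not treated as an error
    else
        echo "-$ENODEV"     # nothing to modify; reported as -ENODEV
    fi
}

echo "patched,   empty before: $(clear_failnode patched "")"
echo "unpatched, empty before: $(clear_failnode unpatched "")"
```

Under this model the broken test line is harmless against a patched server but fails against an unpatched one, which is the pattern seen in the interop runs.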

So, to fix the interop issue, we need either to backport the error handling code from the master branch to the b2_3 and b2_1 branches, or to wait for the test script fix on the master and b2_4 branches.

Comment by Jian Yu [ 26/Jun/13 ]

Patch for Lustre master branch to remove the extra space: http://review.whamcloud.com/6779. The patch also needs to be cherry-picked to Lustre b2_4 branch.

Comment by Nathaniel Clark [ 11/Jul/13 ]

Patches merged to master

Comment by Jian Yu [ 11/Aug/13 ]

Patch for Lustre master branch to remove the extra space: http://review.whamcloud.com/6779. The patch also needs to be cherry-picked to Lustre b2_4 branch.

Hi Oleg,
Could you please cherry-pick the above patch to Lustre b2_4 branch? Thanks.

Comment by Jian Yu [ 15/Aug/13 ]

Patch was cherry-picked to Lustre b2_4 branch.
