[LU-1848] interop issue: 2.2 clients can't talk to 2.3 servers Created: 06/Sep/12 Updated: 10/Sep/12 Resolved: 10/Sep/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jinshan Xiong (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4263 |
| Description |
|
with the following error message:

[root@client-17 ~]# uname -r
2.6.32-220.4.2.el6_lustre.g45b2fe8.x86_64
[root@client-17 ~]# rpm -qa |grep lustre
lustre-2.2.0-2.6.32_220.4.2.el6_lustre.g45b2fe8.x86_64_g25a1427.x86_64
kernel-2.6.32-220.4.2.el6_lustre.g45b2fe8.x86_64
lustre-modules-2.2.0-2.6.32_220.4.2.el6_lustre.g45b2fe8.x86_64_g25a1427.x86_64
lustre-ldiskfs-3.3.0-2.6.32_220.4.2.el6_lustre.g45b2fe8.x86_64_g25a1427.x86_64
[root@client-17 ~]# mount -t lustre client-18@tcp:/lustre /mnt/lustre
mount.lustre: mount client-18@tcp:/lustre at /mnt/lustre failed: Invalid argument
This may have multiple causes.
Is 'lustre' the correct filesystem name?
Are the mount options correct?
Check the syslog for more info.
[root@client-17 ~]# dmesg
Lustre: MGC10.10.4.18@tcp: Reactivating import
Lustre: 6833:0:(obd_config.c:1002:class_process_config()) Ignoring unknown param jobid_var=procname_uid
LustreError: 6833:0:(obd_config.c:1362:class_config_llog_handler()) Err -22 on cfg command:
Lustre: cmd=cf00f 0:(null) 1:sys.jobid_var=procname_uid 2:procname_uid
LustreError: 15b-f: MGC10.10.4.18@tcp: The configuration from log 'lustre-client'failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.10.4.18@tcp: The configuration from log 'lustre-client' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 6821:0:(llite_lib.c:978:ll_fill_super()) Unable to process log: -22
LustreError: 6736:0:(lov_obd.c:928:lov_cleanup()) lov tgt 0 not cleaned! deathrow=0, lovrc=1
LustreError: 6736:0:(lov_obd.c:928:lov_cleanup()) Skipped 3 previous similar messages
LustreError: 6821:0:(ldlm_request.c:1170:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 6821:0:(ldlm_request.c:1796:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: client ffff880329563000 umount complete
LustreError: 6821:0:(obd_mount.c:2349:lustre_fill_super()) Unable to mount (-22)

We need to fix it by: |
| Comments |
| Comment by Peter Jones [ 06/Sep/12 ] |
|
Sarah will look into this one |
| Comment by Oleg Drokin [ 06/Sep/12 ] |
|
This issue might also be helped by: http://review.whamcloud.com/3806 |
| Comment by Jinshan Xiong (Inactive) [ 06/Sep/12 ] |
|
After taking a closer look, we don't need a connect bit or something to address this issue. The only culprit is test-framework.sh which is eager to set jobvar stuff... |
| Comment by Peter Jones [ 06/Sep/12 ] |
|
Lai will take care of this |
| Comment by Sarah Liu [ 07/Sep/12 ] |
|
I cannot reproduce this issue with the following config: MDS and OST 2.3-tag2.2.94 https://maloo.whamcloud.com/test_sessions/7948cadc-f8bf-11e1-b9a7-52540035b04c |
| Comment by Lai Siyao [ 07/Sep/12 ] |
|
Sarah, it occurs between 2.2 client and 2.3 (not master) server. |
| Comment by Sarah Liu [ 07/Sep/12 ] |
I used 2.3 as servers; master was just another client, which was the case Jinshan told me about |
| Comment by Jinshan Xiong (Inactive) [ 07/Sep/12 ] |
|
Hi Sarah, I started my cluster with auster and then mounted a 2.2 client manually. From what I have seen, the following piece of code has run: in test-framework.sh, function init_param_vars():

    local jobid_var
    if [ -z "$(lctl get_param -n mdc.*.connect_flags | grep jobstats)" ]; then
        jobid_var="none"
    elif [ $JOBSTATS_AUTO -ne 0 ]; then
        echo "enable jobstats, set job scheduler as $JOBID_VAR"
        jobid_var=$JOBID_VAR
    else
        jobid_var=`$LCTL get_param -n jobid_var`
        if [ $jobid_var != "disable" ]; then
            echo "disable jobstats as required"
            jobid_var="disable"
        else
            jobid_var="none"
        fi
    fi

    if [ $jobid_var == $JOBID_VAR -o $jobid_var == "disable" ]; then
        do_facet mgs $LCTL conf_param $FSNAME.sys.jobid_var=$jobid_var
        wait_update $HOSTNAME "$LCTL get_param -n jobid_var" \
            $jobid_var || return 1
    fi

By default JOBSTATS_AUTO is 1. This caused lctl conf_param to be called to set jobid_var, and 2.2 clients certainly don't understand this config item. |
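
To make the branches in the quoted init_param_vars() snippet easier to follow, here is a minimal sketch of the same decision as two standalone shell functions. The names choose_jobid_var and would_write_config are hypothetical and are not part of test-framework.sh; the connect flags and the current jobid_var value are passed in as arguments (an assumption made so the sketch runs without lctl or a mounted filesystem).

```shell
# Hypothetical helper, not part of test-framework.sh: mirrors the branch
# logic of the quoted snippet without calling lctl.
#   $1 - connect flags string as the client would report it (assumed input)
#   $2 - value of JOBSTATS_AUTO
#   $3 - value of JOBID_VAR
#   $4 - current jobid_var value on the client
choose_jobid_var() {
    local connect_flags="$1"
    local jobstats_auto="$2"
    local wanted="$3"
    local current="$4"

    case "$connect_flags" in
        *jobstats*) ;;                 # client advertises jobstats, keep going
        *) echo "none"; return 0 ;;    # no jobstats support: leave config alone
    esac

    if [ "$jobstats_auto" -ne 0 ]; then
        echo "$wanted"                 # enable jobstats with the requested scheduler
    elif [ "$current" != "disable" ]; then
        echo "disable"                 # jobstats currently on, turn it off
    else
        echo "none"                    # already disabled, nothing to do
    fi
}

# Hypothetical helper: succeeds when the quoted snippet would go on to
# run "lctl conf_param" on the MGS for the chosen value.
would_write_config() {
    local chosen="$1"
    local wanted="$2"
    [ "$chosen" = "$wanted" ] || [ "$chosen" = "disable" ]
}

# Example: a jobstats-capable client with JOBSTATS_AUTO=1 and
# JOBID_VAR=procname_uid (the value seen in the dmesg output above).
chosen=$(choose_jobid_var "lock_cancel jobstats" 1 procname_uid disable)
if would_write_config "$chosen" procname_uid; then
    echo "would run: lctl conf_param <fsname>.sys.jobid_var=$chosen"
fi
```

With JOBSTATS_AUTO at 1 and a jobstats-capable client running the setup, the first function yields JOBID_VAR and the second succeeds, which is the path that writes sys.jobid_var into the configuration log that a 2.2 client later fails to process.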
| Comment by Lai Siyao [ 09/Sep/12 ] |
|
Jinshan, I think that before you tested the 2.2 client, you had set up the system from a 2.3 client, which enables jobid. The right way to test a 2.2 client against a 2.3 server is to set up the system from the 2.2 client (which works). So this should not be a bug. |
| Comment by Jinshan Xiong (Inactive) [ 09/Sep/12 ] |
|
Yes I agree. |