[LU-7851] ZFS rolling upgrade: cannot mount MDS: mdt: lustre-MDT0000 unknown param som=disabled Created: 05/Mar/16  Updated: 28/Feb/18  Resolved: 27/Apr/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Critical
Reporter: Sarah Liu Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None
Environment:

before upgrade, system was formatted as 2.7.1 RHEL6.7 zfs


Issue Links:
Related
is related to LU-6047 remove client Size on MDS support Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Before upgrade, system was formatted as 2.7.1 RHEL6.7 zfs
1. Upgraded the OSS from 2.7.1 RHEL6.7 zfs to 2.8.0 RHEL7.1 zfs while the MDS and clients remained at 2.7.1; ran sanity and hit known issues.
2. Then upgraded the MDS from 2.7.1 RHEL6.7 zfs to 2.8.0 RHEL7.1 zfs; it cannot be mounted:

[root@onyx-25 ~]# mount -t lustre -o acl,user_xattr lustre-mdt1/mdt1 /mnt/mds1
[ 1753.734535] Lustre: MGS: Connection restored to 3f133170-911b-365a-d7e8-6a31eadc485c (at 0@lo)
[ 1755.859175] LustreError: 22512:0:(obd_config.c:1387:class_process_proc_param()) mdt: lustre-MDT0000 unknown param som=disabled
[ 1755.878206] LustreError: 22512:0:(obd_config.c:1666:class_config_llog_handler()) MGC10.2.4.47@tcp: cfg command failed: rc = -38
[ 1755.896907] Lustre:    cmd=cf00f 0:lustre-MDT0000  1:mdt.som=disabled  
[ 1755.896907] 
[ 1755.911461] LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000' failed (-38). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[ 1755.946877] LustreError: 22417:0:(obd_mount_server.c:1309:server_start_targets()) failed to start server lustre-MDT0000: -38
[ 1755.962800] LustreError: 22417:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -38
[ 1755.976801] Lustre: Failing over lustre-MDT0000
[ 1763.422747] Lustre: 22417:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1457156520/real 1457156520]  req@ffff8807f1960000 x1527938522546332/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1457156526 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[ 1765.473299] Lustre: server umount lustre-MDT0000 complete
[ 1765.481424] LustreError: 22417:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount  (-38)
mount.lustre: mount lustre-mdt1/mdt1 at /mnt/mds1 failed: Function not implemented
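
For triage, the two facts that matter in the console output above are the rejected parameter and the return code. The following is a minimal Python sketch, not part of Lustre, that pulls both out of such output; the regular expressions are assumptions fitted to the two LustreError lines shown above, and the function name `summarize` is hypothetical.

```python
import re

# Linux errno value for ENOSYS ("Function not implemented").
# Hard-coded so the sketch behaves the same on any platform.
ENOSYS = 38

# Patterns fitted to the messages in this ticket (an assumption, not a
# stable log format).
PARAM_RE = re.compile(r"unknown param (\S+)")
RC_RE = re.compile(r"cfg command failed: rc = (-?\d+)")


def summarize(console_text):
    """Return (unknown_param, rc); a field is None if it is not found."""
    param = PARAM_RE.search(console_text)
    rc = RC_RE.search(console_text)
    return (param.group(1) if param else None,
            int(rc.group(1)) if rc else None)


sample = (
    "LustreError: 22512:0:(obd_config.c:1387:class_process_proc_param()) "
    "mdt: lustre-MDT0000 unknown param som=disabled\n"
    "LustreError: 22512:0:(obd_config.c:1666:class_config_llog_handler()) "
    "MGC10.2.4.47@tcp: cfg command failed: rc = -38\n"
)

param, rc = summarize(sample)
print(param, rc, rc == -ENOSYS)
```

The rc of -38 matches the "Function not implemented" text that mount.lustre reports.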


 Comments   
Comment by Sarah Liu [ 05/Mar/16 ]

Upgrading from 2.5.5 RHEL6.6 zfs to 2.8 RHEL6.7 zfs also hits the issue. This seems to be a zfs-specific problem; ldiskfs didn't hit it.

Comment by Joseph Gmitter (Inactive) [ 07/Mar/16 ]

Hi Hongchao,

Can you please have a look at this issue?

Thanks.
Joe

Comment by Oleg Drokin [ 07/Mar/16 ]

When the SOM code was removed, the som parameter handling should not have been removed completely; it should have been converted to a no-op, like some other compat params.

This would mostly only happen on a filesystem where sanity.sh was run, since it tries to enable and then disable SOM.
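
The no-op idea can be illustrated with a small Python model. This is a sketch of the general technique only, not the Lustre C code or the eventual patch: `make_dispatcher`, `set_example` and the parameter name `example_param` are hypothetical, and the only details taken from this ticket are the `som=disabled` record and the -38 return code.

```python
ENOSYS = 38  # Linux errno value, hard-coded so the sketch is portable


def set_example(state, value):
    """Hypothetical handler for a parameter that is still supported."""
    state["example_param"] = value
    return 0


def make_dispatcher(handlers, compat_noops=()):
    """Build a parameter processor.

    handlers:     name -> callable for parameters that are still supported.
    compat_noops: names that used to be valid and are now accepted and ignored.
    """
    def process_param(state, record):
        name, _, value = record.partition("=")
        if name in handlers:
            return handlers[name](state, value)
        if name in compat_noops:
            return 0          # accepted for compatibility, nothing to do
        return -ENOSYS        # unknown parameter
    return process_param


handlers = {"example_param": set_example}

# Handling removed completely: the old record is rejected.
removed = make_dispatcher(handlers)
print(removed({}, "som=disabled"))

# Handling converted to a no-op: the old record is accepted and ignored.
compat = make_dispatcher(handlers, compat_noops={"som"})
print(compat({}, "som=disabled"))
```

Keeping the name in a compat list means parameters that were never valid are still rejected, while records written by older releases no longer fail.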

Comment by Sarah Liu [ 07/Mar/16 ]

I also tried upgrading from 2.7.1 RHEL6.7 zfs to 2.8 RHEL6.7 zfs; it didn't hit this issue.

Comment by Hongchao Zhang [ 08/Mar/16 ]

This problem could be related to running sanity.sh during the upgrade.

Prior to 2.8, SOM is disabled by default (mdt_som_conf = 0) and no such configuration exists by default,
but the parameter "som=disabled" or "som=enabled" is set while running sanity.sh. The newly added "som" configuration
could then cause this problem when the MDS is upgraded to 2.8 afterwards.

Hi Sarah,
Could you please verify whether it is the case? Thanks!

Comment by Oleg Drokin [ 08/Mar/16 ]

Hongchao: even if users of the actual option are rare, we still need the patch to ignore it, since it was a valid option.

That said, making sure it is not hit by default is still good.

Comment by Hongchao Zhang [ 09/Mar/16 ]

Oleg: Okay, I'll create a patch to ignore the configuration.

Comment by Gerrit Updater [ 09/Mar/16 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/18834
Subject: LU-7851 mdt: skip SOM related configuration
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1555ccad216290befa4f32d93b366b6440bcc9d2

Comment by Sarah Liu [ 11/Mar/16 ]

hi Hongchao,

Sorry for the late reply. Here is the scenario that hits the problem:
1. format and set up the system as 2.5.5/2.7.1 zfs
2. upgrade only the OSS to 2.8 zfs
3. run sanity with the 2.8 OSS and the 2.5.5/2.7.1 MDS and clients
4. now upgrade the MDS to 2.8 zfs and hit the problem

Step 3 is the key; without running sanity in that configuration, the problem is not hit.
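
Why step 3 matters can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not Lustre code: the configuration log is treated as a plain list of records, `other_param=1` is a hypothetical record standing in for everything else in the log, and the upgraded MDS is assumed to reject any record whose name it does not know.

```python
ENOSYS = 38  # Linux errno value, hard-coded so the sketch is portable


def replay(config_log, known_params):
    """Replay records in order; stop at the first unknown parameter."""
    for record in config_log:
        name = record.split("=", 1)[0]
        if name not in known_params:
            return -ENOSYS
    return 0


def mount_upgraded_mds(ran_sanity):
    """Model step 4: mount the 2.8 MDS after the earlier steps."""
    config_log = ["other_param=1"]          # hypothetical pre-existing record
    if ran_sanity:
        # Step 3 leaves a som record behind (the one seen in the failing mount).
        config_log.append("som=disabled")
    # The upgraded MDS no longer knows "som".
    return replay(config_log, known_params={"other_param"})


print(mount_upgraded_mds(ran_sanity=False))
print(mount_upgraded_mds(ran_sanity=True))
```

In this model the mount only fails when the sanity run has left a som record in the log, which matches the observation that skipping step 3 avoids the problem.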

Comment by Gerrit Updater [ 22/Apr/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18834/
Subject: LU-7851 mdt: skip SOM related configuration
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 109cdb83d77844e00ebc6d5df6bf73845cf9d45e

Comment by Joseph Gmitter (Inactive) [ 27/Apr/16 ]

Landed to master for 2.9.0
