Lustre / LU-7851

ZFS rolling upgrade: cannot mount MDS: mdt: lustre-MDT0000 unknown param som=disabled

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Environment: before upgrade, system was formatted as 2.7.1 RHEL6.7 zfs
    • Severity: 3

    Description

      Before the upgrade, the system was formatted as 2.7.1 RHEL6.7 zfs.
      1. Upgraded the OSS from 2.7.1 RHEL6.7 zfs to 2.8.0 RHEL7.1 zfs; the MDS and clients remained on 2.7.1; ran sanity and hit known issues.
      2. Then upgraded the MDS from 2.7.1 RHEL6.7 zfs to 2.8.0 RHEL7.1 zfs; it cannot mount:

      [root@onyx-25 ~]# mount -t lustre -o acl,user_xattr lustre-mdt1/mdt1 /mnt/mds1
      [ 1753.734535] Lustre: MGS: Connection restored to 3f133170-911b-365a-d7e8-6a31eadc485c (at 0@lo)
      [ 1755.859175] LustreError: 22512:0:(obd_config.c:1387:class_process_proc_param()) mdt: lustre-MDT0000 unknown param som=disabled
      [ 1755.878206] LustreError: 22512:0:(obd_config.c:1666:class_config_llog_handler()) MGC10.2.4.47@tcp: cfg command failed: rc = -38
      [ 1755.896907] Lustre:    cmd=cf00f 0:lustre-MDT0000  1:mdt.som=disabled  
      [ 1755.896907] 
      [ 1755.911461] LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000' failed (-38). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      [ 1755.946877] LustreError: 22417:0:(obd_mount_server.c:1309:server_start_targets()) failed to start server lustre-MDT0000: -38
      [ 1755.962800] LustreError: 22417:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -38
      [ 1755.976801] Lustre: Failing over lustre-MDT0000
      [ 1763.422747] Lustre: 22417:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1457156520/real 1457156520]  req@ffff8807f1960000 x1527938522546332/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1457156526 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      [ 1765.473299] Lustre: server umount lustre-MDT0000 complete
      [ 1765.481424] LustreError: 22417:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount  (-38)
      mount.lustre: mount lustre-mdt1/mdt1 at /mnt/mds1 failed: Function not implemented
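      For reference, rc = -38 is -ENOSYS ("Function not implemented"): the 2.8 MDT no longer recognises the "som" parameter recorded in the configuration log, so config replay aborts and the whole mount fails. A minimal sketch of the eventual fix, with hypothetical names (this is not the actual Lustre code), accepting the obsolete parameter as a no-op instead of failing:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Sketch only: when replaying the configuration llog, an unknown parameter
 * used to fail with -ENOSYS (-38), which aborts the mount.  The LU-7851 fix
 * instead treats the obsolete "som" parameter as a no-op. */
static int process_mdt_param(const char *param)
{
	/* Obsolete parameter kept only for compatibility: accept and ignore,
	 * so config replay continues past it. */
	if (strncmp(param, "som=", 4) == 0)
		return 0;

	/* A parameter this sketch MDT still understands. */
	if (strncmp(param, "identity_upcall=", 16) == 0)
		return 0;

	/* Anything else is genuinely unknown. */
	fprintf(stderr, "mdt: unknown param %s\n", param);
	return -ENOSYS;		/* -38: "Function not implemented" */
}
```

With this, "som=disabled" in an old config log is silently skipped and the mount proceeds, while truly unknown parameters still report an error.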
      

    Attachments

    Issue Links

    Activity


            jgmitter Joseph Gmitter (Inactive) added a comment -

            Landed to master for 2.9.0

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18834/
            Subject: LU-7851 mdt: skip SOM related configuration
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 109cdb83d77844e00ebc6d5df6bf73845cf9d45e
            sarah Sarah Liu added a comment -

            Hi Hongchao,

            Sorry for the late reply. Here is the scenario that hits the problem:
            1. format and set up the system as 2.5.5/2.7.1 zfs
            2. upgrade the OSS only to 2.8 zfs
            3. run sanity with the 2.8 OSS and the 2.5.5/2.7.1 MDS and clients
            4. now upgrade the MDS to 2.8 zfs and hit the problem

            Step 3 is the key; without running sanity with such a config, the problem is not hit.


            gerrit Gerrit Updater added a comment -

            Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/18834
            Subject: LU-7851 mdt: skip SOM related configuration
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1555ccad216290befa4f32d93b366b6440bcc9d2

            hongchao.zhang Hongchao Zhang added a comment -

            Oleg: Okay, I'll create a patch to ignore the configuration.
            green Oleg Drokin added a comment -

            Hongchao: even though users of the actual option are probably rare, we still need the patch to ignore it, since it used to be a valid option.

            That said, making sure it is not hit by default is still good.


            hongchao.zhang Hongchao Zhang added a comment -

            This problem could be related to running the sanity.sh test during the upgrade.

            Prior to 2.8, SOM is disabled by default (mdt_som_conf = 0) and there is no such configuration record by default,
            but the parameter "som=disabled" or "som=enabled" is set up while running sanity.sh, and the newly added "som"
            configuration record can then cause this problem when the MDS is upgraded to 2.8 afterwards.

            Hi Sarah,
            Could you please verify whether this is the case? Thanks!
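            A plausible reconstruction of the trigger (the exact commands are an assumption, inferred from the "1:mdt.som=disabled" config record in the log above): sanity.sh on the pre-2.8 system toggled SOM via permanent parameters, roughly:

```shell
# Hypothetical reconstruction, not taken from sanity.sh verbatim:
# conf_param writes a record into the MGS configuration llog, which is
# replayed on every later mount of the MDT -- including after the upgrade
# to 2.8, whose MDT no longer recognises "som".
lctl conf_param lustre.mdt.som=enabled
lctl conf_param lustre.mdt.som=disabled
```

Because these are permanent (llog-recorded) settings rather than transient `lctl set_param` ones, the stale "som" record survives the upgrade and hits the new MDT at mount time.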
            sarah Sarah Liu added a comment -

            I also tried upgrading from 2.7.1 RHEL6.7 zfs to 2.8 RHEL6.7 zfs; it didn't hit this issue.

            green Oleg Drokin added a comment -

            When the SOM code was removed, the som parameter handling should not have been removed completely, but instead converted to a no-op like some other compat params.

            This would mostly only happen on a filesystem where sanity.sh was run, since it tries to enable and then disable SOM.


            jgmitter Joseph Gmitter (Inactive) added a comment -

            Hi Hongchao,

            Can you please have a look at this issue?

            Thanks.
            Joe

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: sarah Sarah Liu
              Votes: 0
              Watchers: 7

              Dates

                Created:
                Updated:
                Resolved: