LU-7410: After downgrade from 2.8 to 2.5.5, hit unsupported incompat filesystem feature(s) 400

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Environment:
      before downgrade: lustre-master #3226 RHEL6.7
      after downgrade: lustre-b2_5_fe #62 RHEL6.6

    Description

      1. upgrade system from 2.5.5 RHEL6.6 to master RHEL6.7: PASS
      2. downgrade system from master RHEL6.7 to 2.5.5 RHEL6.6: FAIL

      Mounting the MDS failed:

      Lustre: DEBUG MARKER: == upgrade-downgrade End == 15:01:41 (1447110101)
      LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
      Lustre: MGC10.2.4.47@tcp: Connection restored to MGS (at 0@lo)
      Lustre: lustre-MDT0000: used disk, loading
      LustreError: 12684:0:(mdt_recovery.c:263:mdt_server_data_init()) lustre-MDT0000: unsupported incompat filesystem feature(s) 400
      LustreError: 12684:0:(obd_config.c:572:class_setup()) setup lustre-MDT0000 failed (-22)
      LustreError: 12684:0:(obd_config.c:1629:class_config_llog_handler()) MGC10.2.4.47@tcp: cfg command failed: rc = -22
      Lustre:    cmd=cf003 0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f  
      LustreError: 15b-f: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
      LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      LustreError: 12589:0:(obd_mount_server.c:1254:server_start_targets()) failed to start server lustre-MDT0000: -22
      LustreError: 12589:0:(obd_mount_server.c:1737:server_fill_super()) Unable to start targets: -22
      LustreError: 12589:0:(obd_mount_server.c:847:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
      LustreError: 12589:0:(obd_mount_server.c:1422:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
      LustreError: 12589:0:(obd_config.c:619:class_cleanup()) Device 5 not setup
      Lustre: 12589:0:(client.c:1943:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1447110105/real 1447110105]  req@ffff8808352bac00 x1517404919169064/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1447110111 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: server umount lustre-MDT0000 complete
      LustreError: 12589:0:(obd_mount.c:1330:lustre_fill_super()) Unable to mount  (-22)
      Lustre: DEBUG MARKER: Using TIMEOUT=100
      Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted
      LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
      Lustre: MGC10.2.4.47@tcp: Connection restored to MGS (at 0@lo)
      Lustre: lustre-MDT0000: used disk, loading
      LustreError: 13112:0:(mdt_recovery.c:263:mdt_server_data_init()) lustre-MDT0000: unsupported incompat filesystem feature(s) 400
      LustreError: 13112:0:(obd_config.c:572:class_setup()) setup lustre-MDT0000 failed (-22)
      LustreError: 13112:0:(obd_config.c:1629:class_config_llog_handler()) MGC10.2.4.47@tcp: cfg command failed: rc = -22
      Lustre:    cmd=cf003 0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f  
      LustreError: 15b-f: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
      LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      LustreError: 13025:0:(obd_mount_server.c:1254:server_start_targets()) failed to start server lustre-MDT0000: -22
      LustreError: 13025:0:(obd_mount_server.c:1737:server_fill_super()) Unable to start targets: -22
      LustreError: 13025:0:(obd_mount_server.c:847:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
      LustreError: 13025:0:(obd_mount_server.c:1422:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
      LustreError: 13025:0:(obd_config.c:619:class_cleanup()) Device 5 not setup
      Lustre: 13025:0:(client.c:1943:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1447110256/real 1447110256]  req@ffff88081d67dc00 x1517404919169104/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1447110262 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: server umount lustre-MDT0000 complete
      LustreError: 13025:0:(obd_mount.c:1330:lustre_fill_super()) Unable to mount  (-22)
      Lustre: DEBUG MARKER: Using TIMEOUT=100
      Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted
      [root@onyx-25 ~]# 
      

      Attachments

        1. debug-after
          43 kB
        2. dmesg-after
          93 kB
        3. dmesg-before
          93 kB
        4. trace-after
          482 kB

        Issue Links

          Activity

            [LU-7410] After downgrade from 2.8 to 2.5.5, hit unsupported incompat filesystem feature(s) 400

            pichong Gregoire Pichon added a comment -

            Could you provide the complete test case that was executed?
            How is the filesystem designed (which nodes host the MGT, MDTs, OSTs, and client nodes)?
            It would also be helpful to provide the full MDS Lustre log, both before and after the downgrade.

            Thanks.
            sarah Sarah Liu added a comment -

            Hello Gregoire,

            I hit the same issue recently, on master/tag-2.7.66 and b2_8/tag-2.7.90. I remounted the MDS with the "abort_recovery" option and unmounted it again before downgrading; here is what I saw. The same test passed on tag-2.7.64. Do you have any idea why this happens?

            on MDS

            [root@onyx-25 ~]# mount -t lustre -o abort_recovery /dev/sdb1 /mnt/mds1
            LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
            Lustre: MGS: Connection restored to MGC10.2.4.47@tcp_0 (at 0@lo)
            Lustre: Skipped 4 previous similar messages
            LustreError: 45919:0:(mdt_handler.c:5735:mdt_iocontrol()) lustre-MDT0000: Aborting recovery for device
            [root@onyx-25 ~]# mount
            /dev/sda1 on / type ext3 (rw)
            proc on /proc type proc (rw)
            sysfs on /sys type sysfs (rw)
            devpts on /dev/pts type devpts (rw,gid=5,mode=620)
            tmpfs on /dev/shm type tmpfs (rw)
            none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
            sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
            nfsd on /proc/fs/nfsd type nfsd (rw)
            /dev/sdb1 on /mnt/mds1 type lustre (rw,abort_recovery)
            [root@onyx-25 ~]# Lustre: 23885:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1455732424/real 1455732424]  req@ffff8808074dfcc0 x1526383585120268/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 520/544 e 0 to 1 dl 1455732429 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            Lustre: lustre-MDT0000: Connection restored to MGC10.2.4.47@tcp_0 (at 0@lo)
            
            
            [root@onyx-25 ~]# umount /mnt/mds1
            Lustre: Failing over lustre-MDT0000
            
            Lustre: 46030:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1455732461/real 1455732461]  req@ffff88080d158cc0 x1526383585120452/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1455732467 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            Lustre: server umount lustre-MDT0000 complete
            [root@onyx-25 ~]# 
            [root@onyx-25 ~]# mount
            /dev/sda1 on / type ext3 (rw)
            proc on /proc type proc (rw)
            sysfs on /sys type sysfs (rw)
            devpts on /dev/pts type devpts (rw,gid=5,mode=620)
            tmpfs on /dev/shm type tmpfs (rw)
            none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
            sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
            nfsd on /proc/fs/nfsd type nfsd (rw)
            

            dmesg of MDS

            LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
            Lustre: MGC10.2.4.47@tcp: Connection restored to MGS (at 0@lo)
            Lustre: lustre-MDT0000: used disk, loading
            LustreError: 10899:0:(mdt_recovery.c:263:mdt_server_data_init()) lustre-MDT0000: unsupported incompat filesystem feature(s) 400
            LustreError: 10899:0:(obd_config.c:572:class_setup()) setup lustre-MDT0000 failed (-22)
            LustreError: 10899:0:(obd_config.c:1629:class_config_llog_handler()) MGC10.2.4.47@tcp: cfg command failed: rc = -22
            Lustre:    cmd=cf003 0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f  
            LustreError: 15b-f: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
            LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
            LustreError: 10804:0:(obd_mount_server.c:1254:server_start_targets()) failed to start server lustre-MDT0000: -22
            LustreError: 10804:0:(obd_mount_server.c:1737:server_fill_super()) Unable to start targets: -22
            LustreError: 10804:0:(obd_mount_server.c:847:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
            LustreError: 10804:0:(obd_mount_server.c:1422:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
            LustreError: 10804:0:(obd_config.c:619:class_cleanup()) Device 5 not setup
            Lustre: 10804:0:(client.c:1943:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1455737214/real 1455737214]  req@ffff8808181aec00 x1526451090227240/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1455737220 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            Lustre: server umount lustre-MDT0000 complete
            LustreError: 10804:0:(obd_mount.c:1330:lustre_fill_super()) Unable to mount  (-22)
            Lustre: DEBUG MARKER: Using TIMEOUT=100
            Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted
            Slow work thread pool: Starting up
            Slow work thread pool: Ready
            FS-Cache: Loaded
            NFS: Registering the id_resolver key type
            FS-Cache: Netfs 'nfs' registered for caching
            LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
            Lustre: MGC10.2.4.47@tcp: Connection restored to MGS (at 0@lo)
            Lustre: lustre-MDT0000: used disk, loading
            LustreError: 34262:0:(mdt_recovery.c:263:mdt_server_data_init()) lustre-MDT0000: unsupported incompat filesystem feature(s) 400
            LustreError: 34262:0:(obd_config.c:572:class_setup()) setup lustre-MDT0000 failed (-22)
            LustreError: 34262:0:(obd_config.c:1629:class_config_llog_handler()) MGC10.2.4.47@tcp: cfg command failed: rc = -22
            Lustre:    cmd=cf003 0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f  
            LustreError: 15b-f: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
            LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
            LustreError: 34110:0:(obd_mount_server.c:1254:server_start_targets()) failed to start server lustre-MDT0000: -22
            LustreError: 34110:0:(obd_mount_server.c:1737:server_fill_super()) Unable to start targets: -22
            LustreError: 34110:0:(obd_mount_server.c:847:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
            LustreError: 34110:0:(obd_mount_server.c:1422:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
            LustreError: 34110:0:(obd_config.c:619:class_cleanup()) Device 5 not setup
            Lustre: 34110:0:(client.c:1943:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1455737367/real 1455737367]  req@ffff880412fb3800 x1526451090227280/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1455737373 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            Lustre: server umount lustre-MDT0000 complete
            LustreError: 34110:0:(obd_mount.c:1330:lustre_fill_super()) Unable to mount  (-22)
            Lustre: DEBUG MARKER: Using TIMEOUT=100
            Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted
            [root@onyx-25 ~]# 
            
            pjones Peter Jones added a comment -

            If I understand correctly, this is not a bug.

            sarah Sarah Liu added a comment -

            Ah, I see. Thank you for the clarification!


            pichong Gregoire Pichon added a comment -

            Sarah,

            The issue will be the same with a downgrade to 2.7.0 if you don't perform the additional operation that clears the incompatibility flag.

            The important point is that to have the OBD_INCOMPAT_MULTI_RPCS incompatibility flag cleared on the MDT servers, you must unmount all the clients, then unmount the servers, and then additionally mount each MDT with the abort_recovery option and unmount it, one by one. After that the nodes can be downgraded to a lower Lustre version.
            sarah Sarah Liu added a comment - edited

            Hi Gregoire,

            I hit this issue when doing a clean downgrade from master to 2.5.5: unmount all the servers and clients, downgrade them all at once, then mount the system again, which failed.

            I will try with 2.7.0 and see how it goes.


            pichong Gregoire Pichon added a comment -

            To have the OBD_INCOMPAT_MULTI_RPCS incompatibility flag cleared, the MDT target must have no clients connected when it is unmounted. Connected clients include both Lustre clients and other MDT targets.

            Therefore, if the file system has only one MDT target, unmounting the Lustre clients first allows the incompatibility flag to be cleared when the MDT target is unmounted.

            If the file system has several MDT targets, it is required to unmount all MDT targets and then, for each MDT target (one by one), mount it with the abort_recovery option and unmount it. This is described in the LU-5319 test plan, in the "Upgrade / Downgrade" section.

            After clearing the incompatibility flag, the server can be downgraded to a lower Lustre version.
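            The multi-MDT procedure above can be sketched as a shell sequence. This is a sketch only: the device path, mount point, and node layout are illustrative, not taken from this ticket.

```shell
# Sketch of the flag-clearing procedure for a file system with several
# MDTs. All device paths and mount points below are illustrative.

# 1. Unmount all Lustre clients (run on every client node):
umount -a -t lustre

# 2. Unmount every MDT (run on each MDS node):
umount /mnt/mds1

# 3. For each MDT, one by one: mount with abort_recovery, then unmount.
#    With no clients or peer MDTs connected, the clean unmount clears
#    the OBD_INCOMPAT_MULTI_RPCS flag on disk.
mount -t lustre -o abort_recovery /dev/sdb1 /mnt/mds1
umount /mnt/mds1

# 4. Only now downgrade the Lustre packages on the servers.
```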
            pjones Peter Jones added a comment -

            Gregoire,

            Could you please advise on this one?

            Thanks

            Peter

            sarah Sarah Liu added a comment -

            Hit the same issue when downgrading from master RHEL7 to 2.5.5 RHEL6.6.

            Before the downgrade, the MDS was unmounted:

            [14588.476494] Lustre: DEBUG MARKER: == upgrade-downgrade Start clean downgrade == 20:19:16 (1447215556)
            [14588.857361] Lustre: DEBUG MARKER: == upgrade-downgrade Shutdown the entire Lustre filesystem == 20:19:16 (1447215556)
            [14592.877840] LustreError: 3346:0:(client.c:1138:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88040e18c800 x1517502900474884/t0(0) o13->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
            [14594.489735] Lustre: lustre-MDT0000: Not available for connect from 10.2.4.56@tcp (stopping)
            [14602.944947] Lustre: 28852:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1447215564/real 1447215564]  req@ffff880427623f00 x1517502900474904/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1447215570 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            [14603.506132] Lustre: server umount lustre-MDT0000 complete
            [14627.485965] Lustre: DEBUG MARKER: == upgrade-downgrade downgrade the Lustre servers all at once == 20:19:46 (1447215586)
            [  OK  ] Started Show Plymouth Power Off Screen.
            
            sarah Sarah Liu added a comment -

            Before downgrading the system, the script ran cleanupall to unmount everything.


            adilger Andreas Dilger added a comment -

            This is caused by OBD_INCOMPAT_MULTI_RPCS being set on the MDS. It should be cleared if the MDS is unmounted cleanly.
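            A note on reading the error: the "400" in "unsupported incompat filesystem feature(s) 400" is a hexadecimal flag mask, and per the discussion in this ticket the 0x400 bit is OBD_INCOMPAT_MULTI_RPCS. A minimal decoding sketch follows; verify the flag value against the Lustre headers for your version.

```shell
# Decode the hex feature mask from the mount error message.
# The 0x400 bit is assumed to be OBD_INCOMPAT_MULTI_RPCS, per this ticket.
reported=0x400      # value printed by mdt_server_data_init()
multi_rpcs=0x400
if (( reported & multi_rpcs )); then
    echo "OBD_INCOMPAT_MULTI_RPCS is set"
fi
```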

            People

              Assignee: pichong Gregoire Pichon
              Reporter: sarah Sarah Liu
              Votes: 0
              Watchers: 10

              Dates

                Created:
                Updated:
                Resolved: