[LU-7410] After downgrade from 2.8 to 2.5.5, hit unsupported incompat filesystem feature(s) 400 Created: 09/Nov/15  Updated: 20/Jul/17  Resolved: 17/Aug/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Sarah Liu Assignee: Gregoire Pichon
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

before upgrade: lustre-master #3226 RHEL6.7
after upgrade: lustre-b2_5_fe #62 RHEL6.6


Attachments: HTML File debug-after     HTML File dmesg-after     HTML File dmesg-before     HTML File trace-after    
Issue Links:
Related
is related to LU-5319 Support multiple slots per client in ... Resolved
is related to LU-9788 upgrading ldiskfs on-disk format from... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

1. upgrade system from 2.5.5 RHEL6.6 to master RHEL6.7 PASS
2. downgrade system from master RHEL6.7 to 2.5.5 RHEL6.6 FAIL

Mounting the MDS failed:

Lustre: DEBUG MARKER: == upgrade-downgrade End == 15:01:41 (1447110101)
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: MGC10.2.4.47@tcp: Connection restored to MGS (at 0@lo)
Lustre: lustre-MDT0000: used disk, loading
LustreError: 12684:0:(mdt_recovery.c:263:mdt_server_data_init()) lustre-MDT0000: unsupported incompat filesystem feature(s) 400
LustreError: 12684:0:(obd_config.c:572:class_setup()) setup lustre-MDT0000 failed (-22)
LustreError: 12684:0:(obd_config.c:1629:class_config_llog_handler()) MGC10.2.4.47@tcp: cfg command failed: rc = -22
Lustre:    cmd=cf003 0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f  
LustreError: 15b-f: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 12589:0:(obd_mount_server.c:1254:server_start_targets()) failed to start server lustre-MDT0000: -22
LustreError: 12589:0:(obd_mount_server.c:1737:server_fill_super()) Unable to start targets: -22
LustreError: 12589:0:(obd_mount_server.c:847:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
LustreError: 12589:0:(obd_mount_server.c:1422:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
LustreError: 12589:0:(obd_config.c:619:class_cleanup()) Device 5 not setup
Lustre: 12589:0:(client.c:1943:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1447110105/real 1447110105]  req@ffff8808352bac00 x1517404919169064/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1447110111 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: server umount lustre-MDT0000 complete
LustreError: 12589:0:(obd_mount.c:1330:lustre_fill_super()) Unable to mount  (-22)
Lustre: DEBUG MARKER: Using TIMEOUT=100
Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: MGC10.2.4.47@tcp: Connection restored to MGS (at 0@lo)
Lustre: lustre-MDT0000: used disk, loading
LustreError: 13112:0:(mdt_recovery.c:263:mdt_server_data_init()) lustre-MDT0000: unsupported incompat filesystem feature(s) 400
LustreError: 13112:0:(obd_config.c:572:class_setup()) setup lustre-MDT0000 failed (-22)
LustreError: 13112:0:(obd_config.c:1629:class_config_llog_handler()) MGC10.2.4.47@tcp: cfg command failed: rc = -22
Lustre:    cmd=cf003 0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f  
LustreError: 15b-f: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 13025:0:(obd_mount_server.c:1254:server_start_targets()) failed to start server lustre-MDT0000: -22
LustreError: 13025:0:(obd_mount_server.c:1737:server_fill_super()) Unable to start targets: -22
LustreError: 13025:0:(obd_mount_server.c:847:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
LustreError: 13025:0:(obd_mount_server.c:1422:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
LustreError: 13025:0:(obd_config.c:619:class_cleanup()) Device 5 not setup
Lustre: 13025:0:(client.c:1943:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1447110256/real 1447110256]  req@ffff88081d67dc00 x1517404919169104/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1447110262 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: server umount lustre-MDT0000 complete
LustreError: 13025:0:(obd_mount.c:1330:lustre_fill_super()) Unable to mount  (-22)
Lustre: DEBUG MARKER: Using TIMEOUT=100
Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted
[root@onyx-25 ~]# 


 Comments   
Comment by Andreas Dilger [ 10/Nov/15 ]

This is caused by OBD_INCOMPAT_MULTI_RPCS being set on the MDS. It should be cleared if the MDS is unmounted cleanly.
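
For reference, the "400" printed by mdt_server_data_init() appears to be the hexadecimal feature mask, i.e. the OBD_INCOMPAT_MULTI_RPCS bit (0x400) from the LU-5319 multiple-slots-per-client work. A trivial shell check of the value (plain arithmetic, nothing Lustre-specific):

# printf 'OBD_INCOMPAT_MULTI_RPCS = 0x%x = %d\n' 0x400 0x400
OBD_INCOMPAT_MULTI_RPCS = 0x400 = 1024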

Comment by Sarah Liu [ 10/Nov/15 ]

Before downgrading the system, the script ran cleanupall to unmount everything.

Comment by Sarah Liu [ 11/Nov/15 ]

Hit the same issue when downgrading from master RHEL7 to 2.5.5 RHEL6.6.

Before the downgrade, the MDS was unmounted:

[14588.476494] Lustre: DEBUG MARKER: == upgrade-downgrade Start clean downgrade == 20:19:16 (1447215556)
[14588.857361] Lustre: DEBUG MARKER: == upgrade-downgrade Shutdown the entire Lustre filesystem == 20:19:16 (1447215556)
[14592.877840] LustreError: 3346:0:(client.c:1138:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88040e18c800 x1517502900474884/t0(0) o13->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[14594.489735] Lustre: lustre-MDT0000: Not available for connect from 10.2.4.56@tcp (stopping)
[14602.944947] Lustre: 28852:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1447215564/real 1447215564]  req@ffff880427623f00 x1517502900474904/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1447215570 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[14603.506132] Lustre: server umount lustre-MDT0000 complete
[14627.485965] Lustre: DEBUG MARKER: == upgrade-downgrade downgrade the Lustre servers all at once == 20:19:46 (1447215586)
[  OK  ] Started Show Plymouth Power Off Screen.
Comment by Peter Jones [ 13/Nov/15 ]

Gregoire

Could you please advise on this one?

Thanks

Peter

Comment by Gregoire Pichon [ 16/Nov/15 ]

To have the OBD_INCOMPAT_MULTI_RPCS incompatibility flag cleared, the MDT target must have no clients connected when it is unmounted.
Connected clients include both Lustre clients and other MDT targets.

Therefore, if the file system has only one MDT target, unmounting the Lustre clients first will allow clearing the incompatibility flag at MDT target unmount.

If the file system has several MDT targets, then it is required to unmount all MDT targets and then, for each MDT target (one by one), mount it with the abort_recovery option and unmount it. This is mentioned in the LU-5319 test plan, in the section "Upgrade / Downgrade".

After clearing the incompatibility flag, the server can be downgraded to a lower Lustre version.
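
A minimal shell sketch of that sequence (device paths, mount points and host prompts are illustrative, not taken from this test):

# single-MDT filesystem: unmount every Lustre client first...
client# umount /mnt/lustre
# ...then the MDT; the flag is cleared by this clean unmount
mds# umount /mnt/mds1

# multi-MDT filesystem: once all clients and all MDTs are unmounted,
# cycle each MDT once with recovery aborted, one by one
mds# mount -t lustre -o abort_recovery /dev/mdtN /mnt/mdsN
mds# umount /mnt/mdsN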

Comment by Sarah Liu [ 18/Nov/15 ]

Hi Gregoire,

I hit this issue when doing a clean downgrade from master to 2.5.5, which unmounts all the servers and clients and downgrades them all at once; mounting the system again afterwards failed.

I will try with 2.7.0 and see how it goes.

Comment by Gregoire Pichon [ 19/Nov/15 ]

Sarah,

The issue will be the same with a downgrade to 2.7.0, if you don't perform the additional operation that clears the incompatibility flag.

The important point is that to have the OBD_INCOMPAT_MULTI_RPCS incompatibility flag cleared on the MDT servers, you must unmount all the clients, then unmount the servers, and then additionally perform the "mount with the abort_recovery option and unmount each MDT one by one" step. After that the nodes can be downgraded to a lower Lustre version.

Comment by Sarah Liu [ 19/Nov/15 ]

Ah, I see. Thank you for the clarification!

Comment by Peter Jones [ 24/Nov/15 ]

If I understand correctly, this is not a bug.

Comment by Sarah Liu [ 17/Feb/16 ]

Hello Gregoire,

I hit the same issue recently, on master/tag-2.7.66 and b2_8/tag-2.7.90. I did remount the MDS with the "abort_recovery" option and unmount it again before downgrading; here is what I saw. The same test passed on tag-2.7.64. Do you have any idea why this happens?

on MDS

[root@onyx-25 ~]# mount -t lustre -o abort_recovery /dev/sdb1 /mnt/mds1
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: MGS: Connection restored to MGC10.2.4.47@tcp_0 (at 0@lo)
Lustre: Skipped 4 previous similar messages
LustreError: 45919:0:(mdt_handler.c:5735:mdt_iocontrol()) lustre-MDT0000: Aborting recovery for device
[root@onyx-25 ~]# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/sdb1 on /mnt/mds1 type lustre (rw,abort_recovery)
[root@onyx-25 ~]# Lustre: 23885:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1455732424/real 1455732424]  req@ffff8808074dfcc0 x1526383585120268/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 520/544 e 0 to 1 dl 1455732429 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: lustre-MDT0000: Connection restored to MGC10.2.4.47@tcp_0 (at 0@lo)


[root@onyx-25 ~]# umount /mnt/mds1
Lustre: Failing over lustre-MDT0000

Lustre: 46030:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1455732461/real 1455732461]  req@ffff88080d158cc0 x1526383585120452/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1455732467 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: server umount lustre-MDT0000 complete
[root@onyx-25 ~]# 
[root@onyx-25 ~]# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)

dmesg of MDS

LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: MGC10.2.4.47@tcp: Connection restored to MGS (at 0@lo)
Lustre: lustre-MDT0000: used disk, loading
LustreError: 10899:0:(mdt_recovery.c:263:mdt_server_data_init()) lustre-MDT0000: unsupported incompat filesystem feature(s) 400
LustreError: 10899:0:(obd_config.c:572:class_setup()) setup lustre-MDT0000 failed (-22)
LustreError: 10899:0:(obd_config.c:1629:class_config_llog_handler()) MGC10.2.4.47@tcp: cfg command failed: rc = -22
Lustre:    cmd=cf003 0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f  
LustreError: 15b-f: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 10804:0:(obd_mount_server.c:1254:server_start_targets()) failed to start server lustre-MDT0000: -22
LustreError: 10804:0:(obd_mount_server.c:1737:server_fill_super()) Unable to start targets: -22
LustreError: 10804:0:(obd_mount_server.c:847:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
LustreError: 10804:0:(obd_mount_server.c:1422:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
LustreError: 10804:0:(obd_config.c:619:class_cleanup()) Device 5 not setup
Lustre: 10804:0:(client.c:1943:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1455737214/real 1455737214]  req@ffff8808181aec00 x1526451090227240/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1455737220 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: server umount lustre-MDT0000 complete
LustreError: 10804:0:(obd_mount.c:1330:lustre_fill_super()) Unable to mount  (-22)
Lustre: DEBUG MARKER: Using TIMEOUT=100
Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted
Slow work thread pool: Starting up
Slow work thread pool: Ready
FS-Cache: Loaded
NFS: Registering the id_resolver key type
FS-Cache: Netfs 'nfs' registered for caching
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: MGC10.2.4.47@tcp: Connection restored to MGS (at 0@lo)
Lustre: lustre-MDT0000: used disk, loading
LustreError: 34262:0:(mdt_recovery.c:263:mdt_server_data_init()) lustre-MDT0000: unsupported incompat filesystem feature(s) 400
LustreError: 34262:0:(obd_config.c:572:class_setup()) setup lustre-MDT0000 failed (-22)
LustreError: 34262:0:(obd_config.c:1629:class_config_llog_handler()) MGC10.2.4.47@tcp: cfg command failed: rc = -22
Lustre:    cmd=cf003 0:lustre-MDT0000  1:lustre-MDT0000_UUID  2:0  3:lustre-MDT0000-mdtlov  4:f  
LustreError: 15b-f: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 34110:0:(obd_mount_server.c:1254:server_start_targets()) failed to start server lustre-MDT0000: -22
LustreError: 34110:0:(obd_mount_server.c:1737:server_fill_super()) Unable to start targets: -22
LustreError: 34110:0:(obd_mount_server.c:847:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
LustreError: 34110:0:(obd_mount_server.c:1422:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
LustreError: 34110:0:(obd_config.c:619:class_cleanup()) Device 5 not setup
Lustre: 34110:0:(client.c:1943:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1455737367/real 1455737367]  req@ffff880412fb3800 x1526451090227280/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1455737373 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: server umount lustre-MDT0000 complete
LustreError: 34110:0:(obd_mount.c:1330:lustre_fill_super()) Unable to mount  (-22)
Lustre: DEBUG MARKER: Using TIMEOUT=100
Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted
[root@onyx-25 ~]# 
Comment by Gregoire Pichon [ 19/Feb/16 ]

Could you provide the complete test case that was executed?
How is the filesystem laid out (nodes hosting the MGT, MDTs, OSTs, client nodes, ...)?
It would also be helpful to provide the full MDS Lustre log, both before and after the downgrade.

thanks.

Comment by Sarah Liu [ 01/Mar/16 ]

The complete test case is:
1. format and set up the system with 1 MDS (1 MDT), 1 OSS (1 OST) and 2 clients on Lustre 2.5.5 RHEL6.6; create some data
2. shut down the whole system, unmounting all nodes
3. upgrade the whole system to b2_8/build #8; only the boot disk is cleared, the data disk is left untouched
4. remount the whole system and check the data; works fine
5. shut down the whole system again, unmounting all nodes
6. do the additional step of remounting the MDS with the abort_recovery option
7. unmount the MDS again
8. downgrade all servers and clients back to 2.5.5, without touching the data disk
9. mounting the MDS fails as above.

Please see the attachments for more logs ('before' means before downgrade; 'after' means after downgrade).

Comment by Sarah Liu [ 01/Mar/16 ]

MDS logs before and after downgrade

Update: I tried today with b2_8/build #11, running those steps manually without the script, and it does not hit the problem.

Comment by Gregoire Pichon [ 02/Mar/16 ]

Could you add some calls to the command "lr_reader <mdt-target-device>" to the script at different places (between steps 5-6, 6-7, 7-8 and 8-9, for example)?
This could help identify why the incompatibility flag is not cleared.

The output of the command looks like this:

# lr_reader  /dev/sdc
last_rcvd:
  uuid: fs3-MDT0000_UUID
  feature_compat: 0x8
  feature_incompat: 0x61c
  feature_rocompat: 0x1
  last_transaction: 30064771072
  target_index: 0
  mount_count: 44

The OBD_INCOMPAT_MULTI_RPCS = 0x400 flag can be checked within the feature_incompat value.
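
A small sketch of such a check, assuming only the two lr_reader output formats shown in this ticket ("feature_incompat: 0x..." on newer versions, "Feature incompat=0x..." on older ones):

#!/bin/sh
# check_multi_rpcs.sh <mdt-device>: report whether OBD_INCOMPAT_MULTI_RPCS
# (0x400) is still set in the on-disk last_rcvd incompat feature mask
dev=$1
# extract the hex mask from either lr_reader output format
val=$(lr_reader "$dev" | sed -n 's/.*[Ff]eature[_ ]incompat[:=] *//p')
if [ $(( val & 0x400 )) -ne 0 ]; then
    echo "$dev: OBD_INCOMPAT_MULTI_RPCS still set (feature_incompat=$val)"
else
    echo "$dev: OBD_INCOMPAT_MULTI_RPCS clear (feature_incompat=$val)"
fi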

Comment by Jian Yu [ 17/Aug/16 ]

Hi Gregoire,

I performed a basic clean upgrade/downgrade test from EE 2.4.2.2 (tag 2.5.42.15) to EE 3.0.1.0 (tag 2.7.16.5) with the following steps:

  1. format and mount EE 2.4.2.2 filesystem with 1 MGS/MDS (1 MDT), 1 OSS (1 OST) and 1 Client
    # lr_reader /dev/sdc
    Reading last_rcvd
    UUID lustre-MDT0000_UUID
    Feature compat=0xc
    Feature incompat=0x21c
    Feature rocompat=0x1
    Last transaction 4294967296
    target index 0
    MDS, index 0
    
  2. unmount the whole filesystem
    # lr_reader /dev/sdc
    Reading last_rcvd
    UUID lustre-MDT0000_UUID
    Feature compat=0xc
    Feature incompat=0x21c
    Feature rocompat=0x1
    Last transaction 4294967296
    target index 0
    MDS, index 0
    
  3. upgrade the whole filesystem to EE 3.0.1.0
  4. mount the whole filesystem
  5. unmount the whole filesystem
    # lr_reader /dev/sdc
    last_rcvd:
      uuid: lustre-MDT0000_UUID
      feature_compat: 0xc
      feature_incompat: 0x61c
      feature_rocompat: 0x1
      last_transaction: 8589934592
      target_index: 0
      mount_count: 2
    
  6. remount MDS and OSS with "-o abort_recovery" option
    # lr_reader /dev/sdc
    last_rcvd:
      uuid: lustre-MDT0000_UUID
      feature_compat: 0xc
      feature_incompat: 0x61c
      feature_rocompat: 0x1
      last_transaction: 12884901888
      target_index: 0
      mount_count: 3
    
  7. unmount MDS and OSS
    # lr_reader /dev/sdc
    last_rcvd:
      uuid: lustre-MDT0000_UUID
      feature_compat: 0xc
      feature_incompat: 0x61c
      feature_rocompat: 0x1
      last_transaction: 12884901888
      target_index: 0
      mount_count: 3
    
  8. downgrade the whole filesystem to EE 2.4.2.2
    # lr_reader /dev/sdc
    Reading last_rcvd
    UUID lustre-MDT0000_UUID
    Feature compat=0xc
    Feature incompat=0x61c
    Feature rocompat=0x1
    Last transaction 12884901888
    target index 0
    MDS, index 0
    
  9. mounting the MDS still failed:
    LustreError: 24312:0:(mdt_recovery.c:263:mdt_server_data_init()) lustre-MDT0000: unsupported incompat filesystem feature(s) 400
    
    # lr_reader /dev/sdc
    Reading last_rcvd
    UUID lustre-MDT0000_UUID
    Feature compat=0xc
    Feature incompat=0x61c
    Feature rocompat=0x1
    Last transaction 12884901888
    target index 0
    MDS, index 0
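
Note that in every lr_reader sample above, feature_incompat stays at 0x61c even after the abort_recovery cycle, and 0x61c still contains the 0x400 bit, i.e. the flag was never cleared in this run:

# printf '0x%x\n' $(( 0x61c & 0x400 ))
0x400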
    
Comment by Niu Yawei (Inactive) [ 17/Aug/16 ]

Hi, Sarah

How did you unmount the MDT in the 7th step?

6. do additional step, remounting the MDS with abort_recovery option
7. umount the MDS again

If you didn't use "umount -f", could you try it and see whether the problem is still reproducible?

Comment by Sarah Liu [ 17/Aug/16 ]

Hi Niu,

No, I didn't use "-f". I will try today and get back to you. Thank you!

Comment by Sarah Liu [ 17/Aug/16 ]

Thank you, Jian Yu, for the information.

Niu, I tried with the "-f" option (step 7) and it seems to work; I upgraded from EE 2.4.2.2 RHEL6.8 to EE 3.0.1 RHEL7.2 and downgraded again:
MDS

[root@onyx-27 ~]# lr_reader /dev/sdb1
Reading last_rcvd
UUID lustre-MDT0000_UUID
Feature compat=0xc
Feature incompat=0x21c
Feature rocompat=0x1
Last transaction 17179869184
target index 0
MDS, index 0
[root@onyx-27 ~]# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
onyx-4.onyx.hpdd.intel.com:/export/scratch on /scratch type nfs (rw,vers=4,addr=10.2.0.2,clientaddr=10.2.4.65)
/dev/sdb1 on /mnt/mds1 type lustre (rw,acl,user_xattr)

I also recorded the lr_reader output at every step for reference:
1. first mount of the system under EE 2.4.2.2 RHEL6.8

[root@onyx-27 ~]# lr_reader /dev/sdb1
Reading last_rcvd
UUID lustre-MDT0000_UUID
Feature compat=0xc
Feature incompat=0x21c
Feature rocompat=0x1
Last transaction 4294967296
target index 0
MDS, index 0

2. umount MDS

[root@onyx-27 ~]# lr_reader /dev/sdb1
Reading last_rcvd
UUID lustre-MDT0000_UUID
Feature compat=0xc
Feature incompat=0x21c
Feature rocompat=0x1
Last transaction 4294967307
target index 0
MDS, index 0

3. after upgrade to EE3.0.1 RHEL7 and remount

[root@onyx-27 ~]# lr_reader /dev/sdb1
last_rcvd:
  uuid: lustre-MDT0000_UUID
  feature_compat: 0xc
  feature_incompat: 0x61c
  feature_rocompat: 0x1
  last_transaction: 8589934592
  target_index: 0
  mount_count: 2
[root@onyx-27 ~]# 

4. umount again

[root@onyx-27 ~]# lr_reader /dev/sdb1
last_rcvd:
  uuid: lustre-MDT0000_UUID
  feature_compat: 0xc
  feature_incompat: 0x21c
  feature_rocompat: 0x1
  last_transaction: 8589934594
  target_index: 0
  mount_count: 2
[root@onyx-27 ~]# 

5. remount with abort_recovery

[root@onyx-27 ~]# mount -t lustre -o abort_recovery /dev/sdb1 /mnt/mds1
[ 1098.170025] LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache
[ 1098.827347] LustreError: 23424:0:(mdt_handler.c:5840:mdt_iocontrol()) lustre-MDT0000: Aborting recovery for device
[root@onyx-27 ~]# mountg[ 1103.554471] Lustre: 23226:0:(client.c:2029:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1471453302/real 1471453302]  req@ffff8807fd682d00 x1542930245353916/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.74@tcp:28/4 lens 520/544 e 0 to 1 dl 1471453307 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 1103.595823] Lustre: 23226:0:(client.c:2029:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[root@onyx-27 ~]# lr_reader /dev/sdb1
last_rcvd:
  uuid: lustre-MDT0000_UUID
  feature_compat: 0xc
  feature_incompat: 0x21c
  feature_rocompat: 0x1
  last_transaction: 12884901888
  target_index: 0
  mount_count: 3

6. umount with "-f"

[root@onyx-27 ~]# umount -f /mnt/mds1
[ 1321.101538] Lustre: server umount lustre-MDT0000 complete
[root@onyx-27 ~]# lr_reader /dev/sdb1
last_rcvd:
  uuid: lustre-MDT0000_UUID
  feature_compat: 0xc
  feature_incompat: 0x21c
  feature_rocompat: 0x1
  last_transaction: 12884901888
  target_index: 0
  mount_count: 3
[root@onyx-27 ~]# 

7. downgrade the system to EE2.4.2.2 and mount again

[root@onyx-27 ~]# lr_reader /dev/sdb1
Reading last_rcvd
UUID lustre-MDT0000_UUID
Feature compat=0xc
Feature incompat=0x21c
Feature rocompat=0x1
Last transaction 17179869184
target index 0
MDS, index 0
Comment by Peter Jones [ 17/Aug/16 ]

I think that we can safely close this out from a community release point of view. Upgrade/downgrade from 2.5.x to 2.9 is outside the official scope of the release and there is a viable workaround for those who want to try this anyway.
