[LU-6724] Downgrading from 2.8 with DNE2 patches to 2.5 servers fails: unsupported read-only filesystem feature(s) 2 Created: 15/Jun/15  Updated: 14/Aug/16  Resolved: 14/Aug/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Blocker
Reporter: James A Simmons Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: dne2
Environment:

Lustre 2.5.3 servers plus latest Lustre 2.8 with DNE2 patches.


Issue Links:
Duplicate
Related
is related to LU-6050 Master testing: Unable to set stripin... Resolved
is related to LU-6663 DNE2 directories has very very bad pe... Resolved
Epic/Theme: dne
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This morning I updated to the latest vanilla master and ended up in a state where I could not mount the file system. So I tried migrating back to Lustre 2.5, and when I attempted to mount the file system I got these errors:

[ 1025.018232] Lustre: Lustre: Build Version: 2.5.4--CHANGED-2.6.32-431.29.2.el6.atlas.x86_64
[ 1062.767104] LDISKFS-fs (dm-7): recovery complete
[ 1062.791599] LDISKFS-fs (dm-7): mounted filesystem with ordered data mode. quota=on. Opts:
[ 1063.389770] LustreError: 24686:0:(ofd_fs.c:594:ofd_server_data_init()) sultan-OST0000: unsupported read-only filesystem feature(s) 2
[ 1063.412582] LustreError: 24686:0:(obd_config.c:572:class_setup()) setup sultan-OST0000 failed (-22)
[ 1063.421818] LustreError: 24686:0:(obd_config.c:1629:class_config_llog_handler()) MGC10.37.248.67@o2ib1: cfg command failed: rc = -22
[ 1063.433944] Lustre: cmd=cf003 0:sultan-OST0000 1:dev 2:0 3:f
[ 1063.440506] LustreError: 15b-f: MGC10.37.248.67@o2ib1: The configuration from log 'sultan-OST0000'failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
[ 1063.458690] LustreError: 15c-8: MGC10.37.248.67@o2ib1: The configuration from log 'sultan-OST0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[ 1063.482398] LustreError: 24600:0:(obd_mount_server.c:1254:server_start_targets()) failed to start server sultan-OST0000: -22
[ 1063.493822] LustreError: 24600:0:(obd_mount_server.c:1737:server_fill_super()) Unable to start targets: -22
[ 1063.503768] LustreError: 24600:0:(obd_mount_server.c:847:lustre_disconnect_lwp()) sultan-MDT0000-lwp-OST0000: Can't end config log sultan-client.
[ 1063.516947] LustreError: 24600:0:(obd_mount_server.c:1422:server_put_super()) sultan-OST0000: failed to disconnect lwp. (rc=-2)
[ 1063.528574] LustreError: 24600:0:(obd_config.c:619:class_cleanup()) Device 3 not setup
[ 1063.539611] Lustre: server umount sultan-OST0000 complete
[ 1063.545093] LustreError: 24600:0:(obd_mount.c:1330:lustre_fill_super()) Unable to mount /dev/mapper/sultan-ddn-l0 (-22)
[ 1070.949382] LDISKFS-fs (dm-6): recovery complete
[ 1070.956045] LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. quota=on. Opts:
[ 1071.472962] LustreError: 24982:0:(ofd_fs.c:594:ofd_server_data_init()) sultan-OST0004: unsupported read-only filesystem feature(s) 2
[ 1071.495949] LustreError: 24982:0:(obd_config.c:572:class_setup()) setup sultan-OST0004 failed (-22)
[ 1071.505140] LustreError: 24982:0:(obd_config.c:1629:class_config_llog_handler()) MGC10.37.248.67@o2ib1: cfg command failed: rc = -22


 Comments   
Comment by Andreas Dilger [ 15/Jun/15 ]

Copying comment from LU-6663 where this issue was first reported.

The unknown read-only feature "2" appears to be OBD_ROCOMPAT_IDX_IN_IDIF. That feature shouldn't automatically be enabled, and needs active participation from the administrator:

static ssize_t
ldiskfs_osd_index_in_idif_seq_write(struct file *file, const char *buffer,
                                    size_t count, loff_t *off)
{
                LCONSOLE_WARN("%s: OST-index in IDIF has been enabled, "
                              "it cannot be reverted back.\n", osd_name(dev));
                return -EPERM;

James, did you set the index_in_idif feature in /proc, or is there a bug here that needs to be filed? Looking at the code it doesn't appear that this flag could have been set automatically.
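
For readers following the log, here is a minimal sketch of how a read-only-compat feature check of this kind could reject the mount. This is not the Lustre source: every name prefixed with sketch_ or SKETCH_ is hypothetical, the bit value 0x2 is taken from the log message, and the supported mask of 0 for the older server is an assumption made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Bit value taken from the log: "unsupported read-only filesystem feature(s) 2". */
#define SKETCH_ROCOMPAT_IDX_IN_IDIF 0x2u

/* Assumption: the older server understands no read-only-compat features. */
#define SKETCH_OLD_SERVER_ROCOMPAT_SUPP 0x0u

/* Return the on-disk feature bits that the running server does not know. */
static uint32_t sketch_unsupported_rocompat(uint32_t on_disk, uint32_t supported)
{
    return on_disk & ~supported;
}

/* Return 0 if every on-disk feature is supported, -EINVAL otherwise. */
static int sketch_check_rocompat(uint32_t on_disk, uint32_t supported)
{
    return sketch_unsupported_rocompat(on_disk, supported) ? -EINVAL : 0;
}
```

With SKETCH_ROCOMPAT_IDX_IN_IDIF set on disk and SKETCH_OLD_SERVER_ROCOMPAT_SUPP as the supported mask, the leftover bit is 2 and the check returns -EINVAL. On Linux EINVAL is 22, which is consistent with the -22 in the log.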

Comment by Andreas Dilger [ 16/Jun/15 ]

It might make sense that reading the index_in_idif file will print out a message if it hasn't been enabled yet:

0
Writing a non-zero value to this file means you cannot downgrade below 2.7.0

Comment by James A Simmons [ 17/Jun/15 ]

I just took a look and index_in_idif does not exist in my proc tree. Looking at the code in osd_prepare(), this means either lsd_feature_rocompat is being set with the wrong flag or the OBD_OCD_VERSION macro is broken.
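
To illustrate the kind of version gate being discussed, here is a minimal sketch of packing a four-part version into one integer and comparing it. The macro and function names are hypothetical, and the one-byte-per-field layout is an assumption for illustration, not a quote of OBD_OCD_VERSION.

```c
#include <stdint.h>

/* Hypothetical packing: one byte each for major, minor, patch and fix. */
#define SKETCH_OCD_VERSION(major, minor, patch, fix) \
    (((uint32_t)(major) << 24) + ((uint32_t)(minor) << 16) + \
     ((uint32_t)(patch) << 8) + (uint32_t)(fix))

/* Return non-zero when the running version is at least the required one. */
static int sketch_version_at_least(uint32_t running, uint32_t required)
{
    return running >= required;
}
```

Under this layout a 2.5.3 build compares lower than 2.7.0, so a gate written this way would only treat 2.7.0 and later as new enough. If the fields were packed in a different order or width, the comparison could misfire, which is the sort of breakage suspected above.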

Comment by Andreas Dilger [ 18/Jun/15 ]

The index_in_idif file will not be created in /proc if the filesystem was formatted with 2.7 or later, or if it was enabled at runtime on an older filesystem that was upgraded, since it is not possible to disable it after it is set.

Is there a chance that this was set manually, or the filesystem was mounted as 2.7+ the first time after formatting? The only other possibility I can think of is if the last_rcvd file was deleted and recreated on 2.7+ then it would set this feature when the last_rcvd file is created.

Comment by Patrick Farrell (Inactive) [ 18/Jun/15 ]

A detailed description of the corruption and the patch that makes the feature require manual intervention are in LU-6050.
https://jira.hpdd.intel.com/browse/LU-6050
http://review.whamcloud.com/#/c/13516/

Comment by James A Simmons [ 14/Aug/16 ]

Oh, this is an old ticket. This problem doesn't exist anymore. Well, you do see the issue with multi-slot, but there exists a workaround. For those who don't know: unmount your 2.8 file system, then remount with recovery abort, then unmount again. After that you can reboot into your 2.5 image and remount with no problem. This clears the multi-slot support from your config logs. Other than that we haven't seen migration issues.

Generated at Sat Feb 10 02:02:41 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.