[LU-4384] Hit unsupported incompat filesystem feature error after downgrade system from 2.6 to 2.5.0 Created: 13/Dec/13  Updated: 22/May/14  Resolved: 17/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.5.0, Lustre 2.6.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Blocker
Reporter: Sarah Liu Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: MB, mn4
Environment:

Issue Links:
Related
is related to LU-3467 Unified request handler on OST Resolved
Severity: 3
Rank (Obsolete): 12019

 Description   

Before the upgrade, the servers and clients were running 2.5.0 with ldiskfs. Upgrading the whole system to lustre-master build #1791 passed. After downgrading the system to 2.5.0 again, mounting the OST failed with the following error:

OST console shows:

Lustre: DEBUG MARKER: == upgrade-downgrade Lustre version and system information == 11:31:49 (1386963109)
Lustre: Lustre: Build Version: 2.5.0-RC1--PRISTINE-2.6.32-358.18.1.el6_lustre.x86_64
LNet: Added LNI 10.10.19.53@tcp [8/256/0/180]
LNet: Accept secure, port 988
Lustre: DEBUG MARKER: == upgrade-downgrade End == 11:31:52 (1386963112)
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
LustreError: 33847:0:(ofd_fs.c:588:ofd_server_data_init()) lustre-OST0000: unsupported incompat filesystem feature(s) 10
LustreError: 33847:0:(obd_config.c:572:class_setup()) setup lustre-OST0000 failed (-22)
LustreError: 33847:0:(obd_config.c:1591:class_config_llog_handler()) MGC10.10.19.62@tcp: cfg command failed: rc = -22
Lustre:    cmd=cf003 0:lustre-OST0000  1:dev  2:0  3:f  
LustreError: 15b-f: MGC10.10.19.62@tcp: The configuration from log 'lustre-OST0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.10.19.62@tcp: The configuration from log 'lustre-OST0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 33730:0:(obd_mount_server.c:1257:server_start_targets()) failed to start server lustre-OST0000: -22
LustreError: 33730:0:(obd_mount_server.c:1732:server_fill_super()) Unable to start targets: -22
LustreError: 33730:0:(obd_mount_server.c:848:lustre_disconnect_lwp()) lustre-MDT0000-lwp-OST0000: Can't end config log lustre-client.
LustreError: 33730:0:(obd_mount_server.c:1426:server_put_super()) lustre-OST0000: failed to disconnect lwp. (rc=-2)
LustreError: 33730:0:(obd_config.c:619:class_cleanup()) Device 3 not setup
Lustre: server umount lustre-OST0000 complete
LustreError: 33730:0:(obd_mount.c:1311:lustre_fill_super()) Unable to mount /dev/sdb1 (-22)
[root@wtm-88 ~]# 


 Comments   
Comment by Andreas Dilger [ 13/Dec/13 ]

EXT4_FEATURE_INCOMPAT_META_BG = 0x10. I don't know why this would be set when upgrading to 2.6. This feature relates to filesystem resizing but should not be enabled by default. What kernel is used for 2.6? It looks like 2.6.32-358.18.1 is used for 2.5.

Comment by Sarah Liu [ 16/Dec/13 ]

The kernel used for 2.6 is 2.6.32-358.23.2:

http://build.whamcloud.com/job/lustre-master/1791/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/artifact/artifacts/RPMS/x86_64/

Comment by Andreas Dilger [ 18/Dec/13 ]

Sorry, I misunderstood the problem above. This is not the EXT4_FEATURE_INCOMPAT_META_BG flag; it is actually OBD_INCOMPAT_FID being set in the last_rcvd header by 2.6.

/** FID is enabled */
#define OBD_INCOMPAT_FID        0x00000010

Strangely, I don't see OBD_INCOMPAT_FID being set in OFD_INCOMPAT_SUPP on master, nor OFD_INCOMPAT_SUPP being used anywhere. It seems this checking has moved over to tgt_lastrcvd.c::tgt_scd[] as part of the unified target patches in LU-3467 (http://review.whamcloud.com/7330). I can't yet see how OBD_INCOMPAT_FID is being set, but this is a definite problem for downgrade.
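
For illustration, the kind of check that produces the "unsupported incompat filesystem feature(s) 10" message and the -22 return code can be sketched as below. This is a minimal sketch, not the actual ofd_server_data_init() code: sketch_check_incompat() and sketch_check_demo() are hypothetical names, and only the OBD_INCOMPAT_FID value is taken from this ticket.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Value quoted in this ticket. */
#define OBD_INCOMPAT_FID 0x00000010

/*
 * Hypothetical helper, not the Lustre implementation: refuse to start a
 * target whose on-disk incompat mask has bits this server does not know.
 */
static int sketch_check_incompat(uint32_t on_disk, uint32_t supported)
{
	uint32_t unknown = on_disk & ~supported;

	if (unknown != 0) {
		fprintf(stderr,
			"unsupported incompat filesystem feature(s) %x\n",
			(unsigned int)unknown);
		return -EINVAL;	/* -22 on Linux, as in the console log */
	}
	return 0;
}

/* Usage: a server that does not list the FID bit rejects the target. */
static void sketch_check_demo(void)
{
	assert(sketch_check_incompat(OBD_INCOMPAT_FID, 0) == -EINVAL);
	assert(sketch_check_incompat(OBD_INCOMPAT_FID, OBD_INCOMPAT_FID) == 0);
}
```

With a supported mask that lacks bit 0x10, the sketch rejects the target in the same way the 2.5.0 OST does in the log above.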

Comment by Andreas Dilger [ 18/Dec/13 ]

Actually, it looks like tgt_server_data_init() is unconditionally setting OBD_INCOMPAT_FID on all filesystems. This should be limited to MDT filesystems. What is the meaning of OBD_INCOMPAT_FID on an OST filesystem anyway? Is that for FID_SEQ_NORMAL objects being allocated there?

Comment by Sarah Liu [ 20/Dec/13 ]

Rolling downgrade also hits this problem on the OST:

LNet: Added LNI 10.10.19.53@tcp [8/256/0/180]
LNet: Accept secure, port 988
LNet: 9095:0:(debug.c:218:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
LNet: 9096:0:(debug.c:218:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
Lustre: 48 MB is too small for debug buffer size, setting it to 128 MB.
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
LustreError: 9242:0:(ofd_fs.c:588:ofd_server_data_init()) lustre-OST0000: unsupported incompat filesystem feature(s) 10
LustreError: 9242:0:(obd_config.c:572:class_setup()) setup lustre-OST0000 failed (-22)
LustreError: 9242:0:(obd_config.c:1591:class_config_llog_handler()) MGC10.10.19.62@tcp: cfg command failed: rc = -22
Lustre:    cmd=cf003 0:lustre-OST0000  1:dev  2:0  3:f  
LustreError: 15b-f: MGC10.10.19.62@tcp: The configuration from log 'lustre-OST0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.10.19.62@tcp: The configuration from log 'lustre-OST0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 9125:0:(obd_mount_server.c:1257:server_start_targets()) failed to start server lustre-OST0000: -22
LustreError: 9125:0:(obd_mount_server.c:1732:server_fill_super()) Unable to start targets: -22
LustreError: 9125:0:(obd_mount_server.c:848:lustre_disconnect_lwp()) lustre-MDT0000-lwp-OST0000: Can't end config log lustre-client.
LustreError: 9125:0:(obd_mount_server.c:1426:server_put_super()) lustre-OST0000: failed to disconnect lwp. (rc=-2)
LustreError: 9125:0:(obd_config.c:619:class_cleanup()) Device 3 not setup
Lustre: server umount lustre-OST0000 complete
LustreError: 9125:0:(obd_mount.c:1311:lustre_fill_super()) Unable to mount /dev/sdb1 (-22)
[root@wtm-88 ~]# 

Comment by Mikhail Pershin [ 11/Jan/14 ]

http://review.whamcloud.com/8810

Patch to remove the unconditional setting of OBD_INCOMPAT_FID for all types of filesystems. It is now set for MDT only, as before.
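
The MDT-only behaviour described here can be sketched roughly as follows. This is an illustrative assumption about the shape of the logic, not the content of change 8810: the enum and function names are hypothetical, and only the OBD_INCOMPAT_FID value comes from this ticket.

```c
#include <assert.h>
#include <stdint.h>

#define OBD_INCOMPAT_FID 0x00000010	/* value quoted in this ticket */

/* Hypothetical target types, for illustration only. */
enum sketch_target { SKETCH_MDT, SKETCH_OST };

/*
 * Hypothetical helper: set the FID incompat bit only for MDT targets,
 * leaving the mask of other target types unchanged.
 */
static uint32_t sketch_init_incompat(enum sketch_target type,
				     uint32_t incompat)
{
	if (type == SKETCH_MDT)
		incompat |= OBD_INCOMPAT_FID;
	return incompat;
}

/* Usage: the bit appears on an MDT but not on an OST. */
static void sketch_init_demo(void)
{
	assert(sketch_init_incompat(SKETCH_MDT, 0) == OBD_INCOMPAT_FID);
	assert(sketch_init_incompat(SKETCH_OST, 0) == 0);
}
```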

Comment by Jodi Levi (Inactive) [ 20/Jan/14 ]

Can this ticket be closed now that change 8810 has landed, or is more work needed in this ticket?

Comment by Andreas Dilger [ 20/Jan/14 ]

I think some more work is still needed to make this handling correct:

  • add a patch for b2_4, b2_5, and master to add OBD_INCOMPAT_FID to OFD_INCOMPAT_SUPP
  • fix checking of OFD_INCOMPAT_SUPP in b2_5 and master
  • set OBD_INCOMPAT_FID on OSTs when FID_SEQ_NORMAL objects are created

Comment by Jodi Levi (Inactive) [ 23/Jan/14 ]

Ok, are we safe to reduce this from blocker at this point? Or do we need to continue to track as a 2.6 blocker until these remaining tasks are completed?

Comment by Andreas Dilger [ 25/Jan/14 ]

I think all three issues are still blockers for 2.6, and #1 (backporting the fix to 2.4.3 and 2.5.1) is a blocker there as well, so that users can downgrade again.

Comment by Mikhail Pershin [ 04/Feb/14 ]

Andreas, if the server supports FID_SEQ_NORMAL and creates such objects, does that mean old clients will be incompatible? I had the impression that the client will work anyway, with the FID just being converted to OID/SEQ format. So I wonder whether we have an incompatible case here at all?

Comment by Andreas Dilger [ 04/Feb/14 ]

Mike, the use of OBD_INCOMPAT_FID is completely separate from any client interoperability. It is needed to prevent old OST code from mounting the filesystem after it has started to create FID_SEQ_NORMAL objects that 2.3 and older servers do not understand.
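
As a rough sketch of that idea: the bit would be set the first time such an object is created, after which a server that does not list the bit as supported refuses the target. The struct and function names below are hypothetical, and whether a sequence counts as FID_SEQ_NORMAL is passed in as a plain flag rather than computed, since the real test is not shown in this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OBD_INCOMPAT_FID 0x00000010	/* value quoted in this ticket */

/* Hypothetical stand-in for the on-disk server data of one OST. */
struct sketch_ost {
	uint32_t incompat;
};

/* Record the incompat bit once a FID_SEQ_NORMAL object is created. */
static void sketch_object_created(struct sketch_ost *ost, bool normal_seq)
{
	if (normal_seq)
		ost->incompat |= OBD_INCOMPAT_FID;
}

/* A server may mount only if it knows every incompat bit on disk. */
static bool sketch_can_mount(const struct sketch_ost *ost,
			     uint32_t supported)
{
	return (ost->incompat & ~supported) == 0;
}

/* Usage: an old server (no FID support) is locked out only afterwards. */
static void sketch_mount_demo(void)
{
	struct sketch_ost ost = { .incompat = 0 };

	assert(sketch_can_mount(&ost, 0));
	sketch_object_created(&ost, true);
	assert(!sketch_can_mount(&ost, 0));
	assert(sketch_can_mount(&ost, OBD_INCOMPAT_FID));
}
```

Setting the bit lazily, rather than at format time, would keep downgrade possible for filesystems that never created such objects.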

Comment by Mikhail Pershin [ 27/Feb/14 ]

Patches to set the OBD_INCOMPAT_FID bit:

master - http://review.whamcloud.com/9375
b2_4 - http://review.whamcloud.com/9410
b2_5 - http://review.whamcloud.com/9411

The checking fix is needed only in master, and it has landed.

Comment by Peter Jones [ 17/Mar/14 ]

Landed for master. Fixes for maintenance branches will be tracked separately.
