[LU-4384] Hit unsupported incompat filesystem feature error after downgrade system from 2.6 to 2.5.0 Created: 13/Dec/13 Updated: 22/May/14 Resolved: 17/Mar/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.5.0, Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.2 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Sarah Liu | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | MB, mn4 |
| Environment: | |
| Severity: | 3 |
| Rank (Obsolete): | 12019 |
| Description |
|
Before the upgrade, the server and client were running 2.5.0 on ldiskfs. Upgrading the whole system to lustre-master build #1791 passed. After downgrading the system to 2.5.0 again, mounting the OST failed with the following error.

OST console shows:

Lustre: DEBUG MARKER: == upgrade-downgrade Lustre version and system information == 11:31:49 (1386963109)
Lustre: Lustre: Build Version: 2.5.0-RC1--PRISTINE-2.6.32-358.18.1.el6_lustre.x86_64
LNet: Added LNI 10.10.19.53@tcp [8/256/0/180]
LNet: Accept secure, port 988
Lustre: DEBUG MARKER: == upgrade-downgrade End == 11:31:52 (1386963112)
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 33847:0:(ofd_fs.c:588:ofd_server_data_init()) lustre-OST0000: unsupported incompat filesystem feature(s) 10
LustreError: 33847:0:(obd_config.c:572:class_setup()) setup lustre-OST0000 failed (-22)
LustreError: 33847:0:(obd_config.c:1591:class_config_llog_handler()) MGC10.10.19.62@tcp: cfg command failed: rc = -22
Lustre: cmd=cf003 0:lustre-OST0000 1:dev 2:0 3:f
LustreError: 15b-f: MGC10.10.19.62@tcp: The configuration from log 'lustre-OST0000'failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.10.19.62@tcp: The configuration from log 'lustre-OST0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 33730:0:(obd_mount_server.c:1257:server_start_targets()) failed to start server lustre-OST0000: -22
LustreError: 33730:0:(obd_mount_server.c:1732:server_fill_super()) Unable to start targets: -22
LustreError: 33730:0:(obd_mount_server.c:848:lustre_disconnect_lwp()) lustre-MDT0000-lwp-OST0000: Can't end config log lustre-client.
LustreError: 33730:0:(obd_mount_server.c:1426:server_put_super()) lustre-OST0000: failed to disconnect lwp. (rc=-2)
LustreError: 33730:0:(obd_config.c:619:class_cleanup()) Device 3 not setup
Lustre: server umount lustre-OST0000 complete
LustreError: 33730:0:(obd_mount.c:1311:lustre_fill_super()) Unable to mount /dev/sdb1 (-22)
[root@wtm-88 ~]# |
| Comments |
| Comment by Andreas Dilger [ 13/Dec/13 ] |
|
EXT4_FEATURE_INCOMPAT_META_BG is 0x10. I don't know why this would be set when upgrading to 2.6. This feature relates to filesystem resizing but should not be enabled by default. What kernel is used for 2.6? It looks like 2.6.32-358.18.1 is used for 2.5. |
| Comment by Sarah Liu [ 16/Dec/13 ] |
|
The kernel used for 2.6 is 2.6.32-358.23.2. |
| Comment by Andreas Dilger [ 18/Dec/13 ] |
|
Sorry, I misunderstood the problem above. This is not the EXT4_FEATURE_INCOMPAT_META_BG flag; it is actually OBD_INCOMPAT_FID being set in the last_rcvd header by 2.6.
/** FID is enabled */
#define OBD_INCOMPAT_FID 0x00000010
Strangely, I don't see OBD_INCOMPAT_FID being set in OFD_INCOMPAT_SUPP on master, nor OFD_INCOMPAT_SUPP being used anywhere. It seems this checking has moved over to tgt_lastrcvd.c::tgt_scd[] as part of the unified target patches in |
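The failure in the console log can be illustrated with a minimal sketch of an incompat-mask check. This is not the actual `ofd_server_data_init()` code: `demo_check_incompat()` and `DEMO_INCOMPAT_SUPP` are hypothetical names, and the supported-mask value is illustrative only. Only `OBD_INCOMPAT_FID` and its value are taken from the comment above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Quoted in this ticket. */
#define OBD_INCOMPAT_FID 0x00000010

/* Hypothetical: the set of incompat flags an older OST build accepts.
 * The value is illustrative, not copied from any Lustre release. */
#define DEMO_INCOMPAT_SUPP 0x0000000a

/* Hypothetical helper: reject any on-disk incompat bit that is not in
 * the supported mask, as an older server would at mount time. */
static int demo_check_incompat(const char *name, uint32_t on_disk)
{
    uint32_t unknown;

    assert(name != NULL);
    unknown = on_disk & ~(uint32_t)DEMO_INCOMPAT_SUPP;
    if (unknown != 0) {
        fprintf(stderr,
                "%s: unsupported incompat filesystem feature(s) %x\n",
                name, unknown);
        return -EINVAL; /* -22 on Linux, as seen in the log */
    }
    return 0;
}
```

Under these assumptions, an on-disk mask of `DEMO_INCOMPAT_SUPP | OBD_INCOMPAT_FID` leaves bit 0x10 unrecognized, so the call prints `feature(s) 10` and returns `-EINVAL`.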
| Comment by Andreas Dilger [ 18/Dec/13 ] |
|
Actually, it looks like tgt_server_data_init() is unconditionally setting OBD_INCOMPAT_FID on all filesystems. This should be limited to MDT filesystems. What is the meaning of OBD_INCOMPAT_FID on an OST filesystem anyway? Is that for FID_SEQ_NORMAL objects being allocated there? |
| Comment by Sarah Liu [ 20/Dec/13 ] |
|
A rolling downgrade also hits this problem on the OST:

LNet: Added LNI 10.10.19.53@tcp [8/256/0/180]
LNet: Accept secure, port 988
LNet: 9095:0:(debug.c:218:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
LNet: 9096:0:(debug.c:218:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
Lustre: 48 MB is too small for debug buffer size, setting it to 128 MB.
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 9242:0:(ofd_fs.c:588:ofd_server_data_init()) lustre-OST0000: unsupported incompat filesystem feature(s) 10
LustreError: 9242:0:(obd_config.c:572:class_setup()) setup lustre-OST0000 failed (-22)
LustreError: 9242:0:(obd_config.c:1591:class_config_llog_handler()) MGC10.10.19.62@tcp: cfg command failed: rc = -22
Lustre: cmd=cf003 0:lustre-OST0000 1:dev 2:0 3:f
LustreError: 15b-f: MGC10.10.19.62@tcp: The configuration from log 'lustre-OST0000'failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.10.19.62@tcp: The configuration from log 'lustre-OST0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 9125:0:(obd_mount_server.c:1257:server_start_targets()) failed to start server lustre-OST0000: -22
LustreError: 9125:0:(obd_mount_server.c:1732:server_fill_super()) Unable to start targets: -22
LustreError: 9125:0:(obd_mount_server.c:848:lustre_disconnect_lwp()) lustre-MDT0000-lwp-OST0000: Can't end config log lustre-client.
LustreError: 9125:0:(obd_mount_server.c:1426:server_put_super()) lustre-OST0000: failed to disconnect lwp. (rc=-2)
LustreError: 9125:0:(obd_config.c:619:class_cleanup()) Device 3 not setup
Lustre: server umount lustre-OST0000 complete
LustreError: 9125:0:(obd_mount.c:1311:lustre_fill_super()) Unable to mount /dev/sdb1 (-22)
[root@wtm-88 ~]# |
| Comment by Mikhail Pershin [ 11/Jan/14 ] |
|
http://review.whamcloud.com/8810 is a patch that removes the unconditional setting of OBD_INCOMPAT_FID for all types of filesystems. The flag is now set for MDTs only, as before. |
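The behavior described for this patch could be sketched roughly as follows. This is an illustration of the idea only, not the contents of change 8810: `demo_target_type` and `demo_initial_incompat()` are hypothetical names, and the real logic lives in the target code discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Quoted in this ticket. */
#define OBD_INCOMPAT_FID 0x00000010

/* Hypothetical target kinds, for illustration only. */
enum demo_target_type { DEMO_TARGET_MDT, DEMO_TARGET_OST };

/* Hypothetical helper: compute the incompat flags written when server
 * data is initialized. The FID flag is added for MDTs only, so an OST
 * keeps a mask that an older OST build can still accept. */
static uint32_t demo_initial_incompat(enum demo_target_type type,
                                      uint32_t base_flags)
{
    uint32_t flags = base_flags;

    if (type == DEMO_TARGET_MDT)
        flags |= OBD_INCOMPAT_FID;
    return flags;
}
```

With this shape, an OST initialized with a given set of base flags gets those flags back unchanged, while an MDT additionally gets bit 0x10.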
| Comment by Jodi Levi (Inactive) [ 20/Jan/14 ] |
|
Can this ticket be closed now that change 8810 has landed, or is more work needed in this ticket? |
| Comment by Andreas Dilger [ 20/Jan/14 ] |
|
I think some more work is still needed to make this handling correct:
|
| Comment by Jodi Levi (Inactive) [ 23/Jan/14 ] |
|
Ok, are we safe to reduce this from blocker at this point? Or do we need to continue to track as a 2.6 blocker until these remaining tasks are completed? |
| Comment by Andreas Dilger [ 25/Jan/14 ] |
|
I think all three issues are still blockers for 2.6, and #1, backporting the fix to 2.4.3 and 2.5.1, is a blocker there, so that users can downgrade again. |
| Comment by Mikhail Pershin [ 04/Feb/14 ] |
|
Andreas, if the server supports FID_SEQ_NORMAL and creates such objects, does that mean an old client will be incompatible? I had the impression that the client will work anyway, with the FID simply being converted to OID/SEQ format. So I wonder whether we have an incompatible case here at all. |
| Comment by Andreas Dilger [ 04/Feb/14 ] |
|
Mike, the use of OBD_INCOMPAT_FID is completely separate from any client interoperability. It is needed to prevent old OST code from mounting the filesystem after it has started to create FID_SEQ_NORMAL objects, which 2.3 and older servers do not understand. |
| Comment by Mikhail Pershin [ 27/Feb/14 ] |
|
Patches to set the OBD_INCOMPAT_FID bit: master - http://review.whamcloud.com/9375. The checking fix is needed only in master, and it has landed. |
| Comment by Peter Jones [ 17/Mar/14 ] |
|
Landed for master. Fixes for maintenance branches will be tracked separately. |