Details

Type: Bug
Resolution: Fixed
Priority: Major
Affects Version/s: Lustre 2.4.0, Lustre 2.1.4
Labels: None
Environment:
  before upgrade, server and client: 2.1.4 RHEL6
  after upgrade, server and client: lustre-master build# 1270 RHEL6
Severity: 3
Rank: 6970
Description
Here is what I did:
1. Formatted the system as 2.1.4 and then upgraded to 2.4: success.
2. Shut down the filesystem and disabled quota.
3. Downgraded the system to 2.1.4 again; when mounting the MDS, hit the following errors.
Here is the MDS console output:
Lustre: DEBUG MARKER: == upgrade-downgrade End == 18:53:45 (1362020025)
LDISKFS-fs warning (device sdb1): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts:
LDISKFS-fs warning (device sdb1): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts:
LDISKFS-fs warning (device sdb1): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts:
Lustre: MGS MGS started
Lustre: 7888:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 7306ea48-8511-52b2-40cf-6424fc417e41@0@lo t0 exp (null) cur 1362020029 last 0
Lustre: MGC10.10.4.132@tcp: Reactivating import
Lustre: MGS: Logs for fs lustre were removed by user request. All servers must be restarted in order to regenerate the logs.
Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Lustre: Setting parameter lustre-clilov.lov.stripesize in log lustre-client
Lustre: Enabling ACL
Lustre: Enabling user_xattr
LustreError: 7901:0:(osd_handler.c:2343:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed:
LustreError: 7901:0:(osd_handler.c:2343:osd_index_try()) LBUG
Pid: 7901, comm: llog_process_th
Call Trace:
 [<ffffffffa03797f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0379e07>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0d6bd74>] osd_index_try+0x84/0x540 [osd_ldiskfs]
 [<ffffffffa04c1dfe>] dt_try_as_dir+0x3e/0x60 [obdclass]
 [<ffffffffa0c5eb3a>] orph_index_init+0x6a/0x1e0 [mdd]
 [<ffffffffa0c6ec45>] mdd_prepare+0x1d5/0x640 [mdd]
 [<ffffffffa0ccd23c>] ? mdt_process_config+0x6c/0x1030 [mdt]
 [<ffffffffa0da0499>] cmm_prepare+0x39/0xe0 [cmm]
 [<ffffffffa0ccfd7d>] mdt_device_alloc+0xe0d/0x2190 [mdt]
 [<ffffffffa04bdeff>] ? keys_fill+0x6f/0x1a0 [obdclass]
 [<ffffffffa04a2c87>] obd_setup+0x1d7/0x2f0 [obdclass]
 [<ffffffffa048ef3b>] ? class_new_export+0x72b/0x960 [obdclass]
 [<ffffffffa04a2fa8>] class_setup+0x208/0x890 [obdclass]
 [<ffffffffa04aac6c>] class_process_config+0xc3c/0x1c30 [obdclass]
 [<ffffffffa037a993>] ? cfs_alloc+0x63/0x90 [libcfs]
 [<ffffffffa04a5813>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
 [<ffffffffa04acd0b>] class_config_llog_handler+0x9bb/0x1610 [obdclass]
 [<ffffffffa0637e3b>] ? llog_client_next_block+0x1db/0x4b0 [ptlrpc]
 [<ffffffffa0478098>] llog_process_thread+0x888/0xd00 [obdclass]
 [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Kernel panic - not syncing: LBUG
Pid: 7901, comm: llog_process_th Not tainted 2.6.32-279.14.1.el6_lustre.x86_64 #1
Call Trace:
 [<ffffffff814fdcba>] ? panic+0xa0/0x168
 [<ffffffffa0379e5b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0d6bd74>] ? osd_index_try+0x84/0x540 [osd_ldiskfs]
 [<ffffffffa04c1dfe>] ? dt_try_as_dir+0x3e/0x60 [obdclass]
 [<ffffffffa0c5eb3a>] ? orph_index_init+0x6a/0x1e0 [mdd]
 [<ffffffffa0c6ec45>] ? mdd_prepare+0x1d5/0x640 [mdd]
 [<ffffffffa0ccd23c>] ? mdt_process_config+0x6c/0x1030 [mdt]
 [<ffffffffa0da0499>] ? cmm_prepare+0x39/0xe0 [cmm]
 [<ffffffffa0ccfd7d>] ? mdt_device_alloc+0xe0d/0x2190 [mdt]
 [<ffffffffa04bdeff>] ? keys_fill+0x6f/0x1a0 [obdclass]
 [<ffffffffa04a2c87>] ? obd_setup+0x1d7/0x2f0 [obdclass]
 [<ffffffffa048ef3b>] ? class_new_export+0x72b/0x960 [obdclass]
 [<ffffffffa04a2fa8>] ? class_setup+0x208/0x890 [obdclass]
 [<ffffffffa04aac6c>] ? class_process_config+0xc3c/0x1c30 [obdclass]
 [<ffffffffa037a993>] ? cfs_alloc+0x63/0x90 [libcfs]
 [<ffffffffa04a5813>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
 [<ffffffffa04acd0b>] ? class_config_llog_handler+0x9bb/0x1610 [obdclass]
 [<ffffffffa0637e3b>] ? llog_client_next_block+0x1db/0x4b0 [ptlrpc]
 [<ffffffffa0478098>] ? llog_process_thread+0x888/0xd00 [obdclass]
 [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Attachments
Issue Links
- duplicates: LU-3574 MDT cannot sync with OST after upgrade from 2.1 to 2.4 then downgrade (Resolved)
- is related to: LU-3267 cl_echo_object_find()) ASSERTION( ostid_seq(&lsm->lsm_wire.lw_object_oi) == FID_SEQ_ECHO ) failed: (Resolved)
- is related to: LU-2684 convert ost_id to lu_fid for FID_SEQ_NORMAL objects (Resolved)
Comments
Di, it appears that this might be related to http://review.whamcloud.com/5820 incorrectly converting struct llog_logid to use struct ost_id. The fields in that structure were already used a bit strangely, and I think that the ostid_to_fid() and fid_to_ostid() macros may be mangling the fields in this structure, since they were never really used as an OST id/seq, but rather as MDS inode/0/generation.
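To illustrate the concern, here is a rough sketch of the two layouts (simplified for illustration only; these struct names and types are my approximations, not the actual Lustre headers):

/* Sketch only -- simplified, not the real Lustre definitions. */
#include <stdint.h>

/* Old layout: the MDS stored inode / 0 (or 1) / generation here, even
 * though the fields are nominally an object id, sequence and generation. */
struct llog_logid_old_sketch {
	uint64_t lgl_oid;   /* MDS inode number of the llog file */
	uint64_t lgl_oseq;  /* "ogr" = 0 on 1.8, 1 on 2.1 */
	uint32_t lgl_ogen;  /* inode generation */
} __attribute__((packed));

/* Layout after the ost_id conversion in http://review.whamcloud.com/5820:
 * the first two words are now a struct ost_id and get run through
 * ostid_to_fid()/fid_to_ostid(), which assume a real OST object
 * id/sequence pair rather than an inode/generation pair. */
struct ost_id_sketch {
	uint64_t oi_id;
	uint64_t oi_seq;
};

struct llog_logid_new_sketch {
	struct ost_id_sketch lgl_oi;
	uint32_t             lgl_ogen;
} __attribute__((packed));

If the conversion helpers reinterpret the stored inode/generation pair as an object id/sequence, the on-disk record ends up rearranged, which would explain the CATALOGS differences below.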
On 1.8, the CATALOGS file looks like oid=ino / ogr=0 / ogen=gen.
On 2.1 (upgraded from 1.8, if it matters), the CATALOGS file looks like oid=ino / oseq=1 / ogen=gen.
On 2.4 (new), the CATALOGS file looks like f_seq=1 / f_oid=oid / ogen=0.
I suspect that the latter change is unintentional, since I think it breaks the on-disk and on-wire llog_logid protocol. I'm running another test with a pre-6794d7654b4c459519a9e6d85ed439c8c594c2e7 build to see what the CATALOGS file looks like.
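A quick way to compare these layouts is to dump the raw CATALOGS records and print the three fields directly. Below is a minimal userspace sketch, assuming 32-byte catalog slots (the 20-byte packed logid sketch above plus 12 bytes of padding) and a hypothetical path to an MDT mounted as ldiskfs for inspection; both the record size and the path are assumptions, not taken from the ticket.

/* Sketch only: dump CATALOGS entries as raw word0/word1/ogen triples.
 * Assumes 32-byte records (20-byte packed logid + 12 bytes of padding);
 * the default path below is hypothetical. */
#include <stdio.h>
#include <stdint.h>

struct catalogs_rec_sketch {
	uint64_t word0;      /* lgl_oid on 1.8/2.1, first ost_id word on 2.4 */
	uint64_t word1;      /* ogr/oseq on 1.8/2.1, second ost_id word on 2.4 */
	uint32_t ogen;       /* inode generation; observed as 0 on 2.4 */
	uint32_t padding[3]; /* catalog entry padding */
} __attribute__((packed));

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/mnt/mdt-ldiskfs/CATALOGS";
	struct catalogs_rec_sketch rec;
	FILE *f = fopen(path, "r");
	int i = 0;

	if (f == NULL) {
		perror(path);
		return 1;
	}
	while (fread(&rec, sizeof(rec), 1, f) == 1) {
		/* Print only non-empty slots */
		if (rec.word0 != 0 || rec.word1 != 0)
			printf("rec %3d: word0=%#llx word1=%#llx ogen=%#x\n", i,
			       (unsigned long long)rec.word0,
			       (unsigned long long)rec.word1, rec.ogen);
		i++;
	}
	fclose(f);
	return 0;
}

This is only a verification aid for eyeballing the three layouts listed above, not part of any proposed fix.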
Di, could you please work up a patch that reverts the changes to llog_logid and its usage, so that we can see if that fixes this latest issue.