[LU-3639] After downgrade from 2.5 to 2.3.0, hit (osd_handler.c:2720:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed Created: 25/Jul/13 Updated: 23/Oct/15 Resolved: 23/Oct/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Trivial |
| Reporter: | Sarah Liu | Assignee: | nasf (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
before upgrade, server and client: 2.3.0 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9373 |
| Description |
|
This is the same error described in

MDS console:

LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=off. Opts:
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=off. Opts:
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=off. Opts:
Lustre: MGC10.10.4.132@tcp: Reactivating import
Lustre: MGS: Logs for fs lustre were removed by user request. All servers must be restarted in order to regenerate the logs.
Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Lustre: Setting parameter lustre-clilov.lov.stripesize in log lustre-client
LustreError: 8000:0:(osd_handler.c:2720:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed:
LustreError: 8000:0:(osd_handler.c:2720:osd_index_try()) LBUG
Pid: 8000, comm: llog_process_th

Call Trace:
 [<ffffffffa0379905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0379f17>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0e08735>] osd_index_try+0x175/0x620 [osd_ldiskfs]
 [<ffffffffa0842c08>] fld_index_init+0x88/0x4d0 [fld]
 [<ffffffffa084013d>] ? fld_cache_init+0x14d/0x430 [fld]
 [<ffffffffa083ba3e>] fld_server_init+0x29e/0x450 [fld]
 [<ffffffffa0d5c1b6>] mdt_fld_init+0x126/0x430 [mdt]
 [<ffffffffa0d61326>] mdt_init0+0x8c6/0x23f0 [mdt]
 [<ffffffffa0d5bf49>] ? mdt_key_init+0x59/0x1a0 [mdt]
 [<ffffffffa0d62f43>] mdt_device_alloc+0xf3/0x220 [mdt]
 [<ffffffffa04cb0d7>] obd_setup+0x1d7/0x2f0 [obdclass]
 [<ffffffffa04cb3f8>] class_setup+0x208/0x890 [obdclass]
 [<ffffffffa04d308c>] class_process_config+0xc0c/0x1c30 [obdclass]
 [<ffffffffa037abe0>] ? cfs_alloc+0x30/0x60 [libcfs]
 [<ffffffffa04cceb3>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
 [<ffffffffa04d515b>] class_config_llog_handler+0x9bb/0x1610 [obdclass]
 [<ffffffffa067530b>] ? llog_client_next_block+0x1db/0x4b0 [ptlrpc]
 [<ffffffffa049e1f8>] llog_process_thread+0x888/0xd00 [obdclass]
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 8000, comm: llog_process_th Not tainted 2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64 #1
Call Trace:
 [<ffffffff814fd58a>] ? panic+0xa0/0x168
 [<ffffffffa0379f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0e08735>] ? osd_index_try+0x175/0x620 [osd_ldiskfs]
 [<ffffffffa0842c08>] ? fld_index_init+0x88/0x4d0 [fld]
 [<ffffffffa084013d>] ? fld_cache_init+0x14d/0x430 [fld]
 [<ffffffffa083ba3e>] ? fld_server_init+0x29e/0x450 [fld]
 [<ffffffffa0d5c1b6>] ? mdt_fld_init+0x126/0x430 [mdt]
 [<ffffffffa0d61326>] ? mdt_init0+0x8c6/0x23f0 [mdt]
 [<ffffffffa0d5bf49>] ? mdt_key_init+0x59/0x1a0 [mdt]
 [<ffffffffa0d62f43>] ? mdt_device_alloc+0xf3/0x220 [mdt]
 [<ffffffffa04cb0d7>] ? obd_setup+0x1d7/0x2f0 [obdclass]
 [<ffffffffa04cb3f8>] ? class_setup+0x208/0x890 [obdclass]
 [<ffffffffa04d308c>] ? class_process_config+0xc0c/0x1c30 [obdclass]
 [<ffffffffa037abe0>] ? cfs_alloc+0x30/0x60 [libcfs]
 [<ffffffffa04cceb3>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
 [<ffffffffa04d515b>] ? class_config_llog_handler+0x9bb/0x1610 [obdclass]
 [<ffffffffa067530b>] ? llog_client_next_block+0x1db/0x4b0 [ptlrpc]
 [<ffffffffa049e1f8>] ? llog_process_thread+0x888/0xd00 [obdclass]
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu |
| Comments |
| Comment by Andreas Dilger [ 08/Aug/13 ] |
|
Is this the 2.3.0 server or IEEL? |
| Comment by Sarah Liu [ 13/Aug/13 ] |
|
Hi Andreas: As Oleg suggested, I will rerun the test with an IEEL server, see if it hits the same problem, and update this ticket when I have results. |
| Comment by Andreas Dilger [ 24/Sep/13 ] |
|
Sarah, any chance to run this test with IEEL? |
| Comment by Sarah Liu [ 27/Sep/13 ] |
|
Hi Andreas, this is blocked by TEI-578 |
| Comment by Di Wang [ 11/Nov/13 ] |
|
Hmm, it seems the FLD object (as a special FID) is not being inserted properly in 2.5, i.e. "fld" is not being inserted as a special name. So after the downgrade to 2.3, the server uses the name ("fld") to locate the FLD index, but since the name entry was never inserted, it hits the LBUG. Also, I saw osd_oi_lookup() ignore the LOCAL seq. Fan Yong, could you please comment? |
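To make the failure mode concrete, here is a minimal, self-contained C sketch of the situation described above. It is not Lustre code; the names (fake_obj, mdt_root, lookup_by_name) are hypothetical stand-ins. It only illustrates the shape of the problem: if the FLD index is located by a well-known name and that name entry is missing from the on-disk root, the looked-up object does not exist and an assertion analogous to ASSERTION( dt_object_exists(dt) ) fires.

    /* Hedged sketch, not real Lustre code. */
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    struct fake_obj {
            const char *name;
            int         exists;   /* stand-in for dt_object_exists() */
    };

    /* Hypothetical MDT root as seen by the old code: the "fld" name
     * entry is missing, even though the object may exist elsewhere. */
    static struct fake_obj mdt_root[] = {
            { "seq_srv", 1 },
            { "seq_ctl", 1 },
            /* "fld" intentionally absent */
    };

    static struct fake_obj *lookup_by_name(const char *name)
    {
            static struct fake_obj missing = { NULL, 0 };
            size_t i;

            for (i = 0; i < sizeof(mdt_root) / sizeof(mdt_root[0]); i++)
                    if (strcmp(mdt_root[i].name, name) == 0)
                            return &mdt_root[i];
            return &missing;      /* object allocated but does not exist */
    }

    int main(void)
    {
            struct fake_obj *fld = lookup_by_name("fld");

            printf("fld exists: %d\n", fld->exists);
            /* analogue of ASSERTION( dt_object_exists(dt) ) in osd_index_try() */
            assert(fld->exists);
            return 0;
    }

Run as a normal user-space program, this aborts on the assert, mirroring the LBUG seen in the Description when the name entry cannot be found.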
| Comment by nasf (Inactive) [ 12/Nov/13 ] |
|
Hi Di, do you mean that in Lustre-2.5 the local FLD file's name "fld" is not correctly inserted as a special name, so that after the downgrade to Lustre-2.3 the old osd_oi_lookup() tries to look up that special name and fails? In fact, in Lustre-2.5 we add local objects into the OI tables and also insert their special names, as follows:

    int osd_oi_insert(struct osd_thread_info *info, struct osd_device *osd,
                      const struct lu_fid *fid, const struct osd_inode_id *id,
                      handle_t *th, enum oi_check_flags flags)
    {
            …
            rc = osd_oi_iam_refresh(info, osd_fid2oi(osd, fid),
                                    (const struct dt_rec *)oi_id,
                                    (const struct dt_key *)oi_fid, th, true);
            …
            if (unlikely(fid_seq(fid) == FID_SEQ_LOCAL_FILE))
                    rc = osd_obj_spec_insert(info, osd, fid, id, th);

            return rc;
    }

Here are the files on the ldiskfs partition created under Lustre-2.5:

    # ls -a /mnt/mds1
    ./ O/ changelog_catalog lfsck_bookmark lov_objseq oi.16.12 oi.16.17 oi.16.21 oi.16.26 oi.16.30 oi.16.35 oi.16.4 oi.16.44 oi.16.49 oi.16.53 oi.16.58 oi.16.62 quota_master/
    ../ OI_scrub changelog_users lfsck_layout oi.16.0 oi.16.13 oi.16.18 oi.16.22 oi.16.27 oi.16.31 oi.16.36 oi.16.40 oi.16.45 oi.16.5 oi.16.54 oi.16.59 oi.16.63 quota_slave/
    CATALOGS PENDING/ fld lfsck_namespace oi.16.1 oi.16.14 oi.16.19 oi.16.23 oi.16.28 oi.16.32 oi.16.37 oi.16.41 oi.16.46 oi.16.50 oi.16.55 oi.16.6 oi.16.7 seq_ctl
    CONFIGS/ REMOTE_PARENT_DIR/ hsm_actions lost+found/ oi.16.10 oi.16.15 oi.16.2 oi.16.24 oi.16.29 oi.16.33 oi.16.38 oi.16.42 oi.16.47 oi.16.51 oi.16.56 oi.16.60 oi.16.8 seq_srv
    NIDTBL_VERSIONS/ ROOT/ last_rcvd lov_objid oi.16.11 oi.16.16 oi.16.20 oi.16.25 oi.16.3 oi.16.34 oi.16.39 oi.16.43 oi.16.48 oi.16.52 oi.16.57 oi.16.61 oi.16.9

The "fld" is there. I am not sure whether that is what you expected or not. |
| Comment by Di Wang [ 12/Nov/13 ] |
|
Oh, I mean that osd_oi_lookup() does not do a special lookup for [FID_SEQ_LOCAL_FILE, FLD_INDEX_OID, 0] "fld", and that might cause this problem. See this downgrade process, the filesystem |
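As a hedged illustration of the lookup fallback being suggested here (not the actual b2_3 patch, and not the real osd_oi_lookup() from osd_handler.c; all helpers and constant values below are hypothetical), the idea is that FIDs in the local-file sequence may need a well-known-name ("spec") lookup when the OI tables cannot resolve them:

    /* Hedged sketch, hypothetical helpers only. */
    #include <stdio.h>

    enum { FID_SEQ_LOCAL_FILE = 1 };  /* illustrative value only */

    struct fid { unsigned long long seq, oid, ver; };

    /* pretend OI-table lookup: 0 on success, -2 (ENOENT-like) if not found */
    static int oi_table_lookup(const struct fid *fid, unsigned long *ino)
    {
            (void)fid; (void)ino;
            return -2;             /* simulate the 2.5-format disk seen by old code */
    }

    /* pretend lookup of a well-known local file by its reserved name */
    static int spec_name_lookup(const struct fid *fid, unsigned long *ino)
    {
            if (fid->seq == FID_SEQ_LOCAL_FILE) {
                    *ino = 42;     /* pretend we found the "fld" inode */
                    return 0;
            }
            return -2;
    }

    static int oi_lookup_with_fallback(const struct fid *fid, unsigned long *ino)
    {
            int rc = oi_table_lookup(fid, ino);

            /* The suggested special case: local files may only be reachable
             * by their reserved names, so retry with the spec lookup. */
            if (rc == -2 && fid->seq == FID_SEQ_LOCAL_FILE)
                    rc = spec_name_lookup(fid, ino);
            return rc;
    }

    int main(void)
    {
            struct fid fld_fid = { FID_SEQ_LOCAL_FILE, 3 /* stand-in for FLD_INDEX_OID */, 0 };
            unsigned long ino = 0;
            int rc = oi_lookup_with_fallback(&fld_fid, &ino);

            printf("rc=%d ino=%lu\n", rc, ino);
            return 0;
    }

This only sketches the direction; whether the real fix takes this form is determined by the patch referenced in the later comments.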
| Comment by nasf (Inactive) [ 06/May/14 ] |
|
Here is one patch against b2_3 to resolve the issue: http://review.whamcloud.com/10224 |
| Comment by nasf (Inactive) [ 05/Jun/14 ] |
|
Do we still maintain b2_3? If not, I will abandon the patch http://review.whamcloud.com/10224 and close the ticket. |
| Comment by nasf (Inactive) [ 12/Jun/14 ] |
|
Downgrading the priority since we have no near-term plan to land more patches to b2_3. |
| Comment by nasf (Inactive) [ 23/Oct/15 ] |
|
Since we will not land more patches to b2_3, closing the ticket. |