[LU-3639] After downgrade from 2.5 to 2.3.0, hit (osd_handler.c:2720:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed Created: 25/Jul/13  Updated: 23/Oct/15  Resolved: 23/Oct/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug Priority: Trivial
Reporter: Sarah Liu Assignee: nasf (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

before upgrade, server and client: 2.3.0
after upgrade, server is 2.5, 2 clients are 2.5, 1 client is 2.3.0
after downgrade, server and client: 2.3.0


Severity: 3
Rank (Obsolete): 9373

 Description   

This is the same error as the one described in LU-2888.

MDS console:

LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=off. Opts: 
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=off. Opts: 
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=off. Opts: 
Lustre: MGC10.10.4.132@tcp: Reactivating import
Lustre: MGS: Logs for fs lustre were removed by user request.  All servers must be restarted in order to regenerate the logs.
Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Lustre: Setting parameter lustre-clilov.lov.stripesize in log lustre-client
LustreError: 8000:0:(osd_handler.c:2720:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed: 
LustreError: 8000:0:(osd_handler.c:2720:osd_index_try()) LBUG
Pid: 8000, comm: llog_process_th

Call Trace:
 [<ffffffffa0379905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0379f17>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0e08735>] osd_index_try+0x175/0x620 [osd_ldiskfs]
 [<ffffffffa0842c08>] fld_index_init+0x88/0x4d0 [fld]
 [<ffffffffa084013d>] ? fld_cache_init+0x14d/0x430 [fld]
 [<ffffffffa083ba3e>] fld_server_init+0x29e/0x450 [fld]
 [<ffffffffa0d5c1b6>] mdt_fld_init+0x126/0x430 [mdt]
 [<ffffffffa0d61326>] mdt_init0+0x8c6/0x23f0 [mdt]
 [<ffffffffa0d5bf49>] ? mdt_key_init+0x59/0x1a0 [mdt]
 [<ffffffffa0d62f43>] mdt_device_alloc+0xf3/0x220 [mdt]
 [<ffffffffa04cb0d7>] obd_setup+0x1d7/0x2f0 [obdclass]
 [<ffffffffa04cb3f8>] class_setup+0x208/0x890 [obdclass]
 [<ffffffffa04d308c>] class_process_config+0xc0c/0x1c30 [obdclass]
 [<ffffffffa037abe0>] ? cfs_alloc+0x30/0x60 [libcfs]
 [<ffffffffa04cceb3>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
 [<ffffffffa04d515b>] class_config_llog_handler+0x9bb/0x1610 [obdclass]
 [<ffffffffa067530b>] ? llog_client_next_block+0x1db/0x4b0 [ptlrpc]
 [<ffffffffa049e1f8>] llog_process_thread+0x888/0xd00 [obdclass]
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 8000, comm: llog_process_th Not tainted 2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64 #1
Call Trace:
 [<ffffffff814fd58a>] ? panic+0xa0/0x168
 [<ffffffffa0379f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0e08735>] ? osd_index_try+0x175/0x620 [osd_ldiskfs]
 [<ffffffffa0842c08>] ? fld_index_init+0x88/0x4d0 [fld]
 [<ffffffffa084013d>] ? fld_cache_init+0x14d/0x430 [fld]
 [<ffffffffa083ba3e>] ? fld_server_init+0x29e/0x450 [fld]
 [<ffffffffa0d5c1b6>] ? mdt_fld_init+0x126/0x430 [mdt]
 [<ffffffffa0d61326>] ? mdt_init0+0x8c6/0x23f0 [mdt]
 [<ffffffffa0d5bf49>] ? mdt_key_init+0x59/0x1a0 [mdt]
 [<ffffffffa0d62f43>] ? mdt_device_alloc+0xf3/0x220 [mdt]
 [<ffffffffa04cb0d7>] ? obd_setup+0x1d7/0x2f0 [obdclass]
 [<ffffffffa04cb3f8>] ? class_setup+0x208/0x890 [obdclass]
 [<ffffffffa04d308c>] ? class_process_config+0xc0c/0x1c30 [obdclass]
 [<ffffffffa037abe0>] ? cfs_alloc+0x30/0x60 [libcfs]
 [<ffffffffa04cceb3>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
 [<ffffffffa04d515b>] ? class_config_llog_handler+0x9bb/0x1610 [obdclass]
 [<ffffffffa067530b>] ? llog_client_next_block+0x1db/0x4b0 [ptlrpc]
 [<ffffffffa049e1f8>] ? llog_process_thread+0x888/0xd00 [obdclass]
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffffa049d970>] ? llog_process_thread+0x0/0xd00 [obdclass]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu


 Comments   
Comment by Andreas Dilger [ 08/Aug/13 ]

Is this the 2.3.0 server or IEEL?

Comment by Sarah Liu [ 13/Aug/13 ]

Hi Andreas,
this is the 2.3.0 server.

As Oleg suggested, I will rerun the test with an IEEL server to see whether it hits the same problem, and will update this ticket when I have results.

Comment by Andreas Dilger [ 24/Sep/13 ]

Sarah, any chance to run this test with IEEL?

Comment by Sarah Liu [ 27/Sep/13 ]

Hi Andreas,

This is blocked by TEI-578.

Comment by Di Wang [ 11/Nov/13 ]

Hmm, it seems the FLD object (as a special FID) is not being inserted properly in 2.5, i.e. "fld" is not being inserted as a special name. So after the downgrade to 2.3, the server uses the name "fld" to locate the FLD index, but since that name was never inserted, it hits the LBUG. Also, I saw that osd_oi_lookup ignores the LOCAL sequence. Fan Yong, could you please comment?

Comment by nasf (Inactive) [ 12/Nov/13 ]

Hi Di, do you mean that in Lustre-2.5 the name "fld" of the local FLD file is not correctly inserted as a special name, so that after the downgrade to Lustre-2.3 the old osd_oi_lookup() fails when it tries to look up that special name?

But in fact, in Lustre-2.5 we add local objects into the OI tables and also insert their special names, as follows:

int osd_oi_insert(struct osd_thread_info *info, struct osd_device *osd,
                  const struct lu_fid *fid, const struct osd_inode_id *id,
                  handle_t *th, enum oi_check_flags flags)
{
…
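        /* Refresh/insert the FID -> inode mapping in the OI table. */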
        rc = osd_oi_iam_refresh(info, osd_fid2oi(osd, fid),
                               (const struct dt_rec *)oi_id,
                               (const struct dt_key *)oi_fid, th, true);
…
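        /* Local files additionally get their special name inserted. */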
        if (unlikely(fid_seq(fid) == FID_SEQ_LOCAL_FILE))
                rc = osd_obj_spec_insert(info, osd, fid, id, th);
        return rc;
}

Here is the file listing of the ldiskfs partition created under Lustre-2.5:

# ls -a /mnt/mds1
./                O/                  changelog_catalog  lfsck_bookmark   lov_objseq  oi.16.12  oi.16.17  oi.16.21  oi.16.26  oi.16.30  oi.16.35  oi.16.4   oi.16.44  oi.16.49  oi.16.53  oi.16.58  oi.16.62  quota_master/
../               OI_scrub            changelog_users    lfsck_layout     oi.16.0     oi.16.13  oi.16.18  oi.16.22  oi.16.27  oi.16.31  oi.16.36  oi.16.40  oi.16.45  oi.16.5   oi.16.54  oi.16.59  oi.16.63  quota_slave/
CATALOGS          PENDING/            fld                lfsck_namespace  oi.16.1     oi.16.14  oi.16.19  oi.16.23  oi.16.28  oi.16.32  oi.16.37  oi.16.41  oi.16.46  oi.16.50  oi.16.55  oi.16.6   oi.16.7   seq_ctl
CONFIGS/          REMOTE_PARENT_DIR/  hsm_actions        lost+found/      oi.16.10    oi.16.15  oi.16.2   oi.16.24  oi.16.29  oi.16.33  oi.16.38  oi.16.42  oi.16.47  oi.16.51  oi.16.56  oi.16.60  oi.16.8   seq_srv
NIDTBL_VERSIONS/  ROOT/               last_rcvd          lov_objid        oi.16.11    oi.16.16  oi.16.20  oi.16.25  oi.16.3   oi.16.34  oi.16.39  oi.16.43  oi.16.48  oi.16.52  oi.16.57  oi.16.61  oi.16.9

The "fld" is there. I am not sure whether it is your expected or not.

Comment by Di Wang [ 12/Nov/13 ]

Oh, I mean that osd_oi_lookup does not do a special lookup for [FID_SEQ_LOCAL_FILE, FLD_INDEX_OID, 0] "fld", and that might cause this problem.

See this downgrade process for the filesystem:
1. The FS is formatted under 2.3 first; fld is created with the name "fld".
2. It is then upgraded to 2.5, but 2.5 does not look up "fld" in osd_oi_lookup and instead creates a new one. (This is obviously wrong.)
3. It is then downgraded to 2.3, which cannot find the old one. So I guess the fld object might somehow have been deleted because of step 2; see the sketch below.
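
For illustration, the kind of fallback that seems to be missing might look roughly like the sketch below. This is only a sketch of the idea, not the actual fix: osd_oi_iam_lookup() and osd_obj_spec_lookup() are assumed here as the lookup counterparts of the insert paths in the osd_oi_insert() excerpt above, and their signatures are approximated from it.

static int osd_oi_lookup_with_fallback(struct osd_thread_info *info,
                                       struct osd_device *osd,
                                       const struct lu_fid *fid,
                                       struct osd_inode_id *id)
{
        int rc;

        /* First consult the regular OI tables for the FID -> inode mapping. */
        rc = osd_oi_iam_lookup(info, osd_fid2oi(osd, fid),
                               (struct dt_rec *)id,
                               (const struct dt_key *)fid);
        if (rc == 0)
                return 0;

        /*
         * Local well-known objects such as [FID_SEQ_LOCAL_FILE,
         * FLD_INDEX_OID, 0] "fld" may only be reachable through the
         * special name inserted by osd_obj_spec_insert(), so fall
         * back to a special-name lookup instead of failing outright.
         */
        if (fid_seq(fid) == FID_SEQ_LOCAL_FILE)
                rc = osd_obj_spec_lookup(info, osd, fid, id);

        return rc;
}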

Comment by nasf (Inactive) [ 06/May/14 ]

Here is one patch against b2_3 to resolve the issue:

http://review.whamcloud.com/10224

Comment by nasf (Inactive) [ 05/Jun/14 ]

Do we still maintain b2_3? If not, I will abandon the patch http://review.whamcloud.com/10224 and close the ticket.

Comment by nasf (Inactive) [ 12/Jun/14 ]

Downgrading the priority, since we have no concrete plan to land more patches on b2_3 in the near term.

Comment by nasf (Inactive) [ 23/Oct/15 ]

Since we will not land any more patches on b2_3, I am closing the ticket.
