[LU-5487] Test failure on test suite sanity-lfsck, subtest test_18d Created: 14/Aug/14  Updated: 11/Apr/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 15309

 Description   

This issue was created by maloo for Minh Diep <minh.diep@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/b04945f0-236c-11e4-84ee-5254006e85c2.

The sub-test test_18d failed with the following error:

(3.0) MDS1 is not the expected 'scanning-phase2'

Info required for matching: sanity-lfsck 18d



 Comments   
Comment by Oleg Drokin [ 15/Aug/14 ]

I suspect this is a case of the background task completing too fast?

Changed after 0s: from 'scanning-phase2' to 'completed'
Waiting 6 secs for update
CMD: onyx-51vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_layout |
		awk '/^status/ { print \$2 }'
CMD: onyx-51vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_layout |
		awk '/^status/ { print \$2 }'
CMD: onyx-51vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_layout |
		awk '/^status/ { print \$2 }'
CMD: onyx-51vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_layout |
		awk '/^status/ { print \$2 }'
CMD: onyx-51vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_layout |
		awk '/^status/ { print \$2 }'
CMD: onyx-51vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_layout |
		awk '/^status/ { print \$2 }'
Update not seen after 6s: wanted 'scanning-phase2' got 'completed'
 sanity-lfsck test_18d: @@@@@@ FAIL: (3.0) MDS1 is not the expected 'scanning-phase2' 
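
The log above is consistent with a race between the test's status poll and the LFSCK background task: if the task leaves 'scanning-phase2' before the first poll runs, no later poll can observe that state. The following is a minimal C sketch of that race using a simulated clock. It is not Lustre or test-framework code; the names status_at and wait_for_status and all timings are assumptions made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: the status the background task would report at
 * time now_ms, given that phase 2 lasts phase2_len_ms. Not Lustre code. */
static const char *status_at(int now_ms, int phase2_len_ms)
{
	return now_ms < phase2_len_ms ? "scanning-phase2" : "completed";
}

/* Poll once per interval_ms, starting at first_poll_ms, until the wanted
 * status is observed or timeout_ms is passed. Returns true if it was seen. */
static bool wait_for_status(const char *wanted, int phase2_len_ms,
			    int first_poll_ms, int interval_ms,
			    int timeout_ms)
{
	for (int t = first_poll_ms; t <= timeout_ms; t += interval_ms) {
		if (strcmp(status_at(t, phase2_len_ms), wanted) == 0)
			return true;
	}
	return false;
}
```

In this model, when phase 2 is shorter than the delay before the first poll, waiting for "scanning-phase2" fails no matter how long the timeout is, which matches the "Update not seen after 6s" message. How the test should avoid the race is not decided in this ticket.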
Comment by Jian Yu [ 08/Jul/16 ]

Another failure instance on the master branch: https://testing.hpdd.intel.com/test_sets/c53fdc74-403b-11e6-acf3-5254006e85c2

Comment by Dmitry Eremin (Inactive) [ 11/Apr/17 ]

I see the following crash in master:

 16:46:36:[14470.127738] LustreError: 22186:0:(vvp_io.c:345:vvp_io_fini()) ASSERTION( io->ci_type == CIT_WRITE || cl_io_is_trunc(io) ) failed: 
 16:46:36:[14470.132048] LustreError: 22186:0:(vvp_io.c:345:vvp_io_fini()) LBUG
 16:46:36:[14470.134194] Pid: 22186, comm: cat
 16:46:36:[14470.135970] 
 16:46:36:[14470.135970] Call Trace:
 16:46:36:[14470.139113] [<ffffffffa07107f3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
 16:46:36:[14470.141168] [<ffffffffa0710861>] lbug_with_loc+0x41/0xb0 [libcfs]
 16:46:36:[14470.143473] [<ffffffffa0c93761>] vvp_io_fini+0x321/0x360 [lustre]
 16:46:36:[14470.145411] [<ffffffffa0beaff2>] ? lov_io_fini+0x282/0x460 [lov]
 16:46:36:[14470.147499] [<ffffffffa0805165>] cl_io_fini+0x75/0x240 [obdclass]
 16:46:36:[14470.149358] [<ffffffffa0c42f73>] ll_file_io_generic+0x2a3/0xb00 [lustre]
 16:46:36:[14470.151383] [<ffffffff81219cff>] ? touch_atime+0x12f/0x160
 16:46:36:[14470.153202] [<ffffffffa0c4409a>] ll_file_aio_read+0x34a/0x3e0 [lustre]
 16:46:36:[14470.155178] [<ffffffffa0c441fe>] ll_file_read+0xce/0x1e0 [lustre]
 16:46:36:[14470.157019] [<ffffffff811fe19e>] vfs_read+0x9e/0x170
 16:46:36:[14470.158806] [<ffffffff811fed6f>] SyS_read+0x7f/0xe0
 16:46:36:[14470.160524] [<ffffffff81696b09>] system_call_fastpath+0x16/0x1b




The relevant code is as follows:

	/**
	 * dynamic layout change needed, send layout intent
	 * RPC.
	 */
	if (io->ci_need_write_intent) {
		loff_t start = 0;
		loff_t end = 0;

		LASSERT(io->ci_type == CIT_WRITE || cl_io_is_trunc(io));


Comment by Dmitry Eremin (Inactive) [ 11/Apr/17 ]

This crash was introduced in https://review.whamcloud.com/25317

static int lov_io_rw_iter_init(const struct lu_env *env,
			       const struct cl_io_slice *ios)
{
...
	index = lov_lsm_entry(lsm, lio->lis_endpos - 1);
	if (index > 0 && !lsm_entry_inited(lsm, index)) {
		io->ci_need_write_intent = 1;
		RETURN(io->ci_result = -ENODATA);
	}

So "io->ci_need_write_intent" can also be set to "1" on the read path, which trips the LASSERT in vvp_io_fini() shown above.
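
The interaction described above can be sketched as a small standalone C model. The enum, struct, and function names below are simplified stand-ins invented for illustration; they are not the real Lustre definitions, the "index > 0" check is omitted, and truncate is modelled as its own I/O type. The guarded variant is only one hypothetical way to limit the flag to write-like I/O, not the fix adopted upstream.

```c
#include <stdbool.h>

/* Simplified stand-ins for the Lustre I/O types; illustration only. */
enum model_io_type { MODEL_READ, MODEL_WRITE, MODEL_TRUNC };

struct model_io {
	enum model_io_type type;
	bool need_write_intent;
};

static bool model_is_write_like(const struct model_io *io)
{
	return io->type == MODEL_WRITE || io->type == MODEL_TRUNC;
}

/* Mirrors the quoted lov_io_rw_iter_init() logic: the flag is set whenever
 * the I/O ends in an uninitialized component, regardless of the I/O type. */
static void model_iter_init(struct model_io *io, bool entry_inited)
{
	if (!entry_inited)
		io->need_write_intent = true;
}

/* Hypothetical guarded variant: only write-like I/O requests the intent. */
static void model_iter_init_guarded(struct model_io *io, bool entry_inited)
{
	if (!entry_inited && model_is_write_like(io))
		io->need_write_intent = true;
}

/* Mirrors the condition checked in vvp_io_fini(): returns true when the
 * LASSERT would fire (flag set on an I/O that is neither write nor trunc). */
static bool model_fini_would_lbug(const struct model_io *io)
{
	return io->need_write_intent && !model_is_write_like(io);
}
```

In this model, a read that ends in an uninitialized component reaches the assertion through model_iter_init() but not through the guarded variant, while a write sets the flag and passes the check in both.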
