[LU-4472] sanity-quota test_8: mdc_page_locate() ASSERTION( *start <= *hash ) Created: 10/Jan/14  Updated: 27/Feb/14  Resolved: 27/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: MB

Severity: 3
Rank (Obsolete): 12252

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/a7c7a94c-6f50-11e3-ad93-52540035b04c.

The sub-test test_8 failed with the following error:

test failed to respond and timed out

Info required for matching: sanity-quota 8

Client console log:

12:07:36:LustreError: 28141:0:(mdc_request.c:1237:mdc_page_locate()) ASSERTION( *start <= *hash ) failed: start = 0x1,end = 0xfffffffffffffffe,hash = 0x0
12:07:36:LustreError: 28141:0:(mdc_request.c:1237:mdc_page_locate()) LBUG
12:07:36:Pid: 28141, comm: rm
12:07:36:
12:07:36:Call Trace:
12:07:36: [<ffffffffa1325895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
12:07:36: [<ffffffffa1325e97>] lbug_with_loc+0x47/0xb0 [libcfs]
12:07:36: [<ffffffffa03d3443>] mdc_read_page+0x853/0x920 [mdc]
12:07:36: [<ffffffffa0580320>] ? ll_md_blocking_ast+0x0/0x810 [lustre]
12:07:36: [<ffffffff81281436>] ? vsnprintf+0x336/0x5e0
12:07:36: [<ffffffffa03d3577>] mdc_read_entry+0x67/0x390 [mdc]
12:07:36: [<ffffffffa1832f0d>] lmv_read_entry+0x3fd/0xa70 [lmv]
12:07:36: [<ffffffffa053a97c>] ll_dir_entry_start+0xbc/0x330 [lustre]
12:07:36: [<ffffffffa0580320>] ? ll_md_blocking_ast+0x0/0x810 [lustre]
12:07:36: [<ffffffffa053c0c0>] ? ll_update_inode_size+0x0/0x40 [lustre]
12:07:36: [<ffffffff8109f641>] ? in_group_p+0x31/0x40
12:07:36: [<ffffffffa053c693>] ll_dir_read+0x83/0x230 [lustre]
12:07:36: [<ffffffffa057e13d>] ? ll_i2gids+0x3d/0xd0 [lustre]
12:07:36: [<ffffffff81196550>] ? filldir+0x0/0xe0
12:07:36: [<ffffffffa053c95d>] ll_readdir+0x11d/0x3b0 [lustre]
12:07:36: [<ffffffff81196550>] ? filldir+0x0/0xe0
12:07:36: [<ffffffff81196550>] ? filldir+0x0/0xe0
12:07:36: [<ffffffff811967d0>] vfs_readdir+0xc0/0xe0
12:07:36: [<ffffffff81196959>] sys_getdents+0x89/0xf0
12:07:36: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
12:07:36:


 Comments   
Comment by Nathaniel Clark [ 13/Jan/14 ]

Remove offending ASSERT

http://review.whamcloud.com/8821

Comment by Nathaniel Clark [ 17/Jan/14 ]

So the issue appears to be in hash_x_index():

static inline __u64 hash_x_index(__u64 hash, int hash64)
{
	if (BITS_PER_LONG == 32 && hash64)
		hash >>= 32;
	/* save hash 0 as index 0 because otherwise we'll save it at
	 * page index end (~0UL) and it causes truncate_inode_pages_range()
	 * to loop forever. */
	return ~0ULL - (hash + !hash);
}

The problem appears to be that hash_x_index(0) == hash_x_index(1) == -2ULL. According to the comment, the intent of http://review.whamcloud.com/8237 (and kernel patch 363090e) would seem to be hash_x_index(0) == 0ULL.
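
To make the collision concrete, here is a minimal userspace sketch of the function quoted above. It assumes a 64-bit build (BITS_PER_LONG is defined locally as 64) and substitutes uint64_t for __u64; check_hash_collision() is a hypothetical helper added only for illustration and is not part of Lustre.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for the kernel's BITS_PER_LONG on a 64-bit client. */
#define BITS_PER_LONG 64

/* Same arithmetic as the hash_x_index() quoted in this comment,
 * with uint64_t used in place of the kernel's __u64. */
static inline uint64_t hash_x_index(uint64_t hash, int hash64)
{
	if (BITS_PER_LONG == 32 && hash64)
		hash >>= 32;
	/* !hash is 1 only for hash == 0, so hash 0 is treated like hash 1. */
	return ~0ULL - (hash + !hash);
}

/* Hypothetical helper: checks that hash 0 and hash 1 share an index. */
void check_hash_collision(void)
{
	assert(hash_x_index(0, 1) == 0xfffffffffffffffeULL); /* -2ULL */
	assert(hash_x_index(1, 1) == 0xfffffffffffffffeULL); /* -2ULL */
	assert(hash_x_index(2, 1) == 0xfffffffffffffffdULL); /* distinct */
}
```

If this reading is right, a lookup for hash 0x0 can land on the page whose start hash is 0x1, which would match the values in the console log (start = 0x1, hash = 0x0).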

Comment by Andreas Dilger [ 25/Jan/14 ]

I think the comment in hash_x_index() is outdated. That was the original proposal, but it means that hash==0 went from 0xffffffffffffffff to 0, which is strange. Instead, it just puts it into the same bucket as hash==1 where one would expect it to be.

It may be that the LASSERT() needs to be fixed to handle this one case?
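
One possible shape for such a relaxed check is sketched below. This is purely hypothetical and is not the patch that landed: page_start_covers_hash() is an invented name, and the special case assumes that the only collision is between hash 0 and hash 1, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical replacement for the condition in
 * ASSERTION( *start <= *hash ): accept the normal case, plus the one
 * case where hash 0 was mapped into the same bucket as hash 1. */
bool page_start_covers_hash(uint64_t start, uint64_t hash)
{
	if (start <= hash)
		return true;
	/* Assumed special case: lookup for hash 0 found the page starting at 1. */
	return hash == 0 && start == 1;
}
```

With the values from the console log (start = 0x1, hash = 0x0) this check passes, while any other start greater than hash is still rejected.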

Comment by Jian Yu [ 29/Jan/14 ]

More instances on master branch:
https://maloo.whamcloud.com/test_sets/81732fd2-887d-11e3-b26e-52540035b04c
https://maloo.whamcloud.com/test_sets/1610607e-887e-11e3-b26e-52540035b04c

Comment by Jodi Levi (Inactive) [ 27/Feb/14 ]

Patch landed to Master. Please reopen if this problem persists.
