[LU-7603] sanity test_69: read succeeded, expect -ENOENT Created: 23/Dec/15  Updated: 09/Jan/17  Resolved: 09/Jan/17

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Yang Sheng
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/8c62edae-a975-11e5-b85c-5254006e85c2.

The sub-test test_69 failed with the following error:

read succeeded, expect -ENOENT

Only a small number of these were found in a Maloo search.

Info required for matching: sanity 69



 Comments   
Comment by Bob Glossman (Inactive) [ 29/Dec/15 ]

another on master, sles12 client:
https://testing.hpdd.intel.com/test_sets/495bbe8e-ade7-11e5-bf32-5254006e85c2

Comment by Bob Glossman (Inactive) [ 12/Jan/16 ]

another on master with sles12:
https://testing.hpdd.intel.com/test_sets/ac344f0e-b935-11e5-825c-5254006e85c2

Starting to wonder if this is a 100% failure on sles12.

Comment by Peter Jones [ 12/Jan/16 ]

Yang Sheng

Could you please look into this one?

Thanks

Peter

Comment by James Nunez (Inactive) [ 13/Jan/16 ]

Another failure on master:
2016-01-12 06:16:29 - https://testing.hpdd.intel.com/test_sets/ac344f0e-b935-11e5-825c-5254006e85c2

Comment by Yang Sheng [ 14/Jan/16 ]

This is interesting. Looks like range_lock was invoked:

170574:00000001:00000001:1.0:1452583769.626693:0:9171:0:(range_lock.c:200:range_lock()) Process entered
170581:00000001:00000001:1.0:1452583769.626703:0:9171:0:(range_lock.c:237:range_lock()) Process leaving (rc=0 : 0 : 0)

But ll_direct_IO was not.

00000080:00000002:1.0:1452583769.626962:0:9171:0:(vvp_io.c:816:vvp_io_read_start()) header@ffff88005f1c3768[0x0, 2049, [0x200000404:0xeab8:0x0] hash]

00000080:00000002:1.0:1452583769.626964:0:9171:0:(vvp_io.c:816:vvp_io_read_start()) Read ino 144115205322894008, 4194304 bytes, offset 4194304, size 8388608
00020000:00000001:1.0:1452583769.628779:0:9171:0:(lov_io.c:531:lov_io_start()) Process entered
00020000:00000001:1.0:1452583769.628781:0:9171:0:(lov_io.c:509:lov_io_call()) Process entered
00000020:00000001:1.0:1452583769.628782:0:9171:0:(cl_io.c:571:cl_io_start()) Process entered

This is very strange. I'll prepare a debug patch to see what is happening.

Comment by Gerrit Updater [ 14/Jan/16 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/18008
Subject: LU-7603 tests: debug patch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2acde5617c3081c2b1fd7724aa21473bbb27c48f

Comment by Yang Sheng [ 05/Feb/16 ]

I haven't seen an instance from the autotest system since the debug patch landed. Has anyone else hit it?

Comment by Yang Sheng [ 17/Feb/16 ]

There is some interesting log output captured with the debug patch while test_69 passed.

02:21:29:[ 5165.102167] Lustre: DEBUG MARKER: == sanity test 69: verify oa2dentry return -ENOENT doesn't LBUG ====================================== 02:20:46 (1455445246)
02:21:29:[ 5166.932255] LustreError: 5490:0:(vvp_io.c:831:vvp_io_read_start()) LU7603: type: 0 
02:21:29:[ 5166.937738] LustreError: 5490:0:(vvp_io.c:831:vvp_io_read_start()) Skipped 102 previous similar messages
02:21:29:[ 5166.941704] LustreError: 5490:0:(vvp_io.c:838:vvp_io_read_start()) LU7603: file: f69.sanity, type: 0, flags=0, ns=1
--------------------------------------------------------------------------------------------------------------------------------------------------------^^^^
02:21:29:[ 5166.944069] LustreError: 5490:0:(vvp_io.c:838:vvp_io_read_start()) Skipped 102 previous similar messages
02:21:29:[ 5166.944071] LustreError: 5490:0:(vvp_io.c:843:vvp_io_read_start()) LU7603: scnt=4194304, nr_segs=1, rv=0, rs = 0
02:21:29:[ 5166.944071] LustreError: 5490:0:(vvp_io.c:843:vvp_io_read_start()) Skipped 102 previous similar messages
02:21:29:[ 5167.848196] Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null || true
02:21:29:[ 5168.213028] Lustre: DEBUG MARKER: rc=0;
02:21:29:[ 5168.213028] val=$(/usr/sbin/lctl get_param -n catastrophe 2>&1);
02:21:29:[ 5168.213028] if [[ $? -eq 0 && $val -ne 0 ]]; then
02:21:29:[ 5168.213028] 	echo $(hostname -s): $val;
02:21:29:[ 5168.213028] 	rc=$val;
02:21:29:[ 5168.213028] fi;
02:21:29:[ 5168.213028] exit $rc
02:21:29:[ 5168.642126] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  SKIP: sanity test_71 skipping SLOW test 71
02:21:29:[ 5168.782172] Lustre: DEBUG MARKER: SKIP: sanity test_71 skipping SLOW test 71
02:21:29:[ 5169.109483] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 72a: Test that remove suid works properly \(bug5695\) =================================== 02:20:50 \(1455445250\)

As the marked line reveals, flags = 0, which means kiocb->ki_ctx == NULL. In other words, the direct I/O flag has disappeared. This is abnormal even though the test case passed. I have updated the debug patch to try to capture more detailed information for further work.

Comment by Yang Sheng [ 09/Jan/17 ]

Closing as cannot reproduce. Please feel free to reopen it if the failure is hit again.

Generated at Sat Feb 10 02:10:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.