[LU-3742] Failure on test suite replay-single test_0c Created: 13/Aug/13  Updated: 02/Jun/14  Resolved: 17/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Bob Glossman (Inactive)
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-4470 replay-dual test_21b: FAIL: lustre-MD... Resolved
Severity: 3
Rank (Obsolete): 9659

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/8fb8e4fa-00b1-11e3-b58a-52540035b04c.

The sub-test test_0c failed with the following error:

File exists and it shouldn't

Info required for matching: replay-single 0c



 Comments   
Comment by Jodi Levi (Inactive) [ 14/Aug/13 ]

Bob,
could you have a look at this one?
Thank you!

Comment by Minh Diep [ 28/Aug/13 ]

This issue happens very often and is not on SLES
https://maloo.whamcloud.com/test_sets/0236d094-0fbe-11e3-81e3-52540035b04c

Comment by Stephen Champion [ 04/Dec/13 ]

I've been seeing this very consistently while testing 2.4.1/ldiskfs with both SLES and RHEL on a small o2ib cluster.
I also observed it with master/ldiskfs circa 2013.10.16 on sles11sp3.
I haven't tested RHEL/master.

As I understand this test, replay_barrier fails to set the MDS read-only:

00000004:00020000:13.0:1386118021.849836:0:15877:0:(osd_handler.c:1202:osd_ro()) accfs1-MDT0000: ffff8802d91af100 CANNOT BE SET READONLY: rc = -95

so this op from the client succeeds:

00000080:00200000:9.0:1386118022.019764:0:6304:0:(namei.c:931:ll_mknod_generic()) VFS Op:name=f.replay-single.0c,dir=144115188193296385/33554432(ffff880231856ab8) mode 100644 dev 0

when it should not have. The underlying problem is therefore the failure to set the MDS read-only, which I haven't looked into yet.
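For anyone triaging similar runs, here is a minimal sketch of how the osd_ro() failure could be pulled out of a debug log or dmesg capture and its return code decoded. It is an illustration only, not part of the Lustre test framework: the function name find_readonly_failures and the regular expression are hypothetical, and the errno table is limited to the Linux value seen in this ticket (95, EOPNOTSUPP, "operation not supported").

```python
import re

# Linux errno values (asm-generic/errno.h); only the one seen in this ticket.
LINUX_ERRNO = {95: "EOPNOTSUPP"}

# Hypothetical pattern matching the osd_ro() error text quoted in this ticket.
OSD_RO_FAILURE = re.compile(
    r"\(osd_handler\.c:\d+:osd_ro\(\)\)\s+(?P<target>\S+): \S+ "
    r"CANNOT BE SET READONLY: rc = (?P<rc>-?\d+)"
)


def find_readonly_failures(lines):
    """Return (target, rc, errno_name) for each failed read-only request."""
    failures = []
    for line in lines:
        match = OSD_RO_FAILURE.search(line)
        if match:
            rc = int(match.group("rc"))
            # Kernel return codes are negative errno values.
            name = LINUX_ERRNO.get(-rc, "unknown")
            failures.append((match.group("target"), rc, name))
    return failures


if __name__ == "__main__":
    sample = [
        "00000004:00020000:13.0:1386118021.849836:0:15877:0:"
        "(osd_handler.c:1202:osd_ro()) accfs1-MDT0000: ffff8802d91af100 "
        "CANNOT BE SET READONLY: rc = -95",
    ]
    for target, rc, name in find_readonly_failures(sample):
        print(f"{target}: read-only request failed, rc = {rc} ({name})")
```

The pattern keys on the function name and message text rather than the source line number, since the line number differs between builds (1202 and 1267 in the logs in this ticket).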

Comment by Jian Yu [ 10/Jan/14 ]

Lustre build: http://build.whamcloud.com/job/lustre-reviews/20841/
Distro/Arch: SLES11SP3/x86_64 (both server and client, kernel version: 3.0.101-0.8)

The same failure occurred:
https://maloo.whamcloud.com/test_sets/bba9bbc0-7988-11e3-a27b-52540035b04c

replay-vbr also failed:

Started lustre-MDT0000
 replay-vbr test_1b: @@@@@@ FAIL: client-9vm2 not evicted 

Dmesg on MDS:

[34280.771742] Lustre: DEBUG MARKER: == replay-vbr test 1b: open (O_CREAT) checks version of parent ======================================= 23:32:02 (1389252722)
[34281.224214] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0000.sync_permission=0
[34281.288646] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdt.lustre-MDT0000.commit_on_sharing=0
[34281.557155] Lustre: DEBUG MARKER: sync; sync; sync
[34282.532809] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
[34282.604148] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
[34282.638590] LustreError: 25608:0:(osd_handler.c:1267:osd_ro()) lustre-MDT0000: ffff88007f7b2100 CANNOT BE SET READONLY: rc = -95
[34282.638596] LustreError: 25608:0:(osd_handler.c:1267:osd_ro()) Skipped 7 previous similar messages

Maloo report: https://maloo.whamcloud.com/test_sets/43ce86d2-798b-11e3-a27b-52540035b04c
