[LU-12661] sanity test_817: FAIL: failed to execute 'true' command Created: 13/Aug/19  Updated: 23/Jan/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.12.6, Lustre 2.12.7, Lustre 2.12.8
Fix Version/s: Upstream

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Dongyang Li
Resolution: Unresolved Votes: 0
Labels: always_except, rhel8, sles12, sles15

Issue Links:
Related
is related to LU-4398 mdt_object_open_lock() may not flush ... Resolved
is related to LU-10457 open_by_handle_at() in write mode tri... Resolved
is related to LU-8585 All Lustre test suites should pass wi... Open
is related to LU-12511 Prepare lustre for adoption into the ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for jianyu <yujian@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/60697574-bbc3-11e9-a25b-52540065bddc

test_817 failed with the following error:

== sanity test 817: nfsd won't cache write lock for exec file ======================================== 13:19:00 (1565381940)
/usr/lib64/lustre/tests/sanity.sh: line 21700: /mnt/lustre/nfsexp/true: Text file busy
 sanity test_817: @@@@@@ FAIL: failed to execute 'true' command 

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_817 - failed to execute 'true' command



 Comments   
Comment by James A Simmons [ 10/Sep/19 ]

Is this an ARM-only bug?

Comment by James Nunez (Inactive) [ 16/Sep/19 ]

We see this with RHEL8 clients: https://testing.whamcloud.com/test_sets/fff5bbfc-d70c-11e9-9fc9-52540065bddc

Comment by Peter Jones [ 18/Sep/19 ]

Dongyang

Could you please investigate?

Thanks

Peter

Comment by Gerrit Updater [ 08/Nov/19 ]

Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/36712
Subject: LU-12661 tests: skip sanity 817 if kernel version >= 4.14
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5a648bdc968a6bd4bcd776dcd5ad41829e71d92d

Comment by Andreas Dilger [ 15/Nov/19 ]

On newer kernels, nfsd does not release the file after a write, so executing it fails with ETXTBSY regardless of whether the NFS export is backed by a Lustre mount. This is because newer kernels delay the fput() on the write file descriptor, which keeps the file open in write mode and prevents it from being opened in exec mode.
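
The underlying rule can be seen locally, without NFS or Lustre: on Linux, execve() returns ETXTBSY for a file that is still open for writing. The following is a minimal sketch and not the sanity.sh test itself. The function name demo_etxtbsy and the temporary script are made up for illustration, and the held file descriptor only stands in for the write-mode open that nfsd is described as keeping.

```shell
# Hypothetical local demonstration (no NFS, no Lustre involved).
# Runs in a subshell "( ... )" so fd 3 and the variables stay private.
demo_etxtbsy() (
	dir=$(mktemp -d) || exit 1
	exe="$dir/true_copy"

	# Stand-in for the copied 'true' binary: a trivial executable script.
	printf '#!/bin/sh\nexit 0\n' > "$exe"
	chmod +x "$exe"

	# Keep a write-mode descriptor open on the file, then try to run it.
	exec 3>>"$exe"
	if "$exe" 2>/dev/null; then held=ok; else held=failed; fi

	# Drop the write-mode descriptor and try again.
	exec 3>&-
	if "$exe" 2>/dev/null; then released=ok; else released=failed; fi

	rm -rf "$dir"
	echo "while open for write: $held; after close: $released"
)
```

Calling demo_etxtbsy on a typical Linux local filesystem is expected to report a failure while the descriptor is held and success once it is closed. A temporary directory mounted noexec would make both attempts fail, so the result should be read with that in mind.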

Comment by Andreas Dilger [ 15/Nov/19 ]

Oleg, isn't there a patch to drop the write mode lock more quickly, or similar, that would fix this?

Comment by Gerrit Updater [ 16/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36712/
Subject: LU-12661 tests: skip sanity 817 if kernel version >= 4.14
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4fed33473ca2964ff19f61fdb8501b2210f923de

Comment by Andreas Dilger [ 09/Jan/20 ]

The patch https://review.whamcloud.com/32265 "LU-4398 llite: do not cache write open lock for exec file" was landed as 2.12.56-76-g6dd9d57bc0, which was just before the 2.12.58 build that this was first reported on, so there is likely a connection. I would have thought the 32265 patch would fix this problem, but maybe it only became evident for the newer kernels because sanity test_817 was re-enabled by that patch?

Comment by Dongyang Li [ 17/Jan/20 ]

I just tested 32265 on a CentOS 8 box to be sure. The issue is still there.

It can be reproduced on 4.14+ without Lustre, so again it is a kernel problem.

Comment by Jian Yu [ 10/Feb/20 ]

The same failure also occurred on a SLES15 SP1 client (kernel version 4.12.14-197.29-default):
https://testing.whamcloud.com/test_sets/86cd8382-4a79-11ea-b58e-52540065bddc

Comment by James A Simmons [ 10/Feb/20 ]

I see this with the Linux Lustre client as well.

Comment by Gerrit Updater [ 16/Apr/20 ]

Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38259
Subject: LU-12661 tests: skip sanity 817 if kernel version >= 4.14
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a011003f4a99402c87c758558fe1f643ca2f3708

Comment by Gerrit Updater [ 01/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38259/
Subject: LU-12661 tests: skip sanity 817 if kernel version >= 4.14
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 42923270537d52d2b90c59a2db16d428b4f5a90c

Comment by Hongchao Zhang [ 07/Aug/20 ]

It also fails on SLES 12.5 (kernel version 4.12.14-122.20-default):
https://testing.whamcloud.com/test_sets/960091e2-c4eb-423f-a2cc-4c5ec02118a3

Comment by Andreas Dilger [ 09/Sep/20 ]

Dongyang, I see comment-263035 and comment-276924 report failures for SLES 4.12.14 kernels, but the patch skips kernels 4.14 and later. Is that a typo in the patch? Should this test be skipped for kernels >= 4.12 instead?
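
The mismatch can be checked with a small version comparison. This is only a sketch: kver_code and should_skip_817 are hypothetical helpers written for this note, not the functions used by the Lustre test framework or by the patch, and the numeric packing (major * 1000000 + minor * 1000 + patch) is an arbitrary choice that assumes each component stays below 1000.

```shell
# Hypothetical helper: turn "4.12.14-197.29-default" into one integer.
# Runs in a subshell "( ... )" so the IFS and "set -f" changes stay private.
kver_code() (
	v=${1%%-*}	# drop the distro suffix: "4.12.14-197.29-default" -> "4.12.14"
	set -f		# no filename expansion while splitting
	IFS=.
	set -- $v	# split on "." into $1 (major), $2 (minor), $3 (patch)
	echo $(( ${1:-0} * 1000000 + ${2:-0} * 1000 + ${3:-0} ))
)

# Hypothetical check: succeed (skip the test) when the kernel release in $1
# is at or above the minimum version given in $2.
should_skip_817() {
	[ "$(kver_code "$1")" -ge "$(kver_code "$2")" ]
}
```

With these helpers, should_skip_817 4.12.14-197.29-default 4.14 returns false, so the test would still run on that SLES kernel, while should_skip_817 4.12.14-197.29-default 4.12 returns true and the test would be skipped.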

Comment by Gerrit Updater [ 09/Sep/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39838
Subject: LU-12661 tests: skip sanity 817 for kernel 4.12+
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 16b41461822854124ac074aebe394602909644cc

Comment by Dongyang Li [ 09/Sep/20 ]

Andreas, the patch only skips kernel 4.14 and later because we were only seeing failures on RHEL8. I suspect we might need to skip kernels >= 4.10; we will see.

Comment by Gerrit Updater [ 10/Sep/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39838/
Subject: LU-12661 tests: skip sanity 817 for kernel 4.12+
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3e2c28437404b0ccbd7bbfb8f77788678975b63d

Comment by Gerrit Updater [ 10/Sep/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39863
Subject: LU-12661 tests: skip sanity 817 for kernel 4.12+
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 45c8a5cda0727c8ed4aa0e3d01f1edf22ea783d0

Comment by Alena Nikitenko [ 30/Nov/21 ]

+1 in 2.12.8 tests: https://testing.whamcloud.com/test_sets/f95832ef-44a3-4376-8e8e-7b2bb408d560

== sanity test 817: nfsd won't cache write lock for exec file ======================================== 04:32:28 (1637382748)
/usr/lib64/lustre/tests/sanity.sh: line 22011: /mnt/lustre/nfsexp/true: Text file busy
 sanity test_817: @@@@@@ FAIL: failed to execute 'true' command

Comment by Gerrit Updater [ 03/Dec/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/39863/
Subject: LU-12661 tests: skip sanity 817 for kernel 4.12+
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: dbcc2c55e1a23337d2b47ed3a9549784c47b4208

Comment by Gerrit Updater [ 23/Jan/23 ]

"jsimmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49732
Subject: LU-12661 llite: don't take open cach lock for files with exec bit set
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e4ebba6b93ac5eb621efe9f2c3fab6025a3b24a5

Generated at Sat Feb 10 02:54:33 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.