[LU-1639] Test failure parallel-scale-nfsv3, test_iorssf Created: 17/Jul/12  Updated: 24/Jul/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.3, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.1, Lustre 2.8.0, Lustre 2.10.4
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Maloo Assignee: Hongchao Zhang
Resolution: Unresolved Votes: 1
Labels: yuc2

Severity: 3
Rank (Obsolete): 5785

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/ef641dcc-cd20-11e1-957a-52540035b04c.

The sub-test test_iorssf failed with the following error:

test_iorssf failed with 1

Max Write: 19.36 MiB/sec (20.30 MB/sec)
Max Read:  15.61 MiB/sec (16.37 MB/sec)

Run finished: Fri Jul 13 11:11:40 2012
rm: cannot remove `/mnt/lustre/d0.ior.ssf': Directory not empty
 parallel-scale-nfsv3 test_iorssf: @@@@@@ FAIL: test_iorssf failed with 1 
Dumping lctl log to /logdir/test_logs/2012-07-12/lustre-master-el6-x86_64-ofa__713__-7f175835ae28/parallel-scale-nfsv3.test_iorssf.*.1342203102.log


 Comments   
Comment by Peter Jones [ 20/Jul/12 ]

Hongchao, could you please look into this one?

Comment by Hongchao Zhang [ 23/Jul/12 ]

Hi Sarah,
Is this issue reproducible? If so, could you please list the contents of "/mnt/lustre/d0.ior.ssf" in run_ior? Thanks!
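
A hypothetical way to capture that listing inside run_ior (a sketch only; the variable name and echo are illustrative, not part of the current script):

content=$(ls /mnt/lustre/d0.ior.ssf)    # list whatever survives the cleanup
echo "content: $content"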

Comment by Sarah Liu [ 26/Jul/12 ]

This can be reproduced on the external OFED build, which I tested manually. The content of /mnt/lustre/d0.ior.ssf is iorData:

Run finished: Thu Jul 26 16:42:27 2012
++content: iorData
rm: cannot remove `/mnt/lustre/d0.ior.ssf': Directory not empty
parallel-scale-nfsv3 test_iorssf: @@@@@@ FAIL: test_iorssf failed with 1

Comment by Hongchao Zhang [ 27/Jul/12 ]

Hi Sarah,
Sorry, I logged in to your booked node to check the logs and re-ran the test; it passed the first time, but the second run failed, and client-4 seems to be stuck!

I checked the logs on client-4 (client 1) and client-3 (MDS 1), and -ENOTEMPTY (-39) is not found in either.
Which build did you use for testing? I plan to test it myself. Thanks!

Comment by Sarah Liu [ 27/Jul/12 ]

Hi Hongchao,

Here is the build I used for testing: lustre-master #733 RHEL6 OFA, for both server and client. You may have to manually load the mlx4_ib module to make IB work.

http://build.whamcloud.com/job/lustre-master/733/arch=x86_64,build_type=server,distro=el6,ib_stack=ofa/
http://build.whamcloud.com/job/lustre-master/733/arch=x86_64,build_type=client,distro=el6,ib_stack=ofa/
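
For reference, a minimal sketch of that manual step (assuming the mlx4 driver stack mentioned above; run on each IB node before mounting):

modprobe mlx4_ib       # load the mlx4 InfiniBand module by hand
lsmod | grep mlx4      # confirm mlx4_ib/mlx4_core are loaded
lctl list_nids         # an o2ib NID should appear once LNet is up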

Comment by Hongchao Zhang [ 05/Aug/12 ]

I have tested it several times without hitting this issue, but over TCP, because there was a problem setting up IB (even though mlx4_ib was loaded).

Hi Sarah, can the issue only be reproduced under IB?

Comment by Sarah Liu [ 06/Aug/12 ]

Hongchao,

Yes, I only see this error with IB.

Comment by Hongchao Zhang [ 09/Aug/12 ]

Status update:

There is some difficulty reproducing the issue, and I have requested help with it.
Based on the information collected so far, this bug could be related to NFS, since no error is found in the Lustre logs.

Comment by Sarah Liu [ 26/Sep/12 ]

Hit this issue again in interop testing between a 2.1.3 client and a 2.3-RC1 server:
https://maloo.whamcloud.com/test_sets/dc49ce3a-079f-11e2-b8a8-52540035b04c

Comment by Jian Yu [ 18/Dec/12 ]

Lustre Client: v2_1_4_RC1
Lustre Server: 2.1.3
Distro/Arch: RHEL6.3/x86_64
Network: IB (in-kernel OFED)

https://maloo.whamcloud.com/test_sets/36e3e7b6-487f-11e2-8cdc-52540035b04c

Comment by Jian Yu [ 27/Mar/13 ]

Lustre Client: v2_1_5_RC1
Lustre Server: 2.2.0
Distro/Arch: RHEL6.3/x86_64
Network: IB (in-kernel OFED)

The issue occurred again: https://maloo.whamcloud.com/test_sets/c059a4f8-96c8-11e2-9ec7-52540035b04c

Comment by Sarah Liu [ 05/Dec/13 ]

Hit this issue in current tag-2.5.52 testing with DNE enabled.

https://maloo.whamcloud.com/test_sets/a3b7701c-5d26-11e3-ad71-52540035b04c

client and server: lustre-master build #1791 RHEL6 ldiskfs

Test log:

Run finished: Wed Dec  4 07:08:49 2013
rm: cannot remove `/mnt/lustre/d0.ior.ssf': Directory not empty
 parallel-scale-nfsv3 test_iorssf: @@@@@@ FAIL: test_iorssf failed with 1 

Comment by Jian Yu [ 11/Dec/13 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/63/
Distro/Arch: RHEL6.4/x86_64
Network: TCP

parallel-scale-nfsv3 test iorssf hit this failure:
https://maloo.whamcloud.com/test_sets/9a2f2c16-6022-11e3-abbc-52540035b04c
https://maloo.whamcloud.com/test_sets/90200c6a-5f8a-11e3-85c5-52540035b04c

It passed in another test run on the same build:
https://maloo.whamcloud.com/test_sets/37bd14a2-5cdb-11e3-956b-52540035b04c

Comment by Jian Yu [ 13/Dec/13 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/67/
Distro/Arch: RHEL6.4/x86_64
Network: TCP

The same failure occurred again:
https://maloo.whamcloud.com/test_sets/7d5d7bb0-6289-11e3-bf4a-52540035b04c

Comment by Jian Yu [ 19/Dec/13 ]

Lustre client: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0)
Lustre server: http://build.whamcloud.com/job/lustre-b2_4/69/ (2.4.2 RC1)

The same failure occurred again:
https://maloo.whamcloud.com/test_sets/808f76b6-6861-11e3-a16f-52540035b04c

Comment by Jian Yu [ 23/Dec/13 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/70/ (2.4.2 RC2)
Distro/Arch: RHEL6.4/x86_64

The same failure occurred:
https://maloo.whamcloud.com/test_sets/726a1618-6a73-11e3-8e21-52540035b04c

Comment by Jian Yu [ 02/Jan/14 ]

Lustre client: http://build.whamcloud.com/job/lustre-b2_5/5/
Lustre server: http://build.whamcloud.com/job/lustre-b2_1/220/ (2.1.6)

The same failure occurred:
https://maloo.whamcloud.com/test_sets/cf401804-730d-11e3-9955-52540035b04c

Comment by Jian Yu [ 26/Jan/14 ]

Another instance on the Lustre b2_5 branch:
https://maloo.whamcloud.com/test_sets/08c7a914-8639-11e3-8155-52540035b04c

Comment by Hongchao Zhang [ 27/Jan/14 ]

This should be an issue related to NFS, which doesn't send deletion requests for the child dentries before deleting the parent dentry during an "rm -rf":

https://bugzilla.redhat.com/show_bug.cgi?id=770250
https://bugzilla.redhat.com/show_bug.cgi?id=814052

I will create a debug patch to verify whether this is the case.
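
As a minimal sketch of the suspected failure path (host names, export options, and the $NFS_SERVER variable are assumptions; the test drives this through an NFSv3 re-export of the Lustre mount):

# On the node that mounts Lustre and re-exports it over NFS:
exportfs -o rw,no_root_squash "*:/mnt/lustre"
# On an NFS client (mounted at the same path, as in the test logs):
mount -t nfs -o vers=3 $NFS_SERVER:/mnt/lustre /mnt/lustre
mkdir -p /mnt/lustre/d0.ior.ssf/iorData
rm -rf /mnt/lustre/d0.ior.ssf   # if READDIR never hands "iorData" back to rm,
                                # the child is not unlinked and the final rmdir
                                # fails with "Directory not empty" (ENOTEMPTY)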

Comment by Hongchao Zhang [ 27/Jan/14 ]

The patch is tracked at http://review.whamcloud.com/#/c/9009/

Hi YuJian, could you please test with the patch to reproduce the issue? Thanks!

Comment by Jian Yu [ 27/Jan/14 ]

Hi YuJian, could you please test with the patch to reproduce the issue? Thanks!

Hi Hongchao,

Please add the following test parameters to the commit message to reproduce the failure:

Test-Parameters: fortestonly alwaysuploadlogs \
envdefinitions=SLOW=yes,ENABLE_QUOTA=yes \
testlist=parallel-scale-nfsv3,parallel-scale-nfsv3

The test name can be specified multiple times.

Comment by Hongchao Zhang [ 30/Jan/14 ]

From the output, this issue can be verified to be an NFS issue:

execve("/bin/rm", ["rm", "-rf", "/mnt/lustre/d0.ior.ssf"],
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbb6796b000
...
fcntl(4, F_GETFD)                       = 0
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
getdents(3, {{d_ino=288230393331613511, d_off=1, d_reclen=24, d_name="."} {d_ino=144115188193296385, d_off=2, d_reclen=24, d_name=".."} {d_ino=288230393331613513, d_off=3, d_reclen=32, d_name="iorData"}}, 262144) = 80
getdents(3, {{d_ino=288230393331613513, d_off=3, d_reclen=32, d_name="iorData"}}, 262144) = 32
getdents(3, 0x7fbb6791e038, 262144)     = -1 ELOOP (Too many levels of symbolic links)

ELOOP is encountered while reading the directory entries.

https://maloo.whamcloud.com/test_logs/f6ddc006-88d3-11e3-b1c0-52540035b04c
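
For anyone re-running this, a trace like the one above can be captured along these lines (a sketch; the syscall filter is an assumption about what is relevant):

strace -f -e trace=getdents,unlink,unlinkat,rmdir \
    rm -rf /mnt/lustre/d0.ior.ssf 2>&1 | tee /tmp/rm.strace
# In a healthy run, getdents returns "iorData", rm unlinks it, and the final
# rmdir succeeds; here the third getdents fails with ELOOP, rm aborts its scan,
# and removing the parent fails with ENOTEMPTY.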

Comment by Jian Yu [ 07/Mar/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/39/ (2.5.1 RC1)
Distro/Arch: RHEL6.5/x86_64
MDSCOUNT=2

The same failure occurred:
https://maloo.whamcloud.com/test_sets/fd7da6b0-a557-11e3-a61d-52540035b04c

Comment by Jian Yu [ 05/Jun/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/61/
Distro/Arch: RHEL6.5/x86_64

The same failure occurred:
https://maloo.whamcloud.com/test_sets/0ac3bd28-eafe-11e3-966a-52540035b04c

Comment by Sarah Liu [ 18/Jul/14 ]

Hit this error in lustre-b2_6-RC2 testing.
Server and client: RHEL6 ldiskfs

https://testing.hpdd.intel.com/test_sets/dd820620-0dc7-11e4-af8b-5254006e85c2

Comment by Sarah Liu [ 08/Jul/15 ]

Similar failure:

https://testing.hpdd.intel.com/test_sets/dd48bbae-255e-11e5-a713-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 19/Jan/16 ]

Another instance found for interop: 2.5.5 Server/EL6.7 Client
Server: 2.5.5, b2_5_fe/62
Client: master, build# 3303, RHEL 6.7
https://testing.hpdd.intel.com/test_sets/16c36b0c-bb25-11e5-861c-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 03/Feb/16 ]

Encountered the same issue for tag 2.7.66 for FULL - EL7.1 Server/EL6.7 Client, master, build# 3314.
https://testing.hpdd.intel.com/test_sets/91eaee06-ca91-11e5-9609-5254006e85c2

Another failure for master: tag 2.7.66 FULL - EL7.1 Server/SLES11 SP3 Client, build# 3314
https://testing.hpdd.intel.com/test_sets/b154fcf2-ca7b-11e5-9609-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 11/May/18 ]

2.10.3_132 <-> EE3

https://testing.hpdd.intel.com/test_sets/ca38c5f8-509f-11e8-abc3-52540065bddc
