[LU-1639] Test failure parallel-scale-nfsv3, test_iorssf Created: 17/Jul/12 Updated: 24/Jul/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.3, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.1, Lustre 2.8.0, Lustre 2.10.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Hongchao Zhang |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | yuc2 | ||
| Severity: | 3 |
| Rank (Obsolete): | 5785 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>.
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/ef641dcc-cd20-11e1-957a-52540035b04c.
The sub-test test_iorssf failed with the following error:
Max Write: 19.36 MiB/sec (20.30 MB/sec)
Max Read: 15.61 MiB/sec (16.37 MB/sec)
Run finished: Fri Jul 13 11:11:40 2012
rm: cannot remove `/mnt/lustre/d0.ior.ssf': Directory not empty
parallel-scale-nfsv3 test_iorssf: @@@@@@ FAIL: test_iorssf failed with 1
Dumping lctl log to /logdir/test_logs/2012-07-12/lustre-master-el6-x86_64-ofa__713__-7f175835ae28/parallel-scale-nfsv3.test_iorssf.*.1342203102.log |
| Comments |
| Comment by Peter Jones [ 20/Jul/12 ] |
|
Hongchao, could you please look into this one? |
| Comment by Hongchao Zhang [ 23/Jul/12 ] |
|
Hi Sarah, |
| Comment by Sarah Liu [ 26/Jul/12 ] |
|
This can be reproduced on the external OFED build, which I tested manually. The content of /mnt/lustre/d0.ior.ssf is iorData.
Run finished: Thu Jul 26 16:42:27 2012 |
| Comment by Hongchao Zhang [ 27/Jul/12 ] |
|
Hi Sarah, I checked the logs on client-4 (client 1) and client-3 (MDS 1), and -ENOTEMPTY (-39) was not found. |
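For reference, the "Directory not empty" message printed by rm corresponds to ENOTEMPTY, which is 39 on Linux and is the code searched for in the logs above. The following is a minimal sketch on a local temporary directory, not on Lustre or NFS; the helper names `rmdir_errno` and `demo` are hypothetical and only illustrate when that errno is returned.

```python
import errno
import os
import tempfile


def rmdir_errno(path):
    """Try to rmdir `path`; return the errno on failure, or 0 on success."""
    try:
        os.rmdir(path)
    except OSError as exc:
        return exc.errno
    return 0


def demo():
    """Return the rmdir result before and after the only child is removed."""
    with tempfile.TemporaryDirectory() as top:
        # Local stand-in for /mnt/lustre/d0.ior.ssf holding one file.
        parent = os.path.join(top, "d0.ior.ssf")
        os.mkdir(parent)
        child = os.path.join(parent, "iorData")
        with open(child, "w"):
            pass
        first = rmdir_errno(parent)   # child still present
        os.unlink(child)
        second = rmdir_errno(parent)  # directory is now empty
        return first, second


if __name__ == "__main__":
    first, second = demo()
    # errno.errorcode maps the number back to its symbolic name.
    print(errno.errorcode.get(first, first), second)
```

The first call fails because iorData is still present; once the child is unlinked, the same rmdir succeeds.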
| Comment by Sarah Liu [ 27/Jul/12 ] |
|
Hi Hongchao, Here is the build I used for testing: the lustre-master-#733-RHEL6-ofa build for both server and client. You may have to load the mlx4_ib module manually to make IB work. http://build.whamcloud.com/job/lustre-master/733/arch=x86_64,build_type=server,distro=el6,ib_stack=ofa/ |
| Comment by Hongchao Zhang [ 05/Aug/12 ] |
|
I have tested it several times without hitting this issue, but the tests used TCP because there was a problem setting up IB (even though mlx4_ib was loaded). Hi Sarah, can the issue only be reproduced under IB? |
| Comment by Sarah Liu [ 06/Aug/12 ] |
|
Hongchao, Yes, I only see this error on IB. |
| Comment by Hongchao Zhang [ 09/Aug/12 ] |
|
Status update: I am having trouble reproducing the issue and have requested help with it. |
| Comment by Sarah Liu [ 26/Sep/12 ] |
|
Hit this issue again in interop testing between 2.1.3 client and 2.3-RC1 server |
| Comment by Jian Yu [ 18/Dec/12 ] |
|
Lustre Client: v2_1_4_RC1 https://maloo.whamcloud.com/test_sets/36e3e7b6-487f-11e2-8cdc-52540035b04c |
| Comment by Jian Yu [ 27/Mar/13 ] |
|
Lustre Client: v2_1_5_RC1 The issue occurred again: https://maloo.whamcloud.com/test_sets/c059a4f8-96c8-11e2-9ec7-52540035b04c |
| Comment by Sarah Liu [ 05/Dec/13 ] |
|
Hit this issue in current tag-2.5.52 testing with DNE enabled. https://maloo.whamcloud.com/test_sets/a3b7701c-5d26-11e3-ad71-52540035b04c
client and server: lustre-master build #1791 RHEL6 ldiskfs
test log:
Run finished: Wed Dec 4 07:08:49 2013
rm: cannot remove `/mnt/lustre/d0.ior.ssf': Directory not empty
parallel-scale-nfsv3 test_iorssf: @@@@@@ FAIL: test_iorssf failed with 1 |
| Comment by Jian Yu [ 11/Dec/13 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/63/ parallel-scale-nfsv3 test iorssf hit this failure: It passed in another test run on the same build: |
| Comment by Jian Yu [ 13/Dec/13 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/67/ The same failure occurred again: |
| Comment by Jian Yu [ 19/Dec/13 ] |
|
Lustre client: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0) The same failure occurred again: |
| Comment by Jian Yu [ 23/Dec/13 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/70/ (2.4.2 RC2) The same failure occurred: |
| Comment by Jian Yu [ 02/Jan/14 ] |
|
Lustre client: http://build.whamcloud.com/job/lustre-b2_5/5/ The same failure occurred: |
| Comment by Jian Yu [ 26/Jan/14 ] |
|
More instances on the Lustre b2_5 branch: |
| Comment by Hongchao Zhang [ 27/Jan/14 ] |
|
This should be an issue related to NFS, which doesn't send deletion requests for the child dentries before deleting the parent dentry during the "rm -rf" command: https://bugzilla.redhat.com/show_bug.cgi?id=770250. I will create a debug patch to verify whether this is the case. |
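To make the expected ordering concrete: a recursive delete has to remove every child before it removes the parent, otherwise the final rmdir fails with "Directory not empty". The following is a minimal sketch of that ordering, assuming a plain directory tree without symlinks; `remove_tree` is a hypothetical helper for illustration and is not how rm itself is implemented.

```python
import os
import tempfile


def remove_tree(top):
    """Delete `top` bottom-up and return the paths in the order removed."""
    removed = []
    # topdown=False yields the deepest directories first, so every
    # directory is already empty by the time it is removed.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.unlink(path)
            removed.append(path)
        for name in dirnames:
            path = os.path.join(dirpath, name)
            os.rmdir(path)
            removed.append(path)
    os.rmdir(top)
    removed.append(top)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # Local stand-in for /mnt/lustre/d0.ior.ssf with its one file.
        root = os.path.join(scratch, "d0.ior.ssf")
        os.mkdir(root)
        with open(os.path.join(root, "iorData"), "w"):
            pass
        for path in remove_tree(root):
            print(path)
```

If the directory listing step misses an entry such as iorData, the child is never unlinked and the last `os.rmdir(top)` is the call that fails.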
| Comment by Hongchao Zhang [ 27/Jan/14 ] |
|
The patch is tracked at http://review.whamcloud.com/#/c/9009/ Hi YuJian, could you please test with the patch to reproduce the issue? Thanks! |
| Comment by Jian Yu [ 27/Jan/14 ] |
Hi Hongchao, Please add the following test parameters to the commit message to reproduce the failure:
Test-Parameters: fortestonly allwaysuploadlogs \
envdefinitions=SLOW=yes,ENABLE_QUOTA=yes \
testlist=parallel-scale-nfsv3,parallel-scale-nfsv3
The test name can be specified multiple times. |
| Comment by Hongchao Zhang [ 30/Jan/14 ] |
|
From the output, this issue can be confirmed to be the NFS issue:
execve("/bin/rm", ["rm", "-rf", "/mnt/lustre/d0.ior.ssf"],
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbb6796b000
...
fcntl(4, F_GETFD) = 0
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
getdents(3, {{d_ino=288230393331613511, d_off=1, d_reclen=24, d_name="."} {d_ino=144115188193296385, d_off=2, d_reclen=24, d_name=".."} {d_ino=288230393331613513, d_off=3, d_reclen=32, d_name="iorData"}}, 262144) = 80
getdents(3, {{d_ino=288230393331613513, d_off=3, d_reclen=32, d_name="iorData"}}, 262144) = 32
getdents(3, 0x7fbb6791e038, 262144) = -1 ELOOP (Too many levels of symbolic links)
ELOOP is encountered while reading the subdirectories.
https://maloo.whamcloud.com/test_logs/f6ddc006-88d3-11e3-b1c0-52540035b04c |
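In the trace above, the second getdents call returns the iorData entry again with the same d_off=3 that the first call already returned, and the third call fails with ELOOP. The following is a minimal sketch of how such a repeat can be spotted in a captured trace. The `find_readdir_loop` helper and the hand-transcribed `batches` list are hypothetical, and the assumption that the ELOOP comes from the client noticing a repeated directory offset is not confirmed by this ticket.

```python
def find_readdir_loop(batches):
    """Return the first (d_name, d_off) pair seen twice, or None."""
    seen = set()
    for batch in batches:
        for d_name, d_off in batch:
            key = (d_name, d_off)
            if key in seen:
                return key
            seen.add(key)
    return None


# Entries transcribed by hand from the two successful getdents calls,
# reduced to (d_name, d_off).
batches = [
    [(".", 1), ("..", 2), ("iorData", 3)],
    [("iorData", 3)],
]

if __name__ == "__main__":
    print(find_readdir_loop(batches))
```

A listing without a loop would never yield the same name and offset twice, so the helper returns None for it.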
| Comment by Jian Yu [ 07/Mar/14 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/39/ (2.5.1 RC1) The same failure occurred: |
| Comment by Jian Yu [ 05/Jun/14 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/61/ The same failure occurred: |
| Comment by Sarah Liu [ 18/Jul/14 ] |
|
Hit this error in lustre-b2_6-RC2 testing https://testing.hpdd.intel.com/test_sets/dd820620-0dc7-11e4-af8b-5254006e85c2 |
| Comment by Sarah Liu [ 08/Jul/15 ] |
|
similar failure: https://testing.hpdd.intel.com/test_sets/dd48bbae-255e-11e5-a713-5254006e85c2 |
| Comment by Saurabh Tandan (Inactive) [ 19/Jan/16 ] |
|
Another instance found for interop : 2.5.5 Server/EL6.7 Client |
| Comment by Saurabh Tandan (Inactive) [ 03/Feb/16 ] |
|
Encountered the same issue for tag 2.7.66, FULL - EL7.1 Server/EL6.7 Client, master, build# 3314.
Another failure for master: tag 2.7.66, FULL - EL7.1 Server/SLES11 SP3 Client, build# 3314 |
| Comment by Saurabh Tandan (Inactive) [ 10/Feb/16 ] |
|
Another instance found for Full tag 2.7.66 - EL7.1 Server/EL6.7 Client, build# 3314
Another instance found for Full tag 2.7.66 - EL7.1 Server/SLES11 SP3 Client, build# 3314 |
| Comment by Saurabh Tandan (Inactive) [ 11/May/18 ] |
|
2.10.3_132 <-> EE3 https://testing.hpdd.intel.com/test_sets/ca38c5f8-509f-11e8-abc3-52540065bddc |