[LU-1191] Test failure on test suite parallel-scale-nfsv3, subtest test_metabench. Created: 05/Mar/12  Updated: 24/May/12  Resolved: 24/May/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 1.8.8
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Maloo Assignee: Zhenyu Xu
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-1100 metabench failed on NFSv3/v4 over Lustre Resolved
Severity: 3
Rank (Obsolete): 4385

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/6a3c567a-6702-11e1-a9a4-5254004bbbd3.

The sub-test test_metabench failed with the following error:

metabench failed! 1

Info required for matching: parallel-scale-nfsv3 metabench



 Comments   
Comment by Peter Jones [ 09/Mar/12 ]

Bobi

Could you look into this one please?

Thanks

Peter

Comment by Zhenyu Xu [ 13/Mar/12 ]

Tried running it manually on a toro node 8 times (client-6 as MDS, client-7 as OST, client-8 as client, using the lustre_b2_2 build #7 image); did not hit it.

client-8 console
Lustre: DEBUG MARKER: -----============= acceptance-small: parallel-scale-nfsv3 ============----- Mon Mar 12 23:25:44 PDT 2012
LustreError: 9854:0:(ldlm_request.c:1170:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 9854:0:(ldlm_request.c:1796:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
LustreError: 9854:0:(ldlm_request.c:1170:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 9854:0:(ldlm_request.c:1796:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: client ffff8103150efc00 umount complete
Lustre: DEBUG MARKER: only running test metabench
Lustre: DEBUG MARKER: excepting tests:
Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test metabench: metabench ==================================================== 23:25:52 (1331619952)
LustreError: 152-6: Ignoring deprecated mount option 'acl'.
Lustre: MGC10.10.4.6@tcp: Reactivating import
Lustre: Client lustre-client has started
Lustre: 11119:0:(debug.c:326:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
Lustre: 11119:0:(debug.c:326:libcfs_debug_str2mask()) Skipped 3 previous similar messages
Lustre: DEBUG MARKER: == parallel-scale-nfsv3 parallel-scale-nfsv3.sh test complete, duration 1599 sec ===================== 23:52:23 (1331621543)
parallel-scale-nfsv3 returned 0
auster result report
[root@client-8 tests]# cat /tmp/test_logs/2012-03-12/232511/results.yml
TestGroup:
    test_group: acc-sm-client-8
    testhost: client-8
    submission: Mon Mar 12 23:25:44 PDT 2012
    user_name: root

Tests:
-
        name: parallel-scale-nfsv3
        description: auster parallel-scale-nfsv3
        submission: Mon Mar 12 23:25:44 PDT 2012
        report_version: 2
        SubTests:
        -
            name: test_metabench
            status: PASS
            duration: 1589
            return_code: 0
            error:
        duration: 1599
        status: PASS
Comment by Peter Jones [ 14/Mar/12 ]

As per Oleg/Andreas, this is a test script issue. It should be fixed but is not a blocker for RC1

Comment by Oleg Drokin [ 14/Mar/12 ]

This seems to be a bug in parallel-scale-nfsv3.sh

in nfsv4 script we see:

zconf_mount_clients $lustre_client $MOUNT \
    "-o user_xattr,flock,32bitapi" || \
    error "mount lustre on $lustre_client failed"

but the nfsv3 one does not have it:

zconf_mount_clients $lustre_client $MOUNT || \
    error "mount lustre on $lustre_client failed"
Comment by Minh Diep [ 14/Mar/12 ]

I thought we didn't need 32bitapi for NFSv3.

Comment by Andreas Dilger [ 14/Mar/12 ]

Actually, I think the opposite. For 64-bit Lustre and NFS clients, it should be expected that 64-bit (well, 63-bit) directory cookies work correctly, and 32bitapi is (AFAIK) only needed if a Lustre 64-bit client is re-exporting to 32-bit NFS clients that cannot be detected by Lustre (e.g. 32-bit CPU or NFSv2), which is not the case here.

It doesn't make sense to me that exporting a 32-bit readdir cookie would avoid hash collisions; rather the opposite. The whole reason we added 64-bit hash cookies is to avoid the hash collisions that could cause readdir to get stuck.

I'd like to get Fan Yong's input on this issue, rather than burying it as a test issue. Our customers are not going to know about using 32bitapi, and I suspect if they did we would begin to see the same problems that we saw in the past before moving to 64-bit hashes.

The actual test failure is reported in https://maloo.whamcloud.com/test_sets/10f76e60-6d83-11e1-9174-5254004bbbd3:

Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test metabench: metabench == 14:19:31 (1331673571)
NFS: directory d0.metabench/TIME_CREATE_007.000 contains a readdir loop. Please contact your server vendor. Offending cookie: 279037783686854792

(Note: it is always good practice to put the real failure message into the bug to make searching easier, instead of only linking to the Maloo report, which is hard to search and may disappear at some point.)

On a related note, keeping two nearly identical scripts, parallel-scale-nfsv3.sh and parallel-scale-nfsv4.sh, that differ only in "setup_nfs 3" vs. "setup_nfs 4", is one reason details like the mount options can get skewed between the two scripts (still not saying the "32bitapi" mount option is correct, however). It would be less maintenance and less error-prone to have a single parallel-scale-nfsv4.sh script that defaults to running NFSv4, and a simple parallel-scale-nfsv3.sh wrapper that passes "3" as an argument and/or environment variable to make it run NFSv3:

NFSVER=${NFSVER:-4}
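A hedged sketch of what that could look like (the NFSVER handling is this suggestion, not existing test-framework behavior; setup_nfs arguments beyond the version are omitted):

# parallel-scale-nfsv4.sh reads the NFS version from the environment,
# defaulting to 4:
NFSVER=${NFSVER:-4}
setup_nfs "$NFSVER" || error "setup_nfs failed!"

# parallel-scale-nfsv3.sh then shrinks to a one-line wrapper:
NFSVER=3 exec bash "$(dirname "$0")/parallel-scale-nfsv4.sh" "$@"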
Comment by Andreas Dilger [ 14/Mar/12 ]

Note that parallel-scale-nfsv4.sh is also failing metabench for the same reason as nfsv3:

https://maloo.whamcloud.com/test_sets/9d1a0296-6657-11e1-92b1-5254004bbbd3

15:06:32:Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test metabench: metabench == 15:06:31 (1330902391)
15:09:46:NFS: directory d0.metabench/TIME_CREATE_007.000 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 81211680
15:09:50:NFS: directory d0.metabench/TIME_CREATE_007.000 contains a readdir loop.  Please contact your server vendor.  Offending cookie: 81211680

Proc 7: Cant remove directory [/mnt/lustre/d0.metabench/TIME_CREATE_007.000]: Directory not empty

I also notice that many of the "pass" and "fail" tests complete within tens of seconds, and show "0/0" tests being run, which doesn't seem right?

Comment by Minh Diep [ 20/Mar/12 ]

Using 32bitapi to mount and running NFSv3, we still hit this issue:

https://maloo.whamcloud.com/test_sets/51ae130e-72b1-11e1-91cb-5254004bbbd3

Comment by nasf (Inactive) [ 21/Mar/12 ]

15:06:32:Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test metabench: metabench == 15:06:31 (1330902391)
15:09:46:NFS: directory d0.metabench/TIME_CREATE_007.000 contains a readdir loop. Please contact your server vendor. Offending cookie: 81211680
15:09:50:NFS: directory d0.metabench/TIME_CREATE_007.000 contains a readdir loop. Please contact your server vendor. Offending cookie: 81211680

Proc 7: Cant remove directory [/mnt/lustre/d0.metabench/TIME_CREATE_007.000]: Directory not empty

It is strange that the offending cookie "81211680" is too small to be a value packed by the Lustre client's readdir(). In the current Lustre implementation the 64-bit cookie is composed of two parts: the high 32 bits hold the major hash and the low 32 bits hold the minor hash. Normally the major hash is non-zero, so a valid 64-bit cookie should be larger than the offending one. It looks like an invalid cookie. Was "32bitapi" used in this case?
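A quick arithmetic check of that layout (plain shell illustration, not Lustre code) shows why such a small cookie is suspicious:

# Split the offending cookie into the major/minor hash halves described
# above; any value below 2^32 means the major hash is zero.
cookie=81211680
echo "major hash: $(( cookie >> 32 ))"          # prints 0
echo "minor hash: $(( cookie & 0xffffffff ))"   # prints 81211680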

Comment by nasf (Inactive) [ 21/Mar/12 ]

Minh, is there any test failure instance with an RHEL5-based client (which lacks the NFS readdir() loop detection mechanism)?

Comment by nasf (Inactive) [ 22/Mar/12 ]

I have tested metabench against an ext4-based (RHEL6 2.6.32-220.el6) NFS export and got the same error:

==============================
[root@RHEL6-nasf-CSW tests]# metabench -w /mnt/nfs/d0.metabench -c 30400 -C -S -k
Metadata Test <no-name> on 03/19/2012 at 00:07:11

Rank 0 process on node RHEL6-nasf-CSW

[03/19/2012 00:07:11] Entering time_file_creation with proc_id = 0
File Creation Rates from Process 0
totfile    tot time   create rate   interval   intv time     intv rate
=======  ==========  ============   ========  ==========  ============
   1000      0.2808       3561.58       1000      0.2808       3561.58
   2000      0.5887       3397.54       1000      0.3079       3247.94
   3000      0.8639       3472.66       1000      0.2752       3633.32
   4000      1.1539       3466.40       1000      0.2900       3447.75
   5000      1.4346       3485.38       1000      0.2806       3563.41
   6000      1.7106       3507.59       1000      0.2760       3623.03
   7000      2.0276       3452.30       1000      0.3171       3154.04
   8000      2.3266       3438.53       1000      0.2989       3345.13
   9000      2.6090       3449.60       1000      0.2824       3540.73
  10000      2.8833       3468.22       1000      0.2743       3645.30
  11000      3.1637       3476.94       1000      0.2804       3566.64
  12000      3.4533       3474.91       1000      0.2896       3452.80
  13000      3.7503       3466.38       1000      0.2970       3367.09
  14000      3.9503       3544.01       1000      0.2000       4999.70
  15000      4.1063       3652.88       1000      0.1560       6409.39
  16000      4.2914       3728.42       1000      0.1850       5405.00
  17000      4.4465       3823.19       1000      0.1552       6444.05
  18000      4.5869       3924.23       1000      0.1403       7125.35
  19000      4.7420       4006.71       1000      0.1552       6444.92
  20000      4.8917       4088.56       1000      0.1497       6682.12
  21000      5.0513       4157.37       1000      0.1596       6266.88
  22000      5.2084       4223.94       1000      0.1571       6364.04
  23000      5.3560       4294.27       1000      0.1476       6776.45
  24000      5.5587       4317.52       1000      0.2028       4931.57
  25000      5.7547       4344.29       1000      0.1959       5103.73
  26000      6.0578       4291.99       1000      0.3031       3299.12
  27000      6.2432       4324.70       1000      0.1854       5393.25
  28000      6.4053       4371.38       1000      0.1621       6169.64
  29000      6.5954       4396.98       1000      0.1901       5259.34
  30000      6.7850       4421.54       1000      0.1895       5276.35
  30400      6.8420       4443.11        400      0.0571       7006.36

[03/19/2012 00:07:18] Leaving time_file_creation with proc_id = 0
Removed 1003 files in 0.171 seconds
[03/19/2012 00:07:18] FATAL error on process 0
Proc 0: Cant remove directory [/mnt/nfs/d0.metabench/TIME_CREATE_000.000]: Directory not empty
==============================

The dmesg output:
==============================
EXT4-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended
EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts:
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
NFS: directory d0.metabench/TIME_CREATE_000.000 contains a readdir loop. Please contact your server vendor. Offending cookie: 68723253
==============================

The configuration:
==============================
[root@RHEL6-nasf-CSW tests]# mount
/dev/sda1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
172.16.114.1:/Users/nasf/Work on /Work type nfs (rw,addr=172.16.114.1)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/loop3 on /mnt/ISO type iso9660 (rw)
/dev/loop0 on /mnt/lustre type ext4 (rw)
rhel6:/mnt/lustre on /mnt/nfs type nfs (rw,addr=172.16.114.111)
==============================

So it seems this is not a Lustre bug, but an NFS-related issue.

Comment by Zhenyu Xu [ 22/Mar/12 ]

It looks like a kernel bug that has been fixed in the 3.1 kernel; bugzilla link: https://bugzilla.kernel.org/show_bug.cgi?id=38572

Comment by Zhenyu Xu [ 22/Mar/12 ]

Patch is tracked at http://review.whamcloud.com/2369

Comment by Oleg Drokin [ 22/Mar/12 ]

Good catch.
Unfortunately it does not help us at all in our testing as we use unpatched kernels on clients.

Additionally, in my testing this helps only with the 32bitapi Lustre mount option; without that option, readdir just plain returns nothing for me. This happens with both NFSv4 and NFSv3, so I imagine we need to adjust the release notes to mention that 32bitapi is mandatory for NFS re-export at this time. Once we get to the root of it we should be able to allow mounting without it for some cases of NFS re-export.
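For reference, mounting the re-exporting Lustre client with that option would look like this (the MGS NID and fsname here are placeholders):

# mount the client that will re-export over NFS with 32bitapi
mount -t lustre -o user_xattr,flock,32bitapi mgs@tcp:/lustre /mnt/lustre
# then export /mnt/lustre via /etc/exports as usual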

Comment by Peter Jones [ 22/Mar/12 ]

As per discussion we will document this kernel fix in the release notes so dropping as a blocker

Comment by nasf (Inactive) [ 22/Mar/12 ]

Most customers want to use a patchless client, but there are known NFS-related bugs in the kernel, so if they want to re-export Lustre through NFS, I think it is reasonable to require them to patch the kernel before re-exporting.

Comment by Andreas Dilger [ 22/Mar/12 ]

Fan Yong,
I could agree that users should use a Lustre-patched kernel for the NFS client, if the server kernel was patched appropriately.

What I'm still wondering about is the problem when not using 32bitapi. Why is that not working properly?

Comment by Zhenyu Xu [ 23/Mar/12 ]

Found out it's another kernel issue, one that Fan Yong contributed a fix for (commit 832023bffb4b493f230be901f681020caf3ed1f8).

commit message

nfsd4: Remove check for a 32-bit cookie in nfsd4_readdir()

Fan Yong <yong.fan@whamcloud.com> noticed setting
FMODE_32bithash wouldn't work with nfsd v4, as
nfsd4_readdir() checks for 32 bit cookies. However, according to RFC 3530
cookies have a 64 bit type and cookies are also defined as u64 in
'struct nfsd4_readdir'. So remove the test for >32-bit values.
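To make the failure concrete: before that fix, any readdir cookie wider than 32 bits was rejected, and Lustre's 64-bit hash cookies routinely are. A quick shell check (illustration only, not code from the patch) using the offending cookie from the earlier log:

# A 64-bit Lustre hash cookie exceeds the 32-bit bound the pre-fix
# nfsd4_readdir() enforced.
cookie=279037783686854792
max32=$(( (1 << 32) - 1 ))
[ "$cookie" -gt "$max32" ] && echo "needs >32 bits: rejected before the fix"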

Comment by Zhenyu Xu [ 23/Mar/12 ]

Without http://review.whamcloud.com/2374, metabench fails at the NFS server (v4, mounted without 32bitapi):

# metabench -w /mnt/lustre2/ -c 10000 -C -S -k
Metadata Test <no-name> on 03/23/2012 at 10:34:15

Rank   0 process on node test3

[03/23/2012 10:34:15] Entering time_file_creation with proc_id = 0
File Creation Rates from Process 0
totfile    tot time   create rate   interval   intv time     intv rate
=======  ==========  ============   ========  ==========  ============
   1000      5.3044        188.52       1000      5.3044        188.52
   2000     10.5516        189.54       1000      5.2471        190.58
   3000     15.9621        187.95       1000      5.4105        184.83
   4000     21.6978        184.35       1000      5.7357        174.35
   5000     27.0131        185.10       1000      5.3153        188.13
   6000     32.5850        184.13       1000      5.5719        179.47
   7000     38.8467        180.20       1000      6.2617        159.70
   8000     44.1792        181.08       1000      5.3325        187.53
   9000     49.5300        181.71       1000      5.3508        186.89
  10000     55.1659        181.27       1000      5.6360        177.43

[03/23/2012 10:35:10] Leaving time_file_creation with proc_id = 0
Removed 0 files in      0.020 seconds
[03/23/2012 10:35:10] FATAL error on process 0
Proc 0: Cant remove directory [/mnt/lustre2//TIME_CREATE_000.001]: Directory not empty

With the patch, the same test passed.

Comment by nasf (Inactive) [ 24/Mar/12 ]

I think the two kernel patches from Bobijam explain the failure cases we hit. What do you think? Although those kernel issues have been fixed in newer kernels, as a temporary solution we could add entries to the ChangeLog telling users how to resolve the issues on current kernels.

Comment by Andreas Dilger [ 24/Mar/12 ]

I think it would be best to patch the server kernel with the needed fixes, and if someone is doing NFS export they need to use the Lustre server kernel to do this until such a time that this problem is fixed in the vendor kernel.

We should also file bugs with RHEL and SLES asking that these fixes be back-ported to their distro kernels, so that the patches can hopefully be dropped quickly.

Comment by Jian Yu [ 11/May/12 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/194/
Distro/Arch: RHEL5.8/x86_64(server), RHEL6.2/x86_64(client, kernel 2.6.32-220.13.1.el6)
Network: TCP (1GigE)
ENABLE_QUOTA=yes

The same issue occurred: https://maloo.whamcloud.com/test_sets/561adeca-9b3f-11e1-a0a0-52540035b04c

Comment by Peter Jones [ 24/May/12 ]

This should be addressed by the kernel update tracked under LU-1424 and landed for 1.8.8, 2.1.2 and 2.3
