[LU-4428] performance-sanity test_5: mknod(f59999) error: Disk quota exceeded Created: 03/Jan/14  Updated: 11/Apr/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.1, Lustre 2.4.3, Lustre 2.5.3, Lustre 2.10.2, Lustre 2.10.4, Lustre 2.10.5, Lustre 2.10.7, Lustre 2.12.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Jian Yu Assignee: Nathaniel Clark
Resolution: Unresolved Votes: 0
Labels: zfs
Environment:

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/5/
FSTYPE=zfs

OSTCOUNT=2
OSTSIZE=8388608


Issue Links:
Duplicate
is duplicated by LU-5146 Test failure performance-sanity test_... Resolved
Severity: 3
Rank (Obsolete): 12168

 Description   

performance-sanity test_5 failed as follows:

Total disk size: 16257024  block-softlimit: 16258048 block-hardlimit: 17070950 inode-softlimit: 55785 inode-hardlimit: 58574
<~snip~>
===== mdsrate-lookup-1dir.sh Test preparation: creating 63715 files.
+ /usr/lib64/lustre/tests/mdsrate --mknod --dir /mnt/lustre/mdsrate/lookup --nfiles 63715 --filefmt 'f%%d'
+ chmod 0777 /mnt/lustre
drwxrwxrwx 9 root root 99840 Apr 14  2107 /mnt/lustre
+ su mpiuser sh -c "/usr/lib64/openmpi/bin/mpirun -mca boot ssh -machinefile /tmp/mdsrate-lookup-1dir.machines -np 2 /usr/lib64/lustre/tests/mdsrate --mknod --dir /mnt/lustre/mdsrate/lookup --nfiles 63715 --filefmt 'f%%d' "
<~snip~>
rank 1: mknod(f59999) error: Disk quota exceeded
rank 0: mknod(f58728) error: Disk quota exceeded

Maloo report: https://maloo.whamcloud.com/test_sets/fc281616-73c4-11e3-b4ff-52540035b04c
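
Note that the failure is already implied by the numbers in the log above: the inode hard limit (58574) is smaller than the 63715 files mdsrate is asked to create, so mknod() hits EDQUOT near file 58574 (consistent with the failures at f58728 and f59999). The limits also appear to follow a "+5%" rule; the arithmetic below is reconstructed from the logged values, not quoted from test-framework.sh:

disksz=16257024                         # "Total disk size" (KB) from the log
blk_soft=$((disksz + 1024))             # 16258048, matches block-softlimit
blk_hard=$((blk_soft + blk_soft / 20))  # 17070950, matches block-hardlimit (+5%)
i_soft=55785                            # inode-softlimit from the log
i_hard=$((i_soft + i_soft / 20))        # 58574, matches inode-hardlimit (+5%)
echo $((63715 - i_hard))                # 5141 creates cannot fit under the limit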



 Comments   
Comment by Jian Yu [ 12/Jan/14 ]

More instances on the Lustre b2_5 branch:
https://maloo.whamcloud.com/test_sets/6355c6e8-7a15-11e3-bce8-52540035b04c
https://maloo.whamcloud.com/test_sets/249915d8-7ec3-11e3-a0a8-52540035b04c
https://maloo.whamcloud.com/test_sets/8728f446-9ac7-11e3-96fb-52540035b04c

Comment by Jian Yu [ 09/Mar/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/39/ (2.5.1 RC1)
Distro/Arch: RHEL6.5/x86_64
FSTYPE=zfs

The same failure occurred:
https://maloo.whamcloud.com/test_sets/7a0a591c-a607-11e3-8a1b-52540035b04c

Comment by Jian Yu [ 17/Mar/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/73/ (2.4.3 RC1)
Distro/Arch: RHEL6.4/x86_64
FSTYPE=zfs

https://maloo.whamcloud.com/test_sets/e70e3c7c-ac60-11e3-81d7-52540035b04c

Comment by Jian Yu [ 24/Apr/14 ]

Lustre build: http://build.whamcloud.com/job/lustre-b2_5/47/
FSTYPE=zfs

The same failure occurred:
https://maloo.whamcloud.com/test_sets/99e3e576-cb70-11e3-95c9-52540035b04c

Comment by Peter Jones [ 24/Apr/14 ]

Nathaniel

Could you please look into this one?

Thanks

Peter

Comment by Jian Yu [ 05/Jun/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/61/
FSTYPE=zfs

The same failure occurred: https://maloo.whamcloud.com/test_sets/50a6f7f0-ebe5-11e3-82b2-52540035b04c

Comment by Niu Yawei (Inactive) [ 05/Jun/14 ]

inode-softlimit: 55785 inode-hardlimit: 58574

The hard limit was set to 58574.

/usr/lib64/lustre/tests/mdsrate --mknod --dir /mnt/lustre/mdsrate/lookup --nfiles 63715 --filefmt 'f%%d'

However, mdsrate-lookup-1dir.sh wants to create 63715 files.

test-framework.sh sets the hard limit based on the total inodes for the whole cluster (see setup_quota()), but mdsrate-lookup-1dir.sh creates files based on the free inodes on the MDT (see inodes_available()). I think the two calculations should be unified.
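
A minimal sketch of the unification suggested above, with hypothetical variable names (the real change would live in mdsrate-lookup-1dir.sh / test-framework.sh):

# Clamp the create count to the inode hard limit configured by setup_quota().
# inodes_available() is the existing test-framework.sh helper; i_hard is
# assumed here to be exported from (or recomputed with) the same "+5%"
# calculation that setup_quota() performs.
NUM_FILES=$(inodes_available)
if [ "$NUM_FILES" -gt "$i_hard" ]; then
    NUM_FILES=$((i_hard * 9 / 10))      # stay safely below the quota limit
fi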

Comment by Nathaniel Clark [ 11/Jun/14 ]

This may have something to do with how free inodes are calculated: I believe the count is the minimum of what's available across the OSTs and what's available on the MDT. If that's the case, this would be an issue with the test.
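
For reference, a paraphrase of the calculation described above (not a verbatim copy of the test-framework.sh helper): take the smallest per-target IFree value from 'lfs df -i', so one small target can cap the count even when the MDT itself has room.

# IFree is the 4th column of 'lfs df -i' output; the smallest value
# across all MDT/OST lines wins.
inodes_available() {
    lfs df -i $MOUNT | grep "^$FSNAME-" | awk '{ print $4 }' |
        sort -un | head -n1
}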

Comment by Jian Yu [ 31/Aug/14 ]

Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/86/ (2.5.3 RC1)
FSTYPE=zfs

The same failure occurred: https://testing.hpdd.intel.com/test_sets/48d589c2-311f-11e4-b503-5254006e85c2

Comment by Nathaniel Clark [ 18/Jan/18 ]

Testing on b2_10 (after 2.10.2) (302e4ec4bd923b79690d8c9e7004e0ae0c67be98)
https://testing.hpdd.intel.com/test_sets/e98a2540-e60b-11e7-8027-52540065bddc

MDS dmesg:

[29392.140663] Lustre: lustre-MDT0000: Connection restored to 10.9.6.120@tcp (at 10.9.6.120@tcp)
[29392.144060] Lustre: Skipped 1 previous similar message
[29413.881762] LustreError: 11-0: lustre-OST0005-osc-MDT0000: operation ost_connect to node 10.9.6.120@tcp failed: rc = -11
[29413.885108] LustreError: Skipped 1 previous similar message

Comment by Sarah Liu [ 20/May/18 ]

+1 on b2_10 https://testing.whamcloud.com/test_sets/be7048b2-5c17-11e8-b303-52540065bddc
