[LU-7584] sanity test_129: current dir size 24576, previous limit 24576 Created: 18/Dec/15  Updated: 16/May/17  Resolved: 05/Feb/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0, Lustre 2.10.0

Type: Bug
Priority: Critical
Reporter: Maloo
Assignee: Jian Yu
Resolution: Fixed
Votes: 0
Labels: None
Environment:

EL7.1 Server/EL7.1 Client - DNE
Master, build# 3270


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/3eac1b80-a26d-11e5-bdef-5254006e85c2.

The sub-test test_129 failed with the following error:

CMD: shadow-53vm3 test -e /sys/fs/ldiskfs/dm-3/max_dir_size
CMD: shadow-53vm3 echo 0 >/sys/fs/ldiskfs/dm-3/max_dir_size
return code 28 received as expected
current dir size 24576, previous limit 24576

Appears to be similar to LU-2479



Comments
Comment by Sarah Liu [ 21/Dec/15 ]

It looks similar to LU-4654.

Comment by Bob Glossman (Inactive) [ 11/Jan/16 ]

another on master:
https://testing.hpdd.intel.com/test_sets/57908c32-b69d-11e5-88ef-5254006e85c2

Comment by Sarah Liu [ 11/Jan/16 ]

https://testing.hpdd.intel.com/test_sets/173b63f6-b575-11e5-bf32-5254006e85c2

another instance, not sure if it is the same case
server: lustre-master build#2976 RHEL7
client 2.7.1

== sanity test 129: test directory size limit ========================== 12:59:04 (1452113944)
striped dir -i1 -c2 /mnt/lustre/d129.sanity
open(O_RDWR|O_CREAT): No space left on device
return code 28 received as expected
sanity test_129: @@@@@@ FAIL: current dir size 8192, previous limit 24576

Comment by James Nunez (Inactive) [ 13/Jan/16 ]

More failures on master:
2016-01-12 00:53:37 - https://testing.hpdd.intel.com/test_sets/a33af3f4-b8ee-11e5-825c-5254006e85c2
2016-01-12 04:15:31 - https://testing.hpdd.intel.com/test_sets/8b6097d2-b913-11e5-80e0-5254006e85c2

Comment by Bob Glossman (Inactive) [ 20/Jan/16 ]

more on master:
https://testing.hpdd.intel.com/test_sets/2ae7caee-bf0b-11e5-b113-5254006e85c2
https://testing.hpdd.intel.com/test_sets/81e6a2f2-bf33-11e5-9bdc-5254006e85c2

Comment by Sarah Liu [ 20/Jan/16 ]

more instances on master
client and server: lustre-master build#3305 RHEL7.1 DNE
https://testing.hpdd.intel.com/test_sets/0ef23214-bc15-11e5-8ede-5254006e85c2

Comment by Bob Glossman (Inactive) [ 21/Jan/16 ]

another instance on master, sles12sp1 client/server:
https://testing.hpdd.intel.com/test_sets/a415d5c2-c08d-11e5-a8e5-5254006e85c2

Comment by Bob Glossman (Inactive) [ 21/Jan/16 ]

I'm starting to think this problem may be a kernel version issue. many (all?) of the instances reported here are el7 or sles12.

Comment by Bob Glossman (Inactive) [ 21/Jan/16 ]

not seen in most recent test runs of el7.2 on master. continues to be seen a lot in recent test runs of sles12sp1 on master.

Comment by James A Simmons [ 21/Jan/16 ]

Yes I concur. I have been testing the upstream client and seeing this same error.

Comment by Bob Glossman (Inactive) [ 26/Jan/16 ]

another on master, with el7.2:
https://testing.hpdd.intel.com/test_sets/1ada50fe-c3e3-11e5-8866-5254006e85c2

I think this bug is becoming a serious blocker for el7 and sles12.

Comment by Bob Glossman (Inactive) [ 26/Jan/16 ]

again on master with el7.2:
https://testing.hpdd.intel.com/test_sets/bc6f6f14-c47c-11e5-a651-5254006e85c2

We seem to be hitting this 100% of the time with el7.2 on master.

Comment by Jian Yu [ 27/Jan/16 ]

As per http://review.whamcloud.com/15548, ext4-give-warning-with-dir-htree-growing.patch is also needed in the new ldiskfs-3.10-rhel7.2.series and ldiskfs-3.12-sles12.series. I'm creating the patch.

Comment by Gerrit Updater [ 27/Jan/16 ]

Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/18169
Subject: LU-7584 ldiskfs: add dir htree growing warning patch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7f7a1eb405f17c974fe0d2747c0b871629aaf25c

Comment by Jian Yu [ 27/Jan/16 ]

The above patch only adds warning messages as the directory size grows; it does not resolve the issue in this ticket. All of the failure instances occurred under a DNE configuration.

Di, do you think the following comparison in the current sanity test_129() is correct under a DNE configuration?

    I=$(stat -c%s "$DIR/$tdir")

    if [ $(lustre_version_code $SINGLEMDS) -lt \
         $(version_code 2.4.51) ]
    then
        [[ $I -eq $MAX ]] && return 0
    else
        [[ $I -gt $MAX ]] && return 0
    fi
    error_exit "current dir size $I, previous limit $MAX"

Comment by Bob Glossman (Inactive) [ 27/Jan/16 ]

http://review.whamcloud.com/#/c/17874 already adds the missing patch to the el7.2 patch series, but the failure still happens anyway.

Comment by Di Wang [ 27/Jan/16 ]

Hmm, I see that it only creates one more file and then checks whether the directory size has passed the limit, which does not seem right to me. The previous ENOSPC failure might have happened on a different stripe, i.e. if the new file is created in a different stripe, then this check will fail.

IMHO, you can probably just replace test_mkdir with mkdir and only do a single-MDS check (and of course also remove the STRIPE_COUNT parts). Since this test is for checking ldiskfs parameters, this probably makes sense.

Or

you need to create the new file on the specific stripe with lfs mkdir, instead of using multiop, but then you also need to find out which stripe is "FULL" here.
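
To illustrate the point about stripes, here is a toy model in bash. It is not Lustre code and is not taken from sanity.sh. The function name simulate, the 4096-byte growth step, and the 8192-byte starting size per stripe are assumptions made for illustration; only the 24576 limit and the -gt comparison come from this ticket. The sketch also assumes the check looks at the stripe that returned ENOSPC, since the ticket does not state how stat reports the size of a striped directory.

```shell
#!/bin/bash
# Toy model of a two-stripe directory; all names and sizes here are hypothetical.
BLOCK=4096    # assumed growth step per directory block
MAX=24576     # limit reported in the failing runs

# Fill stripe $1 up to MAX, lift the limit, create one more entry in stripe $2,
# then apply the "-gt" comparison to the stripe that was full.
simulate() {
    local full=$1 extra=$2
    local -a size=(8192 8192)   # assumed starting size of each stripe

    # Creates keep landing on stripe $full until growing it would exceed MAX,
    # which is where the real test sees ENOSPC (return code 28).
    while (( size[full] + BLOCK <= MAX )); do
        size[full]=$(( size[full] + BLOCK ))
    done

    # The limit is now lifted; the single extra create grows stripe $extra.
    size[extra]=$(( size[extra] + BLOCK ))

    echo "stripe0=${size[0]} stripe1=${size[1]}"

    # Succeeds only if the previously full stripe grew past the old limit.
    (( size[full] > MAX ))
}
```

In this model, `simulate 0 0` prints `stripe0=28672 stripe1=8192` and returns 0, while `simulate 0 1` prints `stripe0=24576 stripe1=12288` and returns 1. The second case mirrors the "current dir size 24576, previous limit 24576" failure: the extra file went to the other stripe, so the full one never grew. Restricting the test to a single MDS removes that second case.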

Comment by Jian Yu [ 27/Jan/16 ]

Thank you, Di. Let me create a patch.

Comment by Gerrit Updater [ 28/Jan/16 ]

Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/18192
Subject: LU-7584 tests: create file on single MDS in sanity test 129
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a3518d6fd8e2af08f43c41bb413edee19ee56f64

Comment by Bob Glossman (Inactive) [ 28/Jan/16 ]

another on master:
https://testing.hpdd.intel.com/test_sets/be6a1f10-c555-11e5-b0fc-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 04/Feb/16 ]

Another instance for FULL - EL7.1 Server/EL7.1 Client - DNE, master, build# 3314
https://testing.hpdd.intel.com/test_sets/6b7e1dae-cac5-11e5-9609-5254006e85c2

Comment by Gerrit Updater [ 05/Feb/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18192/
Subject: LU-7584 tests: create file on single MDS in sanity test 129
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1299e2aff9af57fc8a79a6fa09c1676a61cbfa4b

Comment by Joseph Gmitter (Inactive) [ 05/Feb/16 ]

Patch has landed for 2.8

Comment by Gerrit Updater [ 27/Apr/17 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/26874
Subject: LU-7584 tests: clean up sanity test_129
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 18a15a480d3e1138ac73a07db0a3418912ee6687

Comment by Gerrit Updater [ 16/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26874/
Subject: LU-7584 tests: clean up sanity test_129
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5611dbe56b95e79df0723e2668f977492abd68ed
