[LU-7584] sanity test_129: current dir size 24576, previous limit 24576 Created: 18/Dec/15 Updated: 16/May/17 Resolved: 05/Feb/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0, Lustre 2.10.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
EL7.1 Server/EL7.1 Client - DNE |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/3eac1b80-a26d-11e5-bdef-5254006e85c2.

The sub-test test_129 failed with the following error:

CMD: shadow-53vm3 test -e /sys/fs/ldiskfs/dm-3/max_dir_size
CMD: shadow-53vm3 echo 0 >/sys/fs/ldiskfs/dm-3/max_dir_size
return code 28 received as expected
current dir size 24576, previous limit 24576

Appears to be similar to |
| Comments |
| Comment by Sarah Liu [ 21/Dec/15 ] |
|
It looks similar to |
| Comment by Bob Glossman (Inactive) [ 11/Jan/16 ] |
|
another on master: |
| Comment by Sarah Liu [ 11/Jan/16 ] |
|
https://testing.hpdd.intel.com/test_sets/173b63f6-b575-11e5-bf32-5254006e85c2

Another instance, not sure if it is the same case:

== sanity test 129: test directory size limit ========================== 12:59:04 (1452113944)
striped dir -i1 -c2 /mnt/lustre/d129.sanity
open(O_RDWR|O_CREAT): No space left on device
return code 28 received as expected
sanity test_129: @@@@@@ FAIL: current dir size 8192, previous limit 24576 |
| Comment by James Nunez (Inactive) [ 13/Jan/16 ] |
|
More failures on master: |
| Comment by Bob Glossman (Inactive) [ 20/Jan/16 ] |
|
more on master: |
| Comment by Sarah Liu [ 20/Jan/16 ] |
|
more instances on master |
| Comment by Bob Glossman (Inactive) [ 21/Jan/16 ] |
|
another instance on master, sles12sp1 client/server: |
| Comment by Bob Glossman (Inactive) [ 21/Jan/16 ] |
|
I'm starting to think this problem may be a kernel version issue. Many (all?) of the instances reported here are el7 or sles12. |
| Comment by Bob Glossman (Inactive) [ 21/Jan/16 ] |
|
Not seen in the most recent test runs of el7.2 on master. It continues to be seen a lot in recent test runs of sles12sp1 on master. |
| Comment by James A Simmons [ 21/Jan/16 ] |
|
Yes I concur. I have been testing the upstream client and seeing this same error. |
| Comment by Bob Glossman (Inactive) [ 26/Jan/16 ] |
|
another on master, with el7.2: I think this bug is becoming a serious blocker for el7 and sles12. |
| Comment by Bob Glossman (Inactive) [ 26/Jan/16 ] |
|
again on master with el7.2: we seem to be hitting this 100% of the time with el7.2 on master |
| Comment by Jian Yu [ 27/Jan/16 ] |
|
As per http://review.whamcloud.com/15548, ext4-give-warning-with-dir-htree-growing.patch is also needed in the new ldiskfs-3.10-rhel7.2.series and ldiskfs-3.12-sles12.series. I'm creating the patch. |
| Comment by Gerrit Updater [ 27/Jan/16 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/18169 |
| Comment by Jian Yu [ 27/Jan/16 ] |
|
The above patch only adds warning messages as the directory size grows; it does not resolve the issue in this ticket. All of the failure instances occurred under DNE configuration.

Di, do you think the following comparison in the current sanity test_129() is correct under DNE configuration?

    I=$(stat -c%s "$DIR/$tdir")
    if [ $(lustre_version_code $SINGLEMDS) -lt \
         $(version_code 2.4.51) ]
    then
        [[ $I -eq $MAX ]] && return 0
    else
        [[ $I -gt $MAX ]] && return 0
    fi
    error_exit "current dir size $I, previous limit $MAX" |
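To make the reported failure easier to reason about, the comparison quoted above can be isolated into a standalone function. This is only a sketch: `dir_size_check` and its `old`/`new` argument are hypothetical stand-ins for the `lustre_version_code`/`version_code` test in the real script, and the numbers are the ones from the failure message in the description.

```shell
# Hypothetical helper, not part of sanity.sh: isolates the size comparison.
# $1 = current directory size in bytes
# $2 = previous max_dir_size limit in bytes
# $3 = "old" for servers older than 2.4.51, anything else for newer servers
dir_size_check() {
    if [ "$3" = "old" ]; then
        # Older servers: the size is expected to equal the limit.
        [ "$1" -eq "$2" ]
    else
        # Newer servers: the size must be strictly greater than the limit.
        [ "$1" -gt "$2" ]
    fi
}

# Values from the failure in the description: size equals the old limit,
# so the strict "greater than" branch rejects it.
if dir_size_check 24576 24576 new; then
    echo "pass"
else
    echo "current dir size 24576, previous limit 24576"
fi
```

With these inputs the newer-server branch fails because the directory did not grow past the old limit, which matches the error text reported by Maloo.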
| Comment by Bob Glossman (Inactive) [ 27/Jan/16 ] |
|
http://review.whamcloud.com/#/c/17874 already adds the missing patch to the el7.2 patch series, but the failure still happens anyway. |
| Comment by Di Wang [ 27/Jan/16 ] |
|
Hmm, I see that it only creates one more file and then checks whether the directory size passed the limit, which does not seem right to me. The previous ENOSPC failure might have happened on a different stripe, i.e. if the new file is created in a different stripe, then this check will fail. IMHO, you can probably just replace test_mkdir with mkdir and only do a single-MDS check (and of course also remove the STRIPE_COUNT logic); since this test is for checking ldiskfs parameters, this probably makes sense. Otherwise you need to create the new file on the specific stripe with lfs mkdir instead of using multiop, but then you also need to find out which stripe is "FULL" here. |
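The stripe argument above can be illustrated with a toy model. This is not Lustre code: it assumes a directory striped over two MDTs can be modeled as two independent per-stripe sizes, that the extra entry needs one new block on whichever stripe it lands on, and that the check only succeeds when the stripe that hit the limit grows. `BLOCK`, the starting sizes, and both helper names are hypothetical.

```shell
# Toy model only; all names and values here are illustrative assumptions.
BLOCK=4096
MAX=24576
stripe0=24576   # this stripe hit the limit and returned ENOSPC
stripe1=8192    # this stripe is still well below the limit

# Hypothetical helper: add one block to the stripe the new entry lands on.
grow_stripe() {
    case $1 in
        0) stripe0=$((stripe0 + BLOCK)) ;;
        1) stripe1=$((stripe1 + BLOCK)) ;;
    esac
}

# The check passes only if the previously full stripe grew past MAX.
full_stripe_grew() {
    [ "$stripe0" -gt "$MAX" ]
}

grow_stripe 1   # the extra file lands on the other stripe
if full_stripe_grew; then echo "check passes"; else echo "check fails"; fi

grow_stripe 0   # the extra file lands on the full stripe
if full_stripe_grew; then echo "check passes"; else echo "check fails"; fi
```

In this model the first check fails and the second passes, so a single extra create is only a reliable probe when there is one stripe, or when the new entry is placed on the stripe that was full.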
| Comment by Jian Yu [ 27/Jan/16 ] |
|
Thank you, Di. Let me create a patch. |
| Comment by Gerrit Updater [ 28/Jan/16 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/18192 |
| Comment by Bob Glossman (Inactive) [ 28/Jan/16 ] |
|
another on master: |
| Comment by Saurabh Tandan (Inactive) [ 04/Feb/16 ] |
|
Another instance for FULL - EL7.1 Server/EL7.1 Client - DNE, master, build# 3314 |
| Comment by Gerrit Updater [ 05/Feb/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18192/ |
| Comment by Joseph Gmitter (Inactive) [ 05/Feb/16 ] |
|
Patch has landed for 2.8 |
| Comment by Gerrit Updater [ 27/Apr/17 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/26874 |
| Comment by Gerrit Updater [ 16/May/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26874/ |