[LU-1233] Test failure on test suite parallel-scale, subtest test_compilebench, no space left Created: 19/Mar/12 Updated: 31/Dec/13 Resolved: 11/Dec/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.1.5, Lustre 1.8.9, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.1.6, Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.1 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 5184 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/7722bc68-70e8-11e1-a89e-5254004bbbd3. The sub-test test_compilebench failed with the following error:
Info required for matching: parallel-scale compilebench |
| Comments |
| Comment by Sarah Liu [ 27/Mar/12 ] |
|
Another "no space left" failure: https://maloo.whamcloud.com/test_sets/8653c500-76ca-11e1-ae2e-5254004bbbd3 |
| Comment by Jian Yu [ 12/Oct/12 ] |
|
More instances: |
| Comment by Jian Yu [ 06/Dec/12 ] |
|
Lustre Branch: b2_1 performance-sanity: https://maloo.whamcloud.com/test_sets/c2a8267c-3ba0-11e2-b98e-52540035b04c |
| Comment by Peter Jones [ 06/Dec/12 ] |
|
Minh, could you please look at this issue? We need to identify the OSTSIZE used for the IB test cluster and compare it with the one set on the TCP test cluster, so that the autotest settings for IB clusters can be adjusted accordingly. Thanks, Peter |
| Comment by Minh Diep [ 06/Dec/12 ] |
|
Chris confirmed that the OST devices on the IB clusters are only 2G each, which means 2G * 7 OSTs = 14G in total. This is a very small filesystem compared to 142G * 7 OSTs on TCP. I see that the PV on the cluster is: [root@client-21-ib ~]# pvdisplay
I suggest we increase the LV for the OSTs to use more space. How about 10G each? |
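For scale, the aggregate capacities in this comment can be worked out directly. This is a minimal Python sketch that only multiplies the per-OST sizes quoted here (2G on IB, 142G on TCP, the proposed 10G) by the 7 OSTs; it ignores formatting overhead, so real usable space would be somewhat lower. The function name is illustrative only.

```python
# Per-OST sizes in GB, as quoted in the comment; 7 OSTs per test filesystem.
NUM_OSTS = 7


def aggregate_capacity_gb(per_ost_gb: int, num_osts: int = NUM_OSTS) -> int:
    """Raw total capacity when every OST has the same size (formatting overhead ignored)."""
    return per_ost_gb * num_osts


ib_current = aggregate_capacity_gb(2)     # IB clusters today: 2G per OST
ib_proposed = aggregate_capacity_gb(10)   # proposal: 10G per OST
tcp_current = aggregate_capacity_gb(142)  # TCP clusters: 142G per OST

print(f"IB now: {ib_current}G, IB proposed: {ib_proposed}G, TCP: {tcp_current}G")
print(f"TCP is {tcp_current / ib_current:.0f}x larger than IB today")
```

This prints 14G, 70G and 994G. Even at the proposed 10G per OST, the IB filesystem would still be about 14 times smaller than the TCP one (994G / 70G is roughly 14.2).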
| Comment by Peter Jones [ 07/Dec/12 ] |
|
Thanks Minh. Chris can you please comment? |
| Comment by Jian Yu [ 17/Dec/12 ] |
|
Lustre Server: v2_1_4_RC1 Lustre Client: 1.8.8-wc1 Distro/Arch: RHEL5.8/x86_64 The same issue occurred: |
| Comment by Jian Yu [ 17/Dec/12 ] |
|
In performance-sanity, when creating a large number of files hit the out-of-space error, the files already created were not unlinked/removed. So the test script also needs to be improved. |
| Comment by Jian Yu [ 18/Dec/12 ] |
|
Lustre Client: v2_1_4_RC1 https://maloo.whamcloud.com/test_sets/d03b0306-487d-11e2-8cdc-52540035b04c
This issue is blocking Lustre 2.1.4 release testing on the IB network in autotest runs. |
| Comment by Chris Gearing (Inactive) [ 20/Dec/12 ] |
|
The OST size under autotest is the same for IB and TCP, so why would this issue only affect IB if it is an OST size issue? |
| Comment by Minh Diep [ 21/Dec/12 ] |
|
Are the TCP runs using VMs or real hardware? |
| Comment by Jian Yu [ 20/Jan/13 ] |
|
Lustre Branch: b1_8 MDSSIZE=2097152 OSTSIZE=7416428
The compilebench tests in parallel-scale{,-nfsv3,-nfsv4} all failed with:
LustreError: 23875:0:(filter.c:3459:filter_precreate()) create failed rc = -28
Maloo reports:
With the following values, the same tests passed on the same Lustre b1_8 build over the TCP network on the RHEL5.8/x86_64 distro/arch (both server and client): MDSSIZE=2097152 OSTSIZE=11311139
Maloo report: https://maloo.whamcloud.com/test_sessions/820df576-6353-11e2-ae8b-52540035b04c |
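The `rc = -28` in the `filter_precreate()` message is a negated Linux errno value. A small hypothetical Python helper (not part of Lustre or its test scripts) can decode such codes with the standard `errno` and `os` modules; it assumes Linux errno numbering, which is what the RHEL-based servers in these runs use.

```python
import errno
import os


def decode_rc(rc: int) -> str:
    """Turn a kernel-style return code such as -28 into 'NAME: message'.

    Assumes Linux errno numbering; codes without a symbolic name are reported as UNKNOWN.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # From the log line: filter_precreate()) create failed rc = -28
    print(decode_rc(-28))
```

On Linux this prints `ENOSPC: No space left on device`, which matches the "no space left" symptom in the ticket title.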
| Comment by Jian Yu [ 29/Jan/13 ] |
|
Lustre Branch: b2_1 MDSSIZE=2097152 OSTSIZE=31061817 performance-sanity: https://maloo.whamcloud.com/test_sets/ac776a56-68dd-11e2-ac0a-52540035b04c |
| Comment by Jian Yu [ 13/Mar/13 ] |
|
Lustre Branch: b2_1 The issue is still preventing the tests after performance-sanity in the full test group from running under the IB network configuration: |
| Comment by Jian Yu [ 22/Mar/13 ] |
|
Lustre Client: 1.8.9-wc1 Lustre Server: v2_1_5_RC1 Network: TCP (1GigE) MDSSIZE=2097152 OSTSIZE=149718677
The tests after performance-sanity were affected by the out-of-space issue:
dmesg on the MDS node showed:
Lustre: DEBUG MARKER: ===== mdsrate-stat-large.sh Test preparation: creating 1000000 files.
LustreError: 21464:0:(mdd_dir.c:1889:mdd_create()) error on stripe info copy -28
LustreError: 21464:0:(mdd_dir.c:1889:mdd_create()) error on stripe info copy -28
Lustre: DEBUG MARKER: /usr/sbin/lctl mark performance-sanity test_8: @@@@@@ FAIL: test_8 failed with 1 |
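The OSTSIZE values reported across these comments are easier to compare in GiB. The sketch below rests on an assumption not stated in the ticket: that the test-framework MDSSIZE/OSTSIZE settings are given in KiB (1 KiB = 1024 bytes). The dictionary labels are only shorthand for the comments the values come from.

```python
# OSTSIZE values copied from the comments on this ticket.
# ASSUMPTION: test-framework MDSSIZE/OSTSIZE are expressed in KiB.
OSTSIZE_KIB = {
    "b1_8, failed": 7416428,
    "b1_8 over TCP, passed": 11311139,
    "b2_1 performance-sanity": 31061817,
    "b2_1 over TCP (1GigE)": 149718677,
}
MDSSIZE_KIB = 2097152  # same value in every comment that reports it


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


for label, kib in OSTSIZE_KIB.items():
    print(f"{label:<26} OSTSIZE={kib:>10} KiB ~ {kib_to_gib(kib):7.2f} GiB")
print(f"MDSSIZE={MDSSIZE_KIB} KiB = {kib_to_gib(MDSSIZE_KIB):.2f} GiB")
```

Under that assumption, the failing b1_8 run had roughly 7.07 GiB per OST against roughly 10.79 GiB for the passing TCP run, and the 1GigE run's OSTSIZE is about 142.8 GiB, in line with the 142G per-OST figure quoted for TCP. MDSSIZE comes to exactly 2 GiB in every run.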
| Comment by Jian Yu [ 26/May/13 ] |
|
Lustre Branch: b2_1 The tests after performance-sanity were affected by the out of space issue: |
| Comment by Jian Yu [ 30/May/13 ] |
Here is the patch to unlink the files created in performance-sanity.sh through mdsrate-{create,lookup,stat}-*.sh after a create/lookup/stat operation fails: With the above patch, the issue was narrowed down: only performance-sanity test 4 hit the out-of-space issue, and the tests after test 4 were not affected. The next step is to figure out why test 4 hits the out-of-space issue over the IB network.
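The patch referenced in this comment works on the shell test scripts, and its contents are not reproduced here. As an illustration of the same cleanup-on-failure idea only, here is a hypothetical Python sketch: `create_files_with_cleanup()` is an invented name, and the file naming scheme is an assumption.

```python
import os
import tempfile


def create_files_with_cleanup(directory: str, count: int) -> int:
    """Create `count` empty files in `directory`.

    If any creation fails (for example with ENOSPC), unlink every file created
    so far and re-raise, so later tests do not start on a full filesystem.
    """
    created = []
    try:
        for i in range(count):
            # Hypothetical naming scheme; zero-padded so names sort in creation order.
            path = os.path.join(directory, f"f{i:07d}")
            # O_EXCL: only files this call actually created get recorded for cleanup.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
            created.append(path)
    except OSError:
        # Best-effort cleanup of what we created, then re-raise the original error.
        for path in created:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
        raise
    return len(created)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(create_files_with_cleanup(d, 1000), "files created")
```

Using `O_EXCL` means a pre-existing file is never recorded in `created`, so the cleanup cannot delete files that the function did not create itself.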
| Comment by Jian Yu [ 03/Jun/13 ] |
|
Lustre Client: 1.8.9-wc1 Lustre Server: v2_1_6_RC1 Network: TCP (1GigE) performance-sanity test_8 failed with the out-of-space issue: |
| Comment by Jian Yu [ 14/Nov/13 ] |
The above patch has landed on Lustre b2_1 branch. |
| Comment by Jian Yu [ 02/Dec/13 ] |
|
Patch landed on Lustre b2_4 branch for 2.4.2 and on master branch for 2.6.0. |
| Comment by Jodi Levi (Inactive) [ 04/Dec/13 ] |
|
Can this ticket be closed? |
| Comment by Jian Yu [ 05/Dec/13 ] |
I'll backport the patch to the Lustre b2_5 branch.
| Comment by Jodi Levi (Inactive) [ 11/Dec/13 ] |
|
Patches have landed on master. Yu Jian will backport to the b2_5 branch. |