[LU-7309] replay-single test_70b: no space left on device Created: 16/Oct/15 Updated: 28/Nov/16 Resolved: 28/Jan/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | p4hc |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/4f0d6d92-718c-11e5-bffb-5254006e85c2.

The sub-test test_70b failed with the following error in the client test log:

shadow-52vm5: [11429] open ./clients/client0/~dmtmp/PWRPNT/PPTC112.TMP failed for handle 12322 (No space left on device)
shadow-52vm5: (11430) ERROR: handle 12322 was not found
shadow-52vm5: Child failed with status 1
shadow-52vm1: [11429] open ./clients/client0/~dmtmp/PWRPNT/PPTC112.TMP failed for handle 12322 (No space left on device)
shadow-52vm1: (11430) ERROR: handle 12322 was not found
shadow-52vm1: Child failed with status 1

Please provide additional information about the failure here.

Info required for matching: replay-single 70b |
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 19/Oct/15 ] |
|
James, |
| Comment by Sarah Liu [ 15/Dec/15 ] |
|
More instances: |
| Comment by Saurabh Tandan (Inactive) [ 16/Dec/15 ] |
|
Server: 2.5.5, b2_5_fe/62 |
| Comment by Joseph Gmitter (Inactive) [ 04/Jan/16 ] |
|
Hongchao, |
| Comment by Hongchao Zhang [ 05/Jan/16 ] |
|
I have analyzed several failed tests. Some didn't contain obvious error logs related to "ENOSPC", but this one did indicate the problem:
https://testing.hpdd.intel.com/test_logs/bdd52dbe-a080-11e5-85ed-5254006e85c2/show_text

00020000:01000000:0.0:1449883124.461043:0:14270:0:(lod_qos.c:218:lod_statfs_and_check()) lustre-OST0000-osc-MDT0000: turns inactive
00020000:01000000:0.0:1449883124.461045:0:14270:0:(lod_qos.c:218:lod_statfs_and_check()) lustre-OST0001-osc-MDT0000: turns inactive
00020000:01000000:0.0:1449883124.461046:0:14270:0:(lod_qos.c:218:lod_statfs_and_check()) lustre-OST0002-osc-MDT0000: turns inactive
00020000:01000000:0.0:1449883124.461048:0:14270:0:(lod_qos.c:218:lod_statfs_and_check()) lustre-OST0003-osc-MDT0000: turns inactive
00020000:01000000:0.0:1449883124.461048:0:14270:0:(lod_qos.c:218:lod_statfs_and_check()) lustre-OST0004-osc-MDT0000: turns inactive
00020000:01000000:0.0:1449883124.461049:0:14270:0:(lod_qos.c:218:lod_statfs_and_check()) lustre-OST0005-osc-MDT0000: turns inactive

The OSP devices were turned inactive due to "-ENOTCONN" just between the failed requests from the client. |
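When triaging other failed runs for the same symptom, the "turns inactive" lines can be extracted from a debug log mechanically. The following is a minimal Python sketch, not part of any Lustre tooling. The function name find_inactive_devices and the regular expression are assumptions written against the six log lines quoted in this comment only; the meaning of the other colon-separated fields is deliberately not interpreted.

```python
import re

# Pattern written against the "turns inactive" lines quoted in this ticket.
# Only the timestamp, source location and device name are captured; the
# remaining colon-separated fields are matched but not interpreted.
INACTIVE_RE = re.compile(
    r"^[0-9a-f]+:[0-9a-f]+:[\d.]+:(?P<ts>\d+\.\d+):\d+:\d+:\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<device>\S+): turns inactive$"
)


def find_inactive_devices(log_text):
    """Return (timestamp, device) pairs for every 'turns inactive' line."""
    hits = []
    for raw in log_text.splitlines():
        m = INACTIVE_RE.match(raw.strip())
        if m:
            hits.append((float(m.group("ts")), m.group("device")))
    return hits


# Two of the lines from the comment above, used as sample input.
SAMPLE = """\
00020000:01000000:0.0:1449883124.461043:0:14270:0:(lod_qos.c:218:lod_statfs_and_check()) lustre-OST0000-osc-MDT0000: turns inactive
00020000:01000000:0.0:1449883124.461045:0:14270:0:(lod_qos.c:218:lod_statfs_and_check()) lustre-OST0001-osc-MDT0000: turns inactive
"""

if __name__ == "__main__":
    for ts, dev in find_inactive_devices(SAMPLE):
        print(f"{ts:.6f} {dev}")
```

Comparing the extracted timestamps with the time of the client's failed open shows whether the OSP devices went inactive around the same moment, which is the correlation described in this comment.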
| Comment by Andreas Dilger [ 05/Jan/16 ] |
|
I don't think the client should be retrying, since this is the MDS's job to handle the recovery properly. I think it should hold the create request until the connections to the OSTs are available after restarting the MDS. |
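The behaviour proposed here, holding the create request on the server side until OST connections come back instead of failing it immediately, can be illustrated with a small Python sketch. This is not the Lustre patch and does not reflect MDS internals: wait_for_targets, is_connected, min_ready and the polling approach are all hypothetical names and choices made for illustration, and the "at least one connected target" threshold is an assumption.

```python
import time


def wait_for_targets(is_connected, targets, timeout_s, min_ready=1,
                     poll_s=0.1, clock=time.monotonic, sleep=time.sleep):
    """Hold the caller until enough targets are connected or time runs out.

    is_connected is a hypothetical callback reporting whether one target
    (e.g. an OST connection) is usable. Returns the list of connected
    targets, or an empty list if the timeout expires first.
    """
    deadline = clock() + timeout_s
    while True:
        ready = [t for t in targets if is_connected(t)]
        if len(ready) >= min_ready:
            return ready
        if clock() >= deadline:
            # Only now would the request fail; before the timeout it is held.
            return []
        sleep(poll_s)


if __name__ == "__main__":
    # Simulate connections that come back after three polling rounds.
    state = {"polls": 0}

    def fake_connected(_target):
        return state["polls"] >= 3

    def fake_sleep(_seconds):
        state["polls"] += 1

    osts = ["OST0000", "OST0001"]
    print(wait_for_targets(fake_connected, osts, timeout_s=60,
                           sleep=fake_sleep))
```

The point of the sketch is the ordering: the error path is reached only after the wait has expired, so a create that arrives while connections are still being re-established after an MDS restart is delayed rather than rejected.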
| Comment by Hongchao Zhang [ 06/Jan/16 ] |
|
Okay, I'll try to create the corresponding patch so that the create request waits for the connections to the OSTs. |
| Comment by Gerrit Updater [ 06/Jan/16 ] |
|
Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/17839 |
| Comment by Gerrit Updater [ 28/Jan/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17839/ |
| Comment by Joseph Gmitter (Inactive) [ 28/Jan/16 ] |
|
Landed for 2.8 |