[LU-9096] sanity test_253: File creation failed after rm
Created: 09/Feb/17 | Updated: 22/Jan/24
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0, Lustre 2.10.0, Lustre 2.14.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | always_except, zfs |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/b06b5a90-e8f0-11e6-935d-5254006e85c2.

The sub-test test_253 failed with the following error:

CMD: trevis-43vm3 /usr/sbin/lctl get_param -n osp.lustre-OST0000-osc-MDT0000.prealloc_status
prealloc_status -28
dd: failed to open '/mnt/lustre/d253.sanity/2': No space left on device
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 0.0534031 s, 98.2 MB/s
CMD: trevis-43vm3 /usr/sbin/lctl set_param -n osd*.*MD*.force_sync 1
CMD: trevis-43vm3 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
:
:
CMD: trevis-43vm3 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
Delete is not completed in 28 seconds
Waiting for local destroys to complete
CMD: trevis-43vm3 lctl get_param -n lov.*.qos_maxage
sanity test_253: @@@@@@ FAIL: File creation failed after rm

The MDS console log shows that there are no precreated objects for this OST:

20:44:10:[11571.982460] LustreError: 22503:0:(lod_qos.c:1273:lod_alloc_specific()) can't lstripe objid [0x20000234c:0x13b8:0x0]: have 0 want 1

Info required for matching: sanity 253
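The failing sequence in the log maps onto a couple of lctl queries; a minimal sketch, using the same target names and commands the log shows (not the test script itself):

    # Ask the MDS-side OSP about object preallocation for OST0000;
    # -28 is -ENOSPC, i.e. the MDT believes the OST cannot host new objects.
    lctl get_param -n osp.lustre-OST0000-osc-MDT0000.prealloc_status

    # Force pending unlink/destroy transactions to disk on the MDS, then
    # poll the in-flight sync counters; the test gave up after 28 seconds.
    lctl set_param -n osd*.*MD*.force_sync 1
    lctl get_param -n osc.*MDT*.sync_*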
| Comments |
| Comment by Andreas Dilger [ 10/Feb/17 ] |
20:45:07:[ 4933.488092] Lustre: DEBUG MARKER: == sanity test 253: Check object allocation limit ==================================================== 20:39:14 (1484858354)
20:45:07:[ 5040.092189] INFO: task txg_sync:20227 blocked for more than 120 seconds.
20:45:07:[ 5040.096367] txg_sync D 0000000000000001 0 20227 2 0x00000080
20:45:07:[ 5040.106578] Call Trace:
20:45:07:[ 5040.108189] [<ffffffff8168b9e9>] schedule+0x29/0x70
20:45:07:[ 5040.110193] [<ffffffff81689439>] schedule_timeout+0x239/0x2d0
20:45:07:[ 5040.112031] [<ffffffff810c4ff2>] ? default_wake_function+0x12/0x20
20:45:07:[ 5040.114129] [<ffffffff810ba238>] ? __wake_up_common+0x58/0x90
20:45:07:[ 5040.115970] [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
20:45:07:[ 5040.118039] [<ffffffff8168af8e>] io_schedule_timeout+0xae/0x130
20:45:07:[ 5040.119922] [<ffffffff810b1416>] ? prepare_to_wait_exclusive+0x56/0x90
20:45:07:[ 5040.122039] [<ffffffff8168b028>] io_schedule+0x18/0x20
20:45:07:[ 5040.123883] [<ffffffffa0662617>] cv_wait_common+0xa7/0x130 [spl]
20:45:07:[ 5040.125886] [<ffffffff810b1720>] ? wake_up_atomic_t+0x30/0x30
20:45:07:[ 5040.127688] [<ffffffffa06626f8>] __cv_wait_io+0x18/0x20 [spl]
20:45:07:[ 5040.129716] [<ffffffffa07bc51b>] zio_wait+0x10b/0x1f0 [zfs]
20:45:07:[ 5040.131471] [<ffffffffa0745cdf>] dsl_pool_sync+0xbf/0x440 [zfs]
20:45:07:[ 5040.133451] [<ffffffffa0760868>] spa_sync+0x388/0xb50 [zfs]
20:45:07:[ 5040.135159] [<ffffffff810b174b>] ? autoremove_wake_function+0x2b/0x40
20:45:07:[ 5040.137112] [<ffffffff81689b92>] ? mutex_lock+0x12/0x2f
20:45:07:[ 5040.138815] [<ffffffffa07724e5>] txg_sync_thread+0x3c5/0x620 [zfs]
20:45:07:[ 5040.142523] [<ffffffffa065d851>] thread_generic_wrapper+0x71/0x80 [spl]
20:45:07:[ 5040.146352] [<ffffffff810b064f>] kthread+0xcf/0xe0
| Comment by nasf (Inactive) [ 21/Mar/17 ] |
|
+1 on master: |
| Comment by Shuichi Ihara (Inactive) [ 22/Mar/17 ] |
|
+1 on master |
| Comment by Patrick Farrell (Inactive) [ 28/Mar/17 ] |
|
+1 on master: |
| Comment by Bob Glossman (Inactive) [ 31/Mar/17 ] |
|
another on master: |
| Comment by Bob Glossman (Inactive) [ 03/Apr/17 ] |
|
more on master: |
| Comment by Emoly Liu [ 14/Apr/17 ] |
|
+1 on master: |
| Comment by Peter Jones [ 18/Apr/17 ] |
|
Nathaniel, it seems like this test failure only impacts ZFS runs.

Peter
| Comment by Andreas Dilger [ 18/Apr/17 ] |
|
Alex, could your patch https://review.whamcloud.com/18341 be related to this problem?
| Comment by Steve Guminski (Inactive) [ 18/Apr/17 ] |
|
Another on master: https://testing.hpdd.intel.com/test_sessions/10710b40-d31b-41be-8e9a-07f9c3aec583 |
| Comment by Alex Zhuravlev [ 24/Apr/17 ] |
|
Andreas, that patch prevents indefinite blocking, which isn't the case here (the tests completed)?
| Comment by nasf (Inactive) [ 24/Apr/17 ] |
|
+1 on master: |
| Comment by Sebastien Buisson (Inactive) [ 26/Apr/17 ] |
|
+1 on master: |
| Comment by Bruno Faccini (Inactive) [ 26/Apr/17 ] |
|
+1 on master: https://testing.hpdd.intel.com/test_sets/34f1e788-2aab-11e7-8920-5254006e85c2
| Comment by Alex Zhuravlev [ 05/May/17 ] |
|
updated statfs: rc = 0, state, ffree 57923, bavail 52864, avail 206, low 1

sanity/253 sets low to 209:

osp.lustre-OST0000-osc-MDT0000.reserved_mb_low=209
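A minimal sketch of the arithmetic in that dump, assuming (as the parameter name suggests) that the OSP stops precreation once available space drops below reserved_mb_low:

    # Values from the statfs dump above: 206 MB available vs. a 209 MB
    # low watermark, so precreation is disabled and never resumes.
    avail_mb=206
    low_mb=$(lctl get_param -n osp.lustre-OST0000-osc-MDT0000.reserved_mb_low)  # 209
    (( avail_mb < low_mb )) && echo "prealloc_status stays at -28 (-ENOSPC)"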
| Comment by Alex Zhuravlev [ 05/May/17 ] |
|
@@ -14318,7 +14318,7 @@ test_253() {

solves the problem. I'm wondering where the 10 comes from.
| Comment by John Hammond [ 05/May/17 ] |
|
> solves the problem.. I'm wondering where 10 comes from.

I think it matches the 10 used a few lines before:
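A hypothetical reconstruction of that margin (the diff body above is elided, so the variable names here are illustrative, not the actual test code):

    # Illustrative only: the same 10 MB constant pads both the fill size
    # and the watermark. With Alex's numbers (avail 206 MB, low 209 MB)
    # the cushion was too small, so creation stayed blocked after the rm.
    margin_mb=10
    avail_mb=206
    low_mb=209
    (( avail_mb < low_mb )) && echo "margin of ${margin_mb}MB was not enough"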
| Comment by John Hammond [ 05/May/17 ] |
|
In the interim it would be good to dump all the osp params when this fails (rather than just prealloc_status). |
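Something like the following would capture the whole picture on failure (a sketch; the wildcard covers any OST index rather than just OST0000):

    # Dump every OSP parameter for the MDT's view of its OSTs, not just
    # prealloc_status, e.g. reserved_mb_low/high and the prealloc counters.
    lctl get_param "osp.*-osc-MDT*.*"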
| Comment by Nathaniel Clark [ 09/May/17 ] |
|
bzzz, instead of hard-coding a constant, would it make sense to use test-framework.sh::fs_log_size() to ensure there's enough room for the fs log?
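A sketch of what that might look like inside the test; fs_log_size() does live in test-framework.sh, but treating its return value as megabytes here is an assumption:

    # Assumed usage: derive the margin from the framework's estimate of
    # filesystem log overhead instead of a hard-coded 10.
    margin_mb=$(( $(fs_log_size $SINGLEMDS) + 1 ))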
| Comment by Gerrit Updater [ 09/May/17 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/27013 |
| Comment by James Nunez (Inactive) [ 09/May/17 ] |
|
Since sanity 253 is failing intermittently and the correctness of the test is being questioned, I've uploaded a patch to add the test to the ALWAYS_EXCEPT list so that it is not run.
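For context, ALWAYS_EXCEPT is a space-separated skip list near the top of sanity.sh; the change is essentially of this shape (the exact line in the merged patch may differ):

    # sanity.sh: subtests listed here are skipped on every run
    ALWAYS_EXCEPT="$ALWAYS_EXCEPT 253"   # LU-9096: intermittent ZFS failure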
| Comment by Gerrit Updater [ 12/May/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27013/ |
| Comment by Peter Jones [ 12/May/17 ] |
|
The test is no longer being run, so I will remove the Fix Version from this ticket; it can be reintroduced when the test has been given a clean bill of health.
| Comment by Andreas Dilger [ 03/Dec/20 ] |
|
For some reason this test is now running again and failing intermittently (about 4x per month). |
| Comment by Andreas Dilger [ 03/Dec/20 ] |
|
It looks like this test was re-enabled by patch https://review.whamcloud.com/33778.
| Comment by Sergey Cheremencev [ 29/Jun/23 ] |
|
+1 on master: https://testing.whamcloud.com/test_sessions/c590176e-852a-4080-abb9-e601015fbed9
| Comment by Andreas Dilger [ 17/Nov/23 ] |
|
Patch https://review.whamcloud.com/51861 "LU-15553 test: mkdir_on_mdt0 in sanity" is adding this subtest to the always_except list again. |