[LU-11823] sanity-flr test 0b and 0c fail with 'pool_new failed lustre.test_0b' on ARM Created: 21/Dec/18 Updated: 24/Jun/19 Resolved: 24/Jun/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | James A Simmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ubuntu | ||
| Environment: |
Ubuntu 18.04 clients |
||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
sanity-flr test_0b and test_0c fail to create a new OST pool, with the error 'pool_new failed lustre.test_0b'. So far, we see this error only on ARM/PPC clients.

Looking at the client test_log for https://testing.whamcloud.com/test_sets/d794bc4c-fdd4-11e8-a97c-52540065bddc :

== sanity-flr test 0b: lfs mirror create plain layout mirrors ======================================== 00:37:49 (1544488669)
CMD: trevis-19vm4 lctl pool_new lustre.test_0b
trevis-19vm4: Pool lustre.test_0b created
CMD: trevis-19vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.test_0b 2>/dev/null || echo foo
CMD: trevis-19vm4 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.test_0b 2>/dev/null || echo foo
CMD: trevis-19vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
CMD: trevis-19vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
Waiting 90 secs for update
CMD: trevis-19vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
…
Waiting 10 secs for update
CMD: trevis-19vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
CMD: trevis-19vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
CMD: trevis-19vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
CMD: trevis-19vm1.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
Update not seen after 90s: wanted '' got 'foo'
sanity-flr test_0b: @@@@@@ FAIL: pool_new failed lustre.test_0b

We see some errors in the Client 1 (vm1) console log:

[ 379.473969] Lustre: DEBUG MARKER: == sanity-flr test 0b: lfs mirror create plain layout mirrors ======================================== 00:37:49 (1544488669)
[ 380.882615] LustreError: 10433:0:(obd_config.c:1264:class_process_config()) no device for: lustre-clilov-000000007c485d00
[ 380.883815] Lustre: 10433:0:(obd_config.c:1351:class_process_config()) Ignoring error -22 on optional command 0xce020
[ 390.562916] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
[ 390.574202] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
[ 391.586752] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
[ 392.598782] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
[ 393.610433] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
[ 394.622561] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
[ 395.471515] Lustre: lustre-OST0000-osc- (ptrval): disconnect after 22s idle
[ 395.472390] Lustre: Skipped 5 previous similar messages
[ 395.634478] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo

We see similar errors in the same test, but the test failure message is different: 'destroy pool failed lustre.test_0b'.

Looking at the Client 2 (vm9) dmesg log for https://testing.whamcloud.com/test_sets/147f0c12-e2f4-11e8-b67f-52540065bddc , we see:

[ 210.247534] Lustre: DEBUG MARKER: == sanity-flr test 0b: lfs mirror create plain layout mirrors ======================================== 09:05:40 (1541495140)
[ 217.367773] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
[ 217.379702] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo
[ 225.070588] Lustre: lustre-OST0000-osc- (ptrval): disconnect after 21s idle
[ 228.360951] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b | sort -u | tr '\n' ' '
[ 228.372621] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b | sort -u | tr '\n' ' '
[ 250.803910] random: crng init done
[ 260.423743] LustreError: 10730:0:(obd_config.c:1264:class_process_config()) no device for: lustre-clilov-00000000b93a97e7
[ 260.425715] Lustre: 10730:0:(obd_config.c:1351:class_process_config()) Ignoring error -22 on optional command 0xce022
[ 270.916901] LustreError: 10742:0:(obd_config.c:1264:class_process_config()) no device for: lustre-clilov-00000000b93a97e7
[ 270.918884] LustreError: 10742:0:(obd_config.c:1264:class_process_config()) Skipped 1 previous similar message
[ 270.920584] Lustre: 10742:0:(obd_config.c:1351:class_process_config()) Ignoring error -22 on optional command 0xce022
[ 270.922396] Lustre: 10742:0:(obd_config.c:1351:class_process_config()) Skipped 1 previous similar message
[ 285.248874] LustreError: 10765:0:(obd_config.c:1264:class_process_config()) no device for: lustre-clilov-00000000b93a97e7
[ 285.250873] Lustre: 10765:0:(obd_config.c:1351:class_process_config()) Ignoring error -22 on optional command 0xce022
[ 290.469224] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_0b 2>/dev/null || echo foo |
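For readers unfamiliar with the polling pattern visible in the test_log, the following is a minimal sketch of how such a wait loop might work. It is not the Lustre test-framework code: the function name `poll_until` and its parameters are hypothetical, and the "Updated after" message is an assumption. Only the `|| echo foo` sentinel idiom and the "Update not seen after" message are taken from the log above. The empty expected value presumably reflects that a freshly created pool has no members, so a successful `get_param` prints nothing.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not the Lustre test-framework code.
# Runs "cmd" repeatedly until its output equals "want", or gives up once
# "max_wait" seconds have elapsed.
poll_until() {
    local cmd=$1 want=$2 max_wait=${3:-90} interval=${4:-1}
    local waited=0 got

    while :; do
        # Same idiom as in the log: if the command fails (for example the
        # parameter does not exist yet), the sentinel "foo" is appended to
        # the output so that a failure never looks like an empty result.
        got=$(eval "$cmd" 2>/dev/null || echo foo)
        if [ "$got" = "$want" ]; then
            echo "Updated after ${waited}s: wanted '$want' got '$got'"
            return 0
        fi
        if [ "$waited" -ge "$max_wait" ]; then
            break
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done

    echo "Update not seen after ${max_wait}s: wanted '$want' got '$got'"
    return 1
}
```

Under these assumptions, the check in the failing run would correspond to something like `poll_until "lctl get_param -n lov.lustre-*.pools.test_0b" "" 90`. On the affected clients the pool parameter never appears, so every poll yields 'foo' and the loop times out.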
| Comments |
| Comment by James A Simmons [ 24/Jun/19 ] |
|
The workaround https://review.whamcloud.com/33900 stopped these failures. The proper long-term fix is being done under |