Details
- Type: Bug
- Resolution: Unresolved
- Priority: Minor
- Affects Versions: Lustre 2.16.0, Lustre 2.15.6
- Severity: 3
Description
This issue was created by maloo for Arshad <arshad.hussain@aeoncomputing.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/daa7243f-f748-4b63-ad5b-5423e68f04b7
test_116a failed with the following error:
test_116a returned 1
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/104641 - 4.18.0-513.5.1.el8_9.x86_64/RHEL 8.9/2.15.63.1
servers: https://build.whamcloud.com/job/lustre-reviews/104641 - 4.18.0-513.18.1.el8_lustre.x86_64/RHEL 8.9/2.15.63.1/ldiskfs
Client
== sanity test 116a: stripe QOS: free space balance ============================================================================= 15:11:40 (1715094700)
Free space priority
CMD: onyx-64vm8 lctl get_param -n lo[vd].*-mdtlov.qos_prio_free
90%
CMD: onyx-64vm8 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
[... identical get_param polls repeated ...]
Delete is not completed in 28 seconds
CMD: onyx-64vm8 /usr/sbin/lctl get_param osc.*MDT*.sync_*
osc.lustre-OST0000-osc-MDT0000.sync_changes=0
osc.lustre-OST0000-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0000-osc-MDT0000.sync_in_progress=0
osc.lustre-OST0001-osc-MDT0000.sync_changes=0
osc.lustre-OST0001-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0001-osc-MDT0000.sync_in_progress=1
osc.lustre-OST0002-osc-MDT0000.sync_changes=0
osc.lustre-OST0002-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0002-osc-MDT0000.sync_in_progress=1
osc.lustre-OST0003-osc-MDT0000.sync_changes=0
osc.lustre-OST0003-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0003-osc-MDT0000.sync_in_progress=1
osc.lustre-OST0004-osc-MDT0000.sync_changes=0
osc.lustre-OST0004-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0004-osc-MDT0000.sync_in_progress=0
osc.lustre-OST0005-osc-MDT0000.sync_changes=0
osc.lustre-OST0005-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0005-osc-MDT0000.sync_in_progress=0
osc.lustre-OST0006-osc-MDT0000.sync_changes=0
osc.lustre-OST0006-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0006-osc-MDT0000.sync_in_progress=0
OST kbytes available: 7252180 7255884 7255880 7255880 7255492 7255864 7255884
Min free space: OST 0: 7252180
Max free space: OST 1: 7255884
CMD: onyx-64vm8 lctl get_param -n *.*MDT0000-mdtlov.qos_threshold_rr
Check for uneven OSTs: diff=3704KB (0%) must be > 17% ...no
Fill 19% remaining space in OST0 with 1377914KB
<snip>...
CMD: onyx-64vm8 lctl get_param -n lod.*.qos_maxage
Waiting for MDT destroys to complete
CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osp.*.destroys_in_flight
OST kbytes available: 5875924 7255884 7255880 7255880 7255492 7255864 7255884
Min free space: OST 0: 5875924
Max free space: OST 1: 7255884
diff=1379960=23% must be > 17% for QOS mode...ok
writing 600 files to QOS-assigned OSTs
<snip>...wrote 600 200k files
CMD: onyx-64vm8 lctl get_param -n lod.*.qos_maxage
Note: free space may not be updated, so measurements might be off
Waiting for MDT destroys to complete
CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osp.*.destroys_in_flight
OST kbytes available: 5837668 7215932 7220592 7223136 7218932 7211672 7215932
Min free space: OST 0: 5837668
Max free space: OST 3: 7223136
free space delta: orig 1379960 final 1385468
delta got worse!
Wrote 38256KB to smaller OST 0
Wrote 39952KB to larger OST 1
Wrote 4% more data to larger OST 1
lustre-OST0000_UUID 86 files created on smaller OST 0
lustre-OST0001_UUID 90 files created on larger OST 1
Wrote 4% more files to larger OST 1
CMD: onyx-64vm8 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
[... identical get_param polls repeated ...]
Delete is not completed in 32 seconds
CMD: onyx-64vm8 /usr/sbin/lctl get_param osc.*MDT*.sync_*
osc.lustre-OST0000-osc-MDT0000.sync_changes=0
osc.lustre-OST0000-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0000-osc-MDT0000.sync_in_progress=758
osc.lustre-OST0001-osc-MDT0000.sync_changes=0
osc.lustre-OST0001-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0001-osc-MDT0000.sync_in_progress=90
osc.lustre-OST0002-osc-MDT0000.sync_changes=0
osc.lustre-OST0002-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0002-osc-MDT0000.sync_in_progress=79
osc.lustre-OST0003-osc-MDT0000.sync_changes=0
osc.lustre-OST0003-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0003-osc-MDT0000.sync_in_progress=73
osc.lustre-OST0004-osc-MDT0000.sync_changes=0
osc.lustre-OST0004-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0004-osc-MDT0000.sync_in_progress=82
osc.lustre-OST0005-osc-MDT0000.sync_changes=0
osc.lustre-OST0005-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0005-osc-MDT0000.sync_in_progress=100
osc.lustre-OST0006-osc-MDT0000.sync_changes=0
osc.lustre-OST0006-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0006-osc-MDT0000.sync_in_progress=90
test_116a returned 1
FAIL 116a (1245s)
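The imbalance figures the test reports can be recomputed from the logged free-space numbers. A minimal shell sketch of that arithmetic (it assumes the percentage is taken relative to the smallest OST, which is what reproduces the logged 23%; this is an illustration, not the exact sanity.sh code):

```shell
# OST kbytes available after filling OST0, taken from the log above
min=5875924          # OST 0, smallest
max=7255884          # OST 1, largest

# Imbalance relative to the smaller OST: 1379960 KB = 23%
diff=$((max - min))
echo "diff=${diff}=$((diff * 100 / min))% must be > 17% for QOS mode"

# Data actually written per OST after the 600-file run
small=38256          # KB written to smaller OST 0
large=39952          # KB written to larger OST 1
echo "Wrote $(( (large - small) * 100 / small ))% more data to larger OST 1"
```

With QOS working as intended, the smaller OST should receive noticeably less new data, so a 4% surplus on the larger OST is within noise and the "delta got worse" line is the real complaint.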
Server (OST)
[11369.034895] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 116a: stripe QOS: free space balance ============================================================================= 15:11:40 \(1715094700\)
[11369.268468] Lustre: DEBUG MARKER: == sanity test 116a: stripe QOS: free space balance ============================================================================= 15:11:40 (1715094700)
[11440.942725] Lustre: ll_ost_io00_010: service thread pid 205919 was inactive for 42.277 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[11440.942726] Pid: 175291, comm: ll_ost_io00_002 4.18.0-513.18.1.el8_lustre.x86_64 #1 SMP Thu May 2 05:23:37 UTC 2024
[11440.942743] Call Trace TBD:
[11440.942916] [<0>] jbd2_log_wait_commit+0xac/0x120 [jbd2]
[11440.946268] Lustre: Skipped 1 previous similar message
[11440.948180] [<0>] jbd2_journal_stop+0x2d6/0x330 [jbd2]
[11440.951734] [<0>] __ldiskfs_journal_stop+0x36/0xb0 [ldiskfs]
[11440.952941] [<0>] osd_trans_stop+0x357/0xb10 [osd_ldiskfs]
[11440.954093] [<0>] ofd_write_attr_set+0x334/0x1080 [ofd]
[11440.955169] [<0>] ofd_commitrw_write+0x206/0x1ac0 [ofd]
[11440.956206] [<0>] ofd_commitrw+0x5d4/0xd60 [ofd]
[11440.957116] [<0>] obd_commitrw+0x1b6/0x370 [ptlrpc]
[11440.958573] [<0>] tgt_brw_write+0xf00/0x1f30 [ptlrpc]
[11440.959630] [<0>] tgt_request_handle+0x3f4/0x1a20 [ptlrpc]
[11440.960803] [<0>] ptlrpc_server_handle_request+0x3ca/0xbf0 [ptlrpc]
[11440.962054] [<0>] ptlrpc_main+0xc96/0x15c0 [ptlrpc]
[11440.963063] [<0>] kthread+0x134/0x150
[11440.963830] [<0>] ret_from_fork+0x35/0x40
[11440.964638] Pid: 205919, comm: ll_ost_io00_010 4.18.0-513.18.1.el8_lustre.x86_64 #1 SMP Thu May 2 05:23:37 UTC 2024
[11440.966517] Call Trace TBD:
[... identical call trace to ll_ost_io00_002 above, jbd2_log_wait_commit through ret_from_fork ...]
[11440.981188] Pid: 342443, comm: ll_ost_io00_018 4.18.0-513.18.1.el8_lustre.x86_64 #1 SMP Thu May 2 05:23:37 UTC 2024
[11440.983091] Call Trace TBD:
[... identical call trace to ll_ost_io00_002 above, jbd2_log_wait_commit through ret_from_fork ...]
[11440.997858] Lustre: ll_ost_io00_023: service thread pid 342448 was inactive for 42.344 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[11605.070033] Autotest: Test running for 190 minutes (lustre-reviews_review-ldiskfs_104641.1)
[12128.200813] Autotest: Test running for 195 minutes (lustre-reviews_review-ldiskfs_104641.1)
[12213.635236] Autotest: Test running for 200 minutes (lustre-reviews_review-ldiskfs_104641.1)
[12536.312751] Autotest: Test running for 205 minutes (lustre-reviews_review-ldiskfs_104641.1)
[12615.028468] Lustre: DEBUG MARKER: sysctl -wq kernel/kptr_restrict=1 || true
[12618.979347] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug_raw_pointers=0 || true
[12621.391252] Lustre: DEBUG MARKER: sysctl --values kernel/kptr_restrict
[12621.722753] Lustre: DEBUG MARKER: sysctl -wq kernel/kptr_restrict=1
[12622.048370] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n debug_raw_pointers
[12622.371640] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug_raw_pointers=Y || true
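All three dumped OST I/O threads are blocked under jbd2_log_wait_commit via osd_trans_stop, i.e. waiting on the ldiskfs journal commit. When triaging similar runs, the watchdog events and journal frames can be filtered out of a saved console log; a small sketch ("console.log" is a placeholder path, seeded here with two lines from the log above for illustration):

```shell
# Sketch: filter hung-thread watchdog events and journal-commit frames
# out of a saved OST console log. "console.log" is a placeholder path;
# it is seeded with sample lines from this ticket's server log.
cat > console.log <<'EOF'
[11440.942725] Lustre: ll_ost_io00_010: service thread pid 205919 was inactive for 42.277 seconds.
[11440.942916] [<0>] jbd2_log_wait_commit+0xac/0x120 [jbd2]
EOF

# Which service threads tripped the watchdog?
grep -E 'service thread pid [0-9]+ was inactive' console.log
# Were they stuck in journal commit / transaction stop?
grep -E 'jbd2_log_wait_commit|osd_trans_stop' console.log
```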
<<Please provide additional information about the failure here>>
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_116a - test_116a returned 1
Attachments
Issue Links