Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-17827

sanity test_116a: Failed with "test_116a returned 1"

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.16.0, Lustre 2.15.6
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Arshad <arshad.hussain@aeoncomputing.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/daa7243f-f748-4b63-ad5b-5423e68f04b7

      test_116a failed with the following error:

      test_116a returned 1
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/104641 - 4.18.0-513.5.1.el8_9.x86_64/RHEL 8.9/2.15.63.1
      servers: https://build.whamcloud.com/job/lustre-reviews/104641 - 4.18.0-513.18.1.el8_lustre.x86_64/RHEL 8.9/2.15.63.1/ldiskfs

      Client

      == sanity test 116a: stripe QOS: free space balance ============================================================================= 15:11:40 (1715094700)
      Free space priority CMD: onyx-64vm8 lctl get_param -n lo[vd].*-mdtlov.qos_prio_free
      90%
      CMD: onyx-64vm8 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      Delete is not completed in 28 seconds
      CMD: onyx-64vm8 /usr/sbin/lctl get_param osc.*MDT*.sync_*
      osc.lustre-OST0000-osc-MDT0000.sync_changes=0
      osc.lustre-OST0000-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0000-osc-MDT0000.sync_in_progress=0
      osc.lustre-OST0001-osc-MDT0000.sync_changes=0
      osc.lustre-OST0001-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0001-osc-MDT0000.sync_in_progress=1
      osc.lustre-OST0002-osc-MDT0000.sync_changes=0
      osc.lustre-OST0002-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0002-osc-MDT0000.sync_in_progress=1
      osc.lustre-OST0003-osc-MDT0000.sync_changes=0
      osc.lustre-OST0003-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0003-osc-MDT0000.sync_in_progress=1
      osc.lustre-OST0004-osc-MDT0000.sync_changes=0
      osc.lustre-OST0004-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0004-osc-MDT0000.sync_in_progress=0
      osc.lustre-OST0005-osc-MDT0000.sync_changes=0
      osc.lustre-OST0005-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0005-osc-MDT0000.sync_in_progress=0
      osc.lustre-OST0006-osc-MDT0000.sync_changes=0
      osc.lustre-OST0006-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0006-osc-MDT0000.sync_in_progress=0
      OST kbytes available: 7252180 7255884 7255880 7255880 7255492 7255864 7255884
      Min free space: OST 0: 7252180
      Max free space: OST 1: 7255884
      CMD: onyx-64vm8 lctl get_param -n *.*MDT0000-mdtlov.qos_threshold_rr
      Check for uneven OSTs: diff=3704KB (0%) must be > 17% ...no
      Fill 19% remaining space in OST0 with 1377914KB
      <snip>...
      CMD: onyx-64vm8 lctl get_param -n lod.*.qos_maxage
      Waiting for MDT destroys to complete
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osp.*.destroys_in_flight
      OST kbytes available: 5875924 7255884 7255880 7255880 7255492 7255864 7255884
      Min free space: OST 0: 5875924
      Max free space: OST 1: 7255884
      diff=1379960=23% must be > 17% for QOS mode...ok
      writing 600 files to QOS-assigned OSTs
      <snip>.......................................................................................................wrote 600 200k files
      CMD: onyx-64vm8 lctl get_param -n lod.*.qos_maxage
      Note: free space may not be updated, so measurements might be off
      Waiting for MDT destroys to complete
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osp.*.destroys_in_flight
      OST kbytes available: 5837668 7215932 7220592 7223136 7218932 7211672 7215932
      Min free space: OST 0: 5837668
      Max free space: OST 3: 7223136
      free space delta: orig 1379960 final 1385468
      delta got worse!
      Wrote 38256KB to smaller OST 0
      Wrote 39952KB to larger OST 1
      Wrote 4% more data to larger OST 1
      lustre-OST0000_UUID
      86 files created on smaller OST 0
      lustre-OST0001_UUID
      90 files created on larger OST 1
      Wrote 4% more files to larger OST 1
      CMD: onyx-64vm8 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      CMD: onyx-64vm8 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      Delete is not completed in 32 seconds
      CMD: onyx-64vm8 /usr/sbin/lctl get_param osc.*MDT*.sync_*
      osc.lustre-OST0000-osc-MDT0000.sync_changes=0
      osc.lustre-OST0000-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0000-osc-MDT0000.sync_in_progress=758
      osc.lustre-OST0001-osc-MDT0000.sync_changes=0
      osc.lustre-OST0001-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0001-osc-MDT0000.sync_in_progress=90
      osc.lustre-OST0002-osc-MDT0000.sync_changes=0
      osc.lustre-OST0002-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0002-osc-MDT0000.sync_in_progress=79
      osc.lustre-OST0003-osc-MDT0000.sync_changes=0
      osc.lustre-OST0003-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0003-osc-MDT0000.sync_in_progress=73
      osc.lustre-OST0004-osc-MDT0000.sync_changes=0
      osc.lustre-OST0004-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0004-osc-MDT0000.sync_in_progress=82
      osc.lustre-OST0005-osc-MDT0000.sync_changes=0
      osc.lustre-OST0005-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0005-osc-MDT0000.sync_in_progress=100
      osc.lustre-OST0006-osc-MDT0000.sync_changes=0
      osc.lustre-OST0006-osc-MDT0000.sync_in_flight=0
      osc.lustre-OST0006-osc-MDT0000.sync_in_progress=90
      test_116a returned 1
      FAIL 116a (1245s)

       

      Server (OST)

      [11369.034895] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 116a: stripe QOS: free space balance ============================================================================= 15:11:40 \(1715094700\)^M
      [11369.268468] Lustre: DEBUG MARKER: == sanity test 116a: stripe QOS: free space balance ============================================================================= 15:11:40 (1715094700)^M
      [11440.942725] Lustre: ll_ost_io00_010: service thread pid 205919 was inactive for 42.277 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:^M
      [11440.942726] Pid: 175291, comm: ll_ost_io00_002 4.18.0-513.18.1.el8_lustre.x86_64 #1 SMP Thu May 2 05:23:37 UTC 2024^M
      [11440.942743] Call Trace TBD:^M
      [11440.942916] [<0>] jbd2_log_wait_commit+0xac/0x120 [jbd2]^M
      [11440.946268] Lustre: Skipped 1 previous similar message^M
      [11440.948180] [<0>] jbd2_journal_stop+0x2d6/0x330 [jbd2]^M
      [11440.951734] [<0>] __ldiskfs_journal_stop+0x36/0xb0 [ldiskfs]^M
      [11440.952941] [<0>] osd_trans_stop+0x357/0xb10 [osd_ldiskfs]^M
      [11440.954093] [<0>] ofd_write_attr_set+0x334/0x1080 [ofd]^M
      [11440.955169] [<0>] ofd_commitrw_write+0x206/0x1ac0 [ofd]^M
      [11440.956206] [<0>] ofd_commitrw+0x5d4/0xd60 [ofd]^M
      [11440.957116] [<0>] obd_commitrw+0x1b6/0x370 [ptlrpc]^M
      [11440.958573] [<0>] tgt_brw_write+0xf00/0x1f30 [ptlrpc]^M
      [11440.959630] [<0>] tgt_request_handle+0x3f4/0x1a20 [ptlrpc]^M
      [11440.960803] [<0>] ptlrpc_server_handle_request+0x3ca/0xbf0 [ptlrpc]^M
      [11440.962054] [<0>] ptlrpc_main+0xc96/0x15c0 [ptlrpc]^M
      [11440.963063] [<0>] kthread+0x134/0x150^M
      [11440.963830] [<0>] ret_from_fork+0x35/0x40^M
      [11440.964638] Pid: 205919, comm: ll_ost_io00_010 4.18.0-513.18.1.el8_lustre.x86_64 #1 SMP Thu May 2 05:23:37 UTC 2024^M
      [11440.966517] Call Trace TBD:^M
      [11440.967109] [<0>] jbd2_log_wait_commit+0xac/0x120 [jbd2]^M
      [11440.968124] [<0>] jbd2_journal_stop+0x2d6/0x330 [jbd2]^M
      [11440.969111] [<0>] __ldiskfs_journal_stop+0x36/0xb0 [ldiskfs]^M
      [11440.970200] [<0>] osd_trans_stop+0x357/0xb10 [osd_ldiskfs]^M
      [11440.971257] [<0>] ofd_write_attr_set+0x334/0x1080 [ofd]^M
      [11440.972285] [<0>] ofd_commitrw_write+0x206/0x1ac0 [ofd]^M
      [11440.973296] [<0>] ofd_commitrw+0x5d4/0xd60 [ofd]^M
      [11440.974197] [<0>] obd_commitrw+0x1b6/0x370 [ptlrpc]^M
      [11440.975239] [<0>] tgt_brw_write+0xf00/0x1f30 [ptlrpc]^M
      [11440.976294] [<0>] tgt_request_handle+0x3f4/0x1a20 [ptlrpc]^M
      [11440.977422] [<0>] ptlrpc_server_handle_request+0x3ca/0xbf0 [ptlrpc]^M
      [11440.978671] [<0>] ptlrpc_main+0xc96/0x15c0 [ptlrpc]^M
      [11440.979673] [<0>] kthread+0x134/0x150^M
      [11440.980390] [<0>] ret_from_fork+0x35/0x40^M
      [11440.981188] Pid: 342443, comm: ll_ost_io00_018 4.18.0-513.18.1.el8_lustre.x86_64 #1 SMP Thu May 2 05:23:37 UTC 2024^M
      [11440.983091] Call Trace TBD:^M
      [11440.983668] [<0>] jbd2_log_wait_commit+0xac/0x120 [jbd2]^M
      [11440.984686] [<0>] jbd2_journal_stop+0x2d6/0x330 [jbd2]^M
      [11440.985661] [<0>] __ldiskfs_journal_stop+0x36/0xb0 [ldiskfs]^M
      [11440.986770] [<0>] osd_trans_stop+0x357/0xb10 [osd_ldiskfs]^M
      [11440.987836] [<0>] ofd_write_attr_set+0x334/0x1080 [ofd]^M
      [11440.988850] [<0>] ofd_commitrw_write+0x206/0x1ac0 [ofd]^M
      [11440.989867] [<0>] ofd_commitrw+0x5d4/0xd60 [ofd]^M
      [11440.990777] [<0>] obd_commitrw+0x1b6/0x370 [ptlrpc]^M
      [11440.991824] [<0>] tgt_brw_write+0xf00/0x1f30 [ptlrpc]^M
      [11440.992883] [<0>] tgt_request_handle+0x3f4/0x1a20 [ptlrpc]^M
      [11440.994019] [<0>] ptlrpc_server_handle_request+0x3ca/0xbf0 [ptlrpc]^M
      [11440.995272] [<0>] ptlrpc_main+0xc96/0x15c0 [ptlrpc]^M
      [11440.996290] [<0>] kthread+0x134/0x150^M
      [11440.997053] [<0>] ret_from_fork+0x35/0x40^M
      [11440.997858] Lustre: ll_ost_io00_023: service thread pid 342448 was inactive for 42.344 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.^M
      [11605.070033] Autotest: Test running for 190 minutes (lustre-reviews_review-ldiskfs_104641.1)^M
      [12128.200813] Autotest: Test running for 195 minutes (lustre-reviews_review-ldiskfs_104641.1)^M
      [12213.635236] Autotest: Test running for 200 minutes (lustre-reviews_review-ldiskfs_104641.1)^M
      [12536.312751] Autotest: Test running for 205 minutes (lustre-reviews_review-ldiskfs_104641.1)^M
      [12615.028468] Lustre: DEBUG MARKER: sysctl -wq kernel/kptr_restrict=1 || true^M
      [12618.979347] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug_raw_pointers=0 || true^M
      [12621.391252] Lustre: DEBUG MARKER: sysctl --values kernel/kptr_restrict^M
      [12621.722753] Lustre: DEBUG MARKER: sysctl -wq kernel/kptr_restrict=1^M
      [12622.048370] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n debug_raw_pointers^M
      [12622.371640] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug_raw_pointers=Y || true^M

      <<Please provide additional information about the failure here>>

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_116a - test_116a returned 1

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated: