Lustre / LU-14371

replay-single test 65b fails with 'No early reply'

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.14.0
    • Labels: None
    • Environment: (So far only) DNE
    • Severity: 3

    Description

      replay-single test_65b fails with 'No early reply'. This test fails frequently in PPC64 client testing. Setting the PPC failures aside, it has also been failing for DNE configurations since 16 Feb 2020, starting with https://testing.whamcloud.com/test_sets/9766a818-51c7-11ea-a90e-52540065bddc. In all of these cases, replay-single tests 0c and 0d fail with 'mount fails', and a test after 65b then hangs.

      The test suite log for a recent failure, at https://testing.whamcloud.com/test_sets/2efe802a-db0c-4424-af23-e1a7e8ce99a4, shows the following test output:

      == replay-single test 65b: AT: verify early replies on packed reply / bulk =========================== 02:08:17 (1611540497)
      CMD: trevis-63vm4 lctl get_param -n at_max
      CMD: trevis-63vm1.trevis.whamcloud.com lctl get_param -n at_max
      CMD: trevis-63vm3 lctl get_param -n at_max
      CMD: trevis-63vm4 lctl get_param -n at_max
      CMD: trevis-63vm1.trevis.whamcloud.com lctl get_param -n at_max
      CMD: trevis-63vm3 lctl get_param -n at_max
      CMD: trevis-63vm4 lctl get_param -n at_history
      CMD: trevis-63vm4 lctl set_param at_history=8
      at_history=8
      CMD: trevis-63vm3 lctl set_param at_history=8
      at_history=8
      CMD: trevis-63vm4 /usr/sbin/lctl get_param -n debug
      debug=other trace
      CMD: trevis-63vm3 lctl set_param fail_val=6
      fail_val=6
      CMD: trevis-63vm3 /usr/sbin/lctl set_param fail_loc=0x224
      fail_loc=0x224
      CMD: trevis-63vm3 /usr/sbin/lctl set_param fail_loc=0
      fail_loc=0
       replay-single test_65b: @@@@@@ FAIL: No early reply 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
        = /usr/lib64/lustre/tests/replay-single.sh:1864:test_65b()
      

      There are no errors in the console/dmesg logs.
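
      When triaging several logs like the one quoted above, it can help to reduce each log to its fault-injection settings and its failure message. The following is a minimal sketch, not part of the Lustre test framework: the function name summarize_log is hypothetical, and it assumes the log uses the same "fail_loc=", "fail_val=" and "@@@@@@ FAIL:" line shapes seen in this report.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper (an assumption, not a Lustre tool):
# reads a test log on stdin and prints the fail_loc/fail_val values
# echoed back by lctl, plus any "@@@@@@ FAIL:" message.
summarize_log() {
    awk '
        # Lines such as "fail_loc=0x224" are the values echoed after set_param.
        /^[ \t]*fail_(loc|val)=/ {
            sub(/^[ \t]+/, "")
            print "setting: " $0
        }
        # Failure marker as it appears in the quoted test output.
        /@@@@@@ FAIL:/ {
            sub(/^.*@@@@@@ FAIL:[ \t]*/, "")
            sub(/[ \t]+$/, "")
            print "failure: " $0
        }
    '
}

# Example: feed an excerpt of the output from this report.
summarize_log <<'EOF'
      CMD: trevis-63vm3 lctl set_param fail_val=6
      fail_val=6
      CMD: trevis-63vm3 /usr/sbin/lctl set_param fail_loc=0x224
      fail_loc=0x224
      CMD: trevis-63vm3 /usr/sbin/lctl set_param fail_loc=0
      fail_loc=0
       replay-single test_65b: @@@@@@ FAIL: No early reply 
EOF
```

      The "CMD:" lines are skipped on purpose, so each setting is reported once, from the value lctl echoed back rather than from the command that requested it.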

      This issue may be related to LU-9566 (replay-single test_65a: @@@@@@ FAIL: No early reply).

      Logs for other failures are at:
      https://testing.whamcloud.com/test_sets/6b7431ba-6424-408d-a379-e5fb422df642
      https://testing.whamcloud.com/test_sets/fee3e68e-26e9-4a53-a83e-7d8ea9ad4bd1
      https://testing.whamcloud.com/test_sets/d097765d-c4f1-4070-9c42-f1a1b1f7b258

            People

              Assignee: wc-triage WC Triage
              Reporter: jamesanunez James Nunez (Inactive)