Test failure on test suite sanityn: LBUG/LASSERT detected

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.1.0
    • Lustre 2.1.0
    • None
    • 3
    • 6590

    Description

      This issue was created by maloo for yujian <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/8771913e-a451-11e0-b786-52540025f9af.

      The sub-test test_36 failed with the following error:

      LBUG/LASSERT detected

        Activity

          [LU-480] Test failure on test suite sanityn: LBUG/LASSERT detected
          pjones Peter Jones added a comment -

          Closing for now. Reopen if this reoccurs with the change in place


          chris Chris Gearing (Inactive) added a comment -

          OK, I've changed the pdsh in mecturk; let's see if it stops failing.

          yujian Jian Yu added a comment -

          Per this provisioning log: https://maloo.whamcloud.com/test_logs/bcb00110-a450-11e0-b786-52540025f9af/show_text

          12:00:25:(/home/autotest/lab/tools/run_test.sh:55): 
          12:00:25:cat /home/autotest/lab/tools/mecturk.sh
          12:00:25:((/home/autotest/lab/tools/run_test.sh:57): 
          12:00:25:hostname -s
          12:00:25:(/home/autotest/lab/tools/run_test.sh:57): 
          12:00:25:CPYLINE='scp client-21vm1:/root/autotest_config.sh /usr/lib64/lustre/tests/cfg/'
          12:00:27:(/home/autotest/lab/tools/run_test.sh:58): 
          12:00:27:pdsh -w client-21vm1,client-21vm2 scp client-21vm1:/root/autotest_config.sh /usr/lib64/lustre/tests/cfg/
          

          The PDSH variable was set in /home/autotest/lab/tools/mecturk.sh:

          $ grep PDSH /home/autotest/lab/tools/mecturk.sh 
          PDSH="pdsh -S -Rrsh -w"
          

          Chris, could you please change it to "pdsh -t 120 -S -Rrsh -w" to see whether the upcoming autotest results hit those timeout issues or not?
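The requested change can be sketched as a one-line edit to the PDSH assignment. This is only an illustration under stated assumptions: it works on a scratch copy created with mktemp rather than the real /home/autotest/lab/tools/mecturk.sh, and it assumes the assignment has exactly the form shown in the grep output above.

```shell
# Scratch copy standing in for mecturk.sh (hypothetical; the real file is not touched).
cfg=$(mktemp)
printf '%s\n' 'PDSH="pdsh -S -Rrsh -w"' > "$cfg"

# Insert "-t 120" right after the command name, but only if no -t option is set yet.
if ! grep -q '^PDSH="pdsh .*-t ' "$cfg"; then
    sed 's/^PDSH="pdsh /PDSH="pdsh -t 120 /' "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"
fi

# Show the resulting assignment.
updated=$(grep '^PDSH=' "$cfg")
echo "$updated"
```

Keeping -w as the last word of the variable matters, because the host list is appended directly after it when the variable is expanded.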

          bobijam Zhenyu Xu added a comment -

          Those errors were due to the pdsh command timing out, and the pdsh command, IMO, is set when we call auster.sh.

          Yu Jian points out that there is an autotest_config.sh or similar script, which she saw in

          Provisioning-1.Provision_1.autotest.client-21vm1.log:
          pdsh -w client-21vm1,client-21vm2 scp client-21vm1:/root/autotest_config.sh /usr/lib64/lustre/tests/cfg/

          I suggest we add the "-t 120" option to pdsh, like "pdsh -w -t 120 xxxxxxxxxxxxxxxx", in it. By default pdsh uses a 10-second timeout; -t 120 extends it to 120 seconds.
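One detail about option placement is worth a quick illustration: -w takes the host list as its argument, so the order of -t 120 relative to -w matters. The function below is a toy parser built on POSIX getopts, not pdsh itself; its option string and the name parse_opts are assumptions made for the sketch, and it only presumes that pdsh follows the usual getopt convention for option arguments.

```shell
# Toy stand-in for an option parser. This is NOT pdsh, only POSIX getopts
# with a similar option string, to show how "-w" consumes the next word.
parse_opts() {
    OPTIND=1
    hosts=""
    timeout=""
    while getopts "t:w:SR:" opt; do
        case "$opt" in
            t) timeout=$OPTARG ;;
            w) hosts=$OPTARG ;;
            S|R) ;;              # accepted but ignored in this sketch
            *) return 1 ;;
        esac
    done
    echo "hosts=$hosts timeout=$timeout"
}

# "-t" immediately after "-w" is swallowed as the host list.
parse_opts -w -t 120 client-21vm1,client-21vm2

# With "-t 120" placed before "-w", both values are parsed as intended.
parse_opts -t 120 -w client-21vm1,client-21vm2
```

Under that convention, the form later requested in this thread, "pdsh -t 120 -S -Rrsh -w", is the safe ordering.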


          chris Chris Gearing (Inactive) added a comment -

          You'll have to give me a bit more info to enable me to understand that question. Which pdsh command, and when? If you are asking about the pdsh command used for test 41g, then that will be whatever the standard scripts create. The test environment will not change or customize it.

          What other output could help you?

          bobijam Zhenyu Xu added a comment -

          Chris,

          What is the pdsh full command being used in the autotest?

          yujian Jian Yu added a comment -

          The issue occurred on many sanityn sub-tests:

          https://maloo.whamcloud.com/test_sets/419609a4-a4fd-11e0-b786-52540025f9af
          https://maloo.whamcloud.com/test_sets/bda31ee8-a204-11e0-aee5-52540025f9af
          https://maloo.whamcloud.com/test_sets/3791c1d0-a1df-11e0-aee5-52540025f9af

          People

            bobijam Zhenyu Xu
            maloo Maloo