Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker

        Activity

          [LU-1926] Reboots during test runs
          yujian Jian Yu added a comment -

          After running those failed tests manually, they turned out to be Lustre failures:

          Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/17

          parallel-scale-nfsv4 test_compilebench: LU-1928
          parallel-scale-nfsv3 test_compilebench: LU-1928
          parallel-scale test_compilebench: LU-1925
          large-scale test_3a: LU-1927
          performance-sanity test_3: LU-1906, LU-1909, LU-1929
          sanity test_32n: LU-1863
          Console log and syslog missing issue: TT-875

          yujian Jian Yu added a comment -

          Chris, I'm still creating new LU tickets per the manual test runs. After all of the tests are done and the corresponding LU tickets are created, we can close this one.

          The remaining issue for TT is that the Oops on consoles were not gathered, and syslogs on Maloo were empty.
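A minimal sketch of the kind of pre-upload check that would catch this, assuming a hypothetical syslog directory layout; the function name and file pattern are illustrative, not part of autotest:

```shell
# Hedged sketch (not from the ticket): flag zero-length syslog files
# before results are uploaded, so empty logs like the ones seen on
# Maloo are caught early.  Directory layout and naming are assumptions.
check_empty_syslogs() {
    dir=$1
    # find prints each zero-length syslog; any output means some logs
    # were collected empty and the run should be flagged for review
    find "$dir" -name 'syslog*' -type f -size 0 -print
}
```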


          chris Chris Gearing (Inactive) added a comment -

          Just assigning this to you for clarity.

          chris Chris Gearing (Inactive) added a comment -

          From Skype:

          [13/09/2012 08:21:29] Yu Jian

          aha, parallel-scale test compilebench hung in manual run, the MDS is being rebooted
          more logs occur
          ok, Chris, TT-851 is not a test environment issue, it's a Lustre issue; I'll gather logs, file a new ticket and update TT-851
          thank you, Chris
          yujian Jian Yu added a comment -

          Oleg, Bobi, do you have any ideas about whether the above MDS reboot issue is a Lustre issue or a test environment issue? Thanks.
          This is really blocking the Lustre b2_3 testing now.

          yujian Jian Yu added a comment -

          Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/17

          The issue still exists:
          parallel-scale-nfsv4 test_compilebench: https://maloo.whamcloud.com/test_sets/f2b8c2b8-fc85-11e1-a4a6-52540035b04c
          parallel-scale-nfsv3 test_compilebench: https://maloo.whamcloud.com/test_sets/d241d4ca-fc85-11e1-a4a6-52540035b04c
          parallel-scale test_compilebench: https://maloo.whamcloud.com/test_sets/3b4a8f4e-fc85-11e1-a4a6-52540035b04c
          large-scale test_3a: https://maloo.whamcloud.com/test_sets/733bf24e-fc85-11e1-a4a6-52540035b04c

          BTW, all of the syslogs in the above reports are empty. I checked the syslogs from the brent node but still found nothing useful for debugging.

          However, compared with the results on b2_3 build #16: although performance-sanity test_3 and sanity test_32n also hit the MDS reboot issue, there are error messages on the MDS console logs on build #17 (no such messages on build #16); please refer to LU-1906, LU-1909 and LU-1863.

          So, I'm not sure whether the above parallel-scale* and large-scale failures were caused by Lustre issues, since there were no specific error messages in their logs.

          yujian Jian Yu added a comment -

          BTW, autotest did not gather syslogs for the above reports; I had to retrieve the syslogs from brent:/scratch/logs/syslog.
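A minimal sketch of how such a manually retrieved syslog copy could be scanned for crash signatures; the function name and the exact patterns are illustrative assumptions, not taken from the ticket:

```shell
# Hedged sketch: scan a syslog copy (e.g. one fetched from
# brent:/scratch/logs/syslog) for kernel crash signatures that would
# explain an otherwise unlogged MDS reboot.
scan_crash_signatures() {
    log=$1
    # grep -n keeps line numbers so hits can be matched back to the
    # surrounding timestamps in the log
    grep -nE 'Oops|LBUG|LustreError|Kernel panic' "$log"
}
```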


          People

            yujian Jian Yu
            pjones Peter Jones
            Votes: 0
            Watchers: 4