Lustre / LU-4724

Test failure on test suite sanity-hsm, subtest test_71

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Lustre 2.6.0, Lustre 2.5.1
    • Lustre 2.6.0, Lustre 2.5.1, Lustre 2.12.3
    • 3
    • 12983

    Description

      This issue was created by maloo for John Hammond <john.hammond@intel.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/e688c5ae-a550-11e3-9e53-52540035b04c.

      The sub-test test_71 failed with the following error:

      Copytool sent malformed event: {"event_time": "2014-03-06 03:11:53 -0800", "event_type": "LOGGED_MESSAGE", "level": "INFO", "message": "lhsmtool_posix[8611]: waiting for message from kernel"

      Info required for matching: sanity-hsm 71

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.5.1 and 2.6

            bogl Bob Glossman (Inactive) added a comment - I see that Michael already did a back port. Will abandon mine. Sorry for the confusion.
            bogl Bob Glossman (Inactive) added a comment - backport to b2_5: http://review.whamcloud.com/9579
            yujian Jian Yu added a comment -

            Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/40/ (2.5.1 RC2)
            Distro/Arch: RHEL6.5/x86_64

            sanity-hsm test 71 hit the same failure:
            https://maloo.whamcloud.com/test_sets/f3e026c4-a687-11e3-9d0d-52540035b04c

            This is a regression in Lustre 2.5.1 RC2, introduced by the patch http://review.whamcloud.com/9512 for LU-4020.

            mjmac Michael MacDonald (Inactive) added a comment - jhammond : I've added you as a reviewer. Please take a look when you have a chance.

            mjmac Michael MacDonald (Inactive) added a comment - Pushed http://review.whamcloud.com/9553 for review. Soaked this locally for 64 runs, zero failures. Without it, was getting failures about 33% of the time. Lesson learned.

            mjmac Michael MacDonald (Inactive) added a comment - I've been able to reproduce this locally by running test_71 in a loop. Will implement a fix and soak it for a while this morning before pushing for review.

            mjmac Michael MacDonald (Inactive) added a comment - Looking into this.
            jhammond John Hammond added a comment (edited) -

            Writing to the event FIFO uses fprintf() with an unbuffered stream, and there is no synchronization around calls to llapi_hsm_write_json_event() that I can see. Since the copytool (CT) is multithreaded, expect more such malformed events.
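
The race described in the comment above can be illustrated with a minimal C sketch. It shows one general way to keep events from different threads from interleaving: format the whole event into a private buffer, then write it out while holding a mutex. This is an assumption-laden illustration, not the change pushed in review 9553. The names event_lock and emit_event_line() are hypothetical and are not part of liblustreapi, and the 4096-byte buffer size is an arbitrary choice.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical lock guarding the event descriptor (not a liblustreapi name). */
static pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper: format one complete event into a private buffer,
 * append a newline, and write it to fd while holding event_lock, so that
 * events emitted by different threads cannot interleave.
 * Returns 0 on success, -1 on error.
 */
static int emit_event_line(int fd, const char *fmt, ...)
{
	char buf[4096];		/* arbitrary size chosen for this sketch */
	va_list ap;
	int len;
	size_t off = 0;
	int rc = 0;

	assert(fmt != NULL);

	/* Reserve one byte for the trailing newline. */
	va_start(ap, fmt);
	len = vsnprintf(buf, sizeof(buf) - 1, fmt, ap);
	va_end(ap);
	if (len < 0 || (size_t)len >= sizeof(buf) - 1)
		return -1;	/* formatting error or event too large */
	buf[len++] = '\n';	/* one event per line */

	pthread_mutex_lock(&event_lock);
	while (off < (size_t)len) {
		ssize_t n = write(fd, buf + off, (size_t)len - off);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			rc = -1;
			break;
		}
		off += (size_t)n;	/* handle partial writes */
	}
	pthread_mutex_unlock(&event_lock);

	return rc;
}
```

A caller would pass the descriptor of the opened event FIFO and a complete JSON object as the format string. POSIX already makes a single write() of at most PIPE_BUF bytes to a FIFO atomic; the mutex in this sketch covers larger events and the partial-write retry loop.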

            People

              Assignee: mjmac Michael MacDonald (Inactive)
              Reporter: maloo Maloo