[LU-3749] Failure on test suite replay-dual test_8: test_8 failed with 2

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.5.0
    • Affects Version/s: Lustre 2.5.0
    • Labels: None
    • Environment: server and client: tag-2.4.90 RHEL6
    • Severity: 3
    • Rank: 9672

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/fd1709ea-02c9-11e3-b384-52540035b04c.

      The sub-test test_8 failed with the following error:

      test_8 failed with 2

      MDS console:

      22:33:28:Lustre: DEBUG MARKER: == replay-dual test 8: replay of resent request == 22:33:24 (1376199204)
      22:33:28:Lustre: DEBUG MARKER: sync; sync; sync
      22:33:28:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
      22:33:28:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
      22:33:28:LustreError: 14713:0:(osd_handler.c:1194:osd_ro()) *** setting lustre-MDT0000 read-only ***
      22:33:28:LustreError: 14713:0:(osd_handler.c:1194:osd_ro()) Skipped 3 previous similar messages
      22:33:28:Turning device dm-0 (0xfd00000) read-only
      22:33:28:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
      22:33:28:Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
      22:33:28:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x119
      22:33:29:Lustre: *** cfs_fail_loc=119, val=2147483648***
      22:33:29:LustreError: 14098:0:(ldlm_lib.c:2409:target_send_reply_msg()) @@@ dropping reply  req@ffff88005c311c00 x1443048921768268/t498216206346(0) o36->c0662422-4c9b-ce8c-77e9-eb8caf0a37b9@10.10.4.206@tcp:0/0 lens 496/448 e 0 to 0 dl 1376199231 ref 1 fl Interpret:/0/0 rc 0/0
      

          Activity

            pjones Peter Jones added a comment - Landed for 2.5.0

            tappro Mikhail Pershin added a comment - http://review.whamcloud.com/7786 - patch to fix that problem

            tappro Mikhail Pershin added a comment - yes, I am looking at it

            adilger Andreas Dilger added a comment - Mike, any chance to look at this?

            jlevi Jodi Levi (Inactive) added a comment - Mike, Would you be able to comment on this one? Thank you!
            green Oleg Drokin added a comment - Keith, it's a file data version mismatch.

            keith Keith Mannthey (Inactive) added a comment -

            The client says this:

            22:34:35:LustreError: 166-1: MGC10.10.4.208@tcp: Connection to MGS (at 10.10.4.208@tcp) was lost; in progress operations using this service will fail
            22:34:35:Lustre: Evicted from MGS (at 10.10.4.208@tcp) after server handle changed from 0x2a56bfb73f955d8a to 0x2a56bfb73f95659b
            22:35:07:Lustre: 19509:0:(client.c:2652:ptlrpc_replay_interpret()) @@@ Version mismatch during replay
            22:35:07:  req@ffff88007c91a800 x1443048921768268/t498216206346(498216206346) o36->lustre-MDT0000-mdc-ffff880062d92800@10.10.4.208@tcp:12/10 lens 496/416 e 0 to 0 dl 1376199343 ref 2 fl Interpret:R/4/0 rc -75/-75
            22:35:07:Lustre: lustre-MDT0000-mdc-ffff880065203400: Connection restored to lustre-MDT0000 (at 10.10.4.208@tcp)
            22:35:07:Lustre: Skipped 6 previous similar messages
            22:35:48:Lustre: 19509:0:(import.c:1209:completed_replay_interpret()) lustre-MDT0000-mdc-ffff880062d92800: version recovery fails, reconnecting

            I wonder why we get a "Version mismatch during replay". Client and server appear to be the same version.
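For anyone comparing the MDS console log in the description with the client log above, here is a minimal sketch (not Lustre code) that pulls the xid, transno, opcode and rc out of the two quoted request lines. The field layout in the regular expression is inferred from those two lines only, and the names `parse_req`, `split_transno` and `LINUX_ERRNO` are hypothetical helpers introduced for illustration. Reading the upper 32 bits of the transno as a separate counter is an assumption about the number's layout, not something the logs state.

```python
import re

# Layout inferred from the two request lines quoted in this ticket
# (an assumption, not a documented Lustre log format):
#   x<xid>/t<transno>(<second transno>) o<opcode>->... rc <rc>/<rc>
REQ_RE = re.compile(
    r"x(?P<xid>\d+)/t(?P<transno>\d+)\((?P<transno2>\d+)\)\s+"
    r"o(?P<opc>\d+)->.*?\brc (?P<rc>-?\d+)/(?P<rc2>-?\d+)"
)

# On Linux, errno 75 is EOVERFLOW; hard-coded so the sketch does not
# depend on the errno table of the machine it runs on.
LINUX_ERRNO = {75: "EOVERFLOW"}


def parse_req(line):
    """Return the numeric fields of a request line, or None if no match."""
    m = REQ_RE.search(line)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


def split_transno(transno):
    """Split a 64-bit transno into (upper 32 bits, lower 32 bits)."""
    return transno >> 32, transno & 0xFFFFFFFF


# Request line from the MDS console (reply dropped by fail_loc=0x119).
MDS_LINE = (
    "req@ffff88005c311c00 x1443048921768268/t498216206346(0) "
    "o36->c0662422-4c9b-ce8c-77e9-eb8caf0a37b9@10.10.4.206@tcp:0/0 "
    "lens 496/448 e 0 to 0 dl 1376199231 ref 1 fl Interpret:/0/0 rc 0/0"
)

# Request line from the client ("Version mismatch during replay").
CLIENT_LINE = (
    "req@ffff88007c91a800 x1443048921768268/t498216206346(498216206346) "
    "o36->lustre-MDT0000-mdc-ffff880062d92800@10.10.4.208@tcp:12/10 "
    "lens 496/416 e 0 to 0 dl 1376199343 ref 2 fl Interpret:R/4/0 rc -75/-75"
)

mds = parse_req(MDS_LINE)
client = parse_req(CLIENT_LINE)

# Same xid and transno on both sides means both logs show one request.
same_request = (mds["xid"], mds["transno"]) == (client["xid"], client["transno"])
print("same request on MDS and client:", same_request)
print("client rc:", client["rc"], LINUX_ERRNO.get(-client["rc"], "unknown"))
print("transno split:", split_transno(client["transno"]))
```

Run against the two lines from this ticket, the sketch reports that the xid and transno match, so the request whose reply the MDS dropped is the same one that later fails replay on the client. The client rc of -75 corresponds to EOVERFLOW on Linux, and the transno splits into (116, 10).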
            sarah Sarah Liu added a comment - SLES11 SP2 client also hit this issue: https://maloo.whamcloud.com/test_sets/85b954d8-029d-11e3-b384-52540035b04c

            People

              Assignee: tappro Mikhail Pershin
              Reporter: maloo Maloo