replay-ost-single test_3: rpc : FAIL: can't put import for osc.lustre-OST0000-osc-*.ost_server_uuid into FULL state after 662 sec, have REPLAY_LOCKS

Details

    • Bug
    • Resolution: Low Priority
    • Minor
    • None
    • Lustre 2.4.0
    • None
    • 3
    • 7728

    Description

      This issue was created by maloo for Li Wei <liwei@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/cb34d710-a15b-11e2-b429-52540035b04c.

      The sub-test test_3 failed with the following error:

      test failed to respond and timed out

      Info required for matching: replay-ost-single 3

      From the test log:

      == replay-ost-single test 3: Fail OST during write, with verification == 10:40:41 (1365529241)
      Failing ost1 on wtm-13vm8
      CMD: wtm-13vm8 grep -c /mnt/ost1' ' /proc/mounts
      Stopping /mnt/ost1 (opts on wtm-13vm8
      CMD: wtm-13vm8 umount -d /mnt/ost1
      CMD: wtm-13vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      reboot facets: ost1
      Failover ost1 to wtm-13vm8
      10:41:01 (1365529261) waiting for wtm-13vm8 network 900 secs ...
      10:41:01 (1365529261) network interface is UP
      CMD: wtm-13vm8 hostname
      mount facets: ost1
      CMD: wtm-13vm8 test -b /dev/lvm-OSS/P1
      Starting ost1: /dev/lvm-OSS/P1 /mnt/ost1
      CMD: wtm-13vm8 mkdir -p /mnt/ost1; mount -t lustre /dev/lvm-OSS/P1 /mnt/ost1
      CMD: wtm-13vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh set_default_debug \"0x33f0404\" \" 0xffb7e3ff\" 32
      CMD: wtm-13vm8 e2label /dev/lvm-OSS/P1 2>/dev/null
      Started lustre-OST0000
      CMD: wtm-13vm1,wtm-13vm2.rosso.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh wait_import_state_mount FULL osc.lustre-OST0000-osc-*.ost_server_uuid
      wtm-13vm2: CMD: wtm-13vm2.rosso.whamcloud.com lctl get_param -n at_max
      wtm-13vm1: CMD: wtm-13vm1.rosso.whamcloud.com lctl get_param -n at_max
      wtm-13vm1: rpc : @@@@@@ FAIL: can't put import for osc.lustre-OST0000-osc-*.ost_server_uuid into FULL state after 662 sec, have REPLAY_WAIT
      wtm-13vm1: Trace dump:
      wtm-13vm1: = /usr/lib64/lustre/tests/test-framework.sh:4024:error_noexit()
      wtm-13vm1: = /usr/lib64/lustre/tests/test-framework.sh:4047:error()
      wtm-13vm1: = /usr/lib64/lustre/tests/test-framework.sh:5070:_wait_import_state()
      wtm-13vm1: = /usr/lib64/lustre/tests/test-framework.sh:5089:wait_import_state()
      wtm-13vm1: = /usr/lib64/lustre/tests/test-framework.sh:5098:wait_import_state_mount()
      wtm-13vm1: = rpc.sh:20:main()
      [...]

      From the OSS console log:

      [...]
      10:52:20:Lustre: lustre-OST0000: Client e2129d7f-62b9-b16f-bf66-e1b35c1765b7 (at 10.10.16.142@tcp) reconnecting, waiting for 3 clients in recovery for 0:41
      10:52:20:Lustre: Skipped 78214 previous similar messages
      10:52:20:LustreError: 5789:0:(ldlm_resource.c:1171:ldlm_resource_get()) lvbo_init failed for resource 2820: rc -2
      10:52:20:LustreError: 5789:0:(ldlm_resource.c:1171:ldlm_resource_get()) Skipped 544404 previous similar messages
      10:52:20:Lustre: lustre-OST0000: Client e2129d7f-62b9-b16f-bf66-e1b35c1765b7 (at 10.10.16.142@tcp) reconnecting, waiting for 3 clients in recovery for 0:42
      10:52:20:Lustre: Skipped 160009 previous similar messages
      10:52:20:Lustre: DEBUG MARKER: /usr/sbin/lctl mark rpc : @@@@@@ FAIL: can\'t put import for osc.lustre-OST0000-osc-*.ost_server_uuid into FULL state after 662 sec, have REPLAY_WAIT
      [...]
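
      The failing step in the test log is a bounded poll: the client keeps reading the import state for osc.lustre-OST0000-osc-*.ost_server_uuid until it reports FULL or the wait limit expires. The following is a minimal sketch of that kind of loop, not the actual test-framework.sh code. The names get_states and wait_for_import_state are hypothetical, and the sketch assumes each line printed by lctl get_param -n for this parameter has the form "<uuid> <state>".

```shell
#!/bin/sh
# Hypothetical helper: print the state column for every import matching $1.
# Assumption: lctl prints one "<uuid> <state>" line per matching import.
get_states() {
    lctl get_param -n "$1" 2>/dev/null | awk '{print $2}'
}

# Hypothetical helper: wait until all matching imports report the expected state.
# Usage: wait_for_import_state EXPECTED PARAM [MAX_WAIT_SEC] [INTERVAL_SEC]
wait_for_import_state() {
    local expected=$1
    local param=$2
    local max_wait=${3:-60}
    local interval=${4:-1}
    local waited=0
    local states

    while :; do
        # Collapse duplicate states so several imports in FULL compare as "FULL".
        states=$(get_states "$param" | sort -u | tr '\n' ' ')
        states=${states% }

        if [ "$states" = "$expected" ]; then
            echo "$param in $expected state after $waited sec"
            return 0
        fi

        if [ "$waited" -ge "$max_wait" ]; then
            echo "can't put import for $param into $expected state" \
                 "after $waited sec, have ${states:-none}" >&2
            return 1
        fi

        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

      On a client this could be invoked as: wait_for_import_state FULL "osc.lustre-OST0000-osc-*.ost_server_uuid" 662 1. A non-zero return with "have REPLAY_WAIT" would correspond to the failure reported above, where the import never left recovery within the wait limit.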

            People

              wc-triage WC Triage
              maloo Maloo