[LU-11206] recovery-mds-scale test failover_ost fails with 'import is not in FULL state'

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.1
    • Affects Version/s: Lustre 2.12.0
    • Labels: None
    • Severity: 3

    Description

      recovery-mds-scale test_failover_ost fails with an OST import in the IDLE state when it should be in the FULL state. The test_log shows that the OSTs fail over successfully 55 times; on the 56th failover, one OST import is in the IDLE state:

      trevis-4vm8: trevis-4vm8.trevis.whamcloud.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-ffff*.ost_server_uuid,osc.lustre-OST0001-osc-ffff*.ost_server_uuid,osc.lustre-OST0002-osc-ffff*.ost_server_uuid,osc.lustre-OST0003-osc-ffff*.ost_server_uuid,osc.lustre-OST0004-osc-ffff*.ost_server_uuid,osc.lustre-OST0005-osc-ffff*.ost_server_uuid,osc.lustre-OST0006-osc-ffff*.ost_server_uuid
      trevis-4vm7: trevis-4vm7.trevis.whamcloud.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-ffff*.ost_server_uuid,osc.lustre-OST0001-osc-ffff*.ost_server_uuid,osc.lustre-OST0002-osc-ffff*.ost_server_uuid,osc.lustre-OST0003-osc-ffff*.ost_server_uuid,osc.lustre-OST0004-osc-ffff*.ost_server_uuid,osc.lustre-OST0005-osc-ffff*.ost_server_uuid,osc.lustre-OST0006-osc-ffff*.ost_server_uuid
      trevis-4vm8: CMD: trevis-4vm8.trevis.whamcloud.com lctl get_param -n at_max
      trevis-4vm7: CMD: trevis-4vm7.trevis.whamcloud.com lctl get_param -n at_max
      trevis-4vm8: osc.lustre-OST0000-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-4vm7: osc.lustre-OST0000-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-4vm8: osc.lustre-OST0001-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-4vm7: osc.lustre-OST0001-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-4vm8: osc.lustre-OST0002-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-4vm7: osc.lustre-OST0002-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-4vm8: osc.lustre-OST0003-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-4vm7: osc.lustre-OST0003-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-4vm7: osc.lustre-OST0004-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-4vm7: osc.lustre-OST0005-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-4vm7: osc.lustre-OST0006-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-4vm8:  rpc : @@@@@@ FAIL: can't put import for osc.lustre-OST0004-osc-ffff*.ost_server_uuid into FULL state after 1475 sec, have IDLE 
      trevis-4vm8:   Trace dump:
      trevis-4vm8:   = /usr/lib64/lustre/tests/test-framework.sh:5742:error()
      trevis-4vm8:   = /usr/lib64/lustre/tests/test-framework.sh:6846:_wait_import_state()
      trevis-4vm8:   = /usr/lib64/lustre/tests/test-framework.sh:6868:wait_import_state()
      trevis-4vm8:   = /usr/lib64/lustre/tests/test-framework.sh:6877:wait_import_state_mount()
      trevis-4vm8:   = rpc.sh:21:main()
      

      I can’t find any stack traces or any Lustre errors that would explain what the issue is here.

      So far, we've only seen this for failover testing and only with ZFS.
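When scanning a long test_log for this failure, it can help to extract just the imports that never reached the expected state. The following is a minimal sketch, assuming the failure line always has the same shape as the `FAIL: can't put import for ...` line in the log above; `failed_imports` is a hypothetical helper name and is not part of test-framework.sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of test-framework.sh): read a test log on
# stdin and print one summary line per import that failed to reach the
# expected state. Assumes the message format seen in this ticket:
#   FAIL: can't put import for <import> into <WANTED> state after <N> sec, have <STATE>
failed_imports() {
	sed -n "s/.*FAIL: can't put import for \(.*\) into \([A-Z]*\) state after \([0-9]*\) sec, have \([A-Z]*\).*/\1 wanted=\2 have=\4 waited=\3s/p"
}
```

For example, `failed_imports < test_log` would print one line per failing import and nothing for lines that report success.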

      Logs for this failure are at
      https://testing.whamcloud.com/test_sets/2f7df696-91c2-11e8-8ee3-52540065bddc

      There are two older test sessions that fail in a similar way, but they have no successful OST failovers before failing with this error:
      https://testing.whamcloud.com/test_sets/452cb77a-8da8-11e8-a9f7-52540065bddc
      https://testing.whamcloud.com/test_sets/ebaddda0-8ba9-11e8-b0aa-52540065bddc

          Activity

            Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34471/
            Subject: LU-11206 tests: Use import_ready to check IDLE
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 54f4c4341b8330468ad95abd2bf4051a3d17abba

            Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34471
            Subject: LU-11206 tests: Use import_ready to check IDLE
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 0e5ede9852562b67d0f23d2264cfa78cc3cbfc97
            Peter Jones added a comment -

            Landed for 2.13

            Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34225/
            Subject: LU-11206 tests: Use import_ready to check IDLE
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3ed6b8c2ea27b1a3a9fa073e19d77d7c317ae69f

            Gerrit Updater added a comment -

            Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34225
            Subject: LU-11206 tests: Use import_ready to check IDLE
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 31bf8c928d6f45895c6ba2f02f4b857087aabd18
            Patrick Farrell (Inactive) added a comment -

            Master: https://testing.whamcloud.com/test_sessions/2e75b03b-1a6a-4ff8-89a8-bdfe0336a0c2

            Jian Yu added a comment -

            conf-sanity test 93 failed with the same issue in DNE ZFS test sessions on the master branch:
            https://testing.whamcloud.com/test_sets/fa748b22-c0f6-11e8-b143-52540065bddc
            https://testing.whamcloud.com/test_sets/65ac0bcc-bb70-11e8-9df3-52540065bddc
            https://testing.whamcloud.com/test_sets/1571b9bc-bb99-11e8-9df3-52540065bddc
            https://testing.whamcloud.com/test_sets/3a4fa378-bcea-11e8-a9d9-52540065bddc

            It's affecting patch testing.
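The patch subject, "Use import_ready to check IDLE", suggests that the test should treat an IDLE import as acceptable where it previously required FULL. The following is a minimal sketch of that idea only. It is not the code from the merged patches: `import_state_ready` is a hypothetical function name, and the assumption that IDLE is an acceptable substitute only for FULL is inferred from the failure message in this ticket.

```shell
#!/bin/bash
# Hypothetical helper, NOT the actual test-framework.sh implementation.
# Returns 0 if the observed import state satisfies the expected state.
# Assumption: when FULL is expected, IDLE also counts as ready, since the
# failure in this ticket was an import sitting in IDLE instead of FULL.
import_state_ready() {
	local expected=$1
	local actual=$2

	if [[ "$expected" == "FULL" ]]; then
		# Accept either a fully connected or an idle import.
		[[ "$actual" == "FULL" || "$actual" == "IDLE" ]]
	else
		# Any other expected state must match exactly.
		[[ "$actual" == "$expected" ]]
	fi
}
```

With this check, `import_state_ready FULL IDLE` succeeds, so a wait loop built on it would not time out on the OST0004 import shown in the description, while `import_state_ready FULL DISCONN` still fails.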

            People

              Assignee: Patrick Farrell (Inactive)
              Reporter: James Nunez (Inactive)
              Votes: 0
              Watchers: 6
