
replay-single: test 135 Error: 'import is not in REPLAY_LOCKS state'

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • Lustre 2.17.0
    • Lustre 2.16.0

    Description

      This issue was created by maloo for James Simmons <uja.ornl@gmail.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c43792eb-a655-4405-8720-f1b31aee6d88

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/104489 - 4.18.0-513.5.1.el8_9.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/104489 - 4.18.0-513.18.1.el8_lustre.x86_64


      Started lustre-OST0000
      CMD: trevis-41vm1.trevis.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/opt/iozone/bin:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config TESTLOG_PREFIX=/autotest/autotest-2/2024-04-29/lustre-reviews_review-zfs_104489_2_4b20e343-e437-4963-b984-b19ca92bce9e//replay-single TESTNAME=test_135 bash rpc.sh wait_import_state_mount REPLAY_LOCKS osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
      End of sync
      trevis-41vm1.trevis.whamcloud.com: executing wait_import_state_mount REPLAY_LOCKS osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
      CMD: trevis-41vm1.trevis.whamcloud.com lctl get_param -n at_max
      rpc test_135: @@@@@@ FAIL: can't put import for osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid into REPLAY_LOCKS state after 1475 sec, have IDLE
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:7011:error()
      = /usr/lib64/lustre/tests/test-framework.sh:8433:_wait_import_state()
      = /usr/lib64/lustre/tests/test-framework.sh:8455:wait_import_state()
      = /usr/lib64/lustre/tests/test-framework.sh:8465:wait_import_state_mount()
      = rpc.sh:20:main()
      CMD: trevis-41vm1.trevis.whamcloud.com,trevis-41vm2,trevis-41vm3,trevis-41vm6 /usr/sbin/lctl dk > /autotest/autotest-2/2024-04-29/lustre-reviews_review-zfs_104489_2_4b20e343-e437-4963-b984-b19ca92bce9e//rpc.test_135.debug_log.$(hostname -s).1714424452.log;
      dmesg > /autotest/autotest-2/2024-04-29/lustre-reviews_review-zfs_104489_2_4b20e343-e437-4963-b984-b19ca92bce9e//rpc.test_135.dmesg.$(hostname -s).1714424452.log
      trevis-41vm1.trevis.whamcloud.com: Dumping lctl log to /autotest/autotest-2/2024-04-29/lustre-reviews_review-zfs_104489_2_4b20e343-e437-4963-b984-b19ca92bce9e//rpc.test_135.*.1714424452.log
      replay-single test_135: @@@@@@ FAIL: import is not in REPLAY_LOCKS state

          Activity

            [LU-17792] replay-single: test 135 Error: 'import is not in REPLAY_LOCKS state'
            gerrit Gerrit Updater added a comment - - edited

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56935
            Subject: LU-17792 tests: fix replay-single 135 "not in REPLAY_LOCKS"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 16e27bad8f43321f753697065492bea87c237705

            eaujames Etienne Aujames added a comment - - edited

            adilger, I tried to debug this some months ago, but I was not able to reproduce it.

            "import is not in REPLAY_LOCKS state" seems to occur more often on ZFS, but I ran the test more than 100 times on ZFS without any failure.
            So these failures are more complex and may have multiple causes. They could be caused by something in the environment and/or by context inherited from the previous tests.

            The debug logs from the failure cases seem to indicate that we don't enter the REPLAY_LOCKS state on the client.
            If we are not able to find a reliable way to get the server into "REPLAY_LOCKS", we can just skip the test if we don't enter this state.

            I will push a debug patch to get more debug information.
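
            The "skip the test if we don't enter this state" idea could be sketched roughly as below. This is only an illustration, not the content of the patch: wait_state_or_timeout and osc_import_state are hypothetical helper names, the one-second poll interval is arbitrary, and the assumption that the second field of ost_server_uuid holds the import state should be checked against test-framework.sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of test-framework.sh): poll a getter command
# until it prints the expected import state or the timeout expires.
#   $1    expected state, e.g. REPLAY_LOCKS
#   $2    timeout in seconds
#   $3... command (or shell function) that prints the current state
wait_state_or_timeout() {
    local expected=$1 timeout=$2
    shift 2
    local elapsed=0 state=""

    while (( elapsed <= timeout )); do
        state=$("$@" 2>/dev/null)
        [[ "$state" == "$expected" ]] && return 0
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "import state is '$state', wanted '$expected' after ${timeout}s" >&2
    return 1
}

# Assumed getter: the second field of the ost_server_uuid parameter is taken
# to be the import state (an assumption to verify on a real client).
osc_import_state() {
    lctl get_param -n "$1" | awk '{print $2}' | uniq
}

# Possible use inside the subtest (not executed here):
#   param='osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid'
#   wait_state_or_timeout REPLAY_LOCKS 30 osc_import_state "$param" ||
#       skip "import never entered REPLAY_LOCKS"
```

            Passing the getter as a command keeps the polling loop testable without a mounted filesystem. Skipping hides the failure rather than explaining it, so it would only make sense as a fallback once the debug patch has shown why the import stays in IDLE.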


            adilger Andreas Dilger added a comment -

            eaujames, could you please take a look at this subtest failure. It looks like it was introduced with your patch.

            nangelinas Nikitas Angelinas added a comment - +1 on master: https://testing.whamcloud.com/test_sets/45681a0c-49fa-4a60-8ed4-a1be0badaddd
            yujian Jian Yu added a comment - Lustre 2.16.0 RC5: https://testing.whamcloud.com/test_sets/eb1d5bb8-8d6b-4973-94ed-61d0acf36b17
            emoly.liu Emoly Liu added a comment - +1 on master: https://testing.whamcloud.com/test_sets/381fd2ee-2b72-4f87-b902-88df4762d3c7
            yujian Jian Yu added a comment -

            The failure occurred consistently in failover-part-1 and failover-zfs-part-1 test sessions.

            yujian Jian Yu added a comment - +1 on master branch: https://testing.whamcloud.com/test_sets/70917f0b-ca74-4291-ac26-511cb8de1371

            adilger Andreas Dilger added a comment -

            Etienne, I noticed that test_135 started failing intermittently since 2023-10-25, which is the day your patch https://review.whamcloud.com/41227 "LU-14027 tests: Fix test_135 of replay-single" landed, and the only failure of this subtest in the previous 6 months was on 2023-07-06 from a version of your patch before it landed:

            https://testing.whamcloud.com/search?status%5B%5D=FAIL&test_set_script_id=f6a12204-32c3-11e0-a61c-52540025f9ae&sub_test_script_id=1be840b1-c349-4f88-97fa-0c0f947ae703&start_date=2023-06-01&end_date=2023-12-01&source=sub_tests#redirect

            Could you please take a look at this failure.

            People

              eaujames Etienne Aujames
              maloo Maloo