Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
Lustre 2.16.0
-
3
-
9223372036854775807
Description
This issue was created by maloo for James Simmons <uja.ornl@gmail.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c43792eb-a655-4405-8720-f1b31aee6d88
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/104489 - 4.18.0-513.5.1.el8_9.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/104489 - 4.18.0-513.18.1.el8_lustre.x86_64
Started lustre-OST0000
CMD: trevis-41vm1.trevis.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/opt/iozone/bin:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config TESTLOG_PREFIX=/autotest/autotest-2/2024-04-29/lustre-reviews_review-zfs_104489_2_4b20e343-e437-4963-b984-b19ca92bce9e//replay-single TESTNAME=test_135 bash rpc.sh wait_import_state_mount REPLAY_LOCKS osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
End of sync
trevis-41vm1.trevis.whamcloud.com: executing wait_import_state_mount REPLAY_LOCKS osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
CMD: trevis-41vm1.trevis.whamcloud.com lctl get_param -n at_max
rpc test_135: @@@@@@ FAIL: can't put import for osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid into REPLAY_LOCKS state after 1475 sec, have IDLE
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:7011:error()
= /usr/lib64/lustre/tests/test-framework.sh:8433:_wait_import_state()
= /usr/lib64/lustre/tests/test-framework.sh:8455:wait_import_state()
= /usr/lib64/lustre/tests/test-framework.sh:8465:wait_import_state_mount()
= rpc.sh:20:main()
CMD: trevis-41vm1.trevis.whamcloud.com,trevis-41vm2,trevis-41vm3,trevis-41vm6 /usr/sbin/lctl dk > /autotest/autotest-2/2024-04-29/lustre-reviews_review-zfs_104489_2_4b20e343-e437-4963-b984-b19ca92bce9e//rpc.test_135.debug_log.$(hostname -s).1714424452.log;
dmesg > /autotest/autotest-2/2024-04-29/lustre-reviews_review-zfs_104489_2_4b20e343-e437-4963-b984-b19ca92bce9e//rpc.test_135.dmesg.$(hostname -s).1714424452.log
trevis-41vm1.trevis.whamcloud.com: Dumping lctl log to /autotest/autotest-2/2024-04-29/lustre-reviews_review-zfs_104489_2_4b20e343-e437-4963-b984-b19ca92bce9e//rpc.test_135.*.1714424452.log
replay-single test_135: @@@@@@ FAIL: import is not in REPLAY_LOCKS state
Attachments
Issue Links
- is duplicated by
-
LU-17262 replay-single test_135: FAIL: Unexpected sync success
-
- Open
-
- is related to
-
LU-14027 Client recovery statemachine hangs in recovery disconnected during lock reply
-
- Resolved
adilger, I tried to debug this some months ago, but I was not able to reproduce it.
"import is not in REPLAY_LOCKS state" seems to occur more often on ZFS, but I ran the test more than 100 times on ZFS without a single failure.
So these failures are more complex and may have multiple causes. They could be caused by something in the environment and/or by context inherited from previous tests.
The debug logs from the failure cases seem to indicate that the client never enters the REPLAY_LOCKS state.
If we cannot find a reliable way to get the server into "REPLAY_LOCKS", we can simply skip the test when this state is not reached.
I will push a debug patch to gather more debug information.
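For illustration only, the "skip if the state is never reached" idea could look roughly like the sketch below. poll_import_state is a hypothetical helper, not a function from test-framework.sh, and the command that prints the import state is passed in by the caller. How a real test would read the state (for example via lctl get_param on the ost_server_uuid parameter seen in the log above) is left as an assumption.

```shell
#!/bin/bash
# Hypothetical helper (NOT part of test-framework.sh): poll a command that
# prints the current import state until it matches the expected state.
#   $1    expected state, e.g. REPLAY_LOCKS
#   $2    timeout in seconds
#   $3... command that prints the current state on stdout (assumed)
# Returns 0 if the state was seen, 1 if the timeout expired.
poll_import_state() {
	local expected=$1
	local timeout=$2
	shift 2
	local elapsed=0
	local state=""

	while [ "$elapsed" -lt "$timeout" ]; do
		state=$("$@" 2>/dev/null)
		[ "$state" = "$expected" ] && return 0
		sleep 1
		elapsed=$((elapsed + 1))
	done

	echo "import not in $expected after $timeout sec, have $state" >&2
	return 1
}

# Possible use inside a test body (names are placeholders):
#   if ! poll_import_state REPLAY_LOCKS 30 my_state_getter; then
#           echo "SKIP: import never entered REPLAY_LOCKS"
#           return 0
#   fi
```

A short timeout here would also avoid waiting the 1475 seconds seen in the failure above before giving up, though the right value depends on at_max and the recovery window.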