Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.13.0, Lustre 2.12.3
-
None
-
3
-
9223372036854775807
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4080e38e-b988-11e8-9df3-52540065bddc
test_34 failed with the following error:
lock timeout happened
This is a new failure starts from tag-2.11.55
test log
== sanityn test 34: no lock timeout under IO ========================================================= 02:25:24 (1536978324)
CMD: trevis-33vm3 lctl get_param -n ldlm.namespaces.filter-*.lock_timeouts
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
lock should not expire
writing on client1
reading on client2
CMD: trevis-33vm3 lctl get_param -n ldlm.namespaces.filter-*.lock_timeouts
sanityn test_34: @@@@@@ FAIL: lock timeout happened
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:5764:error()
= /usr/lib64/lustre/tests/sanityn.sh:1190:test_34()
= /usr/lib64/lustre/tests/test-framework.sh:6045:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:6084:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:5931:run_test()
= /usr/lib64/lustre/tests/sanityn.sh:1198:main()
Dumping lctl log to /autotest/autotest2/2018-09-14/lustre-master-el7-x86_64--full--1_1_1__3792___ae7d30c1-91f6-4521-85bc-1217e7c76208/sanityn.test_34.*.1536978344.log
CMD: trevis-33vm1.trevis.whamcloud.com,trevis-33vm2,trevis-33vm3,trevis-33vm4 /usr/sbin/lctl dk > /autotest/autotest2/2018-09-14/lustre-master-el7-x86_64--full--1_1_1__3792___ae7d30c1-91f6-4521-85bc-1217e7c76208/sanityn.test_34.debug_log.\$(hostname -s).1536978344.log;
dmesg > /autotest/autotest2/2018-09-14/lustre-master-el7-x86_64--full--1_1_1__3792___ae7d30c1-91f6-4521-85bc-1217e7c76208/sanityn.test_34.dmesg.\$(hostname -s).1536978344.log
Resetting fail_loc on all nodes...CMD: trevis-33vm1.trevis.whamcloud.com,trevis-33vm2,trevis-33vm3,trevis-33vm4 lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
done.
CMD: trevis-33vm3 lctl set_param -n fail_loc=0 2>/dev/null || true
CMD: trevis-33vm1.trevis.whamcloud.com lctl get_param -n at_min
CMD: trevis-33vm1.trevis.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh wait_import_state \(FULL\|IDLE\) osc.lustre-OST0000-osc-ffff9de2562cf000.ost_server_uuid
osc.lustre-OST0000-osc-ffff9de25e57e000.ost_server_uuid 40
== rpc test complete, duration -o sec ================================================================ 02:25:47 (1536978347)
trevis-33vm1.trevis.whamcloud.com: executing wait_import_state (FULL|IDLE) osc.lustre-OST0000-osc-ffff9de2562cf000.ost_server_uuid osc.lustre-OST0000-osc-ffff9de25e57e000.ost_server_uuid 40
osc.lustre-OST0000-osc-ffff9de2562cf000.ost_server_uuid in IDLE state after 0 sec
CMD: trevis-33vm1.trevis.whamcloud.com lctl get_param -n at_min
CMD: trevis-33vm1.trevis.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh wait_import_state \(FULL\|IDLE\) osc.lustre-OST0001-osc-ffff9de2562cf000.ost_server_uuid
osc.lustre-OST0001-osc-ffff9de25e57e000.ost_server_uuid 40
== rpc test complete, duration -o sec ================================================================ 02:25:48 (1536978348)
OST dmesg
[21528.561273] Lustre: *** cfs_fail_loc=512, val=0*** [21529.151161] Lustre: *** cfs_fail_loc=512, val=0*** [21529.163524] LustreError: 6657:0:(ldlm_lockd.c:251:expired_lock_main()) ### lock callback timer expired after 1s: evicting client at 10.9.5.158@tcp ns: filter-lustre-OST0001_UUID lock: ffff8c6a7596e240/0x7408246bca0d1d1c lrc: 3/0,0 mode: PW/PW res: [0x433a:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x60000400000020 nid: 10.9.5.158@tcp remote: 0x7280a0e6030637b7 expref: 9 pid: 8833 timeout: 21529 lvb_type: 0 [21529.169313] Lustre: *** cfs_fail_loc=512, val=0*** [21529.169943] LustreError: 29618:0:(ldlm_lockd.c:2263:ldlm_cancel_handler()) ldlm_cancel from 10.9.5.158@tcp arrived at 1536978332 with bad export cookie 8360972753583283470 [21529.169966] Lustre: *** cfs_fail_loc=512, val=0*** [21529.172542] Lustre: *** cfs_fail_loc=512, val=0***
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanityn test_34 - lock timeout happened