[LU-11387] sanityn test_34: lock timeout happened Created: 17/Sep/18  Updated: 01/Nov/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4080e38e-b988-11e8-9df3-52540065bddc

test_34 failed with the following error:

lock timeout happened

This is a new failure starts from tag-2.11.55

test log

== sanityn test 34: no lock timeout under IO ========================================================= 02:25:24 (1536978324)
CMD: trevis-33vm3 lctl get_param -n ldlm.namespaces.filter-*.lock_timeouts
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
CMD: trevis-33vm3 lctl set_param fail_loc=0x512
fail_loc=0x512
lock should not expire
writing on client1
reading on client2
CMD: trevis-33vm3 lctl get_param -n ldlm.namespaces.filter-*.lock_timeouts
 sanityn test_34: @@@@@@ FAIL: lock timeout happened 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5764:error()
  = /usr/lib64/lustre/tests/sanityn.sh:1190:test_34()
  = /usr/lib64/lustre/tests/test-framework.sh:6045:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:6084:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5931:run_test()
  = /usr/lib64/lustre/tests/sanityn.sh:1198:main()
Dumping lctl log to /autotest/autotest2/2018-09-14/lustre-master-el7-x86_64--full--1_1_1__3792___ae7d30c1-91f6-4521-85bc-1217e7c76208/sanityn.test_34.*.1536978344.log
CMD: trevis-33vm1.trevis.whamcloud.com,trevis-33vm2,trevis-33vm3,trevis-33vm4 /usr/sbin/lctl dk > /autotest/autotest2/2018-09-14/lustre-master-el7-x86_64--full--1_1_1__3792___ae7d30c1-91f6-4521-85bc-1217e7c76208/sanityn.test_34.debug_log.\$(hostname -s).1536978344.log;
         dmesg > /autotest/autotest2/2018-09-14/lustre-master-el7-x86_64--full--1_1_1__3792___ae7d30c1-91f6-4521-85bc-1217e7c76208/sanityn.test_34.dmesg.\$(hostname -s).1536978344.log
Resetting fail_loc on all nodes...CMD: trevis-33vm1.trevis.whamcloud.com,trevis-33vm2,trevis-33vm3,trevis-33vm4 lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null
done.
CMD: trevis-33vm3 lctl set_param -n fail_loc=0 2>/dev/null || true
CMD: trevis-33vm1.trevis.whamcloud.com lctl get_param -n at_min
CMD: trevis-33vm1.trevis.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh wait_import_state \(FULL\|IDLE\) osc.lustre-OST0000-osc-ffff9de2562cf000.ost_server_uuid
osc.lustre-OST0000-osc-ffff9de25e57e000.ost_server_uuid 40 
== rpc test complete, duration -o sec ================================================================ 02:25:47 (1536978347)
trevis-33vm1.trevis.whamcloud.com: executing wait_import_state (FULL|IDLE) osc.lustre-OST0000-osc-ffff9de2562cf000.ost_server_uuid osc.lustre-OST0000-osc-ffff9de25e57e000.ost_server_uuid 40
osc.lustre-OST0000-osc-ffff9de2562cf000.ost_server_uuid in IDLE state after 0 sec
CMD: trevis-33vm1.trevis.whamcloud.com lctl get_param -n at_min
CMD: trevis-33vm1.trevis.whamcloud.com PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh wait_import_state \(FULL\|IDLE\) osc.lustre-OST0001-osc-ffff9de2562cf000.ost_server_uuid
osc.lustre-OST0001-osc-ffff9de25e57e000.ost_server_uuid 40 
== rpc test complete, duration -o sec ================================================================ 02:25:48 (1536978348)

OST dmesg

[21528.561273] Lustre: *** cfs_fail_loc=512, val=0***
[21529.151161] Lustre: *** cfs_fail_loc=512, val=0***
[21529.163524] LustreError: 6657:0:(ldlm_lockd.c:251:expired_lock_main()) ### lock callback timer expired after 1s: evicting client at 10.9.5.158@tcp  ns: filter-lustre-OST0001_UUID lock: ffff8c6a7596e240/0x7408246bca0d1d1c lrc: 3/0,0 mode: PW/PW res: [0x433a:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x60000400000020 nid: 10.9.5.158@tcp remote: 0x7280a0e6030637b7 expref: 9 pid: 8833 timeout: 21529 lvb_type: 0
[21529.169313] Lustre: *** cfs_fail_loc=512, val=0***
[21529.169943] LustreError: 29618:0:(ldlm_lockd.c:2263:ldlm_cancel_handler()) ldlm_cancel from 10.9.5.158@tcp arrived at 1536978332 with bad export cookie 8360972753583283470
[21529.169966] Lustre: *** cfs_fail_loc=512, val=0***
[21529.172542] Lustre: *** cfs_fail_loc=512, val=0***

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanityn test_34 - lock timeout happened



 Comments   
Comment by Patrick Farrell (Inactive) [ 12/Mar/19 ]

On master:
https://testing.whamcloud.com/test_sessions/db7a6124-7e6d-4bf7-a01e-86c5f5e32682

Comment by Chris Horn [ 20/Jun/19 ]

Hit on b2_12:
https://testing.whamcloud.com/test_sessions/7bd9c2b0-fd1e-4f94-8b5a-554ebe6dafb4

Comment by Andreas Dilger [ 25/May/20 ]

+1 https://testing.whamcloud.com/test_sets/3e52e47f-812e-4f5d-b912-6054959df98d

Generated at Sat Feb 10 02:43:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.