Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Fix Version/s: Lustre 2.6.0
- 3
- 12008
Description
If I run the following:
# export OSTCOUNT=6
# export MOUNT_2=y
# ./lustre/tests/llmount.sh
# lfs setstripe -c 6 /mnt/lustre/f0
#
# (while true; do echo Hi > /mnt/lustre/f0; done) &
# (while true; do echo Bye > /mnt/lustre2/f0; done) &
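For convenience, the same steps can be wrapped into one script that also records the writer PIDs (used again in the stack-capture sketch further below). A minimal sketch, assuming the default test mount points /mnt/lustre and /mnt/lustre2 and that lustre/tests/llmountcleanup.sh is available for teardown:

#!/bin/bash
# Reproducer sketch: two clients repeatedly truncating the same striped file.
export OSTCOUNT=6
export MOUNT_2=y
./lustre/tests/llmount.sh

# Stripe the shared file across all six OSTs.
lfs setstripe -c 6 /mnt/lustre/f0

# Each iteration reopens f0 with O_TRUNC, so both clients keep requesting
# PW extent locks covering every stripe.
(while true; do echo Hi  > /mnt/lustre/f0;  done) & w1=$!
(while true; do echo Bye > /mnt/lustre2/f0; done) & w2=$!

sleep 5   # the hang reportedly appears within a second of starting
ps -o pid,stat,wchan:30,cmd -p "$w1" "$w2"

# Teardown once done inspecting.
kill "$w1" "$w2" 2>/dev/null
./lustre/tests/llmountcleanup.sh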
Then, within a second of starting, both child tasks get stuck in cl_lock_state_wait():
[<ffffffffa045cb75>] cl_lock_state_wait+0x1b5/0x320 [obdclass]
[<ffffffffa045d35b>] cl_enqueue_locked+0x15b/0x1f0 [obdclass]
[<ffffffffa045debe>] cl_lock_request+0x7e/0x270 [obdclass]
[<ffffffffa0462e4c>] cl_io_lock+0x3cc/0x560 [obdclass]
[<ffffffffa0463082>] cl_io_loop+0xa2/0x1b0 [obdclass]
[<ffffffffa0dcabe8>] cl_setattr_ost+0x218/0x2f0 [lustre]
[<ffffffffa0d96145>] ll_setattr_raw+0xa45/0x10c0 [lustre]
[<ffffffffa0d9681d>] ll_setattr+0x5d/0xf0 [lustre]
[<ffffffff811a0048>] notify_change+0x168/0x340
[<ffffffff81180ad4>] do_truncate+0x64/0xa0
[<ffffffff811949e1>] do_filp_open+0x851/0xdc0
[<ffffffff8117f849>] do_sys_open+0x69/0x140
[<ffffffff8117f960>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
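Traces like the one above can be collected from the stuck writers directly, without waiting for the eviction. A minimal sketch, assuming the hypothetical $w1/$w2 PIDs recorded by the script above and a kernel with /proc/<pid>/stack support:

# Dump the kernel stack of each stuck writer (expect cl_lock_state_wait at the top).
for pid in "$w1" "$w2"; do
    echo "== PID $pid =="
    cat /proc/"$pid"/stack
done

# Alternatively, dump all task stacks to the console log:
echo t > /proc/sysrq-trigger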
They stay stuck there until one client gets evicted by an OST:
LustreError: 0:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 151s: evicting client at 0@lo ns: filter-lustre-OST0002_UUID lock: ffff880217559100/0xb06606e6f58bd625 lrc: 3/0,0 mode: PW/PW res: [0x2:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000080010020 nid: 0@lo remote: 0xb06606e6f58bd61e expref: 4 pid: 14479 timeout: 4300627190 lvb_type: 0
LustreError: 0:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 151s: evicting client at 0@lo ns: filter-lustre-OST0004_UUID lock: ffff88019996f9c0/0xb06606e6f58bd58b lrc: 3/0,0 mode: PW/PW res: [0x2:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000080010020 nid: 0@lo remote: 0xb06606e6f58bd584 expref: 4 pid: 13781 timeout: 4300627191 lvb_type: 0
LustreError: 11-0: lustre-OST0002-osc-ffff8801a13eb800: Communicating with 0@lo, operation obd_ping failed with -107.
Lustre: lustre-OST0004-osc-ffff88019e033800: Connection to lustre-OST0004 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 167-0: lustre-OST0004-osc-ffff88019e033800: This client was evicted by lustre-OST0004; in progress operations using this service will fail.
LustreError: 16413:0:(ldlm_resource.c:815:ldlm_resource_complain()) lustre-OST0004-osc-ffff88019e033800: namespace resource [0x2:0x0:0x0].0 (ffff8801a86f6980) refcount nonzero (1) after lock cleanup; forcing cleanup.
LustreError: 16413:0:(ldlm_resource.c:1454:ldlm_resource_dump()) --- Resource: [0x2:0x0:0x0].0 (ffff8801a86f6980) refcount = 2
Lustre: lustre-OST0004-osc-ffff88019e033800: Connection restored to lustre-OST0004 (at 0@lo)
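Since llmount.sh builds a single-node setup, the server-side eviction messages land in the same kernel log as the client messages. A minimal sketch for spotting them while the reproducer runs, assuming a dmesg that supports --follow (otherwise tail -f /var/log/messages serves the same purpose):

# Watch for the lock-callback timeout and the resulting eviction.
dmesg --follow | grep -E 'lock callback timer expired|evicting client|was evicted'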
Attachments
Issue Links
- duplicates: LU-4495 client evicted on parallel append write to the shared file (Closed)