[LU-12828] FLOCK request can be processed twice during resend Created: 01/Oct/19  Updated: 16/Mar/21  Resolved: 14/Dec/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0, Lustre 2.15.0

Type: Bug Priority: Critical
Reporter: Andriy Skulysh Assignee: Andriy Skulysh
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When a FLOCK request is resent, the MDS can process it twice. This can lead to a wrong UNLOCK:

00000080:00010000:0.0:1519718439.198477:0:9470:0:(file.c:3583:ll_file_flock()) inode=[0x200000400:0x1:0x0], pid=9470, owner=0xffff88012d046e00, flags=0x0, mode=2, start=0, end=9223372036854775807
00010000:00010000:0.0:1519718439.200930:0:7184:0:(ldlm_flock.c:314:ldlm_process_flock_lock()) flags 0x0 owner 18446612137364450816 pid 9470 mode 2 start 0 end 9223372036854775807
00000080:00010000:0.0:1519718439.210498:0:9472:0:(file.c:3583:ll_file_flock()) inode=[0x200000400:0x1:0x0], pid=9472, owner=0xffff88013606ce00, flags=0x0, mode=2, start=0, end=9223372036854775807
00000080:00010000:0.0:1519718440.231947:0:9470:0:(file.c:3583:ll_file_flock()) inode=[0x200000400:0x1:0x0], pid=9470, owner=0xffff88012d046e00, flags=0x0, mode=32, start=0, end=9223372036854775807
00000100:00080000:0.0:1519718440.248334:0:5786:0:(client.c:2792:ptlrpc_resend_req()) @@@ going to resend  req@ffff88012d036f00 x1593540148730064/t0(0) o101->lustre-MDT0000-mdc-ffff88009c2e4000@0@lo:12/10 lens 328/344 e 0 to 0 dl 1519718447 ref 2 fl Rpc:/0/ffffffff rc 0/-1
00000001:00020000:0.0:1519718440.248489:0:7185:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 sleeping for 10000ms
00000001:00020000:0.0:1519718450.233339:0:7184:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 999 awake
00010000:00010000:0.0:1519718450.233528:0:7184:0:(ldlm_flock.c:314:ldlm_process_flock_lock()) flags 0x0 owner 18446612137364450816 pid 9470 mode 32 start 0 end 9223372036854775807
00000001:00020000:0.0:1519718450.237109:0:9470:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 sleeping for 11000ms
00000001:00020000:0.0:1519718450.248712:0:7185:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 awake
00000001:00020000:0.0:1519718450.249192:0:7185:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 sleeping for 11000ms
00000001:00020000:0.0:1519718458.253208:0:7184:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 sleeping for 10000ms
00000001:00020000:0.0:1519718461.237222:0:9470:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 awake
00010000:00010000:0.0:1519718461.237816:0:9470:0:(ldlm_flock.c:314:ldlm_process_flock_lock()) flags 0x800000000 owner 18446612137364450816 pid 9470 mode 32 start 0 end 9223372036854775807
00010000:00010000:0.0:1519718461.239919:0:9484:0:(ldlm_flock.c:314:ldlm_process_flock_lock()) flags 0x0 owner 18446612137364450816 pid 9470 mode 2 start 0 end 9223372036854775807
00010000:00010000:0.0:1519718461.241247:0:9470:0:(ldlm_flock.c:819:ldlm_flock_completion_ast()) ### client-side enqueue returned a blocked lock, sleeping ns: lustre-MDT0000-mdc-ffff88009c2e4000 lock: ffff8800a0f52800/0x6628549332bf7a4f lrc: 4/0,1 mode: --/PW res: [0x200000400:0x1:0x0].c rrc: 4 type: FLK pid: 9470 [0->9223372036854775807] flags: 0x0 nid: local remote: 0x6628549332bf7a56 expref: -99 pid: 9470 timeout: 0
00000001:00020000:0.0:1519718461.249186:0:7185:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 awake
00010000:00010000:0.0:1519718461.249206:0:7185:0:(ldlm_flock.c:314:ldlm_process_flock_lock()) flags 0x0 owner 18446612137364450816 pid 9470 mode 32 start 0 end 9223372036854775807
00010000:00020000:0.0:1519718461.251161:0:9470:0:(ldlm_flock.c:885:ldlm_flock_completion_ast()) client-side: only asynchronous lock enqueue can be canceled by CANCELK
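
The mode fields in the trace above are numeric LDLM lock modes. The small helper below is a sketch for reading them. It assumes the values of Lustre's enum ldlm_mode (LCK_PW = 2, LCK_PR = 4, LCK_NL = 32), and that ll_file_flock() maps F_WRLCK to LCK_PW, F_RDLCK to LCK_PR and F_UNLCK to an LCK_NL enqueue. The names trace_mode() and flock_mode_name() are hypothetical and are not part of Lustre.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror Lustre's enum ldlm_mode; reproduced for illustration. */
enum flock_mode_sketch {
	MODE_PW = 2,	/* F_WRLCK */
	MODE_PR = 4,	/* F_RDLCK */
	MODE_NL = 32,	/* F_UNLCK is sent as an NL enqueue */
};

/* Hypothetical helper: name the FLOCK operation behind a numeric mode. */
static const char *flock_mode_name(int mode)
{
	switch (mode) {
	case MODE_PW: return "PW (F_WRLCK)";
	case MODE_PR: return "PR (F_RDLCK)";
	case MODE_NL: return "NL (F_UNLCK)";
	default:      return "other";
	}
}

/* Hypothetical helper: extract the mode from "mode=N" (ll_file_flock)
 * or "mode N" (ldlm_process_flock_lock). Returns -1 if none is found. */
static int trace_mode(const char *line)
{
	const char *p;
	int mode;

	assert(line != NULL);
	for (p = strstr(line, "mode"); p != NULL; p = strstr(p + 4, "mode")) {
		if (sscanf(p, "mode=%d", &mode) == 1 ||
		    sscanf(p, "mode %d", &mode) == 1)
			return mode;
	}
	return -1;
}
```

Read this way, the mode 32 lines are unlock requests. The trace appears to show one copy of pid 9470's unlock handled by thread 7184, then a new PW request (mode 2) from the same owner, and only then a second copy of the unlock handled by thread 7185.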

A similar scenario is possible with a double FLOCK lock request, which leaves a lock on the MDS without any corresponding lock on the client.
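
Both failure modes can be illustrated with a deliberately simplified, hypothetical model (not Lustre code). It tracks one owner and one whole-file range, so the server state is just "locked or not". Each request carries an XID that a resend keeps unchanged. The model executes two copies of the same request, with another request from the same client in between. Dropping any copy whose XID was already executed is shown here as one generic way to make such replays harmless. The patch that landed for this ticket may work differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum op { OP_LOCK, OP_UNLOCK };

struct request {
	uint64_t xid;	/* a resent request keeps the original XID */
	enum op  op;
};

struct server {
	bool     locked;	/* does the MDS think the owner holds the lock? */
	uint64_t handled[16];	/* XIDs already executed (hypothetical table) */
	int      nhandled;
	bool     dedup;		/* skip requests whose XID was already executed */
};

static bool already_handled(const struct server *s, uint64_t xid)
{
	for (int i = 0; i < s->nhandled; i++)
		if (s->handled[i] == xid)
			return true;
	return false;
}

/* Execute one arriving copy of a request against the server state. */
static void server_process(struct server *s, const struct request *req)
{
	if (s->dedup && already_handled(s, req->xid))
		return;		/* second copy of a resent request: ignore */
	assert(s->nhandled < 16);
	s->handled[s->nhandled++] = req->xid;
	s->locked = (req->op == OP_LOCK);
}

/* Wrong UNLOCK: the second copy of the unlock (xid 2) is executed after
 * the client has locked again (xid 3). The client believes it holds the
 * lock; the return value is the server's view. */
static bool scenario_stale_unlock(bool dedup)
{
	struct server s = { .dedup = dedup };
	const struct request lock1 = { 1, OP_LOCK };
	const struct request unlock = { 2, OP_UNLOCK };
	const struct request lock2 = { 3, OP_LOCK };

	server_process(&s, &lock1);
	server_process(&s, &unlock);	/* first copy */
	server_process(&s, &lock2);
	server_process(&s, &unlock);	/* second copy */
	return s.locked;
}

/* Double lock: the second copy of the lock (xid 1) is executed after the
 * client has unlocked (xid 2). The client holds nothing; the return value
 * is the server's view. */
static bool scenario_stale_lock(bool dedup)
{
	struct server s = { .dedup = dedup };
	const struct request lock = { 1, OP_LOCK };
	const struct request unlock = { 2, OP_UNLOCK };

	server_process(&s, &lock);	/* first copy */
	server_process(&s, &unlock);
	server_process(&s, &lock);	/* second copy */
	return s.locked;
}
```

Without deduplication, the first scenario ends with the server unlocked while the client believes it holds the lock. The second ends with the server still holding a lock that the client has already released. In this model the second copy of a lock is a no-op only if it arrives before the unlock.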



 Comments   
Comment by Gerrit Updater [ 01/Oct/19 ]

Andriy Skulysh (c17819@cray.com) uploaded a new patch: https://review.whamcloud.com/36340
Subject: LU-12828 ldlm: FLOCK request can be processed twice
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 74955a7424a83c91efe025f71a8b7df73a2209d9

Comment by Gerrit Updater [ 07/Nov/19 ]

Sorry for the noise.

Comment by Oleg Drokin [ 12/Nov/19 ]

Can you please elaborate on how a double lock happens and why it would lead to a lock on the server with no lock on the client? I think the second flock of a double flock is a no-op by definition?

I looked at the trace, but it is also not obvious what "wrong unlock" means.

Comment by Gerrit Updater [ 14/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36340/
Subject: LU-12828 ldlm: FLOCK request can be processed twice
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 85a12c6c8d7a6d40fc81c73f9b900475b58e3e98

Comment by Peter Jones [ 14/Dec/19 ]

Landed for 2.14

Comment by Gerrit Updater [ 02/Mar/21 ]

Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/41818
Subject: LU-12828 ldlm: not freed req on enqueue
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0673bd5ca08177ed9e0dde82691409437c91eceb

Comment by Cory Spitz [ 02/Mar/21 ]

vitaly_fertman, LU-12828 is RESOLVED. You should either re-open this ticket or open a new one for https://review.whamcloud.com/#/c/41818/. I imagine that a new ticket would be in order.

Comment by Gerrit Updater [ 16/Mar/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41818/
Subject: LU-12828 ldlm: not freed req on enqueue
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ce9c1c11593814dacacc2c66f9fcf124ea84b807

Generated at Sat Feb 10 02:56:00 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.