[LU-12828] FLOCK request can be processed twice during resend Created: 01/Oct/19 Updated: 16/Mar/21 Resolved: 14/Dec/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.15.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Andriy Skulysh | Assignee: | Andriy Skulysh |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Processing the request twice can lead to a wrong UNLOCK:
00000080:00010000:0.0:1519718439.198477:0:9470:0:(file.c:3583:ll_file_flock()) inode=[0x200000400:0x1:0x0], pid=9470, owner=0xffff88012d046e00, flags=0x0, mode=2, start=0, end=9223372036854775807
00010000:00010000:0.0:1519718439.200930:0:7184:0:(ldlm_flock.c:314:ldlm_process_flock_lock()) flags 0x0 owner 18446612137364450816 pid 9470 mode 2 start 0 end 9223372036854775807
00000080:00010000:0.0:1519718439.210498:0:9472:0:(file.c:3583:ll_file_flock()) inode=[0x200000400:0x1:0x0], pid=9472, owner=0xffff88013606ce00, flags=0x0, mode=2, start=0, end=9223372036854775807
00000080:00010000:0.0:1519718440.231947:0:9470:0:(file.c:3583:ll_file_flock()) inode=[0x200000400:0x1:0x0], pid=9470, owner=0xffff88012d046e00, flags=0x0, mode=32, start=0, end=9223372036854775807
00000100:00080000:0.0:1519718440.248334:0:5786:0:(client.c:2792:ptlrpc_resend_req()) @@@ going to resend req@ffff88012d036f00 x1593540148730064/t0(0) o101->lustre-MDT0000-mdc-ffff88009c2e4000@0@lo:12/10 lens 328/344 e 0 to 0 dl 1519718447 ref 2 fl Rpc:/0/ffffffff rc 0/-1
00000001:00020000:0.0:1519718440.248489:0:7185:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 sleeping for 10000ms
00000001:00020000:0.0:1519718450.233339:0:7184:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 999 awake
00010000:00010000:0.0:1519718450.233528:0:7184:0:(ldlm_flock.c:314:ldlm_process_flock_lock()) flags 0x0 owner 18446612137364450816 pid 9470 mode 32 start 0 end 9223372036854775807
00000001:00020000:0.0:1519718450.237109:0:9470:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 sleeping for 11000ms
00000001:00020000:0.0:1519718450.248712:0:7185:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 awake
00000001:00020000:0.0:1519718450.249192:0:7185:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 sleeping for 11000ms
00000001:00020000:0.0:1519718458.253208:0:7184:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 sleeping for 10000ms
00000001:00020000:0.0:1519718461.237222:0:9470:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 awake
00010000:00010000:0.0:1519718461.237816:0:9470:0:(ldlm_flock.c:314:ldlm_process_flock_lock()) flags 0x800000000 owner 18446612137364450816 pid 9470 mode 32 start 0 end 9223372036854775807
00010000:00010000:0.0:1519718461.239919:0:9484:0:(ldlm_flock.c:314:ldlm_process_flock_lock()) flags 0x0 owner 18446612137364450816 pid 9470 mode 2 start 0 end 9223372036854775807
00010000:00010000:0.0:1519718461.241247:0:9470:0:(ldlm_flock.c:819:ldlm_flock_completion_ast()) ### client-side enqueue returned a blocked lock, sleeping ns: lustre-MDT0000-mdc-ffff88009c2e4000 lock: ffff8800a0f52800/0x6628549332bf7a4f lrc: 4/0,1 mode: --/PW res: [0x200000400:0x1:0x0].c rrc: 4 type: FLK pid: 9470 [0->9223372036854775807] flags: 0x0 nid: local remote: 0x6628549332bf7a56 expref: -99 pid: 9470 timeout: 0
00000001:00020000:0.0:1519718461.249186:0:7185:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 998 awake
00010000:00010000:0.0:1519718461.249206:0:7185:0:(ldlm_flock.c:314:ldlm_process_flock_lock()) flags 0x0 owner 18446612137364450816 pid 9470 mode 32 start 0 end 9223372036854775807
00010000:00020000:0.0:1519718461.251161:0:9470:0:(ldlm_flock.c:885:ldlm_flock_completion_ast()) client-side: only asynchronous lock enqueue can be canceled by CANCELK
A similar scenario is possible with a double FLOCK lock, which leaves a lock on the MDS without any corresponding lock on the client |
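To make the two failure modes in the description easier to follow, here is a minimal toy model, not Lustre code and not a description of the landed patch. It assumes that a resent request carries the same xid as the original, and it replays requests sequentially, whereas the real race involves the original and the resend being handled by different server threads. `Request`, `ToyServer`, and `replay` are hypothetical names introduced only for this sketch, and remembering every xid forever is a simplification.

```python
from dataclasses import dataclass, field

LOCK, UNLOCK = "LOCK", "UNLOCK"


@dataclass(frozen=True)
class Request:
    xid: int    # request id; assumed to be reused unchanged by a resend
    owner: int  # lock owner (the pid in the trace)
    op: str     # LOCK or UNLOCK


@dataclass
class ToyServer:
    dedup: bool                               # drop requests whose xid was already handled
    held: set = field(default_factory=set)    # owners currently holding the lock
    seen: set = field(default_factory=set)    # xids already processed

    def handle(self, req: Request) -> bool:
        """Apply a request; return False if it was dropped as a duplicate."""
        if self.dedup and req.xid in self.seen:
            return False
        self.seen.add(req.xid)
        if req.op == LOCK:
            self.held.add(req.owner)
        else:
            self.held.discard(req.owner)
        return True


def replay(requests, dedup):
    """Feed requests to a fresh server in arrival order; return the final lock holders."""
    server = ToyServer(dedup=dedup)
    for req in requests:
        server.handle(req)
    return server.held


lock1 = Request(xid=1, owner=9470, op=LOCK)
unlock = Request(xid=2, owner=9470, op=UNLOCK)
lock2 = Request(xid=3, owner=9470, op=LOCK)

# Wrong UNLOCK: the delayed resend of the unlock arrives after a new lock.
# The client believes it holds the lock taken by lock2.
wrong_unlock = [lock1, unlock, lock2, unlock]

# Double LOCK: the delayed resend of the first lock arrives after the unlock.
# The client believes it holds nothing.
stale_lock = [lock1, unlock, lock1]

if __name__ == "__main__":
    for name, seq in (("wrong_unlock", wrong_unlock), ("stale_lock", stale_lock)):
        print(name, "no dedup:", replay(seq, False), "dedup:", replay(seq, True))
```

Without duplicate detection, the first sequence ends with no lock on the server although the client still thinks it holds one, and the second ends with a lock on the server that no client owns. With the xid check, the server state matches the client's view in both cases.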
| Comments |
| Comment by Gerrit Updater [ 01/Oct/19 ] |
|
Andriy Skulysh (c17819@cray.com) uploaded a new patch: https://review.whamcloud.com/36340 |
| Comment by Gerrit Updater [ 07/Nov/19 ] |
|
Sorry for the noise. |
| Comment by Oleg Drokin [ 12/Nov/19 ] |
|
Can you please elaborate on how the double lock happens and why it would lead to a lock on the server with no lock on the client? I think the second flock of a double flock is a no-op by definition? I looked at the trace, but it's not obvious what "wrong unlock" means either. |
| Comment by Gerrit Updater [ 14/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36340/ |
| Comment by Peter Jones [ 14/Dec/19 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 02/Mar/21 ] |
|
Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/41818 |
| Comment by Cory Spitz [ 02/Mar/21 ] |
|
vitaly_fertman, |
| Comment by Gerrit Updater [ 16/Mar/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41818/ |