[LU-2839] Client file locking issue Created: 20/Feb/13  Updated: 29/Jul/14  Resolved: 29/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.8
Fix Version/s: None

Type: Bug Priority: Major
Reporter: HP Slovakia team (Inactive) Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Server: 1.8.8-wc1
Client: 1.8.8-wc1
OS : RHEL Server release 5.8 (Tikanga)


Severity: 4
Rank (Obsolete): 6875

 Description   

The customer experienced an issue with file locking: the client hits an LBUG in the flock code path.
This case is the same as LU-1126.
Please raise the priority so that a patch can be released.
This customer has paid Lustre support for an environment of 8 OSS nodes.
Example traceback:

Jul 30 08:19:54 skpriu01b kernel: LustreError: 2427:0:(ldlm_lock.c:599:ldlm_lock_decref_internal_nolock()) ASSERTION(lock->l_readers > 0) failed
Jul 30 08:19:54 skpriu01b kernel: LustreError: 2427:0:(ldlm_lock.c:599:ldlm_lock_decref_internal_nolock()) LBUG
Jul 30 08:19:54 skpriu01b kernel: Pid: 2427, comm: PNetTNetServer.
Jul 30 08:19:54 skpriu01b kernel:
Jul 30 08:19:54 skpriu01b kernel: Call Trace:
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff885b86a1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff885b8bda>] lbug_with_loc+0x7a/0xd0 [libcfs]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff885c0fc0>] tracefile_init+0x0/0x110 [libcfs]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff886e261f>] ldlm_lock_decref_internal_nolock+0x7f/0xe0 [ptlrpc]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff8870a9a9>] ldlm_process_flock_lock+0x1779/0x18a0 [ptlrpc]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff8860b33d>] LNetMDUnlink+0xcd/0xf0 [lnet]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff886e0d59>] ldlm_grant_lock+0x4e9/0x550 [ptlrpc]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff8870b5db>] ldlm_flock_completion_ast+0xa0b/0xaf0 [ptlrpc]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff886e4709>] ldlm_lock_enqueue+0x9d9/0xb20 [ptlrpc]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff886fcc6b>] ldlm_cli_enqueue_fini+0xa5b/0xbc0 [ptlrpc]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff8867565d>] class_handle_hash+0x16d/0x250 [obdclass]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff8008ee72>] default_wake_function+0x0/0xe
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff886fe7af>] ldlm_cli_enqueue+0x63f/0x700 [ptlrpc]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff888e258f>] ll_file_flock+0x57f/0x680 [lustre]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff8870abd0>] ldlm_flock_completion_ast+0x0/0xaf0 [ptlrpc]
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff8000ce04>] do_path_lookup+0x294/0x310
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff80039cb2>] fcntl_setlk+0x11e/0x273
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff8008ee72>] default_wake_function+0x0/0xe
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff8002e1d8>] sys_fcntl+0x269/0x2dc
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff800fef39>] compat_sys_fcntl64+0x222/0x381
Jul 30 08:19:54 skpriu01b kernel: [<ffffffff800614b5>] sysenter_do_call+0x1e/0x76
Jul 30 08:19:54 skpriu01b kernel:



 Comments   
Comment by Bruno Faccini (Inactive) [ 20/Feb/13 ]

Hello, I will take the action to push for review and landing of the LU-1126 patch.
I may eventually mark this ticket as a duplicate of LU-1126.

Comment by Peter Jones [ 20/Feb/13 ]

Bruno is looking into this ticket

Comment by Oleg Drokin [ 21/Feb/13 ]

Does this application really need to have whole-cluster-consistent locking? If not, you can mount your clients with -o localflock option and the problem will go away (as a bonus, the program will likely be a little bit faster too).
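
For context, the trace above enters Lustre through fcntl_setlk and ll_file_flock, i.e. a POSIX byte-range lock request. The following is a minimal sketch of that kind of request, not the customer's application code; the helper name lock_region and the file path are hypothetical. With cluster-wide flock, such a lock is coordinated across all clients; with localflock, it is only coherent between processes on the same node.

```python
import fcntl
import os
import tempfile


def lock_region(path, start=0, length=0):
    """Take and release an exclusive POSIX record lock on part of a file.

    length=0 means "from start to the end of the file".
    Returns True if the lock was acquired and released,
    False if a conflicting lock was already held by another process.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # LOCK_NB turns this into a non-blocking lock request.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                        length, start, os.SEEK_SET)
        except OSError:
            return False
        # ... the critical section would go here ...
        fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical demo on a temporary file; on a Lustre client the path
    # would be a file under the Lustre mount point instead.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789abcdef")
    print(lock_region(tmp.name, start=4, length=8))
    os.unlink(tmp.name)
```

If the application only ever locks files from processes on a single node, node-local coherence is sufficient and the localflock mount option is a reasonable workaround.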

Comment by John Fuchs-Chesney (Inactive) [ 08/Mar/14 ]

HP Slovakia team,
Was Oleg's suggestion helpful to you?
Do you still have an issue with this ticket, or can I mark it as resolved?
Thanks,
~ jfc.

Comment by John Fuchs-Chesney (Inactive) [ 29/Jul/14 ]

The patch has landed on master.
~ jfc.
