Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 1.8.7
-
None
-
Client: Lustre 1.8.7-wc1, inkernel IB, RHEL 5.6
Server: Lustre 1.8.4, CentOS 5.5, Terascala appliance
-
1
-
3992
Description
The Lustre filesystem is mounted on the client with the -o flock option; without this option the customer's application will not run. The application uses fcntl for file locking. It does not explicitly release locks; it relies on the file close operation to do that.
The client hardware configuration is a single HP DL980 G7 server with eight 8-core Nehalem-EX CPUs and 512 GB RAM.
The application workload consists of a number of processes writing to a small number of shared datasets.
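For readers unfamiliar with the locking pattern described above, here is a minimal sketch of it in Python. It is not the customer's application: the file is a temporary file on a local filesystem, and the helper names child_can_lock and demo are made up for illustration. Python's fcntl.lockf is a wrapper around fcntl() record locks, so it takes the same kind of lock the application relies on. The process takes an exclusive lock, writes, and then closes the descriptor without ever unlocking; a forked child is used to check whether the lock is held, because POSIX locks are per-process.

```python
import fcntl
import os
import tempfile


def child_can_lock(path):
    """Fork a child and report whether it can take an exclusive lock on path."""
    pid = os.fork()
    if pid == 0:
        # Child process: fcntl locks are not inherited across fork, so this
        # is a genuine second contender for the lock.
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 0  # lock acquired
        except OSError:
            status = 1  # lock is held by another process
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def demo():
    """Lock a file, write to it, and rely on close() to drop the lock."""
    fd, path = tempfile.mkstemp()  # hypothetical stand-in for a shared dataset
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)      # fcntl-style exclusive lock
        os.write(fd, b"record\n")
        held = not child_can_lock(path)     # another process is blocked out
        os.close(fd)                        # no explicit LOCK_UN before close
        released = child_can_lock(path)     # close released the lock
    finally:
        os.unlink(path)
    return held, released


if __name__ == "__main__":
    print(demo())
```

The implicit unlock on close is the path visible in the call trace in this report: sys_close calls filp_close, which calls locks_remove_posix, which reaches ll_file_flock on a Lustre mount.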
Below is an example of the traceback.
Feb 15 21:08:43 pt980a kernel: LustreError: 23926:0:(ldlm_lock.c:599:ldlm_lock_decref_internal_nolock()) ASSERTION(lock->l_readers > 0) failed
Feb 15 21:08:43 pt980a kernel: LustreError: 23926:0:(ldlm_lock.c:599:ldlm_lock_decref_internal_nolock()) LBUG
Feb 15 21:08:43 pt980a kernel: Pid: 23926, comm: sas
Feb 15 21:08:43 pt980a kernel:
Feb 15 21:08:43 pt980a kernel: Call Trace:
Feb 15 21:08:43 pt980a kernel: [<ffffffff889fa6a1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Feb 15 21:08:43 pt980a kernel: [<ffffffff889fabda>] lbug_with_loc+0x7a/0xd0 [libcfs]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88a02ff0>] tracefile_init+0x0/0x110 [libcfs]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88b2161f>] ldlm_lock_decref_internal_nolock+0x7f/0x100 [ptlrpc]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88b492d9>] ldlm_process_flock_lock+0x1089/0x18a0 [ptlrpc]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88a4c33d>] LNetMDUnlink+0xcd/0xf0 [lnet]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88b1fd59>] ldlm_grant_lock+0x4e9/0x550 [ptlrpc]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88b4a5fb>] ldlm_flock_completion_ast+0xa0b/0xaf0 [ptlrpc]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88b23729>] ldlm_lock_enqueue+0x9d9/0xb20 [ptlrpc]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88b3bc8b>] ldlm_cli_enqueue_fini+0xa5b/0xbc0 [ptlrpc]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88ab567d>] class_handle_hash+0x16d/0x250 [obdclass]
Feb 15 21:08:43 pt980a kernel: [<ffffffff8008e7f7>] default_wake_function+0x0/0xe
Feb 15 21:08:43 pt980a kernel: [<ffffffff88b3d7cf>] ldlm_cli_enqueue+0x63f/0x700 [ptlrpc]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88b3e0a0>] ldlm_completion_ast+0x0/0x880 [ptlrpc]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88d20acf>] ll_file_flock+0x57f/0x680 [lustre]
Feb 15 21:08:43 pt980a kernel: [<ffffffff88b49bf0>] ldlm_flock_completion_ast+0x0/0xaf0 [ptlrpc]
Feb 15 21:08:43 pt980a kernel: [<ffffffff8003063e>] locks_remove_posix+0x84/0xa8
Feb 15 21:08:43 pt980a kernel: [<ffffffff8003007e>] __up_write+0x27/0xf2
Feb 15 21:08:43 pt980a kernel: [<ffffffff80023da7>] filp_close+0x54/0x64
Feb 15 21:08:43 pt980a kernel: [<ffffffff8001e211>] sys_close+0x88/0xbd
Feb 15 21:08:43 pt980a kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Issue Links
- Trackbacks
-
Lustre 1.8.x known issues tracker
While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA