[LU-1126] Client file locking issue. Assertion triggered when decrementing a read lock on an item that has no existing read locks. Created: 21/Feb/12 Updated: 21/Oct/13 Resolved: 21/Oct/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.7 |
| Fix Version/s: | Lustre 2.5.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Peter Piela (Inactive) | Assignee: | Oleg Drokin |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None | ||
| Environment: |
Client: Lustre 1.8.7-wc1, in-kernel IB, RHEL 5.6 |
||
| Attachments: |
|
| Severity: | 1 |
| Rank (Obsolete): | 3992 |
| Description |
|
The Lustre filesystem is mounted on the client using the -o flock option; without this option the customer's application will not run. The application uses fcntl for file locking. It does not explicitly release locks; it relies on the file close operation to do that. The client hardware configuration is a single HP DL980 G7 server with 8x 8-core Nehalem-EX CPUs and 512 GB RAM. The application workload consists of a number of processes writing to a small number of shared datasets. Below is an example of the traceback.
Feb 15 21:08:43 pt980a kernel: LustreError: 23926:0:(ldlm_lock.c:599:ldlm_lock_decref_internal_nolock()) ASSERTION(lock->l_readers > 0) failed |
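Below is a minimal sketch, in C, of the locking pattern described above (an assumption for illustration, not the customer's actual code): the process takes a whole-file POSIX write lock with fcntl(), writes, and relies on close() to release the lock instead of issuing an explicit unlock. The file path is hypothetical.

    /* Sketch of the assumed application locking pattern. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            /* Hypothetical file on a Lustre client mounted with "-o flock". */
            int fd = open("/mnt/lustre/shared.dat", O_RDWR | O_CREAT, 0644);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            struct flock fl;
            memset(&fl, 0, sizeof(fl));
            fl.l_type   = F_WRLCK;      /* exclusive (write) lock         */
            fl.l_whence = SEEK_SET;
            fl.l_start  = 0;
            fl.l_len    = 0;            /* length 0 = lock the whole file */

            if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* block until granted */
                    perror("fcntl(F_SETLKW)");
                    close(fd);
                    return 1;
            }

            const char rec[] = "record\n";
            if (write(fd, rec, sizeof(rec) - 1) < 0)
                    perror("write");

            /* No explicit F_UNLCK: closing the descriptor drops all POSIX
             * locks this process holds on the file, which is the path that
             * goes through locks_remove_posix() and the Lustre flock code. */
            close(fd);
            return 0;
    }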
| Comments |
| Comment by Oleg Drokin [ 21/Feb/12 ] |
|
Since dlmtrace is no longer in the default debug mask, can you please reproduce the problem and then attach here the debug log dumped during the LBUG? It appears that the problem is due to some sort of race where, once the file is closed, Lustre attempts to wipe all flock locks, but one of the locks found is somehow already free. Additionally, if the application needs flock functionality but it is not important for consistency (e.g. because the app writes to non-intersecting regions of the file and hence does not need real flock semantics on Lustre to work), you can significantly speed up its operations by mounting with the -o localflock option. This provides node-local flock consistency, but not cluster-wide consistency (the savings come from not needing to send lock RPCs to the MDS for every flock operation). |
| Comment by Peter Piela (Inactive) [ 28/Feb/12 ] |
|
Problem reproduced with lock tracing enabled |
| Comment by Peter Piela (Inactive) [ 28/Feb/12 ] |
|
Customer was able to reproduce the problem with tracing enabled. The log file is attached. |
| Comment by Peter Piela (Inactive) [ 29/Feb/12 ] |
|
The generated log does not contain the trace information that I expected. I will review the steps used to create it. The log was generated using the steps below; please advise if they are not correct. lctl> filter all_types |
| Comment by Oleg Drokin [ 29/Feb/12 ] |
|
Can you please tell me how flock is used by the app to guarantee consistency? I think the best way to get a very good log would be: Please attach the resulting LBUG message and the decoded log file (the one ending in .txt that you got in the last step) here. |
| Comment by Ben Evans (Inactive) [ 02/Mar/12 ] |
|
Debug logs from test run. |
| Comment by Oleg Drokin [ 05/Mar/12 ] |
|
Did the backtrace from the LBUG (in dmesg/the kernel log) look the same as the one at the start of this bug? I.e., did it all start at sys_close and then pass through locks_remove_posix into the Lustre flock code? |
| Comment by Oleg Drokin [ 05/Mar/12 ] |
|
From the log we can see that the kernel made a GETLK request for a write lock covering the entire file. Now, even ignoring the wrong unlock mode for a moment, if we look into locks_remove_posix we can see that it just does not make any GETLK kind of requests; it does a blanket-mode UNLCK with the SETLK command. Do you have any patches on top of your RHEL5 kernel? Please attach fs/locks.c from the kernel you use. |
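For reference, here is a simplified sketch (paraphrased from 2.6.x-era kernels, not a verbatim copy of fs/locks.c) of what locks_remove_posix() does at close time: it builds a single blanket F_UNLCK request covering the whole file and pushes it down with the F_SETLK command, never a GETLK, which is why the GETLK seen in the log is unexpected.

    #include <linux/fs.h>

    /* Simplified sketch of the close-time POSIX lock cleanup path. */
    static void locks_remove_posix_sketch(struct file *filp, fl_owner_t owner)
    {
            struct file_lock lock;

            memset(&lock, 0, sizeof(lock));
            lock.fl_type  = F_UNLCK;             /* unlock, not a query   */
            lock.fl_flags = FL_POSIX | FL_CLOSE;
            lock.fl_start = 0;
            lock.fl_end   = OFFSET_MAX;          /* cover the entire file */
            lock.fl_owner = owner;
            lock.fl_file  = filp;

            /* For Lustre this calls the llite ->lock() handler with F_SETLK;
             * without a filesystem ->lock() method the kernel would instead
             * fall back to its local POSIX lock bookkeeping. */
            if (filp->f_op && filp->f_op->lock)
                    filp->f_op->lock(filp, F_SETLK, &lock);
    }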
| Comment by Peter Piela (Inactive) [ 05/Mar/12 ] |
|
Do you have any patches on top of your rhel5 kernel?
No.
[root@pt980a ~]# uname -a
Do you have any patches on top of the 1.8.7-wc1 that you use on the clients?
No.
Please attach fs/locks.c from the kernel you use.
The customer does not have access to the source file; can you locate it from the version information above? |
| Comment by Oleg Drokin [ 05/Mar/12 ] |
|
Well, I have looked into the file for that kernel version and its content is as I expected; i.e., it does not do a GETLK there. So, if the backtrace from the crash for which you provided the logs is the same as in the bug description, I don't have a coherent explanation of what has happened. I wonder if you can share the reproducing app with us, and what sort of scale it needs to be run at to actually hit the issue? |
| Comment by Peng Tao [ 17/Mar/12 ] |
|
Reproducer program. |
| Comment by Peng Tao [ 17/Mar/12 ] |
|
I've spent some time looking into this and found a reproducer for it. As discussed with Oleg, we identified the root cause to be:
1. For a GETLK/LCK_PW operation, the client creates a struct ldlm_lock with PW mode and reader(0) writer(1). If the server returns LCK_PR to indicate that the lock is held by another process (line 466845 and line 466846 both show the server returned a PR lock owned by PID 22497), the client changes the lock mode to PR but does not touch the reader/writer counts. Hence there is a window where the lock sits on the resource's lr_granted list, saved in PR mode but with reader(0) and writer(1). Normally this is OK because the lock will be destroyed in ldlm_flock_completion_ast() with the PW mode saved in getlk->fl_type, but it is vulnerable because other processes can find it on the resource's lr_granted list.
2. In the log, one process entered ldlm_process_flock_lock() looking for the PR lock owned by PID 22497 in lr_granted (line 468326). It then found the locks mentioned in 1 and decided to destroy them because of lock range overlap. It destroyed the lock using req->l_req_mode, which is PR, but the found locks, although saved in PR mode, still have reader(0) and writer(1), so decrementing the reader count triggers the LBUG as reported. |
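A toy illustration, in C, of the inconsistent state described in the two points above (simplified types for illustration only, not Lustre source): the lock is re-labelled PR after the server's reply, but the reader/writer reference counts still reflect the original PW request, so any later path that drops the reference by the PR mode trips the same assertion as ldlm_lock_decref_internal_nolock().

    #include <assert.h>

    enum toy_mode { LCK_PR, LCK_PW };   /* simplified subset of LDLM modes */

    struct toy_lock {
            enum toy_mode mode;
            int l_readers;
            int l_writers;
    };

    /* Drop one reference according to the lock's current mode. */
    static void toy_decref(struct toy_lock *lk)
    {
            if (lk->mode == LCK_PR) {
                    /* mirrors ASSERTION(lock->l_readers > 0) */
                    assert(lk->l_readers > 0);
                    lk->l_readers--;
            } else {
                    assert(lk->l_writers > 0);
                    lk->l_writers--;
            }
    }

    int main(void)
    {
            /* Step 1: GETLK for a write lock -> PW, reader(0) writer(1). */
            struct toy_lock lk = { LCK_PW, 0, 1 };

            /* Server replies that another process holds the lock in PR mode;
             * the client flips the mode but does not swap the refcounts. */
            lk.mode = LCK_PR;           /* now PR with reader(0) writer(1) */

            /* Step 2: another process finds this lock in lr_granted, decides
             * to destroy it because of range overlap, and drops it by its
             * (PR) mode -- the assertion fires because l_readers == 0. */
            toy_decref(&lk);
            return 0;
    }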
| Comment by Peng Tao [ 18/Mar/12 ] |
|
A patch to fix it, by exchanging the lock refcounts when the lock mode changes, has been submitted to gerrit: |
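Continuing the toy types from the sketch above, this is a rough sketch of the idea behind such a fix (an illustration of the approach described in the comment, not the actual gerrit change): when the granted mode differs from the requested one, move the reference between the writer and reader counts at the same moment the mode is changed, so the mode and the refcounts can never disagree.

    /* Swap the reference along with the mode change (toy version). */
    static void toy_change_mode(struct toy_lock *lk, enum toy_mode granted)
    {
            if (lk->mode == granted)
                    return;

            if (lk->mode == LCK_PW && granted == LCK_PR) {
                    lk->l_writers--;   /* reference taken for the PW request  */
                    lk->l_readers++;   /* now accounted as a reader reference */
            } else if (lk->mode == LCK_PR && granted == LCK_PW) {
                    lk->l_readers--;
                    lk->l_writers++;
            }
            lk->mode = granted;
    }

With this in place, the lock found on lr_granted in step 2 above would carry reader(1) writer(0) in PR mode, and dropping it by PR mode would no longer hit the assertion.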
| Comment by HP Slovakia team (Inactive) [ 03/Aug/12 ] |
|
Our customer also faced this issue on Lustre 1.8.8. When can we expect a patch to fix it? |
| Comment by Ben Allen [ 24/Sep/12 ] |
|
This bug also appears to affect version 2.1.3 installed from Whamcloud's official RPMs and the pre-built kernel 2.6.32-279.2.1.el6_lustre.gc46c389.x86_64. An easy way to reproduce this issue is with BULL's NFS Locktests: http://nfsv4.bullopensource.org/tools/tests/locktest.php. When run with the thread option, it will cause the LBUG nearly immediately on a client. |
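For what it's worth, below is a minimal threaded stress loop in the spirit of such lock tests (an assumed shape for illustration, not the actual BULL test suite; the path is hypothetical): several threads of one process repeatedly take and release overlapping whole-file POSIX write locks, which generates a steady stream of concurrent SETLK/UNLCK traffic through the client flock code.

    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define NTHREADS 8
    #define NLOOPS   10000

    static int fd;  /* shared descriptor; POSIX locks are owned per process */

    static void *locker(void *arg)
    {
            (void)arg;
            for (int i = 0; i < NLOOPS; i++) {
                    struct flock fl;
                    memset(&fl, 0, sizeof(fl));
                    fl.l_type   = F_WRLCK;
                    fl.l_whence = SEEK_SET;
                    fl.l_start  = 0;
                    fl.l_len    = 0;            /* whole file */

                    if (fcntl(fd, F_SETLKW, &fl) == 0) {
                            fl.l_type = F_UNLCK;
                            fcntl(fd, F_SETLK, &fl);
                    }
            }
            return NULL;
    }

    int main(void)
    {
            pthread_t tid[NTHREADS];

            /* Hypothetical file on the Lustre client mount. */
            fd = open("/mnt/lustre/locktest.dat", O_RDWR | O_CREAT, 0644);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            for (int i = 0; i < NTHREADS; i++)
                    pthread_create(&tid[i], NULL, locker, NULL);
            for (int i = 0; i < NTHREADS; i++)
                    pthread_join(tid[i], NULL);
            close(fd);
            return 0;
    }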
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 24/Jul/13 ] |
|
Dear sirs, we are planning to install SAS with LSF on our production server; before installing, we need this bug to be fixed. The patch reported here: http://review.whamcloud.com/#/c/2343/ |
| Comment by Bruno Faccini (Inactive) [ 01/Aug/13 ] |
|
As part of _ I was unable to reproduce the LBUG with the attached "flock.c". Also, I have been able to successfully run "BULL's NFS Locktests" for hours over the latest master after applying the simpler patch http://review.whamcloud.com/7134. I will check whether I can confirm this running with the latest 1.8. Maybe I should open a specific ticket against master too? |
| Comment by Bruno Faccini (Inactive) [ 10/Sep/13 ] |
|
The specific case/scenario of the LBUG when running "BULL's NFS Locktests" is now addressed by |
| Comment by Jodi Levi (Inactive) [ 21/Oct/13 ] |
|
Patch landed to Master. |