[LU-3684] LBUG/"ldlm_lock_decref_internal_nolock()) ASSERTION(lock->l_readers > 0) failed" running Bull's NFS locktests Created: 01/Aug/13 Updated: 16/Apr/14 Resolved: 03/Mar/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.7 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | Supporto Lustre Jnet2000 (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Client Lustre 1.8.7 jenkins-wc1--PRISTINE-2.6.18-274.3.1.el5 RHEL 5.7 |
||
| Attachments: |
|
| Rank (Obsolete): | 9511 |
| Description |
|
We need the bug reported here: https://jira.hpdd.intel.com/browse/LU-1126 to be fixed before installing SAS Grid Manager. The Lustre filesystem is mounted on the client using the -o flock option. |
| Comments |
| Comment by Peter Jones [ 01/Aug/13 ] |
|
Thanks for the report. We are looking into the best option |
| Comment by Bruno Faccini (Inactive) [ 02/Sep/13 ] |
|
Hello, Since your feeling is that your problem is still the one originally addressed by Do you think you can again provide the Lustre debug-log (The way Oleg described in Thanks in advance for your help. |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 02/Sep/13 ] |
|
Dear Bruno, we currently don't have SAS Grid Manager installed on our system. Before we install it, the SAS support team requires that the bug reported here https://jira.hpdd.intel.com/browse/LU-1126 be fixed. |
| Comment by Bruno Faccini (Inactive) [ 02/Sep/13 ] |
|
It would be helpful if I could get the Lustre debug-log taken during a "Bull's NFS Locktests" run at your site. Thanks in advance. |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 06/Sep/13 ] |
|
Dear Bruno, we have attached the log from the Lustre client that crashed during the execution of the Bull test. |
| Comment by Bruno Faccini (Inactive) [ 09/Sep/13 ] |
|
I checked the lustre-log you provided and it is definitely the same problem triggered by "Bull's NFS Locktests" (and not the original one in |
| Comment by Bruno Faccini (Inactive) [ 10/Sep/13 ] |
|
Since this ticket definitely addresses a different scenario (even if the LBUG/"ldlm_lock_decref_internal_nolock()) ASSERTION(lock->l_readers > 0) failed" is the same!!) than the original reported as part of Just to be complete about the differences between the 2 problems for _ _ this is different to the scenario for this ticket's problem, where a race can occur between 2 threads that both want to destroy the same lock (one to finish processing the corresponding request, the other due to overlap rules), mainly while handling multiple concurrent F_UNLCK requests. The LBUG thus occurs because the 2nd thread finds the counter already set to 0. This particular problem shows up very easily when running, as you experienced, "Bull's NFS Locktests". This test is available at http://nfsv4.bullopensource.org/tools/tests/locktest.php, and I have attached it here as the "locktests.tar.gz" distribution. An easy way to reproduce the problem is to run the test in pthread mode, e.g. "locktests -n 10 -T -f <Lustre-File>", on a single, full Lustre node (i.e., as set up after installing Lustre and running "llmount.sh"). As I said, the problem has been fixed in master with http://review.whamcloud.com/7134; the b1_8 patch is now at http://review.whamcloud.com/7420. Also, I would like to change this ticket's title, as it is definitely not the same problem/race as the one addressed in Will also add reference to this ticket in |
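The race described above can be illustrated with a small toy model. This is only a sketch, not Lustre code and not the change made in the patches referenced above: the class `ToyLock`, its `readers` counter, and the `destroy_once` guard are hypothetical names introduced here for illustration. It shows why a second, unguarded decrement trips a "readers > 0" check, and how a "first caller wins" guard avoids it, assuming both threads go through the same guarded path.

```python
import threading


class ToyLock:
    """Hypothetical stand-in for a lock that tracks a reader reference count."""

    def __init__(self, readers=1):
        self.readers = readers
        self.destroyed = False
        self._guard = threading.Lock()

    def decref_unchecked(self):
        # Models the check from the LBUG message: the count must be positive.
        if self.readers <= 0:
            raise RuntimeError("ASSERTION(readers > 0) failed")
        self.readers -= 1

    def destroy_once(self):
        # Illustrative guard (an assumption, not the actual fix):
        # only the first caller drops the reference.
        with self._guard:
            if self.destroyed:
                return False
            self.destroyed = True
            self.decref_unchecked()
            return True


def unguarded_double_destroy():
    """Replay the bad ordering: two destroyers drop the same single reference."""
    lock = ToyLock(readers=1)
    lock.decref_unchecked()  # thread A: finishes processing its request
    try:
        lock.decref_unchecked()  # thread B: destroys the lock due to overlap rules
    except RuntimeError:
        return True  # the second decrement found the counter already at 0
    return False


def guarded_concurrent_destroy(n_threads=2):
    """Run several destroyers concurrently through the guarded path."""
    lock = ToyLock(readers=1)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(lock.destroy_once()))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Number of threads that actually dropped the reference, and the final count.
    return results.count(True), lock.readers
```

In this model, `unguarded_double_destroy()` reproduces the failing order deterministically, while `guarded_concurrent_destroy()` lets exactly one thread drop the reference regardless of scheduling.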
| Comment by Gabriele Paciucci (Inactive) [ 28/Feb/14 ] |
|
The customer is currently in the process of upgrading to 1.8.9+patch, so please close this ticket. |
| Comment by Bruno Faccini (Inactive) [ 16/Apr/14 ] |
|
b2_4 patch version is at http://review.whamcloud.com/9968. |