[LU-4269] ldlm_lock_put()) ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed Created: 18/Nov/13  Updated: 27/Feb/14  Resolved: 21/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Bug Priority: Blocker
Reporter: Cliff White (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: MB, mn4
Environment:

Hyperion/LLNL


Attachments: File LU-4268-140123-log.tar.gz     Text File analysis.txt     File iwc.console.gz     Text File iwc106.sysrq.txt     Text File iwc106.sysrq.txt     File iwc106.sysrq.txt.gz     File iwc106.sysrq.txt.gz     Text File iwc48.lctl.log.txt.gz     File log.bz2     Text File vmcore-dmesg.txt    
Severity: 3
Rank (Obsolete): 11723

 Description   

Running SWL - miranda IO test.

2013-11-16 17:27:14 LustreError: 105183:0:(ldlm_lock.c:222:ldlm_lock_put()) ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed:
2013-11-16 17:27:14 LustreError: 105183:0:(ldlm_lock.c:222:ldlm_lock_put()) LBUG
2013-11-16 17:27:14 Pid: 105183, comm: miranda_io
2013-11-16 17:27:14 Call Trace:
2013-11-16 17:27:14  [<ffffffffa056b895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2013-11-16 17:27:14  [<ffffffffa056be97>] lbug_with_loc+0x47/0xb0 [libcfs]
2013-11-16 17:27:14  [<ffffffffa080f94d>] ldlm_lock_put+0x44d/0x560 [ptlrpc]
2013-11-16 17:27:14  [<ffffffffa0822d22>] ldlm_cli_cancel_list+0xf2/0x3e0 [ptlrpc]
2013-11-16 17:27:14  [<ffffffffa0824952>] ldlm_cli_cancel+0x132/0x360 [ptlrpc]
2013-11-16 17:27:14  [<ffffffffa09fdbce>] osc_lock_cancel+0xfe/0x1c0 [osc]
2013-11-16 17:27:14  [<ffffffffa0708525>] cl_lock_cancel0+0x75/0x160 [obdclass]
2013-11-16 17:27:14  [<ffffffffa07090db>] cl_lock_cancel+0x13b/0x140 [obdclass]
2013-11-16 17:27:14  [<ffffffffa070d7e4>] cl_lock_enqueue_wait+0xc4/0x2d0 [obdclass]
2013-11-16 17:27:14  [<ffffffffa070aa17>] ? cl_lock_mutex_put+0x77/0x90 [obdclass]
2013-11-16 17:27:14  [<ffffffffa0a9126b>] lov_lock_enqueue+0x3fb/0x850 [lov]
2013-11-16 17:27:14  [<ffffffffa070c65c>] cl_enqueue_try+0xfc/0x300 [obdclass]
2013-11-16 17:27:14  [<ffffffffa070da5f>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
2013-11-16 17:27:14  [<ffffffffa070e6ae>] cl_lock_request+0x7e/0x270 [obdclass]
2013-11-16 17:27:14  [<ffffffffa071391c>] cl_io_lock+0x3cc/0x560 [obdclass]
2013-11-16 17:27:14  [<ffffffffa0713b52>] cl_io_loop+0xa2/0x1b0 [obdclass]
2013-11-16 17:27:14  [<ffffffffa0b0e290>] ll_file_io_generic+0x460/0x610 [lustre]
2013-11-16 17:27:14  [<ffffffffa0701e39>] ? cl_env_get+0x29/0x350 [obdclass]
2013-11-16 17:27:14  [<ffffffffa0b0ecb2>] ll_file_aio_write+0x142/0x2c0 [lustre]
2013-11-16 17:27:14  [<ffffffffa0b0ef9c>] ll_file_write+0x16c/0x2a0 [lustre]
2013-11-16 17:27:14  [<ffffffff81181398>] vfs_write+0xb8/0x1a0
2013-11-16 17:27:14  [<ffffffff81181c91>] sys_write+0x51/0x90
2013-11-16 17:27:14  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
2013-11-16 17:27:14

Lctl dump attached



 Comments   
Comment by Zhenyu Xu [ 11/Dec/13 ]

What I observed in the log:

line 140698, ldlm_cli_cancel()
line 140704, ldlm_cli_cancel_local() ldlm lock ffff880793e54180/0xf128b59d838c9d81 res: [0x9edf8f:0x0:0x0].0 flags: 0x284'0000'0000 (LVB_READY|CANCELING|CBPENDING)
line 140756, the ldlm lock is destroyed in ldlm_lock_destroy_internal(), which sets the destroyed flag (0x0004000000000000ULL, i.e. 1ULL << 50, the bit the failed assertion tests) on the lock
line 140771, prepare to issue the ldlm lock cancel request
line 140781, ldlm_cancel_pack() lock: ffff880793e54180/0xf128b59d838c9d81 flags: 0x8694'0000'0000 (BL_DONE|KMS_IGNORE|LVB_READY|CANCELING|CANCEL|CBPENDING); the destroyed flag has disappeared

Cliff, what is the latest git commit in this code? I've seen cl_lock_discard_pages() in the log, while the current master branch does not have this function, so I need to know the exact code base you were using when you hit this problem.

Comment by Shuichi Ihara (Inactive) [ 16/Dec/13 ]

We saw exactly the same problem on multiple clients while mdtest was running. The clients were running the latest master branch.

<0>LustreError: 10212:0:(ldlm_lock.c:222:ldlm_lock_put()) ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed: 
<0>LustreError: 10212:0:(ldlm_lock.c:222:ldlm_lock_put()) LBUG
<4>Pid: 10212, comm: mdtest
<4>
<4>Call Trace:
<4> [<ffffffffa05c5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa05c5e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0951e27>] ldlm_lock_put+0x547/0x630 [ptlrpc]
<4> [<ffffffffa09708f2>] ldlm_cli_cancel_list+0xf2/0x3e0 [ptlrpc]
<4> [<ffffffffa0972e0a>] ldlm_prep_elc_req+0x21a/0x4b0 [ptlrpc]
<4> [<ffffffffa1043637>] mdc_unlink+0xe7/0x500 [mdc]
<4> [<ffffffffa12ade7b>] lmv_unlink+0x1db/0x7a0 [lmv]
<4> [<ffffffffa11c4a1e>] ? ll_i2gids+0x2e/0xd0 [lustre]
<4> [<ffffffffa11cb228>] ll_unlink+0x158/0x610 [lustre]
<4> [<ffffffff8118fec0>] vfs_unlink+0xa0/0xf0
<4> [<ffffffff8118ebfa>] ? lookup_hash+0x3a/0x50
<4> [<ffffffff81192265>] do_unlinkat+0xf5/0x1b0
<4> [<ffffffff810d94e2>] ? unroll_tree_refs+0xe2/0x120
<4> [<ffffffff8100c535>] ? math_state_restore+0x45/0x60
<4> [<ffffffff81192336>] sys_unlink+0x16/0x20
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 10212, comm: mdtest Not tainted 2.6.32-358.23.2.el6_lustre.ge975b1c.x86_64 #1
<4>Call Trace:
<4> [<ffffffff8150deec>] ? panic+0xa7/0x16f
<4> [<ffffffffa05c5eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa0951e27>] ? ldlm_lock_put+0x547/0x630 [ptlrpc]
<4> [<ffffffffa09708f2>] ? ldlm_cli_cancel_list+0xf2/0x3e0 [ptlrpc]
<4> [<ffffffffa0972e0a>] ? ldlm_prep_elc_req+0x21a/0x4b0 [ptlrpc]
<4> [<ffffffffa1043637>] ? mdc_unlink+0xe7/0x500 [mdc]
<4> [<ffffffffa12ade7b>] ? lmv_unlink+0x1db/0x7a0 [lmv]
<4> [<ffffffffa11c4a1e>] ? ll_i2gids+0x2e/0xd0 [lustre]
<4> [<ffffffffa11cb228>] ? ll_unlink+0x158/0x610 [lustre]
<4> [<ffffffff8118fec0>] ? vfs_unlink+0xa0/0xf0
<4> [<ffffffff8118ebfa>] ? lookup_hash+0x3a/0x50
<4> [<ffffffff81192265>] ? do_unlinkat+0xf5/0x1b0
<4> [<ffffffff810d94e2>] ? unroll_tree_refs+0xe2/0x120
<4> [<ffffffff8100c535>] ? math_state_restore+0x45/0x60
<4> [<ffffffff81192336>] ? sys_unlink+0x16/0x20
<4> [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

We have a crash dump as well. Please let me know if it would help for debugging.

Comment by Zhenyu Xu [ 16/Dec/13 ]

Hi Ihara-San,

Do you have debug log of it?

Comment by Shuichi Ihara (Inactive) [ 16/Dec/13 ]

Hi Bobijam,

vmcore-dmesg.txt attached. Please let me know if you need the vmcore itself; I can upload it as well.

Comment by Zhenyu Xu [ 16/Dec/13 ]

Besides dmesg, do you have a debug log? (lctl dk)

Comment by Cliff White (Inactive) [ 19/Dec/13 ]

The test was run Nov 18th; the build information was not in the version string: build: 2.5.51--PRISTINE-2.6.32-358.23.2.el6_lustre.x86_64 - so it was likely the 2.5.51 tag.

Comment by Cliff White (Inactive) [ 02/Jan/14 ]

Sysrq -t from hung client.

Comment by Cliff White (Inactive) [ 03/Jan/14 ]

Attempting to attach file again

Comment by Jinshan Xiong (Inactive) [ 07/Jan/14 ]

From Ihara's feedback, it looks like there is a race condition in accessing the flags of the DLM lock.

Let's create a debug patch that adds a line to ldlm_lock_destroy_internal() to print out the exact state of the lock.

Comment by Zhenyu Xu [ 07/Jan/14 ]

Debug patch at http://review.whamcloud.com/8759. Please unset panic_on_lbug (echo 0 > /proc/sys/lnet/panic_on_lbug) and run this debug-patched version, so that we can collect the debug log when the assertion is hit.

Comment by Li Xi (Inactive) [ 13/Jan/14 ]

I've posted a patch for this. However, I am not sure whether it fixes this problem, since we don't have a reliable way to reproduce it yet.
http://review.whamcloud.com/#/c/8772/

Comment by Zhenyu Xu [ 13/Jan/14 ]

Thank you Li Xi, but iwc48.lctl.log.txt shows at line 140770

00010000:00000001:0.0:1384651634.269628:0:105183:0:(ldlm_request.c:1760:ldlm_prepare_lru_list()) Process leaving (rc=0 : 0 : 0)

which shows that when we cancelled the problematic lock, ldlm_prepare_lru_list() did not find any other LRU locks, so I think your patch would probably not affect this issue.

Comment by Li Xi (Inactive) [ 13/Jan/14 ]

Ah, thank you for pointing that out, Zhenyu. I was just wondering why we can call ldlm_lock_remove_from_lru_nolock() without calling lock_res_and_lock() first. Is there anything I missed?

Comment by Zhenyu Xu [ 13/Jan/14 ]

ldlm_prepare_lru_list() collects unused locks from namespace::ns_unused_list, onto which only unused locks are linked (see ldlm_lock_decref_internal()->ldlm_lock_add_to_lru()). So in ldlm_prepare_lru_list() we can call ldlm_lock_remove_from_lru_nolock() without locking the resource and the lock, since the lock has no readers or writers referring to it.
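
For reference, the add-side counterpart looks roughly like this in the code base of that era (a sketch from memory; details such as the cfs_list_* names may differ slightly from the exact tree in question):

static void ldlm_lock_add_to_lru_nolock(struct ldlm_lock *lock)
{
        struct ldlm_namespace *ns = ldlm_lock_to_ns(lock);

        /* only locks with no readers and no writers ever get here,
         * via ldlm_lock_decref_internal() */
        lock->l_last_used = cfs_time_current();
        LASSERT(cfs_list_empty(&lock->l_lru));
        LASSERT(lock->l_resource->lr_type != LDLM_FLOCK);
        cfs_list_add_tail(&lock->l_lru, &ns->ns_unused_list);
        LASSERT(ns->ns_nr_unused >= 0);
        ns->ns_nr_unused++;
}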

Comment by Li Xi (Inactive) [ 13/Jan/14 ]

Ah, thank you for the explanation! Now I understand.

Comment by Jinshan Xiong (Inactive) [ 14/Jan/14 ]

How often do you see this problem? If it's quite often, please apply Bobijam's patch and collect some logs; otherwise, I'd like to drop the priority of this ticket.

Jinshan

Comment by Shuichi Ihara (Inactive) [ 14/Jan/14 ]

Jinshan, Bobijam,

Yes, we applied Bobijam's debug patch at the customer site, but haven't reproduced the problem yet. I will post logs once we hit the same issue.

Comment by Shuichi Ihara (Inactive) [ 14/Jan/14 ]

We got the crash again with the debug patch. Attached is diagnostic information about the crash.

Comment by Zhenyu Xu [ 14/Jan/14 ]

Did you collect any lctl log from it?

Comment by Shuichi Ihara (Inactive) [ 14/Jan/14 ]

No, unfortunately not. The client panicked even though we have panic_on_lbug=0 set. I don't know why we got a panic after the LBUG.

Comment by Patrick Farrell (Inactive) [ 23/Jan/14 ]

We reproduced this running master plus a client lock debug patch on SLES11SP3, while chasing a different client lock bug. I believe we hit this with a special debug flag for cl_lock_trace turned on, so all calls to cl_lock_trace are being logged, but little or nothing else.

I should be able to make the dump and logs available if you think it would be helpful. I'm not sure we can reproduce it, with different debug settings or at all; we run the test I'm using as a reproducer a lot, and this is the first time we've seen this particular bug. (The test in question is mmstress from the Linux Test Project.)

Comment by Zhenyu Xu [ 24/Jan/14 ]

yes please upload logs Patrick, thank you.

Comment by Patrick Farrell (Inactive) [ 27/Jan/14 ]

Console log & Lustre log.
System is master from January 23rd on SLES11SP3, with a special debug patch that enables all calls to cl_lock_trace.

Comment by Patrick Farrell (Inactive) [ 27/Jan/14 ]

Logs attached as noted. I've also uploaded the dump (with logs, vmlinuz, ko files, etc) to ftp.whamcloud.com at uploads/LU-4269/LU-4269-140123.tar.gz

System is master from January 23rd on SLES11SP3, with a special debug patch that enables all calls to cl_lock_trace.

Also, we reproduced this by running mmstress from the Linux Test Project, with multiple copies on multiple nodes. (mmstress also hits several cl_lock related bugs, which may make this one harder to find; those problems are more common than this one.)

Comment by Zhenyu Xu [ 29/Jan/14 ]

Hi Jinshan,

When a top lock enqueue fails, the top lock goes through cl_lock_hold_release() and then unholds a lovsub lock, which ends up in osc_lock_cancel(); if at this point the osc lock is used by another IO, the ldlm lock is not cancelled.

osc_lock_cancel():
                ...
                do_cancel = (dlmlock->l_readers == 0 &&
                             dlmlock->l_writers == 0);
                ldlm_set_cbpending(dlmlock);
                unlock_res_and_lock(dlmlock);
                if (do_cancel)
                        result = ldlm_cli_cancel(&olck->ols_handle, LCF_ASYNC);
                ...

and the osc lock is detached from the lovsub lock in cl_lock_hold_release():

        if (lock->cll_holds == 0) {
                CL_LOCK_ASSERT(lock->cll_state != CLS_HELD, env, lock);
                if (lock->cll_descr.cld_mode == CLM_PHANTOM ||
                    lock->cll_descr.cld_mode == CLM_GROUP ||
                    lock->cll_state != CLS_CACHED)
                        /*
                         * If lock is still phantom or grouplock when user is
                         * done with it---destroy the lock.
                         */
                        lock->cll_flags |= CLF_CANCELPEND|CLF_DOOMED;
                if (lock->cll_flags & CLF_CANCELPEND) {
                        lock->cll_flags &= ~CLF_CANCELPEND;
                        cl_lock_cancel0(env, lock);
                }
                if (lock->cll_flags & CLF_DOOMED) {
                        /* no longer doomed: it's dead... Jim. */
                        lock->cll_flags &= ~CLF_DOOMED;
                        cl_lock_delete0(env, lock);
                }
        }

Is it possible that the ldlm lock somehow missed the cancel process somewhere, so that at its last dereference it is still missing the destroyed flag?

Comment by Jinshan Xiong (Inactive) [ 31/Jan/14 ]

I happened to be able to reproduce this issue steadily while I was working on another one.

After I applied the following patch:

diff --git a/lustre/ldlm/ldlm_lock.c b/lustre/ldlm/ldlm_lock.c
index 214ee8c..2e69ad0 100644
--- a/lustre/ldlm/ldlm_lock.c
+++ b/lustre/ldlm/ldlm_lock.c
@@ -372,12 +372,14 @@ int ldlm_lock_destroy_internal(struct ldlm_lock *lock)
                 LBUG();
         }
 
+       LDLM_DEBUG(lock, "destroy");
        if (ldlm_is_destroyed(lock)) {
                LASSERT(cfs_list_empty(&lock->l_lru));
                EXIT;
                return 0;
        }
        ldlm_set_destroyed(lock);
+       LDLM_DEBUG(lock, "destroy");
 
        if (lock->l_export && lock->l_export->exp_lock_hash) {
                /* NB: it's safe to call cfs_hash_del() even lock isn't
@@ -2169,6 +2171,7 @@ void ldlm_lock_cancel(struct ldlm_lock *lock)
 
         ldlm_resource_unlink_lock(lock);
         ldlm_lock_destroy_nolock(lock);
+       LDLM_DEBUG(lock, "destroyed");
 
         if (lock->l_granted_mode == lock->l_req_mode)
                 ldlm_pool_del(&ns->ns_pool, lock);
diff --git a/lustre/ldlm/ldlm_request.c b/lustre/ldlm/ldlm_request.c
index e140e0f..def3ee5 100644
--- a/lustre/ldlm/ldlm_request.c
+++ b/lustre/ldlm/ldlm_request.c
@@ -1141,6 +1141,7 @@ static __u64 ldlm_cli_cancel_local(struct ldlm_lock *lock)
                         rc = LDLM_FL_LOCAL_ONLY;
                 }
                 ldlm_lock_cancel(lock);
+               LDLM_DEBUG(lock, "client-side cancel completed, rc = %lld", rc);
         } else {
                 if (ns_is_client(ldlm_lock_to_ns(lock))) {
                         LDLM_ERROR(lock, "Trying to cancel local lock");
diff --git a/lustre/llite/llite_internal.h b/lustre/llite/llite_internal.h
index 92b278e..e9e7053 100644
--- a/lustre/llite/llite_internal.h
+++ b/lustre/llite/llite_internal.h
@@ -1469,7 +1469,7 @@ static inline void cl_isize_unlock(struct inode *inode)
 
 static inline void cl_isize_write_nolock(struct inode *inode, loff_t kms)
 {
-       LASSERT(down_trylock(&ll_i2info(inode)->lli_size_sem) != 0);
+       //LASSERT(down_trylock(&ll_i2info(inode)->lli_size_sem) != 0);
        i_size_write(inode, kms);
 }

diff --git a/lustre/llite/llite_lib.c b/lustre/llite/llite_lib.c
index df421d6..ae53f3d 100644
--- a/lustre/llite/llite_lib.c
+++ b/lustre/llite/llite_lib.c
@@ -1700,25 +1700,29 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs)
 
 void ll_inode_size_lock(struct inode *inode)
 {
+#if 0
         struct ll_inode_info *lli;
 
         LASSERT(!S_ISDIR(inode->i_mode));
 
         lli = ll_i2info(inode);
         LASSERT(lli->lli_size_sem_owner != current);
-       down(&lli->lli_size_sem);
+       //down(&lli->lli_size_sem);
         LASSERT(lli->lli_size_sem_owner == NULL);
         lli->lli_size_sem_owner = current;
+#endif
 }
 
 void ll_inode_size_unlock(struct inode *inode)
 {
+#if 0
         struct ll_inode_info *lli;
 
         lli = ll_i2info(inode);
         LASSERT(lli->lli_size_sem_owner == current);
         lli->lli_size_sem_owner = NULL;
-       up(&lli->lli_size_sem);
+       //up(&lli->lli_size_sem);
+#endif
 }
 
 void ll_update_inode(struct inode *inode, struct lustre_md *md)
diff --git a/lustre/obdclass/cl_lock.c b/lustre/obdclass/cl_lock.c
index d440da9..b8c2e04 100644
--- a/lustre/obdclass/cl_lock.c
+++ b/lustre/obdclass/cl_lock.c
@@ -493,6 +493,7 @@ static struct cl_lock *cl_lock_lookup(const struct lu_env *env,
                                       const struct cl_io *io,
                                       const struct cl_lock_descr *need)
 {
+#if 0
         struct cl_lock          *lock;
         struct cl_object_header *head;
 
@@ -518,6 +519,7 @@ static struct cl_lock *cl_lock_lookup(const struct lu_env *env,
                         RETURN(lock);
                 }
         }
+#endif
         RETURN(NULL);
 }
 
@@ -1964,6 +1966,8 @@ static struct cl_lock *cl_lock_hold_mutex(const struct lu_env *env,
                         cl_lock_hold_mod(env, lock, +1);
                         lu_ref_add(&lock->cll_holders, scope, source);
                         lu_ref_add(&lock->cll_reference, scope, source);
+
+                       cl_lock_cancel(env, lock);
                         break;
                 }
                 cl_lock_mutex_put(env, lock);

and ran 32 processes reading the same file at the same time, the reading processes hit the LBUG at the same place:

Call Trace:
 [<ffffffffa03fd895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa03fde97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0774fb7>] ldlm_lock_put+0x547/0x630 [ptlrpc]
 [<ffffffffa0793df2>] ldlm_cli_cancel_list+0xf2/0x3e0 [ptlrpc]
 [<ffffffffa0795a22>] ldlm_cli_cancel+0x132/0x360 [ptlrpc]
 [<ffffffffa0a8e72e>] osc_lock_cancel+0x10e/0x1d0 [osc]
 [<ffffffffa059c045>] cl_lock_cancel0+0x75/0x160 [obdclass]
 [<ffffffffa059d9d6>] cl_lock_hold_release+0x1c6/0x2a0 [obdclass]
 [<ffffffffa059ecc7>] cl_lock_unhold+0x37/0x130 [obdclass]
 [<ffffffffa0af8a68>] lov_sublock_release+0x1a8/0x280 [lov]
 [<ffffffffa0afa4fe>] lov_lock_unuse+0x1be/0x290 [lov]
 [<ffffffffa059c205>] cl_unuse_try_internal+0x55/0xe0 [obdclass]
 [<ffffffffa059f0a9>] cl_unuse_try+0x199/0x320 [obdclass]
 [<ffffffffa059f267>] cl_unuse_locked+0x37/0x1a0 [obdclass]
 [<ffffffffa059f40e>] cl_unuse+0x3e/0x120 [obdclass]
 [<ffffffffa05a5e57>] cl_io_unlock+0x127/0x2b0 [obdclass]
 [<ffffffffa05a6aed>] cl_io_loop+0xcd/0x1b0 [obdclass]
 [<ffffffffa0f7c9d6>] ll_file_io_generic+0x2b6/0x710 [lustre]
 [<ffffffffa0596db9>] ? cl_env_get+0x29/0x350 [obdclass]
 [<ffffffffa0f7cf6f>] ll_file_aio_read+0x13f/0x2c0 [lustre]
 [<ffffffffa0f7d42c>] ll_file_read+0x16c/0x2a0 [lustre]
 [<ffffffff81181a95>] vfs_read+0xb5/0x1a0
 [<ffffffff81181bd1>] sys_read+0x51/0x90
 [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

LustreError: dumping log to /tmp/lustre-log.1391207221.6283

However, from the log, I can see this:

00000020:00010000:5.0:1391207220.451167:0:6294:0:(cl_lock.c:151:cl_lock_trace0()) unuse lock: ffff88081acf48e8@(2 ffff8807e6fd0080 1 3 0 1 1 2)(ffff88080d4a0db8/0/1) at cl_unuse_try():1363
00000020:00010000:5.0:1391207220.451172:0:6294:0:(cl_lock.c:151:cl_lock_trace0()) enclosure lock: ffff88081acf4738@(1 (null) 0 3 0 1 1 2)(ffff88080d4a0db8/1/1) at cl_lock_enclosure():1691
00000020:00010000:5.0:1391207220.451175:0:6294:0:(cl_lock.c:151:cl_lock_trace0()) enclosure lock: ffff88081acf48e8@(3 ffff8807e6fd0080 1 4 0 2 1 2)(ffff88080d4a0db8/0/2) at cl_lock_enclosure():1691
00000020:00010000:5.0:1391207220.451178:0:6294:0:(cl_lock.c:151:cl_lock_trace0()) unuse lock: ffff88081acf4738@(2 ffff8807e6fd0080 1 3 0 1 1 2)(ffff88080d4a0c68/1/0) at cl_unuse_try():1363
00010000:00010000:5.0:1391207220.451185:0:6294:0:(ldlm_lock.c:852:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PR) ns: lustre-OST0000-osc-ffff8808198eb000 lock: ffff8808152de340/0x6d44b209b7d2fe79 lrc: 4/1,0 mode: PR/PR res: [0x2:0x0:0x0].0 rrc: 15 type: EXT [0->18446744073709551615] (req 1146880->1150975) flags: 0x10020000000000 nid: local remote: 0x6d44b209b7d2fe95 expref: -99 pid: 6294 timeout: 0 lvb_type: 1
00010000:00010000:5.0:1391207220.451191:0:6294:0:(ldlm_lock.c:920:ldlm_lock_decref_internal()) ### add lock into lru list ns: lustre-OST0000-osc-ffff8808198eb000 lock: ffff8808152de340/0x6d44b209b7d2fe79 lrc: 3/0,0 mode: PR/PR res: [0x2:0x0:0x0].0 rrc: 15 type: EXT [0->18446744073709551615] (req 1146880->1150975) flags: 0x10020000000000 nid: local remote: 0x6d44b209b7d2fe95 expref: -99 pid: 6294 timeout: 0 lvb_type: 1
00000020:00010000:5.0:1391207220.451197:0:6294:0:(cl_lock.c:151:cl_lock_trace0()) hold release lock: ffff88081acf4738@(3 ffff8807e6fd0080 1 5 0 2 0 2)(ffff88080d4a0c68/1/0) at cl_lock_hold_release():903
00000020:00010000:5.0:1391207220.451200:0:6294:0:(cl_lock.c:151:cl_lock_trace0()) hold release lock: ffff88081acf4738@(2 ffff8807e6fd0080 1 5 0 1 0 2)(ffff88080d4a0db8/1/2) at cl_lock_hold_release():903
00000080:00200000:5.0:1391207220.451207:0:6294:0:(vvp_io.c:1163:vvp_io_init()) [0x200000400:0x2:0x0] ignore/verify layout 1/0, layout version 0 restore needed 0
00000080:00200000:5.0:1391207220.452460:0:6294:0:(vvp_io.c:153:vvp_io_fini()) [0x200000400:0x2:0x0] ignore/verify layout 1/0, layout version 0 restore needed 0
00010000:00010000:5.0:1391207220.452485:0:6294:0:(ldlm_request.c:1127:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-OST0000-osc-ffff8808198eb000 lock: ffff8808152de340/0x6d44b209b7d2fe79 lrc: 3/0,0 mode: PR/PR res: [0x2:0x0:0x0].0 rrc: 15 type: EXT [0->18446744073709551615] (req 1146880->1150975) flags: 0x28400000000 nid: local remote: 0x6d44b209b7d2fe95 expref: -99 pid: 6294 timeout: 0 lvb_type: 1
00000020:00010000:5.0:1391207220.452493:0:6294:0:(cl_lock.c:151:cl_lock_trace0()) cancel lock: ffff88081acf4738@(3 ffff8807e6fd0080 2 5 0 0 0 1)(ffff8807e0547920/1/1) at cl_lock_cancel():1836
00000020:00010000:5.0:1391207220.452496:0:6294:0:(cl_lock.c:151:cl_lock_trace0()) delete lock: ffff88081acf4738@(3 ffff8807e6fd0080 2 5 0 0 0 1)(ffff8807e0547920/1/1) at cl_lock_delete():1783
00010000:00010000:5.0:1391207220.452523:0:6294:0:(ldlm_lock.c:375:ldlm_lock_destroy_internal()) ### destroy ns: lustre-OST0000-osc-ffff8808198eb000 lock: ffff8808152de340/0x6d44b209b7d2fe79 lrc: 2/0,0 mode: PR/PR res: [0x2:0x0:0x0].0 rrc: 15 type: EXT [0->18446744073709551615] (req 1146880->1150975) flags: 0x10869400000000 nid: local remote: 0x6d44b209b7d2fe95 expref: -99 pid: 6294 timeout: 0 lvb_type: 1
00010000:00010000:5.0:1391207220.452529:0:6294:0:(ldlm_lock.c:382:ldlm_lock_destroy_internal()) ### destroy ns: lustre-OST0000-osc-ffff8808198eb000 lock: ffff8808152de340/0x6d44b209b7d2fe79 lrc: 2/0,0 mode: PR/PR res: [0x2:0x0:0x0].0 rrc: 15 type: EXT [0->18446744073709551615] (req 1146880->1150975) flags: 0x10869400000000 nid: local remote: 0x6d44b209b7d2fe95 expref: -99 pid: 6294 timeout: 0 lvb_type: 1
00010000:00010000:5.0:1391207220.452534:0:6294:0:(ldlm_lock.c:2174:ldlm_lock_cancel()) ### destroyed ns: lustre-OST0000-osc-ffff8808198eb000 lock: ffff8808152de340/0x6d44b209b7d2fe79 lrc: 1/0,0 mode: PR/PR res: [0x2:0x0:0x0].0 rrc: 15 type: EXT [0->18446744073709551615] (req 1146880->1150975) flags: 0x10869400000000 nid: local remote: 0x6d44b209b7d2fe95 expref: -99 pid: 6294 timeout: 0 lvb_type: 1
00010000:00010000:5.0:1391207220.452544:0:6294:0:(ldlm_request.c:1144:ldlm_cli_cancel_local()) ### client-side cancel completed, rc = 549755813888 ns: lustre-OST0000-osc-ffff8808198eb000 lock: ffff8808152de340/0x6d44b209b7d2fe79 lrc: 1/0,0 mode: --/PR res: [0x2:0x0:0x0].0 rrc: 15 type: EXT [0->18446744073709551615] (req 1146880->1150975) flags: 0x869400000000 nid: local remote: 0x6d44b209b7d2fe95 expref: -99 pid: 6294 timeout: 0 lvb_type: 1
00010000:00010000:5.0:1391207220.452554:0:6294:0:(ldlm_request.c:1187:ldlm_cancel_pack()) ### packing ns: lustre-OST0000-osc-ffff8808198eb000 lock: ffff8808152de340/0x6d44b209b7d2fe79 lrc: 1/0,0 mode: --/PR res: [0x2:0x0:0x0].0 rrc: 14 type: EXT [0->18446744073709551615] (req 1146880->1150975) flags: 0x869400000000 nid: local remote: 0x6d44b209b7d2fe95 expref: -99 pid: 6294 timeout: 0 lvb_type: 1
00010000:00010000:5.0:1391207220.452559:0:6294:0:(ldlm_request.c:1191:ldlm_cancel_pack()) 1 locks packed
00010000:00010000:5.0:1391207220.452565:0:6294:0:(ldlm_lock.c:219:ldlm_lock_put()) ### final lock_put on destroyed lock, freeing it. ns: lustre-OST0000-osc-ffff8808198eb000 lock: ffff8808152de340/0x6d44b209b7d2fe79 lrc: 0/0,0 mode: --/PR res: [0x2:0x0:0x0].0 rrc: 14 type: EXT [0->18446744073709551615] (req 1146880->1150975) flags: 0x869400000000 nid: local remote: 0x6d44b209b7d2fe95 expref: -99 pid: 6294 timeout: 0 lvb_type: 1
00010000:00040000:5.0:1391207220.452570:0:6294:0:(ldlm_lock.c:222:ldlm_lock_put()) ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed: 
00010000:00040000:5.0:1391207220.577549:0:6294:0:(ldlm_lock.c:222:ldlm_lock_put()) LBUG

Both lines 375 and 382 were printed, and the source is as follows:

 361 int ldlm_lock_destroy_internal(struct ldlm_lock *lock)
 362 {
 363         ENTRY;
 364 
 365         if (lock->l_readers || lock->l_writers) {
 366                 LDLM_ERROR(lock, "lock still has references");
 367                 LBUG();
 368         }
 369 
 370         if (!cfs_list_empty(&lock->l_res_link)) {
 371                 LDLM_ERROR(lock, "lock still on resource");
 372                 LBUG();
 373         }
 374 
 375         LDLM_DEBUG(lock, "destroy");
 376         if (ldlm_is_destroyed(lock)) {
 377                 LASSERT(cfs_list_empty(&lock->l_lru));
 378                 EXIT;
 379                 return 0;
 380         }
 381         ldlm_set_destroyed(lock);
 382         LDLM_DEBUG(lock, "destroy");

However, the destroyed flag failed to stay set: both prints show flags 0x10869400000000, which does not include bit 50. This is beyond my understanding, and we should escalate this issue.

Comment by Jinshan Xiong (Inactive) [ 31/Jan/14 ]

Full log is here

Comment by Oleg Drokin [ 31/Jan/14 ]

I think the prime suspect would be some place that updates lock flags without taking the lock spinlock. There are several such places.
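
As a minimal standalone illustration of that kind of race (plain C, not Lustre code; FLAG_SKIPPED's bit number is made up for the demo, only 1ULL << 50 matches the real destroyed flag), two threads doing an unlocked read-modify-write on the same 64-bit flags word can lose one thread's update:

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static volatile uint64_t flags;         /* stands in for ldlm_lock::l_flags */
#define FLAG_DESTROYED  (1ULL << 50)    /* the bit the assertion tests */
#define FLAG_SKIPPED    (1ULL << 51)    /* illustrative bit */

static void *set_destroyed(void *arg)
{
        flags |= FLAG_DESTROYED;        /* non-atomic: load, OR, store */
        return NULL;
}

static void *clear_skipped(void *arg)
{
        flags &= ~FLAG_SKIPPED;         /* non-atomic: load, AND, store */
        return NULL;
}

int main(void)
{
        for (int i = 0; i < 1000000; i++) {
                pthread_t a, b;

                flags = FLAG_SKIPPED;
                pthread_create(&a, NULL, set_destroyed, NULL);
                pthread_create(&b, NULL, clear_skipped, NULL);
                pthread_join(a, NULL);
                pthread_join(b, NULL);
                /* if clear_skipped() loaded flags before set_destroyed()
                 * stored them, its stale store erases the destroyed bit */
                if (!(flags & FLAG_DESTROYED))
                        printf("lost update at iteration %d\n", i);
        }
        return 0;
}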

Comment by Jinshan Xiong (Inactive) [ 01/Feb/14 ]

I confirmed that changing the dlm flag operations to bitops fixed the problem.

Check the attachment for the concept patch. The real patch should be worked out by modifying lustre/contrib/bit-masks/lustre_dlm_flags.tpl.

diff --git a/lustre/include/lustre_dlm.h b/lustre/include/lustre_dlm.h
index 98e03c9..bf4fb65 100644
--- a/lustre/include/lustre_dlm.h
+++ b/lustre/include/lustre_dlm.h
@@ -772,7 +772,7 @@ struct ldlm_lock {
         * Lock state flags. Protected by lr_lock.
         * \see lustre_dlm_flags.h where the bits are defined.
         */
-       __u64                   l_flags;
+       volatile __u64          l_flags;
 
        /**
         * Lock r/w usage counters.
diff --git a/lustre/include/lustre_dlm_flags.h b/lustre/include/lustre_dlm_flags.h
index 283546d..4be532f 100644
--- a/lustre/include/lustre_dlm_flags.h
+++ b/lustre/include/lustre_dlm_flags.h
@@ -57,91 +57,91 @@
 
 /** extent, mode, or resource changed */
 #define LDLM_FL_LOCK_CHANGED            0x0000000000000001ULL // bit   0
-#define ldlm_is_lock_changed(_l)        LDLM_TEST_FLAG(( _l), 1ULL <<  0)
-#define ldlm_set_lock_changed(_l)       LDLM_SET_FLAG((  _l), 1ULL <<  0)
-#define ldlm_clear_lock_changed(_l)     LDLM_CLEAR_FLAG((_l), 1ULL <<  0)
+#define ldlm_is_lock_changed(_l)        LDLM_TEST_FLAG(( _l),  0)
+#define ldlm_set_lock_changed(_l)       LDLM_SET_FLAG((  _l),  0)
+#define ldlm_clear_lock_changed(_l)     LDLM_CLEAR_FLAG((_l),  0)

......
......

 /** test for ldlm_lock flag bit set */
-#define LDLM_TEST_FLAG(_l, _b)    (((_l)->l_flags & (_b)) != 0)
+#define LDLM_TEST_FLAG(_l, _b)    test_bit(_b, (volatile unsigned long *)&(_l)->l_flags)
 
 /** multi-bit test: are any of mask bits set? */
 #define LDLM_HAVE_MASK(_l, _m)    (((_l)->l_flags & LDLM_FL_##_m##_MASK) != 0)
 
 /** set a ldlm_lock flag bit */
-#define LDLM_SET_FLAG(_l, _b)     ((_l)->l_flags |= (_b))
+#define LDLM_SET_FLAG(_l, _b)     set_bit(_b, (volatile unsigned long *)&(_l)->l_flags)
 
 /** clear a ldlm_lock flag bit */
-#define LDLM_CLEAR_FLAG(_l, _b)   ((_l)->l_flags &= ~(_b))
+#define LDLM_CLEAR_FLAG(_l, _b)   clear_bit(_b, (volatile unsigned long *)&(_l)->l_flags)
 
 /** @} subgroup */
 /** @} group */

Comment by Andreas Dilger [ 01/Feb/14 ]

Jinshan, according to the comment, the flags should be protected by lr_lock. Are there flag changes not protected by this lock?

Also, I have some concern that using set_bit() will not set the same bits as the Lustre network protocol on all architectures. Do you know if there is a specific bit ordering used in all cases?
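
To make the bit-ordering concern concrete, here is a toy model of how the kernel's bitops address bits (plain C, not the actual kernel implementation): set_bit(nr, addr) indexes bits within an array of unsigned long, so on a 32-bit big-endian host "bit 0" of the array lands in the high half of the __u64 that goes on the wire:

#include <stdint.h>
#include <stdio.h>

/* toy model: nr selects a word in an unsigned long array, then a bit
 * within that word, which is what the generic kernel bitops do */
static void toy_set_bit(int nr, unsigned long *addr)
{
        addr[nr / (8 * sizeof(long))] |= 1UL << (nr % (8 * sizeof(long)));
}

int main(void)
{
        uint64_t flags = 0;

        toy_set_bit(0, (unsigned long *)&flags);
        /* prints 0x1 on little-endian and on 64-bit hosts; on a 32-bit
         * big-endian host the first unsigned long is the HIGH half of
         * the __u64, so this would print 0x100000000 instead */
        printf("flags = %#llx\n", (unsigned long long)flags);
        return 0;
}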

Comment by Jinshan Xiong (Inactive) [ 01/Feb/14 ]

The flags should be protected by ldlm_lock::l_lock. When I saw this issue, my first reaction was to check whether there are unprotected writes to l_flags, and the only place I found is:

/**
 * Removes LDLM lock \a lock from LRU. Assumes LRU is already locked.
 */
int ldlm_lock_remove_from_lru_nolock(struct ldlm_lock *lock)
{
        int rc = 0;
        if (!cfs_list_empty(&lock->l_lru)) {
                struct ldlm_namespace *ns = ldlm_lock_to_ns(lock);

                LASSERT(lock->l_resource->lr_type != LDLM_FLOCK);
                cfs_list_del_init(&lock->l_lru);
                ldlm_clear_skipped(lock);
                LASSERT(ns->ns_nr_unused > 0);
                ns->ns_nr_unused--;
                rc = 1;
        }
        return rc;
}

However, it is protected by ns_lock, and if the lock is being destroyed, it must already have been taken out of the LRU list.

Comment by Li Xi (Inactive) [ 02/Feb/14 ]

Hi Jinshan,

According to your test results, this problem really looks like a race problem. Would you please try following patch to check that the lock is always held when clearing/setting the flag?

http://review.whamcloud.com/#/c/8772/

Comment by Jinshan Xiong (Inactive) [ 02/Feb/14 ]

Hi Li Xi,

That looks right. Somehow I read the code wrong. The DLM lock has its destroyed flag set before it is taken out of the LRU, so the unprotected ldlm_clear_skipped() in ldlm_lock_remove_from_lru_nolock() is indeed the root cause of this problem.

Your patch will cause a deadlock, because we must take the res lock first and then ns_lock. My idea for fixing this problem is to move ldlm_clear_skipped() out of ldlm_lock_remove_from_lru_nolock() and into ldlm_lock_add_to_lru_nolock(). I think this will fix the problem as well, as sketched below.
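
A sketch of that idea as a diff (my reading of it, not necessarily the exact patch that will land):

diff --git a/lustre/ldlm/ldlm_lock.c b/lustre/ldlm/ldlm_lock.c
--- a/lustre/ldlm/ldlm_lock.c
+++ b/lustre/ldlm/ldlm_lock.c
@@ int ldlm_lock_remove_from_lru_nolock(struct ldlm_lock *lock)
                 LASSERT(lock->l_resource->lr_type != LDLM_FLOCK);
                 cfs_list_del_init(&lock->l_lru);
-                ldlm_clear_skipped(lock);
                 LASSERT(ns->ns_nr_unused > 0);
                 ns->ns_nr_unused--;
@@ static void ldlm_lock_add_to_lru_nolock(struct ldlm_lock *lock)
         LASSERT(cfs_list_empty(&lock->l_lru));
         cfs_list_add_tail(&lock->l_lru, &ns->ns_unused_list);
+        /* the caller of ldlm_lock_add_to_lru() holds the res lock, so this
+         * write can no longer race with flag updates done under that lock,
+         * such as ldlm_set_destroyed() */
+        ldlm_clear_skipped(lock);
         LASSERT(ns->ns_nr_unused >= 0);
         ns->ns_nr_unused++;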

Comment by Li Xi (Inactive) [ 02/Feb/14 ]

Ah, yes, the right lock order is lock->l_lock, res->lr_lock, ns->ns_lock. Thanks for pointing that error out.

We are able to reproduce the problem steadily too. If you push a patch, we can check whether it helps. I suspect there is more than one place where lock->l_flags is changed without the protection of lock->l_lock.

Comment by Patrick Farrell (Inactive) [ 12/Feb/14 ]

Li, Jinshan - Are either of you able to share how you're reproducing this?

Comment by Jinshan Xiong (Inactive) [ 12/Feb/14 ]

The root cause of this issue is pretty clear.

Bobijam, can you please create a patch for this?

Comment by Zhenyu Xu [ 13/Feb/14 ]

updated patch http://review.whamcloud.com/#/c/8772/

Comment by Jodi Levi (Inactive) [ 21/Feb/14 ]

Patch landed to Master. Please reopen ticket if more work is needed.

Comment by Bob Glossman (Inactive) [ 21/Feb/14 ]

backport to b2_5:
http://review.whamcloud.com/9346
