[LU-2177] ldlm_flock_completion_ast causes LBUG because of a race Created: 15/Oct/12  Updated: 04/Jan/16  Resolved: 12/Jun/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.3, Lustre 1.8.8
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Minor
Reporter: Hiroya Nozaki Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: patch
Environment:

MDS x1, OSS x1 (OST x3), Client x1, running an application that uses flock


Issue Links:
Related
is related to LU-7626 ldlm_resource_complain()) echotgt-ech... Resolved
Severity: 3
Rank (Obsolete): 5215

 Description   

I believe I have found a possible race condition between ldlm_cli_enqueue_fini() and cleanup_resource() when they handle flock locks. The scenario is as follows:

thread A: flock
1) In ldlm_cli_enqueue_fini(), ldlm_lock_enqueue() has returned successfully. This means the lock being handled has already been added to one of the lists in its ldlm_resource, lr_granted or lr_waiting. It also means the spinlock protecting the flock's ldlm_resource has already been released, because ldlm_lock_enqueue() calls unlock_res_and_lock() right before returning.
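
For reference, the relevant tail of ldlm_lock_enqueue() looks roughly like the abridged sketch below (not verbatim source); the point is that the resource spinlock is dropped before control returns to ldlm_cli_enqueue_fini().

ldlm_lock_enqueue() (abridged sketch)
ldlm_error_t ldlm_lock_enqueue(struct ldlm_namespace *ns,
                               struct ldlm_lock **lockp,
                               void *cookie, int *flags)
{
        struct ldlm_lock *lock = *lockp;

        ...

        /* while the res lock is held, the lock is attached to lr_granted
         * (or lr_waiting), e.g. via ldlm_grant_lock() */

        ...
out:
        unlock_res_and_lock(lock);      /* res lock dropped here: from this
                                         * point cleanup_resource() can find
                                         * and process this lock */
        return rc;
}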

thread B: evict
2) For some reason an eviction is triggered, and it calls cleanup_resource() on the same lock that ldlm_cli_enqueue_fini() is still handling, because that lock is already on a list in the flock's ldlm_resource and the spinlock protecting the ldlm_resource has already been released.

thread B: evict
3) Since the lock was enqueued successfully, l_readers or l_writers must be at least 1. So, after checking these counters and releasing the resource lock, cleanup_resource() calls ldlm_flock_completion_ast().

cleanup_resource()
static void cleanup_resource(struct ldlm_resource *res, cfs_list_t *q,
                             int flags)
{

                ...

                /* Set CBPENDING so nothing in the cancellation path
                 * can match this lock */
                lock->l_flags |= LDLM_FL_CBPENDING;
                lock->l_flags |= LDLM_FL_FAILED;
                lock->l_flags |= flags;

                /* ... without sending a CANCEL message for local_only. */
                if (local_only)
                        lock->l_flags |= LDLM_FL_LOCAL_ONLY;

                if (local_only && (lock->l_readers || lock->l_writers)) {
                        /* This is a little bit gross, but much better than the
                         * alternative: pretend that we got a blocking AST from
                         * the server, so that when the lock is decref'd, it
                         * will go away ... */
                        unlock_res(res);
                        LDLM_DEBUG(lock, "setting FL_LOCAL_ONLY");
                        if (lock->l_completion_ast)
                                lock->l_completion_ast(lock, 0, NULL);
                        LDLM_LOCK_RELEASE(lock);
                        continue;
                }

                ...
}

thread A: flock
4) After ldlm_lock_enqueue() returns, ldlm_cli_enqueue_fini() also calls ldlm_flock_completion_ast(), because there is no conditional check between ldlm_lock_enqueue() and the completion AST call.

ldlm_cli_enqueue_fini()
int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
                          ldlm_type_t type, __u8 with_policy, ldlm_mode_t mode,
                          int *flags, void *lvb, __u32 lvb_len,
                          struct lustre_handle *lockh, int rc)
{

        ...

        if (!is_replay) {
                rc = ldlm_lock_enqueue(ns, &lock, NULL, flags);
                if (lock->l_completion_ast != NULL) {
                        int err = lock->l_completion_ast(lock, *flags, NULL);
                        if (!rc)
                                rc = err;
                        if (rc)
                                cleanup_phase = 1;
                }
        }

        ...

}

Result: either thread can be the one that hits the LBUG
5) Since cleanup_resource() has set the lock's l_flags so that ldlm_flock_completion_ast() takes the ldlm_lock_decref_internal() path, ldlm_lock_decref_internal() ends up being called twice, and this race causes an LBUG() in ldlm_lock_decref_internal().

ldlm_flock_completion_ast()
int ldlm_flock_completion_ast(struct ldlm_lock *lock, int flags, void *data)
{
        cfs_flock_t                    *getlk = lock->l_ast_data;
        struct obd_device              *obd;
        struct obd_import              *imp = NULL;
        struct ldlm_flock_wait_data     fwd;
        struct l_wait_info              lwi;
        ldlm_error_t                    err;
        int                             rc = 0;
        ENTRY;

        CDEBUG(D_DLMTRACE, "flags: 0x%x data: %p getlk: %p\n",
               flags, data, getlk);

        /* Import invalidation. We need to actually release the lock
         * references being held, so that it can go away. No point in
         * holding the lock even if app still believes it has it, since
         * server already dropped it anyway. Only for granted locks too. */
        if ((lock->l_flags & (LDLM_FL_FAILED|LDLM_FL_LOCAL_ONLY)) ==
            (LDLM_FL_FAILED|LDLM_FL_LOCAL_ONLY)) {
                if (lock->l_req_mode == lock->l_granted_mode &&
                    lock->l_granted_mode != LCK_NL &&
                    NULL == data)
                        ldlm_lock_decref_internal(lock, lock->l_req_mode);

                /* Need to wake up the waiter if we were evicted */
                cfs_waitq_signal(&lock->l_waitq);
                RETURN(0);
        }
        ....
}
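
For reference, the check that actually fires is the rw reference count assertion in ldlm_lock_decref_internal_nolock(); roughly (abridged sketch, not verbatim source), it looks like the following, and the second PW decref runs after l_writers has already dropped to zero:

ldlm_lock_decref_internal_nolock() (abridged sketch)
void ldlm_lock_decref_internal_nolock(struct ldlm_lock *lock, ldlm_mode_t mode)
{
        LDLM_DEBUG(lock, "ldlm_lock_decref(%s)", ldlm_lockname[mode]);

        if (mode & (LCK_NL | LCK_CR | LCK_PR)) {
                LASSERT(lock->l_readers > 0);
                lock->l_readers--;
        }
        if (mode & (LCK_EX | LCK_PW | LCK_COS)) {
                LASSERT(lock->l_writers > 0);   /* the second decref of this
                                                 * race arrives with
                                                 * l_writers == 0 -> LBUG() */
                lock->l_writers--;
        }

        LDLM_LOCK_RELEASE(lock);        /* drops the rw reference */
}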

Below is the dlmtrace. It shows that ldlm_lock_decref_internal_nolock() was called on the same lock twice, even though its rw refcount had already reached zero.

PID 5827: flock (thread A)
PID 5829: evict (thread B)

00010000:00000001:13:1350018852.706543:0:5827:0:(ldlm_request.c:409:ldlm_cli_enqueue_fini()) Process entered
00010000:00000001:13:1350018852.706543:0:5827:0:(ldlm_lock.c:451:__ldlm_handle2lock()) Process entered
00000020:00000001:13:1350018852.706544:0:5827:0:(lustre_handles.c:172:class_handle2object()) Process entered
00000020:00000001:13:1350018852.706545:0:5827:0:(lustre_handles.c:195:class_handle2object()) Process leaving (rc=18446604488316264960 : -139585393286656 : ffff810c3e37a200)
00010000:00000001:13:1350018852.706548:0:5827:0:(ldlm_resource.c:1105:ldlm_resource_putref_internal()) Process entered
00010000:00000001:13:1350018852.706549:0:5827:0:(ldlm_resource.c:1118:ldlm_resource_putref_internal()) Process leaving (rc=0 : 0 : 0)
00010000:00000001:13:1350018852.706551:0:5827:0:(ldlm_lock.c:502:__ldlm_handle2lock()) Process leaving
00010000:00000001:13:1350018852.706552:0:5827:0:(ldlm_lock.c:1217:ldlm_lock_enqueue()) Process entered
00010000:00000001:13:1350018861.719601:0:5827:0:(ldlm_lock.c:895:ldlm_grant_lock()) Process entered
00010000:00000001:13:1350018861.719602:0:5827:0:(ldlm_lock.c:914:ldlm_grant_lock()) Process leaving
00010000:00000001:13:1350018861.719603:0:5827:0:(ldlm_lock.c:1329:ldlm_lock_enqueue()) Process leaving via out (rc=0 : 0 : 0)
00010000:00000001:13:1350018861.719604:0:5827:0:(ldlm_flock.c:525:ldlm_flock_completion_ast()) Process entered
00010000:00010000:4:1350018861.719604:0:5829:0:(ldlm_resource.c:574:cleanup_resource()) ### setting FL_LOCAL_ONLY ns: fefs-MDT0000-mdc-ffff8105e9bbbc00 lock: ffff810c3e37a200/0xee16fc0aef26bb17 lrc: 5/0,1 mode: PW/PW res: 12/706300508 rrc: 2 type: FLK pid: 5827 [0->16777215] flags: 0x2000c10 remote: 0xa520aeaebdefc517 expref: -99 pid: 5827 timeout: 0
00010000:00010000:13:1350018861.719605:0:5827:0:(ldlm_flock.c:528:ldlm_flock_completion_ast()) flags: 0x40000 data: 0000000000000000 getlk: ffff810c3ecc1ad8
00010000:00000001:13:1350018861.719607:0:5827:0:(ldlm_lock.c:659:ldlm_lock_decref_internal()) Process entered

1st decref (thread A, flock, PID 5827)
==================================================
00010000:00010000:13:1350018861.719607:0:5827:0:(ldlm_lock.c:637:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PW) ns: fefs-MDT0000-mdc-ffff8105e9bbbc00 lock: ffff810c3e37a200/0xee16fc0aef26bb17 lrc: 5/0,1 mode: PW/PW res: 12/706300508 rrc: 2 type: FLK pid: 5827 [0->16777215] flags: 0x2000c10 remote: 0xa520aeaebdefc517 expref: -99 pid: 5827 timeout: 0
==================================================

00010000:00000001:4:1350018861.719611:0:5829:0:(ldlm_flock.c:525:ldlm_flock_completion_ast()) Process entered
00010000:00000001:13:1350018861.719612:0:5827:0:(ldlm_lock.c:152:ldlm_lock_put()) Process entered
00010000:00010000:4:1350018861.719612:0:5829:0:(ldlm_flock.c:528:ldlm_flock_completion_ast()) flags: 0x0 data: 0000000000000000 getlk: ffff810c3ecc1ad8
00010000:00000001:13:1350018861.719613:0:5827:0:(ldlm_lock.c:182:ldlm_lock_put()) Process leaving
00010000:00010000:13:1350018861.719613:0:5827:0:(ldlm_lock.c:683:ldlm_lock_decref_internal()) ### final decref done on cbpending lock ns: fefs-MDT0000-mdc-ffff8105e9bbbc00 lock: ffff810c3e37a200/0xee16fc0aef26bb17 lrc: 4/0,0 mode: PW/PW res: 12/706300508 rrc: 2 type: FLK pid: 5827 [0->16777215] flags: 0x2000c10 remote: 0xa520aeaebdefc517 expref: -99 pid: 5827 timeout: 0
00010000:00000001:13:1350018861.719617:0:5827:0:(ldlm_lock.c:206:ldlm_lock_remove_from_lru()) Process entered
00010000:00000001:13:1350018861.719618:0:5827:0:(ldlm_lock.c:210:ldlm_lock_remove_from_lru()) Process leaving
00010000:00000001:13:1350018861.719619:0:5827:0:(ldlm_lockd.c:1763:ldlm_bl_to_thread()) Process entered
00010000:00000001:4:1350018861.719619:0:5829:0:(ldlm_lock.c:659:ldlm_lock_decref_internal()) Process entered

2nd decref (thread B, evict, PID 5829)
==================================================
00010000:00010000:4:1350018861.719619:0:5829:0:(ldlm_lock.c:637:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PW) ns: fefs-MDT0000-mdc-ffff8105e9bbbc00 lock: ffff810c3e37a200/0xee16fc0aef26bb17 lrc: 5/0,0 mode: PW/PW res: 12/706300508 rrc: 2 type: FLK pid: 5827 [0->16777215] flags: 0x2000c10 remote: 0xa520aeaebdefc517 expref: -99 pid: 5827 timeout: 0
==================================================


 Comments   
Comment by Hiroya Nozaki [ 18/Oct/12 ]

patch for b1_8
http://review.whamcloud.com/#change,4290

Comment by Hiroya Nozaki [ 18/Oct/12 ]

patch for master
http://review.whamcloud.com/#change,4291

Comment by Hiroya Nozaki [ 18/Oct/12 ]

I uploaded patches to fix the problem for the b1_8 and master branches.
Could someone check and review them? Thank you.

Comment by Vitaly Fertman [ 18/Apr/14 ]

CODE: http://review.whamcloud.com/10005

Comment by Emoly Liu [ 12/Jun/14 ]

The patch http://review.whamcloud.com/10005 landed on 2.6.
