Lustre / LU-4797

ASSERTION( cl_lock_is_mutexed(slice->cls_lock) ) failed

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.4.2
    • None
    • 3
    • 13202

    Description

      Hi,

      After 3 days in production with Lustre 2.4.2, CEA is hitting the following assertion failure about 5 times a day:

      LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) ASSERTION( cl_lock_is_mutexed(slice->cls_lock) ) failed:
      LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) LBUG
      Pid: 4089, comm: %%AQC.P.I.O
      
      Call Trace:
       [<ffffffffa0af4895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0af4e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa1065d51>] lovsub_lock_state+0x1a1/0x1b0 [lov]
       [<ffffffffa0bd7a88>] cl_lock_state_signal+0x68/0x160 [obdclass]
       [<ffffffffa0bd7bd5>] cl_lock_state_set+0x55/0x190 [obdclass]
       [<ffffffffa0bdb8d9>] cl_enqueue_try+0x149/0x300 [obdclass]
       [<ffffffffa105e0da>] lov_lock_enqueue+0x22a/0x850 [lov]
       [<ffffffffa0bdb88c>] cl_enqueue_try+0xfc/0x300 [obdclass]
       [<ffffffffa0bdcc7f>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
       [<ffffffffa0bdd8ee>] cl_lock_request+0x7e/0x270 [obdclass]
       [<ffffffffa0be2b8c>] cl_io_lock+0x3cc/0x560 [obdclass]
       [<ffffffffa0be2dc2>] cl_io_loop+0xa2/0x1b0 [obdclass]
       [<ffffffffa10dba90>] ll_file_io_generic+0x450/0x600 [lustre]
       [<ffffffffa10dc9d2>] ll_file_aio_write+0x142/0x2c0 [lustre]
       [<ffffffffa10dccbc>] ll_file_write+0x16c/0x2a0 [lustre]
       [<ffffffff811895d8>] vfs_write+0xb8/0x1a0
       [<ffffffff81189ed1>] sys_write+0x51/0x90
       [<ffffffff81091039>] ? sys_times+0x29/0x70
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      This issue is very similar to LU-4693, which is itself a duplicate of LU-4692, for which there is unfortunately no fix yet.

      Please ask if you need any additional information that could help diagnose and resolve the problem.

      Sebastien.

          Activity

            pjones Peter Jones made changes -
            Labels Original: p4b
            pjones Peter Jones made changes -
            Labels Original: llnl p4b New: p4b
            bobijam Zhenyu Xu added a comment -

            Every write needs an exclusive lock, and a write from another node forces the current lock holder to relinquish that lock. When several nodes write to the same file, lock enqueues and lock blocking ASTs therefore end up intertwined. That is what I meant by "normal".
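The lock ping-pong described in this comment can be illustrated with a toy model. This is a hypothetical sketch, not Lustre code: it assumes a single whole-file exclusive write lock that a node keeps cached until another node asks for it, and the names `ToyWriteLock` and `simulate` are made up for illustration. Each write from a node that does not already hold the lock costs one enqueue, plus one blocking AST to the previous holder.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ToyWriteLock:
    """Hypothetical toy model (not Lustre code): one exclusive write lock per file."""
    holder: Optional[str] = None  # node currently caching the lock
    enqueues: int = 0             # lock requests sent to the server
    blocking_asts: int = 0        # callbacks asking the holder to give the lock back

    def write(self, node: str) -> None:
        if self.holder == node:
            return  # lock already cached on this node: no lock traffic at all
        if self.holder is not None:
            self.blocking_asts += 1  # previous holder must flush and cancel
        self.enqueues += 1
        self.holder = node


def simulate(writers: Iterable[str]) -> ToyWriteLock:
    """Replay a sequence of writes, each tagged with the node that issues it."""
    lock = ToyWriteLock()
    for node in writers:
        lock.write(node)
    return lock


interleaved = simulate(["A", "B", "A", "B"])  # 4 enqueues, 3 blocking ASTs
batched = simulate(["A", "A", "B", "B"])      # 2 enqueues, 1 blocking AST
```

In this model, even two nodes appending small records alternately generate lock traffic on nearly every write, while the same number of writes grouped per node needs far fewer lock transfers.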

            bobijam Zhenyu Xu added a comment - every write needs a exclusive lock, write from other node will cause the lock holder to relinquish the lock, and multiple write upon the same file from different node will cause lock enqueue and lock blocking ast intertwined, by that I meant normal.

            apercher Antoine Percher added a comment -

            Hi Zhenyu Xu,
            Could you explain why you think "there are lock enqueue and blocking ast call trace intertwined"
            and eviction? You said "that's normal". I don't understand why there is so much contention here,
            because we just append a few bytes to the end of one file with only 4 processes on 2 nodes.
            With a Lustre 2.1.x distribution we did not have this contention,
            and it could explain why we sometimes hit the race fixed by LU-4558.
            Could LU-3027 and LU-4495 explain this contention?
            Thanks
            jay Jinshan Xiong (Inactive) made changes -
            Resolution New: Duplicate [ 3 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]

            jay Jinshan Xiong (Inactive) added a comment -

            Duplicate of LU-4558.

            jay Jinshan Xiong (Inactive) added a comment -

            Please try patch: http://review.whamcloud.com/9881

            I believe this is the same issue as LU-4591.
            jay Jinshan Xiong (Inactive) made changes -
            Link New: This issue is related to LU-4558 [ LU-4558 ]
            jay Jinshan Xiong (Inactive) made changes -
            Link New: This issue is related to LU-4591 [ LU-4591 ]
            jay Jinshan Xiong (Inactive) made changes -
            Assignee Original: Zhenyu Xu [ bobijam ] New: Jinshan Xiong [ jay ]

            People

              Assignee: Jinshan Xiong (Inactive)
              Reporter: Sebastien Buisson (Inactive)
              Votes: 0
              Watchers: 9
