[LU-1527] Assertion at: (cl_lock.c:2211:cl_lock_hold_add()) ASSERTION( lock->cll_state != CLS_FREEING )

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.1.6
    • 3
    • 5266

    Description

      With backtrace:

      LustreError: 16393:0:(cl_lock.c:2211:cl_lock_hold_add()) ASSERTION( lock->cll_state != CLS_FREEING ) failed:
      LustreError: 16393:0:(cl_lock.c:2211:cl_lock_hold_add()) LBUG
      Pid: 16393, comm: ll_close

      Call Trace:
      [<ffffffffa0ce3905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [<ffffffffa0ce3f17>] lbug_with_loc+0x47/0xb0 [libcfs]
      [<ffffffffa0e64f71>] cl_lock_hold_add+0x141/0x150 [obdclass]
      [<ffffffffa0e67c16>] cl_lock_peek+0x96/0x150 [obdclass]
      [<ffffffffa0a0ac66>] cl_local_size+0x266/0x2f0 [lustre]
      [<ffffffffa09ce846>] ll_done_writing_attr+0x46/0x190 [lustre]
      [<ffffffffa09cec89>] ll_ioepoch_close+0x2f9/0x600 [lustre]
      [<ffffffffa09d05d0>] ll_close_thread+0x2a0/0x1060 [lustre]
      [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
      [<ffffffffa09d0330>] ? ll_close_thread+0x0/0x1060 [lustre]
      [<ffffffff8100c14a>] child_rip+0xa/0x20
      [<ffffffffa09d0330>] ? ll_close_thread+0x0/0x1060 [lustre]
      [<ffffffffa09d0330>] ? ll_close_thread+0x0/0x1060 [lustre]
      [<ffffffff8100c140>] ? child_rip+0x0/0x20

      LustreError: dumping log to /tmp/lustre-log.1339749052.16393

          Activity

            yujian Jian Yu added a comment -

            While testing patch http://review.whamcloud.com/8835 on Lustre b2_1 branch, the same failure occurred:

            17:17:19:LustreError: 15141:0:(cl_lock.c:2188:cl_lock_hold_add()) ASSERTION( lock->cll_state != CLS_FREEING ) failed: 
            17:17:19:LustreError: 15141:0:(cl_lock.c:2188:cl_lock_hold_add()) LBUG
            

            Maloo report: https://maloo.whamcloud.com/test_sets/ec280652-8726-11e3-8928-52540035b04c


            schamp Stephen Champion added a comment -

            FYI, I ran into this running acceptance on a SLES11SP1 node with Lustre 2.1.5 client.
            Andriy's patch looks applicable to b2_1, as well.
            I just passed an acceptance run using a 2.1.5 base plus this patch.
            Whitespace changes were required.

            jay Jinshan Xiong (Inactive) added a comment -

            patch has landed.

            adilger Andreas Dilger added a comment -

            I hit this on sanity.sh test_132() in my local single-node test setup (dual-core x86_64):

            Lustre: 2247:0:(ofd_obd.c:1067:ofd_orphans_destroy()) testfs-OST0002: deleting orphan objects from 47607 to 47694
            Lustre: 2247:0:(ofd_obd.c:1067:ofd_orphans_destroy()) Skipped 6 previous similar messages
            LustreError: 2233:0:(ldlm_resource.c:1103:ldlm_resource_get()) lvbo_init failed for resource 8000: rc -2
            LustreError: 2233:0:(ldlm_resource.c:1103:ldlm_resource_get()) Skipped 20 previous similar messages
            LustreError: 2242:0:(ldlm_resource.c:1103:ldlm_resource_get()) lvbo_init failed for resource 7832: rc -2
            LustreError: 2242:0:(ldlm_resource.c:1103:ldlm_resource_get()) Skipped 2 previous similar messages
            Lustre: Mounted testfs-client
            LustreError: 2225:0:(ldlm_resource.c:1103:ldlm_resource_get()) lvbo_init failed for resource 7848: rc -2
            LustreError: 2225:0:(ldlm_resource.c:1103:ldlm_resource_get()) Skipped 33 previous similar messages
            Lustre: DEBUG MARKER: Using TIMEOUT=20
            LustreError: 2224:0:(ldlm_resource.c:1103:ldlm_resource_get()) lvbo_init failed for resource 28838: rc -2
            LustreError: 2224:0:(ldlm_resource.c:1103:ldlm_resource_get()) Skipped 262 previous similar messages
            LustreError: 2232:0:(ldlm_resource.c:1103:ldlm_resource_get()) lvbo_init failed for resource 45068: rc -2
            LustreError: 2232:0:(ldlm_resource.c:1103:ldlm_resource_get()) Skipped 5436 previous similar messages
            Lustre: 2094:0:(mdt_handler.c:5649:mdt_connect_internal()) testfs-MDT0000: MDS has SOM enabled, but client does not support it
            Lustre: DEBUG MARKER: cancel_lru_locks osc start
            LustreError: 2526:0:(cl_lock.c:2176:cl_lock_hold_add()) ASSERTION( lock->cll_state != CLS_FREEING ) failed:
            LustreError: 2526:0:(cl_lock.c:2176:cl_lock_hold_add()) LBUG
            Pid: 2526, comm: ll_close

            Call Trace:
            [<ffffffffa078c905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            [<ffffffffa078cf17>] lbug_with_loc+0x47/0xb0 [libcfs]
            [<ffffffffa0980ac1>] cl_lock_hold_add+0x141/0x150 [obdclass]
            [<ffffffffa0983736>] cl_lock_peek+0x96/0x150 [obdclass]
            [<ffffffffa117b1b6>] cl_local_size+0x266/0x2f0 [lustre]
            [<ffffffffa113df36>] ll_done_writing_attr+0x46/0x190 [lustre]
            [<ffffffffa113e379>] ll_ioepoch_close+0x2f9/0x600 [lustre]
            [<ffffffffa113fcc0>] ll_close_thread+0x2a0/0x1060 [lustre]
            [<ffffffff81060250>] ? default_wake_function+0x0/0x20

            The lvbo_init errors are unexpected in this test, so I'm not sure if they are the trigger, or unrelated to the problem.

            jay Jinshan Xiong (Inactive) added a comment -

            patch is at: http://review.whamcloud.com/3117

            jay Jinshan Xiong (Inactive) added a comment -

            We should check if the lock is freed in cl_lock_peek().
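
            For illustration only, the idea in the comment above can be sketched as a toy model in C. This is not the Lustre code and not the landed patch: every toy_* name, the simplified state enum, and the array-based lookup are assumptions made for the example. It only shows the shape of the suggestion: the peek path skips a lock that is already in the FREEING state instead of passing it to the hold-add path, where the assertion would fire. Real code would presumably also need to do this check under whatever lock protects the state, which the sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: names echo the backtrace but are hypothetical, not Lustre's API. */
enum toy_lock_state {
    TOY_CLS_NEW,
    TOY_CLS_HELD,
    TOY_CLS_CACHED,
    TOY_CLS_FREEING
};

struct toy_lock {
    enum toy_lock_state state;
    int holds; /* number of holds taken on this lock */
};

/* Mirrors the failing assertion: taking a hold on a FREEING lock is a bug. */
static void toy_lock_hold_add(struct toy_lock *lock)
{
    assert(lock->state != TOY_CLS_FREEING);
    lock->holds++;
}

/*
 * Return the first usable lock with a hold taken on it, or NULL.
 * The FREEING check happens here, before toy_lock_hold_add() is called,
 * so a lock that is being torn down is skipped rather than asserted on.
 */
static struct toy_lock *toy_lock_peek(struct toy_lock *locks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (locks[i].state == TOY_CLS_FREEING)
            continue; /* lock is going away; do not take a hold */
        toy_lock_hold_add(&locks[i]);
        return &locks[i];
    }
    return NULL;
}
```

            With this shape, a caller that finds only FREEING locks gets NULL back and has to handle the "no lock" case, rather than crashing inside the hold-add path.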

            People

              jay Jinshan Xiong (Inactive)
              jay Jinshan Xiong (Inactive)