Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-4692

(osc_lock.c:1204:osc_lock_enqueue()) ASSERTION( ols->ols_state == OLS_NEW ) failed: Impossible state: 6

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.4.3
    • None
    • 2.5.54
    • 3
    • 12901

    Description

      The assertion failed on our system. The system are running Lustre-2.5.54, so patch from LU-3889 is already merged.

      <3>LustreError: 109372:0:(file.c:3087:ll_inode_revalidate_fini()) home2: revalidate FID [0x2000048a2:0x17:0x0] error: rc = -116
      <0>LustreError: 237806:0:(osc_lock.c:1204:osc_lock_enqueue()) ASSERTION( ols->ols_state == OLS_NEW ) failed: Impossible state: 6
      <0>LustreError: 237806:0:(osc_lock.c:1204:osc_lock_enqueue()) LBUG
      <4>Pid: 237806, comm: tail
      <4>
      <4>Call Trace:
      <4> [<ffffffffa054c895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa054ce97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa0c620f0>] ? osc_lock_enqueue+0x0/0x8c0 [osc]
      <4> [<ffffffffa0c62820>] osc_lock_enqueue+0x730/0x8c0 [osc]
      <4> [<ffffffffa069a337>] ? cl_lock_state_signal+0x87/0x160 [obdclass]
      <4> [<ffffffffa069dcfc>] cl_enqueue_try+0xfc/0x300 [obdclass]
      <4> [<ffffffffa0a5c42a>] lov_lock_enqueue+0x22a/0x850 [lov]
      <4> [<ffffffffa069dcfc>] cl_enqueue_try+0xfc/0x300 [obdclass]
      <4> [<ffffffffa069ef4f>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
      <4> [<ffffffffa069fb9e>] cl_lock_request+0x7e/0x270 [obdclass]
      <4> [<ffffffffa06a4b2c>] cl_io_lock+0x3cc/0x560 [obdclass]
      <4> [<ffffffffa06a4d62>] cl_io_loop+0xa2/0x1b0 [obdclass]
      <4> [<ffffffffa0addca6>] ll_file_io_generic+0x2b6/0x710 [lustre]
      <4> [<ffffffffa0694ce9>] ? cl_env_get+0x29/0x350 [obdclass]
      <4> [<ffffffffa0ade23f>] ll_file_aio_read+0x13f/0x2c0 [lustre]
      <4> [<ffffffffa0ade6fc>] ll_file_read+0x16c/0x2a0 [lustre]
      <4> [<ffffffff811816c5>] vfs_read+0xb5/0x1a0
      <4> [<ffffffff81181801>] sys_read+0x51/0x90
      <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      <4>
      <0>Kernel panic - not syncing: LBUG
      <4>Pid: 237806, comm: tail Tainted: G W --------------- 2.6.32-358.el6.x86_64 #1
      <4>Call Trace:
      <4> [<ffffffff8150cfc8>] ? panic+0xa7/0x16f
      <4> [<ffffffffa054ceeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      <4> [<ffffffffa0c620f0>] ? osc_lock_enqueue+0x0/0x8c0 [osc]
      <4> [<ffffffffa0c62820>] ? osc_lock_enqueue+0x730/0x8c0 [osc]
      <4> [<ffffffffa069a337>] ? cl_lock_state_signal+0x87/0x160 [obdclass]
      <4> [<ffffffffa069dcfc>] ? cl_enqueue_try+0xfc/0x300 [obdclass]
      <4> [<ffffffffa0a5c42a>] ? lov_lock_enqueue+0x22a/0x850 [lov]
      <4> [<ffffffffa069dcfc>] ? cl_enqueue_try+0xfc/0x300 [obdclass]
      <4> [<ffffffffa069ef4f>] ? cl_enqueue_locked+0x6f/0x1f0 [obdclass]
      <4> [<ffffffffa069fb9e>] ? cl_lock_request+0x7e/0x270 [obdclass]
      <4> [<ffffffffa06a4b2c>] ? cl_io_lock+0x3cc/0x560 [obdclass]
      <4> [<ffffffffa06a4d62>] ? cl_io_loop+0xa2/0x1b0 [obdclass]
      <4> [<ffffffffa0addca6>] ? ll_file_io_generic+0x2b6/0x710 [lustre]
      <4> [<ffffffffa0694ce9>] ? cl_env_get+0x29/0x350 [obdclass]
      <4> [<ffffffffa0ade23f>] ? ll_file_aio_read+0x13f/0x2c0 [lustre]
      <4> [<ffffffffa0ade6fc>] ? ll_file_read+0x16c/0x2a0 [lustre]
      <4> [<ffffffff811816c5>] ? vfs_read+0xb5/0x1a0
      <4> [<ffffffff81181801>] ? sys_read+0x51/0x90
      <4> [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

      Attachments

        Issue Links

          Activity

            [LU-4692] (osc_lock.c:1204:osc_lock_enqueue()) ASSERTION( ols->ols_state == OLS_NEW ) failed: Impossible state: 6

            I think this is imported by LU-3889, I pushed http://review.whamcloud.com/9783 please give it a try.

            It lacks of lustre module files so I can't analyze the core dump file.

            jay Jinshan Xiong (Inactive) added a comment - I think this is imported by LU-3889 , I pushed http://review.whamcloud.com/9783 please give it a try. It lacks of lustre module files so I can't analyze the core dump file.

            This problem should be imported by LU-3027, I'm taking a look.

            jay Jinshan Xiong (Inactive) added a comment - This problem should be imported by LU-3027 , I'm taking a look.
            wjt27 Wojciech Turek added a comment - we use patchless clients so the kernel is standard rhel6.5 kernel ftp://ftp.scientificlinux.org/linux/scientific/6.5/archive/debuginfo/kernel-debuginfo-2.6.32-431.1.2.el6.x86_64.rpm ftp://ftp.scientificlinux.org/linux/scientific/6.5/archive/debuginfo/kernel-debuginfo-common-x86_64-2.6.32-431.1.2.el6.x86_64.rpm
            ihara Shuichi Ihara (Inactive) added a comment - Here is crashdump of original this problem. ftp://ftp.whamcloud.com/uploads/LU-4692-2/ The kernel is standard RHEL6.4, you can download vmlinux and System.map from here. http://debuginfo.centos.org/6/x86_64/kernel-debuginfo-2.6.32-358.el6.x86_64.rpm http://debuginfo.centos.org/6/x86_64/kernel-debuginfo-common-x86_64-2.6.32-358.el6.x86_64.rpm
            bobijam Zhenyu Xu added a comment -

            Hi Wojciech,

            Can you upload the supporting files (uncompressed vmlinuz, System.map) as well?

            bobijam Zhenyu Xu added a comment - Hi Wojciech, Can you upload the supporting files (uncompressed vmlinuz, System.map) as well?
            wjt27 Wojciech Turek added a comment - uploaded to ftp://ftp.whamcloud.com/uploads/LU-4692/

            Hi Turek,

            Can you share us the core dump? You can upload it to ftp.whamcloud.com, thanks.

            Jinshan

            jay Jinshan Xiong (Inactive) added a comment - Hi Turek, Can you share us the core dump? You can upload it to ftp.whamcloud.com, thanks. Jinshan

            We hit the same problem shortly after upgrading to 2.4.2 clients. Have not seen this on 2.4.1

            crash> bt
            PID: 18546 TASK: ffff8809f1aa8040 CPU: 0 COMMAND: "turbostream"
            #0 [ffff8809f0cf98a0] machine_kexec at ffffffff81038f3b
            #1 [ffff8809f0cf9900] crash_kexec at ffffffff810c5da2
            #2 [ffff8809f0cf99d0] panic at ffffffff8152721a
            #3 [ffff8809f0cf9a50] lbug_with_loc at ffffffffa0f47eeb [libcfs]
            #4 [ffff8809f0cf9a70] lovsub_lock_state at ffffffffa14b7d51 [lov]
            #5 [ffff8809f0cf9ac0] cl_lock_state_signal at ffffffffa10aedd8 [obdclass]
            #6 [ffff8809f0cf9b00] cl_lock_state_set at ffffffffa10aef25 [obdclass]
            #7 [ffff8809f0cf9b30] cl_enqueue_try at ffffffffa10b2c29 [obdclass]
            #8 [ffff8809f0cf9b80] lov_lock_enqueue at ffffffffa14b00da [lov]
            #9 [ffff8809f0cf9c20] cl_enqueue_try at ffffffffa10b2bdc [obdclass]
            #10 [ffff8809f0cf9c70] cl_enqueue_locked at ffffffffa10b3fcf [obdclass]
            #11 [ffff8809f0cf9cb0] cl_lock_request at ffffffffa10b4c3e [obdclass]
            #12 [ffff8809f0cf9d10] cl_io_lock at ffffffffa10b9edc [obdclass]
            #13 [ffff8809f0cf9d70] cl_io_loop at ffffffffa10ba112 [obdclass]
            #14 [ffff8809f0cf9da0] ll_file_io_generic at ffffffffa152ca90 [lustre]
            #15 [ffff8809f0cf9e20] ll_file_aio_write at ffffffffa152d9d2 [lustre]
            #16 [ffff8809f0cf9e80] ll_file_write at ffffffffa152dcbc [lustre]
            #17 [ffff8809f0cf9ef0] vfs_write at ffffffff81188f88
            #18 [ffff8809f0cf9f30] sys_write at ffffffff81189881
            #19 [ffff8809f0cf9f80] system_call_fastpath at ffffffff8100b072
            RIP: 00007fdf7eeea6fd RSP: 00007fffad51c8b8 RFLAGS: 00000202
            RAX: 0000000000000001 RBX: ffffffff8100b072 RCX: 0000000000000001
            RDX: 00000000000009c4 RSI: 0000000008b32154 RDI: 000000000000001b
            RBP: 0000000013252a00 R8: 00007fdf8069d6a8 R9: 00000000000009c4
            R10: 00007fffad512d3c R11: 0000000000000293 R12: 000000004c00010d
            R13: 00000000000009c4 R14: 0000000000e6d964 R15: 000000000000001b
            ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
            crash> sys
            SYSTEM MAP: /boot/System.map-2.6.32-431.1.2.el6.hpcs.x86_64
            DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.32-431.1.2.el6.hpcs.x86_64/vmlinux (2.6.32-431.1.2.el6.hpcs.x86_64)
            DUMPFILE: vmcore [PARTIAL DUMP]
            CPUS: 12
            DATE: Wed Mar 12 16:17:17 2014
            UPTIME: 1 days, 23:37:03
            LOAD AVERAGE: 1.99, 1.97, 1.91
            TASKS: 449
            NODENAME: tesla14
            RELEASE: 2.6.32-431.1.2.el6.hpcs.x86_64
            VERSION: #1 SMP Sun Dec 15 04:51:47 GMT 2013
            MACHINE: x86_64 (2599 Mhz)
            MEMORY: 64 GB
            PANIC: "Kernel panic - not syncing: LBUG"

            wjt27 Wojciech Turek added a comment - We hit the same problem shortly after upgrading to 2.4.2 clients. Have not seen this on 2.4.1 crash> bt PID: 18546 TASK: ffff8809f1aa8040 CPU: 0 COMMAND: "turbostream" #0 [ffff8809f0cf98a0] machine_kexec at ffffffff81038f3b #1 [ffff8809f0cf9900] crash_kexec at ffffffff810c5da2 #2 [ffff8809f0cf99d0] panic at ffffffff8152721a #3 [ffff8809f0cf9a50] lbug_with_loc at ffffffffa0f47eeb [libcfs] #4 [ffff8809f0cf9a70] lovsub_lock_state at ffffffffa14b7d51 [lov] #5 [ffff8809f0cf9ac0] cl_lock_state_signal at ffffffffa10aedd8 [obdclass] #6 [ffff8809f0cf9b00] cl_lock_state_set at ffffffffa10aef25 [obdclass] #7 [ffff8809f0cf9b30] cl_enqueue_try at ffffffffa10b2c29 [obdclass] #8 [ffff8809f0cf9b80] lov_lock_enqueue at ffffffffa14b00da [lov] #9 [ffff8809f0cf9c20] cl_enqueue_try at ffffffffa10b2bdc [obdclass] #10 [ffff8809f0cf9c70] cl_enqueue_locked at ffffffffa10b3fcf [obdclass] #11 [ffff8809f0cf9cb0] cl_lock_request at ffffffffa10b4c3e [obdclass] #12 [ffff8809f0cf9d10] cl_io_lock at ffffffffa10b9edc [obdclass] #13 [ffff8809f0cf9d70] cl_io_loop at ffffffffa10ba112 [obdclass] #14 [ffff8809f0cf9da0] ll_file_io_generic at ffffffffa152ca90 [lustre] #15 [ffff8809f0cf9e20] ll_file_aio_write at ffffffffa152d9d2 [lustre] #16 [ffff8809f0cf9e80] ll_file_write at ffffffffa152dcbc [lustre] #17 [ffff8809f0cf9ef0] vfs_write at ffffffff81188f88 #18 [ffff8809f0cf9f30] sys_write at ffffffff81189881 #19 [ffff8809f0cf9f80] system_call_fastpath at ffffffff8100b072 RIP: 00007fdf7eeea6fd RSP: 00007fffad51c8b8 RFLAGS: 00000202 RAX: 0000000000000001 RBX: ffffffff8100b072 RCX: 0000000000000001 RDX: 00000000000009c4 RSI: 0000000008b32154 RDI: 000000000000001b RBP: 0000000013252a00 R8: 00007fdf8069d6a8 R9: 00000000000009c4 R10: 00007fffad512d3c R11: 0000000000000293 R12: 000000004c00010d R13: 00000000000009c4 R14: 0000000000e6d964 R15: 000000000000001b ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b crash> sys SYSTEM MAP: /boot/System.map-2.6.32-431.1.2.el6.hpcs.x86_64 DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.32-431.1.2.el6.hpcs.x86_64/vmlinux (2.6.32-431.1.2.el6.hpcs.x86_64) DUMPFILE: vmcore [PARTIAL DUMP] CPUS: 12 DATE: Wed Mar 12 16:17:17 2014 UPTIME: 1 days, 23:37:03 LOAD AVERAGE: 1.99, 1.97, 1.91 TASKS: 449 NODENAME: tesla14 RELEASE: 2.6.32-431.1.2.el6.hpcs.x86_64 VERSION: #1 SMP Sun Dec 15 04:51:47 GMT 2013 MACHINE: x86_64 (2599 Mhz) MEMORY: 64 GB PANIC: "Kernel panic - not syncing: LBUG"

            Hi Andreas,

            Unfortunately, I don't know what test was running. I will let you know as soon as I get more information about ot.

            lixi Li Xi (Inactive) added a comment - Hi Andreas, Unfortunately, I don't know what test was running. I will let you know as soon as I get more information about ot.

            This is probably a different representation of LU-4591. It would be really helpful if someone on our team can help reproduce the problem local side.

            jay Jinshan Xiong (Inactive) added a comment - This is probably a different representation of LU-4591 . It would be really helpful if someone on our team can help reproduce the problem local side.

            Li Xi, could you please comment about what test was being run to cause this problem?

            adilger Andreas Dilger added a comment - Li Xi, could you please comment about what test was being run to cause this problem?

            People

              bobijam Zhenyu Xu
              lixi Li Xi (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: