ofd_prolong_extent_locks()) ASSERTION( lock->l_flags & 0x0000000000000020ULL ) failed

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.7.0
    • Lustre 2.7.0
    • None
    • 3
    • 15379

    Description

      An LBUG was hit during soak testing (IOR on CNs plus message drop on the router) on lola:

      <4>Lustre: 6834:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1408450946/real 1408450946]  req@ffff8802f1fc3000 x1476486607266588/t0(0) o105->soaked-OST0000@192.168.1.121@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1408450953 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
      <0>LustreError: 6767:0:(ofd_dev.c:1809:ofd_prolong_extent_locks()) ASSERTION( lock->l_flags & 0x0000000000000020ULL ) failed: 
      <0>LustreError: 6767:0:(ofd_dev.c:1809:ofd_prolong_extent_locks()) LBUG
      <0>Kernel panic - not syncing: LBUG in interrupt.
      <0>
      <4>Pid: 6767, comm: ll_ost_io02_005 Tainted: P        W  ---------------    2.6.32-431.23.3.el6_lustre.gc0c4f13.x86_64 #1
      <4>Call Trace:
      <4> [<ffffffff81528dbc>] ? panic+0xa7/0x16f
      <4> [<ffffffffa1089edd>] ? lbug_with_loc+0x8d/0xb0 [libcfs]
      <4> [<ffffffffa1a0f7ac>] ? ofd_prolong_extent_locks+0x35c/0x390 [ofd]
      <4> [<ffffffffa1a0fce9>] ? ofd_rw_hpreq_check+0xe9/0x350 [ofd]
      <4> [<ffffffffa1444345>] ? req_capsule_client_get+0x15/0x20 [ptlrpc]
      <4> [<ffffffffa1a0fb87>] ? ofd_hp_brw+0xd7/0x150 [ofd]
      <4> [<ffffffffa147d2a2>] ? tgt_hpreq_handler+0xf2/0x2f0 [ptlrpc]
      <4> [<ffffffffa1426cf6>] ? ptlrpc_server_handle_req_in+0x7c6/0xcd0 [ptlrpc]
      <4> [<ffffffffa142ddfc>] ? ptlrpc_main+0x9ec/0x1990 [ptlrpc]
      <4> [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
      <4> [<ffffffff810623a9>] ? find_busiest_queue+0x69/0x150
      <4> [<ffffffff815294ce>] ? thread_return+0x4e/0x760
      <4> [<ffffffffa142d410>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
      <4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
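
The failed assertion tests a single bit, mask 0x0000000000000020ULL, in the lock's 64-bit flag word. A minimal sketch of that kind of check follows. The struct and macro names (`demo_lock`, `DEMO_FL_BIT5`) are hypothetical stand-ins and not Lustre definitions; only the mask value comes from the log above. The mask appears to correspond to `LDLM_FL_AST_SENT` in Lustre's flag header, but treat that mapping as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; the value 0x20 is the mask printed in the assertion.
 * Assumption: this bit is what Lustre calls LDLM_FL_AST_SENT. */
#define DEMO_FL_BIT5 0x0000000000000020ULL

/* Hypothetical, minimal lock structure: only the flag word matters here. */
struct demo_lock {
	uint64_t l_flags;
};

/* Return 1 when the bit tested by the assertion is set, 0 otherwise. */
static int demo_flag_is_set(const struct demo_lock *lock)
{
	return (lock->l_flags & DEMO_FL_BIT5) != 0;
}

/* Same shape as the failing check: abort if the bit is not set. */
static void demo_require_flag(const struct demo_lock *lock)
{
	assert(demo_flag_is_set(lock));
}
```

A lock whose flag word is 0x21 passes `demo_require_flag()`, while one with 0x01 would abort, which is the userspace analogue of the LBUG reported here.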
      

          Activity

            [LU-5522] ofd_prolong_extent_locks()) ASSERTION( lock->l_flags & 0x0000000000000020ULL ) failed

            Grégoire Pichon (gregoire.pichon@bull.net) uploaded a new patch: http://review.whamcloud.com/13744
            Subject: LU-5522 ldlm: remove expired lock from per-export list
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set: 1
            Commit: b95a885461dab9c07410ee52e24a7d25318f55a1

            Gerrit Updater added a comment.

            Patch landed.

            Johann Lombardi (Inactive) added a comment.

            Noticed the following warning earlier in the log:

            <4>WARNING: at lib/list_debug.c:26 __list_add+0x6d/0xa0() (Tainted: P           ---------------   )
            <4>Hardware name: S2600GZ ..........
            <4>list_add corruption. next->prev should be prev (ffff8808305fae70), but was ffff8805da213328. (next=ffff8805da213328).
            <4>Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) sha512_generic sha256_generic crc32c_intel nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode iTCO_wdt iTCO_vendor_support sb_edac edac_core zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate lpc_ich mfd_core i2c_i801 ioatdma ses enclosure sg mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sd_mod crc_t10dif ahci isci libsas mpt2sas scsi_transport_sas raid_class wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
            <4>Pid: 6804, comm: ll_ost03_011 Tainted: P           ---------------    2.6.32-431.23.3.el6_lustre.gc0c4f13.x86_64 #1
            <4>Call Trace:
            <4> [<ffffffff81071b37>] ? warn_slowpath_common+0x87/0xc0
            <4> [<ffffffff81071c26>] ? warn_slowpath_fmt+0x46/0x50
            <4> [<ffffffff8129589d>] ? __list_add+0x6d/0xa0
            <4> [<ffffffffa13fba1a>] ? ldlm_add_waiting_lock+0x1fa/0x310 [ptlrpc]
            <4> [<ffffffffa13fd823>] ? ldlm_server_blocking_ast+0x423/0x8a0 [ptlrpc]
            <4> [<ffffffffa147975b>] ? tgt_blocking_ast+0x7b/0x7e0 [ptlrpc]
            <4> [<ffffffffa02ef9e9>] ? dbuf_dirty+0x4c9/0x840 [zfs]
            <4> [<ffffffffa13ce4ea>] ? ldlm_add_bl_work_item+0x8a/0x1e0 [ptlrpc]
            <4> [<ffffffffa13ce695>] ? ldlm_add_ast_work_item+0x55/0x150 [ptlrpc]
            <4> [<ffffffffa13d0e3d>] ? ldlm_work_bl_ast_lock+0xdd/0x290 [ptlrpc]
            <4> [<ffffffffa1411f8c>] ? ptlrpc_set_wait+0x6c/0x860 [ptlrpc]
            <4> [<ffffffffa1622257>] ? kiblnd_launch_tx+0xf7/0xa80 [ko2iblnd]
            <4> [<ffffffffa140d842>] ? ptlrpc_prep_set+0x112/0x2e0 [ptlrpc]
            <4> [<ffffffffa13d0d60>] ? ldlm_work_bl_ast_lock+0x0/0x290 [ptlrpc]
            <4> [<ffffffffa13d2f6b>] ? ldlm_run_ast_work+0x1db/0x470 [ptlrpc]
            <4> [<ffffffffa13ea315>] ? ldlm_process_extent_lock+0x155/0xab0 [ptlrpc]
            <4> [<ffffffffa13d283d>] ? ldlm_lock_enqueue+0x41d/0x970 [ptlrpc]
            <4> [<ffffffffa13f1fc6>] ? ldlm_cli_enqueue_local+0x186/0x5d0 [ptlrpc]
            <4> [<ffffffffa13f2410>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
            <4> [<ffffffffa13f0db0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
            <4> [<ffffffffa1a23014>] ? ofd_destroy_by_fid+0x344/0x620 [ofd]
            <4> [<ffffffffa13f0db0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
            <4> [<ffffffffa13f2410>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
            <4> [<ffffffffa1a1c96a>] ? ofd_destroy_hdl+0x2fa/0xb60 [ofd]
            <4> [<ffffffffa147f21e>] ? tgt_request_handle+0x71e/0xb10 [ptlrpc]
            <4> [<ffffffffa142e274>] ? ptlrpc_main+0xe64/0x1990 [ptlrpc]
            
            Johann Lombardi (Inactive) added a comment.
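
The "list_add corruption" warning in the comment above comes from the kernel's list debugging check, which verifies that the two neighbours of the insertion point still point at each other before linking a new entry. The following is a userspace sketch of that check, not the kernel or Lustre code; all `demo_*` names are hypothetical. It shows one way such a warning can arise: an entry is added to a second list while still linked on the first. Whether that is the exact sequence behind this ticket is an assumption, suggested only by the patch subject ("remove expired lock from per-export list").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a circular doubly linked list node. */
struct demo_node {
	struct demo_node *next, *prev;
};

/* An empty list is a head that points at itself. */
static void demo_list_init(struct demo_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'entry' between 'prev' and 'next'.
 * Returns 0 on success, or -1 (list untouched) when the neighbours no
 * longer point at each other, which is the condition the warning reports. */
static int demo_list_add_checked(struct demo_node *entry,
				 struct demo_node *prev,
				 struct demo_node *next)
{
	if (next->prev != prev || prev->next != next)
		return -1;
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return 0;
}

/* Append 'entry' at the tail of the list headed by 'head'. */
static int demo_list_add_tail(struct demo_node *entry, struct demo_node *head)
{
	return demo_list_add_checked(entry, head->prev, head);
}

/* Unlink 'entry' and reinitialise it so it can safely be added again. */
static void demo_list_del_init(struct demo_node *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry;
	entry->prev = entry;
}
```

If a node is appended to list A and then to list B without `demo_list_del_init()` in between, list A is left pointing at a node whose links belong to B, and the next append to A fails the check. Unlinking the node first keeps both lists consistent.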

            People

              johann Johann Lombardi (Inactive)
              johann Johann Lombardi (Inactive)