Lustre / LU-4644

Test failure sanity test_161a: list_del corruption

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.6.0
    • None
    • 3
    • 12702

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/015e73f8-9870-11e3-968c-52540035b04c.

      The sub-test test_161a failed with the following error:

      test failed to respond and timed out

      Info required for matching: sanity 161a

      MDS2 Console log: (review-dne-part-1 on master)

      16:31:57:Lustre: DEBUG MARKER: == sanity test 161a: link ea sanity == 16:31:56 (1392683516)
      16:31:57:------------[ cut here ]------------
      16:31:57:WARNING: at lib/list_debug.c:48 list_del+0x6e/0xa0() (Not tainted)
      16:31:57:Hardware name: KVM
      16:31:57:list_del corruption. prev->next should be ffff88005f8fdb98, but was ffff88005f8fdbe0
      16:31:57:Modules linked in: osp(U) mdd(U) lod(U) lfsck(U) mdt(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      16:31:57:Pid: 10866, comm: mdt00_002 Not tainted 2.6.32-431.3.1.el6_lustre.gc762f0f.x86_64 #1
      16:31:57:Call Trace:
      16:31:57: [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0
      16:31:57: [<ffffffff81071f16>] ? warn_slowpath_fmt+0x46/0x50
      16:31:57: [<ffffffff812948be>] ? list_del+0x6e/0xa0
      16:31:57: [<ffffffff8109b6a1>] ? remove_wait_queue+0x31/0x50
      16:31:57: [<ffffffffa05fa469>] ? lu_object_find_at+0xf9/0x360 [obdclass]
      16:31:57: [<ffffffffa04a3cb2>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
      16:31:57: [<ffffffffa05fa6e6>] ? lu_object_find+0x16/0x20 [obdclass]
      16:31:57: [<ffffffffa0d80ed6>] ? mdt_object_find+0x56/0x170 [mdt]
      16:31:57: [<ffffffffa0d8e0f3>] ? mdt_get_info+0xee3/0x1a20 [mdt]
      16:31:57: [<ffffffffa089346c>] ? tgt_request_handle+0x23c/0xac0 [ptlrpc]
      16:31:57: [<ffffffffa084266a>] ? ptlrpc_main+0xd1a/0x1980 [ptlrpc]
      16:31:57: [<ffffffffa0841950>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
      16:31:57: [<ffffffff8109af06>] ? kthread+0x96/0xa0
      16:31:57: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      16:31:57: [<ffffffff8109ae70>] ? kthread+0x0/0xa0
      16:31:57: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      16:31:57:---[ end trace dd02610dd223ce47 ]---
      16:31:57:------------[ cut here ]------------
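
For readers unfamiliar with this warning: the message comes from the kernel's debug variant of list_del(), which checks the neighbours of an entry before unlinking it. The block below is a minimal userspace sketch of that kind of check, not the kernel source. The names checked_list_del() and demo_stale_neighbour() are hypothetical and exist only for illustration. Note that the two addresses in the log above differ by 0x48 bytes (0x...dbe0 minus 0x...db98), which means prev->next pointed at some other nearby location rather than at the entry being removed.

```c
#include <stdio.h>

/* Minimal doubly linked list node, shaped like the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/*
 * Hypothetical helper: verify both neighbours still point back at entry.
 * Returns the number of inconsistencies found. Skipping the unlink on
 * failure is a choice of this sketch, not a claim about kernel behaviour.
 */
static int checked_list_del(struct list_head *entry)
{
    int bad = 0;

    if (entry->prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)entry->prev->next);
        bad++;
    }
    if (entry->next->prev != entry) {
        fprintf(stderr,
                "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)entry->next->prev);
        bad++;
    }
    if (bad == 0) {
        entry->prev->next = entry->next;
        entry->next->prev = entry->prev;
        entry->next = NULL;
        entry->prev = NULL;
    }
    return bad;
}

/* Simulate a list whose head was rewritten behind entry a's back. */
static int demo_stale_neighbour(void)
{
    struct list_head head, a, b;

    list_init(&head);
    list_add(&a, &head);

    /* b is written into head.next without updating a.prev. */
    b.next = &a;
    b.prev = &head;
    head.next = &b;

    return checked_list_del(&a);   /* a.prev->next is &b, not &a */
}
```

Calling demo_stale_neighbour() trips only the first check, which is the same "prev->next should be ..., but was ..." form seen in the console log. How the real list reached that state in lu_object_find_at() is not established by this ticket.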
      

          Activity

            bzzz Alex Zhuravlev added a comment - multiple calls to lu_object_find() shouldn't be a problem since LU-9049 landed.

            adilger Andreas Dilger added a comment - Oscar, I'm not sure of the context of my comment.  I may have been confused because there were typos in the function names.  In any case, this test has not failed in a long time, and this code has changed significantly since the bug was filed, so I don't think it is worthwhile to keep it open.

            0santos0 Oscar Santos added a comment - Andreas, Because my understanding is that mdt_get_info, mdt_path and mdt_patch_current are still present in master, can you clarify which referenced functions no longer exist? And as of what version?

            adilger Andreas Dilger added a comment - The referenced functions no longer exist, and no repeat of this problem in the past 3 years.

            tappro Mikhail Pershin added a comment (edited) - It seems the mdt_path()->mdt_path_current() call path may cause a double mdt_object_find(). I'll prepare a patch.

            adilger Andreas Dilger added a comment - Mike, can you please update this ticket?

            adilger Andreas Dilger added a comment - Mike, can you please update this ticket? It sounds like it could be fixed by a relatively simple patch.

            tappro Mikhail Pershin added a comment - The only possible problem point is mdt_path_current(), which may call mdt_object_find() twice for the same object. I need some time to confirm that.

            jlevi Jodi Levi (Inactive) added a comment - Mike, Can you have a look at this please and comment? Thank you!

            bzzz Alex Zhuravlev added a comment - This is due to an extra lu_object_find() somewhere in mdt_get_info(); instead, it should pick that object directly from info, as the other handlers do. Please check with Mike Pershin for the details.
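
The comments above contrast two handler patterns: looking an object up a second time versus reusing the object already attached to the request's info. The toy model below sketches that difference in plain C. Every name in it (toy_object, toy_info, toy_find(), toy_put(), handler_extra_lookup(), handler_reuse()) is hypothetical and is not part of the Lustre API. It only shows that the reuse pattern avoids a second trip through the lookup path, and it makes no claim about how the real corruption arose.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real Lustre types or functions. */
struct toy_object {
    int id;
    int refcount;
};

struct toy_info {
    struct toy_object *obj;   /* resolved once, when the request is unpacked */
};

static struct toy_object toy_cache[4] = { {0, 0}, {1, 0}, {2, 0}, {3, 0} };
static int toy_lookups;       /* counts trips through the lookup path */

/* Look an object up by id and take a reference on it. */
static struct toy_object *toy_find(int id)
{
    if (id < 0 || id >= 4)
        return NULL;
    toy_lookups++;
    toy_cache[id].refcount++;
    return &toy_cache[id];
}

/* Drop a reference taken by toy_find(). */
static void toy_put(struct toy_object *obj)
{
    if (obj != NULL)
        obj->refcount--;
}

/*
 * Pattern the comments describe as suspect: the handler looks the object
 * up again although info already holds it. Returns 1 if the second lookup
 * yielded the same object, 0 if not, -1 on lookup failure.
 */
static int handler_extra_lookup(struct toy_info *info, int id)
{
    struct toy_object *again = toy_find(id);
    int same;

    if (again == NULL)
        return -1;
    same = (again == info->obj);
    toy_put(again);
    return same;
}

/* Suggested pattern: use the object already held in info, no new lookup. */
static int handler_reuse(struct toy_info *info)
{
    return info->obj != NULL ? info->obj->id : -1;
}
```

In this model, handler_reuse() leaves toy_lookups unchanged, while handler_extra_lookup() increments it and briefly holds a second reference to the same object.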

            People

              tappro Mikhail Pershin
              maloo Maloo