
Test failure on test suite lustre-rsync-test, subtest test_2c

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.3.0
    • None
    • 3
    • 6363

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/dcdbf220-cd1f-11e1-957a-52540035b04c.

      The sub-test test_2c failed with the following error:

      test failed to respond and timed out

      The MDS appears to be stuck; the mdt00_000 service thread is blocked in lu_object_find_at():

      mdt00_000     D 0000000000000001     0 12165      2 0x00000080
       ffff8800673a5aa0 0000000000000046 ffff8800779fdb40 ffff8800673a5b20
       ffff8800673a5a50 ffffc900018c502c 0000000000000246 0000000000000246
       ffff8800355fc6b8 ffff8800673a5fd8 000000000000f4e8 ffff8800355fc6b8
      Call Trace:
       [<ffffffffa053b5d4>] ? htable_lookup+0x1a4/0x1c0 [obdclass]
       [<ffffffffa0ced77e>] cfs_waitq_wait+0xe/0x10 [libcfs]
       [<ffffffffa053b6a0>] lu_object_find_at+0xb0/0x450 [obdclass]
       [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
       [<ffffffffa053ba7f>] lu_object_find_slice+0x1f/0x80 [obdclass]
       [<ffffffffa095c160>] mdd_object_find+0x10/0x70 [mdd]
       [<ffffffffa096395f>] mdd_path+0x35f/0x1060 [mdd]
       [<ffffffffa053b67c>] ? lu_object_find_at+0x8c/0x450 [obdclass]
       [<ffffffffa0963600>] ? mdd_path+0x0/0x1060 [mdd]
       [<ffffffffa0af47da>] cml_path+0x6a/0x180 [cmm]
       [<ffffffffa09c9db6>] ? mdt_object_find+0x66/0x170 [mdt]
       [<ffffffffa09ce3ff>] mdt_get_info+0x64f/0xa90 [mdt]
       [<ffffffffa09c9f0d>] ? mdt_unpack_req_pack_rep+0x4d/0x4d0 [mdt]
       [<ffffffffa09d2922>] mdt_handle_common+0x922/0x1740 [mdt]
       [<ffffffffa09d3815>] mdt_regular_handle+0x15/0x20 [mdt]
       [<ffffffffa066757d>] ptlrpc_server_handle_request+0x40d/0xea0 [ptlrpc]
       [<ffffffffa0ced65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa065ea37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
       [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
       [<ffffffffa0668b79>] ptlrpc_main+0xb69/0x1870 [ptlrpc]
       [<ffffffffa0668010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
       [<ffffffff8100c14a>] child_rip+0xa/0x20
       [<ffffffffa0668010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
       [<ffffffffa0668010>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
       [<ffffffff8100c140>] ? child_rip+0x0/0x20
      

          Activity

            [LU-1640] Test failure on test suite lustre-rsync-test, subtest test_2c
            bobijam Zhenyu Xu added a comment -

            Discussion moved to LU-2492.

            yujian Jian Yu added a comment -

            Lustre Tag: v2_3_0_RC2
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/32
            Distro/Arch: RHEL6.3/x86_64

            The same issue occurred again: https://maloo.whamcloud.com/test_sets/664cd250-12ac-11e2-bd97-52540035b04c

            pjones Peter Jones added a comment -

            As per Bobijam, this issue rarely occurs (it has not been seen in the last three tags), so I am decreasing its priority to focus on more frequently hit issues.

            bobijam Zhenyu Xu added a comment -

            Patch tracking at http://review.whamcloud.com/3439

            obdclass: htable_lookup could miss a waking up signal

            In lu_object_free(), a wake-up signal is issued to the hash bucket
            wait queue to tell blocked threads that a dying object has been freed,
            but the signal is sent without taking the bucket lock. Without the
            lock, a thread calling htable_lookup() can be added to the bucket
            wait queue just after the signal is sent, miss the wake-up, and
            wait forever.
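
The race described above is the classic lost wake-up pattern. The following is a minimal user-space sketch of the locked version, not the Lustre patch itself: the `Bucket` class, its `dying` flag, and the method names are hypothetical stand-ins for the hash bucket, the dying object, htable_lookup() and lu_object_free(). It assumes that both the state change and the wake-up happen under the same lock the waiter holds while it checks the state and joins the wait queue.

```python
import threading


class Bucket:
    """Hypothetical stand-in for a hash bucket with its own wait queue."""

    def __init__(self):
        # Condition bundles the bucket lock and the wait queue.
        self.cond = threading.Condition()
        # True while a dying object still occupies the slot.
        self.dying = True

    def wait_until_freed(self, timeout=None):
        # Waiter side (analogous to the lookup path): the state check and
        # the enqueue on the wait queue happen under the same lock, so a
        # wake-up cannot slip in between them.
        with self.cond:
            return self.cond.wait_for(lambda: not self.dying, timeout=timeout)

    def free_object(self):
        # Waker side (analogous to the free path): change the state and
        # signal while holding the bucket lock.
        with self.cond:
            self.dying = False
            self.cond.notify_all()


def demo():
    bucket = Bucket()
    results = []
    waiter = threading.Thread(
        target=lambda: results.append(bucket.wait_until_freed(timeout=5.0))
    )
    waiter.start()
    bucket.free_object()
    waiter.join()
    return results
```

Whichever thread wins the race, the waiter either sees the slot already freed before it sleeps or is woken by the notification, so it does not block indefinitely. The timeout in `demo()` is only a safety net for the sketch.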

            pjones Peter Jones added a comment -

            Bobijam

            Could you please look into this one?

            Thanks

            Peter


            People

              bobijam Zhenyu Xu
              maloo Maloo