
[LU-10525] racer test_1: Failure to initialize cl object

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.3, Lustre 2.10.4, Lustre 2.10.5
    • Labels: None
    • Severity: 3

    Description

      racer test_1 - test_1 failed with 2
      ^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/60989662-f935-11e7-a7cd-52540065bddc

      test_1 failed with the following error:

      test_1 failed with 2
      

      server: 2.10.3 RC1 EL7
      client: 2.10.3 RC1 EL6.9

      test log

      Lustre: Skipped 3 previous similar messages
      Lustre: lustre-MDT0000-mdc-ffff880041847c00: Connection restored to 10.9.5.195@tcp (at 10.9.5.195@tcp)
      Lustre: Skipped 3 previous similar messages
      LustreError: 16503:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x2366:0x0]: -16
      LustreError: 32509:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x2486:0x0]: -16
      INFO: task dir_create.sh:12824 blocked for more than 120 seconds.
            Tainted: G        W  -- ------------    2.6.32-696.18.7.el6.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      dir_create.sh D 0000000000000000     0 12824  12691 0x00000080
       ffff88007a03fb68 0000000000000086 ffff88007a03fba8 0000000000000080
       0000000000000000 ffff880059833149 0000004b00000000 ffffffffa103b393
       0000000000000098 0020000000000080 ffff88002c1a1068 ffff88007a03ffd8
      Call Trace:
       [<ffffffff8154d146>] __mutex_lock_slowpath+0x96/0x210
       [<ffffffff811b5f68>] ? __d_lookup+0xd8/0x150
       [<ffffffff8154cc6b>] mutex_lock+0x2b/0x50
       [<ffffffff811aa63b>] do_lookup+0x11b/0x230
       [<ffffffff811aace0>] __link_path_walk+0x200/0x1060
       [<ffffffff81342f5c>] ? memory_open+0x3c/0xa0
       [<ffffffff811abdfa>] path_walk+0x6a/0xe0
       [<ffffffff811ad5da>] do_filp_open+0x1fa/0xd20
       [<ffffffff8115aeca>] ? handle_mm_fault+0x2aa/0x3f0
       [<ffffffff812a97fa>] ? strncpy_from_user+0x4a/0x90
       [<ffffffff811bac52>] ? alloc_fd+0x92/0x160
       [<ffffffff81197507>] do_sys_open+0x67/0x130
       [<ffffffff8155660b>] ? system_call_after_swapgs+0x16b/0x220
       [<ffffffff81556604>] ? system_call_after_swapgs+0x164/0x220
       [<ffffffff815565fd>] ? system_call_after_swapgs+0x15d/0x220
       [<ffffffff81197610>] sys_open+0x20/0x30
       [<ffffffff815566d6>] system_call_fastpath+0x16/0x1b
       [<ffffffff8155656a>] ? system_call_after_swapgs+0xca/0x220
      INFO: task ls:18508 blocked for more than 120 seconds.
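
      For reference, the -16 in the cl_file_inode_init() messages above is a negative errno: on Linux, errno 16 is EBUSY ("Device or resource busy"). A trivial userspace check:

      #include <stdio.h>
      #include <string.h>

      /* Decode the return code from the LustreError lines above.
       * Lustre console messages report errors as negative errno values;
       * 16 is EBUSY on Linux. */
      int main(void)
      {
              int rc = -16;   /* value from the cl_file_inode_init() messages */
              printf("rc = %d: %s\n", rc, strerror(-rc));  /* prints "Device or resource busy" */
              return 0;
      }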
      
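      The "blocked for more than 120 seconds" messages come from the kernel's hung-task watchdog, which flags tasks sitting in uninterruptible sleep (D state) that were never scheduled between two scans taken hung_task_timeout_secs apart. A rough illustrative sketch of that check (not the actual khungtaskd code):

      #include <stdio.h>

      struct task {
              const char *comm;            /* command name, e.g. "dir_create.sh" */
              char state;                  /* 'D' = uninterruptible sleep */
              unsigned long switch_count;  /* context switches so far */
              unsigned long last_seen;     /* switch_count at the previous scan */
      };

      /* One watchdog pass: warn if a D-state task has not run at all
       * since the last scan (scans are timeout_secs apart, 120 by default). */
      static void check_hung_task(struct task *t, unsigned int timeout_secs)
      {
              if (t->state == 'D' && t->switch_count == t->last_seen)
                      printf("INFO: task %s blocked for more than %u seconds.\n",
                             t->comm, timeout_secs);
              t->last_seen = t->switch_count;
      }

      int main(void)
      {
              /* a task that has not been scheduled since the previous scan */
              struct task dir_create = { "dir_create.sh", 'D', 42, 42 };
              check_hung_task(&dir_create, 120);
              return 0;
      }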

      Activity

            jamesanunez James Nunez (Inactive) added a comment - racer test 1 times out with dir_create, dd, mv, lfs, etc. hung in a D state, for EL7 servers and client; logs at https://testing.whamcloud.com/test_sets/a3be8a94-9c02-11e8-a9f7-52540065bddc. We don't see the 'initialize cl object' error in this hang.
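
            As an aside, one quick way to see which processes are hung like this is to scan /proc for tasks in D (uninterruptible sleep) state. A minimal userspace sketch, assuming a Linux /proc layout:

            #include <dirent.h>
            #include <stdio.h>

            /* Walk /proc and print tasks currently in uninterruptible
             * sleep ("D" state) -- the state dir_create.sh, dd, mv,
             * etc. were stuck in here. */
            int main(void)
            {
                    DIR *proc = opendir("/proc");
                    struct dirent *de;
                    char path[64], comm[64], state;
                    int pid;
                    FILE *f;

                    if (proc == NULL)
                            return 1;
                    while ((de = readdir(proc)) != NULL) {
                            if (sscanf(de->d_name, "%d", &pid) != 1)
                                    continue;       /* not a PID directory */
                            snprintf(path, sizeof(path), "/proc/%d/stat", pid);
                            f = fopen(path, "r");
                            if (f == NULL)
                                    continue;
                            if (fscanf(f, "%d (%63[^)]) %c", &pid, comm, &state) == 3 &&
                                state == 'D')
                                    printf("%d %s\n", pid, comm);
                            fclose(f);
                    }
                    closedir(proc);
                    return 0;
            }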
            sarah Sarah Liu added a comment - +1 on b2_10 https://testing.hpdd.intel.com/test_sets/03bf84be-5c0e-11e8-b9d3-52540065bddc
            green Oleg Drokin added a comment -

            The message quoted here is just a symptom on the client that something is hogging MDS resources, so the client cannot send more than one request, and that one request is stuck in processing on the MDS.

            I looked at the logs on the MDS and it seems to be somewhat confirmed there, with all the "cannot add more time, not sending early replies" messages, but there is no definite point at what is stuck where, and since it was not a crash, we did not collect any crashdumps either, I guess. As such we don't really know all that much about what was going on here that led to the MDS being stuck.

            People

              Assignee: wc-triage WC Triage
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 6
