Lustre / LU-11125

ofd_create_hdl() destroys_in_progress already cleared


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.6
    • Affects Version/s: Lustre 2.10.4
    • Environment: lustre-2.10.4_1.chaos-1.ch6.x86_64 servers
      RHEL 7.5
      DNE1 file system
    • Severity: 3

    Description

      Servers were restarted and appeared to recover normally. They briefly seemed to handle the same heavy workload as before they were powered off, then started logging the "system was overloaded" message. The kernel then reported several hung-task stack traces like this:

      INFO: task ll_ost00_007:108440 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      ll_ost00_007 D ffff8ba4dc35bf40 0 108440 2 0x00000080
      Call Trace:
      [<ffffffffaad38919>] schedule_preempt_disabled+0x39/0x90
      [<ffffffffaad3654f>] __mutex_lock_slowpath+0x10f/0x250
      [<ffffffffaad357f2>] mutex_lock+0x32/0x42
      [<ffffffffc1669afb>] ofd_create_hdl+0xdcb/0x2090 [ofd]
      [<ffffffffc1322007>] ? lustre_msg_add_version+0x27/0xa0 [ptlrpc]
      [<ffffffffc132235f>] ? lustre_pack_reply_v2+0x14f/0x290 [ptlrpc]
      [<ffffffffc1322691>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
      [<ffffffffc138653a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [<ffffffffc132db5b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [<ffffffffc132b26b>] ? ptlrpc_wait_event+0xab/0x350 [ptlrpc]
      [<ffffffffaa6d5c32>] ? default_wake_function+0x12/0x20
      [<ffffffffaa6cb01b>] ? __wake_up_common+0x5b/0x90
      [<ffffffffc1331c70>] ptlrpc_main+0xae0/0x1e90 [ptlrpc]
      [<ffffffffc1331190>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
      [<ffffffffaa6c0ad1>] kthread+0xd1/0xe0
      [<ffffffffaa6c0a00>] ? insert_kthread_work+0x40/0x40
      [<ffffffffaad44837>] ret_from_fork_nospec_begin+0x21/0x21
      [<ffffffffaa6c0a00>] ? insert_kthread_work+0x40/0x40
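When many threads report traces like the one above, it can help to extract automatically which function is waiting on the mutex. The following is a minimal sketch, not part of the original report: it assumes frames are formatted as shown above, treats frames marked with `?` as unreliable, and uses hypothetical helper names (`parse_frames`, `mutex_caller`).

```python
import re

# One frame: [<addr>] [?] symbol+0xoffset/0xsize [module]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace):
    """Return one dict per stack frame found in a hung-task trace."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "symbol": m.group("symbol"),
                # Frames without a [module] suffix are core kernel symbols.
                "module": m.group("module") or "kernel",
                # A leading "?" marks a frame the unwinder is unsure about.
                "reliable": m.group("unreliable") is None,
            })
    return frames


def mutex_caller(frames):
    """Return the first reliable frame below mutex_lock, or None."""
    reliable = [f for f in frames if f["reliable"]]
    for i, frame in enumerate(reliable):
        if frame["symbol"] == "mutex_lock" and i + 1 < len(reliable):
            return reliable[i + 1]
    return None


# Excerpt of the trace quoted in this report.
SAMPLE = """\
[<ffffffffaad38919>] schedule_preempt_disabled+0x39/0x90
[<ffffffffaad3654f>] __mutex_lock_slowpath+0x10f/0x250
[<ffffffffaad357f2>] mutex_lock+0x32/0x42
[<ffffffffc1669afb>] ofd_create_hdl+0xdcb/0x2090 [ofd]
[<ffffffffc1322007>] ? lustre_msg_add_version+0x27/0xa0 [ptlrpc]
[<ffffffffc138653a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
"""

frames = parse_frames(SAMPLE)
caller = mutex_caller(frames)
print(caller["symbol"], caller["module"])
```

For the excerpt above, the sketch reports `ofd_create_hdl` in module `ofd` as the function blocked in `mutex_lock`.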

      Lustre then began reporting:
      LustreError: 108448:0:(ofd_dev.c:1627:ofd_create_hdl()) lquake-OST0003:[27917288460] destroys_in_progress already cleared
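To count how often this error occurs per target in a console log, the line can be split into fields. The sketch below is an illustration only: the regular expression is inferred from the single line quoted above, the field names (`field2`, `number`) are placeholders because the report does not say what those values mean, and `parse_lustre_error` is a hypothetical helper.

```python
import re

# Pattern inferred from the one LustreError line quoted in this report.
LINE_RE = re.compile(
    r"LustreError: (?P<pid>\d+):(?P<field2>\d+):"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<target>[\w-]+):\[(?P<number>\d+)\]\s+(?P<message>.*)"
)


def parse_lustre_error(line):
    """Split a LustreError console line into named fields, or return None."""
    m = LINE_RE.search(line)
    return m.groupdict() if m else None


SAMPLE_LINE = (
    "LustreError: 108448:0:(ofd_dev.c:1627:ofd_create_hdl()) "
    "lquake-OST0003:[27917288460] destroys_in_progress already cleared"
)

record = parse_lustre_error(SAMPLE_LINE)
print(record["target"], record["func"], record["message"])
```

Feeding every matching line of a log through `parse_lustre_error` and tallying `record["target"]` would show which OSTs emit the message.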

       


            People

              Assignee: Mikhail Pershin (tappro)
              Reporter: Olaf Faaland (ofaaland)