Lustre / LU-4194

kfree fails kernel paging request under ldlm_lock_put()



    Description

      We have a Lustre client dedicated to running robinhood. This node runs Lustre 2.4.0-18chaos and talks to our 55 PB zfs-osd filesystem, whose servers run a similar Lustre tag.

      The Lustre client node that runs robinhood is hitting "BUG: unable to handle kernel paging request at <pointer>" when it attempts to kfree() a bad pointer in Lustre's ldlm_lock_put() function.

      There are two stacks that we have seen lead up to this, both ending in the same failed kfree.

      The first is from ldlm_bl_* threads:

      cfs_free
      ldlm_lock_put
      ldlm_cli_cancel_list
      ldlm_bl_thread_main
      

      The second was under a robinhood process:

      cfs_free
      ldlm_lock_put
      ldlm_cli_cancel_list
      ldlm_prep_elc_req
      ldlm_prep_enqueue_req
      mdc_intent_getattr_pack
      mdc_enqueue
      mdc_intent_lock
      lmv_intent_lookup
      lmv_intent_lock
      ll_lookup_it
      ll_lookup_nd
      do_lookup
      __link_path_walk
      path_walk
      do_path_lookup
      user_path_at
      vfs_fstatat
      vfs_lstat
      sys_newlstat
      system_call_fastpath
      

      Note that it was the second call to ldlm_cli_cancel_list() in ldlm_prep_elc_req() that we were under.
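
      The overlap between the two stacks above can be checked mechanically. The following is a minimal sketch in Python, with the frame lists copied from this report in innermost-first order; the helper name common_innermost_frames is hypothetical and is not part of Lustre or any crash-analysis tool.

```python
from itertools import takewhile

# Frames copied from the two stacks in this report, innermost frame first.
BL_THREAD_STACK = [
    "cfs_free",
    "ldlm_lock_put",
    "ldlm_cli_cancel_list",
    "ldlm_bl_thread_main",
]

ROBINHOOD_STACK = [
    "cfs_free",
    "ldlm_lock_put",
    "ldlm_cli_cancel_list",
    "ldlm_prep_elc_req",
    "ldlm_prep_enqueue_req",
    "mdc_intent_getattr_pack",
    "mdc_enqueue",
    "mdc_intent_lock",
    "lmv_intent_lookup",
    "lmv_intent_lock",
    "ll_lookup_it",
    "ll_lookup_nd",
    "do_lookup",
    "__link_path_walk",
    "path_walk",
    "do_path_lookup",
    "user_path_at",
    "vfs_fstatat",
    "vfs_lstat",
    "sys_newlstat",
    "system_call_fastpath",
]


def common_innermost_frames(stack_a, stack_b):
    """Return the leading frames shared by two innermost-first stacks."""
    pairs = takewhile(lambda pair: pair[0] == pair[1], zip(stack_a, stack_b))
    return [frame_a for frame_a, _ in pairs]


if __name__ == "__main__":
    for frame in common_innermost_frames(BL_THREAD_STACK, ROBINHOOD_STACK):
        print(frame)
```

      Both paths converge at ldlm_cli_cancel_list(), so the two crashes differ only in how the cancel list was reached.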

      The problem is quite reproducible on our system: the client node usually crashes in less than an hour. However, I do not know how to reproduce it elsewhere. We have other server clusters whose clients run robinhood instances without crashing.

          People

            bfaccini Bruno Faccini (Inactive)
            morrone Christopher Morrone (Inactive)