Lustre / LU-10892

hang at 'echo clear > /proc/fs/lustre/ldlm/namespaces/.../lru_size'


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Lustre 2.5.5
    • 3
    • 9223372036854775807

    Description

      We are encountering frequent hangs when we execute:

      echo clear > $server/lru_size
      

      where $server is a path like /proc/fs/lustre/ldlm/namespaces/ls6-OST000a-osc-<UUID>/.
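      For scripting the same operation outside the shell, a minimal C sketch is shown below. It assumes the namespace directory is passed in as a string and writes "clear" plus a newline (what echo emits) to its lru_size file. The helper name write_lru_clear is hypothetical, not part of any Lustre tool.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes "clear\n" to <ns_dir>/lru_size, the same
 * operation as `echo clear > $server/lru_size`.
 * Returns 0 on success or a negative errno value on failure.
 * Like the shell command, write() may block inside the kernel until the
 * lock cancellation it triggers has completed. */
static int write_lru_clear(const char *ns_dir)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/lru_size", ns_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    /* echo appends a newline; include it to match the shell behaviour. */
    const char msg[] = "clear\n";
    ssize_t written = write(fd, msg, sizeof(msg) - 1);
    int err = (written < 0) ? -errno : 0;
    if (written >= 0 && (size_t)written != sizeof(msg) - 1)
        err = -EIO; /* short write */

    if (close(fd) < 0 && err == 0)
        err = -errno;
    return err;
}
```

      Calling write_lru_clear("/proc/fs/lustre/ldlm/namespaces/ls6-OST000a-osc-<UUID>") (with the real UUID) is equivalent to the echo above, including the possibility of blocking in the same kernel path.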

      In the cases we've documented, the target is an OST. That OST shows as active in 'lfs check servers'. We see no indication of problems on the OST (nothing in the console logs, no flapping connections, etc.).

      The stack trace looks like this:

      __ldlm_bl_to_thread+0x144
      ldlm_bl_to_thread+0x473
      ldlm_bl_to_thread_list+0x19
      ldlm_cancel_lru+0x70
      lprocfs_lru_size_seq_write+0x10c
      proc_reg_write+0x7e
      ...
      

      The client version is lustre-2.5.5-11chaos. The server version is Lustre 2.8.2.

      Code where the stuck thread is blocked:

      (gdb) l *(__ldlm_bl_to_thread+0x144)
      0x28874 is in __ldlm_bl_to_thread (/usr/src/debug/lustre-2.5.5/lustre/ldlm/ldlm_lockd.c:1997).
      1992 wake_up(&blp->blp_waitq);
      1993
      1994 /* can not check blwi->blwi_flags as blwi could be already freed in
      1995 LCF_ASYNC mode */
      1996 if (!(cancel_flags & LCF_ASYNC))
      1997         wait_for_completion(&blwi->blwi_comp);
      1998
      1999 RETURN(0);
      2000 }
      2001
      (gdb) quit
      
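      To illustrate why the writer never returns, here is a user-space model of the hand-off in the listing, written with POSIX threads. It is a sketch, not Lustre code: completion, init_completion, complete and wait_for_completion imitate the kernel primitives of the same names, and MODEL_LCF_ASYNC, struct work_item, worker and submit are hypothetical stand-ins for LCF_ASYNC, blwi and the bl thread. The ownership rule (the worker frees the item in async mode, otherwise it signals the waiter) is an assumption modeled on the comment at lines 1994-1995.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* User-space model, NOT Lustre code: a kernel-style completion built on
 * a mutex and a condition variable. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* No timeout: if complete() is never called, this never returns. */
static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

#define MODEL_LCF_ASYNC 0x1 /* hypothetical stand-in for LCF_ASYNC */

struct work_item {          /* hypothetical stand-in for blwi */
    struct completion comp;
    int flags;
    int nr_cancelled;       /* result filled in by the worker */
};

static void free_item(struct work_item *w)
{
    pthread_cond_destroy(&w->comp.cond);
    pthread_mutex_destroy(&w->comp.lock);
    free(w);
}

/* Worker: does the work, then frees the item (async, nobody waits)
 * or wakes the waiting submitter (sync). */
static void *worker(void *arg)
{
    struct work_item *w = arg;

    w->nr_cancelled = 3; /* pretend three locks were cancelled */
    if (w->flags & MODEL_LCF_ASYNC)
        free_item(w);
    else
        complete(&w->comp);
    return NULL;
}

/* Submitter: hands the item to a worker and, unless async, waits. */
static int submit(int flags, int *result)
{
    struct work_item *w = malloc(sizeof(*w));
    pthread_t tid;

    if (w == NULL)
        return -1;
    init_completion(&w->comp);
    w->flags = flags;
    w->nr_cancelled = 0;

    if (pthread_create(&tid, NULL, worker, w) != 0) {
        free_item(w);
        return -1;
    }
    pthread_detach(tid);

    /* Test the caller's own copy of the flags, not w->flags: in async
     * mode the worker may already have freed w. */
    if (!(flags & MODEL_LCF_ASYNC)) {
        wait_for_completion(&w->comp);
        if (result != NULL)
            *result = w->nr_cancelled;
        free_item(w); /* sync mode: the submitter owns w */
    }
    return 0;
}
```

      In the synchronous case, submit() returns only after the worker calls complete(). If the worker never gets to that item, the submitter stays in wait_for_completion() indefinitely, which matches the stuck frame in the trace. Build with -pthread.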

       
      We find that we can kill the user-space process without any obvious ill effects; the process exits.

      We are working on retiring our Lustre 2.5 systems, so a workaround is sufficient. Our questions are:
      1. Is it correct that the locks being purged are not protecting any dirty cache on the client?
      2. Can we simply kill these stuck processes without data loss?


          People

            pjones Peter Jones
            ofaaland Olaf Faaland