Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • None
    • Lustre 2.4.0
    • Sequoia client, lustre 2.3.54-2chaos, github.com/chaos/lustre. Servers were running lustre 2.3.54-6chaos.
    • 3
    • 5563

    Description

      While running a 98,304-task ior job, one of the Lustre clients (a Sequoia I/O Node) hit this:

      2012-11-14 15:26:41.248076 {DefaultControlEventListener} [mmcs]{692}.1.2: Lustre: LOCK UP! the lock c00000039550af80 was acquired by <ptlrpcd_49:3330:brw_interpret:1998> 502 time, I'm ptlrpcd_7:3288
      2012-11-14 15:26:41.287858 {DefaultControlEventListener} [mmcs]{692}.1.2: Lustre: LOCK UP! the lock c00000039550af80 was acquired by <ptlrpcd_49:3330:brw_interpret:1998> 502 time, I'm sysiod:3752
      

      I believe tasks were then stuck in read(). sysiod is the I/O Node process that is part of the I/O forwarding system; it performs I/O on behalf of an ior process running on a Sequoia compute node.

      The attached file "seqio685_console.txt" shows more of the console output when the problem hit. "seqio685_lustre_log.txt" contains the "lctl dk" output. "seqio685_backtraces.txt" contains the output of sysrq "l" and sysrq "t".
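
For anyone sifting through the large console attachment, the following is a minimal sketch (not part of the original report) of how the "LOCK UP!" messages could be grouped by lock address to see which threads are spinning on the same lock. The regular expression is based only on the two lines quoted above. Reading the bracketed fields as holder name, PID, function, and line number is an assumption, as are the helper names `LOCKUP_RE` and `group_lockups`. It also assumes each message sits on a single line, so wrapped console output would need to be rejoined first.

```python
import re
from collections import defaultdict

# Pattern derived from the two console lines quoted in this report.
# Assumption: the <a:b:c:d> group is holder name, PID, function, line number.
LOCKUP_RE = re.compile(
    r"LOCK UP! the lock (?P<lock>[0-9a-f]+) was acquired by "
    r"<(?P<holder>[^:>]+):(?P<holder_pid>\d+):(?P<func>[^:>]+):(?P<line>\d+)> "
    r"(?P<count>\d+) time, I'm (?P<waiter>[^:\s]+):(?P<waiter_pid>\d+)"
)


def group_lockups(lines):
    """Map (lock, holder, holder_pid, func) to the set of (waiter, waiter_pid)."""
    waiters = defaultdict(set)
    for text in lines:
        m = LOCKUP_RE.search(text)
        if m is None:
            continue  # not a LOCK UP message
        key = (m["lock"], m["holder"], int(m["holder_pid"]), m["func"])
        waiters[key].add((m["waiter"], int(m["waiter_pid"])))
    return dict(waiters)


# The two messages from the description, with the console prefix trimmed.
SAMPLE = [
    "Lustre: LOCK UP! the lock c00000039550af80 was acquired by "
    "<ptlrpcd_49:3330:brw_interpret:1998> 502 time, I'm ptlrpcd_7:3288",
    "Lustre: LOCK UP! the lock c00000039550af80 was acquired by "
    "<ptlrpcd_49:3330:brw_interpret:1998> 502 time, I'm sysiod:3752",
]

result = group_lockups(SAMPLE)
for (lock, holder, pid, func), ws in result.items():
    names = ", ".join(f"{name}:{wpid}" for name, wpid in sorted(ws))
    print(f"lock {lock} held by {holder}:{pid} in {func}; waiting: {names}")
```

To scan the real attachment, the same function could be fed the lines of seqio685_console.txt instead of `SAMPLE`.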

      Attachments

        1. seqio685_backtraces.txt
          1.75 MB
          Christopher Morrone
        2. seqio685_console.txt
          2.82 MB
          Christopher Morrone
        3. seqio685_lustre_log.txt
          0.2 kB
          Christopher Morrone

          People

            jay Jinshan Xiong (Inactive)
            morrone Christopher Morrone (Inactive)
            Votes: 0
            Watchers: 7