  Lustre / LU-6143

Client hangs with Lustre 2.6


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.6.0
    • Labels: None
    • Environment: RHEL6.5 clients running Lustre 2.6.0
    • Severity: 3
    • Rank: 17122

    Description

      Recently, users have encountered conditions in which their file transfers never complete. The following backtrace was observed:

      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: INFO: task tar:12650 blocked for more than 120 seconds.
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: Tainted: G W --------------- 2.6.32-504.el6.x86_64 #1
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: tar D 0000000000000002 0 12650 11138 0x00000080
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: ffff8805f6919aa8 0000000000000086 ffff8805f6919a48 ffff8805f6919a28
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: ffff8805f6919a28 0000000000000082 ffff880122eebaa8 ffff8800282919a0
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: ffffffff81aac480 0000000000000282 ffff8808398b7058 ffff8805f6919fd8
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: Call Trace:
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a41e25>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152b346>] __mutex_lock_slowpath+0x96/0x210
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a696ce>] ? req_capsule_get_size+0x4e/0x90 [ptlrpc]
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152ae6b>] mutex_lock+0x2b/0x50
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0c1a96c>] mdc_reint+0x3c/0x3b0 [mdc]
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0c1c6f0>] mdc_setattr+0x2a0/0xa00 [mdc]
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0bc6962>] lmv_setattr+0x232/0x5c0 [lmv]
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfcfe6>] ll_md_setattr+0x116/0x940 [lustre]
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a1fc60>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152c166>] ? down_read+0x16/0x30
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfde6f>] ll_setattr_raw+0x24f/0x10d0 [lustre]
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfed55>] ll_setattr+0x65/0xd0 [lustre]
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811ad0a8>] notify_change+0x168/0x340
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c15ac>] utimes_common+0xdc/0x1b0
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff810f036e>] ? call_rcu+0xe/0x10
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c1750>] do_utimes+0xd0/0x170
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c18f2>] sys_utimensat+0x32/0x90
      Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: Call Trace:
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a41e25>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152b346>] __mutex_lock_slowpath+0x96/0x210
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a696ce>] ? req_capsule_get_size+0x4e/0x90 [ptlrpc]
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152ae6b>] mutex_lock+0x2b/0x50
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0c1a96c>] mdc_reint+0x3c/0x3b0 [mdc]
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0c1c6f0>] mdc_setattr+0x2a0/0xa00 [mdc]
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0bc6962>] lmv_setattr+0x232/0x5c0 [lmv]
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfcfe6>] ll_md_setattr+0x116/0x940 [lustre]
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a1fc60>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152c166>] ? down_read+0x16/0x30
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfde6f>] ll_setattr_raw+0x24f/0x10d0 [lustre]
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfed55>] ll_setattr+0x65/0xd0 [lustre]
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811ad0a8>] notify_change+0x168/0x340
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c15ac>] utimes_common+0xdc/0x1b0
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff810f036e>] ? call_rcu+0xe/0x10
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c1750>] do_utimes+0xd0/0x170
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c18f2>] sys_utimensat+0x32/0x90
      Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
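      Both traces show tar blocked in sys_utimensat, going through ll_setattr, lmv_setattr and mdc_setattr into mdc_reint, where it waits on a mutex. The Python sketch below is a hypothetical diagnostic probe, not a confirmed reproducer (the issue was closed as Cannot Reproduce). It walks a directory and re-applies each file's own atime/mtime through os.utime, which CPython on Linux implements with utimensat(2), the same syscall tar issues in the trace, and it reports any call that has not returned within a deadline. The function names utime_with_deadline and probe_tree and the 30-second default are assumptions made for illustration.

```python
import os
import sys
import threading
import time


def utime_with_deadline(path, times_ns, timeout_s):
    """Set (atime_ns, mtime_ns) on `path` via os.utime (utimensat(2) on Linux).

    Returns the elapsed seconds, or None if the call had not returned within
    `timeout_s`. A task stuck in uninterruptible sleep cannot be cancelled, so
    the worker thread is simply left running as a daemon.
    """
    done = threading.Event()
    errors = []

    def worker():
        try:
            os.utime(path, ns=times_ns)
        except OSError as exc:
            errors.append(exc)
        finally:
            done.set()

    start = time.monotonic()
    threading.Thread(target=worker, daemon=True).start()
    if not done.wait(timeout_s):
        return None  # still blocked in the kernel
    if errors:
        raise errors[0]
    return time.monotonic() - start


def probe_tree(root, timeout_s=30.0):
    """Re-apply every file's current atime/mtime, much as tar restores times
    after extraction, and collect the paths whose setattr did not return in
    time. atime/mtime end up unchanged, but ctime is still updated. Note that
    the os.stat call itself is not guarded by the deadline."""
    stalled = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            times = (st.st_atime_ns, st.st_mtime_ns)
            if utime_with_deadline(path, times, timeout_s) is None:
                stalled.append(path)
    return stalled


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in probe_tree(root):
        print(f"utimensat did not return within deadline: {path}")
```

      To be meaningful it would have to run on a Lustre client against a directory on the Lustre mount (the path argument is a placeholder). If a call really does hang, the process may not exit until that call returns, because a thread in D state delays process teardown.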


          People

            Assignee: Jinshan Xiong (Inactive)
            Reporter: James A Simmons
            Votes: 0
            Watchers: 6
