  1. Lustre
  2. LU-5527

compilebench hung in cl_lock_state_wait() when writing

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Lustre 2.7.0
    • None
    • 3
    • 15385

    Description

      A compilebench run on Lola, where we inject 0.1% message drops between clients and servers, hung like this:

      Aug 20 20:29:41 lola-24 kernel: python        S 0000000000000001     0 53350  53330 0x00000080
      Aug 20 20:29:41 lola-24 kernel: ffff8807c033dbd8 0000000000000082 0000000000000000 ffff8807b52a4a98
      Aug 20 20:29:41 lola-24 kernel: ffff8807c033db78 ffffffffa0af7f8f ffff8807c033db78 ffff8807ef4d7cf8
      Aug 20 20:29:41 lola-24 kernel: ffff880804e15098 ffff8807c033dfd8 000000000000fbc8 ffff880804e15098
      Aug 20 20:29:41 lola-24 kernel: Call Trace:
      Aug 20 20:29:41 lola-24 kernel: [<ffffffffa0af7f8f>] ? lov_sublock_unlock+0x5f/0x140 [lov]
      Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05ed643>] cl_lock_state_wait+0x1d3/0x320 [obdclass]
      Aug 20 20:29:41 lola-24 kernel: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
      Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05ede0b>] cl_enqueue_locked+0x15b/0x1f0 [obdclass]
      Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05ee97e>] cl_lock_request+0x7e/0x270 [obdclass]
      Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05f3934>] cl_io_lock+0x3c4/0x560 [obdclass]
      Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05f3b72>] cl_io_loop+0xa2/0x1b0 [obdclass]
      Aug 20 20:29:41 lola-24 kernel: [<ffffffffa0b7e1c2>] ll_file_io_generic+0x412/0x8f0 [lustre]
      Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05e3ca9>] ? cl_env_get+0x29/0x350 [obdclass]
      Aug 20 20:29:41 lola-24 kernel: [<ffffffffa0b7eee3>] ll_file_aio_write+0x133/0x2b0 [lustre]
      Aug 20 20:29:41 lola-24 kernel: [<ffffffffa0b7f1b9>] ll_file_write+0x159/0x290 [lustre]
      Aug 20 20:29:41 lola-24 kernel: [<ffffffff811892e8>] vfs_write+0xb8/0x1a0
      Aug 20 20:29:41 lola-24 kernel: [<ffffffff81189cb1>] sys_write+0x51/0x90
      Aug 20 20:29:41 lola-24 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
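
When triaging a trace like the one above, it can help to reduce it to the frames the unwinder considers reliable (those without a leading `?`). The following is a minimal sketch, not part of the original report: the helper names `parse_frames` and `reliable_symbols` are hypothetical, and the regular expression assumes the syslog-prefixed frame format shown in this trace.

```python
import re

# Matches frames such as
#   [<ffffffffa05ed643>] cl_lock_state_wait+0x1d3/0x320 [obdclass]
# A leading "?" marks an address found on the stack that the unwinder
# could not confirm as part of the real call chain.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return one dict per stack frame found in the log text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # header, register dump, or unrelated log line
        frames.append({
            "symbol": match.group("symbol"),
            # Frames without a [module] suffix belong to the core kernel.
            "module": match.group("module") or "kernel",
            "reliable": match.group("unreliable") is None,
        })
    return frames


def reliable_symbols(log_text):
    """Return the symbols of frames not marked with '?', innermost first."""
    return [f["symbol"] for f in parse_frames(log_text) if f["reliable"]]


# Three frames copied from the trace in this report.
SAMPLE = """\
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa0af7f8f>] ? lov_sublock_unlock+0x5f/0x140 [lov]
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05ed643>] cl_lock_state_wait+0x1d3/0x320 [obdclass]
Aug 20 20:29:41 lola-24 kernel: [<ffffffff811892e8>] vfs_write+0xb8/0x1a0
"""

if __name__ == "__main__":
    for symbol in reliable_symbols(SAMPLE):
        print(symbol)
```

Run against the full trace, this leaves the chain from `cl_lock_state_wait` up through `sys_write`, which is the path the hung writer was actually on.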
      

      Attachment compilebench_hang-lola-24.log contains the complete stack dump. This is likely to be difficult to reproduce.
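
To give a rough sense of how often the 0.1% drop injection touches a given sequence of messages, the sketch below computes the chance that at least one of `n` messages is dropped. It assumes each message is dropped independently with the same probability, which is an assumption: the injection mechanism used on Lola is not described in this report. The function name `prob_at_least_one_drop` is hypothetical.

```python
def prob_at_least_one_drop(drop_rate, messages):
    """Probability that at least one of `messages` messages is dropped,
    assuming independent drops with probability `drop_rate` each."""
    if not 0.0 <= drop_rate <= 1.0:
        raise ValueError("drop_rate must be between 0 and 1")
    if messages < 0:
        raise ValueError("messages must be non-negative")
    # Complement of "every message gets through".
    return 1.0 - (1.0 - drop_rate) ** messages


if __name__ == "__main__":
    # 0.1% drop rate, as stated in the description.
    for n in (1, 100, 1000, 10000):
        print(n, round(prob_at_least_one_drop(0.001, n), 4))
```

Under this model a drop somewhere in the run is very likely, but a drop landing on the one message whose loss triggers the hang is rare, which fits the expectation that the hang is hard to reproduce.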

          People

            wc-triage WC Triage
            liwei Li Wei (Inactive)