Lustre Documentation / LUDOC-225

Possible ll_stop_statahead deadlock


Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • 3
    • 12652

    Description

      It looks like we may have a deadlock in the statahead client code. Running Lustre 2.4.0-19chaos (see http://github.com/chaos/lustre), we have some clients hanging in close(). The stack traces of the hung process and its statahead and AGL threads are:

      PID: 67983  TASK: ffff8804448d0080  CPU: 10  COMMAND: "java"
       #0 [ffff88041b969d08] schedule at ffffffff815109b2
       #1 [ffff88041b969dd0] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
       #2 [ffff88041b969de0] ll_stop_statahead+0x1b8 at ffffffffa0aad1b8 [lustre]
       #3 [ffff88041b969e60] ll_file_release+0x2d8 at ffffffffa0a6b698 [lustre]
       #4 [ffff88041b969ea0] ll_dir_release+0xdb at ffffffffa0a52d5b [lustre]
       #5 [ffff88041b969ec0] __fput+0x108 at ffffffff81183898
       #6 [ffff88041b969f10] fput+0x25 at ffffffff811839e5
       #7 [ffff88041b969f20] filp_close+0x5d at ffffffff8117eddd
       #8 [ffff88041b969f50] sys_close+0xa5 at ffffffff8117eeb5
       #9 [ffff88041b969f80] system_call_fastpath+0x16 at ffffffff8100b0b2
          RIP: 00002aaaaacdb5ad  RSP: 00002aaab505f3f8  RFLAGS: 00010206
          RAX: 0000000000000003  RBX: ffffffff8100b0b2  RCX: 000000000000003a
          RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000131
          RBP: 0000000000000131   R8: 00002aaaab586580   R9: 00002aaaab33a780
          R10: 0000000000000131  R11: 0000000000000293  R12: 0000000000000002
          R13: 00002aaabc013df0  R14: 00002aaabc107700  R15: 00002aaabc013df0
          ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
      
      PID: 67984  TASK: ffff880639bcb500  CPU: 10  COMMAND: "ll_sa_66672"
       #0 [ffff88065abc9d40] schedule at ffffffff815109b2
       #1 [ffff88065abc9e08] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
       #2 [ffff88065abc9e18] ll_statahead_thread+0x59e at ffffffffa0ab288e [lustre]
       #3 [ffff88065abc9f48] child_rip+0xa at ffffffff8100c10a
       
      PID: 67985  TASK: ffff880d2faba080  CPU: 11  COMMAND: "ll_agl_66672"
       #0 [ffff8809d06fddc0] schedule at ffffffff815109b2
       #1 [ffff8809d06fde88] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
       #2 [ffff8809d06fde98] ll_agl_thread+0x44a at ffffffffa0aadbba [lustre]
       #3 [ffff8809d06fdf48] child_rip+0xa at ffffffff8100c10a
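
      For triage, it can help to reduce traces like the ones above to the innermost Lustre frame of each task. The following is a minimal sketch, assuming crash `bt` output in the format shown above. The names `HEADER`, `FRAME`, `SAMPLE` and `blocked_in` are hypothetical helpers, not part of crash or Lustre, and the sample is abbreviated from the traces in this report. The script only summarises where each task is sleeping and says nothing about the root cause.

```python
import re

# Task header, e.g.: PID: 67983  TASK: ffff...  CPU: 10  COMMAND: "java"
HEADER = re.compile(
    r'PID:\s*(\d+)\s+TASK:\s*\S+\s+CPU:\s*\d+\s+COMMAND:\s*"([^"]+)"'
)
# Stack frame, e.g.: #2 [ffff...] ll_stop_statahead+0x1b8 at ffff... [lustre]
FRAME = re.compile(
    r'#(\d+)\s+\[[0-9a-f]+\]\s+(\w+)(?:\+0x[0-9a-f]+)?'
    r'\s+at\s+[0-9a-f]+(?:\s+\[(\w+)\])?'
)


def blocked_in(bt_text, module="lustre"):
    """Map each PID to (command, innermost function in `module`)."""
    result = {}
    current = None
    for line in bt_text.splitlines():
        header = HEADER.search(line)
        if header:
            current = int(header.group(1))
            result[current] = (header.group(2), None)
            continue
        frame = FRAME.search(line)
        if frame and current is not None and frame.group(3) == module:
            command, found = result[current]
            if found is None:  # keep only the first (innermost) match
                result[current] = (command, frame.group(2))
    return result


# Abbreviated copy of the traces in this report (assumption: same format).
SAMPLE = '''
PID: 67983  TASK: ffff8804448d0080  CPU: 10  COMMAND: "java"
 #0 [ffff88041b969d08] schedule at ffffffff815109b2
 #1 [ffff88041b969dd0] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
 #2 [ffff88041b969de0] ll_stop_statahead+0x1b8 at ffffffffa0aad1b8 [lustre]
 #3 [ffff88041b969e60] ll_file_release+0x2d8 at ffffffffa0a6b698 [lustre]

PID: 67984  TASK: ffff880639bcb500  CPU: 10  COMMAND: "ll_sa_66672"
 #0 [ffff88065abc9d40] schedule at ffffffff815109b2
 #1 [ffff88065abc9e08] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
 #2 [ffff88065abc9e18] ll_statahead_thread+0x59e at ffffffffa0ab288e [lustre]

PID: 67985  TASK: ffff880d2faba080  CPU: 11  COMMAND: "ll_agl_66672"
 #0 [ffff8809d06fddc0] schedule at ffffffff815109b2
 #1 [ffff8809d06fde88] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
 #2 [ffff8809d06fde98] ll_agl_thread+0x44a at ffffffffa0aadbba [lustre]
'''

summary = blocked_in(SAMPLE)
for pid, (command, function) in sorted(summary.items()):
    print(pid, command, function)
```

      On the sample, this reports all three tasks sleeping in a wait queue reached from `ll_stop_statahead`, `ll_statahead_thread` and `ll_agl_thread` respectively, which is the pattern described in this report.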
      


          People

            jlevi Jodi Levi (Inactive)
            morrone Christopher Morrone