LU-4624: ll_stop_statahead deadlock

    Description

      We are seeing processes on Lustre clients hang in close():

      PID: 67983  TASK: ffff8804448d0080  CPU: 10  COMMAND: "java"
       #0 [ffff88041b969d08] schedule at ffffffff815109b2
       #1 [ffff88041b969dd0] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
       #2 [ffff88041b969de0] ll_stop_statahead+0x1b8 at ffffffffa0aad1b8 [lustre]
       #3 [ffff88041b969e60] ll_file_release+0x2d8 at ffffffffa0a6b698 [lustre]
       #4 [ffff88041b969ea0] ll_dir_release+0xdb at ffffffffa0a52d5b [lustre]
       #5 [ffff88041b969ec0] __fput+0x108 at ffffffff81183898
       #6 [ffff88041b969f10] fput+0x25 at ffffffff811839e5
       #7 [ffff88041b969f20] filp_close+0x5d at ffffffff8117eddd
       #8 [ffff88041b969f50] sys_close+0xa5 at ffffffff8117eeb5
       #9 [ffff88041b969f80] system_call_fastpath+0x16 at ffffffff8100b0b2
          RIP: 00002aaaaacdb5ad  RSP: 00002aaab505f3f8  RFLAGS: 00010206
          RAX: 0000000000000003  RBX: ffffffff8100b0b2  RCX: 000000000000003a
          RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000131
          RBP: 0000000000000131   R8: 00002aaaab586580   R9: 00002aaaab33a780
          R10: 0000000000000131  R11: 0000000000000293  R12: 0000000000000002
          R13: 00002aaabc013df0  R14: 00002aaabc107700  R15: 00002aaabc013df0
          ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
       
      PID: 67984  TASK: ffff880639bcb500  CPU: 10  COMMAND: "ll_sa_66672"
       #0 [ffff88065abc9d40] schedule at ffffffff815109b2
       #1 [ffff88065abc9e08] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
       #2 [ffff88065abc9e18] ll_statahead_thread+0x59e at ffffffffa0ab288e [lustre]
       #3 [ffff88065abc9f48] child_rip+0xa at ffffffff8100c10a
       
      PID: 67985  TASK: ffff880d2faba080  CPU: 11  COMMAND: "ll_agl_66672"
       #0 [ffff8809d06fddc0] schedule at ffffffff815109b2
       #1 [ffff8809d06fde88] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
       #2 [ffff8809d06fde98] ll_agl_thread+0x44a at ffffffffa0aadbba [lustre]
       #3 [ffff8809d06fdf48] child_rip+0xa at ffffffff8100c10a
      

      We are running 2.4.0-19chaos (see http://github.com/chaos/lustre) on these clients.
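
      For triage, the backtraces above can be reduced to the innermost Lustre frame of each task. The following is a minimal sketch, not part of the original report: the helper name `blocked_in` and both regular expressions are assumptions based only on the `bt` output format shown above, and the sample input is abbreviated from those traces.

```python
import re

# Task header, e.g.
#   PID: 67983  TASK: ffff8804448d0080  CPU: 10  COMMAND: "java"
TASK_RE = re.compile(
    r'PID:\s*(\d+)\s+TASK:\s*\w+\s+CPU:\s*\d+\s+COMMAND:\s*"([^"]+)"')

# Stack frame, e.g.
#   #2 [ffff88041b969de0] ll_stop_statahead+0x1b8 at ffffffffa0aad1b8 [lustre]
# The "+0x..." offset and the trailing "[module]" are both optional.
FRAME_RE = re.compile(
    r'#(\d+)\s+\[[0-9a-f]+\]\s+(\w+)(?:\+0x[0-9a-f]+)?'
    r'\s+at\s+[0-9a-f]+(?:\s+\[(\w+)\])?')


def blocked_in(bt_text, module="lustre"):
    """Return (pid, command, function) for each task's innermost frame in `module`.

    Frames are listed innermost first (#0), so the first frame tagged with
    `module` is the one closest to where the task is sleeping.
    """
    result = []
    current = None   # (pid, command) of the task being parsed
    found = False    # True once this task's frame has been recorded
    for line in bt_text.splitlines():
        m = TASK_RE.search(line)
        if m:
            current = (int(m.group(1)), m.group(2))
            found = False
            continue
        m = FRAME_RE.search(line)
        if m and current and not found and m.group(3) == module:
            result.append((current[0], current[1], m.group(2)))
            found = True
    return result


# Abbreviated copy of the traces in this report.
SAMPLE = '''
PID: 67983  TASK: ffff8804448d0080  CPU: 10  COMMAND: "java"
 #0 [ffff88041b969d08] schedule at ffffffff815109b2
 #1 [ffff88041b969dd0] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
 #2 [ffff88041b969de0] ll_stop_statahead+0x1b8 at ffffffffa0aad1b8 [lustre]
 #3 [ffff88041b969e60] ll_file_release+0x2d8 at ffffffffa0a6b698 [lustre]

PID: 67984  TASK: ffff880639bcb500  CPU: 10  COMMAND: "ll_sa_66672"
 #0 [ffff88065abc9d40] schedule at ffffffff815109b2
 #1 [ffff88065abc9e08] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
 #2 [ffff88065abc9e18] ll_statahead_thread+0x59e at ffffffffa0ab288e [lustre]

PID: 67985  TASK: ffff880d2faba080  CPU: 11  COMMAND: "ll_agl_66672"
 #0 [ffff8809d06fddc0] schedule at ffffffff815109b2
 #1 [ffff8809d06fde88] cfs_waitq_wait+0xe at ffffffffa043577e [libcfs]
 #2 [ffff8809d06fde98] ll_agl_thread+0x44a at ffffffffa0aadbba [lustre]
'''

for pid, command, function in blocked_in(SAMPLE):
    print(pid, command, function)
# 67983 java ll_stop_statahead
# 67984 ll_sa_66672 ll_statahead_thread
# 67985 ll_agl_66672 ll_agl_thread
```

      The summary only restates what the traces show: the closing process, the statahead thread, and the AGL thread are all asleep in `cfs_waitq_wait`. It does not by itself identify which wakeup is missing.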

            People

              laisiyao Lai Siyao
              morrone Christopher Morrone (Inactive)