Lustre / LU-11100

Clients hang in LNetMDUnlink

Details

    • Bug
    • Resolution: Incomplete
    • Critical
    • None
    • Lustre 2.10.3
    • Clients: SLES 12 SP2, Lustre 2.10.3
      Servers: Lustre 2.7.3 and 2.10.3
    • 2

    Description

      Clients hang in LNetMDUnlink. This may be a duplicate of LU-11092 and LU-10669.

       

      [166855.238376] CPU: 33 PID: 2938 Comm: ptlrpcd_01_02 Tainted: P        W  OEL  NX 4.4.90-92.45.1.20171031-nasa #1
      [166855.238378] Hardware name: SGI.COM SUMMIT/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
      [166855.238381] task: ffff8807db820bc0 ti: ffff8807db824000 task.ti: ffff8807db824000
      [166855.238383] RIP: 0010:[<ffffffff810cc0a1>]  [<ffffffff810cc0a1>] native_queued_spin_lock_slowpath+0x111/0x1a0
      [166855.238392] RSP: 0018:ffff8807db827b98  EFLAGS: 00000246
      [166855.238393] RAX: 0000000000000000 RBX: ffff880fe93574e0 RCX: 0000000000880000
      [166855.238395] RDX: ffff88081e2567c0 RSI: 0000000000280001 RDI: ffff88101cdb6e00
      [166855.238396] RBP: ffff8807db827b98 R08: ffff88101db567c0 R09: 0000000000000000
      [166855.238398] R10: 0000000000000000 R11: ffff880ee98f8817 R12: 0000000000000008
      [166855.238400] R13: 000000000a222d0f R14: 0000000000000001 R15: 0000000000000000
      [166855.238402] FS:  0000000000000000(0000) GS:ffff88101db40000(0000) knlGS:0000000000000000
      [166855.238403] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [166855.238405] CR2: 0000000000641038 CR3: 0000000001afe000 CR4: 00000000001406e0
      [166855.238407] Stack:
      [166855.238408]  ffff8807db827ba8 ffffffff8119162a ffff8807db827bb8 ffffffff8161e640
      [166855.238411]  ffff8807db827be0 ffffffffa0a96683 ffffffffa1dc78e7 0000000000000001
      [166855.238414]  000000002888b43d ffff8807db827cb8 ffffffffa0b254f5 ffffffffa1dc78d8
      [166855.238417] Call Trace:
      [166855.238431]  [<ffffffff8119162a>] queued_spin_lock_slowpath+0xb/0xf
      [166855.238439]  [<ffffffff8161e640>] _raw_spin_lock+0x20/0x30
      [166855.238467]  [<ffffffffa0a96683>] cfs_percpt_lock+0x53/0x100 [libcfs]
      [166855.238510]  [<ffffffffa0b254f5>] LNetMDUnlink+0x65/0x150 [lnet]
      [166855.238573]  [<ffffffffa1d5cc88>] ptlrpc_unregister_reply+0xf8/0x6f0 [ptlrpc]
      [166855.238636]  [<ffffffffa1d616d8>] ptlrpc_expire_one_request+0xb8/0x430 [ptlrpc]
      [166855.238674]  [<ffffffffa1d61aff>] ptlrpc_expired_set+0xaf/0x190 [ptlrpc]
      [166855.238719]  [<ffffffffa1d8f998>] ptlrpcd+0x258/0x4e0 [ptlrpc]
      [166855.238729]  [<ffffffff8109f276>] kthread+0xd6/0xf0
      [166855.238735]  [<ffffffff8161ed3f>] ret_from_fork+0x3f/0x70
      [166855.241341] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
      [166855.241342] 
      [166855.241343] Leftover inexact backtrace:
                      
      [166855.241348]  [<ffffffff8109f1a0>] ? kthread_park+0x60/0x60
       

      We will try to get a reproducer.
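
      To compare this trace with the ones in LU-11092 and LU-10669, it can help to reduce the console log to its call chain. Below is a minimal sketch in Python, assuming the log format shown above (a "Call Trace:" line followed by "[<address>] symbol+0xoff/0xsize [module]" frames). The names `FRAME_RE` and `extract_call_chain` are hypothetical helpers, not part of Lustre or any kernel tool.

```python
import re

# Matches frames such as "[<ffffffffa0b254f5>] LNetMDUnlink+0x65/0x150 [lnet]".
# Group 1: "? " marker for inexact frames, group 2: symbol, group 3: module.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def extract_call_chain(log_text):
    """Return (symbol, module) pairs for the exact frames after 'Call Trace:'.

    Frames without a module suffix are reported with module "kernel".
    Inexact frames (prefixed with '?') are skipped.
    """
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue  # ignore register dump and RIP line before the trace
        match = FRAME_RE.search(line)
        if match is None or match.group(1):
            continue  # not a frame, or an inexact ('?') frame
        frames.append((match.group(2), match.group(3) or "kernel"))
    return frames
```

      Running `extract_call_chain` on the console output of two hung clients and comparing the resulting lists is a quick way to see whether both are spinning in the same path (here, `cfs_percpt_lock` called from `LNetMDUnlink`).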

      Attachments

        1. nasa_debug.patch
          2 kB
          Amir Shehata
        2. nasa_LU-11079.patch
          3 kB
          Amir Shehata

            People

              ashehata Amir Shehata (Inactive)
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 12
