Lustre / LU-5913

client stuck in ptlrpc_invalidate_import

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.3
    • Labels: None
    • Environment: client and server version lustre2.4.3-7nas
    • Severity: 3
    • Rank (Obsolete): 16510

    Description

      ?DUP of LU-10?
      1. The client's obd_ping to the OST fails.
      2. The client is evicted.
      3. The client cannot reconnect and is stuck in ptlrpc_invalidate_import.

      Nov 12 04:32:30 pfe22 kernel: [69643.816473] LustreError: 11-0: nbp9-OST0075-osc-ffff880a28404400: Communicating with 10.151.26.11@o2ib, operation obd_ping failed with -107.
      Nov 12 04:32:30 pfe22 kernel: [69643.854158] Lustre: nbp9-OST0075-osc-ffff880a28404400: Connection to nbp9-OST0075 (at 10.151.26.11@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      Nov 12 04:32:30 pfe22 kernel: [69643.922758] LustreError: 167-0: nbp9-OST0075-osc-ffff880a28404400: This client was evicted by nbp9-OST0075; in progress operations using this service will fail.
      Nov 12 04:34:11 pfe22 kernel: [69743.757653] LustreError: 92481:0:(import.c:324:ptlrpc_invalidate_import()) nbp9-OST0075_UUID: rc = -110 waiting for callback (1 != 0)
      Nov 12 04:34:11 pfe22 kernel: [69743.793518] LustreError: 92481:0:(import.c:350:ptlrpc_invalidate_import()) @@@ still on sending list  req@ffff8802204cec00 x1484496199485132/t0(0) o4->nbp9-OST0075-osc-ffff880a28404400@10.151.26.11@o2ib:6/4 lens 488/448 e 0 to 0 dl 1415726912 ref 2 fl Rpc:RE/0/ffffffff rc -5/-1
      Nov 12 04:34:11 pfe22 kernel: [69743.866993] LustreError: 92481:0:(import.c:366:ptlrpc_invalidate_import()) nbp9-OST0075_UUID: RPCs in "Unregistering" phase found (0). Network is sluggish? Waiting them to error out.
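
To spot the same failure sequence in other client logs, a small filter along these lines may help. It is only a sketch: the regular expressions are derived from the three message shapes in the excerpt above and may not match other Lustre versions, the function names (`find_events`, `stuck_after_eviction`) are hypothetical, and the errno table covers just the Linux codes that appear in this report.

```python
import re

# Linux errno names for the return codes seen in this report.
LINUX_ERRNO = {-5: "EIO", -107: "ENOTCONN", -110: "ETIMEDOUT"}

# Message shapes copied from the log excerpt; wording may differ between releases.
PATTERNS = [
    ("ping_failed", re.compile(r"operation obd_ping failed with (-?\d+)")),
    ("evicted", re.compile(r"This client was evicted by (\S+);")),
    ("invalidate_stuck", re.compile(
        r"ptlrpc_invalidate_import\(\)\) (\S+): rc = (-?\d+) waiting for callback")),
]


def find_events(lines):
    """Return a list of (event_name, captured_groups) in log order."""
    events = []
    for line in lines:
        for name, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                events.append((name, match.groups()))
                break  # at most one event per log line
    return events


def stuck_after_eviction(events):
    """True if an eviction is later followed by a stuck import invalidation."""
    names = [name for name, _ in events]
    if "evicted" not in names:
        return False
    return "invalidate_stuck" in names[names.index("evicted") + 1:]
```

Feeding the six log lines above to `find_events` should yield a ping failure, an eviction, and a stuck invalidation, in that order; `LINUX_ERRNO` maps the codes -107 and -110 to ENOTCONN and ETIMEDOUT.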
      

      We find ldlm_bl_* threads stuck in D state; the two traced below are both waiting on a mutex in cl_lock_mutex_get.

      Stack traceback for pid 8080
      0xffff8802569b2540     8080        2  0    1   D  0xffff8802569b2bb0  ldlm_bl_04
       [<ffffffff8146fb6b>] thread_return+0x0/0x295
       [<ffffffff81470c58>] __mutex_lock_slowpath+0xf8/0x150
       [<ffffffff814706ea>] mutex_lock+0x1a/0x40
       [<ffffffffa08bd28e>] cl_lock_mutex_get+0x6e/0xc0 [obdclass]
       [<ffffffffa0b8988e>] osc_dlm_blocking_ast0+0x5e/0x210 [osc]
       [<ffffffffa0b89a8c>] osc_ldlm_blocking_ast+0x4c/0x100 [osc]
       [<ffffffffa09e30a0>] ldlm_handle_bl_callback+0xc0/0x420 [ptlrpc]
       [<ffffffffa09e3609>] ldlm_bl_thread_main+0x209/0x430 [ptlrpc]
       [<ffffffff8147ade4>] kernel_thread_helper+0x4/0x10
      [0]kdb> btp 8090
      Stack traceback for pid 8090
      0xffff88024c9a6640     8090        2  0    6   D  0xffff88024c9a6cb0  ldlm_bl_12
       [<ffffffff8146fb6b>] thread_return+0x0/0x295
       [<ffffffff81470c58>] __mutex_lock_slowpath+0xf8/0x150
       [<ffffffff814706ea>] mutex_lock+0x1a/0x40
       [<ffffffffa08bd28e>] cl_lock_mutex_get+0x6e/0xc0 [obdclass]
       [<ffffffffa0b8988e>] osc_dlm_blocking_ast0+0x5e/0x210 [osc]
       [<ffffffffa0b89a8c>] osc_ldlm_blocking_ast+0x4c/0x100 [osc]
       [<ffffffffa09e30a0>] ldlm_handle_bl_callback+0xc0/0x420 [ptlrpc]
       [<ffffffffa09e3609>] ldlm_bl_thread_main+0x209/0x430 [ptlrpc]
       [<ffffffff8147ade4>] kernel_thread_helper+0x4/0x10
      [0]kdb> go
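
When many threads are dumped, grouping them by identical call chain makes it easier to see that they are all blocked at the same point. The following is a rough sketch that assumes the kdb `btp` output layout shown above (a "Stack traceback for pid" header, one task line, then `[<addr>] symbol+off/len` frames); the helper names `parse_tracebacks` and `group_by_stack` are hypothetical.

```python
import re
from collections import defaultdict

HEADER = re.compile(r"Stack traceback for pid (\d+)")
# Task line: address, pid, parent, two flag/cpu columns, state, thread address, command.
TASK = re.compile(
    r"0x[0-9a-f]+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\S)\s+0x[0-9a-f]+\s+(\S+)")
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+([\w.]+)\+0x")


def parse_tracebacks(text):
    """Parse kdb btp output into a list of dicts: pid, state, name, frames."""
    threads = []
    current = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            current = {"pid": int(header.group(1)), "state": None,
                       "name": None, "frames": []}
            threads.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first header
        frame = FRAME.search(line)
        if frame:
            current["frames"].append(frame.group(1))
            continue
        task = TASK.search(line)
        if task and current["name"] is None:
            current["state"], current["name"] = task.group(1), task.group(2)
    return threads


def group_by_stack(threads):
    """Map each distinct call chain (tuple of symbols) to the thread names on it."""
    groups = defaultdict(list)
    for thread in threads:
        groups[tuple(thread["frames"])].append(thread["name"])
    return dict(groups)
```

Applied to the two tracebacks above, both threads should land in a single group whose call chain passes through `cl_lock_mutex_get` and `mutex_lock`.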
      

          People

            Assignee: bobijam Zhenyu Xu
            Reporter: mhanafi Mahmoud Hanafi
            Votes: 0
            Watchers: 6
