Lustre / LU-6898

ldlm_resource_dump()) Granted locks (in reverse order)


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.3
    • Labels: None
    • Environment: CENTOS6 Lustre2.5.3 Server MOFED2.4;
      SLES11 Lustre2.5.3 Client MOFED3.0
    • Severity: 2

    Description

      On the client we see the following errors:

      [1437673998.848774] LustreError: 11-0: nbp8-MDT0000-mdc-ffff8806cb247400: Communicating with 10.151.27.60@o2ib, operation obd_ping failed with -107.
      [1437673998.860774] Lustre: nbp8-MDT0000-mdc-ffff8806cb247400: Connection to nbp8-MDT0000 (at 10.151.27.60@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      [1437673998.880773] LustreError: 167-0: nbp8-MDT0000-mdc-ffff8806cb247400: This client was evicted by nbp8-MDT0000; in progress operations using this service will fail.
      [1437673998.916773] LustreError: 81375:0:(ldlm_resource.c:809:ldlm_resource_complain()) nbp8-MDT0000-mdc-ffff8806cb247400: namespace resource [0x360375393:0xe66d:0x0].0 (ffff8fc07bee8a80) refcount nonzero (1) after lock cleanup; forcing cleanup.
      [1437673998.940772] LustreError: 81375:0:(ldlm_resource.c:809:ldlm_resource_complain()) Skipped 2587 previous similar messages
      [1437673998.952772] LustreError: 81375:0:(ldlm_resource.c:1448:ldlm_resource_dump()) --- Resource: [0x360375393:0xe66d:0x0].0 (ffff8fc07bee8a80) refcount = 2
      [1437673998.952772] LustreError: 81375:0:(ldlm_resource.c:1451:ldlm_resource_dump()) Granted locks (in reverse order):
      [1437673998.952772] LustreError: 81375:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: nbp8-MDT0000-mdc-ffff8806cb247400 lock: ffff8fc0fa26dbc0/0x7f099458bb92bf52 lrc: 2/0,0 mode: PR/PR res: [0x360375393:0xe66d:0x0].0 bits 0x1b rrc: 2 type: IBT flags: 0x12e400000000 nid: local remote: 0x551d423294fa4bce expref: -99 pid: 46426 timeout: 0 lvb_type: 3
      [1437673998.952772] LustreError: 81375:0:(ldlm_resource.c:1454:ldlm_resource_dump()) Skipped 3648 previous similar messages
      [1437673998.952772] LustreError: 81375:0:(ldlm_resource.c:1448:ldlm_resource_dump()) --- Resource: [0x3603755cc:0x6454:0x0].0 (ffff8b075a9a8bc0) refcount = 2
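
      For context on these messages: after the eviction the client tears down the MDC namespace and cancels every lock; a resource whose reference count is still nonzero at that point gets the "forcing cleanup" complaint and its remaining granted locks are dumped (the ldlm_resource_dump() output above). A minimal, self-contained sketch of that check follows; the types and names are toy stand-ins, not the actual Lustre structures.

      #include <stdio.h>

      /* Toy stand-in for an ldlm resource; illustrative only. */
      struct toy_resource {
          unsigned long long fid_seq, fid_oid;
          int refcount;    /* references still held on the resource */
      };

      /* Run for each resource once its locks have been cancelled: a
       * nonzero refcount means something still pins the resource, so
       * the cleanup path complains, dumps the surviving locks, and
       * forces the resource away anyway. */
      static void toy_resource_complain(struct toy_resource *res)
      {
          if (res->refcount != 0)
              printf("namespace resource [0x%llx:0x%llx:0x0] refcount "
                     "nonzero (%d) after lock cleanup; forcing cleanup.\n",
                     res->fid_seq, res->fid_oid, res->refcount);
      }

      int main(void)
      {
          struct toy_resource res = { 0x360375393ULL, 0xe66dULL, 1 };
          toy_resource_complain(&res);
          return 0;
      }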
      

      Server

      Jul 23 10:53:08 nbp8-mds1 kernel: LustreError: 0:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 226s: evicting client at 10.151.63.50@o2ib  ns: mdt-nbp8-MDT0000_UUID lock: ffff882a2c6794c0/0x551d4232f4dfbb5e lrc: 3/0,0 mode: PR/PR res: [0x4976d01:0xe2d4a4f3:0x0].0 bits 0x13 rrc: 848 type: IBT flags: 0x60200000000020 nid: 10.151.63.50@o2ib remote: 0x7f099458bb9c07c4 expref: 9 pid: 9672 timeout: 8029699438 lvb_type: 0
      Jul 23 10:53:09 nbp8-mds1 kernel: LNet: 5828:0:(lib-move.c:865:lnet_post_send_locked()) Aborting message for 12345-10.151.12.174@o2ib: LNetM[DE]Unlink() already called on the MD/ME.
      Jul 23 10:53:09 nbp8-mds1 kernel: LNet: 5828:0:(lib-move.c:865:lnet_post_send_locked()) Skipped 41 previous similar messages
      Jul 23 10:53:39 nbp8-mds1 kernel: format at ldlm_pool.c:628:ldlm_pool_recalc doesn't end in newline
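
      The server-side eviction follows the usual waiting-locks pattern: a blocking callback sent to a client must be answered before its deadline, and the timer callback evicts any client that has not responded (here after 226s). A rough, self-contained model of that check, with hypothetical names standing in for the real ldlm_lockd.c logic:

      #include <stdio.h>
      #include <time.h>

      /* Toy model of a lock waiting for a client's blocking-AST reply. */
      struct toy_waiting_lock {
          const char *client_nid;   /* e.g. "10.151.63.50@o2ib" */
          time_t      sent;         /* when the blocking callback was sent */
          time_t      deadline;     /* when the client must have answered */
      };

      /* Periodic timer callback: a lock past its deadline gets its
       * client evicted, matching the "lock callback timer expired
       * after Ns" message above. */
      static void toy_waiting_locks_callback(struct toy_waiting_lock *lk,
                                             time_t now)
      {
          if (now >= lk->deadline)
              printf("### lock callback timer expired after %llds: "
                     "evicting client at %s\n",
                     (long long)(now - lk->sent), lk->client_nid);
          /* the real code then disconnects the export and cleans up
           * the client's locks */
      }

      int main(void)
      {
          time_t now = time(NULL);
          struct toy_waiting_lock lk = { "10.151.63.50@o2ib",
                                         now - 226, now - 1 };
          toy_waiting_locks_callback(&lk, now);
          return 0;
      }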
      

      On the client, all ldlm_bl threads are stuck:

      0xffff8a0739b06080    21185        2  1  711   R  0xffff8a0739b066f0  ldlm_bl_110
       [<ffffffff814760e8>] _raw_spin_unlock_irqrestore+0x8/0x10
       [<ffffffffa0d7a807>] osc_page_delete+0xe7/0x360 [osc]
       [<ffffffffa0ad14d5>] cl_page_delete0+0xc5/0x4e0 [obdclass]
       [<ffffffffa0ad192a>] cl_page_delete+0x3a/0x120 [obdclass]
       [<ffffffffa0ee16a6>] ll_invalidatepage+0x96/0x160 [lustre]
       [<ffffffffa0ef314d>] vvp_page_discard+0x8d/0x120 [lustre]
       [<ffffffffa0acda58>] cl_page_invoid+0x78/0x170 [obdclass]
       [<ffffffffa0ad490c>] discard_cb+0xbc/0x1e0 [obdclass]
       [<ffffffffa0ad2467>] cl_page_gang_lookup+0x1f7/0x3f0 [obdclass]
       [<ffffffffa0ad471a>] cl_lock_discard_pages+0xfa/0x1d0 [obdclass]
       [<ffffffffa0d7c0d2>] osc_lock_flush+0xf2/0x260 [osc]
       [<ffffffffa0d7c339>] osc_lock_cancel+0xf9/0x1e0 [osc]
       [<ffffffffa0ad2bd5>] cl_lock_cancel0+0x65/0x150 [obdclass]
       [<ffffffffa0ad394b>] cl_lock_cancel+0x14b/0x150 [obdclass]
       [<ffffffffa0d7cc1d>] osc_lock_blocking+0x5d/0xf0 [osc]
       [<ffffffffa0d7dff9>] osc_dlm_blocking_ast0+0xf9/0x210 [osc]
       [<ffffffffa0d7e15c>] osc_ldlm_blocking_ast+0x4c/0x100 [osc]
       [<ffffffffa0be4eef>] ldlm_cancel_callback+0x5f/0x180 [ptlrpc]
       [<ffffffffa0bf380f>] ldlm_cli_cancel_local+0x7f/0x480 [ptlrpc]
       [<ffffffffa0bf6b82>] ldlm_cli_cancel_list_local+0xf2/0x290 [ptlrpc]
       [<ffffffffa0bfba07>] ldlm_bl_thread_main+0xf7/0x450 [ptlrpc]
       [<ffffffff81083ae6>] kthread+0x96/0xa0
       [<ffffffff8147f164>] kernel_thread_helper+0x4/0x10
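
      Every ldlm_bl thread shows the same stack: each is cancelling a lock in the blocking-AST path and discarding the covered pages one at a time, with the top frame in _raw_spin_unlock_irqrestore. One plausible reading is that, with thousands of locks being cancelled after the eviction, the per-page locking serializes the threads. A toy pthread model of that contention pattern follows; the mutex stands in for the kernel spinlock and everything here is illustrative, not the Lustre code.

      /* build: cc -pthread toy_bl.c */
      #include <pthread.h>
      #include <stdio.h>

      #define NR_BL_THREADS  4
      #define PAGES_PER_LOCK 100000

      static pthread_mutex_t object_lock = PTHREAD_MUTEX_INITIALIZER;
      static long pages_discarded;

      /* Each "ldlm_bl" thread cancels one lock, which means discarding
       * every cached page it covers; the shared lock is taken per page,
       * so all threads serialize on it, as the stuck stacks suggest. */
      static void *toy_bl_thread(void *arg)
      {
          (void)arg;
          for (int i = 0; i < PAGES_PER_LOCK; i++) {
              pthread_mutex_lock(&object_lock);
              pages_discarded++;    /* stand-in for osc_page_delete() */
              pthread_mutex_unlock(&object_lock);
          }
          return NULL;
      }

      int main(void)
      {
          pthread_t tid[NR_BL_THREADS];

          for (int i = 0; i < NR_BL_THREADS; i++)
              pthread_create(&tid[i], NULL, toy_bl_thread, NULL);
          for (int i = 0; i < NR_BL_THREADS; i++)
              pthread_join(tid[i], NULL);
          printf("discarded %ld pages\n", pages_discarded);
          return 0;
      }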
      

      These events cause MDS I/O to stall for a few minutes.


          People

            Assignee: Jinshan Xiong (Inactive)
            Reporter: Mahmoud Hanafi
            Votes: 0
            Watchers: 13
