Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-706

1.8<->2.1 interop: recovery-small test 51 hung

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.1.0, Lustre 1.8.6
    • None
    • 3
    • 10720

    Description

      After the clean upgrading, running recovery-small test 51 hung, dmesg on the client node client-12 showed that:

      INFO: task writemany:32634 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      writemany     D 0000000000000002     0 32634  32632 0x00000080
       ffff8802dc4f9d38 0000000000000082 0000000000000000 02001bcd48000001
       ffff880202001bcd ffff8802d7a84238 ffff8802000041ed 00000001002e2106
       ffff880326b8b038 ffff8802dc4f9fd8 000000000000f598 ffff880326b8b038
      Call Trace:
       [<ffffffff814dc13e>] __mutex_lock_slowpath+0x13e/0x180
       [<ffffffff8115b102>] ? kmem_cache_alloc+0x182/0x190
       [<ffffffff814dbfdb>] mutex_lock+0x2b/0x50
       [<ffffffff81182650>] do_filp_open+0x2c0/0xd90
       [<ffffffff811af00e>] ? __fsnotify_inode_delete+0xe/0x10
       [<ffffffff8118c6aa>] ? __destroy_inode+0x2a/0xa0
       [<ffffffff8117f005>] ? putname+0x35/0x50
       [<ffffffff8118f0d2>] ? alloc_fd+0x92/0x160
       [<ffffffff8116f879>] do_sys_open+0x69/0x140
       [<ffffffff8100c635>] ? math_state_restore+0x45/0x60
       [<ffffffff8116f990>] sys_open+0x20/0x30
       [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
      INFO: task writemany:32635 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      writemany     D 0000000000000001     0 32635  32632 0x00000080
       ffff8802d6e67e08 0000000000000086 0000000000000000 ffffffff812061ff
       ffff8802d6e67de8 ffffffff8118087d 0000000000000000 00000001002e211b
       ffff8802dd459ab8 ffff8802d6e67fd8 000000000000f598 ffff8802dd459ab8
      Call Trace:
       [<ffffffff812061ff>] ? security_inode_permission+0x1f/0x30
       [<ffffffff8118087d>] ? __link_path_walk+0x11d/0x820
       [<ffffffff814dc13e>] __mutex_lock_slowpath+0x13e/0x180
       [<ffffffff814dbfdb>] mutex_lock+0x2b/0x50
       [<ffffffff81183456>] do_unlinkat+0x96/0x1c0
       [<ffffffff810d1ad2>] ? audit_syscall_entry+0x272/0x2a0
       [<ffffffff81183596>] sys_unlink+0x16/0x20
       [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
      INFO: task writemany:32637 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      writemany     D 0000000000000002     0 32637  32632 0x00000080
       ffff8802dc6e3e08 0000000000000086 0000000000000000 ffffffff812061ff
       ffff8802dc6e3de8 ffffffff8118087d 0000000000000000 00000001002e20c1
       ffff8802d0799af8 ffff8802dc6e3fd8 000000000000f598 ffff8802d0799af8
      Call Trace:
       [<ffffffff812061ff>] ? security_inode_permission+0x1f/0x30
       [<ffffffff8118087d>] ? __link_path_walk+0x11d/0x820
       [<ffffffff814dc13e>] __mutex_lock_slowpath+0x13e/0x180
       [<ffffffff814dbfdb>] mutex_lock+0x2b/0x50
       [<ffffffff81183456>] do_unlinkat+0x96/0x1c0
       [<ffffffff810d1ad2>] ? audit_syscall_entry+0x272/0x2a0
       [<ffffffff81183596>] sys_unlink+0x16/0x20
       [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
      LustreError: 32638:0:(client.c:1899:ptlrpc_interrupted_set()) INTERRUPTED SET ffff8802e30e2b40
      LustreError: 32638:0:(file.c:158:ll_close_inode_openhandle()) inode 144146779976237058 mdc close failed: rc = -4
      INFO: task ptlrpcd-rcv:6723 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      ptlrpcd-rcv   D 0000000000000001     0  6723      2 0x00000080
       ffff8803222cd7f0 0000000000000046 0000000000000000 0000000000000000
       ffff8802cdaf92b8 ffff88031b326400 ffff8802cdaf9180 00000001002f14f2
       ffff8803222c9078 ffff8803222cdfd8 000000000000f598 ffff8803222c9078
      Call Trace:
       [<ffffffffa02f96fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
       [<ffffffffa065d140>] seq_client_flush+0x80/0x110 [fid]
       [<ffffffff8105dc20>] ? default_wake_function+0x0/0x20
       [<ffffffffa066bb7c>] mdc_fid_delete+0x1c/0x20 [mdc]
       [<ffffffffa07b1c4f>] ll_delete_inode+0x12f/0x650 [lustre]
       [<ffffffffa07b1b20>] ? ll_delete_inode+0x0/0x650 [lustre]
       [<ffffffff8118d15e>] generic_delete_inode+0xde/0x1d0
       [<ffffffff8118d2b5>] generic_drop_inode+0x65/0x80
       [<ffffffff8118c122>] iput+0x62/0x70
       [<ffffffffa07ce640>] ll_md_blocking_ast+0x270/0x600 [lustre]
       [<ffffffffa04b5ba3>] ldlm_cancel_callback+0x63/0xf0 [ptlrpc]
       [<ffffffffa04b5c89>] ldlm_lock_cancel+0x59/0x190 [ptlrpc]
       [<ffffffffa04d1e91>] ? ldlm_prepare_lru_list+0x281/0x460 [ptlrpc]
       [<ffffffffa04d2df8>] ldlm_cli_cancel_list_local+0x78/0x230 [ptlrpc]
       [<ffffffffa0303691>] ? libcfs_debug_vmsg2+0x4d1/0xb50 [libcfs]
       [<ffffffffa04cf9b0>] ? ldlm_cancel_no_wait_policy+0x0/0xf0 [ptlrpc]
       [<ffffffffa04d3265>] ldlm_cancel_lru_local+0x35/0x40 [ptlrpc]
       [<ffffffffa04d3395>] ldlm_replay_locks+0x125/0x7d0 [ptlrpc]
       [<ffffffffa0520f5a>] ptlrpc_import_recovery_state_machine+0x8ca/0xd20 [ptlrpc]
       [<ffffffffa04fe19c>] ? lustre_msg_get_last_committed+0x7c/0xe0 [ptlrpc]
       [<ffffffffa0521ba8>] ptlrpc_connect_interpret+0x5d8/0x1fe0 [ptlrpc]
       [<ffffffffa04f17ea>] ptlrpc_check_set+0x47a/0x1b20 [ptlrpc]
       [<ffffffff8107a11b>] ? try_to_del_timer_sync+0x7b/0xe0
       [<ffffffff8107a1a2>] ? del_timer_sync+0x22/0x30
       [<ffffffffa0525428>] ptlrpcd_check+0x1c8/0x250 [ptlrpc]
       [<ffffffffa05257cb>] ptlrpcd+0x31b/0x390 [ptlrpc]
       [<ffffffff8105dc20>] ? default_wake_function+0x0/0x20
       [<ffffffffa05254b0>] ? ptlrpcd+0x0/0x390 [ptlrpc]
       [<ffffffff8100c1ca>] child_rip+0xa/0x20
       [<ffffffffa05254b0>] ? ptlrpcd+0x0/0x390 [ptlrpc]
       [<ffffffffa05254b0>] ? ptlrpcd+0x0/0x390 [ptlrpc]
       [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
      

      Maloo report: https://maloo.whamcloud.com/test_sets/d10f8468-e515-11e0-9909-52540025f9af

      This test passed while running it after clean upgrading from Lustre Lustre 1.8.5 and 1.8.6-wc1 to Lustre 2.1.0 RC1:
      https://maloo.whamcloud.com/sub_tests/d9c48674-d82e-11e0-8d02-52540025f9af

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              yujian Jian Yu
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: