LU-8792

Interop - master<->2.8: sanity-hsm test_107: hung while umounting MDT

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.9.0, Lustre 2.10.0
    • Labels: None
    • Environment: Interop - 2.8.0 EL6.7 Server / EL7.2 Client
      Server - b2_8_fe build #12, RHEL 6.7
      Client - master, build #3468, RHEL 7.2
    • Severity: 3

    Description

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/5d28dbcc-a076-11e6-8761-5254006e85c2.

      The sub-test test_107 failed with the following error:

      test failed to respond and timed out
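      Test 107 is named "Copytool re-register after MDS restart" in the DEBUG
      MARKER below. A minimal sketch of the sequence it exercises, assuming the
      usual sanity-hsm structure (copytool_setup, fail, wait_request_state and
      copytool_cleanup are Lustre test-framework helpers; the exact test body
      may differ):

      test_107_sketch() {
          copytool_setup                          # start an HSM copytool on the agent
          mkdir -p $DIR/$tdir
          local f=$DIR/$tdir/$tfile
          cp /etc/passwd $f
          local fid=$($LFS path2fid $f)
          $LFS hsm_archive $f                     # queue an archive request
          wait_request_state $fid ARCHIVE SUCCEED
          fail $SINGLEMDS                         # restart the MDS; the hung MDT
                                                  # umount reported here occurs in
                                                  # this step
          $LFS hsm_release $f
          $LFS hsm_restore $f                     # only completes if the copytool
          wait_request_state $fid RESTORE SUCCEED # re-registered with the new MDS
          copytool_cleanup
      }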
      

      OST console:

      23:51:00:Lustre: DEBUG MARKER: == sanity-hsm test 107: Copytool re-register after MDS restart ======================================= 22:50:03 (1477954203)
      23:51:00:LustreError: 11-0: lustre-MDT0000-lwp-OST0001: operation obd_ping to node 10.9.5.239@tcp failed: rc = -107
      23:51:00:Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 10.9.5.239@tcp) was lost; in progress operations using this service will wait for recovery to complete
      
      23:51:01:automount     D 0000000000000000     0  3827      1 0x00000080
      23:51:01: ffff88003e827be8 0000000000000082 00000000ffffffff 000023fcaf1a2ab5
      23:51:01: ffff880037f78070 ffff8800466a4bb0 000000000081ce38 ffffffffa7e85d92
      23:51:01: 000000001cc8adaa 0000000100d02944 ffff88005a82bad8 ffff88003e827fd8
      23:51:01:Call Trace:
      23:51:01: [<ffffffff811279c0>] ? sync_page_killable+0x0/0x40
      23:51:01: [<ffffffff815399b3>] io_schedule+0x73/0xc0
      23:51:01: [<ffffffff811279ad>] sync_page+0x3d/0x50
      23:51:01: [<ffffffff811279ce>] sync_page_killable+0xe/0x40
      23:51:01: [<ffffffff8153a24a>] __wait_on_bit_lock+0x5a/0xc0
      23:51:01: [<ffffffff811278d7>] __lock_page_killable+0x67/0x70
      23:51:01: [<ffffffff810a14e0>] ? wake_bit_function+0x0/0x50
      23:51:01: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0
      23:51:01: [<ffffffff81129604>] generic_file_aio_read+0x4b4/0x700
      23:51:01: [<ffffffff81191e4a>] do_sync_read+0xfa/0x140
      23:51:01: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
      23:51:01: [<ffffffff81159b85>] ? do_mmap_pgoff+0x335/0x380
      23:51:01: [<ffffffff81232036>] ? security_file_permission+0x16/0x20
      23:51:01: [<ffffffff81192745>] vfs_read+0xb5/0x1a0
      23:51:01: [<ffffffff8119351f>] ? fget_light_pos+0x3f/0x50
      23:51:01: [<ffffffff81192a91>] sys_read+0x51/0xb0
      23:51:01: [<ffffffff810e8c7e>] ? __audit_syscall_exit+0x25e/0x290
      23:51:01: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
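      The automount thread above is in uninterruptible (D) sleep inside a page
      read, which is why the node stopped responding. On a machine in this
      state the same blocked-task stacks can be pulled on demand with sysrq
      (generic kernel triage, not output from this run):

      # dump all D-state task stacks into the kernel log
      echo w > /proc/sysrq-trigger
      dmesg | tail -n 200
      # or read a single stuck task's stack, e.g. the automount pid above
      cat /proc/3827/stack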
      

      MDS console:

      23:17:51:LustreError: 17770:0:(client.c:1133:ptlrpc_import_delay_req()) Skipped 9 previous similar messages
      23:17:51:Lustre: lustre-MDT0000: Not available for connect from 10.9.5.241@tcp (stopping)
      23:17:51:Lustre: Skipped 1130 previous similar messages
      23:17:51:LustreError: 0-0: Forced cleanup waiting for mdt-lustre-MDT0000_UUID namespace with 1 resources in use, (rc=-110)
      23:17:51:LustreError: Skipped 102 previous similar messages
      23:17:51:LustreError: 17786:0:(qsd_reint.c:56:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5
      23:17:51:LustreError: 17786:0:(qsd_reint.c:56:qsd_reint_completion()) Skipped 9 previous similar messages
      23:17:51:LustreError: 17822:0:(client.c:1133:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8800444f73c0 x1549736680098240/t0(0) o101->lustre-MDT0000-lwp-MDT0000@0@lo:23/10 lens 456/496 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
      23:17:51:LustreError: 17822:0:(client.c:1133:ptlrpc_import_delay_req()) Skipped 17 previous similar messages
      23:17:51:LustreError: 17836:0:(qsd_reint.c:56:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5
      23:17:51:LustreError: 17836:0:(qsd_reint.c:56:qsd_reint_completion()) Skipped 17 previous similar messages
      23:17:51:Lustre: lustre-MDT0000: Not available for connect from 10.9.5.242@tcp (stopping)
      23:17:51:Lustre: Skipped 1321 previous similar messages
      23:17:51:LustreError: 0-0: Forced cleanup waiting for mdt-lustre-MDT0000_UUID namespace with 1 resources in use, (rc=-110)
      23:50:37:********** Timeout by autotest system **********
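      The repeated "Forced cleanup waiting for mdt-lustre-MDT0000_UUID namespace
      with 1 resources in use, (rc=-110)" message is the MDT umount path looping
      on an LDLM namespace that still holds one resource, until autotest kills
      the run. On a live MDS the blocking namespace can be narrowed down with
      lctl (standard parameters, not captured in this run):

      # per-namespace lock and resource counts; a non-zero resource count
      # during umount points at the namespace holding up cleanup
      lctl get_param ldlm.namespaces.*.lock_count
      lctl get_param ldlm.namespaces.*.resource_count
      # dump namespace contents into the Lustre debug log, then extract it
      lctl set_param ldlm.dump_namespaces=1
      lctl dk /tmp/ldlm_namespaces.log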
      

      Client console:

      23:50:57:[10301.531931] Lustre: DEBUG MARKER: == sanity-hsm test 107: Copytool re-register after MDS restart ======================================= 22:50:03 (1477954203)
      23:50:57:[10311.388026] LustreError: 11-0: lustre-MDT0000-mdc-ffff880007dc1800: operation obd_ping to node 10.9.5.239@tcp failed: rc = -107
      23:50:57:[10311.396253] Lustre: lustre-MDT0000-mdc-ffff880007dc1800: Connection to lustre-MDT0000 (at 10.9.5.239@tcp) was lost; in progress operations using this service will wait for recovery to complete
      
      23:50:57:[13944.654013] auditd          D ffff88007bb8fd90     0   490      1 0x00000000
      23:50:57:[13944.654013]  ffff88007bb8fc30 0000000000000086 ffff880036707300 ffff88007bb8ffd8
      23:50:57:[13944.654013]  ffff88007bb8ffd8 ffff88007bb8ffd8 ffff880036707300 ffff88007fc167c0
      23:50:57:[13944.654013]  0000000000000000 7fffffffffffffff ffffffff81168d10 ffff88007bb8fd90
      23:50:57:[13944.654013] Call Trace:
      23:50:57:[13944.654013]  [<ffffffff81168d10>] ? wait_on_page_read+0x60/0x60
      23:50:57:[13944.654013]  [<ffffffff8163bb39>] schedule+0x29/0x70
      23:50:57:[13944.654013]  [<ffffffff81639829>] schedule_timeout+0x209/0x2d0
      23:50:57:[13944.654013]  [<ffffffff81058aaf>] ? kvm_clock_get_cycles+0x1f/0x30
      23:50:57:[13944.654013]  [<ffffffff81168d10>] ? wait_on_page_read+0x60/0x60
      23:50:57:[13944.654013]  [<ffffffff8163b16e>] io_schedule_timeout+0xae/0x130
      23:50:57:[13944.654013]  [<ffffffff8163b208>] io_schedule+0x18/0x20
      23:50:57:[13944.654013]  [<ffffffff81168d1e>] sleep_on_page+0xe/0x20
      23:50:57:[13944.654013]  [<ffffffff816399b0>] __wait_on_bit+0x60/0x90
      23:50:57:[13944.654013]  [<ffffffff81168aa6>] wait_on_page_bit+0x86/0xb0
      23:50:57:[13944.654013]  [<ffffffff810a6c00>] ? wake_atomic_t_function+0x40/0x40
      23:50:57:[13944.654013]  [<ffffffff81168be1>] filemap_fdatawait_range+0x111/0x1b0
      23:50:57:[13944.654013]  [<ffffffff81175d3e>] ? do_writepages+0x1e/0x40
      23:50:57:[13944.654013]  [<ffffffff8116aad5>] ? __filemap_fdatawrite_range+0x65/0x80
      23:50:57:[13944.654013]  [<ffffffff8116abff>] filemap_write_and_wait_range+0x3f/0x70
      23:50:57:[13944.654013]  [<ffffffffa01807aa>] ext4_sync_file+0xba/0x320 [ext4]
      23:50:57:[13944.654013]  [<ffffffff81210185>] do_fsync+0x65/0xa0
      23:50:57:[13944.654013]  [<ffffffff81210450>] SyS_fsync+0x10/0x20
      23:50:57:[13944.654013]  [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
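      Note that auditd above is blocked in fsync() on the local ext4 disk, so
      the client stall is node-wide rather than a pure Lustre wait. Whether the
      client's connection to the restarted MDT ever recovered can be read from
      the MDC import named in the log (standard lctl usage, not captured in
      this run):

      # "state: FULL" means recovery completed; DISCONN or RECOVER means
      # the client is still waiting on the MDS
      lctl get_param mdc.lustre-MDT0000-mdc-*.import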
      

    People

      Assignee: Yang Sheng
      Reporter: Maloo