Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Minor
Affects Version/s: None
Fix Version/s: Lustre 2.9.0, Lustre 2.10.0
Labels: None
Environment: Interop - 2.8.0 EL6.7 Server / EL7.2 Client
  Server - b2_8_fe, build #12, RHEL 6.7
  Client - master, build #3468, RHEL 7.2
Severity: 3
Description
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/5d28dbcc-a076-11e6-8761-5254006e85c2.
The sub-test test_107 failed with the following error:
test failed to respond and timed out
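
Not from the original report, but for reference when retrying this: a single sanity-hsm sub-test can be re-run in isolation with the standard Lustre test framework (this sketch assumes a configured test cluster and a lustre/tests checkout):

  # Run only sub-test 107 of sanity-hsm; the test framework starts the
  # copytool itself on the configured HSM agent node.
  cd lustre/tests
  ONLY=107 bash sanity-hsm.sh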
OST console:
23:51:00:Lustre: DEBUG MARKER: == sanity-hsm test 107: Copytool re-register after MDS restart ======================================= 22:50:03 (1477954203)
23:51:00:LustreError: 11-0: lustre-MDT0000-lwp-OST0001: operation obd_ping to node 10.9.5.239@tcp failed: rc = -107
23:51:00:Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 10.9.5.239@tcp) was lost; in progress operations using this service will wait for recovery to complete
23:51:01:automount D 0000000000000000 0 3827 1 0x00000080
23:51:01: ffff88003e827be8 0000000000000082 00000000ffffffff 000023fcaf1a2ab5
23:51:01: ffff880037f78070 ffff8800466a4bb0 000000000081ce38 ffffffffa7e85d92
23:51:01: 000000001cc8adaa 0000000100d02944 ffff88005a82bad8 ffff88003e827fd8
23:51:01:Call Trace:
23:51:01: [<ffffffff811279c0>] ? sync_page_killable+0x0/0x40
23:51:01: [<ffffffff815399b3>] io_schedule+0x73/0xc0
23:51:01: [<ffffffff811279ad>] sync_page+0x3d/0x50
23:51:01: [<ffffffff811279ce>] sync_page_killable+0xe/0x40
23:51:01: [<ffffffff8153a24a>] __wait_on_bit_lock+0x5a/0xc0
23:51:01: [<ffffffff811278d7>] __lock_page_killable+0x67/0x70
23:51:01: [<ffffffff810a14e0>] ? wake_bit_function+0x0/0x50
23:51:01: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0
23:51:01: [<ffffffff81129604>] generic_file_aio_read+0x4b4/0x700
23:51:01: [<ffffffff81191e4a>] do_sync_read+0xfa/0x140
23:51:01: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
23:51:01: [<ffffffff81159b85>] ? do_mmap_pgoff+0x335/0x380
23:51:01: [<ffffffff81232036>] ? security_file_permission+0x16/0x20
23:51:01: [<ffffffff81192745>] vfs_read+0xb5/0x1a0
23:51:01: [<ffffffff8119351f>] ? fget_light_pos+0x3f/0x50
23:51:01: [<ffffffff81192a91>] sys_read+0x51/0xb0
23:51:01: [<ffffffff810e8c7e>] ? __audit_syscall_exit+0x25e/0x290
23:51:01: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
MDS console:
23:17:51:LustreError: 17770:0:(client.c:1133:ptlrpc_import_delay_req()) Skipped 9 previous similar messages
23:17:51:Lustre: lustre-MDT0000: Not available for connect from 10.9.5.241@tcp (stopping)
23:17:51:Lustre: Skipped 1130 previous similar messages
23:17:51:LustreError: 0-0: Forced cleanup waiting for mdt-lustre-MDT0000_UUID namespace with 1 resources in use, (rc=-110)
23:17:51:LustreError: Skipped 102 previous similar messages
23:17:51:LustreError: 17786:0:(qsd_reint.c:56:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5
23:17:51:LustreError: 17786:0:(qsd_reint.c:56:qsd_reint_completion()) Skipped 9 previous similar messages
23:17:51:LustreError: 17822:0:(client.c:1133:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8800444f73c0 x1549736680098240/t0(0) o101->lustre-MDT0000-lwp-MDT0000@0@lo:23/10 lens 456/496 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
23:17:51:LustreError: 17822:0:(client.c:1133:ptlrpc_import_delay_req()) Skipped 17 previous similar messages
23:17:51:LustreError: 17836:0:(qsd_reint.c:56:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5
23:17:51:LustreError: 17836:0:(qsd_reint.c:56:qsd_reint_completion()) Skipped 17 previous similar messages
23:17:51:Lustre: lustre-MDT0000: Not available for connect from 10.9.5.242@tcp (stopping)
23:17:51:Lustre: Skipped 1321 previous similar messages
23:17:51:LustreError: 0-0: Forced cleanup waiting for mdt-lustre-MDT0000_UUID namespace with 1 resources in use, (rc=-110)
23:50:37:********** Timeout by autotest system **********
Client console:
23:50:57:[10301.531931] Lustre: DEBUG MARKER: == sanity-hsm test 107: Copytool re-register after MDS restart ======================================= 22:50:03 (1477954203)
23:50:57:[10311.388026] LustreError: 11-0: lustre-MDT0000-mdc-ffff880007dc1800: operation obd_ping to node 10.9.5.239@tcp failed: rc = -107
23:50:57:[10311.396253] Lustre: lustre-MDT0000-mdc-ffff880007dc1800: Connection to lustre-MDT0000 (at 10.9.5.239@tcp) was lost; in progress operations using this service will wait for recovery to complete
23:50:57:[13944.654013] auditd D ffff88007bb8fd90 0 490 1 0x00000000
23:50:57:[13944.654013] ffff88007bb8fc30 0000000000000086 ffff880036707300 ffff88007bb8ffd8
23:50:57:[13944.654013] ffff88007bb8ffd8 ffff88007bb8ffd8 ffff880036707300 ffff88007fc167c0
23:50:57:[13944.654013] 0000000000000000 7fffffffffffffff ffffffff81168d10 ffff88007bb8fd90
23:50:57:[13944.654013] Call Trace:
23:50:57:[13944.654013] [<ffffffff81168d10>] ? wait_on_page_read+0x60/0x60
23:50:57:[13944.654013] [<ffffffff8163bb39>] schedule+0x29/0x70
23:50:57:[13944.654013] [<ffffffff81639829>] schedule_timeout+0x209/0x2d0
23:50:57:[13944.654013] [<ffffffff81058aaf>] ? kvm_clock_get_cycles+0x1f/0x30
23:50:57:[13944.654013] [<ffffffff81168d10>] ? wait_on_page_read+0x60/0x60
23:50:57:[13944.654013] [<ffffffff8163b16e>] io_schedule_timeout+0xae/0x130
23:50:57:[13944.654013] [<ffffffff8163b208>] io_schedule+0x18/0x20
23:50:57:[13944.654013] [<ffffffff81168d1e>] sleep_on_page+0xe/0x20
23:50:57:[13944.654013] [<ffffffff816399b0>] __wait_on_bit+0x60/0x90
23:50:57:[13944.654013] [<ffffffff81168aa6>] wait_on_page_bit+0x86/0xb0
23:50:57:[13944.654013] [<ffffffff810a6c00>] ? wake_atomic_t_function+0x40/0x40
23:50:57:[13944.654013] [<ffffffff81168be1>] filemap_fdatawait_range+0x111/0x1b0
23:50:57:[13944.654013] [<ffffffff81175d3e>] ? do_writepages+0x1e/0x40
23:50:57:[13944.654013] [<ffffffff8116aad5>] ? __filemap_fdatawrite_range+0x65/0x80
23:50:57:[13944.654013] [<ffffffff8116abff>] filemap_write_and_wait_range+0x3f/0x70
23:50:57:[13944.654013] [<ffffffffa01807aa>] ext4_sync_file+0xba/0x320 [ext4]
23:50:57:[13944.654013] [<ffffffff81210185>] do_fsync+0x65/0xa0
23:50:57:[13944.654013] [<ffffffff81210450>] SyS_fsync+0x10/0x20
23:50:57:[13944.654013] [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
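
Side note, not from the report: D-state stacks in this format are what the kernel's blocked-task dump produces, and on a node hung like this they can also be requested by hand via SysRq (assumes root and console access to the node):

  echo 1 > /proc/sys/kernel/sysrq   # enable SysRq if it is not already
  echo w > /proc/sysrq-trigger      # 'w' dumps blocked (uninterruptible) tasks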
Attachments
Issue Links
is related to: LU-2067 ldlm_resource_complain()) Namespace MGC resource refcount nonzero after lock cleanup (Resolved)