[LU-1612] Test failure on test suite parallel-scale-nfsv3, subtest test_iorssf Created: 09/Jul/12  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Jinshan Xiong (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None

Issue Links:
Related
is related to LU-2205 recovery-small test 11 multiop eterna... Resolved
is related to LU-2208 deadlock in add_lsmref Resolved
Severity: 3
Rank (Obsolete): 5858

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/fe305eda-c8aa-11e1-ba35-52540035b04c.

The sub-test test_iorssf failed with the following error:

test failed to respond and timed out

15:06:14:LustreError: 0:0:(ldlm_lockd.c:374:waiting_locks_callback()) ### lock callback timer expired after 150s: evicting client at 0@lo  ns: mdt-ffff88000d869000 lock: ffff88007b7bb000/0x96e75471be57b4e9 lrc: 3/0,0 mode: CW/CW res: 8589949618/35607 bits 0x5 rrc: 3 type: IBT flags: 0x4000020 nid: 0@lo remote: 0x96e75471be57b4d4 expref: 10 pid: 572 timeout: 4348556478
15:06:15:LustreError: 572:0:(ldlm_lockd.c:1246:ldlm_handle_enqueue0()) ### lock on destroyed export ffff88005c2ca800 ns: mdt-ffff88000d869000 lock: ffff8800678ae240/0x96e75471be57b4fe lrc: 3/0,0 mode: PR/PR res: 8589949618/35607 bits 0x3 rrc: 2 type: IBT flags: 0x4000000 nid: 0@lo remote: 0x96e75471be57b4db expref: 10 pid: 572 timeout: 0
15:06:15:LustreError: 11-0: an error occurred while communicating with 0@lo. The ldlm_enqueue operation failed with -107
15:06:15:Lustre: lustre-MDT0000-mdc-ffff88005a526000: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
15:06:15:LNet: Service thread pid 572 completed after 150.53s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
15:06:15:LustreError: 167-0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
15:06:15:LustreError: 452:0:(mdc_locks.c:722:mdc_enqueue()) ldlm_cli_enqueue: -5
15:06:15:LustreError: 452:0:(file.c:2321:ll_inode_revalidate_fini()) failure -5 inode 144115440170339095
15:06:16:rpc-srv/tcp: nfsd: got error -104 when sending 56 bytes - shutting down socket
15:06:16:LustreError: 1298:0:(ldlm_resource.c:761:ldlm_resource_complain()) Namespace lustre-MDT0000-mdc-ffff88005a526000 resource refcount nonzero (1) after lock cleanup; forcing cleanup.
15:06:16:LustreError: 1298:0:(ldlm_resource.c:767:ldlm_resource_complain()) Resource: ffff880057864c00 (8589949618/35607/0/0) (rc: 1)
15:06:16:Lustre: lustre-MDT0000-mdc-ffff88005a526000: Connection restored to lustre-MDT0000 (at 0@lo)
15:08:02:INFO: task nfsd:447 blocked for more than 120 seconds.
15:08:03:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
15:08:03:nfsd          D 0000000000000000     0   447      2 0x00000080
15:08:03: ffff88006769dab0 0000000000000046 000000446769da30 ffffffffa0e33313
15:08:03: 000000000000008d 0020000000000080 4ff8b23c00000000 000000000001162a
15:08:03: ffff880079237a78 ffff88006769dfd8 000000000000f4e8 ffff880079237a78
15:08:03:Call Trace:
15:08:03: [<ffffffff814eea5e>] __mutex_lock_slowpath+0x13e/0x180
15:08:03: [<ffffffffa0dddb50>] ? ll_file_open+0x0/0xca0 [lustre]
15:08:03: [<ffffffff814ee8fb>] mutex_lock+0x2b/0x50
15:08:03: [<ffffffffa0dddd3d>] ll_file_open+0x1ed/0xca0 [lustre]
15:08:03: [<ffffffffa0dddb50>] ? ll_file_open+0x0/0xca0 [lustre]
15:08:03: [<ffffffff8117413a>] __dentry_open+0x10a/0x360
15:08:03: [<ffffffff811743e2>] dentry_open+0x52/0xc0
15:08:03: [<ffffffffa0344a2e>] nfsd_open+0x11e/0x210 [nfsd]
15:08:03: [<ffffffffa0344ea3>] nfsd_write+0xb3/0x100 [nfsd]
15:08:03: [<ffffffffa034c7ef>] nfsd3_proc_write+0xaf/0x140 [nfsd]
15:08:03: [<ffffffffa033d43e>] nfsd_dispatch+0xfe/0x240 [nfsd]
15:08:03: [<ffffffffa02ab534>] svc_process_common+0x344/0x640 [sunrpc]
15:08:04: [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
15:08:04: [<ffffffffa02abb70>] svc_process+0x110/0x160 [sunrpc]
15:08:04: [<ffffffffa033db62>] nfsd+0xc2/0x160 [nfsd]
15:08:04: [<ffffffffa033daa0>] ? nfsd+0x0/0x160 [nfsd]
15:08:04: [<ffffffff810909c6>] kthread+0x96/0xa0
15:08:04: [<ffffffff8100c14a>] child_rip+0xa/0x20
15:08:04: [<ffffffff81090930>] ? kthread+0x0/0xa0
15:08:04: [<ffffffff8100c140>] ? child_rip+0x0/0x20
15:08:04:INFO: task nfsd:448 blocked for more than 120 seconds.
15:08:04:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
15:08:04:nfsd          D 0000000000000001     0   448      2 0x00000080
15:08:04: ffff88003768d6c8 0000000000000046 0000000000000000 ffff88003768d670
15:08:04: ffffffff81038488 00000000ffffffff 00008625cd5d2fc1 0000000000000004
15:08:04: ffff8800378b3a78 ffff88003768dfd8 000000000000f4e8 ffff8800378b3a78
15:08:04:Call Trace:
15:08:04: [<ffffffff81038488>] ? pvclock_clocksource_read+0x58/0xd0
15:08:04: [<ffffffff81275b56>] ? vsnprintf+0x2b6/0x5f0
15:08:04: [<ffffffff814ef965>] rwsem_down_failed_common+0x95/0x1d0
15:08:04: [<ffffffffa0ce827b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
15:08:05: [<ffffffff814efaf6>] rwsem_down_read_failed+0x26/0x30
15:08:05: [<ffffffff81277654>] call_rwsem_down_read_failed+0x14/0x30
15:08:05: [<ffffffff814eeff4>] ? down_read+0x24/0x30
15:08:05: [<ffffffffa08ab21b>] lov_lsm_addref+0x2b/0xc0 [lov]
15:08:05: [<ffffffffa08b5445>] lov_io_init_raid0+0xb5/0x810 [lov]
15:08:05: [<ffffffffa08aba42>] lov_io_init+0x92/0x150 [lov]
15:08:05: [<ffffffffa0e29b19>] ? vvp_io_init+0x1f9/0x2e0 [lustre]
15:08:05: [<ffffffffa0550348>] cl_io_init0+0x98/0x160 [obdclass]
15:08:05: [<ffffffffa0553234>] cl_io_init+0x64/0x100 [obdclass]
15:08:05: [<ffffffffa0554a5d>] cl_io_rw_init+0x8d/0x200 [obdclass]
15:08:05: [<ffffffffa0dd4f50>] ll_file_io_generic+0xc0/0x560 [lustre]
15:08:05: [<ffffffffa0dd61cc>] ll_file_aio_write+0x13c/0x2c0 [lustre]
15:08:05: [<ffffffffa06597b0>] ? lustre_swab_mdt_body+0x0/0x150 [ptlrpc]
15:08:05: [<ffffffffa0dd6090>] ? ll_file_aio_write+0x0/0x2c0 [lustre]
15:08:05: [<ffffffff811764db>] do_sync_readv_writev+0xfb/0x140
15:08:05: [<ffffffffa04aa705>] ? mdc_set_open_replay_data+0x345/0x520 [mdc]
15:08:05: [<ffffffffa07974e9>] ? lmv_set_open_replay_data+0x299/0x7b0 [lmv]
15:08:05: [<ffffffff81090d30>] ? autoremove_wake_function+0x0/0x40
15:08:06: [<ffffffff8120c6f6>] ? security_file_permission+0x16/0x20
15:08:06: [<ffffffff8117756f>] do_readv_writev+0xcf/0x1f0
15:08:06: [<ffffffff811776d6>] vfs_writev+0x46/0x60
15:08:06: [<ffffffffa03423d5>] nfsd_vfs_write+0x105/0x430 [nfsd]
15:08:06: [<ffffffff811743e2>] ? dentry_open+0x52/0xc0
15:08:06: [<ffffffffa0344a4e>] ? nfsd_open+0x13e/0x210 [nfsd]
15:08:06: [<ffffffffa0344ed7>] nfsd_write+0xe7/0x100 [nfsd]
15:08:06: [<ffffffffa034c7ef>] nfsd3_proc_write+0xaf/0x140 [nfsd]
15:08:06: [<ffffffffa034d6f0>] ? nfs3svc_decode_writeargs+0x100/0x180 [nfsd]
15:08:06: [<ffffffffa033d43e>] nfsd_dispatch+0xfe/0x240 [nfsd]
15:08:06: [<ffffffffa02ab534>] svc_process_common+0x344/0x640 [sunrpc]
15:08:06: [<ffffffffa02abb70>] svc_process+0x110/0x160 [sunrpc]
15:08:06: [<ffffffffa033db62>] nfsd+0xc2/0x160 [nfsd]
15:08:06: [<ffffffffa033daa0>] ? nfsd+0x0/0x160 [nfsd]
15:08:06: [<ffffffff810909c6>] kthread+0x96/0xa0
15:08:06: [<ffffffff8100c14a>] child_rip+0xa/0x20
15:08:06: [<ffffffff81090930>] ? kthread+0x0/0xa0
15:08:06: [<ffffffff8100c140>] ? child_rip+0x0/0x20
15:08:06:INFO: task nfsd:449 blocked for more than 120 seconds.
15:08:07:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
15:08:07:nfsd          D 0000000000000001     0   449      2 0x00000080
15:08:07: ffff880019005ab0 0000000000000046 0000004419005a30 ffffffffa0e33313
15:08:07: 000000000000008d 0020000000000080 4ff8b23c00000001 000000000000f785
15:08:07: ffff880037acd078 ffff880019005fd8 000000000000f4e8 ffff880037acd078
15:08:07:Call Trace:
15:08:07: [<ffffffff814eea5e>] __mutex_lock_slowpath+0x13e/0x180
15:08:07: [<ffffffffa0dddb50>] ? ll_file_open+0x0/0xca0 [lustre]
15:08:07: [<ffffffff814ee8fb>] mutex_lock+0x2b/0x50
15:08:07: [<ffffffffa0dddd3d>] ll_file_open+0x1ed/0xca0 [lustre]
15:08:07: [<ffffffffa0dddb50>] ? ll_file_open+0x0/0xca0 [lustre]
15:08:07: [<ffffffff8117413a>] __dentry_open+0x10a/0x360
15:08:07: [<ffffffff811743e2>] dentry_open+0x52/0xc0
15:08:07: [<ffffffffa0344a2e>] nfsd_open+0x11e/0x210 [nfsd]
15:08:07: [<ffffffffa0344ea3>] nfsd_write+0xb3/0x100 [nfsd]
15:08:07: [<ffffffffa034c7ef>] nfsd3_proc_write+0xaf/0x140 [nfsd]
15:08:07: [<ffffffffa033d43e>] nfsd_dispatch+0xfe/0x240 [nfsd]
15:08:07: [<ffffffffa02ab534>] svc_process_common+0x344/0x640 [sunrpc]
15:08:07: [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
15:08:08: [<ffffffffa02abb70>] svc_process+0x110/0x160 [sunrpc]
15:08:08: [<ffffffffa033db62>] nfsd+0xc2/0x160 [nfsd]
15:08:08: [<ffffffffa033daa0>] ? nfsd+0x0/0x160 [nfsd]
15:08:08: [<ffffffff810909c6>] kthread+0x96/0xa0
15:08:08: [<ffffffff8100c14a>] child_rip+0xa/0x20
15:08:08: [<ffffffff81090930>] ? kthread+0x0/0xa0
15:08:08: [<ffffffff8100c140>] ? child_rip+0x0/0x20
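The hung-task traces above show two distinct wait sites: nfsd:447 and nfsd:449 queue on a mutex taken in ll_file_open, while nfsd:448 sleeps on a read-write semaphore in lov_lsm_addref (the lo_type_guard rwsem mentioned in the comments below). One way such a hang arises is a cycle in the waits-for relation between those locks. The sketch below is NOT Lustre code; it is a generic cycle check over a hypothetical waits-for graph whose edges are assumptions drawn from the traces, not confirmed Lustre locking internals.

```python
# Hypothetical sketch: model the waits-for relationships suggested by the
# hung-task traces and look for a lock-dependency cycle. The lock names
# mirror the traces; the edges themselves are assumptions for illustration.

def find_cycle(waits_for):
    """Return one cycle in the directed waits-for graph, or None.

    waits_for maps a held lock to the lock its holder is waiting for.
    """
    for start in waits_for:
        path, seen, node = [], set(), start
        # Follow waits-for edges until we leave the graph or revisit a node.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in path:
            # Close the loop: the cycle runs from the revisited node onward.
            return path[path.index(node):] + [node]
    return None

# Assumed picture (illustration only): the thread holding the mutex taken
# in ll_file_open waits on the lo_type_guard rwsem, while the rwsem holder
# is itself queued behind that same mutex.
waits_for = {
    "ll_file_open_mutex": "lo_type_guard",
    "lo_type_guard": "ll_file_open_mutex",
}
print(find_cycle(waits_for))
# → ['ll_file_open_mutex', 'lo_type_guard', 'll_file_open_mutex']
```

If the real dependency graph were acyclic, every nfsd thread would eventually make progress; a cycle like the one modeled here is consistent with all three threads staying blocked past the 120-second hung-task threshold.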


 Comments   
Comment by Jinshan Xiong (Inactive) [ 09/Jul/12 ]

This was introduced by my recent patches 2025 and 2026; I'll fix it.

Comment by Jinshan Xiong (Inactive) [ 10/Jul/12 ]

I can't find the corresponding backtrace in dmesg, and I can't figure out why it was blocked at lo_type_guard. Let's see if there are any other occurrences of this bug.

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.

Generated at Sat Feb 10 01:18:10 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.