Loading...

XML

Word

Printable

Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Labels:
None

Severity:
3
Rank (Obsolete):
9223372036854775807

This seems to be hanging for me relatively frequently:

== recovery-small test 66: lock enqueue re-send vs client eviction == 21:24:18 (1438219458)
fail_loc=0x8000030c
  File: `/mnt/lustre/f66.recovery-small'
  Size: 0         	Blocks: 0          IO Block: 4194304 regular empty file
Device: 2c54f966h/743766374d	Inode: 144115809252278274  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-29 21:24:19.000000000 -0400
Modify: 2015-07-29 21:24:19.000000000 -0400
Change: 2015-07-29 21:24:19.000000000 -0400
fail_loc=0
fail_loc=0x80000136
mdc.lustre-MDT0000-mdc-ffff8800815b17f0.import=connection=192.168.10.211@tcp
./../tests/test-framework.sh: line 2399: /mnt/lustre/recon: Input/output error

In dmesg we get:

[195769.388166] Lustre: DEBUG MARKER: == recovery-small test 66: lock enqueue re-send vs client eviction == 21:24:18 (1438219458)
[195773.859644] Lustre: 17831:0:(genops.c:1491:obd_export_evict_by_uuid()) lustre-MDT0000: evicting f797195c-b579-5409-6da6-f79d52d8f961 at adminstrative request
[195773.860464] Lustre: 17831:0:(genops.c:1491:obd_export_evict_by_uuid()) Skipped 1 previous similar message
[195773.906764] LustreError: 17610:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 136 sleeping for 40000ms
[195809.905048] Lustre: lustre-OST0001: haven't heard from client 9007d4b0-ffa9-1973-fae0-b8ec6df9deb5 (at 0@lo) in 48 seconds. I think it's dead, and I am evicting it. exp ffff88003bea07f0, cur 1438219499 expire 1438219469 last 1438219451
[195813.904060] LustreError: 17610:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 136 awake
[195813.904062] LNet: Service thread pid 17610 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[195813.904064] Pid: 17610, comm: mdt01_003
[195813.904065] 
[195813.904065] Call Trace:
[195813.904075]  [<ffffffff81522574>] ? _spin_lock_irqsave+0x24/0x30
[195813.904079]  [<ffffffff810866a2>] ? del_timer_sync+0x22/0x30
[195813.904081]  [<ffffffff8151f279>] ? schedule_timeout+0x199/0x2e0
[195813.904091]  [<ffffffffa0909c41>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[195813.904102]  [<ffffffffa0906c4c>] ? __cfs_fail_timeout_set+0x13c/0x160 [libcfs]
[195813.904116]  [<ffffffffa0c113a5>] ? mdt_getattr_name_lock+0x475/0x1910 [mdt]
[195813.904125]  [<ffffffffa0c16cd3>] ? old_init_ucred+0x1e3/0x390 [mdt]
[195813.904133]  [<ffffffffa0c12d62>] ? mdt_intent_getattr+0x292/0x470 [mdt]
[195813.904141]  [<ffffffffa0c02674>] ? mdt_intent_policy+0x494/0xc40 [mdt]
[195813.904169]  [<ffffffffa053ef6f>] ? ldlm_lock_enqueue+0x12f/0x860 [ptlrpc]
[195813.904194]  [<ffffffffa056ab97>] ? ldlm_handle_enqueue0+0x7e7/0x1520 [ptlrpc]
[195813.904227]  [<ffffffffa05f1111>] ? tgt_enqueue+0x61/0x230 [ptlrpc]
[195813.904256]  [<ffffffffa05f1d22>] ? tgt_request_handle+0xa42/0x1230 [ptlrpc]
[195813.904282]  [<ffffffffa059d5e4>] ? ptlrpc_main+0xd74/0x1850 [ptlrpc]
[195813.904284]  [<ffffffff8152240e>] ? _spin_unlock_irq+0xe/0x20
[195813.904287]  [<ffffffff81061829>] ? finish_task_switch+0x59/0xe0
[195813.904289]  [<ffffffff8151e3ee>] ? thread_return+0x4e/0x7e0
[195813.904314]  [<ffffffffa059c870>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
[195813.904317]  [<ffffffff8109ce4e>] ? kthread+0x9e/0xc0
[195813.904319]  [<ffffffff8100c24a>] ? child_rip+0xa/0x20
[195813.904321]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
[195813.904322]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
[195813.904323] 
[195813.904325] LustreError: dumping log to /tmp/lustre-log.1438219503.17610
[195813.913595] LNet: Service thread pid 17610 completed after 40.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).

Assignee:: Amir Shehata (Inactive)

Reporter:: Oleg Drokin

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Created:: 30/Jul/15 1:46 PM

Updated:: 10/Oct/21 10:34 PM

Resolved:: 10/Oct/21 10:34 PM

Details

Description

Attachments

Activity

People

Dates