Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version: Lustre 2.10.4
- Environment: lustre-2.10.4_1.chaos-1.ch6.x86_64 servers, RHEL 7.5, DNE1 file system
- Severity: 3
Description
Servers were restarted and appeared to recover normally. They briefly appeared to be handling the same (heavy) workload from before they were powered off, then started logging the "system was overloaded" message. The kernel then reported several stacks like this:
INFO: task ll_ost00_007:108440 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ll_ost00_007 D ffff8ba4dc35bf40 0 108440 2 0x00000080
Call Trace:
[<ffffffffaad38919>] schedule_preempt_disabled+0x39/0x90
[<ffffffffaad3654f>] __mutex_lock_slowpath+0x10f/0x250
[<ffffffffaad357f2>] mutex_lock+0x32/0x42
[<ffffffffc1669afb>] ofd_create_hdl+0xdcb/0x2090 [ofd]
[<ffffffffc1322007>] ? lustre_msg_add_version+0x27/0xa0 [ptlrpc]
[<ffffffffc132235f>] ? lustre_pack_reply_v2+0x14f/0x290 [ptlrpc]
[<ffffffffc1322691>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
[<ffffffffc138653a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
[<ffffffffc132db5b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[<ffffffffc132b26b>] ? ptlrpc_wait_event+0xab/0x350 [ptlrpc]
[<ffffffffaa6d5c32>] ? default_wake_function+0x12/0x20
[<ffffffffaa6cb01b>] ? __wake_up_common+0x5b/0x90
[<ffffffffc1331c70>] ptlrpc_main+0xae0/0x1e90 [ptlrpc]
[<ffffffffc1331190>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
[<ffffffffaa6c0ad1>] kthread+0xd1/0xe0
[<ffffffffaa6c0a00>] ? insert_kthread_work+0x40/0x40
[<ffffffffaad44837>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffffaa6c0a00>] ? insert_kthread_work+0x40/0x40
Lustre then began reporting:
LustreError: 108448:0:(ofd_dev.c:1627:ofd_create_hdl()) lquake-OST0003:[27917288460] destroys_in_progress already cleared
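Roughly, this message comes from the orphan-cleanup (DELORPHAN) path of the OST create handler: when an MDT (re)connects, a per-sequence destroys_in_progress flag marks that an orphan cleanup pass is pending, and a DELORPHAN request that finds the flag already cleared is answered with this message instead of repeating the cleanup. The sketch below is only a simplified user-space model of that handshake, under that assumption; the type and function names (seq_state, handle_delorphan, and so on) are illustrative, not the actual OFD symbols.

/*
 * Simplified user-space model of the per-sequence orphan-destroy handshake.
 * The names (seq_state, handle_delorphan, ...) are illustrative only and are
 * not the actual Lustre/OFD symbols.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct seq_state {
    pthread_mutex_t create_lock;   /* serializes create/destroy for one FID sequence */
    bool destroys_in_progress;     /* set while an orphan cleanup pass is pending */
};

/* MDT (re)connects: an orphan cleanup pass for this sequence is now pending. */
static void mark_destroys_pending(struct seq_state *s)
{
    pthread_mutex_lock(&s->create_lock);
    s->destroys_in_progress = true;
    pthread_mutex_unlock(&s->create_lock);
}

/* OST_CREATE request carrying the "delete orphans" flag. */
static void handle_delorphan(struct seq_state *s, const char *who)
{
    pthread_mutex_lock(&s->create_lock);
    if (!s->destroys_in_progress) {
        /* A previous request already finished the cleanup for this sequence. */
        printf("%s: destroys_in_progress already cleared\n", who);
        pthread_mutex_unlock(&s->create_lock);
        return;
    }
    /* ... destroy orphan objects; this pass can take a long time ... */
    s->destroys_in_progress = false;
    pthread_mutex_unlock(&s->create_lock);
    printf("%s: orphan destroy finished\n", who);
}

int main(void)
{
    struct seq_state s = {
        .create_lock = PTHREAD_MUTEX_INITIALIZER,
        .destroys_in_progress = false,
    };

    mark_destroys_pending(&s);      /* first MDT connect */
    handle_delorphan(&s, "req1");   /* performs the cleanup */
    handle_delorphan(&s, "req2");   /* arrives after cleanup: flag already cleared */
    return 0;
}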
Attachments
Issue Links
- is related to LU-11399: use separate locks for orphan destroy and objects re-create at OFD (Open)
That message, "destroys_in_progress already cleared", relates only to MDT->OST reconnects and may happen when the MDT reconnects for some reason while the OST is still performing orphan destroys from the previous connection. Looking at the logs in the description:
It looks like the second thread [102598] was waiting for the first [102596] to complete an OST_CREATE request with orphan destroys, and that first request took quite a long time. The problem seems to be that long orphan destroy; something was blocking it for quite a while. That may simply be a result of the overall high load after the OST restart.
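That contention can be pictured with two threads and one lock: the first request holds the per-sequence create lock while it performs a slow orphan destroy, and the second create request sits in mutex_lock() until the first one finishes, which matches the shape of the ll_ost00 stack above (mutex_lock inside ofd_create_hdl). Below is a minimal pthread sketch of that pattern; the timings and names are made up for illustration.

/*
 * Two-thread model of the blocking seen in the stack trace: the orphan-destroy
 * request holds the per-sequence lock long enough that the second create
 * request sits in mutex_lock(), which is what the hung-task watchdog reports.
 * Timings and names are illustrative only.
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t create_lock = PTHREAD_MUTEX_INITIALIZER;

/* First OST_CREATE request: performs the (slow) orphan destroy. */
static void *orphan_destroy_req(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&create_lock);
    printf("req1: starting orphan destroy (holding create lock)\n");
    sleep(5);                       /* stands in for a very long destroy pass */
    printf("req1: orphan destroy done\n");
    pthread_mutex_unlock(&create_lock);
    return NULL;
}

/* Second OST_CREATE request: blocks on the same lock, like ll_ost00_007. */
static void *create_req(void *arg)
{
    (void)arg;
    time_t start = time(NULL);
    pthread_mutex_lock(&create_lock);
    printf("req2: got create lock after %ld s\n", (long)(time(NULL) - start));
    pthread_mutex_unlock(&create_lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, orphan_destroy_req, NULL);
    sleep(1);                       /* let req1 take the lock first */
    pthread_create(&t2, NULL, create_req, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

The linked LU-11399 tracks the longer-term change of using separate locks for orphan destroy and object re-create at the OFD, so that a create request is not stuck behind a long destroy pass.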