Details
Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.12.7
Environment:
  kernel 3.10.0-1160.45.1.1chaos.ch6.x86_64 / lustre-2.12.7_2.llnl
  kernel 3.10.0-1160.53.1.1chaos.ch6.x86_64 / lustre-2.12.8_6.llnl
  RHEL7.9, zfs-0.7.11-9.8llnl
Severity: 3
Description
We upgraded a Lustre server cluster from lustre-2.12.7_2.llnl to lustre-2.12.8_6.llnl. Almost immediately after boot, the servers begin reporting soft lockups on the console, with stacks like this:
2022-02-08 09:43:10 [1644342190.528916] Call Trace:
 queued_spin_lock_slowpath+0xb/0xf
 _raw_spin_lock+0x30/0x40
 cfs_percpt_lock+0xc1/0x110 [libcfs]
 lnet_discover_peer_locked+0xa0/0x450 [lnet]
 ? wake_up_atomic_t+0x30/0x30
 LNetPrimaryNID+0xd5/0x220 [lnet]
 ptlrpc_connection_get+0x3e/0x450 [ptlrpc]
 target_handle_connect+0x12f1/0x2b90 [ptlrpc]
 ? enqueue_task_fair+0x208/0x6c0
 ? check_preempt_curr+0x80/0xa0
 ? ttwu_do_wakeup+0x19/0x100
 tgt_request_handle+0x4fa/0x1570 [ptlrpc]
 ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
 ? __getnstimeofday64+0x3f/0xd0
 ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
 ? ptlrpc_wait_event+0xb8/0x370 [ptlrpc]
 ? __wake_up_common_lock+0x91/0xc0
 ? sched_feat_set+0xf0/0xf0
 ptlrpc_main+0xc49/0x1c50 [ptlrpc]
 ? __switch_to+0xce/0x5a0
 ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
 kthread+0xd1/0xe0
 ? insert_kthread_work+0x40/0x40
 ret_from_fork_nospec_begin+0x21/0x21
 ? insert_kthread_work+0x40/0x40
Some servers never exit recovery; others complete recovery but appear unable to service requests.
This was seen during the same Lustre server update as LU-15539, but it appears to be a separate issue.
Patch stacks are:
https://github.com/LLNL/lustre/releases/tag/2.12.8_6.llnl
https://github.com/LLNL/lustre/releases/tag/2.12.7_2.llnl