[LU-15517] MDT stuck in recovery if one other MDT is failed over to partner node - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.12.8
Labels:
- llnl
Environment:
3.10.0-1160.53.1.1chaos.ch6.x86_64
zfs-0.7.11-9.8llnl.ch6.x86_64
lustre-2.12.8_6.llnl-1.ch6.x86_64

Severity: 3

Description

An MDT fails to enter recovery when one other MDT has been failed over and is running on its partner MDS. The recovery_status file indicates "WAITING" and reports the MDT running on its partner as a non-ready MDT.
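To check for this state from a script, something like the following could work. It is only a sketch: it assumes recovery_status is plain "key: value" text (as printed by `lctl get_param mdt.*.recovery_status`), and the field names in the sample, in particular "non-ready MDTs", are my assumption rather than copied from the affected system.

```python
def parse_recovery_status(text):
    """Parse 'key: value' lines into a dict, splitting on the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_stuck_waiting(fields):
    """True if the target reports the WAITING status described above."""
    return fields.get("status") == "WAITING"


# Hypothetical sample; the real file contains more fields and the
# exact field names are assumed here.
SAMPLE = """status: WAITING
non-ready MDTs: 000e
"""

fields = parse_recovery_status(SAMPLE)
if is_stuck_waiting(fields):
    print("waiting on MDTs:", fields.get("non-ready MDTs", "(unknown)"))
```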

The console log of the MDT unable to enter recovery repeatedly shows messages like:

[Thu Feb  3 12:15:18 2022] Lustre: 4102:0:(ldlm_lib.c:1827:extend_recovery_timer()) lquake-MDT0009: extended recovery timer reached hard limit: 900, extend: 1
[Thu Feb  3 12:15:18 2022] Lustre: 4102:0:(ldlm_lib.c:1827:extend_recovery_timer()) Skipped 29 previous similar messages
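For anyone scanning their own console logs for the same symptom, here is a rough sketch that pulls the target name and hard limit out of these messages. The regular expression is written against the two lines quoted above only; the message format in other Lustre versions may differ.

```python
import re

# Matches the extend_recovery_timer() hard-limit message quoted above.
HARD_LIMIT_RE = re.compile(
    r"extend_recovery_timer\(\)\) (?P<target>[^\s:]+): "
    r"extended recovery timer reached hard limit: (?P<limit>\d+), "
    r"extend: (?P<extend>\d+)"
)


def find_hard_limit_hits(lines):
    """Return one dict per line that reports the recovery hard limit."""
    hits = []
    for line in lines:
        m = HARD_LIMIT_RE.search(line)
        if m:
            hits.append({
                "target": m.group("target"),
                "limit": int(m.group("limit")),
                "extend": int(m.group("extend")),
            })
    return hits


LOG = """\
[Thu Feb  3 12:15:18 2022] Lustre: 4102:0:(ldlm_lib.c:1827:extend_recovery_timer()) lquake-MDT0009: extended recovery timer reached hard limit: 900, extend: 1
[Thu Feb  3 12:15:18 2022] Lustre: 4102:0:(ldlm_lib.c:1827:extend_recovery_timer()) Skipped 29 previous similar messages
"""

for hit in find_hard_limit_hits(LOG.splitlines()):
    print(hit["target"], hit["limit"], hit["extend"])
```

The "Skipped N previous similar messages" line is deliberately not matched, so the count of hits understates how often the limit was actually reached.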

My steps to reproduce:
1. Start all MDTs on their primary MDS (in my case, that's jet1...jet16 => MDT0000...MDT000f). Allow them to complete recovery.
2. umount MDT000e on jet15, and mount it on jet16. Allow it to complete recovery.
3. umount MDT0009 on jet10, and then mount it again (on the same node, jet10, where it was happily running moments ago).
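The node-to-MDT mapping used in these steps (jet1...jet16 => MDT0000...MDT000f) is just the node number minus one, written as four hex digits. A small sketch, assuming that naming holds on your system; the helper name is made up for illustration:

```python
def primary_mdt(node, prefix="jet"):
    """Map a node name like 'jet10' to its primary MDT, e.g. 'MDT0009'."""
    index = int(node[len(prefix):]) - 1  # jet1 hosts MDT0000
    return f"MDT{index:04x}"


# Nodes mentioned in the steps above.
for node in ("jet10", "jet15", "jet16"):
    print(node, "=>", primary_mdt(node))
```

So in step 2, MDT000e (primary jet15) ends up sharing jet16 with MDT000f, and step 3 restarts MDT0009 on its own primary, jet10.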

The debug log shows that the update log for MDT000e was not received. There are no messages in the console log regarding MDT000e after MDT0009 starts up.

I determined the PID of the thread lod_sub_recovery_thread() for MDT000e. The debug log shows the thread starts up, follows roughly this sequence of calls, and never returns from ptlrpc_set_wait().

lod_sub_prep_llog->llog_osd_get_cat_list->dt_locate_at->lu_object_find_at-> ? ->
osp_attr_get->osp_remote_sync->ptlrpc_queue_wait->ptlrpc_set_wait

Apparently the RPC never times out, so the upper layers never get a chance to retry, no error message reports the problem, and the MDT never enters recovery.

Our patch stack is:

82ea54e (tag: 2.12.8_6.llnl) LU-13356 client: don't use OBD_CONNECT_MNE_SWAB
d06f5b2 LU-15357 mdd: fix changelog context leak
543b60b LU-9964 llite: prevent mulitple group locks
d776b67 LU-15234 lnet: Race on discovery queue
77040da Revert "LU-15234 lnet: Race on discovery queue"
bba827c LU-15234 lnet: Race on discovery queue
7faa872 LU-14865 utils: llog_reader.c printf type mismatch
5dc104e LU-13946 build: OpenZFS 2.0 compatibility
0fef268 TOSS-4917 grant: chatty warning in tgt_grant_incoming
5b70822 TOSS-4917 grant: improve debug for grant calcs
8af02e8 log lfs setstripe paths to syslog
391be81 Don't install lustre init script on systemd systems
c613926 LLNL build customizations
a4f71cd TOSS-4431 build: build ldiskfs only for x86_64
067cb55 (tag: v2_12_8, tag: 2.12.8) New release 2.12.8

See https://github.com/LLNL/lustre/tree/2.12.8-llnl for the details.

Attachments

- dk.1.jet16.txt.gz (287 kB, 03/Feb/22 10:33 PM)
- dk.3.txt.gz (5.18 MB, 03/Feb/22 8:40 PM)
- dmesg.jet16.txt.gz (26 kB, 03/Feb/22 10:33 PM)
- dmesg.txt.gz (27 kB, 03/Feb/22 8:40 PM)
- lu-15517.llog_reader.out (328 kB, 07/Apr/22 6:08 AM)

People

Assignee: Mikhail Pershin

Reporter: Olaf Faaland

Votes: 0

Watchers: 10

Dates

Created: 03/Feb/22 8:29 PM

Updated: 22/Jul/22 10:43 PM