[LU-13073] Multiple MDS deadlocks (in lod_qos_prep_create) after OSS crash - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.12.7, Lustre 2.15.0
Affects Version/s: Lustre 2.12.3
Labels:
None
Environment:
lustre-2.12.3_4_g142b4d4-1.el7.x86_64, kernel-3.10.0-957.27.2.el7_lustre.pl2.x86_64

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

After hitting a first issue on an OSS last night reported in ~~LU-13072~~, several of our MDS started to exhibit blocked threads and locking issues today. A total of 3 out of our 4 MDS required to be rebooted. Only tonight things seem to have stabilized for us. Some research on this Jira has easily revealed that this is not a new problem:

~~LU-9688~~ Stuck MDT in lod_qos_prep_create
LU-10697 MDT locking issues after failing over OSTs from hung OSS
~~LU-11091~~ MDS threads stuck in lod_qos_prep_create after OSS crash
LU-12360 Can't restart filesystem (2.12) even with abort_recov

So it looks like it's not fixed in 2.12.3, and as mentioned by NASA in ~~LU-11091~~, the MDS deadlock can happen hours after the OSS crash, and this is exactly what happened to us today on Fir.

Note that LU-12360 is a bit different (and we don't see this specific problem anymore) but Patrick discussed about some possible issue that might still be related to our remaining problem here.

In our case, shortly after the reboot of the OSS fir-io8-s1 this morning, a first MDS fir-md1-s4 was stuck. Later around noon, a second MDS fir-md1-s2 had the same issue. Finally this evening, MDT0000 on fir-md1-s1 was impacted too, leading to most of the filesystem unaccessible.

The typical errors/backtraces we get on MDS in that case are:

[190125.381344] LustreError: 22258:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1576181573, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff9e5d81e0e780/0x91615908336938c8 lrc: 3/0,1 mode: --/CW res: [0x240038caa:0x288e:0x0].0x0 bits 0x2/0x0 rrc: 39 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22258 timeout: 0 lvb_type: 0
[190125.381346] LustreError: 92260:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ...
[190568.849438] LNet: Service thread pid 22622 was inactive for 601.55s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[190568.866461] Pid: 22622, comm: mdt03_009 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
[190568.876717] Call Trace:
[190568.879275]  [<ffffffff9dd88c47>] call_rwsem_down_write_failed+0x17/0x30
[190568.886100]  [<ffffffffc15ce537>] lod_qos_statfs_update+0x97/0x2b0 [lod]
[190568.892946]  [<ffffffffc15d06da>] lod_qos_prep_create+0x16a/0x1890 [lod]
[190568.899768]  [<ffffffffc15d2015>] lod_prepare_create+0x215/0x2e0 [lod]
[190568.906432]  [<ffffffffc15c1e1e>] lod_declare_striped_create+0x1ee/0x980 [lod]
[190568.913774]  [<ffffffffc15c66f4>] lod_declare_create+0x204/0x590 [lod]
[190568.920435]  [<ffffffffc163cca2>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
[190568.928419]  [<ffffffffc162c6dc>] mdd_declare_create+0x4c/0xcb0 [mdd]
[190568.934997]  [<ffffffffc1630067>] mdd_create+0x847/0x14e0 [mdd]
[190568.941041]  [<ffffffffc14cd5ff>] mdt_reint_open+0x224f/0x3240 [mdt]
[190568.947539]  [<ffffffffc14c0693>] mdt_reint_rec+0x83/0x210 [mdt]
[190568.953669]  [<ffffffffc149d1b3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[190568.960331]  [<ffffffffc14a9a92>] mdt_intent_open+0x82/0x3a0 [mdt]
[190568.966642]  [<ffffffffc14a7bb5>] mdt_intent_policy+0x435/0xd80 [mdt]
[190568.973204]  [<ffffffffc0f54d46>] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
[190568.980077]  [<ffffffffc0f7d336>] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
[190568.987271]  [<ffffffffc1005a12>] tgt_enqueue+0x62/0x210 [ptlrpc]
[190568.993547]  [<ffffffffc100a36a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[190569.000581]  [<ffffffffc0fb124b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[190569.008382]  [<ffffffffc0fb4bac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[190569.014810]  [<ffffffff9dac2e81>] kthread+0xd1/0xe0
[190569.019812]  [<ffffffff9e177c24>] ret_from_fork_nospec_begin+0xe/0x21
[190569.026374]  [<ffffffffffffffff>] 0xffffffffffffffff

Sometimes, some threads seem to be completing after a long time:

[191025.465281] LNet: Service thread pid 41905 completed after 687.75s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).

The load of the server increases tremendously but it's doing almost nothing. Only very rarely the MDS can recover by itself. And if it does, there is a big chance in my experience that a few hours after, it hangs again.

Our current workaround is to stop/start the impacted MDT, and it's usually fast, because now in 2.12.3 the recovery works better and doesn't hang anymore at the end. And then the load drops and things are working normally again.

I'm opening this ticket with our logs in that case, in the hope we can find the root cause of this recurrent MDS issue after OSS problem.

First MDT stuck MDT0003 on MDS fir-md1-s4 at 2019-12-12-08:20:42:

vmcore uploaded to the FTP as vmcore_fir-md1-s4_2019-12-12-08-20-42

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.27.2.el7_lustre.pl2.x86_64/vmlinux
    DUMPFILE: vmcore_fir-md1-s4_2019-12-12-08-20-42  [PARTIAL DUMP]
        CPUS: 48
        DATE: Thu Dec 12 08:20:33 2019
      UPTIME: 2 days, 01:36:01
LOAD AVERAGE: 112.81, 86.62, 53.87
       TASKS: 1658
    NODENAME: fir-md1-s4
     RELEASE: 3.10.0-957.27.2.el7_lustre.pl2.x86_64
     VERSION: #1 SMP Thu Nov 7 15:26:16 PST 2019
     MACHINE: x86_64  (1996 Mhz)
      MEMORY: 255.6 GB
       PANIC: "SysRq : Trigger a crash"
         PID: 92886
     COMMAND: "bash"
        TASK: ffff9521056fe180  [THREAD_INFO: ffff952212ae0000]
         CPU: 18
       STATE: TASK_RUNNING (SYSRQ)

output of "foreach bt" as foreach-bt_fir-md1-s4_2019-12-12-08-20-42.txt
vmcore-dmesg.txt as vmcore-dmesg_fir-md1-s4_2019-12-12-08-20-42.txt

Second MDT stuck MDT0001 on fir-md1-s2 at 2019-12-12-12:36:21:

vmcore uploaded to the FTP as vmcore_fir-md1-s2-2019-12-12-12-36-21

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.27.2.el7_lustre.pl2.x86_64/vmlinux
    DUMPFILE: vmcore_fir-md1-s2-2019-12-12-12-36-21  [PARTIAL DUMP]
        CPUS: 48
        DATE: Thu Dec 12 12:36:12 2019
      UPTIME: 2 days, 05:06:59
LOAD AVERAGE: 59.50, 57.71, 52.86
       TASKS: 1267
    NODENAME: fir-md1-s2
     RELEASE: 3.10.0-957.27.2.el7_lustre.pl2.x86_64
     VERSION: #1 SMP Thu Nov 7 15:26:16 PST 2019
     MACHINE: x86_64  (1996 Mhz)
      MEMORY: 255.6 GB
       PANIC: "SysRq : Trigger a crash"
         PID: 97428
     COMMAND: "bash"
        TASK: ffff9e7da2444100  [THREAD_INFO: ffff9e779e61c000]
         CPU: 17
       STATE: TASK_RUNNING (SYSRQ)

output of "foreach bt" as foreach-bt_fir-md1-s2-2019-12-12-12-36-21.txt
vmcore-dmesg.txt as vmcore-dmesg_fir-md1-s2-2019-12-12-12-36-21.txt

Third MDT stuck MDT0000 on fir-md1-s1:

in that case, we just did a umount/mount of MDT0000 at Dec 12 17:29:39 and it did work. Attaching kernel logs as fir-md1-s1_2019-12-12_kernel.log
since the last MDT restart, this is OK

OSS logs

console log for fir-io8-s1, the OSS that originally crashed (from last boot, so you may have to scroll a bit) as fir-io8-s1_console.log (problem reported in ~~LU-13072~~)

NIDs:

MDS fir-md1-s1: 10.0.10.51@o2ib7
MDS fir-md1-s2: 10.0.10.52@o2ib7
MDS fir-md1-s3: 10.0.10.53@o2ib7
MDS fir-md1-s4: 10.0.10.54@o2ib7
OSS fir-io8-s1: 10.0.10.115@o2ib7

Let me know if you need anything else.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

fir-io8-s1_console.log
2.05 MB
13/Dec/19 6:47 AM
fir-md1-s1_2019-12-12_kernel.log
668 kB
13/Dec/19 6:58 AM
foreach-bt_fir-md1-s2-2019-12-12-12-36-21.txt
710 kB
13/Dec/19 6:55 AM
foreach-bt_fir-md1-s4_2019-12-12-08-20-42.txt
1.13 MB
13/Dec/19 6:41 AM
reproducer
1 kB
16/Oct/20 8:05 AM
vmcore-dmesg_fir-md1-s2-2019-12-12-12-36-21.txt
393 kB
13/Dec/19 6:55 AM
vmcore-dmesg_fir-md1-s4_2019-12-12-08-20-42.txt
671 kB
13/Dec/19 6:41 AM

Issue Links

is duplicated by

LU-14277 any create blocked due any OST fail

Resolved

is related to

LU-13462 MDS deadlocks in osd_read_lock()

Resolved

LU-15393 object allocation when OST is lost

Resolved

Activity

[LU-13073] Multiple MDS deadlocks (in lod_qos_prep_create) after OSS crash

Gerrit Updater added a comment - 05/May/21 9:23 PM

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43202/
Subject: ~~LU-13073~~ osp: don't block waiting for new objects
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 6a023a8d772b052a70927dc5c8b481072bfe164e

Gerrit Updater added a comment - 05/May/21 9:23 PM Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43202/ Subject: LU-13073 osp: don't block waiting for new objects Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: 6a023a8d772b052a70927dc5c8b481072bfe164e

Gerrit Updater added a comment - 02/Apr/21 5:35 PM

Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/43202
Subject: ~~LU-13073~~ osp: don't block waiting for new objects
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: b8c4a50a7b71c14cbca941be552e41024c2a3835

Gerrit Updater added a comment - 02/Apr/21 5:35 PM Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/43202 Subject: LU-13073 osp: don't block waiting for new objects Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: b8c4a50a7b71c14cbca941be552e41024c2a3835

Peter Jones added a comment - 11/Mar/21 5:13 AM

Landed for 2.15

Peter Jones added a comment - 11/Mar/21 5:13 AM Landed for 2.15

Gerrit Updater added a comment - 10/Mar/21 8:03 AM

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40274/
Subject: ~~LU-13073~~ osp: don't block waiting for new objects
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2112ccb3c48ccf86aaf2a61c9f040571a6323f9c

Gerrit Updater added a comment - 10/Mar/21 8:03 AM Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40274/ Subject: LU-13073 osp: don't block waiting for new objects Project: fs/lustre-release Branch: master Current Patch Set: Commit: 2112ccb3c48ccf86aaf2a61c9f040571a6323f9c

Peter Jones added a comment - 16/Oct/20 4:49 PM - edited

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40274
Subject: ~~LU-13073~~ osp: don't block waiting for new objects
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d618b1491ec898b835d09f9992fc55fd8f3a962f

Peter Jones added a comment - 16/Oct/20 4:49 PM - edited Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40274 Subject: LU-13073 osp: don't block waiting for new objects Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: d618b1491ec898b835d09f9992fc55fd8f3a962f

Peter Jones added a comment - 13/Dec/19 6:34 PM

Hongchao

Can you please advise?

Thanks

Peter

Peter Jones added a comment - 13/Dec/19 6:34 PM Hongchao Can you please advise? Thanks Peter

Multiple MDS deadlocks (in lod_qos_prep_create) after OSS crash

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates