[LU-13073] Multiple MDS deadlocks (in lod_qos_prep_create) after OSS crash
Created: 13/Dec/19  Updated: 02/Aug/22  Resolved: 11/Mar/21

| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.3 |
| Fix Version/s: | Lustre 2.12.7, Lustre 2.15.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Stephane Thiell | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | lustre-2.12.3_4_g142b4d4-1.el7.x86_64, kernel-3.10.0-957.27.2.el7_lustre.pl2.x86_64 |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
After hitting a first issue on an OSS last night, reported in [linked ticket], it looks like it's not fixed in 2.12.3, as also mentioned by NASA in [linked ticket]. Note that LU-12360 is a bit different (and we don't see this specific problem anymore), but Patrick discussed a possible issue that might still be related to our remaining problem here.

In our case, shortly after the reboot of the OSS fir-io8-s1 this morning, a first MDS, fir-md1-s4, got stuck. Later, around noon, a second MDS, fir-md1-s2, had the same issue. Finally, this evening, MDT0000 on fir-md1-s1 was impacted too, making most of the filesystem inaccessible.

The typical errors/backtraces we get on an MDS in that case are:

[190125.381344] LustreError: 22258:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1576181573, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff9e5d81e0e780/0x91615908336938c8 lrc: 3/0,1 mode: --/CW res: [0x240038caa:0x288e:0x0].0x0 bits 0x2/0x0 rrc: 39 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22258 timeout: 0 lvb_type: 0
[190125.381346] LustreError: 92260:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ...
[190568.849438] LNet: Service thread pid 22622 was inactive for 601.55s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[190568.866461] Pid: 22622, comm: mdt03_009 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
[190568.876717] Call Trace:
[190568.879275] [<ffffffff9dd88c47>] call_rwsem_down_write_failed+0x17/0x30
[190568.886100] [<ffffffffc15ce537>] lod_qos_statfs_update+0x97/0x2b0 [lod]
[190568.892946] [<ffffffffc15d06da>] lod_qos_prep_create+0x16a/0x1890 [lod]
[190568.899768] [<ffffffffc15d2015>] lod_prepare_create+0x215/0x2e0 [lod]
[190568.906432] [<ffffffffc15c1e1e>] lod_declare_striped_create+0x1ee/0x980 [lod]
[190568.913774] [<ffffffffc15c66f4>] lod_declare_create+0x204/0x590 [lod]
[190568.920435] [<ffffffffc163cca2>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
[190568.928419] [<ffffffffc162c6dc>] mdd_declare_create+0x4c/0xcb0 [mdd]
[190568.934997] [<ffffffffc1630067>] mdd_create+0x847/0x14e0 [mdd]
[190568.941041] [<ffffffffc14cd5ff>] mdt_reint_open+0x224f/0x3240 [mdt]
[190568.947539] [<ffffffffc14c0693>] mdt_reint_rec+0x83/0x210 [mdt]
[190568.953669] [<ffffffffc149d1b3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[190568.960331] [<ffffffffc14a9a92>] mdt_intent_open+0x82/0x3a0 [mdt]
[190568.966642] [<ffffffffc14a7bb5>] mdt_intent_policy+0x435/0xd80 [mdt]
[190568.973204] [<ffffffffc0f54d46>] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
[190568.980077] [<ffffffffc0f7d336>] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
[190568.987271] [<ffffffffc1005a12>] tgt_enqueue+0x62/0x210 [ptlrpc]
[190568.993547] [<ffffffffc100a36a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[190569.000581] [<ffffffffc0fb124b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[190569.008382] [<ffffffffc0fb4bac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[190569.014810] [<ffffffff9dac2e81>] kthread+0xd1/0xe0
[190569.019812] [<ffffffff9e177c24>] ret_from_fork_nospec_begin+0xe/0x21
[190569.026374] [<ffffffffffffffff>] 0xffffffffffffffff

Sometimes, some threads seem to complete after a long time:

[191025.465281] LNet: Service thread pid 41905 completed after 687.75s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).

The load on the server increases tremendously, but it is doing almost nothing.
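For context, the trace above shows every creating thread parked in call_rwsem_down_write_failed inside lod_qos_statfs_update(): one thread refreshes the OST statfs data under a writer semaphore, that refresh stalls while the OSS is unreachable, and all other create threads queue behind it until the 300s ldlm completion waits start expiring. A minimal userspace analogue of this pile-up (plain pthreads, NOT Lustre code; all names here are hypothetical) looks like:

/*
 * Minimal userspace analogue (NOT Lustre code) of the pile-up above:
 * one thread holds a writer lock while doing a slow statfs of a dead
 * OST, and every creating thread queues behind it, just like the mdt
 * threads stuck in call_rwsem_down_write_failed -> lod_qos_statfs_update.
 * All names are hypothetical. Build: gcc -pthread -o pileup pileup.c
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t qos_rw_sem = PTHREAD_RWLOCK_INITIALIZER;

/* Stands in for an OST statfs-style RPC to a crashed OSS: it just hangs. */
static void slow_statfs_rpc(void)
{
        sleep(600);             /* mirrors the 600s service-thread watchdog */
}

/* Analogue of lod_qos_statfs_update(): refresh under the writer lock. */
static void *statfs_update_thread(void *arg)
{
        pthread_rwlock_wrlock(&qos_rw_sem);
        slow_statfs_rpc();      /* blocking RPC while holding the lock */
        pthread_rwlock_unlock(&qos_rw_sem);
        return NULL;
}

/* Analogue of an mdt service thread entering lod_qos_prep_create(). */
static void *create_thread(void *arg)
{
        long id = (long)arg;

        printf("mdt thread %ld: waiting for qos rwsem\n", id);
        pthread_rwlock_wrlock(&qos_rw_sem);   /* parks here for minutes */
        /* ... would pick OSTs and declare the striped create here ... */
        pthread_rwlock_unlock(&qos_rw_sem);
        printf("mdt thread %ld: done\n", id);
        return NULL;
}

int main(void)
{
        pthread_t updater, creators[8];
        long i;

        pthread_create(&updater, NULL, statfs_update_thread, NULL);
        sleep(1);               /* let the updater take the lock first */
        for (i = 0; i < 8; i++)
                pthread_create(&creators[i], NULL, create_thread, (void *)i);
        /* In the kernel, the rwsem waiters sit in uninterruptible sleep,
         * which is why the load average climbs while almost nothing runs. */
        for (i = 0; i < 8; i++)
                pthread_join(creators[i], NULL);
        pthread_join(updater, NULL);
        return 0;
}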
Only very rarely can the MDS recover by itself. And if it does, in my experience there is a big chance that it hangs again a few hours later. Our current workaround is to stop/start the impacted MDT; it's usually fast, because in 2.12.3 recovery works better and no longer hangs at the end. The load then drops and things work normally again.

I'm opening this ticket with our logs, in the hope that we can find the root cause of this recurrent MDS issue following an OSS problem.

First MDT stuck, MDT0003 on MDS fir-md1-s4 at 2019-12-12-08:20:42:
KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.27.2.el7_lustre.pl2.x86_64/vmlinux
DUMPFILE: vmcore_fir-md1-s4_2019-12-12-08-20-42 [PARTIAL DUMP]
CPUS: 48
DATE: Thu Dec 12 08:20:33 2019
UPTIME: 2 days, 01:36:01
LOAD AVERAGE: 112.81, 86.62, 53.87
TASKS: 1658
NODENAME: fir-md1-s4
RELEASE: 3.10.0-957.27.2.el7_lustre.pl2.x86_64
VERSION: #1 SMP Thu Nov 7 15:26:16 PST 2019
MACHINE: x86_64 (1996 Mhz)
MEMORY: 255.6 GB
PANIC: "SysRq : Trigger a crash"
PID: 92886
COMMAND: "bash"
TASK: ffff9521056fe180 [THREAD_INFO: ffff952212ae0000]
CPU: 18
STATE: TASK_RUNNING (SYSRQ)
Second MDT stuck, MDT0001 on fir-md1-s2 at 2019-12-12-12:36:21:
KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.27.2.el7_lustre.pl2.x86_64/vmlinux
DUMPFILE: vmcore_fir-md1-s2-2019-12-12-12-36-21 [PARTIAL DUMP]
CPUS: 48
DATE: Thu Dec 12 12:36:12 2019
UPTIME: 2 days, 05:06:59
LOAD AVERAGE: 59.50, 57.71, 52.86
TASKS: 1267
NODENAME: fir-md1-s2
RELEASE: 3.10.0-957.27.2.el7_lustre.pl2.x86_64
VERSION: #1 SMP Thu Nov 7 15:26:16 PST 2019
MACHINE: x86_64 (1996 Mhz)
MEMORY: 255.6 GB
PANIC: "SysRq : Trigger a crash"
PID: 97428
COMMAND: "bash"
TASK: ffff9e7da2444100 [THREAD_INFO: ffff9e779e61c000]
CPU: 17
STATE: TASK_RUNNING (SYSRQ)
Third MDT stuck, MDT0000 on fir-md1-s1:
OSS logs
NIDs:
Let me know if you need anything else.
| Comments |
| Comment by Peter Jones [ 13/Dec/19 ] |
Hongchao, can you please advise? Thanks, Peter
| Comment by Gerrit Updater [ 16/Oct/20 ] |
Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40274
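The Gerrit notification above does not carry the patch description, so the following is a general illustration only, not a claim about what https://review.whamcloud.com/40274 actually does. One common way to stop creators from piling up behind a stalled statfs refresh is to let them proceed on slightly stale data when the lock is busy; a hypothetical sketch in the same userspace style as above:

/*
 * Hypothetical sketch, NOT the landed patch: refresh the statfs cache
 * only if nobody else is already doing it, and let creators run on
 * slightly stale data instead of all blocking behind one stalled refresh.
 */
#include <pthread.h>
#include <time.h>

static pthread_rwlock_t qos_rw_sem = PTHREAD_RWLOCK_INITIALIZER;
static time_t last_refresh;             /* guarded by qos_rw_sem */

static void refresh_statfs_cache(void)  /* may block on a dead OST */
{
        /* ... issue statfs requests and update the cached values ... */
        last_refresh = time(NULL);
}

static void qos_statfs_update_nonblock(time_t max_age)
{
        /* Lock busy? Someone is already refreshing: use the stale cache. */
        if (pthread_rwlock_trywrlock(&qos_rw_sem) != 0)
                return;
        if (time(NULL) - last_refresh >= max_age)
                refresh_statfs_cache();
        pthread_rwlock_unlock(&qos_rw_sem);
}

int main(void)
{
        qos_statfs_update_nonblock(5 /* seconds */);
        return 0;
}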
| Comment by Gerrit Updater [ 10/Mar/21 ] |
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40274/
| Comment by Peter Jones [ 11/Mar/21 ] |
Landed for 2.15
| Comment by Gerrit Updater [ 02/Apr/21 ] |
Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/43202
| Comment by Gerrit Updater [ 05/May/21 ] |
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43202/