[LU-10961] Clients hang after failovers. LustreError: 223668:0:(file.c:4213:ll_inode_revalidate_fini()) soaked: revalidate FID [0x200000007:0x1:0x0] error: rc = -4 Created: 27/Apr/18 Updated: 10/Sep/18 Resolved: 10/Sep/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Cliff White (Inactive) | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | soak |
| Environment: | soak cluster |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We are seeing repeated hard hangs on clients after server failover.
We also see repeated errors referencing the MDT:
Apr 27 17:25:42 soak-23 kernel: LustreError: 223668:0:(file.c:4213:ll_inode_revalidate_fini()) soaked: revalidate FID [0x200000007:0x1:0x0] error: rc = -4
The error appears very repeatable. Logs and stack traces are attached. |
| Comments |
| Comment by Cliff White (Inactive) [ 30/Apr/18 ] |
|
During the weekend, most of the clients hung. After issuing the command 'umount -f', all clients except soak-24 successfully unmounted Lustre. |
| Comment by Cliff White (Inactive) [ 30/Apr/18 ] |
|
Some new errors in the latest hang:
[431310.219421] traps: lfs[157180] trap divide error ip:40dc1a sp:7ffda599e940 error:0 in lfs[400000+20000]
[431346.590086] traps: lfs[157395] trap divide error ip:40dc1a sp:7ffee73a3900 error:0 in lfs[400000+20000]
[431385.007477] Lustre: 154503:0:(client.c:2099:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1525103886/real 0] req@ffff8805bcb47200 x1599185360888640/t0(0) o4->soaked-MDT0000-mdc-ffff880c502a0000@192.168.1.108@o2ib:13/10 lens 608/448 e 0 to 1 dl 1525103893 ref 3 fl Rpc:X/0/ffffffff rc 0/-1
|
| Comment by Cliff White (Inactive) [ 22/May/18 ] |
|
Seeing this error again on current master. version=2.11.52_28_ge2d7e67 |
| Comment by Cliff White (Inactive) [ 23/May/18 ] |
|
More logs. [12385.187946] LustreError: 22349:0:(osc_cache.c:955:osc_extent_wait()) extent ffff9997c6151550@{[0 -> 0/255], [3|0|+|rpc|wihY|ffff9998b4fb5040], [28672|1|+|-|ffff999a22957980|256|ffff999b27449fa0]} soaked-MDT0000-mdc-ffff999f58e9f000: wait ext to 0 timedout, recovery in progress?
[12385.187949] LustreError: 12910:0:(osc_cache.c:955:osc_extent_wait()) extent ffff9997c6151130@{[0 -> 0/255], [2|0|+|cache|wihuY|ffff9998b4fb75c0], [28672|1|+|-|ffff999a22957bc0|256| (null)]} soaked-MDT0000-mdc-ffff999f58e9f000: wait ext to 0 timedout, recovery in progress?
[12385.187951] LustreError: 12901:0:(osc_cache.c:955:osc_extent_wait()) extent ffff9997c6150e70@{[0 -> 0/255], [3|0|+|rpc|wihY|ffff9998b4fb5400], [28672|1|+|-|ffff99998672c140|256|ffff9997bfc20000]} soaked-MDT0000-mdc-ffff999f58e9f000: wait ext to 0 timedout, recovery in progress?
[12385.187953] LustreError: 12749:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff9997c61500b0 ns: soaked-MDT0000-mdc-ffff999f58e9f000 lock: ffff999986729b00/0x9ba73c642656dc6b lrc: 4/0,0 mode: PW/PW res: [0x200010e09:0x8ccb:0x0].0x0 bits 0x40/0x40 rrc: 2 type: IBT flags: 0x449400000000 nid: local remote: 0xaa9e212e88be10ed expref: -99 pid: 24926 timeout: 0 lvb_type: 0
[12385.187955] LustreError: 22349:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff9997c6151550 ns: soaked-MDT0000-mdc-ffff999f58e9f000 lock: ffff999a22957980/0x9ba73c642656dd13 lrc: 4/0,0 mode: PW/PW res: [0x200010e09:0x8cd0:0x0].0x0 bits 0x40/0x40 rrc: 2 type: IBT flags: 0x449400000000 nid: local remote: 0xaa9e212e88be2a07 expref: -99 pid: 24926 timeout: 0 lvb_type: 0
[12385.187958] LustreError: 12910:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff9997c6151130 ns: soaked-MDT0000-mdc-ffff999f58e9f000 lock: ffff999a22957bc0/0x9ba73c642656dd8a lrc: 4/0,0 mode: PW/PW res: [0x200010e09:0x8cd5:0x0].0x0 bits 0x40/0x40 rrc: 2 type: IBT flags: 0x449400000000 nid: local remote: 0xaa9e212e88be408f expref: -99 pid: 24926 timeout: 0 lvb_type: 0
[12385.187960] LustreError: 12901:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff9997c6150e70 ns: soaked-MDT0000-mdc-ffff999f58e9f000 lock: ffff99998672c140/0x9ba73c642656dcaa lrc: 4/0,0 mode: PW/PW res: [0x200010e09:0x8ccd:0x0].0x0 bits 0x40/0x40 rrc: 2 type: IBT flags: 0x449400000000 nid: local remote: 0xaa9e212e88be18bf expref: -99 pid: 24926 timeout: 0 lvb_type: 0
[12385.187980] LustreError: 12751:0:(osc_cache.c:955:osc_extent_wait()) extent ffff9997c6150420@{[0 -> 0/255], [2|0|+|cache|wihuY|ffff9998b4fb4000], [28672|1|+|-|ffff999a22954800|256| (null)]} soaked-MDT0000-mdc-ffff999f58e9f000: wait ext to 0 timedout, recovery in progress?
[12385.187986] LustreError: 12751:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff9997c6150420 ns: soaked-MDT0000-mdc-ffff999f58e9f000 lock: ffff999a22954800/0x9ba73c642656dd6e lrc: 4/0,0 mode: PW/PW res: [0x200010e09:0x8cd4:0x0].0x0 bits 0x40/0x40 rrc: 2 type: IBT flags: 0x449400000000 nid: local remote: 0xaa9e212e88be3965 expref: -99 pid: 24926 timeout: 0 lvb_type: 0
[12385.188015] LustreError: 16607:0:(osc_cache.c:955:osc_extent_wait()) extent ffff9997c6150a50@{[0 -> 0/255], [3|0|+|rpc|wihY|ffff9998b4fb7c00], [28672|1|+|-|ffff99998672ad00|256|ffff9997bfc26eb0]} soaked-MDT0000-mdc-ffff999f58e9f000: wait ext to 0 timedout, recovery in progress?
[12385.188021] LustreError: 16607:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff9997c6150a50 ns: soaked-MDT0000-mdc-ffff999f58e9f000 lock: ffff99998672ad00/0x9ba73c642656dc02 lrc: 4/0,0 mode: PW/PW res: [0x200010e09:0x8cc7:0x0].0x0 bits 0x40/0x40 rrc: 2 type: IBT flags: 0x449400000000 nid: local remote: 0xaa9e212e88be004d expref: -99 pid: 24926 timeout: 0 lvb_type: 0
[12385.188032] LustreError: 16605:0:(osc_cache.c:955:osc_extent_wait()) extent ffff9997c6151d90@{[0 -> 0/255], [3|0|+|rpc|wihY|ffff9998b4fb4a00], [28672|1|+|-|ffff99998672e0c0|256|ffff999b6bb62f70]} soaked-MDT0000-mdc-ffff999f58e9f000: wait ext to 0 timedout, recovery in progress?
[12385.188038] LustreError: 16605:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff9997c6151d90 ns: soaked-MDT0000-mdc-ffff999f58e9f000 lock: ffff99998672e0c0/0x9ba73c642656dc48 lrc: 4/0,0 mode: PW/PW res: [0x200010e09:0x8cc9:0x0].0x0 bits 0x40/0x40 rrc: 2 type: IBT flags: 0x449400000000 nid: local remote: 0xaa9e212e88be0adb expref: -99 pid: 24926 timeout: 0 lvb_type: 0
[12385.188259] LustreError: 12677:0:(osc_cache.c:955:osc_extent_wait()) extent ffff9997c6150210@{[0 -> 0/255], [3|0|+|rpc|wihY|ffff9998b4fb6800], [28672|1|+|-|ffff99998672d580|256|ffff999b6b272f70]} soaked-MDT0000-mdc-ffff999f58e9f000: wait ext to 0 timedout, recovery in progress?
[12385.188264] LustreError: 12677:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff9997c6150210 ns: soaked-MDT0000-mdc-ffff999f58e9f000 lock: ffff99998672d580/0x9ba73c642656dbd1 lrc: 4/0,0 mode: PW/PW res: [0x200010e09:0x8cc5:0x0].0x0 bits 0x40/0x40 rrc: 2 type: IBT flags: 0x449400000000 nid: local remote: 0xaa9e212e88bdf6b4 expref: -99 pid: 24926 timeout: 0 lvb_type: 0
[12385.188911] LustreError: 12746:0:(osc_cache.c:955:osc_extent_wait()) extent ffff9997c6151340@{[0 -> 0/255], [3|0|+|rpc|wihY|ffff9998b4fb6300], [28672|1|+|-|ffff999a22956540|256|ffff999f5b654f10]} soaked-MDT0000-mdc-ffff999f58e9f000: wait ext to 0 timedout, recovery in progress?
[12385.188918] LustreError: 12746:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff9997c6151340 ns: soaked-MDT0000-mdc-ffff999f58e9f000 lock: ffff999a22956540/0x9ba73c642656ddd7 lrc: 4/0,0 mode: PW/PW res: [0x200010e09:0x8cd7:0x0].0x0 bits 0x40/0x40 rrc: 2 type: IBT flags: 0x449400000000 nid: local remote: 0xaa9e212e88be48d1 expref: -99 pid: 24926 timeout: 0 lvb_type: 0
[12388.654048] LustreError: 12904:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff9997c61502c0 ns: soaked-MDT0000-mdc-ffff999f58e9f000 lock: ffff999986728d80/0x9ba73c642656dc25 lrc: 4/0,0 mode: PW/PW res: [0x200010e09:0x8cc8:0x0].0x0 bits 0x40/0x40 rrc: 2 type: IBT flags: 0x449400000000 nid: local remote: 0xaa9e212e88be05da expref: -99 pid: 24926 timeout: 0 lvb_type: 0
[12484.834610] INFO: task simul:25858 blocked for more than 120 seconds.
[12484.857231] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12484.884393] simul D ffff999910485ee0 0 25858 25851 0x00000080
[12484.908994] Call Trace:
[12484.918386] [<ffffffffbb029e72>] ? path_lookupat+0x122/0x8b0
[12484.938662] [<ffffffffbb514e99>] schedule_preempt_disabled+0x29/0x70
[12484.961219] [<ffffffffbb512c57>] __mutex_lock_slowpath+0xc7/0x1d0
[12484.982900] [<ffffffffbb51203f>] mutex_lock+0x1f/0x2f
[12485.001116] [<ffffffffbb0272d0>] lock_rename+0xc0/0xe0
[12485.019600] [<ffffffffbb02d39f>] SYSC_renameat2+0x22f/0x5a0
[12485.039503] [<ffffffffbaed849c>] ? update_curr+0x14c/0x1e0
[12485.059063] [<ffffffffbaed62dc>] ? set_next_entity+0x3c/0xe0
[12485.079200] [<ffffffffbb51367e>] ? __schedule+0x14e/0xa20
[12485.098433] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12485.121395] [<ffffffffbb52076f>] ? system_call_after_swapgs+0xbc/0x160
[12485.144330] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12485.167214] [<ffffffffbb52076f>] ? system_call_after_swapgs+0xbc/0x160
[12485.190098] [<ffffffffbb02e58e>] SyS_renameat2+0xe/0x10
[12485.208651] [<ffffffffbb02e5ce>] SyS_rename+0x1e/0x20
[12485.226616] [<ffffffffbb52082f>] system_call_fastpath+0x1c/0x21
[12485.247435] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12605.265688] INFO: task simul:25858 blocked for more than 120 seconds.
[12605.343807] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12605.438557] simul D ffff999910485ee0 0 25858 25851 0x00000080
[12605.524156] Call Trace:
[12605.554383] [<ffffffffbb029e72>] ? path_lookupat+0x122/0x8b0
[12605.624122] [<ffffffffbb514e99>] schedule_preempt_disabled+0x29/0x70
[12605.702164] [<ffffffffbb512c57>] __mutex_lock_slowpath+0xc7/0x1d0
[12605.777070] [<ffffffffbb51203f>] mutex_lock+0x1f/0x2f
[12605.839473] [<ffffffffbb0272d0>] lock_rename+0xc0/0xe0
[12605.902894] [<ffffffffbb02d39f>] SYSC_renameat2+0x22f/0x5a0
[12605.971484] [<ffffffffbaed849c>] ? update_curr+0x14c/0x1e0
[12606.039000] [<ffffffffbaed62dc>] ? set_next_entity+0x3c/0xe0
[12606.108587] [<ffffffffbb51367e>] ? __schedule+0x14e/0xa20
[12606.175047] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12606.254985] [<ffffffffbb52076f>] ? system_call_after_swapgs+0xbc/0x160
[12606.334925] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12606.414838] [<ffffffffbb52076f>] ? system_call_after_swapgs+0xbc/0x160
[12606.494700] [<ffffffffbb02e58e>] SyS_renameat2+0xe/0x10
[12606.558964] [<ffffffffbb02e5ce>] SyS_rename+0x1e/0x20
[12606.621092] [<ffffffffbb52082f>] system_call_fastpath+0x1c/0x21
[12606.693633] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12726.768761] INFO: task simul:25858 blocked for more than 120 seconds.
[12726.776720] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12726.786099] simul D ffff999910485ee0 0 25858 25851 0x00000080
[12726.794629] Call Trace:
[12726.797974] [<ffffffffbb029e72>] ? path_lookupat+0x122/0x8b0
[12726.804992] [<ffffffffbb514e99>] schedule_preempt_disabled+0x29/0x70
[12726.812770] [<ffffffffbb512c57>] __mutex_lock_slowpath+0xc7/0x1d0
[12726.820246] [<ffffffffbb51203f>] mutex_lock+0x1f/0x2f
[12726.826543] [<ffffffffbb0272d0>] lock_rename+0xc0/0xe0
[12726.832932] [<ffffffffbb02d39f>] SYSC_renameat2+0x22f/0x5a0
[12726.839803] [<ffffffffbaed849c>] ? update_curr+0x14c/0x1e0
[12726.846573] [<ffffffffbaed62dc>] ? set_next_entity+0x3c/0xe0
[12726.853530] [<ffffffffbb51367e>] ? __schedule+0x14e/0xa20
[12726.860202] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12726.868137] [<ffffffffbb52076f>] ? system_call_after_swapgs+0xbc/0x160
[12726.876068] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12726.883992] [<ffffffffbb52076f>] ? system_call_after_swapgs+0xbc/0x160
[12726.891915] [<ffffffffbb02e58e>] SyS_renameat2+0xe/0x10
[12726.898373] [<ffffffffbb02e5ce>] SyS_rename+0x1e/0x20
[12726.904628] [<ffffffffbb52082f>] system_call_fastpath+0x1c/0x21
[12726.911868] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12846.915800] INFO: task simul:25858 blocked for more than 120 seconds.
[12846.937671] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12846.964035] simul D ffff999910485ee0 0 25858 25851 0x00000080
[12846.987832] Call Trace:
[12846.996459] [<ffffffffbb029e72>] ? path_lookupat+0x122/0x8b0
[12847.015963] [<ffffffffbb514e99>] schedule_preempt_disabled+0x29/0x70
[12847.037750] [<ffffffffbb512c57>] __mutex_lock_slowpath+0xc7/0x1d0
[12847.058669] [<ffffffffbb51203f>] mutex_lock+0x1f/0x2f
[12847.076143] [<ffffffffbb0272d0>] lock_rename+0xc0/0xe0
[12847.093894] [<ffffffffbb02d39f>] SYSC_renameat2+0x22f/0x5a0
[12847.113076] [<ffffffffbaed849c>] ? update_curr+0x14c/0x1e0
[12847.131964] [<ffffffffbaed62dc>] ? set_next_entity+0x3c/0xe0
[12847.151421] [<ffffffffbb51367e>] ? __schedule+0x14e/0xa20
[12847.170022] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12847.192344] [<ffffffffbb52076f>] ? system_call_after_swapgs+0xbc/0x160
[12847.214662] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12847.236971] [<ffffffffbb52076f>] ? system_call_after_swapgs+0xbc/0x160
[12847.259272] [<ffffffffbb02e58e>] SyS_renameat2+0xe/0x10
[12847.277283] [<ffffffffbb02e5ce>] SyS_rename+0x1e/0x20
[12847.294717] [<ffffffffbb52082f>] system_call_fastpath+0x1c/0x21
[12847.315019] [<ffffffffbb52077b>] ? system_call_after_swapgs+0xc8/0x160
[12977.554496] LustreError: 12274:0:(osc_cache.c:955:osc_extent_wait()) extent ffff9997c6151290@{[0 -> 0/255], [2|0|+|cache|wihuY|ffff9998b4fb6940], [28672|1|+|-|ffff99998672b180|256| (null)]} soaked-MDT0000-mdc-ffff999f58e9f000: wait ext to 0 timedout, recovery in progress?
[12977.587096] LustreError: 12274:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff9997c6151290 ns: soaked-MDT0000-mdc-ffff999f58e9f000 lock: ffff99998672b180/0x9ba73c642656db99 lrc: 3/0,0 mode: PW/PW res: [0x200010e09:0x8cc4:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x49400000000 nid: local remote: 0xaa9e212e88bdf0a2 expref: -99 pid: 24926 timeout: 0 lvb_type: 0
[24406.001241] LustreError: 22890:0:(vvp_io.c:1495:vvp_io_init()) soaked: refresh file layout [0x200010dc3:0x1d336:0x0] error -4.
[25696.331238] LustreError: 24884:0:(file.c:4213:ll_inode_revalidate_fini()) soaked: revalidate FID [0x200000401:0x2:0x0] error: rc = -4
[27088.300797] perf: interrupt took too long (5045 > 4917), lowering kernel.perf_event_max_sample_rate to 39000
[62130.364898] perf: interrupt took too long (6328 > 6306), lowering kernel.perf_event_max_sample_rate to 31000
|
| Comment by Peter Jones [ 23/May/18 ] |
|
Dmitry, could you please investigate? Thanks, Peter |
| Comment by Cliff White (Inactive) [ 24/May/18 ] |
|
Working backwards to find a 'good' version. Saw the error once in 2.10.58_139_g630cd49 with a different result code (EIO instead of EINTR). |
| Comment by Dmitry Eremin (Inactive) [ 25/May/18 ] |
|
After several connection changes during recovery, it hangs with the following states: the last recovery restart was for the MGS, but it looks like REPLAY_LOCKS was not completed.
|
| Comment by Cliff White (Inactive) [ 29/May/18 ] |
|
So, is there a fix, or something to be done? Is there more data you require? This is halting soak testing ATM. |
| Comment by Dmitry Eremin (Inactive) [ 29/May/18 ] |
|
I cannot find the root cause yet. I'm looking into the logs and trying to understand what went wrong. Meanwhile, could you check the value of the following parameter:
lctl get_param osc.*.pinger_recov
What activity was on this cluster during these hangs?
|
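As a minimal sketch (assuming pdsh/dshbak are installed and a hypothetical "clients" host group; only the pinger_recov parameter named above is taken from the ticket), the value could be gathered from every soak client in one pass:

# Hypothetical host group; adjust to the actual soak client nodes
pdsh -g clients 'lctl get_param osc.*.pinger_recov' | dshbak -c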
| Comment by Cliff White (Inactive) [ 29/May/18 ] |
|
Cluster was running a large mix of jobs for the stress tests; multiple failovers were occurring.
]# lctl get_param osc.*.pinger_recov
osc.soaked-OST0000-osc-ffff880823116800.pinger_recov=1
osc.soaked-OST0001-osc-ffff880823116800.pinger_recov=1
osc.soaked-OST0002-osc-ffff880823116800.pinger_recov=1
osc.soaked-OST0003-osc-ffff880823116800.pinger_recov=1
osc.soaked-OST0004-osc-ffff880823116800.pinger_recov=1
osc.soaked-OST0005-osc-ffff880823116800.pinger_recov=1
|
| Comment by Cliff White (Inactive) [ 29/May/18 ] |
|
I can repeat the error and crash dump the systems, if that will help |
| Comment by Dmitry Eremin (Inactive) [ 29/May/18 ] |
|
It would be good if you could increase the debug log level while this issue occurs:
lctl set_param debug="+error +warning +info" debug_mb=1024
|
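A minimal sketch of one way to capture that extra debug around a reproduction, assuming the standard lctl debug commands and a hypothetical output path:

# Raise the debug mask and buffer size as suggested above
lctl set_param debug="+error +warning +info" debug_mb=1024
# Clear the ring buffer, reproduce the failover/hang, then dump the kernel debug log
lctl clear
# ... reproduce the hang here ...
lctl dk > /tmp/lustre-debug-$(hostname).log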
| Comment by Cliff White (Inactive) [ 29/May/18 ] |
|
Repeated the error with more debug, lustre log attached. There is a crash dump available on spirit - I dumped stack and forced a crash dump on a client. |
| Comment by Cliff White (Inactive) [ 30/May/18 ] |
|
I am testing today without MDS failures (OSS failover only); I believe this is triggered by MDS failover. |
| Comment by Cliff White (Inactive) [ 31/May/18 ] |
|
Ran successfully for 24 hours without MDS failover, so I believe the problem is only triggered by MDS failover. |
| Comment by Dmitry Eremin (Inactive) [ 31/May/18 ] |
|
I see the recovery process and lock replay:
(import.c:1498:ptlrpc_import_recovery_state_machine()) replay requested by soaked-MDT0000_UUID
(recover.c:88:ptlrpc_replay_next()) import ffff8804024da000 from soaked-MDT0000_UUID committed 25770229782 last 25770229782
(import.c:1502:ptlrpc_import_recovery_state_machine()) ffff8804024da000 soaked-MDT0000_UUID: changing import state from REPLAY to REPLAY_LOCKS
But then there is unexpected communication:
(osc_cache.c:955:osc_extent_wait()) extent ffff880412a6be40@{[0 -> 0/255], [3|0|+|rpc|wihY|ffff880058d16bc0], [28672|1|+|-|ffff8803ce09f500|256|ffff880827146eb0]} soaked-MDT0000-mdc-ffff88083f83d000: wait ext to 0 timedout, recovery in progress?
(osc_cache.c:955:osc_extent_wait()) ### extent: ffff880412a6be40 ns: soaked-MDT0000-mdc-ffff88083f83d000 lock: ffff8803ce09f500/0xe64f32d5b6631b06 lrc: 3/0,0 mode: PW/PW res: [0x20000240e:0x1ba06:0x0].0x0 bits 0x40/0x0 rrc: 3 type: IBT flags: 0x49400000000 nid: local remote: 0xfcd90309686d82d6 expref: -99 pid: 28799 timeout: 0 lvb_type: 0
After this it hangs. Could you attach logs from the MDSes? |
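As a minimal sketch (assuming the usual Lustre procfs parameters), the replay progress described above could also be watched live from the shell rather than reconstructed from logs:

# On the client: import state should move REPLAY -> REPLAY_LOCKS -> FULL
lctl get_param mdc.*.import | grep -E 'target:|state:'
# On the MDS: per-MDT recovery status
lctl get_param mdt.*.recovery_status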
| Comment by Cliff White (Inactive) [ 31/May/18 ] |
|
Yes, give me a bit to reproduce the error. Currently there is only one MDS. |
| Comment by Dmitry Eremin (Inactive) [ 31/May/18 ] |
|
Do you have DOM configured? |
| Comment by Cliff White (Inactive) [ 31/May/18 ] |
|
Yes, a percentage of the tests use DOM striping; basically the striping is randomly chosen: either normal striping, a PFL layout, or a DOM layout. |
| Comment by Cliff White (Inactive) [ 31/May/18 ] |
|
It works like this (see the verification sketch after the script):
coin=$((RANDOM % 2))
toss=$((RANDOM % 2))
# for 2.8 testing
# coin=1
# Use shift to get most random result (last bit)
# Set directory striping
if [ $coin -eq 1 ]; then
    lfs setstripe -c $((RANDOM % (nrdt + 1))) $dirname
else
    if [ $toss -eq 1 ]; then
        lfs setstripe -E 32M -c 1 -S 1M -E 1G -c 4 -E -1 -c -1 -S 4m $dirname
    else
        lfs setstripe -E 1M -L mdt -E EOF -c -1 $dirname
    fi
fi
|
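As a quick check (a hedged sketch; the path below is hypothetical), lfs getstripe reports which of the three layouts a given test directory actually received as its default:

# Show the default layout assigned to a soak test directory
lfs getstripe -d /mnt/soaked/testdir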
| Comment by Dmitry Eremin (Inactive) [ 31/May/18 ] |
|
Could you also check with DOM disabled? I tend to think this is an issue with DOM. |
| Comment by Cliff White (Inactive) [ 31/May/18 ] |
|
Yes, I can do this. |
| Comment by Dmitry Eremin (Inactive) [ 31/May/18 ] |
static int osc_extent_wait(const struct lu_env *env, struct osc_extent *ext,
                           enum osc_extent_state state)
{
        struct osc_object *obj = ext->oe_obj;
        struct l_wait_info lwi = LWI_TIMEOUT_INTR(cfs_time_seconds(600), NULL,
                                                  LWI_ON_SIGNAL_NOOP, NULL);
        int rc = 0;
        ENTRY;

        osc_object_lock(obj);
        LASSERT(sanity_check_nolock(ext) == 0);
        /* `Kick' this extent only if the caller is waiting for it to be
         * written out. */
        if (state == OES_INV && !ext->oe_urgent && !ext->oe_hp) {
                if (ext->oe_state == OES_ACTIVE) {
                        ext->oe_urgent = 1;
                } else if (ext->oe_state == OES_CACHE) {
                        ext->oe_urgent = 1;
                        osc_extent_hold(ext);
                        rc = 1;
                }
        }
        osc_object_unlock(obj);
        if (rc == 1)
                osc_extent_release(env, ext);

        /* wait for the extent until its state becomes @state */
        rc = l_wait_event(ext->oe_waitq, extent_wait_cb(ext, state), &lwi);
        if (rc == -ETIMEDOUT) {
                OSC_EXTENT_DUMP(D_ERROR, ext,
                        "%s: wait ext to %u timedout, recovery in progress?\n",
                        cli_name(osc_cli(obj)), state);

                lwi = LWI_INTR(NULL, NULL);
                rc = l_wait_event(ext->oe_waitq, extent_wait_cb(ext, state),   <===== It waits here forever.
                                  &lwi);
        }
        if (rc == 0 && ext->oe_rc < 0)
                rc = ext->oe_rc;
        RETURN(rc);
}
|
| Comment by Cliff White (Inactive) [ 31/May/18 ] |
|
I have started soak with only normal striping, no PFL or DOM. Will let that run for a while, then add PFL back in. |
| Comment by Cliff White (Inactive) [ 01/Jun/18 ] |
|
Ran through 9 MDT restarts/failovers (+ 6 hours) with no errors.
lfs setstripe -E 1M -L mdt -E EOF -c -1 $dirname |
| Comment by Cliff White (Inactive) [ 04/Jun/18 ] |
|
Test with only DOM, got the failure. Tested with only PFL, got the failure. Restarting now and will get some log dumps. |
| Comment by Cliff White (Inactive) [ 05/Jun/18 ] |
|
Replicated the error, but wasn't able to catch the log immediately after the failure (I fell asleep). However, I crash-dumped the MDS node with full debug. |
| Comment by Cliff White (Inactive) [ 05/Jun/18 ] |
|
Lustre logs attached |
| Comment by Dmitry Eremin (Inactive) [ 05/Jun/18 ] |
|
Where are the MDS logs and crash dump placed? Can I see them? |
| Comment by Cliff White (Inactive) [ 05/Jun/18 ] |
|
The crash dump and all logs are on the spirit cluster, in /scratch/logs and /scratch/dumps; do you have a login? |
| Comment by Cliff White (Inactive) [ 05/Jun/18 ] |
|
I have attached the console and system logs from the MDS (soak-8) to this bug. |
| Comment by Dmitry Eremin (Inactive) [ 05/Jun/18 ] |
|
Sorry, I don't have a login on spirit. Could you copy it somewhere on onyx? |
| Comment by Cliff White (Inactive) [ 05/Jun/18 ] |
|
You should be able to get a Spirit account quickly: file a DCO ticket and label it account-mgmnt. It usually happens in minutes; we've advised DCO, and they are ready to do it now. All they need is your public ssh key. |
| Comment by Cliff White (Inactive) [ 05/Jun/18 ] |
|
In the meantime, the crash dump is on onyx - /home/cliffwhi/lu-10961 - if you can't reach it, point me to a better directory and I'll put it there. |
| Comment by Dmitry Eremin (Inactive) [ 05/Jun/18 ] |
|
Thanks. I submitted the DCO ticket and now I'm copying the core from onyx... |
| Comment by Cliff White (Inactive) [ 05/Jun/18 ] |
|
Thanks. I forced the core dump several hours after the fault; it's difficult to catch, as the fault normally occurs in the middle of my night. |
| Comment by Mikhail Pershin [ 06/Jun/18 ] |
|
Cliff, what type of load is used in this testing? Is it something particular, e.g. 'dd' or 'tar', or a mix? I am thinking about the possibility of reproducing this with a simpler test. |
| Comment by Cliff White (Inactive) [ 06/Jun/18 ] |
|
I thought we had explained soak to you. The tests running at the time of failure were: |
| Comment by Mikhail Pershin [ 07/Jun/18 ] |
|
Yes, my question was just about the particular time when the error happened, so it is not possible to select any specific load causing it. In that case only logs at the moment of failure could make that clear. For this we can try to inject code which will cause an error immediately instead of a timeout. Dmitry already showed the place where the timeout happens; probably we can output more debug info there and return an error immediately instead of waiting, so the Lustre logs will contain useful data. |
| Comment by James Nunez (Inactive) [ 12/Jun/18 ] |
|
Mike - Would you please upload a patch with any necessary debug information that will cause an error, so we get a failure instead of waiting for a timeout? Thanks. |
| Comment by Gerrit Updater [ 13/Jun/18 ] |
|
Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/32710 |
| Comment by Cliff White (Inactive) [ 14/Jun/18 ] |
|
Hit the bug right away, lustre logs for two clients attached. Also dumped stacks and crash dumped those nodes, bits are on Spirit. Restarting with full debug. |
| Comment by Cliff White (Inactive) [ 14/Jun/18 ] |
|
In the same incident, soak-18/19 were hung, but not showing the _fini error. Lustre logs attached. Also lustre log from MDS. |
| Comment by Cliff White (Inactive) [ 14/Jun/18 ] |
|
Console logs from soak-17,42 and MDS (soak-8) attached. |
| Comment by Peter Jones [ 19/Jun/18 ] |
|
Mike, has the information provided any insight? Peter |
| Comment by Mikhail Pershin [ 03/Jul/18 ] |
|
Peter, yes, there are clues; I am working on a reproducer and a patch. |
| Comment by Mikhail Pershin [ 05/Jul/18 ] |
|
I was able to reproduce the lock replay issue with DoM files; a patch will be ready soon. The problem is lock cancellation before lock replay: the MDC cancels locks differently than the OSC, and that causes DoM locks to be handled improperly. Meanwhile, that doesn't explain the failover failure with a PFL-only file, because there is no MDC involved, so there may be another reason for that. |
| Comment by Gerrit Updater [ 05/Jul/18 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32791 |
| Comment by Mikhail Pershin [ 18/Jul/18 ] |
|
There is another problem with replay: if a DOM write causes a new component instantiation, then replay of both the layout change and the write RPC fails. I am investigating that. |
| Comment by Mikhail Pershin [ 18/Jul/18 ] |
|
It seems the problem is not in DOM but in PFL: when a new component is instantiated during failover, it is not replayed properly. It is worth creating a new ticket for that. |
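A rough reproduction sketch along those lines (an assumption-laden sketch only; the mount point, directory name, and sizes are hypothetical, and it relies on PFL components being instantiated on first write):

# Directory default PFL layout: 1 MiB single-striped component, then a wider one
lfs setstripe -E 1M -c 1 -E -1 -c 4 /mnt/soaked/pfl_test
# Writing past 1 MiB instantiates the second component; doing this while the
# MDT fails over exercises the layout-change + write replay path
dd if=/dev/zero of=/mnt/soaked/pfl_test/file bs=1M count=8 conv=fsync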
| Comment by Gerrit Updater [ 26/Jul/18 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32889 |
| Comment by Gerrit Updater [ 10/Sep/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32791/ |
| Comment by Peter Jones [ 10/Sep/18 ] |
|
Landed for 2.12 |