[LU-16616] crash in osc_brw_prep_request() ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_SIZE) ... Created: 03/Mar/23 Updated: 15/Apr/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Tao Lyu | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | client | ||
| Environment: |
Three server nodes and one client. Kernel version: Ubuntu-5.4.0-90.101
|
||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
An assertion in the Lustre client is triggered by the following calls, leading to a kernel crash while executing the write syscall shown below: |
| Comments |
| Comment by Patrick Farrell [ 16/Mar/23 ] |
|
Tao, Can you please share the crash messages as well? Specifically the stack trace and LBUG. Also, what Lustre version are you running? (The poc is appreciated, but we need some more general info.) |
| Comment by Tao Lyu [ 16/Mar/23 ] |
|
Sure.
Lustre commit: 9ddcdee2c8b9ec14986b93cf3180d946cd4869f7
The stack trace:
root@dfs:~#
[ 154.265547] LustreError: 298:0:(osc_request.c:1819:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 0/2 pg: 000000005a02f487 off: 0, count: 3595
[ 154.268801] LustreError: 298:0:(osc_request.c:1819:osc_brw_prep_request()) LBUG
[ 154.269714] Kernel panic - not syncing: LBUG
[ 154.270224] CPU: 3 PID: 298 Comm: ptlrpcd_00_00 Tainted: G O 5.4.148+ #7
[ 154.271135] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 154.272152] Call Trace:
[ 154.272455] dump_stack+0x50/0x63
[ 154.272875] panic+0xfb/0x2bc
[ 154.274264] lbug_with_loc.cold+0x2c/0x2c [libcfs]
[ 154.275445] osc_brw_prep_request+0x5214/0x6d20 [osc]
[ 154.280285] osc_build_rpc+0x1487/0x3770 [osc]
[ 154.281284] osc_io_unplug0+0x2f0d/0x5110 [osc]
[ 154.286077] brw_queue_work+0xbe/0x220 [osc]
[ 154.287007] work_interpreter+0xb3/0x340 [ptlrpc]
[ 154.287904] ptlrpc_check_set+0x1244/0x7a90 [ptlrpc]
[ 154.291430] ptlrpcd+0x1296/0x23c0 [ptlrpc]
[ 154.298351] kthread+0xfb/0x130
[ 154.299703] ret_from_fork+0x1f/0x40
[ 154.300279] Kernel Offset: 0xa400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 154.301513] ---[ end Kernel panic - not syncing: LBUG ]--- |
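For readers following the trace, the failed condition can be evaluated in isolation. The snippet below is a minimal user-space sketch, not the Lustre source: `struct sketch_page`, `sketch_page_layout_ok()` and `SKETCH_PAGE_SIZE` are hypothetical names, the page size is assumed to be 4096 bytes, and `poff` is assumed to be the data offset within the page. Only the boolean expression and the `ergo()` implication are taken from the assertion text above.

```c
#include <stdbool.h>

/* Assumption: 4 KiB pages; the kernel's real PAGE_SIZE may differ. */
#define SKETCH_PAGE_SIZE 4096u

/* Logical implication: ergo(a, b) reads "a implies b". */
#define ergo(a, b) (!(a) || (b))

/* Hypothetical stand-in for the per-page fields the assertion inspects. */
struct sketch_page {
	unsigned int poff;  /* assumed: offset of the data within the page */
	unsigned int count; /* number of bytes used in the page */
};

/* Returns true when page i (0-based) of page_count passes the check. */
static bool sketch_page_layout_ok(const struct sketch_page *pg,
				  int i, int page_count)
{
	return page_count == 1 ||
	       (ergo(i == 0, pg->poff + pg->count == SKETCH_PAGE_SIZE) &&
		ergo(i > 0 && i < page_count - 1,
		     pg->poff == 0 && pg->count == SKETCH_PAGE_SIZE) &&
		ergo(i == page_count - 1, pg->poff == 0));
}
```

With the values from the log (i: 0/2, off: 0, count: 3595), the first page of a two-page request would have to end exactly on a page boundary, but 0 + 3595 is not 4096, so the check fails under these assumptions.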
| Comment by Patrick Farrell [ 16/Mar/23 ] |
|
Tao, Why are you running based on 9ddcdee2c8b9ec14986b93cf3180d946cd4869f7 ? Are you intending to test an unreleased version? That's a recent-ish pull of our development branch. We appreciate it if people want to test it, but we don't recommend it for production. Our current public maintenance release is b2_15. |
| Comment by Patrick Farrell [ 16/Mar/23 ] |
|
Tao, With the poc - it looks like maybe you converted that directly from the strace using some tool? It's very odd to see "syscall(__NR_mmap, ... )" rather than mmap(). Are you able to replace any part of the poc with text and/or symbolic representations instead of all those hex values? For example, it creates a file called 'tmpfile', but that string appears nowhere in poc.c, so I assume it must be encoded in there. |
| Comment by Tao Lyu [ 16/Mar/23 ] |
|
Hi, Patrick, We are developing a bug-finding tool for distributed systems. In order to detect the newest bugs, we run the latest development version. Yes, this is generated by our tool. It calls syscalls directly instead of going through libc. |
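To illustrate the difference discussed here, the following is a rough sketch of the same anonymous mapping made through the libc wrapper and through the raw `syscall()` interface. It is not taken from poc.c: the helper names `map_via_libc()` and `map_via_raw_syscall()` are hypothetical, and the raw variant assumes a 64-bit Linux target (such as x86_64) where `SYS_mmap` is defined and takes the same six arguments as `mmap()`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes syscall() and MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Usual form: call the libc wrapper. */
static void *map_via_libc(size_t len)
{
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Generated-code form: invoke the syscall number directly.
 * Arguments are widened to long because syscall() is variadic. */
static void *map_via_raw_syscall(size_t len)
{
	long ret = syscall(SYS_mmap, NULL, len,
			   (long)(PROT_READ | PROT_WRITE),
			   (long)(MAP_PRIVATE | MAP_ANONYMOUS),
			   -1L, 0L);

	/* syscall() returns -1 on error, which matches MAP_FAILED. */
	return (void *)ret;
}
```

Both helpers should return equivalent mappings on such a system. The raw form avoids any libc-side handling, which is presumably why a tool that replays recorded syscalls would emit it.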
| Comment by Patrick Farrell [ 17/Mar/23 ] |
|
Tao, Ahh, thank you for explaining. OK - I'll have to set up an Ubuntu VM on my end, since the poc works fine on my RHEL-kernel-based system. It is probably a difference in kernel versions. Is it practical for you to collect debug logs from the client if I give you instructions? It's not particularly difficult. |
| Comment by Tao Lyu [ 17/Mar/23 ] |
|
Sure, glad to help collect the debug logs. |
| Comment by Tao Lyu [ 11/Apr/23 ] |
|
Hi Patrick, Would you mind debugging and fixing this bug? Best, |