[LU-15815] fast_read/stale data/reclaim workround causes SIGBUS Created: 03/May/22 Updated: 19/Jan/24 Resolved: 09/May/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | John Hammond | Assignee: | Andrew Perepechko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||||||||||
| Description |
|
The fast_read stale data workaround from |
| Comments |
| Comment by Gerrit Updater [ 03/May/22 ] |
|
"John L. Hammond <jhammond@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47204 |
| Comment by John Hammond [ 03/May/22 ] |
|
Here is a reliable if inconvenient reproducer: $LUSTRE/tests/llmount.sh lctl set_param debug_mb=512 debug='+trace page mmap' lctl set_param llite.*.max_read_ahead_mb=0 # Not needed to reproduce. yum install openmpi openmpi-devel mv /usr/lib64/openmpi /mnt/lustre/openmpi ln -s /mnt/lustre/openmpi /usr/lib64/openmpi cd /mnt/lustre wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.9.tar.gz tar -xzf osu-micro-benchmarks-5.9.tar.gz cd osu-micro-benchmarks-5.9 ./configure CC=/usr/lib64/openmpi/bin/mpicc CXX=/usr/lib64/openmpi/bin/mpicxx && make -j4 while true; do echo 3 > /proc/sys/vm/drop_caches ; done & lctl clear while /mnt/lustre/openmpi/bin/mpirun --allow-run-as-root -np 2 --oversubscribe --host k /mnt/lustre/osu-micro-benchmarks-5.9//mpi/\ collective/osu_alltoall -f -m 65536; do true done lctl dk > /tmp/osu_alltoall.dk kill %1 ... [k:18404] *** Process received signal *** [k:18404] Signal: Bus error (7) [k:18404] Signal code: Non-existant physical address (2) [k:18404] Failing at address: 0x7fdb20a62e73 [k:18404] [ 0] /lib64/libpthread.so.0(+0xf5f0)[0x7fdb1fcf45f0] [k:18404] [ 1] /lib64/ld-linux-x86-64.so.2(+0x19d72)[0x7fdb20f88d72] [k:18404] [ 2] /lib64/ld-linux-x86-64.so.2(+0x8ae2)[0x7fdb20f77ae2] [k:18404] [ 3] /lib64/ld-linux-x86-64.so.2(+0x14254)[0x7fdb20f83254] [k:18404] [ 4] /lib64/ld-linux-x86-64.so.2(+0xf784)[0x7fdb20f7e784] [k:18404] [ 5] /lib64/ld-linux-x86-64.so.2(+0x13b3b)[0x7fdb20f82b3b] [k:18404] [ 6] /lib64/libdl.so.2(+0xeeb)[0x7fdb2084beeb] [k:18404] [ 7] /lib64/ld-linux-x86-64.so.2(+0xf784)[0x7fdb20f7e784] [k:18404] [ 8] /lib64/libdl.so.2(+0x14ed)[0x7fdb2084c4ed] [k:18404] [ 9] /lib64/libdl.so.2(dlopen+0x31)[0x7fdb2084bf81] [k:18404] [10] /usr/lib64/openmpi/lib/libopen-pal.so.13(+0x59edd)[0x7fdb20aa8edd] [k:18404] [11] /usr/lib64/openmpi/lib/libopen-pal.so.13(+0x3c7d1)[0x7fdb20a8b7d1] [k:18404] [12] /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_component_find+0x78a)[0x7fdb20a8cd4a] [k:18404] [13] /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_components_register+0x56)[0x7fdb20a96cb6] [k:18404] [14] /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_register+0x196)[0x7fdb20a97166] [k:18404] [15] /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x12)[0x7fdb20a971c2] [k:18404] [16] /usr/lib64/openmpi/lib/openmpi/mca_ess_hnp.so(+0x4a48)[0x7fdb1eadea48] [k:18404] [17] /usr/lib64/openmpi/lib/libopen-rte.so.12(orte_init+0x168)[0x7fdb20d09398] [k:18404] [18] /mnt/lustre/openmpi/bin/mpirun[0x40449f] [k:18404] [19] /mnt/lustre/openmpi/bin/mpirun[0x40361d] [k:18404] [20] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fdb1f939505] [k:18404] [21] /mnt/lustre/openmpi/bin/mpirun[0x403539] [k:18404] *** End of error message *** Bus error (core dumped) # grep SIGBUS /tmp/osu_alltoall.dk 00000080:00008000:3.0:1651610586.302378:0:18404:0:(vvp_io.c:1353:vvp_io_kernel_fault()) got addr 00007fdb20a62000 - SIGBUS Trimmed logs are attached as osu_alltoall_trimmed.dk. PID 18404 is osu_alltoall, 16737 is bash wrring to drop_caches. |
| Comment by Gerrit Updater [ 05/May/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47204/ |
| Comment by Peter Jones [ 05/May/22 ] |
|
Landed for 2.15 |
| Comment by Cory Spitz [ 06/May/22 ] |
|
jhammond, can we re-open this ticket? It makes no sense to me to revert |
| Comment by Peter Jones [ 06/May/22 ] |
|
Reopening until this discussion is settled |
| Comment by John Hammond [ 06/May/22 ] |
|
> Can we find another solution please? panda shared some ideas on |
| Comment by Andrew Perepechko [ 06/May/22 ] |
Yes, it is. |
| Comment by Sebastien Piechurski [ 06/May/22 ] |
|
We tested this patch on a 2.12.6 basis, and disabling fast_read is actually catastrophic in terms of performance on our customer code. The revert of So, yes please can we find another solution ? |
| Comment by John Hammond [ 06/May/22 ] |
|
> The revert of Where? On what branch? > but we recently found that it was causing the corruptions on mmap'ed pages. Do you mean that > So, yes please can we find another solution ? There seems to be a lot of hopeful use of "we" in this ticket. |
| Comment by John Hammond [ 06/May/22 ] |
|
With [ 179.609120] mmap_fault_vs_d (11006): drop_caches: 3
[ 179.613492] mmap_fault_vs_d (11006): drop_caches: 3
[ 179.618134] mmap_fault_vs_d (11006): drop_caches: 3
[ 179.621452] LustreError: 11007:0:(osc_cache.c:2495:osc_teardown_async_page()) extent 000000009717064a@{[3 -> 3/1023], [2|0|-|cache|wi|00000000b2cc94b9], [28672|1|+|-|00000000d41a65d9|1024|00000000baa546d7]} trunc at 3.
[ 179.622752] LustreError: 11007:0:(osc_cache.c:2495:osc_teardown_async_page()) ### extent: 000000009717064a ns: lustre-OST0000-osc-ffff8ba637eb9000 lock: 00000000d41a65d9/0xa4961a626f53297d lrc: 4/0,1 mode: PW/PW res: [0x19:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 12288->16383) gid 0 flags: 0x800020000000000 nid: local remote: 0xa4961a626f532984 expref: -99 pid: 11007 timeout: 0 lvb_type: 1
[ 179.622988] mmap_fault_vs_d (11006): drop_caches: 3
[ 179.625126] LustreError: 11007:0:(osc_page.c:182:osc_page_delete()) page@00000000285d5729[3 00000000214d642c 5 1 00000000baa546d7]
[ 179.625126]
[ 179.626325] LustreError: 11007:0:(osc_page.c:182:osc_page_delete()) vvp-page@000000002564fdc0(1:0) vm@00000000c9fa97b2 17ffffc0001035 2:0 ffff8ba5f1774140 3 lru
[ 179.626325]
[ 179.627359] LustreError: 11007:0:(osc_page.c:182:osc_page_delete()) lov-page@00000000b1b9d4b5
[ 179.627359]
[ 179.628033] LustreError: 11007:0:(osc_page.c:182:osc_page_delete()) osc-page@00000000a3522c1f 3: 1< 0x845fed 2 + - > 2< 12288 0 4096 0x0 0x40420 | 00000000baa546d7 00000000cd5d95e4 00000000b2cc94b9 > 3< 0 0 0 > 4< 0 0 8 667238400 - | - - + - > 5< - - + - | 0 - | 1 - ->
[ 179.628033]
[ 179.629656] LustreError: 11007:0:(osc_page.c:182:osc_page_delete()) end page@00000000285d5729
[ 179.629656]
[ 179.629826] mmap_fault_vs_d (11006): drop_caches: 3
[ 179.630301] LustreError: 11007:0:(osc_page.c:182:osc_page_delete()) Trying to teardown failed: -16
[ 179.630302] LustreError: 11007:0:(osc_page.c:183:osc_page_delete()) ASSERTION( 0 ) failed:
[ 179.630304] LustreError: 11007:0:(osc_page.c:183:osc_page_delete()) LBUG
[ 179.632209] Pid: 11007, comm: mmap_fault_vs_d 4.18.0-348.7.1.el8.x86_64 #1 SMP Thu Mar 3 10:39:00 CST 2022
[ 179.632823] Call Trace TBD:
[ 179.633101] [<0>] libcfs_call_trace+0x8f/0xe0 [libcfs]
[ 179.633455] [<0>] lbug_with_loc+0x53/0xb0 [libcfs]
[ 179.633774] [<0>] osc_page_delete+0x666/0x670 [osc]
[ 179.634124] [<0>] cl_page_delete0+0x9d/0x2d0 [obdclass]
[ 179.634491] [<0>] cl_page_delete+0x3e/0x130 [obdclass]
[ 179.634516] mmap_fault_vs_d (11006): drop_caches: 3
[ 179.634854] [<0>] ll_invalidatepage+0xc0/0x230 [lustre]
[ 179.634862] [<0>] truncate_cleanup_page+0x8d/0x170
[ 179.635861] [<0>] generic_error_remove_page+0x31/0x80
[ 179.636274] [<0>] vvp_page_discard+0x49/0x120 [lustre]
[ 179.636640] [<0>] cl_page_discard+0x5e/0xc0 [obdclass]
[ 179.636993] [<0>] cl_page_list_discard+0x59/0x160 [obdclass]
[ 179.637394] [<0>] ll_io_read_page+0x27c/0xd00 [lustre]
[ 179.637749] [<0>] ll_readpage+0x133/0xb70 [lustre]
[ 179.638076] [<0>] filemap_fault+0x84c/0xa60
[ 179.638369] [<0>] vvp_io_fault_start+0x53f/0x12b0 [lustre]
[ 179.638753] [<0>] cl_io_start+0x66/0x190 [obdclass]
[ 179.638928] mmap_fault_vs_d (11006): drop_caches: 3
[ 179.639107] [<0>] cl_io_loop+0xde/0x2a0 [obdclass]
[ 179.639755] [<0>] ll_fault+0x916/0xc60 [lustre]
[ 179.640066] [<0>] __do_fault+0x48/0x130
[ 179.640323] [<0>] do_fault+0x110/0x660
[ 179.640572] [<0>] __handle_mm_fault+0xa0f/0xdb0
[ 179.640873] [<0>] handle_mm_fault+0x11e/0x310
[ 179.641168] [<0>] __do_page_fault+0x317/0x900
[ 179.641461] [<0>] do_page_fault+0x87/0x300
[ 179.641737] [<0>] page_fault+0x1e/0x30
See mmap_fault_vs_drop_caches3.c to reproduce. |
| Comment by Sebastien Piechurski [ 07/May/22 ] |
|
>> The revert of >Where? On what branch? The site is ECMWF, and it was done on a 2.12.6 basis. I unfortunately don't have the whole history as I was not involved in these at the time and this was handled between the local team and DDN directly. >> but we recently found that it was causing the corruptions on mmap'ed pages. >Do you mean that Reverting it caused the corruption, or at least we get the corruptions with it reverted. As we only understood recently that the crash/hang issues we observed were related to in-memory data corruption, it is actually only an assumption that the revert caused it. There is a DDN support case open for this. >> So, yes please can we find another solution ? >There seems to be a lot of hopeful use of "we" in this ticket. Obviously, this last "we" was probably more a "you". As much as I would like to provide solutions, my understanding of the memory management subsystem is too limited for this.
Regards,
Sebastien. |
| Comment by Patrick Farrell [ 09/May/22 ] |
|
So, we landed a patch from John here to resolve the SIGBUS issue by removing the clearpageuptodate() call in vvp_page_delete, ie, reverting So, that means the SIGBUS issue is resolved, but we have to find another way to solve So, I'm going to close this ticket as resolved, and let's move the discussion to |
| Comment by Patrick Farrell [ 09/May/22 ] |
|
Resolved by revert of |