fast_read/stale data/reclaim workaround causes SIGBUS

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.15.0

    Description

      The fast_read stale data workaround from LU-14541 can cause applications to receive a spurious SIGBUS when reclaim runs concurrently with the page fault handler for mmap'ed files.

          Activity

            [LU-15815] fast_read/stale data/reclaim workaround causes SIGBUS

            spiechurski Sebastien Piechurski added a comment -

            We tested this patch on a 2.12.6 base, and disabling fast_read is catastrophic for the performance of our customer code.

            LU-14541 had already been reverted several months ago (due to the SIGBUS errors), but we recently found that the revert was causing corruption on mmap'ed pages. Disabling fast_read on top of it causes a 5x slowdown of the application.

            So yes, please, can we find another solution?

            panda Andrew Perepechko added a comment -

            > Is that work in progress?

            Yes, it is.

            jhammond John Hammond added a comment -

            > Can we find another solution please?

            panda shared some ideas on LU-15819. Is that work in progress?

            pjones Peter Jones added a comment -

            Reopening until this discussion is settled

            spitzcor Cory Spitz added a comment -

            jhammond, can we re-open this ticket? It makes no sense to me to revert LU-14541 when you've confirmed that doing so will reintroduce a data corruption. Can we find another solution please?

            pjones Peter Jones added a comment -

            Landed for 2.15


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47204/
            Subject: LU-15815 llite: disable fast_read and workaround
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 201ade9442828fbb3bedb3b31154d51ead10af41

            jhammond John Hammond added a comment -

            Here is a reliable if inconvenient reproducer:

            $LUSTRE/tests/llmount.sh
            lctl set_param debug_mb=512 debug='+trace page mmap'
            lctl set_param llite.*.max_read_ahead_mb=0 # Not needed to reproduce.
            
            yum install openmpi openmpi-devel
            mv /usr/lib64/openmpi /mnt/lustre/openmpi
            ln -s /mnt/lustre/openmpi /usr/lib64/openmpi
            cd /mnt/lustre
            wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.9.tar.gz
            tar -xzf osu-micro-benchmarks-5.9.tar.gz
            cd osu-micro-benchmarks-5.9
            ./configure CC=/usr/lib64/openmpi/bin/mpicc CXX=/usr/lib64/openmpi/bin/mpicxx && make -j4
            
            while true; do echo 3 > /proc/sys/vm/drop_caches ; done &
            lctl clear
            while /mnt/lustre/openmpi/bin/mpirun --allow-run-as-root -np 2 --oversubscribe --host k /mnt/lustre/osu-micro-benchmarks-5.9/mpi/collective/osu_alltoall -f -m 65536; do
              true
            done
            lctl dk > /tmp/osu_alltoall.dk
            kill %1
            
            ...
            [k:18404] *** Process received signal ***
            [k:18404] Signal: Bus error (7)
            [k:18404] Signal code: Non-existant physical address (2)
            [k:18404] Failing at address: 0x7fdb20a62e73
            [k:18404] [ 0] /lib64/libpthread.so.0(+0xf5f0)[0x7fdb1fcf45f0]
            [k:18404] [ 1] /lib64/ld-linux-x86-64.so.2(+0x19d72)[0x7fdb20f88d72]
            [k:18404] [ 2] /lib64/ld-linux-x86-64.so.2(+0x8ae2)[0x7fdb20f77ae2]
            [k:18404] [ 3] /lib64/ld-linux-x86-64.so.2(+0x14254)[0x7fdb20f83254]
            [k:18404] [ 4] /lib64/ld-linux-x86-64.so.2(+0xf784)[0x7fdb20f7e784]
            [k:18404] [ 5] /lib64/ld-linux-x86-64.so.2(+0x13b3b)[0x7fdb20f82b3b]
            [k:18404] [ 6] /lib64/libdl.so.2(+0xeeb)[0x7fdb2084beeb]
            [k:18404] [ 7] /lib64/ld-linux-x86-64.so.2(+0xf784)[0x7fdb20f7e784]
            [k:18404] [ 8] /lib64/libdl.so.2(+0x14ed)[0x7fdb2084c4ed]
            [k:18404] [ 9] /lib64/libdl.so.2(dlopen+0x31)[0x7fdb2084bf81]
            [k:18404] [10] /usr/lib64/openmpi/lib/libopen-pal.so.13(+0x59edd)[0x7fdb20aa8edd]
            [k:18404] [11] /usr/lib64/openmpi/lib/libopen-pal.so.13(+0x3c7d1)[0x7fdb20a8b7d1]
            [k:18404] [12] /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_component_find+0x78a)[0x7fdb20a8cd4a]
            [k:18404] [13] /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_components_register+0x56)[0x7fdb20a96cb6]
            [k:18404] [14] /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_register+0x196)[0x7fdb20a97166]
            [k:18404] [15] /usr/lib64/openmpi/lib/libopen-pal.so.13(mca_base_framework_open+0x12)[0x7fdb20a971c2]
            [k:18404] [16] /usr/lib64/openmpi/lib/openmpi/mca_ess_hnp.so(+0x4a48)[0x7fdb1eadea48]
            [k:18404] [17] /usr/lib64/openmpi/lib/libopen-rte.so.12(orte_init+0x168)[0x7fdb20d09398]
            [k:18404] [18] /mnt/lustre/openmpi/bin/mpirun[0x40449f]
            [k:18404] [19] /mnt/lustre/openmpi/bin/mpirun[0x40361d]
            [k:18404] [20] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fdb1f939505]
            [k:18404] [21] /mnt/lustre/openmpi/bin/mpirun[0x403539]
            [k:18404] *** End of error message ***
            Bus error (core dumped)
            # grep SIGBUS /tmp/osu_alltoall.dk
            00000080:00008000:3.0:1651610586.302378:0:18404:0:(vvp_io.c:1353:vvp_io_kernel_fault()) got addr 00007fdb20a62000 - SIGBUS
            

            Trimmed logs are attached as osu_alltoall_trimmed.dk. PID 18404 is osu_alltoall, and PID 16737 is the bash loop writing to drop_caches.
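
            When a debug log contains many such lines, it may help to pull out just the PID and the faulting page for each SIGBUS event. The following is a minimal sketch, not part of the original reproducer. The helper names sigbus_events and page_base are hypothetical, the PID is assumed to be the sixth colon-separated field (as in the sample line above, where it is 18404), and 4 KiB pages are assumed.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the log layout is
# assumed to match the sample vvp_io_kernel_fault() line shown above.

# Read debug log lines on stdin and print "PID ADDR" for every
# vvp_io_kernel_fault() line that reports SIGBUS.
sigbus_events() {
    awk -F: '/vvp_io_kernel_fault/ && / - SIGBUS/ {
        # Split the whole line on whitespace; the address is the word
        # that precedes "- SIGBUS", i.e. the third word from the end.
        n = split($0, w, " ")
        print $6, w[n-2]
    }'
}

# Round a hex address down to the start of its page (4 KiB assumed)
# and print it zero-padded to 16 hex digits, like the debug log does.
page_base() {
    printf '%016x\n' $(( $1 & ~0xfff ))
}
```

            For example, `grep SIGBUS /tmp/osu_alltoall.dk | sigbus_events` lists the affected PIDs and pages, and `page_base 0x7fdb20a62e73` maps the "Failing at address" value from the Open MPI trace to the page address reported by vvp_io_kernel_fault().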


            gerrit Gerrit Updater added a comment -

            "John L. Hammond <jhammond@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47204
            Subject: LU-15815 llite: disable fast_read and workaround
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d1720e2b774bb4137324fb9e80c3d6151e3b9c0f

            People

              panda Andrew Perepechko
              jhammond John Hammond