Description
Running the reproducer from LU-14541 (rw_seq_cst_vs_drop_caches.c) fails about 50% of the time with Lustre 2.15.1 (both client and servers).
[root@mutt21:toss-5803-sigbus]# ./run_test /p/olaf{a,b}/faaland1/test/sigbustest
++ ./rw_seq_cst_vs_drop_caches /p/olafa/faaland1/test/sigbustest /p/olafb/faaland1/test/sigbustest
u = 60, v = { 60, 59 }
./run_test: line 11: 120055 Aborted (core dumped) ./rw_seq_cst_vs_drop_caches $1 $2
++ status=134
++ signum=6
++ case $signum in
++ echo FAIL with SIGBUS
FAIL with SIGBUS
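The status=134 and signum=6 values in the trace follow the shell convention that a child killed by signal N is reported with exit status 128 + N. The following is a minimal sketch of that decoding, assuming bash on Linux. decode_status is a hypothetical helper introduced only for illustration; it is not taken from the run_test script, whose source is not shown here.

```shell
#!/bin/bash
# Hypothetical helper (not part of the actual run_test script).
# The shell reports a child terminated by signal N as exit status 128 + N.
decode_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        local signum=$((status - 128))
        # "kill -l N" prints the name of signal N without the SIG prefix.
        echo "signal $signum (SIG$(kill -l "$signum"))"
    else
        echo "exit code $status"
    fi
}

decode_status 134   # prints "signal 6 (SIGABRT)"
```

Signal 6 is SIGABRT, which matches the "Aborted (core dumped)" message printed by the shell in the trace.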
Although it is not yet confirmed to be the same issue, two users have reported jobs intermittently dying with a bus error while using Lustre for I/O, which is what prompted me to run the reproducer against Lustre 2.15.1.
A fix is provided in the upcoming 2.15.3 release. Marking as a duplicate, but the topllnl label will only be removed once LLNL has confirmed that the fix is effective.