Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.13.0
-
None
-
3
-
9223372036854775807
Description
sanity test_60a hangs when running llog_test.c. This test started hanging on 18 February 2019 with 2.12.51.51.
Looking at the logs for one of the failures, https://testing.whamcloud.com/test_sets/41ea8004-348f-11e9-a05f-52540065bddc , the last thing seen in the client test_log is
trevis-10vm12: + RC=0 trevis-10vm12: + eval '/sbin/lctl <<-EOF || RC=2 trevis-10vm12: attach llog_test llt_name llt_uuid trevis-10vm12: ignore_errors trevis-10vm12: setup MGS trevis-10vm12: device llt_name trevis-10vm12: cleanup trevis-10vm12: detach trevis-10vm12: EOF' trevis-10vm12: ++ /sbin/lctl
Looking at the MDS console log, we see the test in hung in llog_test 10h
[ 2700.758677] Lustre: 29346:0:(llog_test.c:2045:llog_test_10()) 10h: write 64767 more log records [ 2700.758851] Lustre: 29365:0:(llog_test.c:1469:cat_check_old_cb()) seeing record at index 2 - [0x1:0x4e:0x0] in log [0xa:0x11:0x0] [ 2700.760785] LustreError: 29346:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 1317 sleeping [ 2880.228053] INFO: task jbd2/dm-3-8:6113 blocked for more than 120 seconds. [ 2880.228819] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2880.229589] jbd2/dm-3-8 D ffff9862a41e9040 0 6113 2 0x00000080 [ 2880.230319] Call Trace: [ 2880.230599] [<ffffffff85767c49>] schedule+0x29/0x70 [ 2880.231117] [<ffffffffc039c2bc>] jbd2_journal_commit_transaction+0x23c/0x19b0 [jbd2] [ 2880.231941] [<ffffffff8502a59e>] ? __switch_to+0xce/0x580 [ 2880.232676] [<ffffffff850d0880>] ? finish_task_switch+0x50/0x1c0 [ 2880.233269] [<ffffffff850c2d00>] ? wake_up_atomic_t+0x30/0x30 [ 2880.233825] [<ffffffff8576778f>] ? __schedule+0x3ff/0x890 [ 2880.234424] [<ffffffff850aa0de>] ? try_to_del_timer_sync+0x5e/0x90 [ 2880.235057] [<ffffffffc03a2e89>] kjournald2+0xc9/0x260 [jbd2] [ 2880.235611] [<ffffffff850c2d00>] ? wake_up_atomic_t+0x30/0x30 [ 2880.236169] [<ffffffffc03a2dc0>] ? commit_timeout+0x10/0x10 [jbd2] [ 2880.236834] [<ffffffff850c1c31>] kthread+0xd1/0xe0 [ 2880.237302] [<ffffffff850c1b60>] ? insert_kthread_work+0x40/0x40 [ 2880.237882] [<ffffffff85774c37>] ret_from_fork_nospec_begin+0x21/0x21 [ 2880.238500] [<ffffffff850c1b60>] ? insert_kthread_work+0x40/0x40
The sanity test 60a hang at https://testing.whamcloud.com/test_sets/80bd5466-339d-11e9-bd83-52540065bddc has a different call trace in the MDS console log
[ 4208.005408] Lustre: 9960:0:(llog_test.c:2045:llog_test_10()) 10h: write 64767 more log records [ 4208.005528] Lustre: 13221:0:(llog_test.c:1469:cat_check_old_cb()) seeing record at index 2 - [0x1:0x35:0x0] in log [0xa:0x15:0x0] [ 4208.008970] LustreError: 9960:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 1317 sleeping [ 4440.279385] INFO: task txg_quiesce:17743 blocked for more than 120 seconds. [ 4440.280697] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4440.282045] txg_quiesce D ffff9910f6268000 0 17743 2 0x00000080 [ 4440.283468] Call Trace: [ 4440.283971] [<ffffffffa2f67c49>] schedule+0x29/0x70 [ 4440.284879] [<ffffffffc048c2d5>] cv_wait_common+0x125/0x150 [spl] [ 4440.285989] [<ffffffffa28c2d00>] ? wake_up_atomic_t+0x30/0x30 [ 4440.287197] [<ffffffffc048c315>] __cv_wait+0x15/0x20 [spl] [ 4440.288234] [<ffffffffc05f2c2b>] txg_quiesce_thread+0x2fb/0x410 [zfs] [ 4440.289475] [<ffffffffc05f2930>] ? txg_init+0x2b0/0x2b0 [zfs] [ 4440.290514] [<ffffffffc0487063>] thread_generic_wrapper+0x73/0x80 [spl] [ 4440.291768] [<ffffffffc0486ff0>] ? __thread_exit+0x20/0x20 [spl] [ 4440.292837] [<ffffffffa28c1c31>] kthread+0xd1/0xe0 [ 4440.293814] [<ffffffffa28c1b60>] ? insert_kthread_work+0x40/0x40 [ 4440.294891] [<ffffffffa2f74c37>] ret_from_fork_nospec_begin+0x21/0x21 [ 4440.296120] [<ffffffffa28c1b60>] ? insert_kthread_work+0x40/0x40
The patch for LU-11591, https://review.whamcloud.com/#/c/33683/ , was the last patch to modify llog_test test 10 and landed around the time sanity test 60a started failing.
Logs for additional failures can be found at
https://testing.whamcloud.com/test_sets/6f903f48-348b-11e9-ae87-52540065bddc
https://testing.whamcloud.com/test_sets/86731228-3376-11e9-ae87-52540065bddc