Details
- Bug
- Resolution: Fixed
- Critical
- Lustre 2.16.0
- None
- 3
- 9223372036854775807
Description
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/58a6b07c-fb1f-4a2d-ac3c-d7578d6b134f
test_820 failed with the following error:
trevis-28vm3 crashed during sanity test_820
[26282.338565] Lustre: server umount lustre-OST0004 complete
[26282.411017] ------------[ cut here ]------------
[26282.412061] kernel BUG at fs/jbd2/transaction.c:378!
[26282.413171] invalid opcode: 0000 [#1] SMP PTI
[26282.414068] CPU: 1 PID: 784404 Comm: kworker/1:5 4.18.0-477.15.1.el8_lustre.x86_64 #1
[26282.416473] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[26282.435639] Call Trace:
[26282.438083] jbd2__journal_start+0xee/0x1f0 [jbd2]
[26282.439047] jbd2_journal_start+0x19/0x20 [jbd2]
[26282.439979] flush_stashed_stats_work+0x36/0x90 [ldiskfs]
[26282.441086] process_one_work+0x1a7/0x360
[26282.442753] worker_thread+0x30/0x390
[26282.444311] kthread+0x134/0x150
Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4445 - 4.18.0-477.15.1.el8_8.x86_64
servers: https://build.whamcloud.com/job/lustre-master/4445 - 4.18.0-477.15.1.el8_lustre.x86_64
This started around 2023-07-21 +/- 7 days. It looks like the workqueue is somehow running after the journal is cleaned up, since the BUG is the JBD2_UNMOUNT check in start_this_handle():
int jbd2_journal_destroy(journal_t *journal)
{
        /* Wait for the commit thread to wake up and die. */
        journal_kill_thread(journal);
        :
}

static void journal_kill_thread(journal_t *journal)
{
        journal->j_flags |= JBD2_UNMOUNT;
        :
}

static int start_this_handle(journal_t *journal, handle_t *handle, gfp_t gfp_mask)
{
        :
        BUG_ON(journal->j_flags & JBD2_UNMOUNT);
        :
}
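To make the suspected ordering problem concrete, here is a minimal single-threaded userspace sketch of it. It is only a model of the hypothesis above, not kernel code and not the actual fix: every `model_*` name and `MODEL_UNMOUNT` is hypothetical, the "workqueue" is reduced to a pending flag, and the point where the kernel would hit `BUG_ON()` is modelled as a `-1` return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for JBD2_UNMOUNT; not the kernel definition. */
#define MODEL_UNMOUNT 0x1u

struct model_journal {
        unsigned int flags;
};

struct model_sb {
        struct model_journal journal;
        bool stats_work_pending; /* models a queued flush_stashed_stats_work */
        int stats_flushed;       /* how many times the work ran successfully */
};

/* Models start_this_handle(): returns -1 where the kernel would BUG_ON(). */
static int model_start_handle(struct model_journal *j)
{
        if (j->flags & MODEL_UNMOUNT)
                return -1;
        return 0;
}

/* Models the deferred work item, which needs a journal handle to run. */
static int model_flush_stats_work(struct model_sb *sb)
{
        if (model_start_handle(&sb->journal) != 0)
                return -1;
        sb->stats_flushed++;
        return 0;
}

/* Models the workqueue getting around to the pending item, if there is one. */
static int model_run_pending(struct model_sb *sb)
{
        if (!sb->stats_work_pending)
                return 0;
        sb->stats_work_pending = false;
        return model_flush_stats_work(sb);
}

/* Models journal_kill_thread() setting the unmount flag during destroy. */
static void model_journal_destroy(struct model_journal *j)
{
        j->flags |= MODEL_UNMOUNT;
}

/* Suspected ordering: the journal is destroyed while work is still queued. */
static int model_umount_unsafe(struct model_sb *sb)
{
        model_journal_destroy(&sb->journal);
        return model_run_pending(sb); /* work runs after the journal is gone */
}

/* Alternative ordering: drain the pending work before destroying the journal. */
static int model_umount_safe(struct model_sb *sb)
{
        int rc = model_run_pending(sb);

        assert(!sb->stats_work_pending); /* nothing may remain queued */
        model_journal_destroy(&sb->journal);
        return rc;
}
```

With a work item pending, `model_umount_unsafe()` returns `-1` (the modelled BUG), while `model_umount_safe()` returns `0` after running the work once. In the kernel the analogous draining step would presumably be something like `flush_work()` or `cancel_work_sync()` before `jbd2_journal_destroy()`, but whether that is how this ticket was resolved is not stated here.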
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_820 - trevis-28vm3 crashed during sanity test_820
Issue Links
- is related to: LU-16032 Truncate for large objects can lead to a thread hung (Resolved)