[LU-17332] sanity test_820: kernel BUG at fs/jbd2/transaction.c:378 - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.16.0
Affects Version/s: Lustre 2.16.0
Labels: None

Severity: 3

Description

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/58a6b07c-fb1f-4a2d-ac3c-d7578d6b134f

test_820 failed with the following error:

trevis-28vm3 crashed during sanity test_820

[26282.338565] Lustre: server umount lustre-OST0004 complete
[26282.411017] ------------[ cut here ]------------
[26282.412061] kernel BUG at fs/jbd2/transaction.c:378!
[26282.413171] invalid opcode: 0000 [#1] SMP PTI
[26282.414068] CPU: 1 PID: 784404 Comm: kworker/1:5 4.18.0-477.15.1.el8_lustre.x86_64 #1
[26282.416473] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[26282.435639] Call Trace:
[26282.438083]  jbd2__journal_start+0xee/0x1f0 [jbd2]
[26282.439047]  jbd2_journal_start+0x19/0x20 [jbd2]
[26282.439979]  flush_stashed_stats_work+0x36/0x90 [ldiskfs]
[26282.441086]  process_one_work+0x1a7/0x360
[26282.442753]  worker_thread+0x30/0x390
[26282.444311]  kthread+0x134/0x150

Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4445 - 4.18.0-477.15.1.el8_8.x86_64
servers: https://build.whamcloud.com/job/lustre-master/4445 - 4.18.0-477.15.1.el8_lustre.x86_64

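This started around 2023-07-21 +/- 7 days. It looks like the workqueue is somehow running after the journal is cleaned up, since the BUG is the JBD2_UNMOUNT check in start_this_handle(), and that flag is set by journal_kill_thread() during jbd2_journal_destroy():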

int jbd2_journal_destroy(journal_t *journal)
{
        /* Wait for the commit thread to wake up and die. */
        journal_kill_thread(journal);
        :
}

static void journal_kill_thread(journal_t *journal)
{
        journal->j_flags |= JBD2_UNMOUNT;
        :
}

static int start_this_handle(journal_t *journal, handle_t *handle,
                             gfp_t gfp_mask)
{
        :
        BUG_ON(journal->j_flags & JBD2_UNMOUNT);
        :
}
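
The suspected ordering can be modelled in user space. The following is only a minimal sketch under stated assumptions, not the kernel code and not the fix that was landed for this ticket: every name (model_journal, model_work, MODEL_UNMOUNT, and the model_* functions) is hypothetical, and the "work item" is run synchronously instead of on a real workqueue. It shows that a queued work item which starts a handle after the unmount flag is set hits the BUG condition, while draining the work before setting the flag does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for JBD2_UNMOUNT. */
#define MODEL_UNMOUNT 0x1u

struct model_journal {
	unsigned int flags;
};

struct model_work {
	bool pending;                  /* queued but not yet run */
	struct model_journal *journal; /* journal the work will use */
};

/*
 * Mirrors the check in start_this_handle(): returns -1 where the
 * kernel would hit BUG_ON(), 0 otherwise.
 */
static int model_start_handle(struct model_journal *j)
{
	assert(j != NULL);
	if (j->flags & MODEL_UNMOUNT)
		return -1;
	return 0;
}

/*
 * Runs the work item if it is still pending. Returns 1 if there was
 * nothing to run, otherwise the result of starting a handle.
 */
static int model_run_work(struct model_work *w)
{
	if (!w->pending)
		return 1;
	w->pending = false;
	return model_start_handle(w->journal);
}

/* Teardown that ignores queued work: the ordering the trace suggests. */
static void model_destroy_unsafe(struct model_journal *j)
{
	j->flags |= MODEL_UNMOUNT;
}

/* Teardown that drains queued work before marking the journal unmounted. */
static int model_destroy_safe(struct model_journal *j, struct model_work *w)
{
	int rc = model_run_work(w); /* analogous to flushing the work item */

	j->flags |= MODEL_UNMOUNT;
	return rc;
}
```

In this model, calling model_destroy_unsafe() and then model_run_work() returns -1 (the BUG case), whereas model_destroy_safe() runs the work while the journal is still usable and leaves nothing pending afterwards.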

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_820 - trevis-28vm3 crashed during sanity test_820

Issue Links

is related to

LU-16032 Truncate for large objects can lead to a thread hung

Resolved

People

Assignee: Dongyang Li

Reporter: Maloo

Votes: 0

Watchers: 7

Dates

Created: 04/Dec/23 8:56 AM

Updated: 03/Jan/24 2:20 PM

Resolved: 03/Jan/24 2:20 PM