Lustre / LU-19862

jbd2_journal_commit_transaction BUG with rhel10 and dirty metadata buffer


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Lustre 2.18.0
    • Lustre 2.17.0

      Now that RHEL10 is in the testing roster for master, we see conf-sanity regularly die on umount (in different tests) with a jbd2_journal_commit_transaction BUG like this:

      [31450.854523] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
      [31451.152706] JBD2: Spotted dirty metadata buffer (dev = dm-3, blocknr = 25239). There's a risk of filesystem corruption in case of system crash.
      [31451.153459] JBD2: Spotted dirty metadata buffer (dev = dm-3, blocknr = 25243). There's a risk of filesystem corruption in case of system crash.
      [31451.153681] JBD2: Spotted dirty metadata buffer (dev = dm-3, blocknr = 25243). There's a risk of filesystem corruption in case of system crash.
      [31451.154483] JBD2: Spotted dirty metadata buffer (dev = dm-3, blocknr = 25243). There's a risk of filesystem corruption in case of system crash.
      [31451.154678] JBD2: Spotted dirty metadata buffer (dev = dm-3, blocknr = 25235). There's a risk of filesystem corruption in case of system crash.
      [31451.154958] JBD2: Spotted dirty metadata buffer (dev = dm-3, blocknr = 25246). There's a risk of filesystem corruption in case of system crash.
      [31451.155203] ------------[ cut here ]------------
      [31451.155382] kernel BUG at fs/jbd2/commit.c:1026!
      [31451.155553] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      [31451.167136] CPU: 1 UID: 0 PID: 1673316 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G        W  OE     -------  ---  6.12.0-55.43.1_lustre.el10.x86_64 #1
      [31451.169101] Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
      [31451.170011] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-2.el9_5.1 04/01/2014
      [31451.171050] RIP: 0010:jbd2_journal_commit_transaction+0x11b6/0x15f0 [jbd2]
      [31451.172097] Code: 94 24 98 00 00 00 8b b4 24 84 00 00 00 48 89 df e8 2f 99 00 00 e9 4c fd ff ff 44 89 ee 48 89 df e8 6f 88 00 00 e9 25 fd ff ff <0f> 0b be 01 00 00 00 48 89 df e8 5b 88 00 00 e9 ad fc ff ff 4c 89
      [31451.174765] RSP: 0018:ff50364e49d97cd0 EFLAGS: 00010202
      [31451.175548] RAX: 0000000000116003 RBX: ff4906544eb39800 RCX: ff4906544004b000
      [31451.176594] RDX: 0000000000000001 RSI: ff50364e49d97c98 RDI: ff4906544eb39c14
      [31451.177636] RBP: ff49065490b11000 R08: ff4906549ac14340 R09: 00001c99870dbc00
      [31451.178682] R10: ff49065440992010 R11: 000000000016e360 R12: ff4906537113a3a8
      [31451.179727] R13: ff49065475969708 R14: ff4906544eb39c14 R15: ff49065475969710
      [31451.180764] FS:  0000000000000000(0000) GS:ff490654bbb00000(0000) knlGS:0000000000000000
      [31451.181932] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [31451.182799] CR2: 0000563902629780 CR3: 0000000105bf0001 CR4: 0000000000771ef0
      [31451.183852] PKRU: 55555554
      [31451.184305] Call Trace:
      [31451.184701]  <TASK>
      [31451.185044]  ? srso_alias_return_thunk+0x5/0xfbef5
      [31451.185768]  ? show_trace_log_lvl+0x255/0x2f0
      [31451.186434]  ? show_trace_log_lvl+0x255/0x2f0
      [31451.187092]  ? kjournald2+0xaa/0x250 [jbd2]
      [31451.187735]  ? jbd2_journal_commit_transaction+0x11b6/0x15f0 [jbd2]
      [31451.188677]  ? __die_body.cold+0x8/0x12
      [31451.189265]  ? die+0x2e/0x50
      [31451.189730]  ? do_trap+0xca/0x110
      [31451.190258]  ? do_error_trap+0x65/0x80
      [31451.190830]  ? jbd2_journal_commit_transaction+0x11b6/0x15f0 [jbd2]
      [31451.191758]  ? exc_invalid_op+0x50/0x70
      [31451.192361]  ? jbd2_journal_commit_transaction+0x11b6/0x15f0 [jbd2]
      [31451.193286]  ? asm_exc_invalid_op+0x1a/0x20
      [31451.193921]  ? jbd2_journal_commit_transaction+0x11b6/0x15f0 [jbd2]
      [31451.194845]  ? srso_alias_return_thunk+0x5/0xfbef5
      [31451.195573]  ? srso_alias_return_thunk+0x5/0xfbef5
      [31451.196294]  ? finish_task_switch.isra.0+0x99/0x2b0
      [31451.197019]  kjournald2+0xaa/0x250 [jbd2]
      [31451.197638]  ? __pfx_autoremove_wake_function+0x10/0x10
      [31451.198426]  ? __pfx_kjournald2+0x10/0x10 [jbd2]
      [31451.199122]  kthread+0xcf/0x100
      [31451.199623]  ? __pfx_kthread+0x10/0x10
      [31451.200201]  ret_from_fork+0x31/0x50
      [31451.200760]  ? __pfx_kthread+0x10/0x10
      [31451.201332]  ret_from_fork_asm+0x1a/0x30
      [31451.201929]  </TASK>
      [31451.202291] Modules linked in: ofd(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey ec(OE) nfsv3 nfs_acl tls rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_localio netfs rfkill intel_rapl_msr intel_rapl_common sunrpc kvm_amd ccp kvm iTCO_wdt iTCO_vendor_support virtio_net net_failover failover pcspkr i2c_i801 virtio_balloon i2c_smbus lpc_ich joydev fuse loop dm_mod nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci ext4 mbcache jbd2 ahci libahci libata crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_blk serio_raw

      Example failures:

      https://testing.whamcloud.com/test_sets/31a697a5-8113-4951-8c57-560520740455

      https://testing.whamcloud.com/test_sets/df2aead1-0975-44c1-bb91-e2cc7cdd9182

      https://testing.whamcloud.com/test_sets/f681627f-35fd-45fe-8626-dc86b790a13e

      https://testing.whamcloud.com/test_sets/a25747e7-6f51-4c43-ad15-c405d9bdfd72
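      When comparing the failures, it may help to check whether the same blocks are flagged each time. Below is a minimal Python sketch, assuming the console log is saved as plain text and the JBD2 warning keeps exactly the format shown in the excerpt; the function name dirty_blocks and the script usage are hypothetical and not part of any Lustre tooling.

```python
import re
import sys
from collections import Counter

# Matches the JBD2 warning as printed in the console log of this ticket.
# Assumption: the message text is the same on every kernel being compared.
WARN_RE = re.compile(
    r"JBD2: Spotted dirty metadata buffer "
    r"\(dev = (?P<dev>[^,]+), blocknr = (?P<blk>\d+)\)"
)


def dirty_blocks(log_text: str) -> Counter:
    """Count 'Spotted dirty metadata buffer' warnings per (device, block number)."""
    counts: Counter = Counter()
    for m in WARN_RE.finditer(log_text):
        counts[(m.group("dev"), int(m.group("blk")))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 dirty_blocks.py console.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        result = dirty_blocks(f.read())
    for (dev, blk), n in sorted(result.items()):
        print(f"{dev} block {blk}: {n} warning(s)")
```

      Run against the excerpt above, this would report blocks 25235, 25239 and 25246 once each and block 25243 three times, all on dm-3.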

            bzzz Alex Zhuravlev
            green Oleg Drokin
