Details
Type: Bug
Resolution: Unresolved
Priority: Medium
Description
This issue was created by maloo for Dongyang Li <dongyangli@ddn.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/533d5466-45fa-4864-b7f8-91be02c74b8d
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/122939 - 4.18.0-553.89.1.el8_10.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/122939 - 4.18.0-553.89.1.el8_lustre.x86_64
After the filesystem is remounted read-only (for example, because of I/O errors), umount crashes the server by failing the following assertion in ldiskfs_put_super():
J_ASSERT(list_empty(&sbi->s_orphan))
[ 2138.284023] LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[ 2138.390362] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
[ 2138.391989] Lustre: Skipped 13 previous similar messages
[ 2138.396022] LDISKFS-fs error (device dm-3): mb_free_blocks:1962: group 10, block 225360:freeing already freed block (bit 16880); block bitmap corrupt.
[ 2138.398643] Aborting journal on device dm-3-8.
[ 2138.399620] LDISKFS-fs (dm-3): Remounting filesystem read-only
[ 2138.400761] LDISKFS-fs warning (device dm-3): ldiskfs_mb_generate_buddy:1239: group 10: block bitmap and bg descriptor inconsistent: 3466 vs 3468 free clusters 3946 in gd, 1 pa's
[ 2138.403685] LDISKFS-fs error (device dm-3) in osd_trans_stop:2340: IO failure
[ 2138.405139] LustreError: 178024:0:(osd_handler.c:2343:osd_trans_stop()) lustre-MDT0000: failed to stop transaction: rc = -5
[ 2138.543692] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
[ 2138.673575] LustreError: 178198:0:(obd_sysfs.c:269:health_check_show()) lustre-MDT0000-osd: device reported unhealthy
[ 2138.882232] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mmp test_10: @@@@@@ FAIL: mds1 is in a unhealthy state, got: \'NOT HEALTHY\'
[ 2139.045746] Lustre: DEBUG MARKER: mmp test_10: @@@@@@ FAIL: mds1 is in a unhealthy state, got: 'NOT HEALTHY'
[ 2139.254692] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-2/2026-03-27/lustre-reviews_review-dne-subtest-change_122939_104_1853660e-67b8-4caf-b360-19cdfc75c178//mmp.test_10.debug_log.$(hostname -s).1774591669.log; dmesg > /autotest/autotest-2/2026-03-27/lustre-reviews_rev
[ 2141.187399] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
[ 2141.447256] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
[ 2141.700069] LustreError: 178640:0:(osd_handler.c:2455:osd_drop_preallocated_space()) lustre-MDT0000: can't truncate: rc=-30
[ 2141.709895] LDISKFS-fs (dm-3): Inode 576 (00000000b61feaea): orphan list check failed!
[ 2142.807525] LDISKFS-fs (dm-3): sb orphan head is 583
[ 2142.808414] sb_info orphan list:
[ 2142.808984] inode dm-3:583 at 0000000086f83954: mode 100644, nlink 1, next 582
[ 2142.810203] inode dm-3:582 at 00000000dc887a10: mode 100644, nlink 1, next 581
[ 2142.811431] inode dm-3:581 at 000000001f4f448f: mode 100644, nlink 1, next 580
[ 2142.812653] inode dm-3:580 at 00000000bc6f781b: mode 100644, nlink 1, next 579
[ 2142.813868] inode dm-3:579 at 00000000d7aea715: mode 100644, nlink 1, next 578
[ 2142.815096] inode dm-3:578 at 00000000df14a9ba: mode 100644, nlink 1, next 577
[ 2142.816316] inode dm-3:577 at 000000000c18ec88: mode 100644, nlink 1, next 576
[ 2142.817562] inode dm-3:576 at 000000009cc4214f: mode 100644, nlink 1, next 0
[ 2142.818786] ------------[ cut here ]------------
[ 2142.819597] kernel BUG at /tmp/rpmbuild-lustre-jenkins-Ui9P732t/BUILD/lustre-2.17.51_23_gb36f204/ldiskfs/super.c:1186!
[ 2142.821340] invalid opcode: 0000 [#1] SMP PTI
[ 2142.822086] CPU: 0 PID: 178640 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.89.1.el8_lustre.x86_64 #1
[ 2142.824039] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
[ 2142.825143] RIP: 0010:ldiskfs_put_super+0x3b3/0x3c0 [ldiskfs]
[ 2142.826121] Code: 85 00 04 00 00 48 8b 40 68 83 60 60 fb 0f b7 85 a0 00 00 00 66 41 89 44 24 3a 41 f6 45 50 01 0f 85 3e fd ff ff e9 2c fd ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8d af
[ 2142.829076] RSP: 0018:ffffbf09c56279f8 EFLAGS: 00010293
[ 2142.829957] RAX: ffff9f23aaa01858 RBX: ffff9f23a9c89720 RCX: 0000000000000000
[ 2142.831128] RDX: 0000000000000000 RSI: ffff9f23fbc1e698 RDI: ffff9f23fbc1e698
[ 2142.832306] RBP: ffff9f23a9c89000 R08: 0000000000000000 R09: c0000000ffff7fff
[ 2142.833467] R10: 0000000000000001 R11: ffffbf09c5627810 R12: ffff9f23a9c89720
[ 2142.834626] R13: ffff9f2395f29800 R14: ffffffffc13fcc70 R15: ffff9f23c23b8520
[ 2142.835785] FS: 00007f1685a95080(0000) GS:ffff9f23fbc00000(0000) knlGS:0000000000000000
[ 2142.837094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2142.838044] CR2: 000055a1bc589ee8 CR3: 0000000128e6a004 CR4: 0000000000170ef0
[ 2142.839210] Call Trace:
[ 2142.839664] ? __die_body+0x1a/0x60
[ 2142.840270] ? die+0x2a/0x50
[ 2142.840788] ? do_trap+0xe7/0x110
[ 2142.841373] ? ldiskfs_put_super+0x3b3/0x3c0 [ldiskfs]
[ 2142.842255] ? do_invalid_op+0x36/0x40
[ 2142.842905] ? ldiskfs_put_super+0x3b3/0x3c0 [ldiskfs]
[ 2142.843786] ? invalid_op+0x14/0x20
[ 2142.844396] ? ldiskfs_put_super+0x3b3/0x3c0 [ldiskfs]
[ 2142.845273] generic_shutdown_super+0x6c/0x110
[ 2142.846034] kill_block_super+0x21/0x50
[ 2142.846700] deactivate_locked_super+0x34/0x70
[ 2142.847456] cleanup_mnt+0x3b/0x70
[ 2142.848050] osd_umount+0x7d/0x1b0 [osd_ldiskfs]
[ 2142.848845] osd_device_fini+0x1b8/0x210 [osd_ldiskfs]
[ 2142.849723] obd_precleanup.isra.33+0x8e/0x280 [obdclass]
[ 2142.850670] ? class_disconnect_exports+0x2d6/0x2f0 [obdclass]
[ 2142.851680] class_cleanup+0x326/0x7d0 [obdclass]
[ 2142.852536] class_process_config+0x3bb/0x20b0 [obdclass]
[ 2142.853485] ? class_manual_cleanup+0x1c8/0x780 [obdclass]
[ 2142.854446] ? __kmalloc+0x113/0x250
[ 2142.855063] class_manual_cleanup+0x2a2/0x780 [obdclass]
[ 2142.855997] ? __queue_work+0x145/0x3f0
[ 2142.856661] osd_obd_disconnect+0x12c/0x140 [osd_ldiskfs]
[ 2142.857586] lustre_put_lsi_free+0x10a/0x570 [obdclass]
[ 2142.858513] lustre_put_lsi+0x17c/0x1e0 [obdclass]
[ 2142.859368] server_put_super+0x263/0x14a0 [ptlrpc]
[ 2142.860278] ? fsnotify_sb_delete+0x138/0x1c0
[ 2142.861020] ? __dentry_kill+0x121/0x170
[ 2142.861702] generic_shutdown_super+0x6c/0x110
[ 2142.862463] kill_anon_super+0x14/0x30
[ 2142.863108] deactivate_locked_super+0x34/0x70
[ 2142.863871] cleanup_mnt+0x3b/0x70
[ 2142.864466] task_work_run+0x8a/0xb0
[ 2142.865083] exit_to_usermode_loop+0xf4/0x100
[ 2142.865828] do_syscall_64+0x1cb/0x1d0
[ 2142.866481] entry_SYSCALL_64_after_hwframe+0x66/0xcb
[ 2142.867322] RIP: 0033:0x7f16849ee8fb
[ 2142.867956] Code: ff d0 48 89 c7 b8 3c 00 00 00 0f 05 48 8b 0d 84 65 39 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d 65 39 00 f7 d8 64 89 01 48
[ 2142.870924] RSP: 002b:00007ffff1aa1508 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
[ 2142.872154] RAX: 0000000000000000 RBX: 0000564cd4aea400 RCX: 00007f16849ee8fb
[ 2142.873318] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000564cd4aeff50
[ 2142.874495] RBP: 0000000000000001 R08: 0000564cd4af9c90 R09: 00007f1684d85bc0
[ 2142.875670] R10: 0000000000000007 R11: 0000000000000202 R12: 0000564cd4aeff50
[ 2142.876835] R13: 00007f1685871184 R14: 0000564cd4af0100 R15: 00000000ffffffff
[ 2142.878012] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common sb_edac kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl joydev i2c_i801 virtio_balloon pcspkr lpc_ich sunrpc ext4 mbcache jbd2 ahci libahci libata crc32c_intel virtio_net serio_raw net_failover failover virtio_blk [last unloaded: dm_flakey]
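For triage, the orphan list dump printed just before the BUG can be walked mechanically to confirm which inodes were still on the in-memory list at umount. The following is a minimal sketch, not part of Lustre or its test framework: the names `parse_orphan_dump`, `walk_chain`, and `SAMPLE` are made up for illustration, and the regular expressions are derived only from the message format seen in this log, so other kernel versions may print the dump differently.

```python
import re

# Excerpt of the orphan list dump from the console log in this ticket
# (timestamps dropped; the regexes below do not depend on them).
SAMPLE = """
LDISKFS-fs (dm-3): sb orphan head is 583
sb_info orphan list:
inode dm-3:583 at 0000000086f83954: mode 100644, nlink 1, next 582
inode dm-3:582 at 00000000dc887a10: mode 100644, nlink 1, next 581
inode dm-3:581 at 000000001f4f448f: mode 100644, nlink 1, next 580
inode dm-3:580 at 00000000bc6f781b: mode 100644, nlink 1, next 579
inode dm-3:579 at 00000000d7aea715: mode 100644, nlink 1, next 578
inode dm-3:578 at 00000000df14a9ba: mode 100644, nlink 1, next 577
inode dm-3:577 at 000000000c18ec88: mode 100644, nlink 1, next 576
inode dm-3:576 at 000000009cc4214f: mode 100644, nlink 1, next 0
"""

# Assumed message formats, taken from the lines above.
HEAD_RE = re.compile(r"sb orphan head is (\d+)")
ENTRY_RE = re.compile(
    r"inode \S+:(\d+) at [0-9a-f]+: mode (\d+), nlink (\d+), next (\d+)"
)


def parse_orphan_dump(text):
    """Return (head inode number or None, {inode: next inode}) from log text."""
    head_match = HEAD_RE.search(text)
    head = int(head_match.group(1)) if head_match else None
    next_of = {
        int(ino): int(nxt) for ino, _mode, _nlink, nxt in ENTRY_RE.findall(text)
    }
    return head, next_of


def walk_chain(head, next_of):
    """Follow 'next' links from the head until 0, a missing entry, or a loop."""
    chain, seen, ino = [], set(), head
    while ino and ino in next_of and ino not in seen:
        chain.append(ino)
        seen.add(ino)
        ino = next_of[ino]
    return chain


head, next_of = parse_orphan_dump(SAMPLE)
chain = walk_chain(head, next_of)
print(f"sb orphan head {head}; {len(chain)} inodes still on the list: {chain}")
```

Because the patterns are matched with `findall` over the whole text, the same functions work on a raw `dmesg` capture whether or not the messages are on separate lines.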