Details
Bug
Resolution: Unresolved
Medium
Lustre 2.18.0
None
3
Description
Sometime in March we started seeing large-scale test 3a crashes in interop testing with 2.15 servers:
[ 6991.492050] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == large-scale test 3a: recovery time, 2 clients ========= 03:12:04 \(1777518724\)
[ 6991.731468] Lustre: DEBUG MARKER: == large-scale test 3a: recovery time, 2 clients ========= 03:12:04 (1777518724)
[ 7067.178608] Lustre: DEBUG MARKER: /usr/sbin/lctl mark 1 : Starting failover on mds1
[ 7067.685548] Lustre: DEBUG MARKER: 1 : Starting failover on mds1
[ 7068.172506] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
[ 7068.454029] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
[ 7068.596718] Lustre: Failing over lustre-MDT0000
[ 7068.778816] LustreError: 534424:0:(lquota_entry.c:119:lqe_iter_cb()) ASSERTION( lqe->u.se.lse_pending_write == 0 ) failed:
[ 7068.780650] LustreError: 534424:0:(lquota_entry.c:119:lqe_iter_cb()) LBUG
[ 7068.781739] CPU: 0 PID: 534424 Comm: umount Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.117.1.el8_lustre.x86_64 #1
[ 7068.783568] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
[ 7068.784606] Call Trace:
[ 7068.785049] dump_stack+0x41/0x60
[ 7068.785573] lbug_with_loc.cold.7+0x5/0x43 [libcfs]
[ 7068.786322] lqe_iter_cb+0x133/0x140 [lquota]
[ 7068.787005] cfs_hash_for_each_tight+0x122/0x310 [obdclass]
[ 7068.787908] lquota_site_free+0xfe/0x2a0 [lquota]
[ 7068.788628] qsd_qtype_fini+0x352/0x460 [lquota]
[ 7068.789343] qsd_fini+0x1fe/0x420 [lquota]
[ 7068.789984] osd_shutdown+0x42/0x110 [osd_ldiskfs]
[ 7068.790736] osd_process_config+0x32e/0x380 [osd_ldiskfs]
[ 7068.791564] lod_process_config+0x205/0xe80 [lod]
[ 7068.792298] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 7068.793027] ? wait_for_completion+0xd2/0x100
[ 7068.793694] mdd_process_config+0x201/0x660 [mdd]
[ 7068.794439] mdt_stack_fini+0x351/0xa80 [mdt]
[ 7068.795159] mdt_device_fini+0xa3e/0xe90 [mdt]
[ 7068.795869] obd_precleanup.isra.33+0x8e/0x280 [obdclass]
[ 7068.796783] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 7068.797704] ? class_disconnect_exports+0x197/0x300 [obdclass]
[ 7068.798867] class_cleanup+0x32b/0x7e0 [obdclass]
[ 7068.799816] class_process_config+0x3bb/0x20a0 [obdclass]
[ 7068.800899] class_manual_cleanup+0x2ab/0x780 [obdclass]
[ 7068.801744] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 7068.802570] server_put_super+0xdb5/0x14a0 [ptlrpc]
[ 7068.803728] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 7068.804581] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 7068.805320] ? fsnotify_sb_delete+0x138/0x1c0
[ 7068.806003] generic_shutdown_super+0x6c/0x110
[ 7068.806873] kill_anon_super+0x14/0x30
[ 7068.807570] deactivate_locked_super+0x34/0x70
[ 7068.808353] cleanup_mnt+0x3b/0x70
[ 7068.808903] task_work_run+0x8a/0xb0
[ 7068.809462] exit_to_usermode_loop+0xf4/0x100
[ 7068.810286] do_syscall_64+0x1cb/0x1d0
[ 7068.811014] entry_SYSCALL_64_after_hwframe+0x66/0xcb
[ 7068.811946] RIP: 0033:0x7fa7efea68fb
[ 7068.812578] Code: ff d0 48 89 c7 b8 3c 00 00 00 0f 05 48 8b 0d 84 65 39 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d 65 39 00 f7 d8 64 89 01 48
[ 7068.815959] RSP: 002b:00007ffe7abe6498 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 7068.817303] RAX: 0000000000000000 RBX: 0000558221e369c0 RCX: 00007fa7efea68fb
[ 7068.818500] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000558221e3c3c0
[ 7068.819746] RBP: 0000000000000000 R08: 0000558221e3cf20 R09: 0000558221e31010
[ 7068.821055] R10: 0000000000000000 R11: 0000000000000246 R12: 0000558221e3c3c0
[ 7068.822378] R13: 00007fa7f0d29184 R14: 0000558221e36ba0 R15: 00000000ffffffff
[ 7068.823764] Kernel panic - not syncing: LBUG
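For triage, it may help to confirm that other console logs hit the same assertion via the same path. The following is a minimal sketch, not an existing Lustre tool: `summarize` is a hypothetical helper, and both regular expressions assume the message and stack-frame formats seen in the log above.

```python
import re

# Assumed format, taken from the log above:
# "LustreError: <pid>:<n>:(<file>:<line>:<func>()) ASSERTION( <expr> ) failed"
ASSERT_RE = re.compile(
    r"LustreError: \d+:\d+:\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.+?) \) failed"
)

# Assumed stack-frame format: "symbol+0xOFF/0xSIZE [module]" (module optional).
FRAME_RE = re.compile(
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?"
)


def summarize(console_log):
    """Return the first failed assertion and the frames that follow it.

    Returns None if the log contains no assertion in the assumed format.
    """
    match = ASSERT_RE.search(console_log)
    if match is None:
        return None
    # Only look at text after the assertion so earlier traces are ignored.
    tail = console_log[match.end():]
    frames = [(f.group("sym"), f.group("mod")) for f in FRAME_RE.finditer(tail)]
    return {
        "file": match.group("file"),
        "line": int(match.group("line")),
        "func": match.group("func"),
        "expr": match.group("expr"),
        "frames": frames,
    }
```

Two logs can then be compared on `(file, line, expr)` plus the first few module-tagged frames, which here would be `lqe_iter_cb`, `cfs_hash_for_each_tight`, and `lquota_site_free`.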
Example failures:
(first ever recorded): https://testing.whamcloud.com/test_sets/2c90a510-d83e-4017-9f7d-ff2cb78ea315
https://testing.whamcloud.com/test_sets/435eee2f-daf8-449c-973f-64473d2d65a1
https://testing.whamcloud.com/test_sets/3edd37f5-669e-4fb9-8471-af7692a993bb
https://testing.whamcloud.com/test_sets/c1d24572-e40d-4acd-b78f-174783b2dec2
https://testing.whamcloud.com/test_sets/ca3ff105-53ae-4901-8ceb-26367a0db604
https://testing.whamcloud.com/test_sets/bb0699b0-87c8-4d6f-ab55-f6b74c017d45
https://testing.whamcloud.com/test_sets/8552ec19-5d4a-4baa-912c-94e0373451ed