LU-17075

replay-dual test_16 ZFS MDS crash: osd_declare_destroy() ASSERTION(dt_object_exists)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.16.0

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/33a011e7-eb8f-4b7c-a3d7-b77e6350ffb6

      test_16 failed with the following crash on master 10 times out of 123 runs:

      trevis-24vm4 MDS crashed during replay-dual test_16
      
      [ 2046.669261] Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
      [ 2046.670797] Lustre: 77088:0:(genops.c:1579:class_disconnect_stale_exports()) lustre-MDT0000: disconnect stale client 95a79532-73af-49b8-a4c8-d22c598e5614@<unknown>
      [ 2046.673531] Lustre: lustre-MDT0000: disconnecting 1 stale clients
      [ 2046.676737] LustreError: 77088:0:(osd_object.c:765:osd_declare_destroy()) ASSERTION( dt_object_exists(dt) ) failed: 
      [ 2046.678755] LustreError: 77088:0:(osd_object.c:765:osd_declare_destroy()) LBUG
      [ 2046.680139] Pid: 77088, comm: tgt_recover_0 4.18.0-477.15.1.el8_lustre.x86_64 #1 SMP Tue Aug 1 06:59:39 UTC 2023
      [ 2046.682036] Call Trace TBD:
      [ 2046.682768] [<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
      [ 2046.683842] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
      [ 2046.684807] [<0>] osd_declare_destroy+0x313/0x470 [osd_zfs]
      [ 2046.685989] [<0>] out_destroy_add_exec+0x60/0x250 [ptlrpc]
      [ 2046.687594] [<0>] update_recovery_exec.isra.22+0x1017/0x1f00 [ptlrpc]
      [ 2046.688937] [<0>] distribute_txn_replay_handle+0x332/0xd60 [ptlrpc]
      [ 2046.690247] [<0>] replay_request_or_update.isra.32+0x3dc/0xa20 [ptlrpc]
      [ 2046.691612] [<0>] target_recovery_thread+0x742/0x1330 [ptlrpc]
      [ 2046.692838] [<0>] kthread+0x134/0x150
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master-next/729 - 4.18.0-372.32.1.el8_6.x86_64
      servers: https://build.whamcloud.com/job/lustre-master-next/729 - 4.18.0-372.32.1.el8_lustre.x86_64

      This started failing on 2023-08-30 after a large set of patch landings:

      40c404129b LU-17038 tests: remove unused compile.sh script
      57a671d991 LU-17038 tests: remove mlink utility
      a0d8023d07 LU-12678 lnet: discard lnet_kvaddr_to_page
      e16d5d7d6f LU-17043 enc: fix osd lookup cache for long encrypted names
      b01f58e3a9 LU-15367 llite: iotrace standardization
      a897803cc7 LU-17039 build: cleanup ib_dma_map_sg
      4774d1fab1 LU-17038 tests: remove munlink utility
      a52545afed LU-16510 build: check if CONFIG_FORTIFY_SOURCE is defined
      479f9585b0 LU-17036 utils: make sure resize option is legit
      f4c336ad93 LU-17031 build: fix refefine __compiletime_strlen error
      e40be98400 LU-17030 llite: allow setting max_cached_mb to a %
      b557fb21c8 LU-10885 docs: note flock now being enabled by default
      9758129177 LU-17015 gss: support large kerberos token on client
      7f60b2b558 LU-17006 lnet: set up routes for going across subnets
      8a9c503c00 LU-16766 obdclass: trim kernel thread names in jobids
      39df815cd6 LU-17020 kernel: update RHEL 9.2 [5.14.0-284.25.1.el9_2]
      b1739ba3fa LU-17013 lov: fill FIEMAP_EXTENT_LAST flag
      f75ac594a2 LU-17011 utils: monotonic clock in lfs mirror
      77b133766b LU-17009 tests: fix runtests to read file name with backslash
      1db59b7b60 LU-17000 lnet: remove redundant errno check in liblnetconfig.c
      8e53a0ea59 LU-16866 tests: Use wait_update to check LNet recovery state
      05b289450b LU-17000 misc: remove Coverity annotations
      3c0b1fbf34 LU-16984 tests: replay-dual/31 checks file from DIR2
      d684885098 LU-16961 clang: plugins and build system integration
      a1bc7a84ce LU-16605 lfs: Add -n option to fid2path
      74140e5df4 LU-16943 tests: fix replay-single/135 under hard failure mode
      0439aaadb1 LU-16936 auster: add --client-only option
      7ff1a88126 LU-16883 ldiskfs: update for ext4-delayed-iput for RHEL9.0
      b9ce342ee1 LU-16896 flr: resync should not change file size
      c9e752c141 LU-16906 build: Server for newer SUSE 15 SP3 kernels
      2f0ebff4ad LU-16477 ldiskfs: Add ext4-enc-flag patch for SUSE 15 SP5
      d6673e5456 LU-16821 llite: report 1MiB directory blocksize
      8d24aa6b8e LU-16816 obdclass: make import_event more robust
      99144a595b LU-16232 script: fix the argument parse
      88141538c4 LU-9859 libcfs: discard cfs_gettok and cfs_str2num_check
      8785f25b05 LU-16552 test: add new lnet test for Multi-Rail setups
      d0a722cb8f LU-16374 ldiskfs: implement security.encdata xattr
      fe5706e0c1 LU-16235 hsm: check CDT state before adding actions llog
      7270e16fcb LU-15526 mdt: enable remote PDO lock
      324aa79eb5 LU-13730 tests: add file mirroring to racer
      1288681bb8 LU-14361 statahead: add statahead advise IOCTL
      e656cccbdc LU-14156 utils: mirror split to check for last in-sync early
      e2738d294d LU-12645 llite: Move readahead debug before exit
      1f4151ba71 LU-6142 lov: cleanup unneeded macros from lov_request.c
      ef414ce6e7 LU-6142 ptlrpc: Fix style issues for layout.c
      017cb44d26 LU-6142 ptlrpc: Fix style issues for events.c
      00c74b245f LU-16847 ldiskfs: refactor code.
      afe5813a48 LU-16077 ptlrpc: Fix ptlrpc_body_v2 with pb_uid/pb_gid
      91a3b286ba LU-16827 obdfilter: Fix obdfilter-survery/1a
      5c59f8551a LU-11457 osd-ldiskfs: scrub FID reuse
      d69b511af6 LU-16097 tests: skip quota subtests in interop
      576928b2cd LU-13306 mgs: support large NID for mgs_write_log_osc_to_lov
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      replay-dual test_16 - trevis-24vm4 crashed during replay-dual test_16

          Activity

            adilger Andreas Dilger added a comment -

            It looks like even though the tests were run recently, they had old parents w/o the patch - 2.15.58.96 and 2.15.58.187, while the patch was landed as 2.15.59.18 two months ago.

            bzzz Alex Zhuravlev added a comment -

            It looks like this crash is still hit on master:

            not sure how this is possible - there is no such assertion in the code:

            762)  ENTRY;
            763) 
            764)  LASSERT(th != NULL);
            765)  if (unlikely(obj->oo_dn == NULL))
            766)          RETURN(-ENOENT);
            767) 
            768)  oh = container_of(th, struct osd_thandle, ot_super);
            769)  LASSERT(oh->ot_tx != NULL);
            770) 
            771)  dmu_tx_mark_netfree(oh->ot_tx);
            772) 
            773)  /* declare that we'll remove object from fid-dnode mapping */
            774)  zapid = osd_get_name_n_idx(env, osd, fid, NULL, 0, &dn);
            775)  osd_tx_hold_zap(oh->ot_tx, zapid, dn, FALSE, NULL);
            776) 
            777)  osd_declare_xattrs_destroy(env, obj, oh);

            adilger Andreas Dilger added a comment -

            It looks like this crash is still hit on master:
            https://testing.whamcloud.com/test_sets/6ed7a4d7-bdc8-4ec1-9c40-d153616b520e
            https://testing.whamcloud.com/test_sets/9d19737b-46e0-4b5e-8f09-a58c39713ae0
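
            For reference, the patch tracked in this ticket ("LU-17075 osd: destroy declare shouldn't panic") appears to replace the fatal assertion at osd_object.c:765 with an error return, which matches the patched excerpt quoted above. A rough sketch of the change, not the verbatim commit:

            /* osd-zfs osd_declare_destroy(): sketch of the LU-17075 change.
             * The old code asserted that the object exists and LBUGged the MDS
             * when a destroy was replayed for an already-missing object:
             *
             *      LASSERT(dt_object_exists(dt));
             *
             * The patched code fails the declaration instead, so update-log
             * replay can handle the missing object without panicking:
             */
            if (unlikely(obj->oo_dn == NULL))
                    RETURN(-ENOENT);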
            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52496/
            Subject: LU-17075 osd: destroy declare shouldn't panic
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: fa370c0d724b5a90e2d739e5d3c67066facf550b

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52496/ Subject: LU-17075 osd: destroy declare shouldn't panic Project: fs/lustre-release Branch: master Current Patch Set: Commit: fa370c0d724b5a90e2d739e5d3c67066facf550b

            adilger Andreas Dilger added a comment -

            Hit the same crash:

            [ 1560.700808] Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
            [ 1560.702249] Lustre: 85844:0:(genops.c:1481:class_disconnect_stale_exports()) lustre-MDT0000: disconnect stale client fb74e030-180b-4830-b914-49fe8734c041@<unknown>
            [ 1560.704865] Lustre: lustre-MDT0000: disconnecting 1 stale clients
            [ 1560.708933] LustreError: 85844:0:(osd_object.c:765:osd_declare_destroy()) ASSERTION( dt_object_exists(dt) ) failed: 
            [ 1560.710896] LustreError: 85844:0:(osd_object.c:765:osd_declare_destroy()) LBUG
            [ 1560.712243] Pid: 85844, comm: tgt_recover_0 4.18.0-477.21.1.el8_lustre.x86_64 #1 SMP Sat Sep 23 17:41:02 UTC 2023
            [ 1560.714111] Call Trace TBD:
            [ 1560.714734] [<0>] libcfs_call_trace+0x63/0x90 [libcfs]
            [ 1560.715734] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
            [ 1560.716641] [<0>] osd_declare_destroy+0x313/0x470 [osd_zfs]
            [ 1560.717711] [<0>] out_destroy_add_exec+0x64/0x260 [ptlrpc]
            [ 1560.718927] [<0>] update_recovery_exec.isra.22+0x1017/0x1f00 [ptlrpc]
            [ 1560.720204] [<0>] distribute_txn_replay_handle+0x332/0xd60 [ptlrpc]
            [ 1560.721441] [<0>] replay_request_or_update.isra.30+0x3d8/0xa50 [ptlrpc]
            [ 1560.722746] [<0>] target_recovery_thread+0x742/0x1340 [ptlrpc]
            [ 1560.723913] [<0>] kthread+0x134/0x150

            It makes me wonder if we should do something on the server to identify which OST/MDT target a thread is working on, to help debug issues like this. Currently we name threads like "mdt_io01_001", i.e. "svc_type_cpt_thr", so we are consuming 6 characters of the thread name for CPT identification. It might be more useful to shrink the CPT identification (1 base64 char for the CPT and 2 base64 chars for the thread number, using base64url_table) and change the thread name to include the target index or something, like "mdt_ioBAB0001"?

            I don't know if there is a cost to changing current->comm, but it might be more than just "snprintf(current->comm, ...)", as I see there are "__get_task_comm()" and "__set_task_comm()". I'm not sure if there is something else we could do to get extra information printed into a stack trace? Possibly something in LASSERT/LBUG that could determine the current target, maybe print if the thread is holding a DLM lock, etc...
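
            A minimal userspace sketch of the naming idea above. The helper name, field widths, and the base64url table below are illustrative assumptions, not existing Lustre code; with CPT 1, thread 1 and target index 1 it produces a name in the "mdt_ioBAB0001" style, which still fits in the 15 usable bytes of current->comm (TASK_COMM_LEN is 16).

            #include <stdio.h>

            /* illustrative only: URL-safe base64 alphabet for compact CPT/thread encoding */
            static const char base64url_table[] =
                    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

            /* build "svc" + 1 char CPT + 2 chars thread number + 4-digit target index */
            static void format_thread_name(char *buf, size_t len, const char *svc,
                                           unsigned int cpt, unsigned int thr,
                                           unsigned int tgt_index)
            {
                    snprintf(buf, len, "%s%c%c%c%04u", svc,
                             base64url_table[cpt & 0x3f],          /* CPT: 0..63 */
                             base64url_table[(thr >> 6) & 0x3f],   /* thread: 0..4095 */
                             base64url_table[thr & 0x3f],
                             tgt_index);
            }

            int main(void)
            {
                    char comm[16];  /* sized like TASK_COMM_LEN */

                    format_thread_name(comm, sizeof(comm), "mdt_io", 1, 1, 1);
                    printf("%s\n", comm);   /* prints mdt_ioBAB0001 */
                    return 0;
            }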

            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52496
            Subject: LU-17075 osd: destroy declare shouldn't panic
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7fcefa44274dd3754053f1323695e4c6e6290ea4

            gerrit Gerrit Updater added a comment - "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52496 Subject: LU-17075 osd: destroy declare shouldn't panic Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 7fcefa44274dd3754053f1323695e4c6e6290ea4
            laisiyao Lai Siyao added a comment -

            It may be a directory created in an earlier test; let's revert this change.
            laisiyao Lai Siyao added a comment -

            It's a bit weird: replay-dual test_16() doesn't call mkdir, and all the test files are located on MDT0, but the backtrace shows it's update-log-based replay.

            bzzz Alex Zhuravlev added a comment -

            I'm having a similar problem locally, bisected down to:

            7270e16fcb    5    3    0        LU-15526 mdt: enable remote PDO lock

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 6