[LU-14737] Crash sanity-flr (test_70) Created: 06/Jun/21  Updated: 07/Jun/21  Resolved: 07/Jun/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Arshad Hussain Assignee: Alex Zhuravlev
Resolution: Duplicate Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This was seen on run: https://testing.whamcloud.com/test_sets/89c7487b-d679-40c9-9ac6-68e2fbfcfd0f
Kernel Crash: https://testing.whamcloud.com/test_logs/105a1a83-85ed-4f43-b7fe-f38dda113d56/show_text

This could be a one-off incident. I am not sure it happens every time. A local run passes.

[ 8551.885179] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-flr test 70: mirror create and split race ================================================== 06:58:52 \(1622876332\)
[ 8552.278315] Lustre: DEBUG MARKER: == sanity-flr test 70: mirror create and split race ================================================== 06:58:52 (1622876332)
[ 8556.583577] LustreError: 432569:0:(mdt_handler.c:718:mdt_pack_acl2body()) lustre-MDT0000: unable to read [0x200003ab2:0x7c:0x0] ACL: rc = -2
[ 8558.507968] LustreError: 433923:0:(mdt_handler.c:718:mdt_pack_acl2body()) lustre-MDT0000: unable to read [0x200003ab2:0x94:0x0] ACL: rc = -2
[ 8588.143934] LustreError: 432570:0:(mdt_handler.c:718:mdt_pack_acl2body()) lustre-MDT0000: unable to read [0x200003ab2:0x21d:0x0] ACL: rc = -2
[ 8593.930363] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
[ 8593.931670] PGD 0 P4D 0 
[ 8593.932075] Oops: 0000 [#1] SMP PTI
[ 8593.932617] CPU: 1 PID: 432570 Comm: mdt00_002 Kdump: loaded Tainted: P OE --------- - - 4.18.0-240.22.1.el8_lustre.x86_64 #1
[ 8593.934490] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 8593.935456] RIP: 0010:lod_obj_is_striped+0x80/0x100 [lod]
[ 8593.936280] Code: 74 3c 48 8b 07 8b 40 1c 25 00 f0 00 00 3d 00 40 00 00 74 2d f6 47 70 10 75 24 0f b7 47 64 66 85 c0 74 1b 48 8b 97 88 00 00 00 <48> 83 7a 70 00 74 19 66 83 7a 24 00 b8 01 00 00 00 74 33 c3 31 c0
[ 8593.939076] RSP: 0018:ffff9da680ba3b28 EFLAGS: 00010206
[ 8593.939874] RAX: 0000000000000003 RBX: ffff90b613acd390 RCX: 0000000000000000
[ 8593.940945] RDX: 0000000000000000 RSI: ffff90b60b960d20 RDI: ffff90b613acd390
[ 8593.942016] RBP: ffff90b614615d40 R08: 0000000000000000 R09: ffff90b63b434360
[ 8593.943089] R10: ffff90b5cfc03400 R11: 0000000000000008 R12: ffff90b601f4f000
[ 8593.944165] R13: ffff90b60eb5fa80 R14: ffff90b5fa762000 R15: ffff90b5fa762018
[ 8593.945238] FS: 0000000000000000(0000) GS:ffff90b63db00000(0000) knlGS:0000000000000000
[ 8593.946452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8593.947325] CR2: 0000000000000070 CR3: 0000000075a0a003 CR4: 00000000001606e0
[ 8593.948399] Call Trace:
[ 8593.948814] lod_declare_destroy+0x3c1/0x5b0 [lod]
[ 8593.949630] ? mdd_env_info+0x15/0x70 [mdd]
[ 8593.950281] mdd_declare_finish_unlink+0xa9/0x250 [mdd]
[ 8593.951091] mdd_unlink+0x45a/0xb20 [mdd]
[ 8593.951851] mdt_reint_unlink+0xb09/0x12a0 [mdt]
[ 8593.952581] mdt_reint_rec+0x11f/0x250 [mdt]
[ 8593.953252] mdt_reint_internal+0x498/0x780 [mdt]
[ 8593.953996] mdt_reint+0x5e/0x100 [mdt]
[ 8593.954965] tgt_request_handle+0xc78/0x1910 [ptlrpc]
[ 8593.955790] ptlrpc_server_handle_request+0x31a/0xba0 [ptlrpc]
[ 8593.956726] ptlrpc_main+0xba2/0x14a0 [ptlrpc]
[ 8593.957436] ? __schedule+0x2cc/0x700
[ 8593.958037] ? ptlrpc_wait_event+0x500/0x500 [ptlrpc]
[ 8593.958823] kthread+0x112/0x130
[ 8593.959328] ? kthread_flush_work_fn+0x10/0x10
[ 8593.960017] ret_from_fork+0x35/0x40
[ 8593.960574] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul dm_mod ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 ip_tables ext4 mbcache jbd2 ata_generic ata_piix 8139too libata crc32c_intel serio_raw virtio_blk 8139cp mii
[ 8593.971411] CR2: 0000000000000070
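
A note on reading the oops above: the fault address (CR2) is 0x70 and RDX is 0. The bytes at the <48> marker in the Code: line, 48 83 7a 70 00, decode by hand as cmpq $0x0,0x70(%rdx), and the seven bytes before it, 48 8b 97 88 00 00 00, as mov 0x88(%rdi),%rdx. So a pointer loaded from offset 0x88 of the object in RDI appears to have been NULL when it was dereferenced at offset 0x70. Which structure field that corresponds to is not established here. The sketch below only re-checks that arithmetic from the log text. The helper name parse_oops and the shortened log excerpt are illustrative assumptions, not part of any Lustre or kernel tooling, and the decode handles this one instruction encoding only.

```python
import re

# Excerpt of the oops lines from the log above. The Code: line is shortened
# to the bytes around the <48> marker; everything else is copied verbatim.
OOPS = """\
[ 8593.930363] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
[ 8593.935456] RIP: 0010:lod_obj_is_striped+0x80/0x100 [lod]
[ 8593.936280] Code: 48 8b 97 88 00 00 00 <48> 83 7a 70 00 74 19 66 83 7a 24 00
[ 8593.940945] RDX: 0000000000000000 RSI: ffff90b60b960d20 RDI: ffff90b613acd390
"""


def parse_oops(text):
    """Hypothetical helper: pull fault address, RIP, registers and faulting bytes."""
    info = {"regs": {}}

    m = re.search(r"NULL pointer dereference at ([0-9a-f]+)", text)
    if m:
        info["fault_addr"] = int(m.group(1), 16)

    # RIP: <segment>:<symbol>+0x<offset>/0x<size> [<module>]
    m = re.search(
        r"RIP: [0-9a-f]+:(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+) \[(\w+)\]", text
    )
    if m:
        info["symbol"] = m.group(1)
        info["offset"] = int(m.group(2), 16)
        info["size"] = int(m.group(3), 16)
        info["module"] = m.group(4)

    # General-purpose registers are printed as NAME: <16 hex digits>.
    for name, value in re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b", text):
        info["regs"][name] = int(value, 16)

    # The byte wrapped in <..> is the first byte of the faulting instruction.
    m = re.search(r"Code: (.*)", text)
    if m:
        tokens = m.group(1).split()
        start = next((i for i, t in enumerate(tokens) if t.startswith("<")), None)
        if start is not None:
            info["fault_bytes"] = [int(t.strip("<>"), 16) for t in tokens[start:]]
    return info


info = parse_oops(OOPS)
b = info["fault_bytes"]

# Hand decode, valid only for this exact encoding:
# 0x48 = REX.W, 0x83 with ModRM 0x7a = CMP r/m64, imm8 with a disp8 off RDX.
computed = None
if b[:3] == [0x48, 0x83, 0x7A]:
    disp, imm = b[3], b[4]
    computed = info["regs"]["RDX"] + disp
    print(
        f"{info['symbol']}+{info['offset']:#x} [{info['module']}]: "
        f"cmpq ${imm:#x},{disp:#x}(%rdx), RDX={info['regs']['RDX']:#x}, "
        f"computed address {computed:#x}, reported {info['fault_addr']:#x}"
    )
```

The computed address matching CR2 supports reading this as a NULL pointer dereferenced at a 0x70 member offset rather than a wild pointer.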

 



 Comments   
Comment by Arshad Hussain [ 06/Jun/21 ]
(gdb) list *(lod_obj_is_striped + 0x80)
0x489ba is in dt_write_unlock (/root/lustre-dev/lustre-release/lustre/include/dt_object.h:2376).
2371    }
2372
2373    static inline void dt_write_unlock(const struct lu_env *env,
2374                                       struct dt_object *dt)
2375    {
2376            LASSERT(dt);
2377            LASSERT(dt->do_ops);
2378            LASSERT(dt->do_ops->do_write_unlock);
2379            dt->do_ops->do_write_unlock(env, dt);
2380    }
(gdb) 
Comment by Alex Zhuravlev [ 06/Jun/21 ]

I think this is a duplicate of LU-14648?

Comment by Arshad Hussain [ 07/Jun/21 ]

Yes, it is a duplicate. Thanks.

Comment by Arshad Hussain [ 07/Jun/21 ]

Duplicate of LU-14648
