Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.14.0, Lustre 2.15.0
-
ARM Clients
-
3
-
9223372036854775807
Description
sanity-pcc test_7a hangs in down_read for ARM client testing start on 19 DEC 2020 for Lustre 2.13.57.12 with https://testing.whamcloud.com/test_sets/99e85d3e-9f79-46c1-b0a0-be0eeca6ce02.
Looking at the latest failure at https://testing.whamcloud.com/test_sets/362b1cf0-6c36-4ef8-9627-817f93ab2818, the client2 (vm4) has the following in the console log
[205077.391310] Lustre: DEBUG MARKER: multiop /mnt/lustre/f7a.sanity-pcc OSMWUc [205077.877415] Lustre: *** cfs_fail_loc=1412, val=0*** [205078.434483] WARNING: CPU: 0 PID: 743007 at kernel/locking/rwsem.c:283 down_read+0x5c/0xb8 [205078.440491] Modules linked in: loop nfsv3 nfs_acl obdecho(OE) ptlrpc_gss(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) lov(OE) fld(OE) osc(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) crc32_generic rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm sunrpc iw_cm ib_core vfat fat crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_balloon ip_tables ext4 mbcache jbd2 virtio_net net_failover virtio_blk failover virtio_mmio [last unloaded: libcfs] [205078.480885] CPU: 0 PID: 743007 Comm: multiop Kdump: loaded Tainted: G OE --------- - - 4.18.0-193.28.1.el8_2.aarch64 #1 [205078.489182] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [205078.493966] pstate: 40000005 (nZcv daif -PAN -UAO) [205078.497344] pc : down_read+0x5c/0xb8 [205078.499849] lr : down_read+0x28/0xb8 [205078.502347] sp : ffff000015c6fe10 [205078.504701] x29: ffff000015c6fe10 x28: 0000000000000002 [205078.508440] x27: 0000000000000000 x26: ffff8000be9dfc60 [205078.512107] x25: 000000009200004f x24: 0000000000000055 [205078.515790] x23: ffff8000be9dfc00 x22: 000000009200004f [205078.519509] x21: 0000ffff95650000 x20: ffff80006ac7cc00 [205078.523182] x19: ffff8000be9dfc60 x18: 0000000000000100 [205078.526885] x17: ffffffffffffff00 x16: 0000000000000000 [205078.530533] x15: 0000a1cc9b9b1c85 x14: 0000000000000000 [205078.534284] x13: 0000ffff95363e38 x12: 0000000000000000 [205078.537985] x11: 0000000000000028 x10: 0101010101010101 [205078.541644] x9 : 0000000000000000 x8 : 00000000000000de [205078.545314] x7 : 0000000000000000 x6 : 0000000000000000 [205078.548970] x5 : 0000000000000000 x4 : ffff0000109e0000 [205078.552634] x3 : ffff0000109819b0 x2 : 0000000000000000 [205078.556333] x1 : ffff8000be9dfc60 x0 : ffffffffffffff00 [205078.560075] Call trace: [205078.561815] down_read+0x5c/0xb8 [205078.564088] do_page_fault+0x120/0x470 [205078.566938] do_mem_abort+0x50/0xa8 [205078.569436] el0_da+0x24/0x28 [205078.571525] ---[ end trace 417884e623adee39 ]---
All the servers have the same trace
[111782.156099] sysrq: SysRq : Changing Loglevel [111782.156972] sysrq: Loglevel set to 8 [111782.158967] sysrq: SysRq : Show State [111782.159637] task PC stack pid father [111782.160642] systemd S 0 1 0 0x00000000 [111782.161590] Call Trace: [111782.162092] __schedule+0x2a6/0x700 [111782.162759] schedule+0x38/0xa0 [111782.163307] schedule_hrtimeout_range_clock+0xf7/0x110 [111782.164210] ? ep_scan_ready_list.constprop.22+0x20c/0x230 [111782.165145] ep_poll+0x3c8/0x410 [111782.165741] ? wake_up_q+0x80/0x80 [111782.166329] do_epoll_wait+0xb0/0xd0 [111782.166954] __x64_sys_epoll_wait+0x1a/0x20 [111782.167718] do_syscall_64+0x5b/0x1a0 [111782.168363] entry_SYSCALL_64_after_hwframe+0x65/0xca [111782.169239] RIP: 0033:0x7f75ca0b3257 [111782.169878] Code: Bad RIP value. [111782.170433] RSP: 002b:00007fff37502d90 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8 [111782.171683] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f75ca0b3257 [111782.172879] RDX: 0000000000000056 RSI: 00007fff37502dd0 RDI: 0000000000000004 [111782.174060] RBP: 00007fff37502dd0 R08: 0000000000000000 R09: 7465677261742e79 [111782.175249] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000056 [111782.176432] R13: 00000000ffffffff R14: 00007fff37502dd0 R15: 0000000000000001