Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-14346

sanity-pcc test 7a hangs in multiop/down_read

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.14.0, Lustre 2.15.0
    • ARM Clients
    • 3
    • 9223372036854775807

    Description

      sanity-pcc test_7a hangs in down_read for ARM client testing start on 19 DEC 2020 for Lustre 2.13.57.12 with https://testing.whamcloud.com/test_sets/99e85d3e-9f79-46c1-b0a0-be0eeca6ce02.

      Looking at the latest failure at https://testing.whamcloud.com/test_sets/362b1cf0-6c36-4ef8-9627-817f93ab2818, the client2 (vm4) has the following in the console log

      [205077.391310] Lustre: DEBUG MARKER: multiop /mnt/lustre/f7a.sanity-pcc OSMWUc
      [205077.877415] Lustre: *** cfs_fail_loc=1412, val=0***
      [205078.434483] WARNING: CPU: 0 PID: 743007 at kernel/locking/rwsem.c:283 down_read+0x5c/0xb8
      [205078.440491] Modules linked in: loop nfsv3 nfs_acl obdecho(OE) ptlrpc_gss(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) lov(OE) fld(OE) osc(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) crc32_generic rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm sunrpc iw_cm ib_core vfat fat crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_balloon ip_tables ext4 mbcache jbd2 virtio_net net_failover virtio_blk failover virtio_mmio [last unloaded: libcfs]
      [205078.480885] CPU: 0 PID: 743007 Comm: multiop Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-193.28.1.el8_2.aarch64 #1
      [205078.489182] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      [205078.493966] pstate: 40000005 (nZcv daif -PAN -UAO)
      [205078.497344] pc : down_read+0x5c/0xb8
      [205078.499849] lr : down_read+0x28/0xb8
      [205078.502347] sp : ffff000015c6fe10
      [205078.504701] x29: ffff000015c6fe10 x28: 0000000000000002 
      [205078.508440] x27: 0000000000000000 x26: ffff8000be9dfc60 
      [205078.512107] x25: 000000009200004f x24: 0000000000000055 
      [205078.515790] x23: ffff8000be9dfc00 x22: 000000009200004f 
      [205078.519509] x21: 0000ffff95650000 x20: ffff80006ac7cc00 
      [205078.523182] x19: ffff8000be9dfc60 x18: 0000000000000100 
      [205078.526885] x17: ffffffffffffff00 x16: 0000000000000000 
      [205078.530533] x15: 0000a1cc9b9b1c85 x14: 0000000000000000 
      [205078.534284] x13: 0000ffff95363e38 x12: 0000000000000000 
      [205078.537985] x11: 0000000000000028 x10: 0101010101010101 
      [205078.541644] x9 : 0000000000000000 x8 : 00000000000000de 
      [205078.545314] x7 : 0000000000000000 x6 : 0000000000000000 
      [205078.548970] x5 : 0000000000000000 x4 : ffff0000109e0000 
      [205078.552634] x3 : ffff0000109819b0 x2 : 0000000000000000 
      [205078.556333] x1 : ffff8000be9dfc60 x0 : ffffffffffffff00 
      [205078.560075] Call trace:
      [205078.561815]  down_read+0x5c/0xb8
      [205078.564088]  do_page_fault+0x120/0x470
      [205078.566938]  do_mem_abort+0x50/0xa8
      [205078.569436]  el0_da+0x24/0x28
      [205078.571525] ---[ end trace 417884e623adee39 ]---
      

      All the servers have the same trace

      [111782.156099] sysrq: SysRq : Changing Loglevel
      [111782.156972] sysrq: Loglevel set to 8
      [111782.158967] sysrq: SysRq : Show State
      [111782.159637]   task                        PC stack   pid father
      [111782.160642] systemd         S    0     1      0 0x00000000
      [111782.161590] Call Trace:
      [111782.162092]  __schedule+0x2a6/0x700
      [111782.162759]  schedule+0x38/0xa0
      [111782.163307]  schedule_hrtimeout_range_clock+0xf7/0x110
      [111782.164210]  ? ep_scan_ready_list.constprop.22+0x20c/0x230
      [111782.165145]  ep_poll+0x3c8/0x410
      [111782.165741]  ? wake_up_q+0x80/0x80
      [111782.166329]  do_epoll_wait+0xb0/0xd0
      [111782.166954]  __x64_sys_epoll_wait+0x1a/0x20
      [111782.167718]  do_syscall_64+0x5b/0x1a0
      [111782.168363]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [111782.169239] RIP: 0033:0x7f75ca0b3257
      [111782.169878] Code: Bad RIP value.
      [111782.170433] RSP: 002b:00007fff37502d90 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
      [111782.171683] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f75ca0b3257
      [111782.172879] RDX: 0000000000000056 RSI: 00007fff37502dd0 RDI: 0000000000000004
      [111782.174060] RBP: 00007fff37502dd0 R08: 0000000000000000 R09: 7465677261742e79
      [111782.175249] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000056
      [111782.176432] R13: 00000000ffffffff R14: 00007fff37502dd0 R15: 0000000000000001
      

      Attachments

        Issue Links

          Activity

            People

              kevin.zhao Kevin Zhao
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated: