Lustre / LU-13207

sanity-pfl test 16b crashes with “Oops: Kernel access of bad area”


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0, Lustre 2.14.0, Lustre 2.12.4
    • Environment: PPC Clients
    • Severity: 3

    Description

      The client crashes during sanity-pfl test_16b. In a recent failure that skips test 16a, https://testing.whamcloud.com/test_sets/9833b176-47d8-11ea-b58e-52540065bddc, the kernel-crash log shows the following:

      [ 1250.515939] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-pfl test 16b: Verify setstripe\/getstripe with YAML config file + overstriping ============== 04:44:17 \(1580877857\)
      [ 1250.733732] Lustre: DEBUG MARKER: == sanity-pfl test 16b: Verify setstripe/getstripe with YAML config file + overstriping ============== 04:44:17 (1580877857)
      [ 1251.230177] LustreError: 1992:0:(pack_generic.c:2447:lustre_swab_lov_comp_md_v1()) Invalid magic 0x1
      [ 1251.232551] Unable to handle kernel paging request for data at address 0xe8f506000000c0
      [ 1251.232620] Faulting instruction address: 0xc0000000003675e4
      [ 1251.232676] Oops: Kernel access of bad area, sig: 11 [#1]
      [ 1251.232711] SMP NR_CPUS=2048 NUMA pSeries
      [ 1251.232757] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) crc32_generic libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic crct10dif_common ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc virtio_balloon ip_tables ext4 mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio
      [ 1251.233398] CPU: 0 PID: 10113 Comm: socknal_sd00_00 Kdump: loaded Tainted: G           OE  ------------   3.10.0-1062.9.1.el7.ppc64 #1
      [ 1251.233479] task: c0000000b5b93320 ti: c0000000b5be8000 task.ti: c0000000b5be8000
      [ 1251.233532] NIP: c0000000003675e4 LR: c000000000367564 CTR: c0000000009ee780
      [ 1251.233586] REGS: c0000000b5beb160 TRAP: 0300   Tainted: G           OE  ------------    (3.10.0-1062.9.1.el7.ppc64)
      [ 1251.233658] MSR: 8000000100009032 <SF,EE,ME,IR,DR,RI>  CR: 24424122  XER: 20000000
      [ 1251.233778] CFAR: 0000000000002494 DAR: 00e8f506000000c0 DSISR: 40000000 SOFTE: 1 
      GPR00: c000000000367564 c0000000b5beb3e0 c000000001776200 0000000000000000 
      GPR04: 0000000000010250 ffffffffffffffff c0000000009d21d8 c000000003121da0 
      GPR08: 000000000003a1ef 0000000000000000 0000000002170000 d000000002d1e0f0 
      GPR12: 0000000024424122 c000000007b80000 c0000000b6808210 0000000000000001 
      GPR16: c0000000b679a200 c0000000bbfc2f00 0000000000000000 0000000000000001 
      GPR20: 000000000000fe88 0000000000000000 c0000000b679a260 00000000000005a8 
      GPR24: 0000000000000800 c0000000be01f400 c0000000009d21d8 ffffffffffffffff 
      GPR28: 0000000000000800 0000000000010250 00e8f506000000c0 c0000000be01f400 
      [ 1251.234543] NIP [c0000000003675e4] .__kmalloc_node_track_caller+0x234/0x470
      [ 1251.234589] LR [c000000000367564] .__kmalloc_node_track_caller+0x1b4/0x470
      [ 1251.234634] Call Trace:
      [ 1251.234653] [c0000000b5beb3e0] [c000000000367564] .__kmalloc_node_track_caller+0x1b4/0x470 (unreliable)
      [ 1251.234736] [c0000000b5beb490] [c000000000920b44] .__alloc_skb+0xb4/0x260
      [ 1251.234792] [c0000000b5beb540] [c0000000009d21d8] .sk_stream_alloc_skb+0x78/0x230
      [ 1251.234856] [c0000000b5beb5d0] [c0000000009d321c] .tcp_sendmsg+0x6cc/0xe50
      [ 1251.234911] [c0000000b5beb720] [c000000000a18abc] .inet_sendmsg+0x9c/0x170
      [ 1251.234966] [c0000000b5beb7b0] [c00000000090c700] .sock_sendmsg+0xf0/0x140
      [ 1251.235021] [c0000000b5beb970] [c00000000090c7b4] .kernel_sendmsg+0x64/0x90
      [ 1251.235088] [c0000000b5beba10] [d000000002d189b4] .ksocknal_lib_send_iov+0x114/0x180 [ksocklnd]
      [ 1251.235163] [c0000000b5bebae0] [d000000002d0d134] .ksocknal_process_transmit+0x3c4/0x1260 [ksocklnd]
      [ 1251.235238] [c0000000b5bebbc0] [d000000002d14378] .ksocknal_scheduler+0x408/0x14f0 [ksocklnd]
      [ 1251.235302] [c0000000b5bebd30] [c00000000013edb0] .kthread+0xf0/0x100
      [ 1251.235358] [c0000000b5bebe30] [c00000000000a628] .ret_from_kernel_thread+0x58/0x70
      [ 1251.235420] Instruction dump:
      [ 1251.235448] e9070008 7fc9502a e9270010 2fbe0000 2f290000 41defeb4 409afea4 4bfffeac 
      [ 1251.235541] 60000000 60000000 60420000 e93f0022 <7f1e482a> 39200000 88cd02a2 992d02a2 
      [ 1251.235649] ---[ end trace 42e7021bc48adc89 ]---
      [ 1251.237326] 
      [ 1251.237363] Sending IPI to other CPUs
      [ 1251.238398] IPI complete
      

      sanity-pfl test 16b crashes, hangs, or fails in 100% of PPC client test runs. The crashes started on 30 July 2019 with Lustre 2.12.56.72; see https://testing.whamcloud.com/test_sets/11fc75f8-b37b-11e9-9f36-52540065bddc.

      Logs for other crashes are at:
      https://testing.whamcloud.com/test_sets/07588ab8-2592-11ea-80b4-52540065bddc
      https://testing.whamcloud.com/test_sets/d840dd04-1dba-11ea-80b4-52540065bddc
      https://testing.whamcloud.com/test_sets/4c052244-fa17-11e9-a0ba-52540065bddc

      People

        Assignee: WC Triage (wc-triage)
        Reporter: James Nunez (Inactive) (jamesanunez)