Lustre / LU-14964

recovery-small: GPF in llog_exist after tests finished

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • 3

    Description

      A relatively new crash has been showing up in Maloo testing on RHEL8 for the past several days. It occurs during cleanup of recovery-small in review-dne-part-5, after the tests themselves have finished:

      [  753.814277] Lustre: DEBUG MARKER: == recovery-small test complete, duration 6075 sec ======= 10:32:41 (1629887561)
      [  783.518588] general protection fault: 0000 [#1] SMP PTI
      [  783.519513] CPU: 0 PID: 3045 Comm: mdt_rdpg00_000 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-240.22.1.el8_lustre.x86_64 #1
      [  783.521414] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  783.522494] RIP: 0010:llog_exist+0xd9/0x180 [obdclass]
      [  783.523265] Code: c7 05 7f 0f 0c 00 01 00 00 00 e8 a2 53 ee ff 5b c3 48 85 ff 0f 84 aa 00 00 00 48 8b 87 08 01 00 00 48 85 c0 0f 84 9a 00 00 00 <48> 8b 40 50 48 85 c0 74 53 48 89 df e8 b6 b3 77 fb f6 05 2b f6 f0
      [  783.526007] RSP: 0018:ffffae2500ea3ae8 EFLAGS: 00010206
      [  783.526776] RAX: 5a5a5a5a5a5a5a5a RBX: ffff9f95b14bd000 RCX: 0000000000000000
      [  783.527839] RDX: 0000000000000ba5 RSI: 0000000000000000 RDI: ffff9f959e040900
      [  783.528890] RBP: ffff9f959091f0d0 R08: 000000d823bc83f1 R09: 0000000000000bc0
      [  783.530782] R10: ffffae2500ea3ae8 R11: ffff9f95a7d08b6c R12: ffff9f95b04c2080
      [  783.532083] R13: ffff9f95b10d7ec0 R14: ffff9f959091f0d0 R15: ffff9f959132c000
      [  783.533175] FS:  0000000000000000(0000) GS:ffff9f95bfc00000(0000) knlGS:0000000000000000
      [  783.534423] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  783.535314] CR2: 00007fd080ad6000 CR3: 000000008960a005 CR4: 00000000003606f0
      [  783.536412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  783.537505] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  783.538587] Call Trace:
      [  783.539040]  llog_cat_prep_log+0x4f/0x3c0 [obdclass]
      [  783.539833]  llog_cat_declare_add_rec+0x56/0x220 [obdclass]
      [  783.540700]  llog_declare_add+0x187/0x1d0 [obdclass]
      [  783.541925]  top_trans_start+0x212/0x940 [ptlrpc]
      [  783.542820]  mdd_attr_set+0x657/0xfe0 [mdd]
      [  783.543538]  ? panic_notifier+0x20/0x20 [libcfs]
      [  783.544400]  mdt_mfd_close+0x56c/0x8c0 [mdt]
      [  783.545087]  mdt_close_internal+0xc4/0x240 [mdt]
      [  783.545820]  mdt_close+0x47d/0x8b0 [mdt]
      [  783.546470]  tgt_request_handle+0xc90/0x1940 [ptlrpc]
      [  783.547300]  ptlrpc_server_handle_request+0x323/0xbc0 [ptlrpc]
      [  783.548246]  ptlrpc_main+0xba2/0x1490 [ptlrpc]
      [  783.548964]  ? __schedule+0x2cc/0x700
      [  783.549562]  ? ptlrpc_wait_event+0x500/0x500 [ptlrpc]
      [  783.550380]  kthread+0x112/0x130
      [  783.550894]  ? kthread_flush_work_fn+0x10/0x10
      [  783.551577]  ret_from_fork+0x35/0x40
      [  783.552136] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) dm_flakey ptlrpc_gss(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm sunrpc ib_core intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul dm_mod ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 ip_tables ext4 mbcache jbd2 ata_generic ata_piix 8139too libata 8139cp crc32c_intel serio_raw virtio_blk mii 
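
      To see what llog_exist was doing when it faulted, the Code: line can be split at the byte marked with <..>, which is where RIP pointed. The sketch below is a hand-rolled helper, not an existing tool. The names split_code and decode_mov_disp are made up for illustration, and the decoder only understands 64-bit loads of the form "REX.W 8b /r" with an 8- or 32-bit displacement. It also assumes the 16 bytes before the fault begin on an instruction boundary, which is true for this dump but not in general.

```python
# Bytes copied from the "Code:" line of the oops above.
CODE_LINE = (
    "c7 05 7f 0f 0c 00 01 00 00 00 e8 a2 53 ee ff 5b c3 48 85 ff "
    "0f 84 aa 00 00 00 48 8b 87 08 01 00 00 48 85 c0 0f 84 9a 00 "
    "00 00 <48> 8b 40 50 48 85 c0 74 53 48 89 df e8 b6 b3 77 fb f6 "
    "05 2b f6 f0"
)

# Low eight 64-bit registers, indexed by their 3-bit ModRM encoding.
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def split_code(line):
    """Return (bytes before the fault, bytes starting at the fault)."""
    tokens = line.split()
    idx = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    as_ints = [int(tok.strip("<>"), 16) for tok in tokens]
    return as_ints[:idx], as_ints[idx:]


def decode_mov_disp(code):
    """Decode 'REX.W 8b /r' with disp8 or disp32 and no SIB byte.

    Returns (AT&T-style text, instruction length), or None for anything
    this deliberately tiny decoder does not handle.
    """
    if len(code) < 4 or code[0] != 0x48 or code[1] != 0x8B:
        return None
    mod, reg, rm = code[2] >> 6, (code[2] >> 3) & 7, code[2] & 7
    if rm == 4:  # a SIB byte follows; not handled here
        return None
    if mod == 1:
        disp = int.from_bytes(bytes(code[3:4]), "little", signed=True)
        length = 4
    elif mod == 2 and len(code) >= 7:
        disp = int.from_bytes(bytes(code[3:7]), "little", signed=True)
        length = 7
    else:
        return None
    return f"mov {disp:#x}(%{REGS64[rm]}),%{REGS64[reg]}", length


before, at_fault = split_code(CODE_LINE)
print("faulting insn:", decode_mov_disp(at_fault)[0])
print("pointer load :", decode_mov_disp(before[-16:])[0])
```

      Read this way, the code loads a pointer from offset 0x108 of the object in RDI, finds it non-NULL, and then faults dereferencing that pointer at offset 0x50. With RAX = 5a5a5a5a5a5a5a5a the resulting address is non-canonical on x86_64, which is why this shows up as a general protection fault rather than a page fault. Which structure and fields those offsets correspond to is not established here.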

      The 3 observed failures are:

      https://testing.whamcloud.com/test_sessions/d7bfd1a1-7ebf-40df-b80b-7e104a524f67

      https://testing.whamcloud.com/test_sets/f93ef5ed-4963-4851-ac10-3e0477e9543c

      and

      https://testing.whamcloud.com/test_sets/1976e1d2-dc91-45b3-b29f-691f921ddeff

      RAX (5a5a5a5a5a5a5a5a) is a suspicious value, so this is probably an access to freed memory?
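
      The same check can be done mechanically when triaging similar dumps: a register whose eight bytes are all identical (and not 00 or ff) is rarely a real pointer and usually a fill pattern. The sketch below is only an illustration. The function name repeated_byte_registers is made up, and the notes on what each fill byte means are assumptions to verify against the tree in use: 0x6b is the Linux slab POISON_FREE byte, and 0x5a is, as far as I know, what Lustre's free macros write over released memory.

```python
import re

# A few register lines copied from the oops above (timestamps stripped).
OOPS_REGS = """\
RAX: 5a5a5a5a5a5a5a5a RBX: ffff9f95b14bd000 RCX: 0000000000000000
RDX: 0000000000000ba5 RSI: 0000000000000000 RDI: ffff9f959e040900
RBP: ffff9f959091f0d0 R08: 000000d823bc83f1 R09: 0000000000000bc0
"""

# Assumed meanings of common fill bytes; verify before relying on them.
KNOWN_FILL_BYTES = {
    0x5A: "possibly Lustre free poison (assumption)",
    0x6B: "Linux slab POISON_FREE",
}


def repeated_byte_registers(text):
    """Map register name -> fill byte for registers made of one repeated byte."""
    found = {}
    for name, value in re.findall(r"\b(R[A-Z0-9]{2}):\s*([0-9a-f]{16})\b", text):
        raw = bytes.fromhex(value)
        # All-zero and all-ones values are common and not informative.
        if len(set(raw)) == 1 and raw[0] not in (0x00, 0xFF):
            found[name] = raw[0]
    return found


for reg, fill in repeated_byte_registers(OOPS_REGS).items():
    note = KNOWN_FILL_BYTES.get(fill, "unknown pattern")
    print(f"{reg}: repeated byte {fill:#04x} ({note})")
```

      For this dump only RAX is flagged, which fits the use-after-free guess but does not identify which object was freed or by whom.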

            People

              wc-triage WC Triage
              green Oleg Drokin