Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-12707

mds-survey test 1 crashes with a NULL pointer dereference in echo_object_free()

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.13.0, Lustre 2.12.4
    • Lustre 2.13.0, Lustre 2.12.3
    • None
    • 3
    • 9223372036854775807

    Description

      mds-survey test_1 started crashing with a NULL pointer dereference in echo_object_free() on 16 August 2019.

      Looking at the kernel-crash log, we see

      [37813.223580] Lustre: DEBUG MARKER: == mds-survey test 1: Metadata survey with zero-stripe =============================================== 15:50:41 (1565970641)
      [37813.430124] Lustre: DEBUG MARKER: /usr/sbin/lctl dl
      [37814.598393] Lustre: Echo OBD driver; http://www.lustre.org/
      [37820.151690] LustreError: 14608:0:(echo_client.c:1821:echo_md_lookup()) lookup MDT0000-tests: rc = -2
      [37820.152651] LustreError: 14608:0:(echo_client.c:2055:echo_md_destroy_internal()) Can't find child MDT0000-tests: rc = -2
      [37820.674040] LustreError: 14674:0:(echo_client.c:1821:echo_md_lookup()) lookup MDT0000-tests2: rc = -2
      [37820.675015] LustreError: 14674:0:(echo_client.c:1821:echo_md_lookup()) Skipped 1 previous similar message
      [37820.675915] LustreError: 14674:0:(echo_client.c:2055:echo_md_destroy_internal()) Can't find child MDT0000-tests2: rc = -2
      [37820.676936] LustreError: 14674:0:(echo_client.c:2055:echo_md_destroy_internal()) Skipped 1 previous similar message
      [37822.418041] LustreError: 14873:0:(echo_client.c:1821:echo_md_lookup()) lookup MDT0002-tests: rc = -2
      [37822.419018] LustreError: 14873:0:(echo_client.c:1821:echo_md_lookup()) Skipped 1 previous similar message
      [37822.419931] LustreError: 14873:0:(echo_client.c:2055:echo_md_destroy_internal()) Can't find child MDT0002-tests: rc = -2
      [37822.421048] LustreError: 14873:0:(echo_client.c:2055:echo_md_destroy_internal()) Skipped 1 previous similar message
      [39010.965720] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
      [39010.966793] IP: [<ffffffffc102ad18>] echo_object_free+0x28/0x490 [obdecho]
      [39010.967550] PGD 8000000016975067 PUD 16976067 PMD 0 
      [39010.968086] Oops: 0000 [#1] SMP 
      [39010.968876] Modules linked in: obdecho(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr virtio_balloon parport_pc parport i2c_piix4 ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common
      [39010.976913]  crc32c_intel 8139too serio_raw virtio_pci 8139cp virtio_ring virtio mii floppy
      [39010.977773] CPU: 0 PID: 16251 Comm: lctl Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.21.3.el7_lustre.x86_64 #1
      [39010.978869] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [39010.979407] task: ffff88d947864100 ti: ffff88d936efc000 task.ti: ffff88d936efc000
      [39010.980101] RIP: 0010:[<ffffffffc102ad18>]  [<ffffffffc102ad18>] echo_object_free+0x28/0x490 [obdecho]
      [39010.980993] RSP: 0018:ffff88d936effb18  EFLAGS: 00010246
      [39010.981484] RAX: 0000000000000000 RBX: ffff88d93aaa4cc0 RCX: dead000000000200
      [39010.982152] RDX: ffff88d936effb40 RSI: ffff88d93aaa4cc0 RDI: ffff88d917df92d8
      [39010.982815] RBP: ffff88d936effb30 R08: ffff88d93aaa4cd8 R09: 0000000000000010
      [39010.983481] R10: 0000000000000223 R11: 000000000007ffff R12: ffff88d93aaa4cd8
      [39010.984149] R13: ffff88d917df92d8 R14: ffff88d936effb40 R15: ffffa34d86641028
      [39010.984810] FS:  00007fd52a7ec740(0000) GS:ffff88d97fc00000(0000) knlGS:0000000000000000
      [39010.985566] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [39010.986113] CR2: 0000000000000040 CR3: 0000000044ab4000 CR4: 00000000000606f0
      [39010.986779] Call Trace:
      [39010.987156]  [<ffffffffc0891592>] lu_object_free.isra.32+0xf2/0x170 [obdclass]
      [39010.987857]  [<ffffffffc0895756>] lu_object_find_at+0x5f6/0xa60 [obdclass]
      [39010.988527]  [<ffffffffa9db91de>] ? filemap_fault+0x17e/0x490
      [39010.989080]  [<ffffffffc1034012>] echo_md_handler.isra.43+0x362/0x2c60 [obdecho]
      [39010.989769]  [<ffffffffc10377a8>] echo_client_iocontrol+0xe98/0x1860 [obdecho]
      [39010.990894]  [<ffffffffc085cbd0>] ? obd_ioctl_getdata+0x200/0x11b0 [obdclass]
      [39010.991583]  [<ffffffffc085f4aa>] class_handle_ioctl+0x192a/0x1e30 [obdclass]
      [39010.992268]  [<ffffffffa9dec27d>] ? handle_mm_fault+0x39d/0x9b0
      [39010.992836]  [<ffffffffa9ef8fde>] ? security_capable+0x1e/0x20
      [39010.993405]  [<ffffffffc085fa25>] obd_class_ioctl+0x75/0x170 [obdclass]
      [39010.994038]  [<ffffffffa9e569d0>] do_vfs_ioctl+0x3a0/0x5a0
      [39010.994572]  [<ffffffffa9e56c71>] SyS_ioctl+0xa1/0xc0
      [39010.995075]  [<ffffffffaa375d15>] ? system_call_after_swapgs+0xa2/0x146
      [39010.995695]  [<ffffffffaa375ddb>] system_call_fastpath+0x22/0x27
      [39010.996265]  [<ffffffffaa375d21>] ? system_call_after_swapgs+0xae/0x146
      [39010.996875] Code: 00 00 00 66 66 66 66 90 55 48 85 f6 48 89 e5 41 55 41 54 53 48 89 f3 0f 85 1d 04 00 00 f6 05 c7 22 65 ff 01 48 8b 83 98 00 00 00 <4c> 8b 60 40 0f 85 8e 01 00 00 8b 83 b8 00 00 00 85 c0 0f 85 c5 
      [39011.000093] RIP  [<ffffffffc102ad18>] echo_object_free+0x28/0x490 [obdecho]
      [39011.000758]  RSP <ffff88d936effb18>
      [39011.001102] CR2: 0000000000000040
      

      This test is crashing for ldiskfs, ZFS, DNE and non-DNE environments.

      Here are links to logs for several crashes:
      https://testing.whamcloud.com/test_sets/b53bc6f2-c07c-11e9-9fc9-52540065bddc
      https://testing.whamcloud.com/test_sets/7512ef7a-bfe0-11e9-90ad-52540065bddc
      https://testing.whamcloud.com/test_sets/b35271e6-c0e1-11e9-a2b6-52540065bddc
      https://testing.whamcloud.com/test_sets/25c95ba8-c15b-11e9-a2b6-52540065bddc

      Attachments

        Issue Links

          Activity

            People

              shadow Alexey Lyashkov
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: