  1. Lustre
  2. LU-8316

BUG: unable to handle kernel NULL pointer dereference at tgt_free_reply_data+0x97/0x330

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.9.0
    • Lustre 2.11.0
    • None
    • 3

    Description

      System crashed while testing under memory pressure:

      [432534.561808] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 2 clients reconnect
      [432534.563083] Lustre: Skipped 3 previous similar messages
      [432534.593088] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [432534.594035] IP: [<ffffffffa07d31f7>] tgt_free_reply_data+0x97/0x330 [ptlrpc]
      [432534.594035] PGD 3c7cb067 PUD 3836e067 PMD 0 
      [432534.594035] Oops: 0002 [#1] SMP 
      [432534.594035] Modules linked in: lustre(OF) ofd(OF) osp(OF) lod(OF) ost(OF) mdt(OF) mdd(OF) mgs(OF) osd_ldiskfs(OF) ldiskfs(OF) lquota(OF) lfsck(OF) obdecho(OF) mgc(OF) lov(OF) osc(OF) mdc(OF) lmv(OF) fid(OF) fld(OF) ptlrpc(OF) obdclass(OF) ksocklnd(OF) lnet(OF) libcfs(OF) loop mbcache jbd2 sha512_generic netconsole sg dm_mirror dm_region_hash dm_log crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd serio_raw virtio_balloon virtio_console dm_mod intel_agp i2c_piix4 intel_gtt nfsd auth_rpcgss nfs_acl lockd sunrpc ip_tables xfs ata_generic libcrc32c virtio_net cirrus syscopyarea sysfillrect sysimgblt virtio_scsi drm_kms_helper virtio_blk ttm drm virtio_pci agpgart ata_piix virtio_ring libata virtio i2c_core [last unloaded: libcfs]
      [432534.594035] CPU: 1 PID: 5669 Comm: mdt01_003 Tainted: GF          O--------------   3.10.0-229.7.2.x86_64 #7
      [432534.594035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
      [432534.594035] task: ffff880025081580 ti: ffff88002da2c000 task.ti: ffff88002da2c000
      [432534.594035] RIP: 0010:[<ffffffffa07d31f7>]  [<ffffffffa07d31f7>] tgt_free_reply_data+0x97/0x330 [ptlrpc]
      [432534.594035] RSP: 0018:ffff88002da2fb90  EFLAGS: 00010293
      [432534.594035] RAX: 0000000000000001 RBX: ffff8800133fb8d8 RCX: 0000000000000000
      [432534.594035] RDX: 0000000000000000 RSI: ffff88001289f300 RDI: ffff8800133fb8d8
      [432534.594035] RBP: ffff88002da2fbd8 R08: ffff8800133fb8d8 R09: 0000000000000000
      [432534.594035] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
      [432534.594035] R13: ffff880000e65718 R14: ffff88001289f3f8 R15: ffff88001289f300
      [432534.594035] FS:  0000000000000000(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
      [432534.594035] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [432534.594035] CR2: 0000000000bfc001 CR3: 000000003c289000 CR4: 00000000001406e0
      [432534.594035] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [432534.594035] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [432534.594035] Stack:
      [432534.594035]  0000000000000000 ffff88001289f3f8 ffffffff810c486d ffff88002da2fc30
      [432534.594035]  ffff8800133fb8d8 ffff88001289f300 ffff88001289ef60 ffff88001289f3f8
      [432534.594035]  ffff880000e65718 ffff88002da2fc30 ffffffffa07d34ee 0000000000000246
      [432534.594035] Call Trace:
      [432534.594035]  [<ffffffff810c486d>] ? trace_hardirqs_on+0xd/0x10
      [432534.594035]  [<ffffffffa07d34ee>] tgt_release_reply_data+0x5e/0x180 [ptlrpc]
      [432534.594035]  [<ffffffffa07dc128>] tgt_handle_received_xid+0x98/0xe0 [ptlrpc]
      [432534.594035]  [<ffffffffa07e1d38>] tgt_request_handle+0xb88/0x1330 [ptlrpc]
      [432534.594035]  [<ffffffffa078d591>] ptlrpc_server_handle_request+0x231/0xac0 [ptlrpc]
      [432534.594035]  [<ffffffffa078be15>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
      [432534.594035]  [<ffffffffa0791790>] ptlrpc_main+0xab0/0x1e10 [ptlrpc]
      [432534.594035]  [<ffffffff810c486d>] ? trace_hardirqs_on+0xd/0x10
      [432534.594035]  [<ffffffff8109b842>] ? finish_task_switch+0x42/0x150
      [432534.594035]  [<ffffffffa0790ce0>] ? ptlrpc_register_service+0xe50/0xe50 [ptlrpc]
      [432534.594035]  [<ffffffff8109008a>] kthread+0xea/0xf0
      [432534.594035]  [<ffffffff8108ffa0>] ? kthread_create_on_node+0x140/0x140
      [432534.594035]  [<ffffffff81571258>] ret_from_fork+0x58/0x90
      [432534.594035]  [<ffffffff8108ffa0>] ? kthread_create_on_node+0x140/0x140
      [432534.594035] Code: c1 fa 1f c1 ea 0c c1 f9 14 41 8d 04 14 25 ff ff 0f 00 29 d0 83 f9 0f 0f 8f 72 02 00 00 49 8b 95 28 04 00 00 48 63 c9 48 8b 14 ca <f0> 0f b3 02 19 c0 85 c0 0f 84 8b 01 00 00 48 85 db 0f 84 1b 02 
      [432534.594035] RIP  [<ffffffffa07d31f7>] tgt_free_reply_data+0x97/0x330 [ptlrpc]
      [432534.594035]  RSP <ffff88002da2fb90>
      [432534.594035] CR2: 0000000000000000
      [432534.712915] ---[ end trace 26ac593d02d07dd0 ]---
      [432534.714120] Kernel panic - not syncing: Fatal exception
      
      

      This issue is caused by an incorrect return value on the allocation-failure path below (marked with ^^^):

              /* reply_data is supported by MDT targets only for now */
              if (strncmp(obd->obd_type->typ_name, LUSTRE_MDT_NAME, 3) != 0)
                      RETURN(0);
      
              OBD_ALLOC(lut->lut_reply_bitmap,
                        LUT_REPLY_SLOTS_MAX_CHUNKS * sizeof(unsigned long *));
              if (lut->lut_reply_bitmap == NULL)
                      GOTO(out, rc);
      -----------------------------^^^
      
              memset(&attr, 0, sizeof(attr));
              attr.la_valid = LA_MODE;
      
      

      I'll push a patch for it.
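
      The failure mode described above can be reproduced outside the kernel with a minimal sketch. Everything here is hypothetical (struct fake_target, alloc_or_fail, init_buggy, init_fixed are illustration names, not Lustre code), and returning -ENOMEM in the fixed variant is an assumption about what a fix would look like, since the ticket does not show the patch itself. The sketch assumes rc still holds 0 when the failure branch is taken, which is what the ^^^ marker in the excerpt suggests.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the target structure; only the bitmap matters. */
struct fake_target {
    unsigned long **reply_bitmap;
};

/* Hypothetical allocator that can be forced to fail, as under memory pressure. */
void *alloc_or_fail(size_t size, int fail)
{
    return fail ? NULL : calloc(1, size);
}

/* Buggy pattern: the failure branch jumps to "out" while rc is still 0,
 * so the caller sees success and later dereferences a NULL bitmap. */
int init_buggy(struct fake_target *t, int fail)
{
    int rc = 0;

    t->reply_bitmap = alloc_or_fail(16 * sizeof(unsigned long *), fail);
    if (t->reply_bitmap == NULL)
        goto out;               /* rc == 0: failure reported as success */
out:
    return rc;
}

/* Assumed fix: set an error code before leaving on the failure branch. */
int init_fixed(struct fake_target *t, int fail)
{
    int rc = 0;

    t->reply_bitmap = alloc_or_fail(16 * sizeof(unsigned long *), fail);
    if (t->reply_bitmap == NULL) {
        rc = -ENOMEM;           /* failure is now visible to the caller */
        goto out;
    }
out:
    return rc;
}
```

      With a forced allocation failure, init_buggy returns 0 and leaves reply_bitmap NULL, while init_fixed returns -ENOMEM so the caller can abort setup instead of continuing with an unusable bitmap.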

            People

              ys Yang Sheng
              ys Yang Sheng