Lustre / LU-3772

Crash in ptlrpc_service_nrs_cleanup() when out of memory


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.6.0, Lustre 2.5.1
    • None
    • None
    • 3
    • 9771

    Description

      LustreError: 25425:0:(service.c:156:ptlrpc_grow_req_bufs()) ost: Can't allocate request buffer
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<ffffffffa08dfc5c>] ptlrpc_service_nrs_cleanup+0xec/0x440 [ptlrpc]
      PGD 5f76c067 PUD 5f76d067 PMD 0 
      Oops: 0000 [#1] SMP 
      last sysfs file: /sys/devices/system/cpu/possible
      CPU 0 
      Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) osd_ldiskfs(U) fsfilt_ldiskfs(U) ldiskfs(U) mdd(U) mgs(U) lquota(U) lfsck(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) exportfs jbd sha512_generic sha256_generic crc32c_intel ebtable_nat ebtables xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT iptable_filter ip_tables bridge stp llc autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 zfs(P)(U) zcommon(P)(U) znvpair(P)(U) fuse zavl(P)(U) zunicode(P)(U) vmhgfs(U) spl(U) zlib_deflate vsock(U) dm_mirror dm_region_hash dm_log uinput ppdev parport_pc parport e1000 snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc sg vmware_balloon vmci(U) i2c_piix4 i2c_core shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi dm_mod [last unloaded: libcfs]
      
      Pid: 25425, comm: mount.lustre Tainted: P           ---------------    2.6.32 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
      RIP: 0010:[<ffffffffa08dfc5c>]  [<ffffffffa08dfc5c>] ptlrpc_service_nrs_cleanup+0xec/0x440 [ptlrpc]
      RSP: 0018:ffff88005f75d688  EFLAGS: 00010217
      RAX: 0000000000000000 RBX: ffff88003a3f68e0 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa097ee40
      RBP: ffff88005f75d6c8 R08: 0000000000000000 R09: 0000000000000002
      R10: ffff8800767d0000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff88005155a180 R14: ffff88003a3f6a18 R15: ffff88003a3f68e8
      FS:  00007ffc79049700(0000) GS:ffff88000c400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000000 CR3: 000000005f77e000 CR4: 00000000000406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process mount.lustre (pid: 25425, threadinfo ffff88005f75c000, task ffff8800374a0040)
      Stack:
       ffff88003a3f6800 00ff88005f75d708 ffff88005f75d6c8 ffff88005155a180
      <d> 0000000000000000 ffff88003a3f6848 ffff88003a3f6a18 ffff88005f75d708
      <d> ffff88005f75d768 ffffffffa08a76e3 0000000000000000 ffff88003a3f6a04
      Call Trace:
       [<ffffffffa08a76e3>] ptlrpc_unregister_service+0x653/0xfc0 [ptlrpc]
       [<ffffffffa08aa791>] ? ptlrpc_grow_req_bufs+0x231/0x2a0 [ptlrpc]
       [<ffffffffa08aafc2>] ptlrpc_register_service+0x7c2/0x17b0 [ptlrpc]
       [<ffffffffa07f85f9>] ost_setup+0x199/0xc40 [ost]
       [<ffffffffa06f7804>] obd_setup+0x1b4/0x2e0 [obdclass]
       [<ffffffffa06d74bc>] ? class_new_export+0x72c/0x9a0 [obdclass]
       [<ffffffffa06f7b38>] class_setup+0x208/0x870 [obdclass]
       [<ffffffffa06ff48c>] class_process_config+0xc7c/0x1c30 [obdclass]
       [<ffffffffa07044d3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
       [<ffffffffa0704a29>] do_lcfg+0x149/0x480 [obdclass]
       [<ffffffffa0704df4>] lustre_start_simple+0x94/0x200 [obdclass]
       [<ffffffffa0739432>] server_start_targets+0x782/0x1ac0 [obdclass]
       [<ffffffffa0703f5c>] ? obd_set_info_async.clone.3+0xfc/0x3a0 [obdclass]
       [<ffffffffa06d3cd6>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa0708de3>] ? lustre_start_mgc+0x493/0x1f50 [obdclass]
       [<ffffffffa073e21c>] server_fill_super+0xbbc/0x1a24 [obdclass]
       [<ffffffffa070aa78>] lustre_fill_super+0x1d8/0x530 [obdclass]
       [<ffffffffa070a8a0>] ? lustre_fill_super+0x0/0x530 [obdclass]
       [<ffffffff8117e2af>] get_sb_nodev+0x5f/0xa0
       [<ffffffffa07024b5>] lustre_get_sb+0x25/0x30 [obdclass]
       [<ffffffff8117df0b>] vfs_kern_mount+0x7b/0x1b0
       [<ffffffff8117e0b2>] do_kern_mount+0x52/0x130
       [<ffffffff8119c7c2>] do_mount+0x2d2/0x8d0
       [<ffffffff8119ce50>] sys_mount+0x90/0xe0
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      Code: 0b f2 09 00 83 f8 01 0f 84 05 03 00 00 48 8b 45 c0 c6 45 cf 00 48 8d 98 e0 00 00 00 48 8b 43 08 4c 8d 7b 08 80 4b 5c 02 4c 39 f8 <4c> 8b 30 75 0d eb 27 0f 1f 44 00 00 4c 89 f0 49 89 d6 48 8b 70 
      RIP  [<ffffffffa08dfc5c>] ptlrpc_service_nrs_cleanup+0xec/0x440 [ptlrpc]
       RSP <ffff88005f75d688>
      CR2: 0000000000000000
      ---[ end trace f9ea4b26383b6592 ]---
      Kernel panic - not syncing: Fatal exception
      Pid: 25425, comm: mount.lustre Tainted: P      D    ---------------    2.6.32 #1
      Call Trace:
       [<ffffffff814fe08e>] ? panic+0xa0/0x168
       [<ffffffff81502224>] ? oops_end+0xe4/0x100
       [<ffffffff81043beb>] ? no_context+0xfb/0x260
       [<ffffffff8127abce>] ? number+0x2ee/0x320
       [<ffffffff81043e75>] ? __bad_area_nosemaphore+0x125/0x1e0
       [<ffffffff81043f9e>] ? bad_area+0x4e/0x60
       [<ffffffff81044750>] ? __do_page_fault+0x3d0/0x480
       [<ffffffff8127d316>] ? vsnprintf+0x2b6/0x5f0
       [<ffffffff8105b4c3>] ? perf_event_task_sched_out+0x33/0x80
       [<ffffffffa111327b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
       [<ffffffff81054c70>] ? __dequeue_entity+0x30/0x50
       [<ffffffff810097dc>] ? __switch_to+0x1ac/0x320
       [<ffffffff815041de>] ? do_page_fault+0x3e/0xa0
       [<ffffffff81501595>] ? page_fault+0x25/0x30
       [<ffffffffa08dfc5c>] ? ptlrpc_service_nrs_cleanup+0xec/0x440 [ptlrpc]
       [<ffffffffa08dfb95>] ? ptlrpc_service_nrs_cleanup+0x25/0x440 [ptlrpc]
       [<ffffffffa08a76e3>] ? ptlrpc_unregister_service+0x653/0xfc0 [ptlrpc]
       [<ffffffffa08aa791>] ? ptlrpc_grow_req_bufs+0x231/0x2a0 [ptlrpc]
       [<ffffffffa08aafc2>] ? ptlrpc_register_service+0x7c2/0x17b0 [ptlrpc]
       [<ffffffffa07f85f9>] ? ost_setup+0x199/0xc40 [ost]
       [<ffffffffa06f7804>] ? obd_setup+0x1b4/0x2e0 [obdclass]
       [<ffffffffa06d74bc>] ? class_new_export+0x72c/0x9a0 [obdclass]
       [<ffffffffa06f7b38>] ? class_setup+0x208/0x870 [obdclass]
       [<ffffffffa06ff48c>] ? class_process_config+0xc7c/0x1c30 [obdclass]
       [<ffffffffa07044d3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
       [<ffffffffa0704a29>] ? do_lcfg+0x149/0x480 [obdclass]
       [<ffffffffa0704df4>] ? lustre_start_simple+0x94/0x200 [obdclass]
       [<ffffffffa0739432>] ? server_start_targets+0x782/0x1ac0 [obdclass]
       [<ffffffffa0703f5c>] ? obd_set_info_async.clone.3+0xfc/0x3a0 [obdclass]
       [<ffffffffa06d3cd6>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa0708de3>] ? lustre_start_mgc+0x493/0x1f50 [obdclass]
       [<ffffffffa073e21c>] ? server_fill_super+0xbbc/0x1a24 [obdclass]
       [<ffffffffa070aa78>] ? lustre_fill_super+0x1d8/0x530 [obdclass]
       [<ffffffffa070a8a0>] ? lustre_fill_super+0x0/0x530 [obdclass]
       [<ffffffff8117e2af>] ? get_sb_nodev+0x5f/0xa0
       [<ffffffffa07024b5>] ? lustre_get_sb+0x25/0x30 [obdclass]
       [<ffffffff8117df0b>] ? vfs_kern_mount+0x7b/0x1b0
       [<ffffffff8117e0b2>] ? do_kern_mount+0x52/0x130
       [<ffffffff8119c7c2>] ? do_mount+0x2d2/0x8d0
       [<ffffffff8119ce50>] ? sys_mount+0x90/0xe0
       [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
      

      In ptlrpc_register_service(), if ptlrpc_service_part_init() fails with -ENOMEM, the subsequent ptlrpc_service_nrs_setup() call is skipped. However, ptlrpc_service_nrs_cleanup() is always called on the cleanup path, regardless of whether the NRS state was initialized. It therefore operates on an uninitialized spinlock and list, which matches the NULL pointer dereference at ptlrpc_service_nrs_cleanup+0xec in the trace above.
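
      The failure pattern can be reproduced in miniature in user space. The sketch below is only an illustration of the control flow described above, not Lustre code and not necessarily the fix that was merged: every name in it (demo_part, demo_part_init, demo_nrs_setup, demo_nrs_cleanup, demo_register, nrs_ready) is hypothetical. It shows one possible guard, in which cleanup refuses to walk a list head that setup never initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal list head, modelled loosely on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_head_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Hypothetical stand-in for one service partition's NRS state. */
struct demo_part {
    struct list_head policies; /* valid only after demo_nrs_setup() */
    bool nrs_ready;            /* hypothetical guard flag */
};

/* Stand-in for partition init; returns -1 to simulate -ENOMEM. */
static int demo_part_init(struct demo_part *p, bool simulate_enomem)
{
    (void)p;
    return simulate_enomem ? -1 : 0;
}

/* Stand-in for NRS setup: the only place the list head becomes valid. */
static void demo_nrs_setup(struct demo_part *p)
{
    list_head_init(&p->policies);
    p->nrs_ready = true;
}

/*
 * Stand-in for NRS cleanup. Returns the number of entries visited,
 * or -1 if setup never ran. Without the nrs_ready check, the loop
 * would read policies.next from zero-filled memory and dereference
 * NULL, which is the shape of the crash in the report.
 */
static int demo_nrs_cleanup(struct demo_part *p)
{
    int visited = 0;

    if (!p->nrs_ready)
        return -1;

    for (struct list_head *pos = p->policies.next;
         pos != &p->policies; pos = pos->next)
        visited++;

    p->nrs_ready = false;
    return visited;
}

/*
 * Registration flow: setup is skipped when init fails, but cleanup
 * always runs here so that both paths can be compared.
 */
static int demo_register(bool simulate_enomem, int *cleanup_result)
{
    struct demo_part *p;
    int rc;

    assert(cleanup_result != NULL);

    p = calloc(1, sizeof(*p)); /* zero-filled, like a fresh partition */
    if (p == NULL)
        return -1;

    rc = demo_part_init(p, simulate_enomem);
    if (rc == 0)
        demo_nrs_setup(p);

    *cleanup_result = demo_nrs_cleanup(p);
    free(p);
    return rc;
}
```

      Calling demo_register(true, &r) takes the simulated out-of-memory path: setup is skipped and cleanup returns -1 instead of touching the list. An alternative design, equally hypothetical here, would be to initialize the list head and lock unconditionally before any step that can fail, so that cleanup is always safe to run.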

            People

              keith Keith Mannthey (Inactive)
              niu Niu Yawei (Inactive)