Details
-
Bug
-
Resolution: Duplicate
-
Minor
-
None
-
Lustre 2.6.0
-
None
-
Lustre master v2_5_50_0-3-g6229525
Single node test setup, 1 MDT, 3 OST, client
RHEL6.3 2.6.32-279.5.1
-
3
-
11519
Description
I was running a memory-intensive workload on the same node and then mounted the MDS. The mount failed a page allocation while registering the MGS ptlrpc service (in ptlrpc_grow_req_bufs()) and then oopsed in the cleanup that followed.
LDISKFS-fs (dm-9): mounted filesystem with ordered data mode. quota=on. Opts:
mount.lustre: page allocation failure. order:1, mode:0x40
Pid: 6512, comm: mount.lustre Tainted: P D W --------------- 2.6.32-279.5.1.el6_lustre.g7f15218.x86_64 #1
Call Trace:
[<ffffffff811276cf>] ? __alloc_pages_nodemask+0x77f/0x940
[<ffffffff81161e92>] ? kmem_getpages+0x62/0x170
[<ffffffff81162aaa>] ? fallback_alloc+0x1ba/0x270
[<ffffffff811624ff>] ? cache_grow+0x2cf/0x320
[<ffffffff81162829>] ? ____cache_alloc_node+0x99/0x160
[<ffffffffa10116c1>] ? cfs_cpt_malloc+0x31/0x60 [libcfs]
[<ffffffff811636ef>] ? kmem_cache_alloc_node_notrace+0x6f/0x130
[<ffffffff8116392b>] ? __kmalloc_node+0x7b/0x100
[<ffffffffa10116c1>] ? cfs_cpt_malloc+0x31/0x60 [libcfs]
[<ffffffffa0a54f88>] ? ptlrpc_alloc_rqbd+0x1e8/0x6d0 [ptlrpc]
[<ffffffffa0a55555>] ? ptlrpc_grow_req_bufs+0xe5/0x2a0 [ptlrpc]
[<ffffffffa0a55d25>] ? ptlrpc_register_service+0x615/0x17c0 [ptlrpc]
[<ffffffffa0cee1a5>] ? mgs_init0+0x1285/0x1760 [mgs]
[<ffffffffa0a9bb90>] ? tgt_request_handle+0x0/0xe40 [ptlrpc]
[<ffffffffa0a6b610>] ? target_print_req+0x0/0xa0 [ptlrpc]
[<ffffffffa0ce74e9>] ? mgs_type_start+0x19/0x20 [mgs]
[<ffffffffa0cee78f>] ? mgs_device_alloc+0x10f/0x260 [mgs]
[<ffffffffa0901a2f>] ? obd_setup+0x1bf/0x290 [obdclass]
[<ffffffffa0901d08>] ? class_setup+0x208/0x870 [obdclass]
[<ffffffffa090954c>] ? class_process_config+0xc6c/0x1ad0 [obdclass]
[<ffffffffa090e3d3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
[<ffffffffa090e929>] ? do_lcfg+0x149/0x480 [obdclass]
[<ffffffffa090ecf4>] ? lustre_start_simple+0x94/0x200 [obdclass]
[<ffffffffa0948479>] ? server_fill_super+0x1159/0x19ea [obdclass]
[<ffffffffa09148f8>] ? lustre_fill_super+0x1d8/0x530 [obdclass]
[<ffffffffa0914720>] ? lustre_fill_super+0x0/0x530 [obdclass]
[<ffffffff8117e16f>] ? get_sb_nodev+0x5f/0xa0
[<ffffffffa090c425>] ? lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff8117ddcb>] ? vfs_kern_mount+0x7b/0x1b0
[<ffffffff8117df72>] ? do_kern_mount+0x52/0x130
[<ffffffff8119c652>] ? do_mount+0x2d2/0x8d0
[<ffffffff8119cce0>] ? sys_mount+0x90/0xe0
LustreError: 6512:0:(service.c:156:ptlrpc_grow_req_bufs()) mgs: Can't allocate request buffer
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0a8ac5c>] ptlrpc_service_nrs_cleanup+0xec/0x440 [ptlrpc]
PGD 1b078067 PUD 20d38067 PMD 0
Pid: 6512, comm: mount.lustre Tainted: P D W --------------- 2.6.32-279.5.1.el6_lustre.g7f15218.x86_64 #1 Dell Inc. Dell DXP051 /0FJ030
RIP: 0010:[<ffffffffa0a8ac5c>] [<ffffffffa0a8ac5c>] ptlrpc_service_nrs_cleanup+0xec/0x440 [ptlrpc]
RSP: 0018:ffff88001fc536c8 EFLAGS: 00010217
RAX: 0000000000000000 RBX: ffff8800709834e0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0b29640
RBP: ffff88001fc53708 R08: 0000000000000002 R09: 0000000000000000
R10: ffff8800244cc000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8800adc70cc0 R14: ffff880070983618 R15: ffff8800709834e8
FS: 00007fb3066b0700(0000) GS:ffff880002280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000053c91000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process mount.lustre (pid: 6512, threadinfo ffff88001fc52000, task ffff880017014080)
Stack:
 ffff880070983400 00ff880017014080 ffff88001fc53708 ffff8800adc70cc0
<d> ffff880070983400 ffff880070983448 ffff880070983618 ffff880017014080
<d> ffff88001fc537b8 ffffffffa0a52583 ffff88001fc53728 ffff8800adc70cc0
Call Trace:
[<ffffffffa0a52583>] ptlrpc_unregister_service+0x673/0xff0 [ptlrpc]
[<ffffffffa0a556a1>] ? ptlrpc_grow_req_bufs+0x231/0x2a0 [ptlrpc]
[<ffffffffa0a55ee2>] ptlrpc_register_service+0x7d2/0x17c0 [ptlrpc]
[<ffffffffa0cee1a5>] mgs_init0+0x1285/0x1760 [mgs]
[rest of the stack is the same as above]
This resolves to:
(gdb) list *(ptlrpc_service_nrs_cleanup+0xec)
0x90c8c is in ptlrpc_service_nrs_cleanup_locked (/usr/src/lustre-head/lustre/ptlrpc/nrs.c:1030).
1025
1026    again:
1027            nrs = nrs_svcpt2nrs(svcpt, hp);
1028            nrs->nrs_stopping = 1;
1029
1030            cfs_list_for_each_entry_safe(policy, tmp, &nrs->nrs_policy_list,
1031                                         pol_list) {
1032                    rc = nrs_policy_unregister(nrs, policy->pol_desc->pd_name);
1033                    LASSERT(rc == 0);
1034            }
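It looks like nrs_policy_list has not been initialized by the time this cleanup is called, which would explain the NULL dereference (CR2 is 0) when the list is walked. The cleanup path needs a check for whether NRS setup ever ran for this service partition, so it can tell whether this struct needs to be cleaned up at all.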
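The kind of check suggested above can be sketched in userspace C. This is only an illustration under stated assumptions, not the Lustre fix: every name here (struct demo_nrs, struct demo_policy, demo_nrs_cleanup() and the small list helpers) is hypothetical, and it assumes the NRS head sits in zero-filled memory when setup is skipped, so an uninitialized list head shows up as a NULL next pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, standing in for the kernel list head. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = NULL;
	node->prev = NULL;
}

/* Hypothetical stand-ins for an NRS policy and an NRS head. */
struct demo_policy {
	struct list_node pol_list;
};

struct demo_nrs {
	struct list_node policy_list;
	int stopping;
};

/* Register one policy; returns 0 on success, -1 on allocation failure. */
static int demo_nrs_add_policy(struct demo_nrs *nrs)
{
	struct demo_policy *policy = malloc(sizeof(*policy));

	if (policy == NULL)
		return -1;
	list_add_tail(&policy->pol_list, &nrs->policy_list);
	return 0;
}

/*
 * Tear down all policies. Returns the number removed, or 0 when setup
 * never reached list_init() (assumption: the struct was zero-filled,
 * so the head's next pointer is still NULL).
 */
static int demo_nrs_cleanup(struct demo_nrs *nrs)
{
	int removed = 0;

	if (nrs == NULL || nrs->policy_list.next == NULL)
		return 0; /* nothing was set up, so nothing to walk */

	nrs->stopping = 1;
	while (nrs->policy_list.next != &nrs->policy_list) {
		struct list_node *node = nrs->policy_list.next;
		struct demo_policy *policy = (struct demo_policy *)
			((char *)node - offsetof(struct demo_policy, pol_list));

		assert(node->prev == &nrs->policy_list); /* always the first entry */
		list_del(node);
		free(policy);
		removed++;
	}
	return removed;
}
```

Calling demo_nrs_cleanup() on a zero-filled struct demo_nrs returns without touching the list, while calling it after list_init() and a few demo_nrs_add_policy() calls frees every entry. Whether the real code should test the list head, a separate "initialized" flag, or skip the NRS cleanup call entirely on this error path is a design decision this sketch does not settle.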