Type: Bug
Resolution: Fixed
Priority: Blocker
Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
Affects Version/s: None
Labels: None
Environment: Hyperion iwc126 client
Severity: 3
Rank (Obsolete): 9099

During an IOR file-per-process (fpp) run, a client panicked.
This was on:
Lustre: Lustre: Build Version: 2.4.52--PRISTINE-2.6.32-358.11.1.el6.x86_64

In the crashdump the following was seen:

LNetError: 23379:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: out of memory at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/client/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.4.52/lnet/include/lnet/lib-lnet.h:457 (tried to alloc '(md)' = 4208)
LNetError: 23379:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: 199035354 total bytes allocated by lnet
BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
IP: [<ffffffffa0990d5a>] ptlrpc_register_bulk+0x46a/0x9d0 [ptlrpc]
PGD 1a4e1b067 PUD 12fa01067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:03:00.0/infiniband/mlx4_0/ports/1/pkeys/127
CPU 7 
Modules linked in: lmv(U) fld(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) sha512_generic sha256_generic crc32c_intel ipmi_devintf acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr mlx4_ib ib_sa ib_mad iw_cxgb4 iw_cxgb3 ib_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm dcdbas i2c_i801 i2c_core ahci iTCO_wdt iTCO_vendor_support i7core_edac edac_core ioatdma dca shpchp nfs lockd fscache auth_rpcgss nfs_acl sunrpc mlx4_en mlx4_core e1000e be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: cpufreq_ondemand]

Pid: 23379, comm: ptlrpcd_1 Tainted: G        W  ---------------    2.6.32-358.11.1.el6.x86_64 #1 Dell        XS23-TY     /XS23-TY     
RIP: 0010:[<ffffffffa0990d5a>]  [<ffffffffa0990d5a>] ptlrpc_register_bulk+0x46a/0x9d0 [ptlrpc]
RSP: 0018:ffff8803017dbb10  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff880084340000 RCX: 00051e3eebeda3b4
RDX: 0000000000000000 RSI: ffffffffa09fc2c0 RDI: ffffffffa0a3e520
RBP: ffff8803017dbbd0 R08: 0000000000000000 R09: 00000000fffffff4
R10: 0000000000000002 R11: 0000000000000000 R12: 00000000fffffff4
R13: 00051e3eebeda3b4 R14: 0000000000000000 R15: 00051e3eebeda3b4
FS:  0000000000000000(0000) GS:ffff8801c58c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000b8 CR3: 00000001bad21000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ptlrpcd_1 (pid: 23379, threadinfo ffff8803017da000, task ffff8803017d9500)
Stack:
 ffff8800843400a0 0000000100000100 00000102000000d2 ffff880084340058
<d> 0000000000000023 00000000a07625a0 ffff8801833b2400 00000001e8c219c4
<d> ffff8800843400a0 0000000100000100 00000102000000d2 ffff880084340058
Call Trace:
 [<ffffffffa0991fa2>] ptl_send_rpc+0x232/0xc40 [ptlrpc]
 [<ffffffff81281484>] ? snprintf+0x34/0x40
 [<ffffffffa0718fe1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa09876bb>] ptlrpc_send_new_req+0x45b/0x7a0 [ptlrpc]
 [<ffffffffa098b3a8>] ptlrpc_check_set+0x878/0x1b20 [ptlrpc]
 [<ffffffffa09b76cb>] ptlrpcd_check+0x53b/0x560 [ptlrpc]
 [<ffffffff8109705c>] ? remove_wait_queue+0x3c/0x50
 [<ffffffffa09b7b50>] ptlrpcd+0x190/0x380 [ptlrpc]
 [<ffffffff81063310>] ? default_wake_function+0x0/0x20
 [<ffffffffa09b79c0>] ? ptlrpcd+0x0/0x380 [ptlrpc]
 [<ffffffff81096936>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810968a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: f0 48 c7 05 05 d8 0a 00 50 e5 a3 a0 c7 05 f3 d7 0a 00 00 00 02 00 4c 89 e9 48 8b 43 10 48 c7 c6 c0 c2 9f a0 48 c7 c7 20 e5 a3 a0 <48> 8b 90 b8 00 00 00 31 c0 48 83 c2 0c e8 34 82 d8 ff 48 8b 7d 
RIP  [<ffffffffa0990d5a>] ptlrpc_register_bulk+0x46a/0x9d0 [ptlrpc]
 RSP <ffff8803017dbb10>
CR2: 00000000000000b8

There were some non-fatal memory allocation errors in the log before those messages; the full console log is attached.

This was 1 of 100 clients.
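As a reading aid for the register dump in the oops, the sketch below checks two things that can be derived from the posted values alone. It is not a confirmed root cause. First, R09 and R12 hold 0xfffffff4, which is -12 as a signed 32-bit integer, the Linux value of -ENOMEM. That this register carries the return code of the failed lnet_md_alloc() call is an assumption. Second, the faulting bytes marked in the Code: line (48 8b 90 b8 00 00 00) decode to mov 0xb8(%rax),%rdx, and with RAX = 0 the accessed address is 0xb8, which matches CR2. The helper name as_signed32 is hypothetical.

```python
import errno


def as_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Register values copied from the oops in this report.
r12 = 0x00000000FFFFFFF4
rax = 0x0000000000000000
cr2 = 0x00000000000000B8

# Assumption: R12 holds an errno-style return code.
rc = as_signed32(r12)
print("R12 as signed 32-bit:", rc, errno.errorcode.get(-rc, "unknown"))

# The faulting instruction bytes are 48 8b 90 followed by a 32-bit
# little-endian displacement (b8 00 00 00), i.e. mov 0xb8(%rax),%rdx.
disp = int.from_bytes(bytes.fromhex("b8000000"), "little")
fault_addr = rax + disp
print("faulting address:", hex(fault_addr), "matches CR2:", fault_addr == cr2)
```

If this reading holds, the dereference happens on a NULL pointer while the error path formats a message after the allocation failure, which is consistent with the out-of-memory lines logged just before the BUG.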


Attachment: hyper-console.out (78 kB, 15/Jul/13 9:40 PM)

Is related to: LU-3598 Failure on test suite sanity-benchmark test_iozone: page allocation failure (Resolved)

Assignee: Amir Shehata (Inactive)
Reporter: Keith Mannthey (Inactive)
Votes: 0
Watchers: 8
Created: 15/Jul/13 9:33 PM
Updated: 29/Apr/14 10:11 PM
Resolved: 29/Jul/13 5:13 PM
