Details
- Bug
- Resolution: Fixed
- Critical
- Lustre 2.6.0
- None
- SLES11 SP3 2.6 clients, CentOS 2.6 servers with striped directories.
- 3
- 14674
Description
During testing of 2.6 clients and servers (with striped directories), we lost a client to a null pointer dereference here:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa0801500>] ll_cl_find+0x50/0x80 [lustre]
PGD 4ef80f067 PUD 91ff68067 PMD 0
Oops: 0000 [#1] SMP
CPU 5
Modules linked in: xpmem dvspn(P) dvsof(P) dvsutil(P) dvsipc(P) dvsipc_lnet(P) dvsproc(P) bpmcdmod nic_compat cmsr osc mgc lustre lov mdc fid lmv fld kgnilnd ptlrpc obdclass lnet sha1_generic libcfs ib_core pcie_link_bw_monitor kdreg gpcd_ari ipogif_ari kgni_ari hwerr(P) rca hss_os(P) heartbeat simplex(P) ghal_ari craytrace
Pid: 13639, comm: memfill3 Tainted: P 3.0.101-0.15.1_1.0502.8131-cray_ari_c #1 Cray Inc. Cascade/Cascade
RIP: 0010:[<ffffffffa0801500>] [<ffffffffa0801500>] ll_cl_find+0x50/0x80 [lustre]
RSP: 0018:ffff8806e5c51ad8 EFLAGS: 00010217
RAX: ffff8804ef80c7f0 RBX: ffff880c611fdec0 RCX: ffff880c611fdf88
RDX: 0000000000000000 RSI: ffff880c611fbcc0 RDI: ffff880c611fdf84
RBP: ffff8806e5c51ae8 R08: 0000000000000000 R09: ffff8806e5c51c00
R10: 0000000000000320 R11: ffff880fb387bf68 R12: ffff880c611fdf84
R13: 0000000005d6df80 R14: 0000000000005d6d R15: 0000000000000000
FS: 00000000402569e0(0063) GS:ffff88107faa0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 00000004ef80e000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Process memfill3 (pid: 13639, threadinfo ffff8806e5c50000, task ffff8804ef80c7f0)
Stack:
ffff880c611fbcc0 ffff880fbfd149c0 ffff8806e5c51b78 ffffffffa08204d3
0000000000000000 ffff8806e5c51c00 0000008000000000 0000000000000000
0000000000000000 ffff880c7759cf48 00000f80e5c51ba8 ffffffffa0816c27
Call Trace:
[<ffffffffa08204d3>] ll_write_begin+0x83/0x760 [lustre]
[<ffffffff810fdede>] generic_file_buffered_write+0x10e/0x240
[<ffffffff81100cd9>] __generic_file_aio_write+0x259/0x450
[<ffffffff81100f29>] generic_file_aio_write+0x59/0xc0
[<ffffffffa08358ec>] vvp_io_write_start+0xfc/0x3e0 [lustre]
[<ffffffffa0377582>] cl_io_start+0x72/0x140 [obdclass]
[<ffffffffa037b134>] cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa07d0c02>] ll_file_io_generic+0x5a2/0x8d0 [lustre]
[<ffffffffa07d115f>] ll_file_aio_write+0x22f/0x290 [lustre]
[<ffffffffa07d1b25>] ll_file_write+0x1e5/0x270 [lustre]
[<ffffffff811586bb>] vfs_write+0xcb/0x180
[<ffffffff81158865>] sys_write+0x55/0x90
[<ffffffff81427bab>] system_call_fastpath+0x16/0x1b
[<0000000020035811>] 0x20035810
Code: c8 00 00 00 48 8d 8b c8 00 00 00 48 39 ca 74 2d 65 48 8b 04 25 c0 b5 00 00 48 39 42 10 75 12 eb 2a 66 2e 0f 1f 84 00 00 00 00 00 <48> 39 42 10 74 1a 48 8b 12 48 39 ca 0f 1f 40 00 75 ee 31 c0 f0
RIP [<ffffffffa0801500>] ll_cl_find+0x50/0x80 [lustre]
RSP <ffff8806e5c51ad8>
CR2: 0000000000000010
---[ end trace 52e397e4bde9254f ]---
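One possible reading of the dump, sketched in Python: the bytes at the fault marker in the Code: line are assumed here to be 48 39 42 10 (the same compare appears a few bytes earlier in that line), which decodes to a compare of %rax against the quadword at 0x10(%rdx). With RDX equal to zero, that access lands on the address reported in CR2. The helper names below are hypothetical, and the decoder handles only this single instruction encoding; it is a sanity check on the register dump, not a diagnosis of the root cause.

```python
import re

# Values copied from the register dump in the oops above.
OOPS_REGISTERS = """
RAX: ffff8804ef80c7f0 RBX: ffff880c611fdec0 RCX: ffff880c611fdf88
RDX: 0000000000000000 RSI: ffff880c611fbcc0 RDI: ffff880c611fdf84
CR2: 0000000000000010 CR3: 00000004ef80e000 CR4: 00000000001407e0
"""

# Assumed faulting instruction bytes: REX.W prefix, opcode, ModRM, disp8.
FAULTING_BYTES = bytes.fromhex("48394210")

# x86-64 register numbering used by the ModRM byte (without REX.R/REX.B).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def parse_registers(text):
    """Map lowercase register names to integers from 'NAME: hex16' pairs."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name.lower(): int(value, 16) for name, value in pairs}


def decode_cmp_disp8(code):
    """Decode 'REX.W 39 /r disp8' (cmp r/m64, r64) with a plain base register.

    Returns (base_register, displacement, compared_register).
    Hypothetical helper: no SIB byte, no REX.R/REX.B, no other opcodes.
    """
    rex, opcode, modrm, disp = code
    if rex != 0x48 or opcode != 0x39:
        raise ValueError("not a REX.W cmp r/m64, r64 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only [base + disp8] addressing is handled")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REG64[rm], disp, REG64[reg]


regs = parse_registers(OOPS_REGISTERS)
base, disp, other = decode_cmp_disp8(FAULTING_BYTES)
fault_address = regs[base] + disp
print(f"cmp %{other}, {disp:#x}(%{base}) -> access at {fault_address:#x}")
print("matches CR2:", fault_address == regs["cr2"])
```

Two further observations from the same dump, offered tentatively: RAX equals the task pointer printed on the Process line, and RCX equals RBX + 0xc8, which is consistent with ll_cl_find walking a list anchored inside the object in RBX and comparing a field at offset 0x10 of each entry against the current task. Under that reading, the walk reached a NULL entry pointer instead of returning to the list head.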
The build is an unmodified copy of master from last week, with no other patches applied. The most recent commit is:
Subject: LU-3188 osc: shorten IO calling path
Description:
By using osc_io_unplug_aync() for osc_queue_sync_pages() to shorten
the IO calling path, to reduce the chance of stack overflow.
This is revive of git commit 83ae17df2bdce837e62473aec27c03d67312c8ea.
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Change-Id: I2ac32866f7adbc4701370704612c849a18a5d1ac
Reviewed-on: http://review.whamcloud.com/10292
Issue Links
- is related to LU-5108 osc: Performance tune for LRU (Resolved)