Details
-
Bug
-
Resolution: Unresolved
-
Blocker
-
None
-
Lustre 2.8.0
-
lola
build: https://build.hpdd.intel.com/job/lustre-b2_8/12/
-
3
-
9223372036854775807
Description
Error occurred during soak testing of build '20160309' (b2_8 RC5) (see: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160309 also). DNE is enabled. MDTs had been formatted using ldiskfs, OSTs using zfs. MDS nodes are configured in active - active HA failover configuration. (For teset set-up configuration see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-Configuration)
Sequence of events:
- mds_restart : 2016-03-11 03:41:05,597 - 2016-03-11 03:54:49,109 lola-8
- 2016-03-11 03:56 Lustre client lola-32 crashed with the following error:
<1>BUG: unable to handle kernel NULL pointer dereference at (null) <1>IP: [<ffffffffa0a0241f>] ll_open_cleanup+0xaf/0x600 [lustre] <4>PGD 38a867067 PUD 775372067 PMD 0 <4>Oops: 0000 [#1] SMP <4>last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:06:00.0/infiniband_mad/umad0/port <4>CPU 1 <4>Modules linked in: osc(U) mgc(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) nfsi] <4> <4>Pid: 201682, comm: simul Not tainted 2.6.32-504.30.3.el6.x86_64 #1 Intel Corporation S2600GZ/S2600GZ <4>RIP: 0010:[<ffffffffa0a0241f>] [<ffffffffa0a0241f>] ll_open_cleanup+0xaf/0x600 [lustre] <4>RSP: 0018:ffff8801d3f298e8 EFLAGS: 00010286 <4>RAX: 0000000000000000 RBX: ffff8807f7271a00 RCX: ffff88102dd13ca0 <4>RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff88102f395000 <4>RBP: ffff8801d3f29928 R08: ffff88034f47e9c0 R09: 0000000000000000 <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000 <4>R13: ffff88082dd02c00 R14: ffff88081a5fc800 R15: ffff8801d3f29988 <4>FS: 00007f72f89d3700(0000) GS:ffff880045e20000(0000) knlGS:0000000000000000 <4>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b <4>CR2: 0000000000000000 CR3: 00000003b89f2000 CR4: 00000000001407e0 <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>Process simul (pid: 201682, threadinfo ffff8801d3f28000, task ffff8804c6d73520) <4>Stack: <4> ffff8801d3f298f8 0000000000000000 ffff8801d3f29928 0000000000000001 <4><d> ffff88082dd02c00 ffff88081a5fc800 ffff8809648898c0 ffff8801d3f29988 <4><d> ffff8801d3f299f8 ffffffffa0a09c4a fffffffffffffffb 00ff880a531788c0 <4>Call Trace: <4> [<ffffffffa0a09c4a>] ll_prep_inode+0x20a/0xc40 [lustre] <4> [<ffffffffa07804b2>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc] <4> [<ffffffffa0a214f0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] <4> [<ffffffffa0a21fe1>] ll_lookup_it_finish+0x321/0x12e0 [lustre] <4> [<ffffffffa0723dd0>] ? ldlm_lock_decref_internal+0x2e0/0xa80 [ptlrpc] <4> [<ffffffffa0569925>] ? class_handle2object+0x95/0x190 [obdclass] <4> [<ffffffff81174ab3>] ? kmem_cache_alloc_trace+0x1b3/0x1c0 <4> [<ffffffffa0a1e439>] ? ll_i2suppgid+0x19/0x30 [lustre] <4> [<ffffffffa0a1e47e>] ? ll_i2gids+0x2e/0xd0 [lustre] <4> [<ffffffffa0a214f0>] ? ll_md_blocking_ast+0x0/0x7d0 [lustre] <4> [<ffffffffa0a23226>] ll_lookup_it+0x286/0xda0 [lustre] <4> [<ffffffffa0a23dc9>] ll_lookup_nd+0x89/0x4f0 [lustre] <4> [<ffffffff8119e055>] do_lookup+0x1a5/0x230 <4> [<ffffffff8119ece4>] __link_path_walk+0x7a4/0x1000 <4> [<ffffffff8119f7fa>] path_walk+0x6a/0xe0 <4> [<ffffffff8119fa0b>] filename_lookup+0x6b/0xc0 <4> [<ffffffff8122daa6>] ? security_file_alloc+0x16/0x20 <4> [<ffffffff811a0ee4>] do_filp_open+0x104/0xd20 <4> [<ffffffff81063c63>] ? perf_event_task_sched_out+0x33/0x70 <4> [<ffffffff8129943a>] ? strncpy_from_user+0x4a/0x90 <4> [<ffffffff811ae392>] ? alloc_fd+0x92/0x160 <4> [<ffffffff8118b157>] do_sys_open+0x67/0x130 <4> [<ffffffff8118b260>] sys_open+0x20/0x30 <4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b <4>Code: ba 38 01 00 00 31 f6 e8 30 62 b6 ff f6 05 fd 6b a9 ff 10 74 0d 80 3d f0 6b a9 ff 00 0f 88 ba 01 00 00 48 85 db 0f 84 16 02 00 00 <49> 8b 04 24 48 89 03 49 <1>RIP [<ffffffffa0a0241f>] ll_open_cleanup+0xaf/0x600 [lustre] <4> RSP <ffff8801d3f298e8> <4>CR2: 0000000000000000
Attached client (lola-32) message, console and vmcore-dmesg.txt file.
Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/19256
Subject: LU-7866 ptlrpc: save reply msg for replay
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d48a27d6e1a2b63dd9d2f2195dfed2e2d0595b78