Details
-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
None
-
3
-
11940
Description
This issue was created by maloo for Oleg Drokin <green@whamcloud.com>
This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/d275a9d4-5eb4-11e3-ae30-52540035b04c.
The sub-test lustre-initialization_1 failed with the following error:
Test system failed to start single suite, so abandoning all hope and giving up
Info required for matching: lustre-initialization-1 lustre-initialization_1
Testing with DEBUG_PAGEALLOC enabled frequently OST fails to starup hitting an oops:
11:23:12:[ 289.592410] BUG: unable to handle kernel paging request at ffff880013a91e30 11:23:12:[ 289.593112] IP: [<ffffffffa09ef395>] ksocknal_send+0x165/0x450 [ksocklnd] 11:23:12:[ 289.593691] PGD 1a26063 PUD 1a2a063 PMD 19d067 PTE 13a91060 11:23:12:[ 289.594169] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC 11:23:12:[ 289.594557] last sysfs file: /sys/devices/system/cpu/possible 11:23:12:[ 289.595026] CPU 0 11:23:12:[ 289.595188] Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] 11:23:12:[ 289.596009] 11:23:12:[ 289.596009] Pid: 5138, comm: ll_cfg_requeue Not tainted 2.6.32-358.23.2.el6_lustre.g3ddc521.x86_64 #1 Red Hat KVM 11:23:12:[ 289.596009] RIP: 0010:[<ffffffffa09ef395>] [<ffffffffa09ef395>] ksocknal_send+0x165/0x450 [ksocklnd] 11:23:12:[ 289.596009] RSP: 0018:ffff88002e315880 EFLAGS: 00010286 11:23:12:[ 289.596009] RAX: 0000000000000000 RBX: ffff880013a79ef0 RCX: 0000000000000009 11:23:12:[ 289.596009] RDX: ffff880002200000 RSI: 0000000000000292 RDI: ffffffffa09feb38 11:23:12:[ 289.596009] RBP: ffff88002e3158f0 R08: 0000000000000000 R09: 0000000000000000 11:23:12:[ 289.596009] R10: ffff8800148179e0 R11: 0000000000000000 R12: ffff880013a91df0 11:23:12:[ 289.596009] R13: 0000000000000001 R14: ffff880015c6e740 R15: 0000000000000148 11:23:12:[ 289.596009] FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000 11:23:12:[ 289.596009] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b 11:23:12:[ 289.596009] CR2: ffff880013a91e30 CR3: 000000001541c000 CR4: 00000000000006f0 11:23:12:[ 289.596009] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 11:23:12:[ 289.596009] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 11:23:12:[ 289.596009] Process ll_cfg_requeue (pid: 5138, threadinfo ffff88002e314000, task ffff88002e3121c0) 11:23:12:[ 289.596009] Stack: 11:23:12:[ 289.596009] ffff8800350d9001 0000000113a91df0 0000000000000000 ffff8800346a4df0 11:23:12:[ 289.596009] <d> 0000000000000000 000000e800000000 000200000a0a11cc 0000000000003039 11:23:12:[ 289.596009] <d> ffff88002e315910 ffff8800346a4df0 ffff880013a91df0 ffff880013a91df0 11:23:12:[ 289.596009] Call Trace: 11:23:12:[ 289.596009] [<ffffffffa0532dfb>] lnet_ni_send+0x4b/0xf0 [lnet] 11:23:12:[ 289.596009] [<ffffffffa0537005>] lnet_send+0x655/0xb80 [lnet] 11:23:12:[ 289.596009] [<ffffffffa053806a>] LNetPut+0x31a/0x860 [lnet] 11:23:12:[ 289.596009] [<ffffffffa0808dc0>] ptl_send_buf+0x1e0/0x550 [ptlrpc] 11:23:12:[ 289.596009] [<ffffffff8103b82c>] ? kvm_clock_read+0x1c/0x20 11:23:12:[ 289.596009] [<ffffffffa080c2bd>] ptl_send_rpc+0x4dd/0xcc0 [ptlrpc] 11:23:12:[ 289.596009] [<ffffffffa0800994>] ptlrpc_send_new_req+0x454/0x7c0 [ptlrpc] 11:23:12:[ 289.596009] [<ffffffffa08061fe>] ptlrpc_set_wait+0x5ce/0x830 [ptlrpc] 11:23:12:[ 289.596009] [<ffffffffa08101c6>] ? lustre_msg_set_jobid+0xb6/0x140 [ptlrpc] 11:23:12:[ 289.596009] [<ffffffffa08064e7>] ptlrpc_queue_wait+0x87/0x220 [ptlrpc] 11:23:12:[ 289.596009] [<ffffffffa07e1ef5>] ldlm_cli_enqueue+0x365/0x790 [ptlrpc] 11:23:12:[ 289.596009] [<ffffffffa07e6bd0>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc] 11:23:12:[ 289.596009] [<ffffffffa0db0080>] ? mgc_blocking_ast+0x0/0x810 [mgc] 11:23:12:[ 289.596009] [<ffffffffa0db5633>] mgc_process_log+0x283/0x17f0 [mgc] 11:23:12:[ 289.596009] [<ffffffff81080722>] ? del_timer_sync+0x22/0x30 11:23:12:[ 289.596009] [<ffffffffa0db0080>] ? mgc_blocking_ast+0x0/0x810 [mgc] 11:23:12:[ 289.596009] [<ffffffffa07e6bd0>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc] 11:23:12:[ 289.596009] [<ffffffffa0487951>] ? libcfs_debug_msg+0x41/0x50 [libcfs] 11:23:12:[ 289.596009] [<ffffffffa0db775a>] mgc_requeue_thread+0x31a/0x7a0 [mgc] 11:23:12:[ 289.596009] [<ffffffff8105bca0>] ? default_wake_function+0x0/0x20 11:23:12:[ 289.596009] [<ffffffffa0db7440>] ? mgc_requeue_thread+0x0/0x7a0 [mgc] 11:23:12:[ 289.596009] [<ffffffff81095696>] kthread+0x96/0xa0 11:23:12:[ 289.596009] [<ffffffff8100c10a>] child_rip+0xa/0x20 11:23:12:[ 289.596009] [<ffffffff81095600>] ? kthread+0x0/0xa0 11:23:12:[ 289.596009] [<ffffffff8100c100>] ? child_rip+0x0/0x20 11:23:12:[ 289.596009] Code: 00 c7 43 60 c1 00 00 00 48 89 de 48 c7 43 70 00 00 00 00 48 c7 43 68 00 00 00 00 48 8b 55 c0 8b 4d c8 48 8b 7d a8 e8 7b fa ff ff <41> f6 44 24 40 08 74 1b 8b 4d b4 85 c9 0f 84 c8 00 00 00 65 48 11:23:12:[ 289.596009] RIP [<ffffffffa09ef395>] ksocknal_send+0x165/0x450 [ksocklnd] 11:23:12:[ 289.596009] RSP <ffff88002e315880> 11:23:12:[ 289.596009] CR2: ffff880013a91e30