Details
- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Minor
- Labels: None
- Fix Version/s: Lustre 2.7.0
- Affects Version/s: None
- Environment: server: 2.6.0; client: lustre-master build #2733
- Severity: 3
- Rank: 16494
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/a99dab2e-68f9-11e4-9444-5254006e85c2.
The sub-test test_13 failed with the following error:
test failed to respond and timed out
15:56:02:Lustre: DEBUG MARKER: == sanity-scrub test 13: OI scrub can rebuild missed /O entries == 15:55:38 (1415404538)
15:56:02:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x196
15:56:02:Lustre: *** cfs_fail_loc=196, val=0***
15:56:02:Lustre: Skipped 63 previous similar messages
15:56:02:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0
15:56:02:Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
15:56:03:Lustre: DEBUG MARKER: umount -d /mnt/ost1
15:56:03:Lustre: Failing over lustre-OST0000
15:56:03:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
15:56:03:Lustre: DEBUG MARKER: mkdir -p /mnt/ost1
15:56:03:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_OSS/P1
15:56:03:Lustre: DEBUG MARKER: mkdir -p /mnt/ost1; mount -t lustre -o user_xattr,noscrub /dev/lvm-Role_OSS/P1 /mnt/ost1
15:56:03:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
15:56:04:LustreError: 137-5: lustre-OST0000_UUID: not available for connect from 10.2.4.225@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
15:56:04:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
15:56:04:Lustre: DEBUG MARKER: e2label /dev/lvm-Role_OSS/P1 2>/dev/null
15:56:04:Lustre: lustre-OST0000: Will be in recovery for at least 1:00, or until 2 clients reconnect
15:56:04:Lustre: lustre-OST0000: Denying connection for new client 02b0a6e6-f977-0e0b-772a-b2c3b66ac87b (at 10.2.4.220@tcp), waiting for all 2 known clients (0 recovered, 0 in progress, and 0 evicted) to recover in 0:59
15:56:04:Lustre: lustre-OST0000: Denying connection for new client 02b0a6e6-f977-0e0b-772a-b2c3b66ac87b (at 10.2.4.220@tcp), waiting for all 2 known clients (1 recovered, 0 in progress, and 0 evicted) to recover in 1:19
15:56:04:Lustre: lustre-OST0000: Recovery over after 0:06, of 2 clients 2 recovered and 0 were evicted.
15:56:04:Lustre: lustre-OST0000: deleting orphan objects from 0x0:738 to 0x0:769
15:56:04:LustreError: 11266:0:(ldlm_resource.c:1150:ldlm_resource_get()) lustre-OST0000: lvbo_init failed for resource 0x2d8:0x0: rc = -2
15:56:04:LustreError: 11266:0:(ldlm_resource.c:1150:ldlm_resource_get()) Skipped 63 previous similar messages
15:56:04:BUG: unable to handle kernel paging request at 000000005a5a5a5a
15:56:04:IP: [<ffffffffa07ea251>] ldlm_lock_create+0x201/0xd70 [ptlrpc]
15:56:04:PGD 0
15:56:05:Oops: 0000 [#1] SMP
15:56:05:last sysfs file: /sys/devices/system/cpu/online
15:56:05:CPU 0
15:56:05:Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: obdecho]
15:56:05:
15:56:05:Pid: 11261, comm: ll_ost00_014 Tainted: G W --------------- 2.6.32-431.20.3.el6_lustre.x86_64 #1 Red Hat KVM
15:56:06:RIP: 0010:[<ffffffffa07ea251>] [<ffffffffa07ea251>] ldlm_lock_create+0x201/0xd70 [ptlrpc]
15:56:06:RSP: 0018:ffff88006c515c60 EFLAGS: 00010246
15:56:06:RAX: ffff88006ccc1358 RBX: ffff88006ccc1180 RCX: ffff880037a64880
15:56:06:RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88006ccc1290
15:56:06:RBP: ffff88006c515cb0 R08: 0000000000000005 R09: 0000000000000000
15:56:06:R10: ffff88006ccc1180 R11: 0000000000000200 R12: ffff88006ccc1180
15:56:06:R13: ffffffffa0916ea0 R14: ffff88007ba5e400 R15: 000000005a5a5a5a
15:56:06:FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
15:56:06:CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
15:56:06:CR2: 000000005a5a5a5a CR3: 000000007db32000 CR4: 00000000000006f0
15:56:06:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
15:56:07:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
15:56:07:Process ll_ost00_014 (pid: 11261, threadinfo ffff88006c514000, task ffff88006f803540)
15:56:07:Stack:
15:56:07: ffff880000000010 0000000000000000 0000000b00000004 ffffffffa05d5e31
15:56:07:<d> ffff88007ce36738 ffff88006e0ed850 ffff88007ce36818 ffffffffa0916ea0
15:56:07:<d> ffff88007ba5e400 0000000000000001 ffff88006c515d20 ffffffffa0811c74
15:56:07:Call Trace:
15:56:07: [<ffffffffa05d5e31>] ? lprocfs_counter_add+0x151/0x1c0 [obdclass]
15:56:07: [<ffffffffa0811c74>] ldlm_handle_enqueue0+0x174/0x11d0 [ptlrpc]
15:56:07: [<ffffffffa08947c2>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
15:56:08: [<ffffffffa0894b6c>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
15:56:08: [<ffffffffa084426a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
15:56:08: [<ffffffffa0843550>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
15:56:08: [<ffffffff8109abf6>] kthread+0x96/0xa0
15:56:08: [<ffffffff8100c20a>] child_rip+0xa/0x20
15:56:09: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
15:56:09: [<ffffffff8100c200>] ? child_rip+0x0/0x20
15:56:09:Code: 00 00 49 8d 84 24 d8 01 00 00 49 c7 84 24 90 00 00 00 00 00 00 00 ba 01 00 00 00 49 89 84 24 d8 01 00 00 49 89 84 24 e0 01 00 00 <49> 8b 07 48 8b 00 48 8b b8 88 01 00 00 e8 7d ba de ff 4d 89 24
15:56:09:RIP [<ffffffffa07ea251>] ldlm_lock_create+0x201/0xd70 [ptlrpc]
15:56:09: RSP <ffff88006c515c60>
15:56:09:CR2: 000000005a5a5a5a
15:56:09:Initializing cgroup subsys cpuset
15:56:09:Initializing cgroup subsys cpu
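One observation on the oops above: the faulting address (and R15/CR2) is 000000005a5a5a5a, which is the kernel's slab POISON_INUSE byte 0x5a (from include/linux/poison.h) repeated across all four low bytes. That pattern usually means the crash in ldlm_lock_create() dereferenced a pointer read from poisoned (allocated but never initialized, or debug-filled) slab memory rather than from random corruption. A minimal sketch of that byte-pattern check, for illustration only (the helper name is made up here, not part of any Lustre or kernel tooling):

```python
# POISON_INUSE is 0x5a in the Linux kernel's include/linux/poison.h; with slab
# debugging enabled, freshly allocated (not-yet-initialized) objects are filled
# with this byte, so a pointer loaded from such memory reads as 0x5a5a5a5a...
POISON_INUSE = 0x5A

def looks_poisoned(addr: int, width: int = 4) -> bool:
    """Return True if every byte of `addr` is the POISON_INUSE fill pattern."""
    return addr.to_bytes(width, "little") == bytes([POISON_INUSE]) * width

# The faulting address reported in the oops (CR2 / R15):
print(looks_poisoned(0x5A5A5A5A))            # -> True
# An ordinary kernel pointer from the register dump (RBX), for contrast:
print(looks_poisoned(0xFFFF88006CCC1180, 8)) # -> False
```

This is consistent with a use-before-init (or debug-poisoned) object on the ldlm enqueue path; it does not by itself identify which allocation was at fault.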
Info required for matching: sanity-scrub 13