
[LU-2703] racer: BUG: soft lockup - CPU#0 stuck for 67s! [dd:1404]


    Description

      While running the racer test, the following soft lockup occurred on one of the two clients:

      00:11:31:Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-28vm1,client-28vm2.lab.whamcloud.com DURATION=900 == 00:11:30 (1359447090)
      00:11:32:Lustre: DEBUG MARKER: DURATION=900 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
      00:24:12:BUG: soft lockup - CPU#0 stuck for 67s! [dd:1404]
      00:24:12:Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      00:24:12:CPU 0 
      00:24:12:Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      00:24:12:
      00:24:12:Pid: 1404, comm: dd Not tainted 2.6.32-279.19.1.el6.x86_64 #1 Red Hat KVM
      00:24:12:RIP: 0010:[<ffffffff814ec53e>]  [<ffffffff814ec53e>] _spin_lock+0x1e/0x30
      00:24:12:RSP: 0018:ffff880030ce16a8  EFLAGS: 00000206
      00:24:13:RAX: 0000000000000001 RBX: ffff880030ce16a8 RCX: ffff8800783c6ba0
      00:24:13:RDX: 0000000000000000 RSI: ffff88004ecec7c0 RDI: ffff88007b510a1c
      00:24:13:RBP: ffffffff8100bb8e R08: 0000000000000102 R09: 0000000000000000
      00:24:14:R10: 0000000003b5e000 R11: 000000000000000e R12: ffffffffa058315f
      00:24:14:R13: ffff880030ce1638 R14: 0000000003b5e000 R15: 0000000003b5efff
      00:24:14:FS:  00007f288786d700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      00:24:14:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      00:24:14:CR2: 00000036526cd710 CR3: 000000003d0ee000 CR4: 00000000000006f0
      00:24:14:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      00:24:14:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      00:24:14:Process dd (pid: 1404, threadinfo ffff880030ce0000, task ffff88007d082aa0)
      00:24:14:Stack:
      00:24:14: ffff880030ce17f8 ffffffffa0275889 00000000002537dc ffffea000115a9f0
      00:24:14:<d> ffff88003e877950 0000000000000000 0000000003b5e000 0000000003b5efff
      00:24:14:<d> ffff880030ce1998 ffffffffa079bddd f869fda2cf0897a1 0000000000000000
      00:24:14:Call Trace:
      00:24:15: [<ffffffffa0275889>] ? osc_queue_async_io+0x399/0x1140 [osc]
      00:24:15: [<ffffffffa079bddd>] ? ll_prepare_write+0x50d/0x1230 [lustre]
      00:24:15: [<ffffffffa072adce>] ? lov_stripe_offset+0x28e/0x340 [lov]
      00:24:15: [<ffffffffa072a8db>] ? lov_tgt_seq_show+0x26b/0x300 [lov]
      00:24:16: [<ffffffffa070d0a9>] ? lov_queue_async_io+0x149/0x4a0 [lov]
      00:24:16: [<ffffffffa0795780>] ? queue_or_sync_write+0x160/0xda0 [lustre]
      00:24:16: [<ffffffffa07a2c2b>] ? ll_stats_ops_tally+0x6b/0xd0 [lustre]
      00:24:16: [<ffffffffa079cde5>] ? ll_commit_write+0x2e5/0x750 [lustre]
      00:24:16: [<ffffffffa07b4333>] ? ll_write_begin+0x83/0x210 [lustre]
      00:24:16: [<ffffffffa07b4280>] ? ll_write_end+0x30/0x60 [lustre]
      00:24:16: [<ffffffff811107fa>] ? generic_file_buffered_write+0x18a/0x2e0
      00:24:16: [<ffffffff81070f97>] ? current_fs_time+0x27/0x30
      00:24:16: [<ffffffff81112130>] ? __generic_file_aio_write+0x250/0x480
      00:24:16: [<ffffffffa0765dba>] ? ll_file_get_tree_lock_iov+0x14a/0x810 [lustre]
      00:24:16: [<ffffffff811123cf>] ? generic_file_aio_write+0x6f/0xe0
      00:24:16: [<ffffffffa0772449>] ? ll_file_aio_write+0xa19/0x1c60 [lustre]
      00:24:16: [<ffffffffa0773760>] ? ll_file_write+0xd0/0xf0 [lustre]
      00:24:16: [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80
      00:24:16: [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
      00:24:16: [<ffffffff8120ca26>] ? security_file_permission+0x16/0x20
      00:24:16: [<ffffffff8117646d>] ? rw_verify_area+0x5d/0xc0
      00:24:16: [<ffffffff81176588>] ? vfs_write+0xb8/0x1a0
      00:24:16: [<ffffffff81176e81>] ? sys_write+0x51/0x90
      00:24:16: [<ffffffff810d3a75>] ? __audit_syscall_exit+0x265/0x290
      00:24:16: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
      00:24:16:Code: 00 00 00 01 74 05 e8 72 8c d8 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 3e 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> 1f 44 00 00 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89
      

      Maloo report: https://maloo.whamcloud.com/test_sets/83fa5d12-6a23-11e2-85d4-52540035b04c

          Activity

            bfaccini Bruno Faccini (Inactive) added a comment -

            I am not sure, but it seems to me that some of the test failures may come from the memory leak detected at obdclass module unload. In fact, the memory accounting seems to be wrong and reports more memory freed than allocated ("leaked: 18446744073709551612", i.e. 0xFFFFFFFFFFFFFFFC, which is -4). This problem may have been introduced by patch set #5, where the new macro PPGA_SIZE() uses the pointer size to determine what unit size to alloc/free in osc_build_ppga()/osc_release_ppga(). But this can fail (on archs such as i686, where the pointer size is smaller than sizeof(struct lustre_handle) = 64 bits) if not all allocations use osc_build_ppga(), as in osc_build_req().
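            To make the size arithmetic concrete, here is a minimal userspace sketch. The PPGA_SIZE() definition below is a guess at the problematic macro (it is not quoted in this ticket), and struct lustre_handle is reduced to its 64-bit cookie:

            #include <stdio.h>
            #include <stdint.h>

            /* struct lustre_handle carries a 64-bit cookie, so it is 8 bytes
             * on every architecture. */
            struct lustre_handle { uint64_t cookie; };

            /* Hypothetical stand-in for the PPGA_SIZE() macro described
             * above: it sizes the per-page unit off the pointer width. */
            #define PPGA_SIZE(count) ((count) * sizeof(void *))

            int main(void)
            {
                    /* On x86_64 both sizes are 8, so alloc and free agree.
                     * On i686 sizeof(void *) is 4, so a path that allocates
                     * with sizeof(struct lustre_handle) but frees with
                     * PPGA_SIZE() (or vice versa) skews the accounting by
                     * -4 bytes per unit -- matching the reported
                     * "leaked: 0xFFFFFFFFFFFFFFFC". */
                    printf("PPGA_SIZE(1) = %zu, sizeof(struct lustre_handle) = %zu\n",
                           PPGA_SIZE(1), sizeof(struct lustre_handle));
                    return 0;
            }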
            hongchao.zhang Hongchao Zhang added a comment -

            The patch is tracked at http://review.whamcloud.com/#change,5285

            hongchao.zhang Hongchao Zhang added a comment -

            The obd_cancel call in ll_ap_completion is too heavyweight to run inside a spin_lock, so it can easily cause this kind of issue. We could either revert bugzilla 21252 & 16774 or move obd_cancel out of *_ap_completion; I will create a patch for the second option soon.
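            A minimal sketch of that second option, with hypothetical names and abbreviated arguments (the real patch is the one tracked above): the completion callback only records the DLM lock handle while cl_loi_list_lock is held, and the caller issues obd_cancel() after dropping it.

            /* Hypothetical sketch, not the actual patch: defer the
             * heavyweight obd_cancel() until cl_loi_list_lock is released. */
            static void osc_complete_then_cancel(struct client_obd *cli,
                                                 struct osc_async_page *oap,
                                                 struct obd_export *exp,
                                                 struct lov_stripe_md *lsm,
                                                 int rc)
            {
                    struct lustre_handle cancel_lockh = { 0 };

                    client_obd_list_lock(&cli->cl_loi_list_lock);
                    /* bookkeeping that needs the lock; the callback stores
                     * the handle to cancel (hypothetical extra argument)
                     * instead of calling obd_cancel() here */
                    osc_ap_completion(cli, oap, rc, &cancel_lockh);
                    client_obd_list_unlock(&cli->cl_loi_list_lock);

                    if (cancel_lockh.cookie != 0)
                            /* may allocate (ldlm_bl_to_thread) and
                             * reschedule -- safe now, no spinlock held */
                            obd_cancel(exp, lsm, LCK_PW, &cancel_lockh);
            }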

            johann Johann Lombardi (Inactive) added a comment -

            This issue was actually introduced by commit 29309746c4049aa7da6cde4cb9a44ec0df2b1af3 (bugzilla 21252), which was a fix for bugzilla 16774. I think nobody uses this feature on 1.8 (it is disabled by default anyway). Maybe we could just revert bugzilla 21252, as well as part of bugzilla 16774, to restore the default readahead locking behavior?
            bobijam Zhenyu Xu added a comment - edited

            Patch tracked at http://review.whamcloud.com/5272

            Commit message:
            LU-2703 osc: ptlrpcd scheduled with a spinlock
            
            When memory is scarce, there is a chance that the ptlrpcd calling path
            reschedules waiting for memory to become available while holding a
            spinlock; the backtrace is as follows:
            
            PID: 1808 TASK: ffff880037f0f540 CPU: 0 COMMAND: "ptlrpcd"
            0 [ffff88007c881620] schedule at ffffffff814e9c02
            1 [ffff88007c8816e8] __cond_resched at ffffffff8106118a
            2 [ffff88007c881708] _cond_resched at ffffffff814ea610
            3 [ffff88007c881718] __kmalloc at ffffffff8115e8b0
            4 [ffff88007c881768] cfs_alloc at ffffffffa039e701 [libcfs]
            5 [ffff88007c881798] ldlm_bl_to_thread at ffffffffa051b2e9 [ptlrpc]
            6 [ffff88007c8818c8] ldlm_bl_to_thread_lock at ffffffffa051b8eb [ptlrpc]
            7 [ffff88007c881918] ldlm_lock_decref_internal at ffffffffa04f7f9d [ptlrpc]
            8 [ffff88007c881998] ldlm_lock_decref at ffffffffa04f8fd9 [ptlrpc]
            9 [ffff88007c8819e8] osc_cancel at ffffffffa026962e [osc]
            10 [ffff88007c881a38] lov_cancel at ffffffffa0684e0b [lov]
            11 [ffff88007c881b58] ll_ap_completion at ffffffffa0701669 [lustre]
            12 [ffff88007c881c18] lov_ap_completion at ffffffffa0677b68 [lov]
            13 [ffff88007c881c48] osc_ap_completion at ffffffffa02683b3 [osc]
            ====> here ptlrpcd is holding the cl_loi_list_lock
            14 [ffff88007c881cb8] brw_interpret at ffffffffa0279429 [osc]
            15 [ffff88007c881d58] ptlrpc_check_set at ffffffffa0530a9a [ptlrpc]
            16 [ffff88007c881e38] ptlrpcd_check at ffffffffa05677ad [ptlrpc]
            17 [ffff88007c881e98] ptlrpcd at ffffffffa0567a50 [ptlrpc]
            18 [ffff88007c881f48] kernel_thread at ffffffff8100c0ca
            
            This patch unlocks the spinlock before calling osc_ap_completion().
            
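            A sketch of the shape of that change, assuming the 1.8 brw_interpret() structure visible in the backtrace (names are taken from the trace, the function arguments are abbreviated, and this is illustrative rather than the literal patch at the review link above):

            /* Illustrative only: release cl_loi_list_lock around
             * osc_ap_completion(), whose call chain (obd_cancel ->
             * ldlm_bl_to_thread) may kmalloc(GFP_KERNEL) and reschedule,
             * so it must not run under a spinlock. */
            static int brw_interpret(struct ptlrpc_request *req, void *data, int rc)
            {
                    struct osc_brw_async_args *aa = data;
                    struct client_obd *cli = aa->aa_cli;
                    struct osc_async_page *oap =
                            list_entry(aa->aa_oaps.next, struct osc_async_page,
                                       oap_rpc_item);   /* loop over the list elided */

                    client_obd_list_lock(&cli->cl_loi_list_lock);
                    /* unlink the page from the per-object I/O lists while locked */
                    list_del_init(&oap->oap_rpc_item);
                    client_obd_list_unlock(&cli->cl_loi_list_lock);

                    /* no spinlock held: the completion callback may now sleep */
                    osc_ap_completion(cli, oap, rc);   /* arguments abbreviated */

                    client_obd_list_lock(&cli->cl_loi_list_lock);
                    osc_wake_cache_waiters(cli);   /* bookkeeping that needs the lock */
                    client_obd_list_unlock(&cli->cl_loi_list_lock);
                    return rc;
            }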
            bobijam Zhenyu Xu added a comment -

            This issue is somewhat related to LU-2468; both involve trying to allocate memory during the memory-freeing phase.

            johann Johann Lombardi (Inactive) added a comment -

            I think master has a patch for this already (it prevents kmalloc on this path).
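            The master change itself is not quoted in this ticket; as a general kernel pattern, one way to keep such a path free of kmalloc is to allocate the work item before taking the lock. A hedged, generic sketch with hypothetical names:

            /* Generic pattern, not the actual master patch: do the
             * GFP_KERNEL allocation while sleeping is still legal, then
             * only link the preallocated item under the spinlock. */
            struct bl_work_item {
                    struct list_head     bwi_list;
                    struct lustre_handle bwi_lockh;
            };

            static int queue_bl_work(spinlock_t *lock, struct list_head *queue,
                                     const struct lustre_handle *lockh)
            {
                    struct bl_work_item *w;

                    w = kmalloc(sizeof(*w), GFP_KERNEL); /* may sleep: no lock held */
                    if (w == NULL)
                            return -ENOMEM;
                    w->bwi_lockh = *lockh;

                    spin_lock(lock);        /* no allocation from here on */
                    list_add_tail(&w->bwi_list, queue);
                    spin_unlock(lock);
                    return 0;
            }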
            bobijam Zhenyu Xu added a comment -

            I think I've found who holds the cl_loi_list_lock and then sleeps.

            crash> ps | grep ptlrpcd
            1808 2 0 ffff880037f0f540 RU 0.0 0 0 [ptlrpcd]
            1809 2 0 ffff88007d3af500 RU 0.0 0 0 [ptlrpcd-recov]
            crash> bt 1808
            PID: 1808 TASK: ffff880037f0f540 CPU: 0 COMMAND: "ptlrpcd"
            #0 [ffff88007c881620] schedule at ffffffff814e9c02
            #1 [ffff88007c8816e8] __cond_resched at ffffffff8106118a
            #2 [ffff88007c881708] _cond_resched at ffffffff814ea610
            #3 [ffff88007c881718] __kmalloc at ffffffff8115e8b0
            #4 [ffff88007c881768] cfs_alloc at ffffffffa039e701 [libcfs]
            #5 [ffff88007c881798] ldlm_bl_to_thread at ffffffffa051b2e9 [ptlrpc]
            #6 [ffff88007c8818c8] ldlm_bl_to_thread_lock at ffffffffa051b8eb [ptlrpc]
            #7 [ffff88007c881918] ldlm_lock_decref_internal at ffffffffa04f7f9d [ptlrpc]
            #8 [ffff88007c881998] ldlm_lock_decref at ffffffffa04f8fd9 [ptlrpc]
            #9 [ffff88007c8819e8] osc_cancel at ffffffffa026962e [osc]
            #10 [ffff88007c881a38] lov_cancel at ffffffffa0684e0b [lov]
            #11 [ffff88007c881b58] ll_ap_completion at ffffffffa0701669 [lustre]
            #12 [ffff88007c881c18] lov_ap_completion at ffffffffa0677b68 [lov]
            #13 [ffff88007c881c48] osc_ap_completion at ffffffffa02683b3 [osc] ====> here ptlrpcd is holding the cl_loi_list_lock
            #14 [ffff88007c881cb8] brw_interpret at ffffffffa0279429 [osc]
            #15 [ffff88007c881d58] ptlrpc_check_set at ffffffffa0530a9a [ptlrpc]
            #16 [ffff88007c881e38] ptlrpcd_check at ffffffffa05677ad [ptlrpc]
            #17 [ffff88007c881e98] ptlrpcd at ffffffffa0567a50 [ptlrpc]
            #18 [ffff88007c881f48] kernel_thread at ffffffff8100c0ca

            Will push a patch for review.


            keith Keith Mannthey (Inactive) added a comment -

            Sorry, no hard data from my local debug setup yet; I wasted some time with a 1.8 build environment when I should have just grabbed the 1.8.9 RPMs from the build. In general, the plan for the master client is to ensure that when a spinlock is acquired, the place and the task are recorded, so that when everyone is waiting on the lock it can be examined to see who touched it last and where. With any luck the race window is not so small that a debug spinlock makes the issue stop reproducing.

            Given the current state of master with respect to this change, is there a version of master that is known to work?
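            For reference, a minimal sketch of the kind of instrumentation described above (a hypothetical wrapper; the kernel's CONFIG_DEBUG_SPINLOCK provides related owner tracking):

            /* Hypothetical debug wrapper: record the task and the call site
             * on every acquire, so a stuck lock can be inspected from a
             * crash dump to see who touched it last and from where. */
            struct dbg_spinlock {
                    spinlock_t          dsl_lock;
                    struct task_struct *dsl_owner;  /* task holding the lock */
                    void               *dsl_ip;     /* acquisition call site */
            };

            static inline void dbg_spin_lock(struct dbg_spinlock *l)
            {
                    spin_lock(&l->dsl_lock);
                    l->dsl_owner = current;
                    l->dsl_ip = __builtin_return_address(0);
            }

            static inline void dbg_spin_unlock(struct dbg_spinlock *l)
            {
                    l->dsl_owner = NULL;
                    l->dsl_ip = NULL;
                    spin_unlock(&l->dsl_lock);
            }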

            keith Keith Mannthey (Inactive) added a comment -

            I had started setting up a local environment to help capture the crash, but I see we already have that first step. I am now working on debugging the spinlock; it will be good to see who touched the lock last.

            People

              bobijam Zhenyu Xu
              yujian Jian Yu
              Votes: 0
              Watchers: 9
