[LU-2703] racer: BUG: soft lockup - CPU#0 stuck for 67s! [dd:1404] Created: 29/Jan/13  Updated: 18/Feb/13  Resolved: 18/Feb/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.9
Fix Version/s: Lustre 1.8.9

Type: Bug Priority: Blocker
Reporter: Jian Yu Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre Branch: b1_8
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/252
Distro/Arch: RHEL6.3 (client), RHEL5.9 (server)


Issue Links:
Related
is related to LU-2468 MDS out of memory, blocked in ldlm_po... Resolved
Severity: 3
Rank (Obsolete): 6296

 Description   

While running the racer test, the following issue occurred on one of the two clients:

00:11:31:Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-28vm1,client-28vm2.lab.whamcloud.com DURATION=900 == 00:11:30 (1359447090)
00:11:32:Lustre: DEBUG MARKER: DURATION=900 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
00:24:12:BUG: soft lockup - CPU#0 stuck for 67s! [dd:1404]
00:24:12:Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
00:24:12:CPU 0 
00:24:12:Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
00:24:12:
00:24:12:Pid: 1404, comm: dd Not tainted 2.6.32-279.19.1.el6.x86_64 #1 Red Hat KVM
00:24:12:RIP: 0010:[<ffffffff814ec53e>]  [<ffffffff814ec53e>] _spin_lock+0x1e/0x30
00:24:12:RSP: 0018:ffff880030ce16a8  EFLAGS: 00000206
00:24:13:RAX: 0000000000000001 RBX: ffff880030ce16a8 RCX: ffff8800783c6ba0
00:24:13:RDX: 0000000000000000 RSI: ffff88004ecec7c0 RDI: ffff88007b510a1c
00:24:13:RBP: ffffffff8100bb8e R08: 0000000000000102 R09: 0000000000000000
00:24:14:R10: 0000000003b5e000 R11: 000000000000000e R12: ffffffffa058315f
00:24:14:R13: ffff880030ce1638 R14: 0000000003b5e000 R15: 0000000003b5efff
00:24:14:FS:  00007f288786d700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
00:24:14:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
00:24:14:CR2: 00000036526cd710 CR3: 000000003d0ee000 CR4: 00000000000006f0
00:24:14:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
00:24:14:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
00:24:14:Process dd (pid: 1404, threadinfo ffff880030ce0000, task ffff88007d082aa0)
00:24:14:Stack:
00:24:14: ffff880030ce17f8 ffffffffa0275889 00000000002537dc ffffea000115a9f0
00:24:14:<d> ffff88003e877950 0000000000000000 0000000003b5e000 0000000003b5efff
00:24:14:<d> ffff880030ce1998 ffffffffa079bddd f869fda2cf0897a1 0000000000000000
00:24:14:Call Trace:
00:24:15: [<ffffffffa0275889>] ? osc_queue_async_io+0x399/0x1140 [osc]
00:24:15: [<ffffffffa079bddd>] ? ll_prepare_write+0x50d/0x1230 [lustre]
00:24:15: [<ffffffffa072adce>] ? lov_stripe_offset+0x28e/0x340 [lov]
00:24:15: [<ffffffffa072a8db>] ? lov_tgt_seq_show+0x26b/0x300 [lov]
00:24:16: [<ffffffffa070d0a9>] ? lov_queue_async_io+0x149/0x4a0 [lov]
00:24:16: [<ffffffffa0795780>] ? queue_or_sync_write+0x160/0xda0 [lustre]
00:24:16: [<ffffffffa07a2c2b>] ? ll_stats_ops_tally+0x6b/0xd0 [lustre]
00:24:16: [<ffffffffa079cde5>] ? ll_commit_write+0x2e5/0x750 [lustre]
00:24:16: [<ffffffffa07b4333>] ? ll_write_begin+0x83/0x210 [lustre]
00:24:16: [<ffffffffa07b4280>] ? ll_write_end+0x30/0x60 [lustre]
00:24:16: [<ffffffff811107fa>] ? generic_file_buffered_write+0x18a/0x2e0
00:24:16: [<ffffffff81070f97>] ? current_fs_time+0x27/0x30
00:24:16: [<ffffffff81112130>] ? __generic_file_aio_write+0x250/0x480
00:24:16: [<ffffffffa0765dba>] ? ll_file_get_tree_lock_iov+0x14a/0x810 [lustre]
00:24:16: [<ffffffff811123cf>] ? generic_file_aio_write+0x6f/0xe0
00:24:16: [<ffffffffa0772449>] ? ll_file_aio_write+0xa19/0x1c60 [lustre]
00:24:16: [<ffffffffa0773760>] ? ll_file_write+0xd0/0xf0 [lustre]
00:24:16: [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80
00:24:16: [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
00:24:16: [<ffffffff8120ca26>] ? security_file_permission+0x16/0x20
00:24:16: [<ffffffff8117646d>] ? rw_verify_area+0x5d/0xc0
00:24:16: [<ffffffff81176588>] ? vfs_write+0xb8/0x1a0
00:24:16: [<ffffffff81176e81>] ? sys_write+0x51/0x90
00:24:16: [<ffffffff810d3a75>] ? __audit_syscall_exit+0x265/0x290
00:24:16: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
00:24:16:Code: 00 00 00 01 74 05 e8 72 8c d8 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 3e 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> 1f 44 00 00 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89

Maloo report: https://maloo.whamcloud.com/test_sets/83fa5d12-6a23-11e2-85d4-52540035b04c



 Comments   
Comment by Jian Yu [ 29/Jan/13 ]

This is a regression that appeared after http://review.whamcloud.com/5132 was landed.

Comment by Peter Jones [ 30/Jan/13 ]

Bobijam

Could you please look into this one?

Thanks

Peter

Comment by Zhenyu Xu [ 30/Jan/13 ]

This specific case (https://maloo.whamcloud.com/test_sets/83fa5d12-6a23-11e2-85d4-52540035b04c) is due to the MDS journal being dead for some unknown reason.

Dead processes on the MDS:
01:17:06:kjournald     D ffff810002536420     0   392     23           412   383 (L-TLB)
01:17:06: ffff81007f10bca0 0000000000000046 0000000000011220 0000000000000046
01:17:06: 0000000000000220 000000000000000a ffff810037fe30c0 ffffffff8031db60
01:17:06: 0000178e3069efc5 000000000000edfb ffff810037fe32a8 000000008006e6b3
01:17:06:Call Trace:
01:17:06: [<ffffffff8006ed48>] do_gettimeofday+0x40/0x90
01:17:06: [<ffffffff800155b4>] sync_buffer+0x0/0x3f
01:17:06: [<ffffffff800637de>] io_schedule+0x3f/0x67
01:17:06: [<ffffffff800155ef>] sync_buffer+0x3b/0x3f
01:17:06: [<ffffffff80063a0a>] __wait_on_bit+0x40/0x6e
01:17:06: [<ffffffff800155b4>] sync_buffer+0x0/0x3f
01:17:06: [<ffffffff80063aa4>] out_of_line_wait_on_bit+0x6c/0x78
01:17:06: [<ffffffff800a3c15>] wake_bit_function+0x0/0x23
01:17:06: [<ffffffff8803331c>] :jbd:journal_commit_transaction+0xa7f/0x132b
01:17:06: [<ffffffff8003dde5>] lock_timer_base+0x1b/0x3c
01:17:06: [<ffffffff88037489>] :jbd:kjournald+0xc1/0x213
01:17:06: [<ffffffff800a3be7>] autoremove_wake_function+0x0/0x2e
01:17:06: [<ffffffff800a39cf>] keventd_create_kthread+0x0/0xc4
01:17:06: [<ffffffff880373c8>] :jbd:kjournald+0x0/0x213
01:17:06: [<ffffffff800a39cf>] keventd_create_kthread+0x0/0xc4
01:17:06: [<ffffffff80032c45>] kthread+0xfe/0x132
01:17:06: [<ffffffff8005dfc1>] child_rip+0xa/0x11
01:17:06: [<ffffffff800a39cf>] keventd_create_kthread+0x0/0xc4
01:17:06: [<ffffffff80032b47>] kthread+0x0/0x132
01:17:06: [<ffffffff8005dfb7>] child_rip+0x0/0x11

01:17:07:syslogd       D ffff810037fe30c0     0  1607      1          1610  1585 (NOTLB)
01:17:07: ffff8100741bdd98 0000000000000082 00000000741bdd18 0000000000000001
01:17:07: 0000000000000000 0000000000000009 ffff81007ec287a0 ffff810037fe30c0
01:17:07: 0000178e2ecd14cd 000000000001708d ffff81007ec28988 0000000000000092
01:17:07:Call Trace:
01:17:07: [<ffffffff8002e4db>] __wake_up+0x38/0x4f
01:17:07: [<ffffffff8803682d>] :jbd:log_wait_commit+0xa3/0xf5
01:17:07: [<ffffffff800a3be7>] autoremove_wake_function+0x0/0x2e
01:17:07: [<ffffffff880307f8>] :jbd:journal_stop+0x22a/0x259
01:17:07: [<ffffffff8002ff63>] __writeback_single_inode+0x1dd/0x31c
01:17:07: [<ffffffff800e4b09>] do_readv_writev+0x26e/0x291
01:17:07: [<ffffffff800f8a7a>] sync_inode+0x24/0x33
01:17:07: [<ffffffff8804d52e>] :ext3:ext3_sync_file+0xce/0xf8
01:17:07: [<ffffffff800504b1>] do_fsync+0x52/0xa4
01:17:07: [<ffffffff800e5424>] __do_fsync+0x23/0x36
01:17:07: [<ffffffff8005d29e>] tracesys+0xd5/0xdf
01:17:07:
Comment by Johann Lombardi (Inactive) [ 31/Jan/13 ]

syslogd is waiting for kjournald, and kjournald is waiting for I/O to complete. What do you mean by the journal being dead?

From the original report, it seems that we are stuck on a spinlock for 67s, so I tend to think that someone is doing something it shouldn't (e.g. sleeping, or a lock-ordering issue) while holding a spinlock.
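
As a generic illustration of that suspected anti-pattern (this is a minimal sketch, not code from the Lustre tree): a thread that can sleep while holding a spinlock leaves every other CPU spinning on that lock, which is exactly what the soft-lockup watchdog then reports.

#include <linux/spinlock.h>
#include <linux/slab.h>

static DEFINE_SPINLOCK(demo_lock);      /* stand-in for a lock like cl_loi_list_lock */

static void bad_path(void)
{
        void *buf;

        spin_lock(&demo_lock);
        buf = kmalloc(4096, GFP_KERNEL); /* GFP_KERNEL may sleep under memory pressure */
        /* while this thread sleeps, every other CPU contending on demo_lock
         * spins -> "BUG: soft lockup - CPU#N stuck for NNs!" */
        kfree(buf);
        spin_unlock(&demo_lock);
}

static void good_path(void)
{
        void *buf = kmalloc(4096, GFP_KERNEL); /* allocate before taking the lock */

        spin_lock(&demo_lock);
        /* ... only non-blocking work under the lock ... */
        spin_unlock(&demo_lock);
        kfree(buf);
}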

Comment by Jian Yu [ 31/Jan/13 ]

Lustre Branch: b1_8
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/252
Distro/Arch: RHEL6.3 (client), RHEL5.9 (server)

This issue occurred consistently on Lustre b1_8 build #252: https://maloo.whamcloud.com/test_sets/0fe4438e-6bc7-11e2-b2a5-52540035b04c

Comment by Johann Lombardi (Inactive) [ 01/Feb/13 ]

Hm, we are looping in osc_announce_cached() in the last report, while it was osc_queue_async_io() in the original one. I suspect the issue is related to mishandling of the cl_loi_list_lock. Unfortunately, the sysrq-t output in Maloo is full of soft-lockup messages. I think we should try to collect a crash dump.

Comment by Zhenyu Xu [ 01/Feb/13 ]

For the record, I could not reproduce it with an RHEL5.9 server and client (both using 2.6.18-348.1.1.el5).

Comment by Jian Yu [ 04/Feb/13 ]

Hi Bobi and Johann,

I reproduced this issue manually on Toro client-12vm[1-4] against Lustre b1_8 build #252 and got the kernel dump file of client-12vm1.

Client1: client-12vm1 (RHEL6.3 2.6.32-279.19.1.el6.x86_64)
Client2: client-12vm2 (RHEL6.3 2.6.32-279.19.1.el6.x86_64)
MGS/MDS: client-12vm3 (RHEL5.9 2.6.18-348.1.1.el5_lustre.g3480bb0)
OSS: client-12vm4     (RHEL5.9 2.6.18-348.1.1.el5_lustre.g3480bb0)  

The dump file is /scratch/logs/1.8.9/LU-2703/client-12vm1.dump on brent node.

I also put the kernel-debuginfo-* packages into that directory.

Comment by Jian Yu [ 04/Feb/13 ]

FYI, client-12vm[1-4] are being kept around for debugging.

Comment by Zhenyu Xu [ 04/Feb/13 ]

osc_update_grant() is trying to spin_lock the cl_loi_list_lock, but I haven't yet found where the lock is being held.

crash> bt -a
PID: 3789 TASK: ffff88007c73a040 CPU: 0 COMMAND: "dd"
[exception RIP: _spin_lock+38]
RIP: ffffffff814ec546 RSP: ffff88003318b388 RFLAGS: 00000206
RAX: 0000000000000001 RBX: ffff88007ca5c180 RCX: 0000000000000030
RDX: 0000000000000000 RSI: ffff88007ca5c180 RDI: ffff880079b7069c
RBP: ffff88003318b388 R8: 0000000000000000 R9: 0000000000001000
R10: ffff880037528310 R11: 000000000000000f R12: ffff880079b705d0
R13: ffff880079b7069c R14: ffff880079b705d0 R15: 0000000000001000
CS: 0010 SS: 0018
#0 [ffff88003318b390] osc_update_grant at ffffffffa02667fe [osc]
#1 [ffff88003318b3e0] osc_brw_fini_request at ffffffffa027727f [osc]
#2 [ffff88003318b500] osc_brw at ffffffffa0278931 [osc]
#3 [ffff88003318b640] lov_brw at ffffffffa06879c9 [lov]
#4 [ffff88003318b6f0] ll_prepare_write at ffffffffa070a3fe [lustre]
#5 [ffff88003318b9a0] ll_write_begin at ffffffffa0722333 [lustre]
#6 [ffff88003318ba20] generic_file_buffered_write at ffffffff81110793
#7 [ffff88003318baf0] __generic_file_aio_write at ffffffff81112130
#8 [ffff88003318bbb0] generic_file_aio_write at ffffffff811123cf
#9 [ffff88003318bc00] ll_file_aio_write at ffffffffa06e0449 [lustre]
#10 [ffff88003318bdd0] ll_file_write at ffffffffa06e1760 [lustre]
#11 [ffff88003318bef0] vfs_write at ffffffff81176588
#12 [ffff88003318bf30] sys_write at ffffffff81176e81
#13 [ffff88003318bf80] system_call_fastpath at ffffffff8100b072
crash> dis osc_update_grant
0xffffffffa02667d0 <osc_update_grant>: push %rbp
0xffffffffa02667d1 <osc_update_grant+1>: mov %rsp,%rbp
0xffffffffa02667d4 <osc_update_grant+4>: sub $0x40,%rsp
0xffffffffa02667d8 <osc_update_grant+8>: mov %rbx,-0x18(%rbp)
0xffffffffa02667dc <osc_update_grant+12>: mov %r12,-0x10(%rbp)
0xffffffffa02667e0 <osc_update_grant+16>: mov %r13,-0x8(%rbp)
0xffffffffa02667e4 <osc_update_grant+20>: nopl 0x0(%rax,%rax,1)
0xffffffffa02667e9 <osc_update_grant+25>: lea 0xcc(%rdi),%r13
0xffffffffa02667f0 <osc_update_grant+32>: mov %rdi,%r12
0xffffffffa02667f3 <osc_update_grant+35>: mov %rsi,%rbx
0xffffffffa02667f6 <osc_update_grant+38>: mov %r13,%rdi
0xffffffffa02667f9 <osc_update_grant+41>: callq 0xffffffff814ec520 <_spin_lock> =====> client_obd_list_lock(&cli->cl_loi_list_lock);
0xffffffffa02667fe <osc_update_grant+46>: testb $0x20,0x14833f(%rip) # 0xffffffffa03aeb44
0xffffffffa0266805 <osc_update_grant+53>: je 0xffffffffa0266810
0xffffffffa0266807 <osc_update_grant+55>: testb $0x8,0x148332(%rip) # 0xffffffffa03aeb40
0xffffffffa026680e <osc_update_grant+62>: jne 0xffffffffa0266840
0xffffffffa0266810 <osc_update_grant+64>: mov (%rbx),%rax
0xffffffffa0266813 <osc_update_grant+67>: test $0x8000000,%eax
0xffffffffa0266818 <osc_update_grant+72>: je 0xffffffffa0266823
0xffffffffa026681a <osc_update_grant+74>: mov 0x48(%rbx),%rax

crash> ps | grep dd
2 0 0 ffff88007dc84aa0 IN 0.0 0 0 [kthreadd]
13 2 0 ffff88007dd55500 RU 0.0 0 0 [sync_supers]
14 2 0 ffff88007dd54aa0 RU 0.0 0 0 [bdi-default]
15 2 0 ffff88007dd54040 IN 0.0 0 0 [kintegrityd/0]
903 2 0 ffff880037b84080 RU 0.0 0 0 [ib_addr]
1415 1387 0 ffff880037978080 IN 0.1 20220 1076 hald-addon-inpu
1431 1387 0 ffff880037ffb500 IN 0.0 17804 1036 hald-addon-acpi
> 3789 6414 0 ffff88007c73a040 RU 0.0 105180 664 dd
6401 6394 0 ffff880079bdd500 IN 0.1 106096 1244 file_rename.sh
15786 6398 0 ffff88007bef1500 IN 0.0 105180 664 dd
17462 6406 0 ffff88007d7ed540 IN 0.0 105180 664 dd
18320 6404 0 ffff880079cdd500 UN 0.0 115700 992 ls
crash> bt 15786
PID: 15786 TASK: ffff88007bef1500 CPU: 0 COMMAND: "dd"
#0 [ffff880078d715e8] schedule at ffffffff814e9c02
#1 [ffff880078d716b0] osc_queue_async_io at ffffffffa0275e72 [osc]
#2 [ffff880078d71800] lov_queue_async_io at ffffffffa067b0a9 [lov]
#3 [ffff880078d71880] queue_or_sync_write at ffffffffa0703780 [lustre]
#4 [ffff880078d71960] ll_commit_write at ffffffffa070ade5 [lustre]
#5 [ffff880078d719f0] ll_write_end at ffffffffa0722280 [lustre]
#6 [ffff880078d71a20] generic_file_buffered_write at ffffffff811107fa
#7 [ffff880078d71af0] __generic_file_aio_write at ffffffff81112130
#8 [ffff880078d71bb0] generic_file_aio_write at ffffffff811123cf
#9 [ffff880078d71c00] ll_file_aio_write at ffffffffa06e0449 [lustre]
#10 [ffff880078d71dd0] ll_file_write at ffffffffa06e1760 [lustre]
#11 [ffff880078d71ef0] vfs_write at ffffffff81176588
#12 [ffff880078d71f30] sys_write at ffffffff81176e81
#13 [ffff880078d71f80] system_call_fastpath at ffffffff8100b072

crash> bt 17462
PID: 17462 TASK: ffff88007d7ed540 CPU: 0 COMMAND: "dd"
#0 [ffff8800784d35e8] schedule at ffffffff814e9c02
#1 [ffff8800784d36b0] osc_queue_async_io at ffffffffa0275e72 [osc]
#2 [ffff8800784d3800] lov_queue_async_io at ffffffffa067b0a9 [lov]
#3 [ffff8800784d3880] queue_or_sync_write at ffffffffa0703780 [lustre]
#4 [ffff8800784d3960] ll_commit_write at ffffffffa070ade5 [lustre]
#5 [ffff8800784d39f0] ll_write_end at ffffffffa0722280 [lustre]
#6 [ffff8800784d3a20] generic_file_buffered_write at ffffffff811107fa
#7 [ffff8800784d3af0] __generic_file_aio_write at ffffffff81112130
#8 [ffff8800784d3bb0] generic_file_aio_write at ffffffff811123cf
#9 [ffff8800784d3c00] ll_file_aio_write at ffffffffa06e0449 [lustre]
#10 [ffff8800784d3dd0] ll_file_write at ffffffffa06e1760 [lustre]
#11 [ffff8800784d3ef0] vfs_write at ffffffff81176588
#12 [ffff8800784d3f30] sys_write at ffffffff81176e81
#13 [ffff8800784d3f80] system_call_fastpath at ffffffff8100b072
RIP: 00000036d9edae60 RSP: 00007fffce21ace0 RFLAGS: 00000202
RAX: 0000000000000001 RBX: ffffffff8100b072 RCX: 00000036d9edae60
RDX: 0000000000000400 RSI: 0000000000ed9000 RDI: 0000000000000001
RBP: 0000000000ed9000 R8: 00000036da18eee8 R9: 0000000000000001
R10: 0000000000002403 R11: 0000000000000246 R12: 0000000000ed8fff
R13: 0000000000000000 R14: 0000000000000400 R15: 0000000000000000
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

crash> dis 0xffffffffa0275e41 20
0xffffffffa0275e41 <osc_queue_async_io+2385>: mov -0xd8(%rbp),%rdx
0xffffffffa0275e48 <osc_queue_async_io+2392>: mov %rbx,-0xb8(%rbp)
0xffffffffa0275e4f <osc_queue_async_io+2399>: mov %rdx,%rbx
0xffffffffa0275e52 <osc_queue_async_io+2402>: mov $0x1,%eax
0xffffffffa0275e57 <osc_queue_async_io+2407>: xchg %rax,(%rbx)
0xffffffffa0275e5a <osc_queue_async_io+2410>: mov -0xa8(%rbp),%rdi
0xffffffffa0275e61 <osc_queue_async_io+2417>: mov %r13,%rsi
0xffffffffa0275e64 <osc_queue_async_io+2420>: callq 0xffffffffa0266890 <ocw_granted>
0xffffffffa0275e69 <osc_queue_async_io+2425>: test %eax,%eax
0xffffffffa0275e6b <osc_queue_async_io+2427>: jne 0xffffffffa0275e92
0xffffffffa0275e6d <osc_queue_async_io+2429>: callq 0xffffffff814e9850 <schedule> =====> l_wait_event(ocw.ocw_waitq, ocw_granted(cli, &ocw), &lwi);
0xffffffffa0275e72 <osc_queue_async_io+2434>: mov -0xa8(%rbp),%rdi
0xffffffffa0275e79 <osc_queue_async_io+2441>: mov %r13,%rsi
0xffffffffa0275e7c <osc_queue_async_io+2444>: callq 0xffffffffa0266890 <ocw_granted>
0xffffffffa0275e81 <osc_queue_async_io+2449>: test %eax,%eax
0xffffffffa0275e83 <osc_queue_async_io+2451>: jne 0xffffffffa0275e92
0xffffffffa0275e85 <osc_queue_async_io+2453>: callq 0xffffffffa039e250 <cfs_signal_pending>
0xffffffffa0275e8a <osc_queue_async_io+2458>: test %eax,%eax
0xffffffffa0275e8c <osc_queue_async_io+2460>: nopl 0x0(%rax)
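
Putting the two annotated disassemblies together: the spinning dd (PID 3789) is in osc_update_grant() trying to take cl_loi_list_lock, while the other dd threads are parked in l_wait_event() inside osc_queue_async_io() waiting for grant. A rough reconstruction of the locking side is shown below; it is a sketch inferred from the disassembly, not verbatim b1_8 source, and the field/flag names are approximate.

/* Approximate reconstruction of osc_update_grant(); grant accounting is
 * serialized by cl_loi_list_lock, the lock PID 3789 is spinning on. */
static void osc_update_grant(struct client_obd *cli, struct ost_body *body)
{
        client_obd_list_lock(&cli->cl_loi_list_lock);  /* the _spin_lock call in the dis output */
        if (body->oa.o_valid & OBD_MD_FLGRANT)         /* likely the "test $0x8000000,%eax" check */
                cli->cl_avail_grant += body->oa.o_grant;
        client_obd_list_unlock(&cli->cl_loi_list_lock);
}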

Comment by Johann Lombardi (Inactive) [ 04/Feb/13 ]

Another possibility is that we forgot to unlock the spinlock somewhere ...

Comment by Keith Mannthey (Inactive) [ 04/Feb/13 ]

I had started setting up a local environment to help capture the crash, but I see that first step has already been done. I am now working to debug the spinlock. It will be good to see who touched the lock last.

Comment by Keith Mannthey (Inactive) [ 04/Feb/13 ]

Sorry, no hard data from my local debug setup yet; I wasted some time with a 1.8 build environment when I should have just grabbed the 1.8.9 RPMs from the build. In general, the plan for the master client is to ensure that when a spinlock is acquired, the location and the task are recorded, so that when everyone ends up waiting on the lock it can be examined to see who touched it last and where (see the sketch below). With any luck the window is not so small that a debug spinlock makes the issue stop reproducing.

With the current state of master in such flux, is there a version of master that is known to work?
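
A sketch of what such an owner-tracking debug spinlock could look like (the wrapper and field names below are made up for illustration; this is not an existing Lustre or kernel facility):

#include <linux/spinlock.h>
#include <linux/sched.h>

/* Hypothetical debug wrapper: record who took the lock and from where, so a
 * spinning waiter can be correlated with the last owner in a crash dump. */
struct dbg_spinlock {
        spinlock_t          lock;
        struct task_struct *owner;      /* task that last acquired the lock */
        const char         *func;       /* call site that last acquired it  */
};

#define dbg_spin_lock(l)                                \
do {                                                    \
        spin_lock(&(l)->lock);                          \
        (l)->owner = current;                           \
        (l)->func  = __func__;                          \
} while (0)

#define dbg_spin_unlock(l)                              \
do {                                                    \
        (l)->owner = NULL;                              \
        (l)->func  = NULL;                              \
        spin_unlock(&(l)->lock);                        \
} while (0)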

Comment by Zhenyu Xu [ 05/Feb/13 ]

I think I've found who holds the cl_loi_list_lock and then sleeps.

crash> ps | grep ptlrpcd
1808 2 0 ffff880037f0f540 RU 0.0 0 0 [ptlrpcd]
1809 2 0 ffff88007d3af500 RU 0.0 0 0 [ptlrpcd-recov]
crash> bt 1808
PID: 1808 TASK: ffff880037f0f540 CPU: 0 COMMAND: "ptlrpcd"
#0 [ffff88007c881620] schedule at ffffffff814e9c02
#1 [ffff88007c8816e8] __cond_resched at ffffffff8106118a
#2 [ffff88007c881708] _cond_resched at ffffffff814ea610
#3 [ffff88007c881718] __kmalloc at ffffffff8115e8b0
#4 [ffff88007c881768] cfs_alloc at ffffffffa039e701 [libcfs]
#5 [ffff88007c881798] ldlm_bl_to_thread at ffffffffa051b2e9 [ptlrpc]
#6 [ffff88007c8818c8] ldlm_bl_to_thread_lock at ffffffffa051b8eb [ptlrpc]
#7 [ffff88007c881918] ldlm_lock_decref_internal at ffffffffa04f7f9d [ptlrpc]
#8 [ffff88007c881998] ldlm_lock_decref at ffffffffa04f8fd9 [ptlrpc]
#9 [ffff88007c8819e8] osc_cancel at ffffffffa026962e [osc]
#10 [ffff88007c881a38] lov_cancel at ffffffffa0684e0b [lov]
#11 [ffff88007c881b58] ll_ap_completion at ffffffffa0701669 [lustre]
#12 [ffff88007c881c18] lov_ap_completion at ffffffffa0677b68 [lov]
#13 [ffff88007c881c48] osc_ap_completion at ffffffffa02683b3 [osc] ====> here ptlrpcd is holding the cl_loi_list_lock
#14 [ffff88007c881cb8] brw_interpret at ffffffffa0279429 [osc]
#15 [ffff88007c881d58] ptlrpc_check_set at ffffffffa0530a9a [ptlrpc]
#16 [ffff88007c881e38] ptlrpcd_check at ffffffffa05677ad [ptlrpc]
#17 [ffff88007c881e98] ptlrpcd at ffffffffa0567a50 [ptlrpc]
#18 [ffff88007c881f48] kernel_thread at ffffffff8100c0ca

Will push a patch for review.

Comment by Johann Lombardi (Inactive) [ 05/Feb/13 ]

I think master has a patch for this already (it prevents kmalloc on this path).

Comment by Zhenyu Xu [ 05/Feb/13 ]

This issue is somewhat related to LU-2468; both involve trying to allocate memory during the memory-freeing phase.

Comment by Zhenyu Xu [ 05/Feb/13 ]

patch tracking at http://review.whamcloud.com/5272

commit message
LU-2703 osc: ptlrpcd scheduled with a spinlock

When memory is scarce, there is a chance that the ptlrpcd calling path
reschedules waiting for memory to become available while holding a
spinlock; the backtrace is as follows:

PID: 1808 TASK: ffff880037f0f540 CPU: 0 COMMAND: "ptlrpcd"
0 [ffff88007c881620] schedule at ffffffff814e9c02
1 [ffff88007c8816e8] __cond_resched at ffffffff8106118a
2 [ffff88007c881708] _cond_resched at ffffffff814ea610
3 [ffff88007c881718] __kmalloc at ffffffff8115e8b0
4 [ffff88007c881768] cfs_alloc at ffffffffa039e701 [libcfs]
5 [ffff88007c881798] ldlm_bl_to_thread at ffffffffa051b2e9 [ptlrpc]
6 [ffff88007c8818c8] ldlm_bl_to_thread_lock at ffffffffa051b8eb [ptlrpc]
7 [ffff88007c881918] ldlm_lock_decref_internal at ffffffffa04f7f9d [ptlrpc]
8 [ffff88007c881998] ldlm_lock_decref at ffffffffa04f8fd9 [ptlrpc]
9 [ffff88007c8819e8] osc_cancel at ffffffffa026962e [osc]
10 [ffff88007c881a38] lov_cancel at ffffffffa0684e0b [lov]
11 [ffff88007c881b58] ll_ap_completion at ffffffffa0701669 [lustre]
12 [ffff88007c881c18] lov_ap_completion at ffffffffa0677b68 [lov]
13 [ffff88007c881c48] osc_ap_completion at ffffffffa02683b3 [osc]
====> here ptlrpcd is holding the cl_loi_list_lock
14 [ffff88007c881cb8] brw_interpret at ffffffffa0279429 [osc]
15 [ffff88007c881d58] ptlrpc_check_set at ffffffffa0530a9a [ptlrpc]
16 [ffff88007c881e38] ptlrpcd_check at ffffffffa05677ad [ptlrpc]
17 [ffff88007c881e98] ptlrpcd at ffffffffa0567a50 [ptlrpc]
18 [ffff88007c881f48] kernel_thread at ffffffff8100c0ca

This patch unlocks the spinlock before calling osc_ap_completion().
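
In pattern form, and only as a hedged sketch rather than the literal hunk in change 5272, the fix boils down to dropping cl_loi_list_lock around the call that can end up sleeping:

/* Sketch of the shape of the change, not the actual patch; the argument list
 * of osc_ap_completion() is intentionally elided. */

/* before: the completion callback runs with cl_loi_list_lock held and can
 * reach kmalloc()/schedule() via ldlm_bl_to_thread() */
client_obd_list_lock(&cli->cl_loi_list_lock);
osc_ap_completion(cli, /* ... */);
client_obd_list_unlock(&cli->cl_loi_list_lock);

/* after: release the spinlock across the potentially sleeping call,
 * then re-acquire it for the remaining list processing */
client_obd_list_unlock(&cli->cl_loi_list_lock);
osc_ap_completion(cli, /* ... */);
client_obd_list_lock(&cli->cl_loi_list_lock);
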
Comment by Johann Lombardi (Inactive) [ 05/Feb/13 ]

This issue was actually introduced by commit 29309746c4049aa7da6cde4cb9a44ec0df2b1af3 (bugzilla 21252), which was a fix for bugzilla 16774. I think nobody uses this feature on 1.8 (it is disabled by default anyway). Maybe we could just revert bugzilla 21252, as well as some part of bugzilla 16774, to restore the default readahead locking behavior?

Comment by Hongchao Zhang [ 06/Feb/13 ]

The obd_cancel call in ll_ap_completion does too much work to run inside a spinlock, so it can easily cause this kind of issue.
We could either revert bugzilla 21252 and 16774, or move the obd_cancel out of *_ap_completion; I will create a patch for the second option soon (an illustration of the idea follows).
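
A generic illustration of that second option, deferring the sleepable cancel work until the spinlock is dropped (the structure and names below are illustrative only and are not the actual 5285 patch):

/* Illustrative deferral pattern: collect work under the spinlock, run the
 * part that may sleep (the DLM lock decref/cancel) after unlocking. */
struct deferred_cancel {
        struct list_head     dc_link;
        struct lustre_handle dc_lockh;
        int                  dc_mode;
};

LIST_HEAD(pending);

client_obd_list_lock(&cli->cl_loi_list_lock);
/* completion bookkeeping: queue the lock handle instead of cancelling here */
list_add_tail(&dc->dc_link, &pending);
client_obd_list_unlock(&cli->cl_loi_list_lock);

/* now safe to sleep: do the cancels outside the spinlock */
list_for_each_entry_safe(dc, tmp, &pending, dc_link) {
        ldlm_lock_decref(&dc->dc_lockh, dc->dc_mode);  /* may allocate and reschedule */
        list_del(&dc->dc_link);
        OBD_FREE_PTR(dc);
}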

Comment by Hongchao Zhang [ 06/Feb/13 ]

the patch is tracked at http://review.whamcloud.com/#change,5285

Comment by Bruno Faccini (Inactive) [ 09/Feb/13 ]

I am not sure, but it seems to me that some of the test failures may come from the memory leak detected/indicated at obdclass module unload. In fact, it is the memory accounting which seems to be wrong: more memory was found freed than allocated ("leaked: 18446744073709551612", i.e. 0xFFFFFFFFFFFFFFFC). This problem may have been induced by the latest patch set #5, where the new macro PPGA_SIZE() plays with the pointer size to determine what unit size to alloc/free in osc_build_ppga()/osc_release_ppga(). But this can fail (on archs like i686, where the pointer size is less than sizeof(struct lustre_handle) = 64 bits) if not all allocations use osc_build_ppga(), as in osc_build_req().
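
To make the accounting failure concrete, here is a hedged sketch of the kind of mismatch described above. The PPGA_SIZE() definition shown is hypothetical, reconstructed only from the description in this comment; the real one is in patch set #5.

/* Hypothetical illustration of the alloc/free size mismatch: if one path sizes
 * the per-page array by pointer size and another by PPGA_SIZE(), the
 * OBD_ALLOC/OBD_FREE accounting disagrees on 32-bit archs. */
#define PPGA_SIZE(count) \
        ((count) * max_t(size_t, sizeof(void *), sizeof(struct lustre_handle)))

struct brw_page **pga;                  /* per-page pointer array (illustrative) */
int count = 1;

OBD_ALLOC(pga, count * sizeof(*pga));   /* path that does not use PPGA_SIZE() */
/* ... */
OBD_FREE(pga, PPGA_SIZE(count));        /* frees (and accounts) a larger size */

/* On i686, sizeof(void *) == 4 while sizeof(struct lustre_handle) == 8, so the
 * module "leaked" counter underflows: 18446744073709551612 == (u64)-4. */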
