[LU-1596] mds crash after recovery finished Created: 03/Jul/12  Updated: 20/Sep/12  Resolved: 20/Sep/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.2
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Mahmoud Hanafi Assignee: Lai Siyao
Resolution: Duplicate Votes: 0
Labels: None
Environment:

2.1.2 server
2.1.1 and 2.1.2 clients


Attachments: File s160-crash-7.3.2012    
Severity: 3
Rank (Obsolete): 6368

 Description   

See the attached console logs. The MDS went through recovery but did not finish it and restarted recovery. At the end of the second recovery cycle it crashed.

The crash is at line 1252 of the attached log:
BUG: scheduling while atomic: mdt_33/5450/0xffff8800
BUG: unable to handle kernel paging request at 0000000000015fc0
IP: [<ffffffff81523c5e>] _spin_lock+0xe/0x30
PGD 5cc604067 PUD 5cc608067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host1/rport-1:0-0/target1:0:0/1:0:0:1/state

Entering kdb (current=0xffffffff81a2d020, pid 0) on processor 0 Oops: (null)
due to oops @ 0xffffffff81523c5e
r15 = 0x0000000000000003 r14 = 0x0000000000015fc0
r13 = 0xffff880040e03998 r12 = 0xffff8804432c9500
bp = 0xffff880040e03930 bx = 0x0000000000015fc0
r11 = 0x0000000000000028 r10 = 0x0000000000000000
r9 = 0xfbad0278679d3602 r8 = 0x0000000000000000
ax = 0x0000000000010000 cx = 0xffff880040e03ab0
dx = 0x0000000000000082 si = 0xffff880040e03998
di = 0x0000000000015fc0 orig_ax = 0xffffffffffffffff
ip = 0xffffffff81523c5e cs = 0x0000000000000010
flags = 0x0000000000010006 sp = 0xffff880040e03920
ss = 0x0000000000000018 &regs = 0xffff880040e03888



 Comments   
Comment by Peter Jones [ 05/Jul/12 ]

Lai

Could you please look into this one?

Thanks

Peter

Comment by Lai Siyao [ 05/Jul/12 ]

The log shows the BUG is on pid 5450:

BUG: scheduling while atomic: mdt_33/5450/0xffff8800

The backtrace for pid 5450 is as follows:

mdt_33        D 0000000000000246     0  5450      2 0x3005ff40
 ffff8804432ca750 0000000000000018 ffff88061feaa378 0000000000000246
 ffff8804432ca780 ffffffffa00023ac ffff880621185400 ffff88042f278690
 ffff8804432ca7e0 ffff88061ffe8f20 ffff8804432ca7b0 ffffffff81254f74
Call Trace:
 [<ffffffffa00023ac>] ? dm_dispatch_request+0x3c/0x70 [dm_mod]
 [<ffffffff81254f74>] ? blk_unplug+0x34/0x70
 [<ffffffff811a7c90>] ? sync_buffer+0x0/0x50
 [<ffffffff81520d83>] ? printk+0x41/0x46
 [<ffffffff81056681>] ? __schedule_bug+0x41/0x70
 [<ffffffff81521978>] ? thread_return+0x638/0x760
 [<ffffffffa000420c>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
 [<ffffffff8109ac19>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff811a7c90>] ? sync_buffer+0x0/0x50
 [<ffffffff81521b13>] ? io_schedule+0x73/0xc0
 [<ffffffff811a7cd0>] ? sync_buffer+0x40/0x50
 [<ffffffff815224cf>] ? __wait_on_bit+0x5f/0x90
 [<ffffffff811a7c90>] ? sync_buffer+0x0/0x50
 [<ffffffff81522578>] ? out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff81090030>] ? wake_bit_function+0x0/0x50
 [<ffffffff811a7c86>] ? __wait_on_buffer+0x26/0x30
 [<ffffffffa0abb63c>] ? ldiskfs_mb_init_cache+0x24c/0xa30 [ldiskfs]
 [<ffffffffa0abbf3e>] ? ldiskfs_mb_init_group+0x11e/0x210 [ldiskfs]
 [<ffffffffa0abc0fd>] ? ldiskfs_mb_good_group+0xcd/0x110 [ldiskfs]
 [<ffffffffa0abf0bb>] ? ldiskfs_mb_regular_allocator+0x19b/0x410 [ldiskfs]
 [<ffffffff8152286e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0abf762>] ? ldiskfs_mb_new_blocks+0x432/0x660 [ldiskfs]
 [<ffffffffa0aa61fe>] ? ldiskfs_ext_find_extent+0x2ce/0x330 [ldiskfs]
 [<ffffffffa0aa93b4>] ? ldiskfs_ext_get_blocks+0x1114/0x1a10 [ldiskfs]
 [<ffffffffa0a7f541>] ? __jbd2_journal_file_buffer+0xd1/0x220 [jbd2]
 [<ffffffffa0a8075f>] ? jbd2_journal_dirty_metadata+0xff/0x150 [jbd2]
 [<ffffffffa0aa3f3b>] ? __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
 [<ffffffffa0ab1665>] ? ldiskfs_get_blocks+0xf5/0x2a0 [ldiskfs]
 [<ffffffffa0ad8048>] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
 [<ffffffffa0ab62b9>] ? ldiskfs_getblk+0x79/0x1f0 [ldiskfs]
 [<ffffffffa0ab6448>] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
 [<ffffffffa0b5e2a7>] ? fsfilt_ldiskfs_write_handle+0x147/0x340 [fsfilt_ldiskfs]
 [<ffffffffa0b5e55c>] ? fsfilt_ldiskfs_write_record+0xbc/0x1d0 [fsfilt_ldiskfs]
 [<ffffffffa06382ec>] ? llog_lvfs_write_blob+0x2bc/0x460 [obdclass]
 [<ffffffffa06338b8>] ? llog_init_handle+0xa18/0xa70 [obdclass]
 [<ffffffffa0639b5a>] ? llog_lvfs_write_rec+0x40a/0xf00 [obdclass]
 [<ffffffffa0636a65>] ? llog_cat_add_rec+0xf5/0x840 [obdclass]
 [<ffffffffa063d306>] ? llog_obd_origin_add+0x56/0x190 [obdclass]
 [<ffffffffa063d4e1>] ? llog_add+0xa1/0x3c0 [obdclass]
 [<ffffffff8115c66a>] ? kmem_getpages+0xba/0x170
 [<ffffffffa09610cc>] ? lov_llog_origin_add+0xcc/0x5d0 [lov]
 [<ffffffffa063d4e1>] ? llog_add+0xa1/0x3c0 [obdclass]
 [<ffffffffa0b7449e>] ? mds_llog_origin_add+0xae/0x2e0 [mds]
 [<ffffffffa0ab0651>] ? __ldiskfs_get_inode_loc+0xf1/0x3b0 [ldiskfs]
 [<ffffffffa0995d0b>] ? lov_tgt_maxbytes+0x5b/0xb0 [lov]
 [<ffffffffa063d4e1>] ? llog_add+0xa1/0x3c0 [obdclass]
 [<ffffffffa0b74b12>] ? mds_llog_add_unlink+0x162/0x520 [mds]
 [<ffffffffa0b75206>] ? mds_log_op_unlink+0x196/0x9a0 [mds]
 [<ffffffffa0b9d7de>] ? mdd_unlink_log+0x4e/0x100 [mdd]
 [<ffffffffa0b973ab>] ? mdd_attr_get_internal+0x7ab/0xb10 [mdd]
 [<ffffffffa0b91a8e>] ? mdd_object_kill+0x14e/0x1b0 [mdd]
 [<ffffffffa0bab8ce>] ? mdd_finish_unlink+0x20e/0x2c0 [mdd]
 [<ffffffffa0baa780>] ? __mdd_ref_del+0x40/0xc0 [mdd]
 [<ffffffffa0bb704c>] ? mdd_rename+0x1ffc/0x2240 [mdd]
 [<ffffffffa0b96dcb>] ? mdd_attr_get_internal+0x1cb/0xb10 [mdd]
 [<ffffffffa0589caf>] ? cfs_hash_bd_from_key+0x3f/0xc0 [libcfs]
 [<ffffffffa0c7f219>] ? cmm_mode_get+0x109/0x320 [cmm]
 [<ffffffffa0c7fd3a>] ? cml_rename+0x33a/0xbb0 [cmm]
 [<ffffffffa058a337>] ? cfs_hash_bd_get+0x37/0x90 [libcfs]
 [<ffffffffa0c7f49d>] ? cmm_is_subdir+0x6d/0x2f0 [cmm]
 [<ffffffffa067d8e6>] ? lu_object_put+0x86/0x210 [obdclass]
 [<ffffffffa0c09426>] ? mdt_reint_rename+0x1fa6/0x2400 [mdt]
 [<ffffffffa05909db>] ? upcall_cache_get_entry+0x28b/0xa14 [libcfs]
 [<ffffffffa0c0197f>] ? mdt_rename_unpack+0x46f/0x6c0 [mdt]
 [<ffffffffa0bb96c6>] ? md_ucred+0x26/0x60 [mdd]
 [<ffffffffa0c01c0f>] ? mdt_reint_rec+0x3f/0x100 [mdt]
 [<ffffffffa07726e4>] ? lustre_msg_get_flags+0x34/0xa0 [ptlrpc]
 [<ffffffffa0bfa004>] ? mdt_reint_internal+0x6d4/0x9f0 [mdt]
 [<ffffffffa0bef9f6>] ? mdt_reint_opcode+0x96/0x160 [mdt]
 [<ffffffffa0bfa36c>] ? mdt_reint+0x4c/0x120 [mdt]
 [<ffffffffa07721b8>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
 [<ffffffffa0bedc65>] ? mdt_handle_common+0x8d5/0x1810 [mdt]
 [<ffffffffa076fe44>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
 [<ffffffffa0beec75>] ? mdt_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa0780b89>] ? ptlrpc_main+0xbb9/0x1990 [ptlrpc]
 [<ffffffffa077ffd0>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffffa077ffd0>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
 [<ffffffffa077ffd0>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

The backtrace looks normal (though the stack is quite deep), but it reported "scheduling while atomic". Alex, could you give some hints?
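
For reference, the message comes from the scheduler's sanity check in kernel/sched.c. A rough paraphrase of the 2.6.32-era logic (a sketch, not the exact vendor source) is:

static noinline void __schedule_bug(struct task_struct *prev)
{
        /* This is what produced "BUG: scheduling while atomic: mdt_33/5450/0xffff8800" */
        printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x\n",
               prev->comm, prev->pid, preempt_count());
        /* ... followed by held-lock info and a stack dump ... */
}

static inline void schedule_debug(struct task_struct *prev)
{
        /* preempt_count() is read from this task's struct thread_info,
         * which sits at the low end of its kernel stack. */
        if (unlikely(in_atomic_preempt_off() && !prev->exit_state))
                __schedule_bug(prev);
}

Note that the third field in the message is preempt_count(): 0xffff8800 looks more like the upper half of a kernel pointer than a plausible preempt count, which would be consistent with the thread_info at the bottom of the stack having been overwritten.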

Comment by Jay Lan (Inactive) [ 08/Aug/12 ]

We hit this bug again last night.
This time it was not during a reboot, as was described when we opened this ticket.

Here is what was shown on the console:

Lustre: MGS: haven't heard from client af2ee3e4-cf7b-2e45-4650-a3f788979f14 (at 10.151.6.3@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8811f27af000, cur 1344393811 expire 1344393661 last 1344393584
Lustre: Skipped 277 previous similar messages
Lustre: MGS: haven't heard from client 587217b0-8285-b0d5-1696-6665442381c8 (at 10.151.5.129@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8811f57aa800, cur 1344393811 expire 1344393661 last 1344393584
LustreError: 9602:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff881032b9d8c0
BUG: scheduling while atomic: mdt_268/9115/0xffff8809
BUG: unable to handle kernel paging request at 0000000181c1fa20
IP: [<ffffffff81051ddd>] task_rq_lock+0x4d/0xa0
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/0000:04:04.1/irq

Entering kdb (current=0xffff881227f13500, pid 0) on processor 10 Oops: (null)
due to oops @ 0xffffffff81051ddd
r15 = 0x0000000000000003 r14 = 0x0000000000015fc0
r13 = 0xffff88095fc83978 r12 = 0xffff8811da902a80
bp = 0xffff88095fc83940 bx = 0x0000000000015fc0
r11 = 0x0000000000000028 r10 = 0x0000000000000000
r9 = 0xf9ffdd6e00faa202 r8 = 0x0000000000000000
ax = 0x0000000040010dc0 cx = 0xffff88095fc83a90
dx = 0x0000000000000082 si = 0xffff88095fc83978
di = 0xffff8811da902a80 orig_ax = 0xffffffffffffffff
ip = 0xffffffff81051ddd cs = 0x0000000000000010
flags = 0x0000000000010082 sp = 0xffff88095fc83910
ss = 0x0000000000000018 &regs = 0xffff88095fc83878
[10]kdb>

The crash analysis showed the stack trace of the task that caused the crash:

--- <NMI exception stack> ---
#6 [ffff88095fc63ae8] oops_begin at ffffffff81524f1e
#7 [ffff88095fc63b00] no_context at ffffffff8104230c
#8 [ffff88095fc63b50] __bad_area_nosemaphore at ffffffff81042615
#9 [ffff88095fc63ba0] bad_area_nosemaphore at ffffffff810426e3
#10 [ffff88095fc63bb0] __do_page_fault at ffffffff81042d9d
#11 [ffff88095fc63cd0] do_page_fault at ffffffff81526e5e
#12 [ffff88095fc63d00] page_fault at ffffffff81524175
[exception RIP: update_curr+324]
RIP: ffffffff81051974 RSP: ffff88095fc63db8 RFLAGS: 00010086
RAX: ffff8811da902a80 RBX: 0000000040010dc0 RCX: ffff881227d103c0
RDX: 0000000000018b48 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88095fc63de8 R8: 0000000000000000 R9: 00000000001208e0
R10: 0000000000000010 R11: 00000000001208e0 R12: ffff88095fc76028
R13: 000001131ede6d18 R14: 00014b731e02665b R15: ffff8811da902a80
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#13 [ffff88095fc63df0] task_tick_fair at ffffffff81052bab
#14 [ffff88095fc63e20] scheduler_tick at ffffffff810568d1
#15 [ffff88095fc63e60] update_process_times at ffffffff8107b5e2
#16 [ffff88095fc63e90] tick_sched_timer at ffffffff8109ffe6
#17 [ffff88095fc63ec0] __run_hrtimer at ffffffff8109476e
#18 [ffff88095fc63f10] hrtimer_interrupt at ffffffff81094b16
#19 [ffff88095fc63f90] smp_apic_timer_interrupt at ffffffff815297cb
#20 [ffff88095fc63fb0] apic_timer_interrupt at ffffffff8100bc13
--- <IRQ stack> ---
#21 [ffff8811da8aa6a8] apic_timer_interrupt at ffffffff8100bc13
[exception RIP: vprintk+465]
RIP: ffffffff81069c61 RSP: ffff8811da8aa750 RFLAGS: 00000246
RAX: 0000000000010e28 RBX: ffff8811da8aa7e0 RCX: 0000000000001b45
RDX: ffff88095fc60000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffffffff8100bc0e R8: ffffffff81ba2580 R9: 0000000000000000
R10: 0000000000000007 R11: 000000000000000f R12: 00000000000b29d3
R13: ffffffff81068fa5 R14: ffff8811da8aa6e0 R15: 0000000000000046
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#22 [ffff8811da8aa7e8] printk at ffffffff81520d83
#23 [ffff8811da8aa848] __schedule_bug at ffffffff81056681
#24 [ffff8811da8aa868] thread_return at ffffffff81521978
#25 [ffff8811da8aa928] io_schedule at ffffffff81521b13
#26 [ffff8811da8aa948] sync_buffer at ffffffff811a7cd0
#27 [ffff8811da8aa958] __wait_on_bit at ffffffff815224cf
#28 [ffff8811da8aa9a8] out_of_line_wait_on_bit at ffffffff81522578
#29 [ffff8811da8aaa18] __wait_on_buffer at ffffffff811a7c86
#30 [ffff8811da8aaa28] ldiskfs_mb_init_cache at ffffffffa0b4f63c [ldiskfs]
#31 [ffff8811da8aab08] ldiskfs_mb_init_group at ffffffffa0b4ff3e [ldiskfs]
#32 [ffff8811da8aab58] ldiskfs_mb_good_group at ffffffffa0b500fd [ldiskfs]
#33 [ffff8811da8aab98] ldiskfs_mb_regular_allocator at ffffffffa0b530bb [ldiskfs]
#34 [ffff8811da8aac48] ldiskfs_mb_new_blocks at ffffffffa0b53762 [ldiskfs]
#35 [ffff8811da8aace8] ldiskfs_ext_get_blocks at ffffffffa0b3d3b4 [ldiskfs]
#36 [ffff8811da8aae58] ldiskfs_get_blocks at ffffffffa0b45665 [ldiskfs]
#37 [ffff8811da8aaed8] ldiskfs_getblk at ffffffffa0b4a2b9 [ldiskfs]
#38 [ffff8811da8aaf98] ldiskfs_bread at ffffffffa0b4a448 [ldiskfs]
#39 [ffff8811da8aafc8] fsfilt_ldiskfs_write_handle at ffffffffa0bf72a7 [fsfilt_ldiskfs]
#40 [ffff8811da8ab078] fsfilt_ldiskfs_write_record at ffffffffa0bf755c [fsfilt_ldiskfs]
#41 [ffff8811da8ab0f8] llog_lvfs_write_blob at ffffffffa06c32ec [obdclass]
#42 [ffff8811da8ab1a8] llog_lvfs_write_rec at ffffffffa06c502e [obdclass]
#43 [ffff8811da8ab288] llog_cat_add_rec at ffffffffa06c1a65 [obdclass]
#44 [ffff8811da8ab308] llog_obd_origin_add at ffffffffa06c8306 [obdclass]
#45 [ffff8811da8ab368] llog_add at ffffffffa06c84e1 [obdclass]
#46 [ffff8811da8ab3d8] lov_llog_origin_add at ffffffffa09f60cc [lov]
#47 [ffff8811da8ab488] llog_add at ffffffffa06c84e1 [obdclass]
#48 [ffff8811da8ab4f8] mds_llog_origin_add at ffffffffa0c1349e [mds]
#49 [ffff8811da8ab578] llog_add at ffffffffa06c84e1 [obdclass]
#50 [ffff8811da8ab5e8] mds_llog_add_unlink at ffffffffa0c13b12 [mds]
#51 [ffff8811da8ab668] mds_log_op_unlink at ffffffffa0c14206 [mds]
#52 [ffff8811da8ab6f8] mdd_unlink_log at ffffffffa0c3c7de [mdd]
#53 [ffff8811da8ab758] mdd_object_kill at ffffffffa0c30a8e [mdd]
#54 [ffff8811da8ab7b8] mdd_finish_unlink at ffffffffa0c4a8ce [mdd]
#55 [ffff8811da8ab838] mdd_rename at ffffffffa0c5604c [mdd]
#56 [ffff8811da8ab9b8] cml_rename at ffffffffa0d1ed3a [cmm]
#57 [ffff8811da8aba68] mdt_reint_rename at ffffffffa0ca8426 [mdt]
#58 [ffff8811da8abbc8] mdt_reint_rec at ffffffffa0ca0c0f [mdt]
#59 [ffff8811da8abc18] mdt_reint_internal at ffffffffa0c99004 [mdt]
#60 [ffff8811da8abca8] mdt_reint at ffffffffa0c9936c [mdt]
#61 [ffff8811da8abcf8] mdt_handle_common at ffffffffa0c8cc65 [mdt]
#62 [ffff8811da8abd78] mdt_regular_handle at ffffffffa0c8dc75 [mdt]
#63 [ffff8811da8abd88] ptlrpc_main at ffffffffa080bd49 [ptlrpc]
#64 [ffff8811da8abf48] kernel_thread at ffffffff8100c14a

Comment by Jay Lan (Inactive) [ 08/Aug/12 ]

Our Lustre git source is at
https://github.com/jlan/lustre-nas/commits/nas-2.1.2

The original report was from a build of tag 2.1.2-1nasS. Last night's crash was on a build of tag 2.1.2-2nasS.

Comment by Jay Lan (Inactive) [ 08/Aug/12 ]

Can we modify the summary to "BUG: scheduling while atomic"? The current summary does not describe the crash of last night.

Comment by Mahmoud Hanafi [ 09/Aug/12 ]

We hit this again. Can we increase the priority to critical?

Comment by Lai Siyao [ 09/Aug/12 ]

There is a log message reporting -ENOSPC:

LustreError: 9602:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff881032b9d8c0

Could you verify whether that is actually the case on the MDS?
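
For reference, the "-28" in that line is a negated errno, i.e. -ENOSPC ("No space left on device"). A trivial userspace snippet (hypothetical, not part of Lustre) shows the mapping:

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        /* llog_write_rec returned -28, i.e. -ENOSPC */
        printf("errno %d = %s\n", ENOSPC, strerror(ENOSPC));
        return 0;   /* prints: errno 28 = No space left on device */
}

So the error refers to space on the backing filesystem (typically free blocks or inodes on the MDT, or the llog file itself being full), not to RAM.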

Comment by Jay Lan (Inactive) [ 09/Aug/12 ]

I saw the -ENOSPC LustreError immediately before the crash. However, when I analyzed the vmcore with crash, "kmem -i" showed the MDS had plenty of free memory:

              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM  18494890      70.6 GB         ----
      FREE  11790336        45 GB    63% of TOTAL MEM
      USED   6704554      25.6 GB    36% of TOTAL MEM
    SHARED   2663959      10.2 GB    14% of TOTAL MEM
   BUFFERS   2603900       9.9 GB    14% of TOTAL MEM
    CACHED    119807       468 MB     0% of TOTAL MEM
      SLAB   2185165       8.3 GB    11% of TOTAL MEM

TOTAL SWAP    500013       1.9 GB         ----
 SWAP USED         0            0     0% of TOTAL SWAP
 SWAP FREE    500013       1.9 GB   100% of TOTAL SWAP

Did we run out of some preallocated memory? Hmmm...

Comment by Jay Lan (Inactive) [ 09/Aug/12 ]

On the other hand, -ENOSPC (or -28) was not reported in last night's crash (the 2nd crash in two consecutive nights), so it should not be the cause.

Comment by Oleg Drokin [ 09/Aug/12 ]

The problem at hand seems to be a stack overflow, so the patches from LU-969 should help.

Comment by Oleg Drokin [ 09/Aug/12 ]

So the three patches you need are:
http://review.whamcloud.com/#change,2668
then http://review.whamcloud.com/3034
and then http://review.whamcloud.com/3072

Comment by Peter Jones [ 09/Aug/12 ]

Following up on the action from the meeting earlier: I spoke to Oleg about whether it was possible to predict when stack overflows would occur, and he confirmed that a stack overflow results from a combination of independent factors, so it is not possible to anticipate one ahead of time.

Comment by Jay Lan (Inactive) [ 09/Aug/12 ]

Oleg, how do I determine that it was a stack overflow?

PID: 9115 TASK: ffff8811da902a80 CPU: 9 COMMAND: "mdt_268"

"bt" command showed the stack started at
#64 [ffff8811da8abf48] kernel_thread at ffffffff8100c14a

and,

struct task_struct {
state = 2,
stack = 0xffff8811da8aa000,

The RSP was ffff8811da8aa750 when it was interrupted, so I assumed it had not overflowed
the per-CPU stack yet?

And when the page-fault exception hit the IRQ stack,
the RSP of ffff88095fc63db8 still looks to be within the IRQ stack?

How do I determine whether I have had a stack overflow? Is there a command in "crash"
to display the stack sizes being used? Thanks!

Comment by Oleg Drokin [ 09/Aug/12 ]

The stack trace is pretty long, as you can see, and that's not all of it; there was more as we dipped into the device driver, and that's what stepped on the struct thread_info, enough to overwrite the flags but not spill over too much into other spaces.
And those drivers might be mighty fat in their stack usage too (heck, even sync_buffer to __schedule_bug shaved 0x100 bytes off the stack, and that's a mere 3 functions in between).
There are a bit less than 0x6a8 bytes left (the interrupt stack is in a different place, so it does not play a role).

Also note that the crash happened in task_rq_lock in the interrupt, after the warning had already been printed from within that same interrupt; the task rq lock was already garbage, which further reinforces this theory.
We have seen many crashes like this before.

There is no command in crash to catch a stack overflow that managed not to crash while overflowed and returned back inside the stack, merely corrupting thread_info along the way, but you can dump the struct thread_info content and see that a lot of it is garbage, I guess.
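
To make this concrete, here is a rough sketch of the x86_64 per-task stack layout in 2.6.32-era kernels (paraphrased from the mainline headers; the vendor kernel may differ in detail), together with the arithmetic for the addresses quoted above:

/*
 * Paraphrased from include/linux/sched.h: the 8 KiB (THREAD_SIZE) kernel
 * stack and struct thread_info share one allocation, with thread_info at
 * the LOW end.  The stack grows down toward it, so a too-deep call chain
 * scribbles over thread_info (flags, preempt_count, ...) long before it
 * reaches an unmapped page.
 */
union thread_union {
        struct thread_info thread_info;   /* at task->stack (low addresses) */
        unsigned long stack[THREAD_SIZE / sizeof(unsigned long)];
};

/*
 * For the crashed task above (using the addresses from the bt output):
 *   task->stack            = 0xffff8811da8aa000   <- thread_info lives here
 *   lowest saved RSP (#21) = 0xffff8811da8aa6a8
 *   headroom               = 0x6a8 (1704) bytes down to the stack base,
 *                            part of which is thread_info itself,
 * and whatever the driver/block path below sync_buffer() consumed is not
 * visible in the saved frames.  If this kernel writes STACK_END_MAGIC
 * (0x57AC6E9D) at end_of_stack() as mainline does, checking whether that
 * sentinel word is still intact in the vmcore is a quick way to confirm
 * the overflow.
 */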

Comment by Jay Lan (Inactive) [ 13/Aug/12 ]

Since the 3 patches Oleg suggested on Aug 9 are quite extensive, would it be OK if we applied them to the MDS only? Or would the patches require that the OSSes also run the same set of patches?

Comment by Oleg Drokin [ 16/Aug/12 ]

You can apply them to the MDS only if that's where you experience the problem, but they would be beneficial everywhere; we have seen stack overflows on clients too, usually when re-exporting NFS.

BTW, I now have a combined patch instead of the three I referenced, if that makes your life easier: http://review.whamcloud.com/3623

Comment by Jay Lan (Inactive) [ 20/Sep/12 ]

We have been running with the patch from LU-969 (commit 2d77b00).
Peter indicated that the 2.1.3 release contains a complete solution. In that case,
this ticket can be closed as a duplicate of LU-969.

Comment by Peter Jones [ 20/Sep/12 ]

Duplicate of LU-969.
