[LU-10709] OSS deadlock in 2.10.3 Created: 24/Feb/18 Updated: 12/Nov/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Stephane Thiell | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None |
| Environment: | CentOS 7.4 kernel 3.10.0-693.2.2.el7_lustre.pl1.x86_64 |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We got another OSS deadlock last night on Oak. This is likely a regression in 2.10.3: since the upgrade to 2.10.3, these servers haven't generally stayed stable for more than 48 hours. This issue might be related to the OSS situation described in LU-10697; as for the latest MDS instabilities, it sounds like those will be fixed in another ticket. In this case, the OSS deadlock hit oak-io2-s1; the OSTs from its partner (oak-io2-s2) had already been migrated to it due to a previous deadlock/issue, so 48 OSTs were mounted.
Timeframe overview:
Attaching the following files:
https://stanford.box.com/s/n8ft8quvr6ubuvd12ukdsoarmrz4uixr
We decided to downgrade all servers to 2.10.2 on this system because this has had a significant impact on production lately. Thanks much! Stephane
|
| Comments |
| Comment by Bruno Faccini (Inactive) [ 26/Feb/18 ] |
|
Hello Stephane, |
| Comment by Stephane Thiell [ 26/Feb/18 ] |
|
Hi Bruno! Thanks much for taking a look at this! Let me know if I can help somehow or if you need more information... we do have several scripts periodically checking sysfs entries, mdraid status, etc. on the servers, but these have always been there. We think a change in 2.10.3 has triggered this situation. We have downgraded to 2.10.2 and I'll update this ticket of course if this problem happens again, but so far the rest of the weekend (in 2.10.2) has been stable. |
| Comment by Stephane Thiell [ 26/Feb/18 ] |
|
Bruno, After your comment, I can indeed see that kswapd0 is in qsd_op_adjust() from lu_cache_shrink() and holds a lock on lu_sites_guard, blocking any other lu_cache_shrink(). Meanwhile, I checked what the possible differences between 2.10.2 and 2.10.3 could be, and I noticed something weird in one patch modifying osd-ldiskfs/osd_quota.c:
On master, this landed as a046e879fcadd601c9a19fd906f82ecbd2d4efd5. On 2.10 LTS, this landed as c5d4599cc56729281ce6d4fc59d17694c84333c8.
diff --git a/lustre/osd-ldiskfs/osd_quota.c b/lustre/osd-ldiskfs/osd_quota.c
index 2345501..2ff5770 100644
--- a/lustre/osd-ldiskfs/osd_quota.c
+++ b/lustre/osd-ldiskfs/osd_quota.c
@@ -625,8 +625,8 @@ int osd_declare_inode_qid(const struct lu_env *env, qid_t uid, qid_t gid,
 {
 	struct osd_thread_info *info = osd_oti_get(env);
 	struct lquota_id_info *qi = &info->oti_qi;
-	int rcu, rcg, rcp; /* user & group & project rc */
-	int force = osd_qid_declare_flags & OSD_QID_FORCE;
+	int rcu, rcg, rcp = 0; /* user & group & project rc */
+	bool force = !!(osd_qid_declare_flags & OSD_QID_FORCE);
 	ENTRY;
 	/* let's start with user quota */
@@ -655,24 +655,20 @@ int osd_declare_inode_qid(const struct lu_env *env, qid_t uid, qid_t gid,
 	if (force && (rcg == -EDQUOT || rcg == -EINPROGRESS))
 		/* as before, ignore EDQUOT & EINPROGRESS for root */
 		rcg = 0;
+
+#ifdef HAVE_PROJECT_QUOTA
 	if (rcg && (rcg != -EDQUOT || flags == NULL))
 		RETURN(rcg);

 	/* and now project quota */
-	qi->lqi_id.qid_gid = projid;
-	qi->lqi_type = PRJQUOTA;
+	qi->lqi_id.qid_projid = projid;
+	qi->lqi_type = PRJQUOTA;
 	rcp = osd_declare_qid(env, oh, qi, obj, true, flags);
 	if (force && (rcp == -EDQUOT || rcp == -EINPROGRESS))
 		/* as before, ignore EDQUOT & EINPROGRESS for root */
 		rcp = 0;
+#endif

-	if (rcu)
-		RETURN(rcu);
-	if (rcg)
-		RETURN(rcg);
-	if (rcp)
-		RETURN(rcp);
-
-	RETURN(0);
+	RETURN(rcu ? rcu : (rcg ? rcg : rcp));
 }
The added "#ifdef HAVE_PROJECT_QUOTA" doesn't seem to be at the right place to me. And if you check the very first versions of this patch, it seemed to be at the right place: https://review.whamcloud.com/#/c/27093/1..40/lustre/osd-ldiskfs/osd_quota.c
Do you think this could be related? We're using group quotas, and I didn't enable project quota in our 2.10.3, so maybe the following check was simply removed at compile time, leading to wrong behavior?
if (rcg && (rcg != -EDQUOT || flags == NULL))
Thanks! |
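To illustrate the concern (my own expansion of the hunk above, not taken from an actual build): with the #ifdef placed as landed and HAVE_PROJECT_QUOTA undefined, the tail of osd_declare_inode_qid() reduces to roughly the following, i.e. the group-quota early-return check is compiled out together with the project-quota declaration:
	/* Hypothetical preprocessed view of the end of osd_declare_inode_qid()
	 * when HAVE_PROJECT_QUOTA is NOT defined (illustration based on the hunk
	 * quoted above, not on the actual generated code): */
	if (force && (rcg == -EDQUOT || rcg == -EINPROGRESS))
		/* as before, ignore EDQUOT & EINPROGRESS for root */
		rcg = 0;

	/* Everything between #ifdef HAVE_PROJECT_QUOTA and #endif is gone here,
	 * including the group-quota check that used to be unconditional:
	 *   if (rcg && (rcg != -EDQUOT || flags == NULL))
	 *           RETURN(rcg);
	 */

	RETURN(rcu ? rcu : (rcg ? rcg : rcp));
}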
| Comment by Bruno Faccini (Inactive) [ 27/Feb/18 ] |
|
Well, according to my current crash-dump analysis results (details to come in a follow-up comment), I don't think this could be related. |
| Comment by Bruno Faccini (Inactive) [ 27/Feb/18 ] |
|
Here is the detail of my analysis concerning the deadlock scenario. sas_counters/35683 owns sysfs_mutex and waits for lu_sites_guard after it has triggered memory reclaim:
PID: 35683 TASK: ffff8813743c5ee0 CPU: 4 COMMAND: "sas_counters"
#0 [ffff882d764df4d8] __schedule at ffffffff816a8f65
#1 [ffff882d764df540] schedule at ffffffff816a94e9
#2 [ffff882d764df550] rwsem_down_read_failed at ffffffff816aab1d
#3 [ffff882d764df5d8] call_rwsem_down_read_failed at ffffffff81332018
#4 [ffff882d764df628] down_read at ffffffff816a8780
#5 [ffff882d764df640] lu_cache_shrink at ffffffffc0a296fa [obdclass]
#6 [ffff882d764df690] shrink_slab at ffffffff81195359
#7 [ffff882d764df730] do_try_to_free_pages at ffffffff81198572
#8 [ffff882d764df7a8] try_to_free_pages at ffffffff8119878c
#9 [ffff882d764df840] __alloc_pages_slowpath at ffffffff8169fb2b
#10 [ffff882d764df930] __alloc_pages_nodemask at ffffffff8118cd85
#11 [ffff882d764df9e0] alloc_pages_current at ffffffff811d1108
#12 [ffff882d764dfa28] new_slab at ffffffff811dbe15
#13 [ffff882d764dfa60] ___slab_alloc at ffffffff811dd71c
#14 [ffff882d764dfb70] kmem_cache_alloc at ffffffff811df6b3
#15 [ffff882d764dfbb0] alloc_inode at ffffffff8121c851
#16 [ffff882d764dfbd0] iget_locked at ffffffff8121dafb
#17 [ffff882d764dfc10] sysfs_get_inode at ffffffff8127fcf7
#18 [ffff882d764dfc30] sysfs_lookup at ffffffff81281ea1 <<<<<<<<<<<<<<
#19 [ffff882d764dfc60] lookup_real at ffffffff8120b46d
#20 [ffff882d764dfc80] __lookup_hash at ffffffff8120bd42
#21 [ffff882d764dfcb0] lookup_slow at ffffffff816a1342
#22 [ffff882d764dfce8] path_lookupat at ffffffff8120f2eb
#23 [ffff882d764dfd80] filename_lookup at ffffffff8120f34b
#24 [ffff882d764dfdb8] user_path_at_empty at ffffffff81212ec7
#25 [ffff882d764dfe88] user_path_at at ffffffff81212f31
#26 [ffff882d764dfe98] vfs_fstatat at ffffffff81206473
#27 [ffff882d764dfee8] SYSC_newstat at ffffffff812069de
#28 [ffff882d764dff70] sys_newstat at ffffffff81206cbe
#29 [ffff882d764dff80] system_call_fastpath at ffffffff816b5009
RIP: 00007f7729938105 RSP: 00007ffd1494fe30 RFLAGS: 00010206
RAX: 0000000000000004 RBX: ffffffff816b5009 RCX: 0000000000000007
RDX: 00007ffd1494ff30 RSI: 00007ffd1494ff30 RDI: 00007f771d95b520
RBP: 00000000ffffff9c R8: 000000000155c1ab R9: 00007ffd1494fce0
R10: 00007f772aac81d8 R11: 0000000000000246 R12: ffffffff81206cbe
R13: ffff882d764dff78 R14: 00007ffd14950040 R15: 000000001dddb163
ORIG_RAX: 0000000000000004 CS: 0033 SS: 002b
kswapd0 owns lu_sites_guard and waits for &dqopt->dqio_mutex:
PID: 264 TASK: ffff881ffb7c2f70 CPU: 10 COMMAND: "kswapd0"
#0 [ffff881ffac6f870] __schedule at ffffffff816a8f65
#1 [ffff881ffac6f8d8] schedule_preempt_disabled at ffffffff816aa409
#2 [ffff881ffac6f8e8] __mutex_lock_slowpath at ffffffff816a8337
#3 [ffff881ffac6f948] mutex_lock at ffffffff816a774f <<<<<<<<<<<<<<< waiting on (&dqopt->dqio_mutex)
#4 [ffff881ffac6f960] dquot_acquire at ffffffff81265e8a
#5 [ffff881ffac6f998] ldiskfs_acquire_dquot at ffffffffc1004766 [ldiskfs]
#6 [ffff881ffac6f9b8] dqget at ffffffff81267814
#7 [ffff881ffac6fa18] dquot_get_dqblk at ffffffff812685f4
#8 [ffff881ffac6fa38] osd_acct_index_lookup at ffffffffc111e8df [osd_ldiskfs]
#9 [ffff881ffac6fa70] lquota_disk_read at ffffffffc108d214 [lquota]
#10 [ffff881ffac6faa0] qsd_refresh_usage at ffffffffc1094bfa [lquota]
#11 [ffff881ffac6fad8] qsd_op_adjust at ffffffffc10a3881 [lquota]
#12 [ffff881ffac6fb18] osd_object_delete at ffffffffc10e7c50 [osd_ldiskfs]
#13 [ffff881ffac6fb58] lu_object_free at ffffffffc0a27e3d [obdclass]
#14 [ffff881ffac6fbb0] lu_site_purge_objects at ffffffffc0a28a9e [obdclass]
#15 [ffff881ffac6fc58] lu_cache_shrink at ffffffffc0a298e9 [obdclass]
#16 [ffff881ffac6fca8] shrink_slab at ffffffff81195413
#17 [ffff881ffac6fd48] balance_pgdat at ffffffff81199081
#18 [ffff881ffac6fe20] kswapd at ffffffff81199323
#19 [ffff881ffac6fec8] kthread at ffffffff810b098f
#20 [ffff881ffac6ff50] ret_from_fork at ffffffff816b4f58
ll_ost_io00_018 owns &dqopt->dqio_mutex and waits for BH_Lock to be cleared in the buffer_head at 0xffff8811e3343380 for the journal on device /dev/md18 (/mnt/oak/ost/66):
PID: 251676 TASK: ffff881fefed8000 CPU: 16 COMMAND: "ll_ost_io00_018"
#0 [ffff881ae323b1c0] __schedule at ffffffff816a8f65
#1 [ffff881ae323b228] schedule at ffffffff816a94e9
#2 [ffff881ae323b238] schedule_timeout at ffffffff816a6ff9
#3 [ffff881ae323b2e8] io_schedule_timeout at ffffffff816a8b6d
#4 [ffff881ae323b318] io_schedule at ffffffff816a8c08
#5 [ffff881ae323b328] bit_wait_io at ffffffff816a7621
#6 [ffff881ae323b340] __wait_on_bit_lock at ffffffff816a733f <<<<< waiting for clear of BH_Lock for buffer_head at 0xffff8811e3343380
#7 [ffff881ae323b380] out_of_line_wait_on_bit_lock at ffffffff816a7421
#8 [ffff881ae323b3f8] __lock_buffer at ffffffff81236612
#9 [ffff881ae323b408] do_get_write_access at ffffffffc094ae6f [jbd2]
#10 [ffff881ae323b490] jbd2_journal_get_write_access at ffffffffc094af27 [jbd2]
#11 [ffff881ae323b4b0] __ldiskfs_journal_get_write_access at ffffffffc0fce2bb [ldiskfs]
#12 [ffff881ae323b4e8] ldiskfs_quota_write at ffffffffc1007306 [ldiskfs]
#13 [ffff881ae323b540] qtree_write_dquot at ffffffff8126b4f9
#14 [ffff881ae323b590] v2_write_dquot at ffffffff8126a2eb
#15 [ffff881ae323b5a0] dquot_commit at ffffffff81267127
#16 [ffff881ae323b5d0] ldiskfs_write_dquot at ffffffffc100481c [ldiskfs]
#17 [ffff881ae323b5f0] ldiskfs_mark_dquot_dirty at ffffffffc100487f [ldiskfs]
#18 [ffff881ae323b608] __dquot_alloc_space at ffffffff8126a0f2
#19 [ffff881ae323b6b8] ldiskfs_mb_new_blocks at ffffffffc0fd5b41 [ldiskfs]
#20 [ffff881ae323b798] ldiskfs_ext_map_blocks at ffffffffc101f186 [ldiskfs]
#21 [ffff881ae323b880] ldiskfs_map_blocks at ffffffffc0fee2b3 [ldiskfs]
#22 [ffff881ae323b908] osd_ldiskfs_map_inode_pages at ffffffffc110a573 [osd_ldiskfs]
#23 [ffff881ae323b9a8] osd_write_commit at ffffffffc110b9b2 [osd_ldiskfs]
#24 [ffff881ae323ba28] ofd_commitrw_write at ffffffffc122ac83 [ofd]
#25 [ffff881ae323baa8] ofd_commitrw at ffffffffc122e5a9 [ofd]
#26 [ffff881ae323bb30] obd_commitrw at ffffffffc0ce7eb7 [ptlrpc]
#27 [ffff881ae323bb98] tgt_brw_write at ffffffffc0cbad91 [ptlrpc]
#28 [ffff881ae323bd00] tgt_request_handle at ffffffffc0cb6da5 [ptlrpc]
#29 [ffff881ae323bd48] ptlrpc_server_handle_request at ffffffffc0c5fb16 [ptlrpc]
#30 [ffff881ae323bde8] ptlrpc_main at ffffffffc0c63252 [ptlrpc]
#31 [ffff881ae323bec8] kthread at ffffffff810b098f
#32 [ffff881ae323bf50] ret_from_fork at ffffffff816b4f58
MD driver service threads associated with "md18" are mainly stuck waiting on sysfs_mutex to complete RAID device check/recovery operations:
PID: 131341 TASK: ffff881fb0342f70 CPU: 37 COMMAND: "md18_raid6"
#0 [ffff881eaa6e7bb0] __schedule at ffffffff816a8f65
#1 [ffff881eaa6e7c18] schedule_preempt_disabled at ffffffff816aa409
#2 [ffff881eaa6e7c28] __mutex_lock_slowpath at ffffffff816a8337
#3 [ffff881eaa6e7c80] mutex_lock at ffffffff816a774f <<<<<<<<<<<<<<<<<<
#4 [ffff881eaa6e7c98] sysfs_notify at ffffffff8127ffc4
#5 [ffff881eaa6e7cc0] md_update_sb at ffffffff815150c6
#6 [ffff881eaa6e7d40] md_check_recovery at ffffffff81515c5a
#7 [ffff881eaa6e7d60] raid5d at ffffffffc08570f2 [raid456]
#8 [ffff881eaa6e7e50] md_thread at ffffffff8150e2f5
#9 [ffff881eaa6e7ec8] kthread at ffffffff810b098f
#10 [ffff881eaa6e7f50] ret_from_fork at ffffffff816b4f58
PID: 132014 TASK: ffff883ea0f61fa0 CPU: 3 COMMAND: "kmmpd-md18"
#0 [ffff883ed979bbb0] __schedule at ffffffff816a8f65
#1 [ffff883ed979bc18] schedule at ffffffff816a94e9
#2 [ffff883ed979bc28] schedule_timeout at ffffffff816a6ff9
#3 [ffff883ed979bcd0] io_schedule_timeout at ffffffff816a8b6d
#4 [ffff883ed979bd00] io_schedule at ffffffff816a8c08
#5 [ffff883ed979bd10] bit_wait_io at ffffffff816a7621
#6 [ffff883ed979bd28] __wait_on_bit_lock at ffffffff816a733f
#7 [ffff883ed979bd68] out_of_line_wait_on_bit_lock at ffffffff816a7421
#8 [ffff883ed979bde0] __lock_buffer at ffffffff81236612
#9 [ffff883ed979bdf0] write_mmp_block at ffffffffc0ff6c82 [ldiskfs]
#10 [ffff883ed979be38] kmmpd at ffffffffc0ff6ea8 [ldiskfs]
#11 [ffff883ed979bec8] kthread at ffffffff810b098f
#12 [ffff883ed979bf50] ret_from_fork at ffffffff816b4f58
PID: 132015 TASK: ffff883ea0f62f70 CPU: 17 COMMAND: "jbd2/md18-8"
#0 [ffff883e8d62fc30] __schedule at ffffffff816a8f65
#1 [ffff883e8d62fc98] schedule at ffffffff816a94e9
#2 [ffff883e8d62fca8] jbd2_journal_commit_transaction at ffffffffc094c27c [jbd2]
#3 [ffff883e8d62fe48] kjournald2 at ffffffffc0952a79 [jbd2]
#4 [ffff883e8d62fec8] kthread at ffffffff810b098f
#5 [ffff883e8d62ff50] ret_from_fork at ffffffff816b4f58
PID: 281573 TASK: ffff88004e840000 CPU: 29 COMMAND: "md18_resync"
#0 [ffff8808e68e7ac0] __schedule at ffffffff816a8f65
#1 [ffff8808e68e7b28] schedule_preempt_disabled at ffffffff816aa409
#2 [ffff8808e68e7b38] __mutex_lock_slowpath at ffffffff816a8337
#3 [ffff8808e68e7b98] mutex_lock at ffffffff816a774f <<<<<<<<<<<<<<
#4 [ffff8808e68e7bb0] sysfs_notify at ffffffff8127ffc4
#5 [ffff8808e68e7bd8] bitmap_cond_end_sync at ffffffff8151bd60
#6 [ffff8808e68e7c38] raid5_sync_request at ffffffffc0857f96 [raid456]
#7 [ffff8808e68e7ca8] md_do_sync at ffffffff81511f24
#8 [ffff8808e68e7e50] md_thread at ffffffff8150e2f5
#9 [ffff8808e68e7ec8] kthread at ffffffff810b098f
#10 [ffff8808e68e7f50] ret_from_fork at ffffffff816b4f58
and there are a lot of threads hung in the MD driver waiting for the md18 service threads to complete their current work so that they can start write operations, with stacks very similar to the two following:
PID: 344894 TASK: ffff881ff776cf10 CPU: 6 COMMAND: "kworker/u769:1"
#0 [ffff8832e51af658] __schedule at ffffffff816a8f65
#1 [ffff8832e51af6c0] schedule at ffffffff816a94e9
#2 [ffff8832e51af6d0] md_write_start at ffffffff81510fa5
#3 [ffff8832e51af728] raid5_make_request at ffffffffc08535d0 [raid456]
#4 [ffff8832e51af810] md_make_request at ffffffff8150a744
#5 [ffff8832e51af888] generic_make_request at ffffffff812f9085
#6 [ffff8832e51af8e0] submit_bio at ffffffff812f9300
#7 [ffff8832e51af938] _submit_bh at ffffffff81237867
#8 [ffff8832e51af968] __block_write_full_page at ffffffff81237ae2
#9 [ffff8832e51af9d0] block_write_full_page at ffffffff81237eae
#10 [ffff8832e51afa08] blkdev_writepage at ffffffff8123d768
#11 [ffff8832e51afa18] __writepage at ffffffff8118d1d3
#12 [ffff8832e51afa30] write_cache_pages at ffffffff8118dc91
#13 [ffff8832e51afb48] generic_writepages at ffffffff8118df5d
#14 [ffff8832e51afba8] blkdev_writepages at ffffffff8123d725
#15 [ffff8832e51afbb8] do_writepages at ffffffff8118effe
#16 [ffff8832e51afbc8] __writeback_single_inode at ffffffff8122d980
#17 [ffff8832e51afc08] writeback_sb_inodes at ffffffff8122e5c4
#18 [ffff8832e51afcb0] __writeback_inodes_wb at ffffffff8122e92f
#19 [ffff8832e51afcf8] wb_writeback at ffffffff8122f163
#20 [ffff8832e51afd70] bdi_writeback_workfn at ffffffff8122f60b
#21 [ffff8832e51afe20] process_one_work at ffffffff810a881a
#22 [ffff8832e51afe68] worker_thread at ffffffff810a94e6
#23 [ffff8832e51afec8] kthread at ffffffff810b098f
#24 [ffff8832e51aff50] ret_from_fork at ffffffff816b4f58
……………………..
PID: 360837 TASK: ffff881ff99abf40 CPU: 23 COMMAND: "ll_ost_io01_047"
#0 [ffff8819a7eff568] __schedule at ffffffff816a8f65
#1 [ffff8819a7eff5d0] schedule at ffffffff816a94e9
#2 [ffff8819a7eff5e0] md_write_start at ffffffff81510fa5
#3 [ffff8819a7eff638] raid5_make_request at ffffffffc08535d0 [raid456]
#4 [ffff8819a7eff720] md_make_request at ffffffff8150a744
#5 [ffff8819a7eff798] generic_make_request at ffffffff812f9085
#6 [ffff8819a7eff7f0] submit_bio at ffffffff812f9300
#7 [ffff8819a7eff848] osd_submit_bio at ffffffffc110888c [osd_ldiskfs]
#8 [ffff8819a7eff858] osd_do_bio at ffffffffc110ad60 [osd_ldiskfs]
#9 [ffff8819a7eff9a8] osd_write_commit at ffffffffc110b9fc [osd_ldiskfs]
#10 [ffff8819a7effa28] ofd_commitrw_write at ffffffffc122ac83 [ofd]
#11 [ffff8819a7effaa8] ofd_commitrw at ffffffffc122e5a9 [ofd]
#12 [ffff8819a7effb30] obd_commitrw at ffffffffc0ce7eb7 [ptlrpc]
#13 [ffff8819a7effb98] tgt_brw_write at ffffffffc0cbad91 [ptlrpc]
#14 [ffff8819a7effd00] tgt_request_handle at ffffffffc0cb6da5 [ptlrpc]
#15 [ffff8819a7effd48] ptlrpc_server_handle_request at ffffffffc0c5fb16 [ptlrpc]
#16 [ffff8819a7effde8] ptlrpc_main at ffffffffc0c63252 [ptlrpc]
#17 [ffff8819a7effec8] kthread at ffffffff810b098f
#18 [ffff8819a7efff50] ret_from_fork at ffffffff816b4f58
A lot of these threads have been hung for about 11000s. The impacted MD/RAID device/OST is "md18" (/mnt/oak/ost/66), and since it appears that the check/recovery operations were started through some admin command or a programmed procedure (rather than by some error handling) for all devices/OSTs, "md18" is the only one that never reported the "md: md18: data-check done" message like all the others did. At the moment, I have still not been able to prove that the BH_Lock'ed buffer_head causing thread ll_ost_io00_018/251676 to hang is also caused by MD/RAID operations being stuck waiting for the last superblock update step at the end of the check.
But we seem to end up with a very complex and unlikely scenario that requires concurrent MD/RAID OSTs/devices (with a check/recovery running), Lustre quotas enabled, memory reclaim being triggered with a lu_cache_shrink() shrinker run requiring a quota update upon object purge/delete, and a user process reading from sysfs. |
| Comment by Stephane Thiell [ 28/Feb/18 ] |
|
Hi Bruno, Wow. Great analysis! That looks like a complex issue indeed, doh! Thanks!
I did rebuild Lustre for the same kernel. We've been running 3.10.0-693.2.2.el7_lustre.pl1.x86_64 since the beginning of October 2017. This kernel is based on 3.10.0-693.2.2.el7.x86_64 with Lustre patches plus one mdraid patch. This kernel itself with Lustre 2.10.x has worked great since October. We had several Lustre issues with 2.10.0 and 2.10.1 (e.g. nodemap), but most of them have been fixed in 2.10.2.
Exactly, we're using mdadm's raid-check cron script, which starts a batch of md checks every Sunday, and then usually the second batch starts on next Wednesday, which was the case for md18. So you are correct, it's started by a programmed procedure, but this is normal and not something new on this system.
Well, it doesn't look so unlikely to us! It has already happened twice:
oak-io1-s1 2018-02-19-13:54:27
oak-io2-s2 2018-02-21-11:10:27
And it occurred on different I/O cells (oak-io1 and oak-io2, different hardware), so it's probably not related to a single md issue. One thing I changed on this system is vm.zone_reclaim_mode from 1 to 0 on all OSS on February 8 (we were already on 2.10.3). I don't know if that can lead to this kind of scenario, and if it does, I should see the same issue on 2.10.2 soon... Thanks again! Stephane |
| Comment by Bruno Faccini (Inactive) [ 01/Mar/18 ] |
|
Having a look at the kernel source code to understand the impact of setting vm.zone_reclaim_mode=0: regarding the deadlock scenario I described before and the different pieces of code involved, it looks like this setting causes the memory allocation slow-path, and thus the shrinkers, to run much more frequently. With vm.zone_reclaim_mode=1, on the other hand, new pages should be obtained from the current/preferred node+zone (where there seem to be plenty, at least at the time of the crash dump). What is the reason you decided to change this parameter? On the other hand, I have analyzed the two other crash dumps you provided: vmcore-oak-io2-s2-2018-02-21-11-10-27.gz shows about the same deadlock situation, with one less thread involved since it was able to grab two of the locks/mutexes itself, but the overall scenario is the same: the sas_counters thread holds sysfs_mutex and then blocks in memory reclaim (either on lu_sites_guard trying to execute lu_cache_shrink(), or on dqio_mutex during lu_cache_shrink() execution), while the other thread owning dqio_mutex is stuck waiting for a buffer_head to be unlocked after I/O completion on an OST/MD-RAID device, which will never happen because the associated service threads are waiting on sysfs_mutex. So could you try to run with vm.zone_reclaim_mode=1 and/or without sas_counters? |
| Comment by Stephane Thiell [ 01/Mar/18 ] |
|
Hi Bruno, Thanks, this is super helpful!!!
> What is the reason you have decided to change this parameter?
Changing zone_reclaim_mode to 0 wasn't specifically done for an issue on the OSS, but more for the MDS, as recommended by Andreas in another ticket.
My current plan is to stay a bit longer on 2.10.2 with zone_reclaim_mode=0. It has been working smoothly since last Saturday. I would like to see the same issue happen in 2.10.2, just to confirm it is not related to a change in 2.10.3. If/when the issue occurs, I'll upgrade to 2.10.3 and switch zone_reclaim_mode back to 1 on all OSS. sas_counters is really useful to us, as it monitors the large SAS fabric for any errors. It "just" periodically checks sysfs entries (I wrote it). Again, thanks much for your time! |
| Comment by Stephane Thiell [ 05/Mar/18 ] |
|
Hi Bruno, We found an OSS unresponsive this morning (running 2.10.2 and zone_reclaim_mode=0). I believe the problem is the same, although I am not able to open the crash dump (still, it should be complete):
crash: invalid structure size: tnt
FILE: kernel.c LINE: 10220 FUNCTION: show_kernel_taints()
[/usr/bin/crash] error trace: 4e45f5 => 4d6c04 => 4d33cc => 52ab62
52ab62: SIZE_verify+130
4d33cc: (undetermined)
4d6c04: (undetermined)
4e45f5: display_sys_stats+2213
I'm attaching the vmcore-dmesg.txt as oak-io1-s2-2018-03-05-09-33-02_vmcore-dmesg.txt, which shows some Lustre stack traces that look (somewhat) similar to the ones seen in 2.10.3. I'll first switch zone_reclaim_mode to 1 on all OSS, and when possible I'll upgrade Lustre to 2.10.3 again.
|
| Comment by Stephane Thiell [ 05/Mar/18 ] |
|
Hey Bruno, In your opinion, is this more of an md issue or a Lustre issue? I see two possible arguments here:
Last question, do you think this deadlock could be related to the recent procfs to sysfs migration in Lustre? I have no memory of such deadlock in 2.9 for example.
|
| Comment by Stephane Thiell [ 06/Mar/18 ] |
|
Hi Bruno, Just got another occurrence on the failover partner that took over the OSTs after the first issue. This time the crash dump is valid, I was able to open it, so in case you want to double check:
Note: oak-io1-s1 was running Lustre 2.10.2 and vm.zone_reclaim_mode=1 (but changed to 1 on the fly, not 1 since boot time).
|
| Comment by Bruno Faccini (Inactive) [ 06/Mar/18 ] |
|
Hello Stephane, Yes, the stacks in oak-io1-s2-2018-03-05-09-33-02_vmcore-dmesg.txt clearly indicate that this is the same problem/deadlock. I am still working on finding the root cause of the deadlock, but at the moment I suspect it could be fixed if the memory reclaim that occurs while sas_counters owns sysfs_mutex (due to gathering its stats from there, I presume) could be done while avoiding filesystem operations (!GFP_FS), so that it no longer blocks, releases the mutex, and lets the MD operations continue. I will have a look at this new crash dump. |
| Comment by Bruno Faccini (Inactive) [ 07/Mar/18 ] |
|
Stephane, |
| Comment by Stephane Thiell [ 07/Mar/18 ] |
|
Hey Bruno, Yeah, the last one is about 29GB and that is too big for Box (which has a file size limit of 15GB). Hmm, I could split it in two, which should work with Box (but won't be until tomorrow). |
| Comment by Stephane Thiell [ 08/Mar/18 ] |
|
Hello Bruno, Please find the last (and big) crash dump below, in two parts: MD5 (vmcore_oak-io1-s1-2018-03-05-14_06_57.1) = 60c66a81c9acc1675a41722d6016efcc https://stanford.box.com/s/nsdwy6bind6l48spesjg76uv2tb9jmrp MD5 (vmcore_oak-io1-s1-2018-03-05-14_06_57.2) = a17323c3afbdf8fc970d430c35ac864c https://stanford.box.com/s/3idy1mv956cf0l1a9dj6c9otulmtmnd4 Simply use cat to aggregate the parts: cat vmcore_oak-io1-s1-2018-03-05-14_06_57.1 vmcore_oak-io1-s1-2018-03-05-14_06_57.2 > vmcore_oak-io1-s1-2018-03-05-14_06_57 you should get: MD5 (vmcore_oak-io1-s1-2018-03-05-14_06_57) = 14752f1c982d5011c0375fcca8c3ebbe Thanks!! |
| Comment by Bruno Faccini (Inactive) [ 12/Mar/18 ] |
|
Got it! Thanks and sorry for the extra work.
PID: 39539 TASK: ffff880066253f40 CPU: 37 COMMAND: "sas_counters"
#0 [ffff88102f92b1f0] __schedule at ffffffff816a8f65
#1 [ffff88102f92b258] schedule_preempt_disabled at ffffffff816aa409
#2 [ffff88102f92b268] __mutex_lock_slowpath at ffffffff816a8337
#3 [ffff88102f92b2c8] mutex_lock at ffffffff816a774f
#4 [ffff88102f92b2e0] dquot_acquire at ffffffff81265e8a
#5 [ffff88102f92b318] ldiskfs_acquire_dquot at ffffffffc0f4f766 [ldiskfs]
#6 [ffff88102f92b338] dqget at ffffffff81267814
#7 [ffff88102f92b398] dquot_get_dqblk at ffffffff812685f4
#8 [ffff88102f92b3b8] osd_acct_index_lookup at ffffffffc10968bf [osd_ldiskfs]
#9 [ffff88102f92b3f0] lquota_disk_read at ffffffffc0fef214 [lquota]
#10 [ffff88102f92b420] qsd_refresh_usage at ffffffffc0ff6bfa [lquota]
#11 [ffff88102f92b458] qsd_op_adjust at ffffffffc1005881 [lquota]
#12 [ffff88102f92b498] osd_object_delete at ffffffffc105fc50 [osd_ldiskfs]
#13 [ffff88102f92b4d8] lu_object_free at ffffffffc099ee9d [obdclass]
#14 [ffff88102f92b530] lu_site_purge_objects at ffffffffc099fafe [obdclass]
#15 [ffff88102f92b5d8] lu_cache_shrink at ffffffffc09a0949 [obdclass]
#16 [ffff88102f92b628] shrink_slab at ffffffff81195413
#17 [ffff88102f92b6c8] zone_reclaim at ffffffff81198091
#18 [ffff88102f92b770] get_page_from_freelist at ffffffff8118c264
#19 [ffff88102f92b880] __alloc_pages_nodemask at ffffffff8118caf6
#20 [ffff88102f92b930] alloc_pages_current at ffffffff811d1108
#21 [ffff88102f92b978] new_slab at ffffffff811dbe15
#22 [ffff88102f92b9b0] ___slab_alloc at ffffffff811dd71c
#23 [ffff88102f92ba80] __slab_alloc at ffffffff816a10ee
#24 [ffff88102f92bac0] kmem_cache_alloc at ffffffff811df6b3
#25 [ffff88102f92bb00] alloc_inode at ffffffff8121c851
#26 [ffff88102f92bb20] iget_locked at ffffffff8121dafb
#27 [ffff88102f92bb60] sysfs_get_inode at ffffffff8127fcf7
#28 [ffff88102f92bb80] sysfs_lookup at ffffffff81281ea1
#29 [ffff88102f92bbb0] lookup_real at ffffffff8120b46d
#30 [ffff88102f92bbd0] __lookup_hash at ffffffff8120bd42
#31 [ffff88102f92bc00] lookup_slow at ffffffff816a1342
#32 [ffff88102f92bc38] link_path_walk at ffffffff8120e9df
#33 [ffff88102f92bce8] path_lookupat at ffffffff8120ebdb
#34 [ffff88102f92bd80] filename_lookup at ffffffff8120f34b
#35 [ffff88102f92bdb8] user_path_at_empty at ffffffff81212ec7
#36 [ffff88102f92be88] user_path_at at ffffffff81212f31
#37 [ffff88102f92be98] vfs_fstatat at ffffffff81206473
#38 [ffff88102f92bee8] SYSC_newlstat at ffffffff81206a41
#39 [ffff88102f92bf70] sys_newlstat at ffffffff81206cce
#40 [ffff88102f92bf80] system_call_fastpath at ffffffff816b5009
RIP: 00007f3374dc21a5 RSP: 00007ffe70743110 RFLAGS: 00010202
RAX: 0000000000000006 RBX: ffffffff816b5009 RCX: 00007ffe70743120
RDX: 00007ffe707425b0 RSI: 00007ffe707425b0 RDI: 00007f3366adea10
RBP: 00000000ffffff9c R8: 00007f3366adea10 R9: 312d74726f702f65
R10: 617078652f303a32 R11: 0000000000000246 R12: ffffffff81206cce
R13: ffff88102f92bf78 R14: 00007ffe707426b0 R15: 000000008217f4f9
ORIG_RAX: 0000000000000006 CS: 0033 SS: 002b
So, I think that at the moment, if you want to prevent this problem from happening again, you will need to remove one of the involved actors/features, i.e. either sas_counters, or the regular MD device check/sync, or Lustre quotas. On the other hand, I wonder if the final solution could be to add a new sysfs_alloc_inode() method to "struct super_operations sysfs_ops", where the kmem_cache_alloc() would occur with GFP_NOFS, so that shrinkers which honor the GFP mask (like lu_cache_shrink()!!) skip their filesystem work. |
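For reference, here is a minimal sketch of that idea (an illustration only, not the attached sysfs_alloc_inode_GFP_NOFS.patch; it assumes a dedicated sysfs inode cache created at init time with an inode_init_once() constructor, and the existing sysfs_ops members are quoted from memory):
	/* Hypothetical sketch only -- not the attached sysfs_alloc_inode_GFP_NOFS.patch.
	 * Give sysfs its own inode allocation path so the allocation is done with
	 * GFP_NOFS and cannot recurse into filesystem shrinkers (lu_cache_shrink(),
	 * prune_super(), ...) while sysfs_mutex is held. */
	static struct kmem_cache *sysfs_inode_cachep;	/* assumed: created at sysfs init
							 * time with an inode_init_once()
							 * constructor */

	static struct inode *sysfs_alloc_inode(struct super_block *sb)
	{
		return kmem_cache_alloc(sysfs_inode_cachep, GFP_NOFS);
	}

	static void sysfs_i_callback(struct rcu_head *head)
	{
		struct inode *inode = container_of(head, struct inode, i_rcu);

		kmem_cache_free(sysfs_inode_cachep, inode);
	}

	static void sysfs_destroy_inode(struct inode *inode)
	{
		/* free through RCU, as other filesystems do for their inode caches */
		call_rcu(&inode->i_rcu, sysfs_i_callback);
	}

	static const struct super_operations sysfs_ops = {
		.statfs		= simple_statfs,
		.drop_inode	= generic_delete_inode,
		.evict_inode	= sysfs_evict_inode,
		.alloc_inode	= sysfs_alloc_inode,
		.destroy_inode	= sysfs_destroy_inode,
	};
With alloc_inode defined this way, sysfs_get_inode()/iget_locked() no longer fall back to the generic GFP_KERNEL inode allocation, which is where the reclaim was being entered in the stacks above.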
| Comment by Stephane Thiell [ 12/Mar/18 ] |
|
Thanks Bruno! Does that mean that the MD layer would then have to use this new sysfs_alloc_inode() method? Or everyone? But yes, we'd be very interested to test such a patch!
|
| Comment by Bruno Faccini (Inactive) [ 14/Mar/18 ] |
|
> Does that mean that the MD layer would then have to use this new sysfs_alloc_inode() method? Or everyone?
Everyone. BTW, in the deadlock scenario, it is the sas_counters user-land thread that triggers the memory reclaim during sysfs inode allocation.
> But yes, we'd be very interested to test such a patch!
Attached as sysfs_alloc_inode_GFP_NOFS.patch.
|
| Comment by Stephane Thiell [ 14/Mar/18 ] |
|
OK. Excellent, thank you!! I just built a new kernel with this patch. No kernel update, just the same as before plus this patch added (new version is kernel-3.10.0-693.2.2.el7_lustre.pl2.x86_64). I'll perform the kernel change on all Oak servers tomorrow early morning (Pacific time), when fewer users are connected to the system and report back. |
| Comment by Stephane Thiell [ 21/Mar/18 ] |
|
Hey Bruno, Quick status update: the patch was only deployed last Sunday morning (3/18), due to production constraints before then. sas_counters is running quite frequently again and I started the mdraid checks manually (because usually they start on Saturday night). So far, no issue to report, looking good, but we need more time to be sure (at least a week). Will keep you posted!
|
| Comment by Bruno Faccini (Inactive) [ 21/Mar/18 ] |
|
Stephane, thanks for the update, and let's cross our fingers now... |
| Comment by Stephane Thiell [ 29/Mar/18 ] |
|
Hi Bruno, The system has been very stable lately with the patch. I think we can consider the issue fixed by next week (just to be sure). A few questions for you when you have time (no rush):
Thanks!! Stephane |
| Comment by Bruno Faccini (Inactive) [ 13/Apr/18 ] |
|
Stephane, |
| Comment by Stephane Thiell [ 13/Apr/18 ] |
|
Bruno, Great, I'll follow that with much attention. Thank you again, your patch has really saved us. Stephane |
| Comment by Bruno Faccini (Inactive) [ 02/May/18 ] |
|
Hello Stephane, Lastly, recent 4.x kernel code seems to indicate that the problem is still there, but now in kernfs instead of sysfs: the latter uses the former's methods internally, and the same potential deadlock seems to exist around kernfs_mutex. |
| Comment by Stephane Thiell [ 04/May/18 ] |
|
Hey Bruno, Great, thanks! It would definitely be nice to get some feedback from the kernel developers and/or have this patch integrated upstream. Our Oak system has been rock solid since this patch: right now ~45 days of uptime without any server crash, while the filesystem is still very busy, mdraid checks are running almost all the time, and sas_counters is launched every minute on all OSS. Note: I can't find your email to linux-raid@, maybe it didn't go through? Thanks! Stephane |
| Comment by Stephane Thiell [ 04/Sep/19 ] |
|
We upgraded our Kernel on Oak from CentOS 7.4-patched-by-Bruno to CentOS 7.6 (3.10.0-957.27.2.el7.x86_64 + Lustre patches = 3.10.0-957.27.2.el7_lustre.pl1.x86_64). After one or two days, a similar deadlock occurred. It looks like the kernfs interface still has the same issue.
I know this is a kernel bug, but I wanted to update this ticket for the sake of completeness, as the deadlock is somehow triggered by Lustre through lu_cache_shrink(). User tool accessing the kernfs interface and triggering lu_cache_shrink():
PID: 254093 TASK: ffff9f16acadd140 CPU: 30 COMMAND: "sas_counters"
#0 [ffff9f1de48af3d8] __schedule at ffffffffa096aa72
#1 [ffff9f1de48af460] schedule at ffffffffa096af19
#2 [ffff9f1de48af470] rwsem_down_read_failed at ffffffffa096c54d
#3 [ffff9f1de48af4f8] call_rwsem_down_read_failed at ffffffffa0588bf8
#4 [ffff9f1de48af548] down_read at ffffffffa096a200
#5 [ffff9f1de48af560] lu_cache_shrink at ffffffffc0e5ee7a [obdclass]
#6 [ffff9f1de48af5b0] shrink_slab at ffffffffa03cb08e
#7 [ffff9f1de48af650] do_try_to_free_pages at ffffffffa03ce412
#8 [ffff9f1de48af6c8] try_to_free_pages at ffffffffa03ce62c
#9 [ffff9f1de48af760] __alloc_pages_slowpath at ffffffffa09604ef
#10 [ffff9f1de48af850] __alloc_pages_nodemask at ffffffffa03c2524
#11 [ffff9f1de48af900] alloc_pages_current at ffffffffa040f438
#12 [ffff9f1de48af948] new_slab at ffffffffa041a4c5
#13 [ffff9f1de48af980] ___slab_alloc at ffffffffa041bf2c
#14 [ffff9f1de48afa58] __slab_alloc at ffffffffa096190c
#15 [ffff9f1de48afa98] kmem_cache_alloc at ffffffffa041d7cb
#16 [ffff9f1de48afad8] alloc_inode at ffffffffa045eee1
#17 [ffff9f1de48afaf8] iget_locked at ffffffffa046025b
#18 [ffff9f1de48afb38] kernfs_get_inode at ffffffffa04c9c17
#19 [ffff9f1de48afb58] kernfs_iop_lookup at ffffffffa04ca93b
#20 [ffff9f1de48afb80] lookup_real at ffffffffa044d573
#21 [ffff9f1de48afba0] __lookup_hash at ffffffffa044df92
#22 [ffff9f1de48afbd0] lookup_slow at ffffffffa0961de1
#23 [ffff9f1de48afc08] link_path_walk at ffffffffa045289f
#24 [ffff9f1de48afcb8] path_lookupat at ffffffffa0452aaa
#25 [ffff9f1de48afd50] filename_lookup at ffffffffa045330b
#26 [ffff9f1de48afd88] user_path_at_empty at ffffffffa04552f7
#27 [ffff9f1de48afe58] user_path_at at ffffffffa0455361
#28 [ffff9f1de48afe68] vfs_fstatat at ffffffffa0448223
#29 [ffff9f1de48afeb8] SYSC_newlstat at ffffffffa0448641
#30 [ffff9f1de48aff40] sys_newlstat at ffffffffa0448aae
#31 [ffff9f1de48aff50] system_call_fastpath at ffffffffa0977ddb
RIP: 00007fdc07510ab5 RSP: 00007ffe9a9e7b30 RFLAGS: 00010202
RAX: 0000000000000006 RBX: 00000000ffffff9c RCX: 00007ffe9a9e7b30
RDX: 00007ffe9a9e6b50 RSI: 00007ffe9a9e6b50 RDI: 00007fdbf86babd0
RBP: 00000000012d2ca0 R8: 0000000000000001 R9: 0000000000000001
R10: 00007fdc0834be97 R11: 0000000000000246 R12: 00007ffe9a9e6b50
R13: 0000000000000001 R14: 00007fdc087ade08 R15: 00007fdbfba9c1d0
ORIG_RAX: 0000000000000006 CS: 0033 SS: 002b
mdraid task blocked on kernfs too:
PID: 283550 TASK: ffff9f35c54a0000 CPU: 19 COMMAND: "md0_raid6"
#0 [ffff9f35c5423b68] __schedule at ffffffffa096aa72
#1 [ffff9f35c5423bf8] schedule_preempt_disabled at ffffffffa096be39
#2 [ffff9f35c5423c08] __mutex_lock_slowpath at ffffffffa0969db7
#3 [ffff9f35c5423c60] mutex_lock at ffffffffa096919f
#4 [ffff9f35c5423c78] kernfs_find_and_get_ns at ffffffffa04ca883
#5 [ffff9f35c5423ca0] sysfs_notify at ffffffffa04cd00b
#6 [ffff9f35c5423cc8] md_update_sb at ffffffffa0795a89
#7 [ffff9f35c5423d48] md_check_recovery at ffffffffa079681a
#8 [ffff9f35c5423d68] raid5d at ffffffffc0d9a466 [raid456]
#9 [ffff9f35c5423e50] md_thread at ffffffffa078dedd
#10 [ffff9f35c5423ec8] kthread at ffffffffa02c2e81
The original kernel report (https://bugzilla.kernel.org/show_bug.cgi?id=199589) has been dismissed and I'm not sure this was reported to Red Hat actually. |
| Comment by Stephane Thiell [ 12/Nov/19 ] |
|
This deadlock is still not fixed upstream, and we hit the same issue with CentOS 7.6. We have opened a new case at Red Hat (case 02514526) but my expectations are low. Meanwhile, I've ported Bruno's kernel patch from sysfs to the newer kernfs interface (it's very similar - unless I missed something). We have it on two updated OSS on Oak right now and it seems to work fine. As a reminder, the problem is complex but solved by allocating sysfs/kernfs inodes using GFP_NOFS so that they don't trigger lu_cache_shrink() while holding a sysfs/kernfs inode lock. I'm not sure where to draw the line, i.e. whether it's a kernel or a Lustre bug at this point. |
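For reference, a minimal sketch of what the kernfs side of such a port could look like (my illustration only, not the patch we actually deployed; same assumptions as the sysfs sketch earlier in this ticket, and the existing kernfs super_operations members are quoted from memory):
	/* Hypothetical sketch of the kernfs port. Same idea as the sysfs version:
	 * allocate kernfs inodes with GFP_NOFS so the allocation cannot enter
	 * filesystem shrinkers (lu_cache_shrink(), prune_super(), ...) while
	 * kernfs_mutex is held. */
	static struct kmem_cache *kernfs_inode_cachep;	/* assumed: created at init time
							 * with an inode_init_once() ctor */

	static struct inode *kernfs_alloc_inode(struct super_block *sb)
	{
		return kmem_cache_alloc(kernfs_inode_cachep, GFP_NOFS);
	}

	/* hooked into kernfs' super_operations; a matching destroy_inode/RCU free
	 * is needed as in the sysfs sketch, and any other existing members are
	 * left unchanged (elided here) */
	const struct super_operations kernfs_sops = {
		.statfs		= simple_statfs,
		.drop_inode	= generic_delete_inode,
		.evict_inode	= kernfs_evict_inode,
		.alloc_inode	= kernfs_alloc_inode,
	};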
| Comment by Stephane Thiell [ 12/Nov/19 ] |
|
I've attached foreach_bt_oak-io4-s1.log. In my understanding, this task is part of the deadlock:
PID: 308960 TASK: ffff944ae769a080 CPU: 17 COMMAND: "sas_discover"
#0 [ffff946c547f32d8] __schedule at ffffffffb2d6aa72
#1 [ffff946c547f3360] schedule at ffffffffb2d6af19
#2 [ffff946c547f3370] wait_transaction_locked at ffffffffc0739085 [jbd2]
#3 [ffff946c547f33c8] add_transaction_credits at ffffffffc0739368 [jbd2]
#4 [ffff946c547f3428] start_this_handle at ffffffffc07395e1 [jbd2]
#5 [ffff946c547f34c0] jbd2__journal_start at ffffffffc0739a93 [jbd2]
#6 [ffff946c547f3508] __ldiskfs_journal_start_sb at ffffffffc149c189 [ldiskfs]
#7 [ffff946c547f3548] ldiskfs_release_dquot at ffffffffc149379c [ldiskfs]
#8 [ffff946c547f3568] dqput at ffffffffb28afc1d
#9 [ffff946c547f3590] __dquot_drop at ffffffffb28b12d5
#10 [ffff946c547f35c8] dquot_drop at ffffffffb28b1345
#11 [ffff946c547f35d8] ldiskfs_clear_inode at ffffffffc14980e2 [ldiskfs]
#12 [ffff946c547f35f0] ldiskfs_evict_inode at ffffffffc14bb2bf [ldiskfs]
#13 [ffff946c547f3630] evict at ffffffffb285ff34
#14 [ffff946c547f3658] dispose_list at ffffffffb286003e
#15 [ffff946c547f3680] prune_icache_sb at ffffffffb286104c
#16 [ffff946c547f36e8] prune_super at ffffffffb28453b3
#17 [ffff946c547f3720] shrink_slab at ffffffffb27cb155
#18 [ffff946c547f37c0] zone_reclaim at ffffffffb27cdf31
#19 [ffff946c547f3868] get_page_from_freelist at ffffffffb27c1f2b
#20 [ffff946c547f3980] __alloc_pages_nodemask at ffffffffb27c2296
#21 [ffff946c547f3a30] alloc_pages_current at ffffffffb280f438
#22 [ffff946c547f3a78] new_slab at ffffffffb281a4c5
#23 [ffff946c547f3ab0] ___slab_alloc at ffffffffb281bf2c
#24 [ffff946c547f3b88] __slab_alloc at ffffffffb2d6190c
#25 [ffff946c547f3bc8] kmem_cache_alloc at ffffffffb281d7cb
#26 [ffff946c547f3c08] alloc_inode at ffffffffb285eee1
#27 [ffff946c547f3c28] iget_locked at ffffffffb286025b
#28 [ffff946c547f3c68] kernfs_get_inode at ffffffffb28c9c17
#29 [ffff946c547f3c88] kernfs_iop_lookup at ffffffffb28ca93b
#30 [ffff946c547f3cb0] lookup_real at ffffffffb284d573
#31 [ffff946c547f3cd0] do_last at ffffffffb285153a
#32 [ffff946c547f3d70] path_openat at ffffffffb2853a27
#33 [ffff946c547f3e08] do_filp_open at ffffffffb285542d
#34 [ffff946c547f3ee0] do_sys_open at ffffffffb2841587
#35 [ffff946c547f3f40] sys_openat at ffffffffb28416c4
#36 [ffff946c547f3f50] system_call_fastpath at ffffffffb2d77ddb
RIP: 00007f93ec8f8f70 RSP: 00007ffdf42ef080 RFLAGS: 00010202
RAX: 0000000000000101 RBX: 00007f93e1287630 RCX: 00007f93dfde5000
RDX: 0000000000090800 RSI: 00007f93dfeb9b18 RDI: ffffffffffffff9c
RBP: 00007f93dfeb9b18 R8: 00007f93e1287640 R9: 00007f93ed7334c2
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000076ac90
R13: 0000000000000001 R14: 00007f93edb95e08 R15: 00007f93e1040390
ORIG_RAX: 0000000000000101 CS: 0033 SS: 002b
As Bruno explained before (retranscribed here with my rougher understanding), this involves an external tool accessing sysfs/kernfs like the task above (although I've seen systemd accessing it too), under memory pressure -> zone_reclaim -> shrink_slab -> flush I/O due to quotas, while the corresponding mdraid thread is blocked on kernfs:
PID: 318783 TASK: ffff9460e2cd4100 CPU: 15 COMMAND: "md24_raid6"
#0 [ffff946ca83ebb68] __schedule at ffffffffb2d6aa72
#1 [ffff946ca83ebbf8] schedule_preempt_disabled at ffffffffb2d6be39
#2 [ffff946ca83ebc08] __mutex_lock_slowpath at ffffffffb2d69db7
#3 [ffff946ca83ebc60] mutex_lock at ffffffffb2d6919f
#4 [ffff946ca83ebc78] kernfs_find_and_get_ns at ffffffffb28ca883
#5 [ffff946ca83ebca0] sysfs_notify at ffffffffb28cd00b
#6 [ffff946ca83ebcc8] md_update_sb at ffffffffb2b95a89
#7 [ffff946ca83ebd48] md_check_recovery at ffffffffb2b9681a
#8 [ffff946ca83ebd68] raid5d at ffffffffc1511466 [raid456]
#9 [ffff946ca83ebe50] md_thread at ffffffffb2b8dedd
#10 [ffff946ca83ebec8] kthread at ffffffffb26c2e81
#11 [ffff946ca83ebf50] ret_from_fork_nospec_begin at ffffffffb2d77c1d
|