[LU-11302] schedule while atomic with lfsck code Created: 30/Aug/18  Updated: 26/Nov/18  Resolved: 26/Nov/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Alexey Lyashkov Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: easy
Environment:

RHEL 7.4 debug kernel


Issue Links:
Duplicate
duplicates LU-11620 BUG: sleeping function called from in... Resolved
Related
is related to LU-3336 LFSCK II: MDT-OST OST orphan handling Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
[ 2008.407262] BUG: sleeping function called from invalid context at kernel/rwsem.c:21
[ 2008.410052] in_atomic(): 1, irqs_disabled(): 0, pid: 11038, name: mdt_out00_002
[ 2008.412781] INFO: lockdep is turned off.
[ 2008.414459] CPU: 0 PID: 11038 Comm: mdt_out00_002 Tainted: G        W  OE  ------------   3.10.0-neo-7.4+ #0
[ 2008.417577] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 2008.421118] Call Trace:
[ 2008.422546]  [<ffffffff816f4591>] dump_stack+0x19/0x1b
[ 2008.424525]  [<ffffffff810c91e9>] __might_sleep+0xe9/0x110
[ 2008.426546]  [<ffffffff816faa8a>] down_read+0x2a/0xb0
[ 2008.428465]  [<ffffffff810c8971>] ? finish_task_switch+0x81/0x1a0
[ 2008.431076]  [<ffffffffc10ac6cc>] ldiskfs_xattr_get+0x5c/0x2e0 [ldiskfs]
[ 2008.433385]  [<ffffffffc10cdeaa>] ldiskfs_xattr_trusted_get+0x2a/0x30 [ldiskfs]
[ 2008.435819]  [<ffffffff81248255>] generic_getxattr+0x55/0x80
[ 2008.438162]  [<ffffffffc115959d>] osd_xattr_get+0x18d/0x850 [osd_ldiskfs]
[ 2008.440646]  [<ffffffff816fe400>] ? _raw_spin_unlock+0x20/0x40
[ 2008.442850]  [<ffffffffc0fa9ff9>] lfsck_orphan_it_next+0x809/0xc90 [lfsck]
[ 2008.445289]  [<ffffffffc0faa4ee>] lfsck_orphan_it_load+0x6e/0x160 [lfsck]
[ 2008.447732]  [<ffffffffc0a0be18>] dt_index_walk+0xf8/0x450 [obdclass]
[ 2008.450000]  [<ffffffffc0a0c170>] ? dt_index_walk+0x450/0x450 [obdclass]
[ 2008.452239]  [<ffffffffc0a0ca24>] dt_index_read+0x444/0x6a0 [obdclass]
[ 2008.454439]  [<ffffffffc0cd30a2>] tgt_obd_idx_read+0x612/0x860 [ptlrpc]
[ 2008.456637]  [<ffffffffc0cd8ae0>] tgt_request_handle+0x940/0x13f0 [ptlrpc]
[ 2008.458861]  [<ffffffffc0c7abae>] ptlrpc_server_handle_request+0x26e/0xb10 [ptlrpc]
[ 2008.461267]  [<ffffffffc0c7ede3>] ptlrpc_main+0xad3/0x1fb0 [ptlrpc]
[ 2008.463506]  [<ffffffff810c8971>] ? finish_task_switch+0x81/0x1a0
[ 2008.465626]  [<ffffffffc0c7e310>] ? ptlrpc_register_service+0xec0/0xec0 [ptlrpc]
[ 2008.468064]  [<ffffffff810bacdf>] kthread+0xef/0x100
[ 2008.469795]  [<ffffffff810babf0>] ? kthread_create_on_node+0x140/0x140
[ 2008.471925]  [<ffffffff8170989d>] ret_from_fork+0x5d/0xb0
[ 2008.473791]  [<ffffffff810babf0>] ? kthread_create_on_node+0x140/0x140

This is caused by a long-standing bug: the rb tree is read-locked when the iterator is initialized and is not unlocked until fini, so sleeping functions end up being called with the lock still held.

77eea1985bb (Fan Yong 2014-02-12 17:21:32 +0800 6993) /* read lock the rbtree when init, and unlock when fini */
77eea1985bb (Fan Yong 2014-02-12 17:21:32 +0800 6994) read_lock(&llsd->llsd_rb_lock);



 Comments   
Comment by Alexey Lyashkov [ 30/Aug/18 ]

[ 2008.514110] CPU: 0 PID: 11038 Comm: mdt_out00_002 Tainted: G W OE ------------ 3.10.0-neo-7.4+ #0
[ 2008.517137] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 2008.520376] Call Trace:
[ 2008.521675] [<ffffffff816f4591>] dump_stack+0x19/0x1b
[ 2008.523622] [<ffffffff816ee328>] __schedule_bug+0x70/0x7f
[ 2008.525713] [<ffffffff816fba1e>] __schedule+0xa2e/0xab0
[ 2008.527837] [<ffffffff810cc996>] __cond_resched+0x26/0x30
[ 2008.530084] [<ffffffff816fbd9a>] _cond_resched+0x3a/0x50
[ 2008.532262] [<ffffffff816faa8f>] down_read+0x2f/0xb0
[ 2008.534240] [<ffffffff810c8971>] ? finish_task_switch+0x81/0x1a0
[ 2008.536354] [<ffffffffc10ac6cc>] ldiskfs_xattr_get+0x5c/0x2e0 [ldiskfs]
[ 2008.538912] [<ffffffffc10cdeaa>] ldiskfs_xattr_trusted_get+0x2a/0x30 [ldiskfs]
[ 2008.541490] [<ffffffff81248255>] generic_getxattr+0x55/0x80
[ 2008.543663] [<ffffffffc115959d>] osd_xattr_get+0x18d/0x850 [osd_ldiskfs]
[ 2008.546086] [<ffffffff816fe400>] ? _raw_spin_unlock+0x20/0x40
[ 2008.548173] [<ffffffffc0fa9ff9>] lfsck_orphan_it_next+0x809/0xc90 [lfsck]
[ 2008.550584] [<ffffffffc0faa4ee>] lfsck_orphan_it_load+0x6e/0x160 [lfsck]
[ 2008.552941] [<ffffffffc0a0be18>] dt_index_walk+0xf8/0x450 [obdclass]
[ 2008.555164] [<ffffffffc0a0c170>] ? dt_index_walk+0x450/0x450 [obdclass]
[ 2008.557501] [<ffffffffc0a0ca24>] dt_index_read+0x444/0x6a0 [obdclass]
[ 2008.559782] [<ffffffffc0cd30a2>] tgt_obd_idx_read+0x612/0x860 [ptlrpc]
[ 2008.562074] [<ffffffffc0cd8ae0>] tgt_request_handle+0x940/0x13f0 [ptlrpc]
[ 2008.564438] [<ffffffffc0c7abae>] ptlrpc_server_handle_request+0x26e/0xb10 [ptlrpc]
[ 2008.567074] [<ffffffffc0c7ede3>] ptlrpc_main+0xad3/0x1fb0 [ptlrpc]
[ 2008.569292] [<ffffffff810c8971>] ? finish_task_switch+0x81/0x1a0
[ 2008.571488] [<ffffffffc0c7e310>] ? ptlrpc_register_service+0xec0/0xec0 [ptlrpc]
[ 2008.573957] [<ffffffff810bacdf>] kthread+0xef/0x100
[ 2008.575825] [<ffffffff810babf0>] ? kthread_create_on_node+0x140/0x140
[ 2008.578150] [<ffffffff8170989d>] ret_from_fork+0x5d/0xb0
[ 2008.580230] [<ffffffff810babf0>] ? kthread_create_on_node+0x140/0x140

etc...

Comment by Andreas Dilger [ 12/Sep/18 ]

It looks like replacing rwlock_t llsd_rb_lock with rw_semaphore llsd_rb_sem and changing the users would fix this problem.
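
As a rough illustration of why that swap should help, here is a toy userspace model in C. It is only a sketch under stated assumptions: nothing in it is Lustre or kernel code, every toy_* name is made up for this example, and a simple counter stands in for the kernel's atomic-context tracking. It shows that holding a spinning read lock from init to fini makes every semaphore-taking call in between trip the might_sleep check, while holding a sleeping semaphore over the same span does not.

```c
#include <assert.h>

/* Toy userspace model. Nothing here is a kernel or Lustre API;
 * every toy_* name is hypothetical and exists only for illustration. */
struct toy_ctx {
	int preempt_count;	/* > 0 stands in for in_atomic() == 1 */
	int sleep_bugs;		/* "sleeping function called from invalid context" hits */
};

/* Stand-in for might_sleep(): complain when called in atomic context. */
static void toy_might_sleep(struct toy_ctx *c)
{
	if (c->preempt_count > 0)
		c->sleep_bugs++;
}

/* Stand-ins for read_lock()/read_unlock() on an rwlock_t:
 * a spinning lock, so the holder runs in atomic context. */
static void toy_read_lock(struct toy_ctx *c)   { c->preempt_count++; }
static void toy_read_unlock(struct toy_ctx *c) { c->preempt_count--; }

/* Stand-ins for down_read()/up_read() on an rw_semaphore:
 * a sleeping lock, so taking it may block but holding it is not atomic. */
static void toy_down_read(struct toy_ctx *c) { toy_might_sleep(c); }
static void toy_up_read(struct toy_ctx *c)   { (void)c; }

/* Stand-in for the xattr read in the trace, which takes a semaphore. */
static void toy_xattr_get(struct toy_ctx *c)
{
	toy_down_read(c);
	toy_up_read(c);
}

/* Current scheme: spinning read lock held from init to fini. */
int toy_walk_with_rwlock(int nr_records)
{
	struct toy_ctx c = { 0, 0 };
	int i;

	toy_read_lock(&c);		/* init */
	for (i = 0; i < nr_records; i++)
		toy_xattr_get(&c);	/* next */
	toy_read_unlock(&c);		/* fini */
	assert(c.preempt_count == 0);
	return c.sleep_bugs;
}

/* Suggested scheme: sleeping semaphore held from init to fini. */
int toy_walk_with_rwsem(int nr_records)
{
	struct toy_ctx c = { 0, 0 };
	int i;

	toy_down_read(&c);		/* init */
	for (i = 0; i < nr_records; i++)
		toy_xattr_get(&c);	/* next */
	toy_up_read(&c);		/* fini */
	return c.sleep_bugs;
}
```

In this model toy_walk_with_rwlock(3) reports three violations and toy_walk_with_rwsem(3) reports none. The actual change would also need every writer of the lock converted, and is tracked in the patch on the duplicate ticket rather than here.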

Comment by Andreas Dilger [ 26/Nov/18 ]

LU-11620 has a patch
