[LU-11620] BUG: sleeping function called from invalid context at mm/slub.c:940 Created: 05/Nov/18  Updated: 19/Mar/19  Resolved: 04/Jan/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.5
Fix Version/s: Lustre 2.13.0, Lustre 2.10.7, Lustre 2.12.1

Type: Bug Priority: Minor
Reporter: Olaf Faaland Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: llnl
Environment:

Lustre 2.10.5_2.chaos
kernel 3.10.0-862.14.4.1chaos.ch6.x86_64


Issue Links:
Duplicate
is duplicated by LU-11302 schedule while atomic with lfsck code Resolved
Related
Severity: 3

 Description   

The OSS console log reports:

BUG: sleeping function called from invalid context at mm/slub.c:940
in_atomic(): 1, irqs_disabled(): 0, pid: 152563, name: lfsck
CPU: 9 PID: 152563 Comm: lfsck Kdump: loaded Tainted: P           OE  ------------ T 3.10.0-862.14.4.1chaos.ch6.x86_64 #1
Hardware name: CRAY CRAY-GB512X-CN/S2600JF, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
Call Trace:
 [<ffffffff9f334f01>] dump_stack+0x19/0x1b
 [<ffffffff9eccf439>] __might_sleep+0xd9/0x100
 [<ffffffff9ee06363>] kmem_cache_alloc+0x43/0x240
 [<ffffffffc16918e1>] ? ofd_object_alloc+0x51/0x240 [ofd]
 [<ffffffffc16918e1>] ofd_object_alloc+0x51/0x240 [ofd]
 [<ffffffffc1176464>] lu_object_alloc+0x54/0x320 [obdclass]
 [<ffffffffc1173cb3>] ? htable_lookup+0x163/0x180 [obdclass]
 [<ffffffffc1176910>] lu_object_find_at+0x180/0x2b0 [obdclass]
 [<ffffffffc1177a98>] dt_locate_at+0x18/0xb0 [obdclass]
 [<ffffffffc15fadb2>] lfsck_layout_slave_prep+0x392/0x5b0 [lfsck]
 [<ffffffffc15d1fe6>] lfsck_master_engine+0x196/0x1450 [lfsck]
 [<ffffffffc15d1e50>] ? lfsck_master_oit_engine+0x11a0/0x11a0 [lfsck]
 [<ffffffff9ecc12d1>] kthread+0xd1/0xe0
 [<ffffffff9ecc1200>] ? insert_kthread_work+0x40/0x40
 [<ffffffff9f347837>] ret_from_fork_nospec_begin+0x21/0x21
 [<ffffffff9ecc1200>] ? insert_kthread_work+0x40/0x40

About a day later, it reported a different stack:

BUG: sleeping function called from invalid context at mm/slub.c:940
in_atomic(): 1, irqs_disabled(): 0, pid: 154333, name: ll_ost_out01_00
CPU: 8 PID: 154333 Comm: ll_ost_out01_00 Kdump: loaded Tainted: P        W  OE  ------------ T 3.10.0-862.14.4.1chaos.ch6.x86_64 #1
Hardware name: CRAY CRAY-GB512X-CN/S2600JF, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
Call Trace:
 [<ffffffff9f334f01>] dump_stack+0x19/0x1b
 [<ffffffff9eccf439>] __might_sleep+0xd9/0x100
 [<ffffffff9ee06363>] kmem_cache_alloc+0x43/0x240
 [<ffffffffc16918e1>] ? ofd_object_alloc+0x51/0x240 [ofd]
 [<ffffffffc16918e1>] ofd_object_alloc+0x51/0x240 [ofd]
 [<ffffffffc1176464>] lu_object_alloc+0x54/0x320 [obdclass]
 [<ffffffffc1173cb3>] ? htable_lookup+0x163/0x180 [obdclass]
 [<ffffffffc1176910>] lu_object_find_at+0x180/0x2b0 [obdclass]
 [<ffffffffc1176e7f>] lu_object_find_slice+0x1f/0x90 [obdclass]
 [<ffffffffc16074ce>] lfsck_orphan_it_next+0x17e/0xc90 [lfsck]
 [<ffffffffc160804e>] lfsck_orphan_it_load+0x6e/0x160 [lfsck]
 [<ffffffffc1178d28>] dt_index_walk+0xf8/0x450 [obdclass]
 [<ffffffffc1179080>] ? dt_index_walk+0x450/0x450 [obdclass]
 [<ffffffffc117993c>] dt_index_read+0x44c/0x6b0 [obdclass]
 [<ffffffffc13b47e2>] tgt_obd_idx_read+0x612/0x860 [ptlrpc]
 [<ffffffffc13b653a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
 [<ffffffffc135db5b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
 [<ffffffffc135b26b>] ? ptlrpc_wait_event+0xab/0x350 [ptlrpc]
 [<ffffffff9ecd6492>] ? default_wake_function+0x12/0x20
 [<ffffffff9eccb87b>] ? __wake_up_common+0x5b/0x90
 [<ffffffffc1361c70>] ptlrpc_main+0xae0/0x1e90 [ptlrpc]
 [<ffffffffc1361190>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
 [<ffffffff9ecc12d1>] kthread+0xd1/0xe0
 [<ffffffff9ecc1200>] ? insert_kthread_work+0x40/0x40
 [<ffffffff9f347837>] ret_from_fork_nospec_begin+0x21/0x21
 [<ffffffff9ecc1200>] ? insert_kthread_work+0x40/0x40


 Comments   
Comment by Peter Jones [ 05/Nov/18 ]

?

Comment by Olaf Faaland [ 05/Nov/18 ]

Peter Jones writes:

> ?

Wrong window had focus.  Argh!

Comment by Olaf Faaland [ 05/Nov/18 ]

See https://github.com/LLNL/lustre for the patch stack.

Comment by Olaf Faaland [ 05/Nov/18 ]

The second stack looks very similar to https://jira.whamcloud.com/browse/LU-11302

Comment by Olaf Faaland [ 05/Nov/18 ]

There are several other stacks, but they all start the same way:

lfsck_layout_slave_prep() -> dt_locate_at() -> lu_object_find_at()

Comment by Peter Jones [ 06/Nov/18 ]

Lai

Can you please investigate?

Thanks

Peter

Comment by Lai Siyao [ 07/Nov/18 ]

Yes, I think it's a duplicate of LU-11302. I'll cook up a fix soon.

Comment by Gerrit Updater [ 07/Nov/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33603
Subject: LU-11620 lfsck: change llsd_rb_lock to rwsemaphore
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0e7523e371a8b04aa8a109eacb2cde5c5802f922
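
For readers unfamiliar with this class of warning, here is a minimal userspace sketch of why changing a lock to a read-write semaphore can silence it. It is not the patch itself and uses no kernel or Lustre APIs: every model_* name and both lookup_* functions are hypothetical stand-ins. It assumes that llsd_rb_lock was previously a spinning lock held across the object lookup, which would match the in_atomic(): 1 line and the kmem_cache_alloc -> __might_sleep frames in the traces above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these are kernel or Lustre APIs. */
static int atomic_depth;     /* stands in for the kernel's preempt count */
static int might_sleep_hits; /* number of modelled warnings raised so far */

/* Models the check behind "sleeping function called from invalid context". */
static void model_might_sleep(const char *where)
{
	if (atomic_depth > 0) {
		might_sleep_hits++;
		fprintf(stderr, "model: sleeping function called from "
			"invalid context at %s\n", where);
	}
}

/* Models a spinning read lock: the holder is in atomic context. */
static void model_read_lock(void)
{
	atomic_depth++;
}

static void model_read_unlock(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Models a read-write semaphore: the holder is allowed to sleep,
 * so taking it does not enter atomic context. */
static void model_down_read(void)
{
}

static void model_up_read(void)
{
}

/* Models an allocation that is allowed to sleep. */
static void *model_sleeping_alloc(size_t size)
{
	model_might_sleep("model_sleeping_alloc");
	return malloc(size);
}

/* Allocation while a spinning lock is held; returns warnings raised. */
int lookup_under_spinning_lock(void)
{
	int before = might_sleep_hits;
	void *obj;

	model_read_lock();
	obj = model_sleeping_alloc(64);
	model_read_unlock();
	free(obj);
	return might_sleep_hits - before;
}

/* Same allocation while a semaphore is held; returns warnings raised. */
int lookup_under_semaphore(void)
{
	int before = might_sleep_hits;
	void *obj;

	model_down_read();
	obj = model_sleeping_alloc(64);
	model_up_read();
	free(obj);
	return might_sleep_hits - before;
}
```

In this model, lookup_under_spinning_lock() raises one warning and lookup_under_semaphore() raises none. The real change is in https://review.whamcloud.com/33603.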

Comment by Gerrit Updater [ 04/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33603/
Subject: LU-11620 lfsck: change llsd_rb_lock to rwsemaphore
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 925ce153979d6ac793a65e193181ec14a8281640

Comment by Peter Jones [ 04/Jan/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 07/Jan/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33979
Subject: LU-11620 lfsck: change llsd_rb_lock to rwsemaphore
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 1c8ea78eb2bc25e1305c17bd9e54756cb9e17e3c

Comment by Gerrit Updater [ 15/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33979/
Subject: LU-11620 lfsck: change llsd_rb_lock to rwsemaphore
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 97f895ea37468513987c2048b8b413be27b267b0

Comment by Gerrit Updater [ 25/Feb/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34303
Subject: LU-11620 lfsck: change llsd_rb_lock to rwsemaphore
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 5c525791dd29a3f9f1dc2e1629b569d26ecc452a

Comment by Gerrit Updater [ 19/Mar/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34303/
Subject: LU-11620 lfsck: change llsd_rb_lock to rwsemaphore
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 813a626c837ecfbf19f59e98d119ab963bb8dd6e
