Details
- Type: Bug
- Resolution: Fixed
- Priority: Blocker
- Environment: lustre-2.8.0_5.chaos-2.ch6.x86_64, kernel 3.10.0-514.0.0.1chaos.ch6.x86_64
- Severity: 3
Description
The console first reports this:
LustreError: 7526:0:(cl_object.c:735:cl_env_attach()) ASSERTION( rc == 0 ) failed:
LustreError: 7526:0:(cl_object.c:735:cl_env_attach()) LBUG
Pid: 7526, comm: ldlm_bl_02
Call Trace:
  libcfs_debug_dumpstack+0x53/0x80 [libcfs]
  lbug_with_loc+0x45/0xc0 [libcfs]
  cl_env_percpu_get+0xc2/0xd0 [obdclass]
  ll_invalidatepage+0x41/0x170 [lustre]
  vvp_page_discard+0xbd/0x160 [lustre]
  cl_page_invoid+0x68/0x170 [obdclass]
  cl_page_discard+0x13/0x20 [obdclass]
  discard_cb+0x67/0x190 [osc]
  osc_page_gang_lookup+0x1e0/0x320 [osc]
  ? discard_cb+0x0/0x190 [osc]
  osc_lock_discard_pages+0x119/0x22d [osc]
  ? discard_cb+0x0/0x190 [osc]
  osc_lock_flush+0x89/0x280 [osc]
  osc_ldlm_blocking_ast+0x2e3/0x3a0 [osc]
  ldlm_cancel_callback+0x8a/0x2e0 [ptlrpc]
  ? dequeue_entity+0x11c/0x5d0
  ldlm_cli_cancel_local+0xa0/0x420 [ptlrpc]
  ldlm_cli_cancel+0xab/0x3d0 [ptlrpc]
  osc_ldlm_blocking_ast+0x17a/0x3a0 [osc]
  ? __schedule+0x3b8/0x9c0
  ldlm_handle_bl_callback+0xcf/0x410 [ptlrpc]
  ldlm_bl_thread_main+0x531/0x700 [ptlrpc]
  ? default_wake_function+0x0/0x20
  ? ldlm_bl_thread_main+0x0/0x700 [ptlrpc]
  kthread+0xcf/0xe0
  ? kthread+0x0/0xe0
  ret_from_fork+0x58/0x90
  ? kthread+0x0/0xe0
This is followed by:
BUG: sleeping function called from invalid context at mm/slub.c:941
in_atomic(): 1, irqs_disabled(): 0, pid: 7526, name: ldlm_bl_02
CPU: 23 PID: 7526 Comm: ldlm_bl_02 Tainted: G OE ------------ 3.10.0-514.0.0.1chaos.ch6.x86_64 #1
Call Trace:
  dump_stack+0x19/0x1b
  __might_sleep+0xd9/0x100
  kmem_cache_alloc_trace+0x4a/0x250
  ? call_usermodehelper_setup+0x3f/0xa0
  call_usermodehelper_setup+0x3f/0xa0
  call_usermodehelper+0x31/0x60
  libcfs_run_upcall+0x9e/0x3b0 [libcfs]
  ? snprintf+0x49/0x70
  libcfs_run_lbug_upcall+0x7d/0x100 [libcfs]
  lbug_with_loc+0x57/0xc0 [libcfs]
  cl_env_percpu_get+0xc2/0xd0 [obdclass]
  ll_invalidatepage+0x41/0x170 [lustre]
  vvp_page_discard+0xbd/0x160 [lustre]
  cl_page_invoid+0x68/0x170 [obdclass]
  cl_page_discard+0x13/0x20 [obdclass]
  discard_cb+0x67/0x190 [osc]
  osc_page_gang_lookup+0x1e0/0x320 [osc]
  ? check_and_discard_cb+0x150/0x150 [osc]
  osc_lock_discard_pages+0x119/0x22d [osc]
  ? check_and_discard_cb+0x150/0x150 [osc]
  osc_lock_flush+0x89/0x280 [osc]
  osc_ldlm_blocking_ast+0x2e3/0x3a0 [osc]
  ldlm_cancel_callback+0x8a/0x2e0 [ptlrpc]
  ? dequeue_entity+0x11c/0x5d0
  ldlm_cli_cancel_local+0xa0/0x420 [ptlrpc]
  ldlm_cli_cancel+0xab/0x3d0 [ptlrpc]
  osc_ldlm_blocking_ast+0x17a/0x3a0 [osc]
  ? __schedule+0x3b8/0x9c0
  ldlm_handle_bl_callback+0xcf/0x410 [ptlrpc]
  ldlm_bl_thread_main+0x531/0x700 [ptlrpc]
  ? wake_up_state+0x20/0x20
  ? ldlm_handle_bl_callback+0x410/0x410 [ptlrpc]
  kthread+0xcf/0xe0
  ? kthread_create_on_node+0x140/0x140
  ret_from_fork+0x58/0x90
  ? kthread_create_on_node+0x140/0x140
I'm not certain whether a particular workload triggers this. We've been running concurrent mdtest and ior jobs, using remote directories but not striped directories.
The frequency is high: running on 300 clients for about two hours triggered this bug on roughly one third of the nodes.
Issue Links
- duplicates LU-8509 "drop_caches hangs in cl_inode_fini()" (Resolved)