Hi,
On our three large customer clusters running Lustre 2.1, we quite frequently see situations where Lustre clients are evicted by OSS servers and never reconnect. As a consequence, those nodes have to be rebooted.
When the scenario occurs, a parallel application is usually running on the affected client node, and some of its threads, along with Lustre service threads, are found stuck for a long time in an uninterruptible state, as shown below:
==========================================
crash> ps | grep UN
17353 2 2 ffff880fd60560c0 UN 0.0 0 0 [ldlm_bl_13]
17355 2 6 ffff880f287a1100 UN 0.0 0 0 [ldlm_bl_15]
17356 2 11 ffff880ce88cf850 UN 0.0 0 0 [ldlm_bl_16]
17357 2 31 ffff880e434b1790 UN 0.0 0 0 [ldlm_bl_17]
17359 2 1 ffff880e434b1040 UN 0.0 0 0 [ldlm_bl_18]
17366 2 7 ffff880be74b2080 UN 0.0 0 0 [ldlm_bl_20]
17368 2 18 ffff880c92ed2080 UN 0.0 0 0 [ldlm_bl_22]
17369 2 6 ffff8802b3311850 UN 0.0 0 0 [ldlm_bl_23]
17370 2 11 ffff88035fd7c850 UN 0.0 0 0 [ldlm_bl_24]
17372 2 0 ffff88087d691080 UN 0.0 0 0 [ldlm_bl_25]
33850 2 31 ffff881055105790 UN 0.0 0 0 [ldlm_bl_02]
33851 2 15 ffff88047c9db0c0 UN 0.0 0 0 [ldlm_bl_03]
33852 2 30 ffff88047d788040 UN 0.0 0 0 [ldlm_bl_04]
33853 2 3 ffff88105506a7d0 UN 0.0 0 0 [ldlm_bl_05]
33854 2 6 ffff8802c8d38810 UN 0.0 0 0 [ldlm_bl_06]
33856 2 30 ffff8803f7682850 UN 0.0 0 0 [ldlm_bl_08]
38029 2 26 ffff880bdf7e87d0 UN 0.0 0 0 [ldlm_bl_10]
67452 67447 16 ffff88087ae3b040 UN 1.9 5984848 1325628 %%U657_malheur
67453 67447 8 ffff8805232cc850 UN 1.9 5977308 1319328 %%U657_malheur
67454 67447 24 ffff88087c67e080 UN 1.9 5981192 1322480 %%U657_malheur
67455 67447 4 ffff8806734140c0 UN 1.9 5992684 1331788 %%U657_malheur
67456 67447 20 ffff88087d8c9850 UN 1.9 6001336 1338540 %%U657_malheur
67457 67447 12 ffff88057b7e3790 UN 2.0 6026500 1360764 %%U657_malheur
67458 67447 28 ffff88087d8c9100 UN 2.0 6053684 1380308 %%U657_malheur
67459 67447 1 ffff88062ba0c7d0 UN 2.0 6053928 1380104 %%U657_malheur
67460 67447 17 ffff88087cc0e850 UN 2.0 6026372 1360408 %%U657_malheur
67461 67447 9 ffff8806a4e8d790 UN 1.9 6001772 1338920 %%U657_malheur
67462 67447 25 ffff8806a4e8d040 UN 1.9 5993776 1332416 %%U657_malheur
67463 67447 5 ffff8804ad4bf7d0 UN 1.9 5981100 1322596 %%U657_malheur
67464 67447 21 ffff8804ad4bf080 UN 2.0 6035776 1367864 %%U657_malheur
67465 67447 13 ffff8807b3b89810 UN 2.0 6041500 1371540 %%U657_malheur
69081 2 7 ffff8807f7a26790 UN 0.0 0 0 [ldlm_bl_11]
69082 2 4 ffff88086c1287d0 UN 0.0 0 0 [ldlm_bl_12]
80827 2 2 ffff880c7bd9e850 UN 0.0 0 0 [ptlrpcd_14]
80864 2 4 ffff88107cd4a040 UN 0.0 0 0 [ldlm_bl_00]
80865 2 3 ffff88107d2a5080 UN 0.0 0 0 [ldlm_bl_01]
==========================================
and these threads' stacks look like:
====================================
PID: 17353 TASK: ffff880fd60560c0 CPU: 2 COMMAND: "ldlm_bl_13"
#0 [ffff881054bd7bf0] schedule at ffffffff8147dddc
#1 [ffff881054bd7cb8] __mutex_lock_slowpath at ffffffff8147f33e
#2 [ffff881054bd7d38] mutex_lock at ffffffff8147f1cb
#3 [ffff881054bd7d58] cl_lock_mutex_get at ffffffffa04a4a68 [obdclass]
#4 [ffff881054bd7d98] osc_ldlm_blocking_ast at ffffffffa0756e19 [osc]
#5 [ffff881054bd7e28] ldlm_handle_bl_callback at ffffffffa05636fd [ptlrpc]
#6 [ffff881054bd7e88] ldlm_bl_thread_main at ffffffffa0563b51 [ptlrpc]
#7 [ffff881054bd7f48] kernel_thread at ffffffff810041aa
or
PID: 67453 TASK: ffff8805232cc850 CPU: 8 COMMAND: "%%U657_malheur"
#0 [ffff880683fff988] schedule at ffffffff8147dddc
#1 [ffff880683fffa50] __mutex_lock_slowpath at ffffffff8147f33e
#2 [ffff880683fffad0] mutex_lock at ffffffff8147f1cb
#3 [ffff880683fffaf0] cl_lock_mutex_get at ffffffffa04a4a68 [obdclass]
#4 [ffff880683fffb30] cl_lock_hold_mutex at ffffffffa04a6dfa [obdclass]
#5 [ffff880683fffbf0] cl_lock_request at ffffffffa04a88ae [obdclass]
#6 [ffff880683fffc80] cl_glimpse_lock at ffffffffa084d170 [lustre]
#7 [ffff880683fffd10] cl_glimpse_size at ffffffffa084d9d4 [lustre]
#8 [ffff880683fffd90] ll_inode_revalidate_it at ffffffffa08032af [lustre]
#9 [ffff880683fffdf0] ll_getattr_it at ffffffffa0803479 [lustre]
#10 [ffff880683fffe20] ll_getattr at ffffffffa08035d7 [lustre]
#11 [ffff880683fffe60] vfs_getattr at ffffffff81163041
#12 [ffff880683fffea0] vfs_fstatat at ffffffff811630d0
#13 [ffff880683fffee0] vfs_stat at ffffffff8116321b
#14 [ffff880683fffef0] sys_newstat at ffffffff81163244
#15 [ffff880683ffff80] system_call_fastpath at ffffffff81003172
RIP: 0000003b7bcd82e5 RSP: 00007fffb146b860 RFLAGS: 00000246
RAX: 0000000000000004 RBX: ffffffff81003172 RCX: fffffffffffffffa
RDX: 00007fffb146a258 RSI: 00007fffb146a258 RDI: 000000001324d0d8
RBP: 0000000016800060 R8: 0000000000000077 R9: 00007fffb146a420
R10: 0000000003bfa080 R11: 0000000000000246 R12: 00000000131e4c50
R13: 0000000000000077 R14: ffffffff8104d9ee R15: ffff880683ffff78
ORIG_RAX: 0000000000000004 CS: 0033 SS: 002b
====================================
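For reference, the reason all of these threads show up as "UN" is that cl_lock_mutex_get() boils down to a plain mutex_lock() on the per-cl_lock guard mutex, and mutex_lock() sleeps in TASK_UNINTERRUPTIBLE. The following is only a simplified sketch of that pattern, not the actual Lustre code; the field name cll_guard is an assumption from memory of the 2.1 source:
====================================
#include <linux/mutex.h>

/* Simplified stand-in for struct cl_lock; only the guard mutex taken by
 * cl_lock_mutex_get() is shown.  The field name is an assumption. */
struct cl_lock_sketch {
        struct mutex cll_guard;
};

/* Rough shape of the blocking-AST path seen in the ldlm_bl_* stacks:
 * osc_ldlm_blocking_ast() -> cl_lock_mutex_get() -> mutex_lock().
 * mutex_lock() sleeps uninterruptibly, so every callback thread that
 * reaches this point while another task holds the guard is reported
 * as "UN" by crash. */
static void blocking_ast_sketch(struct cl_lock_sketch *lock)
{
        mutex_lock(&lock->cll_guard);   /* all the threads above block here */
        /* ... lock cancellation work would happen here ... */
        mutex_unlock(&lock->cll_guard);
}
====================================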
Usually there is also another thread, not among the uninterruptible ones, with the following stack:
====================================
crash> ps | grep %%U657_malheur | grep -v UN
67451 67660 0 ffff880673414810 IN 1.9 6012000 1347004 %%U657_malheur
crash> bt ffff880673414810
PID: 67451 TASK: ffff880673414810 CPU: 0 COMMAND: "%%U657_malheur"
#0 [ffff88064de93548] schedule at ffffffff8147dddc
#1 [ffff88064de93610] cfs_waitq_wait at ffffffffa03be75e [libcfs]
#2 [ffff88064de93620] cl_sync_io_wait at ffffffffa04ac18f [obdclass]
#3 [ffff88064de936f0] cl_io_submit_sync at ffffffffa04ac387 [obdclass]
#4 [ffff88064de93750] cl_lock_page_out at ffffffffa04a7c3f [obdclass]
#5 [ffff88064de93800] osc_lock_flush at ffffffffa075607f [osc]
#6 [ffff88064de93850] osc_lock_cancel at ffffffffa075611a [osc]
#7 [ffff88064de938d0] cl_lock_cancel0 at ffffffffa04a2a6d [obdclass]
#8 [ffff88064de93920] cl_lock_hold_release at ffffffffa04a40b5 [obdclass]
#9 [ffff88064de93960] cl_lock_unhold at ffffffffa04a5e70 [obdclass]
#10 [ffff88064de939b0] lov_sublock_release at ffffffffa07b3b16 [lov]
#11 [ffff88064de93a30] lov_lock_enqueue at ffffffffa07b65b8 [lov]
#12 [ffff88064de93b00] cl_enqueue_try at ffffffffa04a6a4b [obdclass]
#13 [ffff88064de93b80] cl_enqueue_locked at ffffffffa04a8550 [obdclass]
#14 [ffff88064de93bf0] cl_lock_request at ffffffffa04a88ce [obdclass]
#15 [ffff88064de93c80] cl_glimpse_lock at ffffffffa084d170 [lustre]
#16 [ffff88064de93d10] cl_glimpse_size at ffffffffa084d9d4 [lustre]
#17 [ffff88064de93d90] ll_inode_revalidate_it at ffffffffa08032af [lustre]
#18 [ffff88064de93df0] ll_getattr_it at ffffffffa0803479 [lustre]
#19 [ffff88064de93e20] ll_getattr at ffffffffa08035d7 [lustre]
#20 [ffff88064de93e60] vfs_getattr at ffffffff81163041
#21 [ffff88064de93ea0] vfs_fstatat at ffffffff811630d0
#22 [ffff88064de93ee0] vfs_stat at ffffffff8116321b
#23 [ffff88064de93ef0] sys_newstat at ffffffff81163244
#24 [ffff88064de93f80] system_call_fastpath at ffffffff81003172
RIP: 0000003b7bcd82e5 RSP: 00007fffcb1cc638 RFLAGS: 00000202
RAX: 0000000000000004 RBX: ffffffff81003172 RCX: 0000003b7bccd177
RDX: 00007fffcb1cc8d8 RSI: 00007fffcb1cc8d8 RDI: 0000000014583ba8
RBP: 0000000016800060 R8: 0000000000000077 R9: 00007fffcb1ccaa0
R10: 0000000003bfa080 R11: 0000000000000246 R12: 000000001426cb00
R13: 0000000000000077 R14: ffffffff8104d9ee R15: ffff88064de93f78
ORIG_RAX: 0000000000000004 CS: 0033 SS: 002b
====================================
This last thread is the current "owner" of the cl_lock struct(s) on which the other threads are blocked (see the sketches after the logs below). The last associated client-side logs are:
====================================================
Lustre: DEBUG MARKER: Mon Jun 18 15:50:01 2012
LustreError: 67467:0:(file.c:2186:ll_inode_revalidate_fini()) failure -116 inode 144150960170616438
LustreError: 67467:0:(file.c:2186:ll_inode_revalidate_fini()) Skipped 259 previous similar messages
Lustre: DEBUG MARKER: Mon Jun 18 15:55:01 2012
Lustre: DEBUG MARKER: Mon Jun 18 16:00:01 2012
Lustre: DEBUG MARKER: Mon Jun 18 16:05:01 2012
LustreError: 67451:0:(file.c:2186:ll_inode_revalidate_fini()) failure -116 inode 144150960170616931
LustreError: 67451:0:(file.c:2186:ll_inode_revalidate_fini()) Skipped 187 previous similar messages
Lustre: DEBUG MARKER: Mon Jun 18 16:10:01 2012
Lustre: DEBUG MARKER: Mon Jun 18 16:15:01 2012
LustreError: 67467:0:(file.c:2186:ll_inode_revalidate_fini()) failure -116 inode 144150960170617447
LustreError: 67467:0:(file.c:2186:ll_inode_revalidate_fini()) Skipped 143 previous similar messages
Lustre: DEBUG MARKER: Mon Jun 18 16:20:01 2012
Lustre: DEBUG MARKER: Mon Jun 18 16:25:01 2012
........
Lustre: DEBUG MARKER: Mon Jun 18 17:10:01 2012
LustreError: 67467:0:(file.c:2186:ll_inode_revalidate_fini()) failure -116 inode 144150960170620220
LustreError: 67467:0:(file.c:2186:ll_inode_revalidate_fini()) Skipped 51 previous similar messages
Lustre: DEBUG MARKER: Mon Jun 18 17:15:01 2012
Lustre: DEBUG MARKER: Mon Jun 18 17:20:01 2012
Lustre: DEBUG MARKER: Mon Jun 18 17:25:01 2012
LustreError: 67451:0:(cl_io.c:1700:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 36 remaining pages
LustreError: 67451:0:(cl_io.c:965:cl_io_cancel()) Canceling ongoing page trasmission
Lustre: DEBUG MARKER: Mon Jun 18 17:30:01 2012
Lustre: DEBUG MARKER: Mon Jun 18 17:35:01 2012
Lustre: DEBUG MARKER: Mon Jun 18 17:40:01 2012
Lustre: DEBUG MARKER: Mon Jun 18 17:45:01 2012
Lustre: DEBUG MARKER: Mon Jun 18 17:50:01 2012
================================================================
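To make the dependency explicit, here is a very condensed sketch of the path this owner thread is on, with the intermediate lov/osc layers collapsed into comments. This is not the actual Lustre source, just an illustration of why the guard mutex stays held for as long as the flush has not completed:
====================================
#include <linux/mutex.h>

/* Condensed sketch of the owner thread's path (PID 67451 above).  The
 * guard mutex corresponds to the one cl_lock_mutex_get() blocks on in
 * the other stacks. */
static int glimpse_flush_sketch(struct mutex *cll_guard)
{
        mutex_lock(cll_guard);          /* cl_lock_hold_mutex() */

        /*
         * cl_lock_request() -> cl_enqueue_locked() -> lov_lock_enqueue():
         * releasing a sublock (lov_sublock_release() -> cl_lock_unhold())
         * ends up in osc_lock_cancel() -> osc_lock_flush() ->
         * cl_lock_page_out(), which pushes the dirty pages out
         * synchronously through cl_io_submit_sync() and parks in
         * cl_sync_io_wait() until the OST acknowledges them.
         */

        mutex_unlock(cll_guard);        /* only reached once the flush
                                         * completes, so the blocking-AST
                                         * threads stay queued on the mutex */
        return 0;
}
====================================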
This thread is waiting for pages whose flush status has still not been returned by the server.
Unfortunately, no matching logs have been found on the server side.
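The "SYNC IO failed with error: -110" line in the logs (-110 being -ETIMEDOUT) suggests the shape of that wait: a timed wait that expired, a cancellation attempt on the remaining pages, and then, apparently, an unbounded wait for the outstanding-page counter to drop to zero. Below is a minimal sketch of such a wait pattern, written with plain kernel waitqueue primitives rather than the libcfs wrappers seen in the stack; the structure and field names (csi_waitq, csi_sync_nr) are assumptions on my part:
====================================
#include <linux/wait.h>
#include <linux/atomic.h>
#include <linux/errno.h>
#include <linux/kernel.h>

/* Minimal stand-in for the sync-io anchor waited on in cl_sync_io_wait():
 * a waitqueue plus a count of pages whose write-back has not yet been
 * acknowledged by the server. */
struct sync_io_sketch {
        wait_queue_head_t csi_waitq;
        atomic_t          csi_sync_nr;
};

/* Rough shape of the wait seen in the logs: a timed wait that reports
 * -ETIMEDOUT (-110) and tries to cancel the remaining pages, followed by
 * an unbounded wait for the counter to reach zero.  If the OST replies
 * never arrive, the second wait never returns, and the cl_lock mutex held
 * by this thread is never released. */
static int sync_io_wait_sketch(struct sync_io_sketch *anchor, long timeout)
{
        long left;

        left = wait_event_timeout(anchor->csi_waitq,
                                  atomic_read(&anchor->csi_sync_nr) == 0,
                                  timeout);
        if (left == 0) {
                pr_err("SYNC IO timed out, %d pages still outstanding\n",
                       atomic_read(&anchor->csi_sync_nr));
                /* cancellation of the outstanding pages would go here
                 * (cl_io_cancel() in the real code, per the logs) */
        }

        /* Second, unbounded wait: this is where PID 67451 appears stuck. */
        wait_event(anchor->csi_waitq,
                   atomic_read(&anchor->csi_sync_nr) == 0);

        return left == 0 ? -ETIMEDOUT : 0;
}
====================================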
TIA,
Sebastien.