Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.12.0, Lustre 2.10.7, Lustre 2.12.1
-
3
-
9223372036854775807
Description
Looking at the logs for https://testing.whamcloud.com/test_sets/ada6008e-b433-11e8-b86b-52540065bddc, we see that racer test_1 hangs in in ls, dir_create.sh. On both clients, we see several call to dir_create.sh, multiple ls calls, getfattr, ln and others blocked/hung.
From the first client’s (vm9) console log, we see many calls hung ending with “lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4”
[ 7722.701554] Lustre: DEBUG MARKER: DURATION=300 MDSCOUNT=1 OSTCOUNT=7 RACER_ENABLE_REMOTE_DIRS=false RACER_ENABLE_STRIPED_DIRS=false RACER_ENABLE_MIGRATION=false RACER_ENABLE_PFL=true RACER_ENABLE_DOM=true RACER_ENABLE_FLR=true LFS=/usr/bin/lfs /usr/lib64/lustre/t [ 7722.706903] Lustre: DEBUG MARKER: DURATION=300 MDSCOUNT=1 OSTCOUNT=7 RACER_ENABLE_REMOTE_DIRS=false RACER_ENABLE_STRIPED_DIRS=false RACER_ENABLE_MIGRATION=false RACER_ENABLE_PFL=true RACER_ENABLE_DOM=true RACER_ENABLE_FLR=true LFS=/usr/bin/lfs /usr/lib64/lustre/t [ 7726.780425] 3[19667]: segfault at 0 ip 0000000000403e5f sp 00007fff05523690 error 6 in 3 (deleted)[400000+6000] [ 7738.946240] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200007163:0x167:0x0], use llapi_layout_get_by_path() [ 7858.011870] Lustre: lustre-OST0004-osc-ffff8e6e2480f800: Connection to lustre-OST0004 (at 10.9.4.117@tcp) was lost; in progress operations using this service will wait for recovery to complete [ 7858.015216] LustreError: 167-0: lustre-OST0004-osc-ffff8e6e2480f800: This client was evicted by lustre-OST0004; in progress operations using this service will fail. [ 7858.019049] LustreError: 1426:0:(ldlm_resource.c:1146:ldlm_resource_complain()) lustre-OST0004-osc-ffff8e6e2480f800: namespace resource [0x832e:0x0:0x0].0x0 (ffff8e6e0ce026c0) refcount nonzero (2) after lock cleanup; forcing cleanup. [ 7920.249204] INFO: task dir_create.sh:15717 blocked for more than 120 seconds. [ 7920.250063] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.250808] dir_create.sh D ffff8e6e02b14f10 0 15717 15658 0x00000080 [ 7920.251566] Call Trace: [ 7920.251882] [<ffffffffc0db47e2>] ? ll_dcompare+0x72/0x2e0 [lustre] [ 7920.252506] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.253246] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.253901] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.254400] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.254914] [<ffffffffa422927f>] link_path_walk+0x80f/0x8b0 [ 7920.255597] [<ffffffffa422b75d>] ? do_last+0x66d/0x12c0 [ 7920.256108] [<ffffffffa422c465>] path_openat+0xb5/0x640 [ 7920.256614] [<ffffffffa422e01d>] do_filp_open+0x4d/0xb0 [ 7920.257118] [<ffffffffa423b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.257637] [<ffffffffa421a327>] do_sys_open+0x137/0x240 [ 7920.258159] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.258783] [<ffffffffa421a44e>] SyS_open+0x1e/0x20 [ 7920.259260] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.259819] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.260552] INFO: task cp:32379 blocked for more than 120 seconds. [ 7920.261136] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.261877] cp D ffff8e6e2611af70 0 32379 15806 0x00000080 [ 7920.262581] Call Trace: [ 7920.262857] [<ffffffffc0df4cc5>] ? ll_i2suppgid+0x15/0x40 [lustre] [ 7920.263466] [<ffffffffc0df4d14>] ? ll_i2gids+0x24/0xb0 [lustre] [ 7920.264031] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.264635] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.265222] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.265711] [<ffffffffc0de785c>] ll_setattr_raw+0xb6c/0xee0 [lustre] [ 7920.266332] [<ffffffffc0de7c3c>] ll_setattr+0x6c/0xd0 [lustre] [ 7920.266897] [<ffffffffa4239f14>] notify_change+0x2c4/0x420 [ 7920.267428] [<ffffffffa424fab9>] utimes_common+0xd9/0x1c0 [ 7920.267952] [<ffffffffa424fcde>] do_utimes+0x13e/0x180 [ 7920.268452] [<ffffffffa424fe24>] SyS_utimensat+0x64/0xb0 [ 7920.268966] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.269585] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.270156] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.270775] INFO: task cp:434 blocked for more than 120 seconds. [ 7920.271344] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.272068] cp D ffff8e6df7cb8000 0 434 15776 0x00000080 [ 7920.272778] Call Trace: [ 7920.273028] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.273636] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.274225] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.274721] [<ffffffffc0e14bc3>] vvp_io_write_start+0x603/0x810 [lustre] [ 7920.275506] [<ffffffffc0978377>] ? cl_lock_request+0x67/0x1f0 [obdclass] [ 7920.276191] [<ffffffffc097a228>] cl_io_start+0x68/0x130 [obdclass] [ 7920.276803] [<ffffffffc097c611>] cl_io_loop+0x131/0xc70 [obdclass] [ 7920.277417] [<ffffffffc0dc9db2>] ll_file_io_generic+0x4e2/0xd10 [lustre] [ 7920.278064] [<ffffffffc0dcab32>] ll_file_aio_write+0x372/0x540 [lustre] [ 7920.278699] [<ffffffffc0dcada4>] ll_file_write+0xa4/0x170 [lustre] [ 7920.279300] [<ffffffffa421b490>] vfs_write+0xc0/0x1f0 [ 7920.279776] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.280408] [<ffffffffa421c2bf>] SyS_write+0x7f/0xf0 [ 7920.280894] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.281512] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.282096] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.282719] INFO: task getfattr:1159 blocked for more than 120 seconds. [ 7920.283347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.284090] getfattr D ffff8e6df815af70 0 1159 15746 0x00000080 [ 7920.284799] Call Trace: [ 7920.285054] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.285660] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.286244] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.286726] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.287231] [<ffffffffa42291d4>] link_path_walk+0x764/0x8b0 [ 7920.287770] [<ffffffffc094c943>] ? class_handle2object+0xc3/0x1c0 [obdclass] [ 7920.288453] [<ffffffffa422a24a>] path_lookupat+0x7a/0x8b0 [ 7920.288981] [<ffffffffa41f7b9b>] ? kmem_cache_alloc+0x19b/0x1f0 [ 7920.289547] [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.290096] [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0 [ 7920.290624] [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0 [ 7920.291185] [<ffffffffa4125092>] ? from_kgid_munged+0x12/0x20 [ 7920.291740] [<ffffffffa422142f>] ? cp_new_stat+0x14f/0x180 [ 7920.292278] [<ffffffffa422df51>] user_path_at+0x11/0x20 [ 7920.292775] [<ffffffffa4220f23>] vfs_fstatat+0x63/0xc0 [ 7920.293282] [<ffffffffa422148e>] SYSC_newstat+0x2e/0x60 [ 7920.293776] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.294407] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.295029] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.295645] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.296272] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.296895] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.297512] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.298132] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.298746] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.299373] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.300000] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.300616] [<ffffffffa422176e>] SyS_newstat+0xe/0x10 [ 7920.301103] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.301668] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.302299] INFO: task getfattr:1656 blocked for more than 120 seconds. [ 7920.302919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.303653] getfattr D ffff8e6e06b6eeb0 0 1656 15679 0x00000080 [ 7920.304366] Call Trace: [ 7920.304606] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.305221] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.305790] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.306292] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.306784] [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0 [ 7920.307321] [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0 [ 7920.307887] [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.308418] [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0 [ 7920.308946] [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0 [ 7920.309494] [<ffffffffa422df51>] user_path_at+0x11/0x20 [ 7920.309998] [<ffffffffa4243f68>] SyS_listxattr+0x48/0xb0 [ 7920.310506] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.311125] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.311749] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.312320] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.312939] INFO: task chmod:1674 blocked for more than 120 seconds. [ 7920.313533] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.314270] chmod D ffff8e6e02b12f70 0 1674 15675 0x00000080 [ 7920.314990] Call Trace: [ 7920.315238] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.315829] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.316424] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.316921] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.317417] [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0 [ 7920.317955] [<ffffffffa406814e>] ? kvm_clock_get_cycles+0x1e/0x20 [ 7920.318549] [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0 [ 7920.319125] [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.319657] [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0 [ 7920.320204] [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0 [ 7920.320749] [<ffffffffa471b57c>] ? __do_page_fault+0x1bc/0x4f0 [ 7920.321316] [<ffffffffa422df51>] user_path_at+0x11/0x20 [ 7920.321814] [<ffffffffa4219d07>] SyS_fchmodat+0x47/0xc0 [ 7920.322322] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.322942] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.323558] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.324124] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.324743] INFO: task getfattr:1834 blocked for more than 120 seconds. [ 7920.325372] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.326101] getfattr D ffff8e6df7068000 0 1834 15711 0x00000080 [ 7920.326802] Call Trace: [ 7920.327060] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.327661] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.328249] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.328745] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.329254] [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0 [ 7920.329778] [<ffffffffa4242b12>] ? generic_getxattr+0x52/0x70 [ 7920.330338] [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0 [ 7920.331029] [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.331651] [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0 [ 7920.332278] [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0 [ 7920.332952] [<ffffffffa471b57c>] ? __do_page_fault+0x1bc/0x4f0 [ 7920.333913] [<ffffffffa422df51>] user_path_at+0x11/0x20 [ 7920.335027] [<ffffffffa4243d2a>] SyS_getxattr+0x4a/0xc0 [ 7920.336126] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.336980] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.337720] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.338301] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.338928] INFO: task rm:1977 blocked for more than 120 seconds. [ 7920.339503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.340232] rm D ffff8e6e05499fa0 0 1977 15734 0x00000080 [ 7920.340937] Call Trace: [ 7920.341181] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.341784] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.342374] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.342857] [<ffffffffa422d265>] do_rmdir+0x165/0x220 [ 7920.343352] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.343972] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.344586] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.345204] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.345816] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.346444] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.347063] [<ffffffffa422e555>] SyS_unlinkat+0x25/0x40 [ 7920.347565] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.348130] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.348747] INFO: task mv:2144 blocked for more than 120 seconds. [ 7920.349323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.350049] mv D ffff8e6e05498000 0 2144 15700 0x00000080 [ 7920.350756] Call Trace: [ 7920.351005] [<ffffffffa422a2f2>] ? path_lookupat+0x122/0x8b0 [ 7920.351547] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.352148] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.352728] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.353222] [<ffffffffa42276e5>] lock_rename+0x55/0xe0 [ 7920.353713] [<ffffffffa422d81f>] SYSC_renameat2+0x22f/0x5a0 [ 7920.354255] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.354871] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.355489] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.356110] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.356732] [<ffffffffa422ea0e>] SyS_renameat2+0xe/0x10 [ 7920.357247] [<ffffffffa422ea4e>] SyS_rename+0x1e/0x20 [ 7920.357731] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.358303] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.358930] INFO: task ln:2148 blocked for more than 120 seconds. [ 7920.359504] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.360254] ln D ffff8e6df7cbeeb0 0 2148 15720 0x00000080 [ 7920.360973] Call Trace: [ 7920.361221] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.361812] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.362402] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.362896] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.363390] [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0 [ 7920.363924] [<ffffffffa41944fb>] ? unlock_page+0x2b/0x30 [ 7920.364444] [<ffffffffa41c13f1>] ? do_read_fault.isra.60+0x121/0x1a0 [ 7920.365048] [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0 [ 7920.365610] [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.366146] [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0 [ 7920.366671] [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0 [ 7920.367232] [<ffffffffa41c7c3d>] ? handle_mm_fault+0x39d/0x9b0 [ 7920.367777] [<ffffffffa4125092>] ? from_kgid_munged+0x12/0x20 [ 7920.368329] [<ffffffffa422df51>] user_path_at+0x11/0x20 [ 7920.368821] [<ffffffffa4220f23>] vfs_fstatat+0x63/0xc0 [ 7920.369344] [<ffffffffa42214f1>] SYSC_newlstat+0x31/0x60 [ 7920.369857] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.370495] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.371115] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.371737] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.372363] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.372987] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.373604] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.374250] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.374872] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.375495] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.376114] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.376736] [<ffffffffa422177e>] SyS_newlstat+0xe/0x10 [ 7920.377240] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.377806] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 8022.948194] Lustre: lustre-OST0004-osc-ffff8e6e2480f800: Connection restored to 10.9.4.117@tcp (at 10.9.4.117@tcp) [ 8023.080282] LustreError: 4670:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4 [ 8023.080306] LustreError: 4664:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4
From the second client’s (vm10) console log, we see similar calls as in the first client’s call stack
[ 7722.357867] Lustre: DEBUG MARKER: == racer test 1: racer on clients: trevis-10vm10,trevis-10vm9.trevis.whamcloud.com DURATION=300 ====== 12:19:23 (1536495563) [ 7722.541747] Lustre: DEBUG MARKER: DURATION=300 MDSCOUNT=1 OSTCOUNT=7 RACER_ENABLE_REMOTE_DIRS=false RACER_ENABLE_STRIPED_DIRS=false RACER_ENABLE_MIGRATION=false RACER_ENABLE_PFL=true RACER_ENABLE_DOM=true RACER_ENABLE_FLR=true LFS=/usr/bin/lfs /usr/lib64/lustre/t [ 7722.542686] Lustre: DEBUG MARKER: DURATION=300 MDSCOUNT=1 OSTCOUNT=7 RACER_ENABLE_REMOTE_DIRS=false RACER_ENABLE_STRIPED_DIRS=false RACER_ENABLE_MIGRATION=false RACER_ENABLE_PFL=true RACER_ENABLE_DOM=true RACER_ENABLE_FLR=true LFS=/usr/bin/lfs /usr/lib64/lustre/t [ 7726.494344] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200007164:0x60:0x0], use llapi_layout_get_by_path() [ 7920.259874] INFO: task dir_create.sh:14445 blocked for more than 120 seconds. [ 7920.260874] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.261634] dir_create.sh D ffff9bac39240fd0 0 14445 14433 0x00000080 [ 7920.262414] Call Trace: [ 7920.262749] [<ffffffffc0e427e2>] ? ll_dcompare+0x72/0x2e0 [lustre] [ 7920.263384] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.264027] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.264623] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.265151] [<ffffffff88f0bcde>] lookup_slow+0x33/0xa7 [ 7920.265692] [<ffffffff88a2927f>] link_path_walk+0x80f/0x8b0 [ 7920.266253] [<ffffffff88a2b75d>] ? do_last+0x66d/0x12c0 [ 7920.266782] [<ffffffff88a2c465>] path_openat+0xb5/0x640 [ 7920.267314] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.267848] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.268392] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.268943] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.269595] [<ffffffff88a1a44e>] SyS_open+0x1e/0x20 [ 7920.270104] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.270693] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.271383] INFO: task ls:23458 blocked for more than 120 seconds. [ 7920.271998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.272770] ls D ffff9bac1fd82f70 0 23458 14490 0x00000080 [ 7920.273503] Call Trace: [ 7920.273774] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.274406] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.275021] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.275525] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.276048] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.276582] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.277251] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.277784] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.278318] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.278858] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.279515] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.280033] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.280614] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.281265] INFO: task ls:23462 blocked for more than 120 seconds. [ 7920.281868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.282636] ls D ffff9bac1fc13f40 0 23462 14490 0x00000080 [ 7920.283377] Call Trace: [ 7920.283638] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.284279] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.284885] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.285399] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.285922] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.286448] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.287110] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.287631] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.288173] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.288717] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.289361] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.289878] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.290470] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.291141] INFO: task ls:23464 blocked for more than 120 seconds. [ 7920.291746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.292497] ls D ffff9bac20318000 0 23464 14490 0x00000080 [ 7920.293256] Call Trace: [ 7920.293500] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.294150] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.294767] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.295280] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.295790] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.296327] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.296990] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.297510] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.298056] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.298587] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.299256] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.299775] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.300367] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.301015] INFO: task ls:23466 blocked for more than 120 seconds. [ 7920.301612] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.302395] ls D ffff9bac2031af70 0 23466 14490 0x00000080 [ 7920.303136] Call Trace: [ 7920.303397] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.304035] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.304632] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.305155] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.305673] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.306200] [<ffffffff88a2cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.306762] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.307431] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.307963] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.308493] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.309037] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.309683] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.310196] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.310786] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.311447] INFO: task getfattr:1379 blocked for more than 120 seconds. [ 7920.312100] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.312876] getfattr D ffff9bac1d229fa0 0 1379 14538 0x00000080 [ 7920.313611] Call Trace: [ 7920.313884] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.314513] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.315128] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.315631] [<ffffffff88f0bcde>] lookup_slow+0x33/0xa7 [ 7920.316162] [<ffffffff88a2aa08>] path_lookupat+0x838/0x8b0 [ 7920.316722] [<ffffffff88a42b12>] ? generic_getxattr+0x52/0x70 [ 7920.317294] [<ffffffff88a42f6a>] ? vfs_getxattr+0x8a/0xb0 [ 7920.317855] [<ffffffff889f7a35>] ? kmem_cache_alloc+0x35/0x1f0 [ 7920.318436] [<ffffffff88a2cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.319001] [<ffffffff88a2aaab>] filename_lookup+0x2b/0xc0 [ 7920.319544] [<ffffffff88a2dee7>] user_path_at_empty+0x67/0xc0 [ 7920.320131] [<ffffffff88f1b57c>] ? __do_page_fault+0x1bc/0x4f0 [ 7920.320718] [<ffffffff88a2df51>] user_path_at+0x11/0x20 [ 7920.321247] [<ffffffff88a43d2a>] SyS_getxattr+0x4a/0xc0 [ 7920.321777] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.322439] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.323091] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.323689] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.324352] INFO: task ln:2464 blocked for more than 120 seconds. [ 7920.324955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.325711] ln D ffff9bac21380fd0 0 2464 14485 0x00000080 [ 7920.326458] Call Trace: [ 7920.326712] [<ffffffff88a2a2f2>] ? path_lookupat+0x122/0x8b0 [ 7920.327296] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.327932] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.328532] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.329053] [<ffffffff88a2abc5>] filename_create+0x85/0x180 [ 7920.329608] [<ffffffff889f7a35>] ? kmem_cache_alloc+0x35/0x1f0 [ 7920.330201] [<ffffffff88a2cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.330775] [<ffffffff88a2cdc4>] ? getname_flags+0xc4/0x1a0 [ 7920.331338] [<ffffffff88a2d071>] user_path_create+0x41/0x60 [ 7920.331898] [<ffffffff88a2e5e8>] SyS_symlinkat+0x58/0x110 [ 7920.332441] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.333091] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.333755] [<ffffffff88a2e6b6>] SyS_symlink+0x16/0x20 [ 7920.334279] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.334869] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.335512] INFO: task ls:2582 blocked for more than 120 seconds. [ 7920.336114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.336890] ls D ffff9bac1d329fa0 0 2582 14455 0x00000080 [ 7920.337619] Call Trace: [ 7920.337894] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.338519] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.339135] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.339640] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.340160] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.340700] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.341373] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.341902] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.342438] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.342981] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.343618] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.344142] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.344757] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.345402] INFO: task ls:2583 blocked for more than 120 seconds. [ 7920.346003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.346774] ls D ffff9bac1d32dee0 0 2583 14455 0x00000080 [ 7920.347511] Call Trace: [ 7920.347787] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.348417] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.349033] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.349539] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.350059] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.350587] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.351271] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.351805] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.352339] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.352881] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.353524] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.354050] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.354631] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.355301] INFO: task ls:2584 blocked for more than 120 seconds. [ 7920.355903] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.356685] ls D ffff9bac1e540fd0 0 2584 14455 0x00000080 [ 7920.357413] Call Trace: [ 7920.357688] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.358315] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.358927] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.359432] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.359955] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.360484] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.361166] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.361698] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.362231] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.362768] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.363411] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.363931] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.364512] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 8022.975327] LustreError: 4302:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4 [ 8022.975389] LustreError: 4303:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4 [ 8022.975622] LustreError: 4305:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4
Description continued in the first comment due to JIRA description line limits.
Attachments
Issue Links
- is duplicated by
-
LU-12091 DNE/DOM: client evictions -108
- Resolved
-
LU-12121 DOM: BRW to missing obj?
- Resolved
- is related to
-
LU-12037 Possible DNE issue leading to hung filesystem
- Resolved
-
LU-12321 Unlink speed needs to be improved in DoM configuration
- Resolved
-
LU-6758 racer test_1: test failed to respond and timed out
- Closed
-
LU-11751 racer deadlocks due to DOM glimpse request
- Closed
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...