Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.12.0, Lustre 2.10.7, Lustre 2.12.1
-
3
-
9223372036854775807
Description
Looking at the logs for https://testing.whamcloud.com/test_sets/ada6008e-b433-11e8-b86b-52540065bddc, we see that racer test_1 hangs in in ls, dir_create.sh. On both clients, we see several call to dir_create.sh, multiple ls calls, getfattr, ln and others blocked/hung.
From the first client’s (vm9) console log, we see many calls hung ending with “lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4”
[ 7722.701554] Lustre: DEBUG MARKER: DURATION=300 MDSCOUNT=1 OSTCOUNT=7 RACER_ENABLE_REMOTE_DIRS=false RACER_ENABLE_STRIPED_DIRS=false RACER_ENABLE_MIGRATION=false RACER_ENABLE_PFL=true RACER_ENABLE_DOM=true RACER_ENABLE_FLR=true LFS=/usr/bin/lfs /usr/lib64/lustre/t [ 7722.706903] Lustre: DEBUG MARKER: DURATION=300 MDSCOUNT=1 OSTCOUNT=7 RACER_ENABLE_REMOTE_DIRS=false RACER_ENABLE_STRIPED_DIRS=false RACER_ENABLE_MIGRATION=false RACER_ENABLE_PFL=true RACER_ENABLE_DOM=true RACER_ENABLE_FLR=true LFS=/usr/bin/lfs /usr/lib64/lustre/t [ 7726.780425] 3[19667]: segfault at 0 ip 0000000000403e5f sp 00007fff05523690 error 6 in 3 (deleted)[400000+6000] [ 7738.946240] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200007163:0x167:0x0], use llapi_layout_get_by_path() [ 7858.011870] Lustre: lustre-OST0004-osc-ffff8e6e2480f800: Connection to lustre-OST0004 (at 10.9.4.117@tcp) was lost; in progress operations using this service will wait for recovery to complete [ 7858.015216] LustreError: 167-0: lustre-OST0004-osc-ffff8e6e2480f800: This client was evicted by lustre-OST0004; in progress operations using this service will fail. [ 7858.019049] LustreError: 1426:0:(ldlm_resource.c:1146:ldlm_resource_complain()) lustre-OST0004-osc-ffff8e6e2480f800: namespace resource [0x832e:0x0:0x0].0x0 (ffff8e6e0ce026c0) refcount nonzero (2) after lock cleanup; forcing cleanup. [ 7920.249204] INFO: task dir_create.sh:15717 blocked for more than 120 seconds. [ 7920.250063] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.250808] dir_create.sh D ffff8e6e02b14f10 0 15717 15658 0x00000080 [ 7920.251566] Call Trace: [ 7920.251882] [<ffffffffc0db47e2>] ? ll_dcompare+0x72/0x2e0 [lustre] [ 7920.252506] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.253246] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.253901] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.254400] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.254914] [<ffffffffa422927f>] link_path_walk+0x80f/0x8b0 [ 7920.255597] [<ffffffffa422b75d>] ? do_last+0x66d/0x12c0 [ 7920.256108] [<ffffffffa422c465>] path_openat+0xb5/0x640 [ 7920.256614] [<ffffffffa422e01d>] do_filp_open+0x4d/0xb0 [ 7920.257118] [<ffffffffa423b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.257637] [<ffffffffa421a327>] do_sys_open+0x137/0x240 [ 7920.258159] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.258783] [<ffffffffa421a44e>] SyS_open+0x1e/0x20 [ 7920.259260] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.259819] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.260552] INFO: task cp:32379 blocked for more than 120 seconds. [ 7920.261136] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.261877] cp D ffff8e6e2611af70 0 32379 15806 0x00000080 [ 7920.262581] Call Trace: [ 7920.262857] [<ffffffffc0df4cc5>] ? ll_i2suppgid+0x15/0x40 [lustre] [ 7920.263466] [<ffffffffc0df4d14>] ? ll_i2gids+0x24/0xb0 [lustre] [ 7920.264031] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.264635] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.265222] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.265711] [<ffffffffc0de785c>] ll_setattr_raw+0xb6c/0xee0 [lustre] [ 7920.266332] [<ffffffffc0de7c3c>] ll_setattr+0x6c/0xd0 [lustre] [ 7920.266897] [<ffffffffa4239f14>] notify_change+0x2c4/0x420 [ 7920.267428] [<ffffffffa424fab9>] utimes_common+0xd9/0x1c0 [ 7920.267952] [<ffffffffa424fcde>] do_utimes+0x13e/0x180 [ 7920.268452] [<ffffffffa424fe24>] SyS_utimensat+0x64/0xb0 [ 7920.268966] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.269585] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.270156] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.270775] INFO: task cp:434 blocked for more than 120 seconds. [ 7920.271344] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.272068] cp D ffff8e6df7cb8000 0 434 15776 0x00000080 [ 7920.272778] Call Trace: [ 7920.273028] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.273636] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.274225] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.274721] [<ffffffffc0e14bc3>] vvp_io_write_start+0x603/0x810 [lustre] [ 7920.275506] [<ffffffffc0978377>] ? cl_lock_request+0x67/0x1f0 [obdclass] [ 7920.276191] [<ffffffffc097a228>] cl_io_start+0x68/0x130 [obdclass] [ 7920.276803] [<ffffffffc097c611>] cl_io_loop+0x131/0xc70 [obdclass] [ 7920.277417] [<ffffffffc0dc9db2>] ll_file_io_generic+0x4e2/0xd10 [lustre] [ 7920.278064] [<ffffffffc0dcab32>] ll_file_aio_write+0x372/0x540 [lustre] [ 7920.278699] [<ffffffffc0dcada4>] ll_file_write+0xa4/0x170 [lustre] [ 7920.279300] [<ffffffffa421b490>] vfs_write+0xc0/0x1f0 [ 7920.279776] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.280408] [<ffffffffa421c2bf>] SyS_write+0x7f/0xf0 [ 7920.280894] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.281512] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.282096] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.282719] INFO: task getfattr:1159 blocked for more than 120 seconds. [ 7920.283347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.284090] getfattr D ffff8e6df815af70 0 1159 15746 0x00000080 [ 7920.284799] Call Trace: [ 7920.285054] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.285660] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.286244] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.286726] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.287231] [<ffffffffa42291d4>] link_path_walk+0x764/0x8b0 [ 7920.287770] [<ffffffffc094c943>] ? class_handle2object+0xc3/0x1c0 [obdclass] [ 7920.288453] [<ffffffffa422a24a>] path_lookupat+0x7a/0x8b0 [ 7920.288981] [<ffffffffa41f7b9b>] ? kmem_cache_alloc+0x19b/0x1f0 [ 7920.289547] [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.290096] [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0 [ 7920.290624] [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0 [ 7920.291185] [<ffffffffa4125092>] ? from_kgid_munged+0x12/0x20 [ 7920.291740] [<ffffffffa422142f>] ? cp_new_stat+0x14f/0x180 [ 7920.292278] [<ffffffffa422df51>] user_path_at+0x11/0x20 [ 7920.292775] [<ffffffffa4220f23>] vfs_fstatat+0x63/0xc0 [ 7920.293282] [<ffffffffa422148e>] SYSC_newstat+0x2e/0x60 [ 7920.293776] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.294407] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.295029] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.295645] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.296272] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.296895] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.297512] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.298132] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.298746] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.299373] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.300000] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.300616] [<ffffffffa422176e>] SyS_newstat+0xe/0x10 [ 7920.301103] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.301668] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.302299] INFO: task getfattr:1656 blocked for more than 120 seconds. [ 7920.302919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.303653] getfattr D ffff8e6e06b6eeb0 0 1656 15679 0x00000080 [ 7920.304366] Call Trace: [ 7920.304606] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.305221] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.305790] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.306292] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.306784] [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0 [ 7920.307321] [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0 [ 7920.307887] [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.308418] [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0 [ 7920.308946] [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0 [ 7920.309494] [<ffffffffa422df51>] user_path_at+0x11/0x20 [ 7920.309998] [<ffffffffa4243f68>] SyS_listxattr+0x48/0xb0 [ 7920.310506] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.311125] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.311749] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.312320] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.312939] INFO: task chmod:1674 blocked for more than 120 seconds. [ 7920.313533] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.314270] chmod D ffff8e6e02b12f70 0 1674 15675 0x00000080 [ 7920.314990] Call Trace: [ 7920.315238] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.315829] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.316424] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.316921] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.317417] [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0 [ 7920.317955] [<ffffffffa406814e>] ? kvm_clock_get_cycles+0x1e/0x20 [ 7920.318549] [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0 [ 7920.319125] [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.319657] [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0 [ 7920.320204] [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0 [ 7920.320749] [<ffffffffa471b57c>] ? __do_page_fault+0x1bc/0x4f0 [ 7920.321316] [<ffffffffa422df51>] user_path_at+0x11/0x20 [ 7920.321814] [<ffffffffa4219d07>] SyS_fchmodat+0x47/0xc0 [ 7920.322322] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.322942] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.323558] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.324124] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.324743] INFO: task getfattr:1834 blocked for more than 120 seconds. [ 7920.325372] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.326101] getfattr D ffff8e6df7068000 0 1834 15711 0x00000080 [ 7920.326802] Call Trace: [ 7920.327060] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.327661] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.328249] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.328745] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.329254] [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0 [ 7920.329778] [<ffffffffa4242b12>] ? generic_getxattr+0x52/0x70 [ 7920.330338] [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0 [ 7920.331029] [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.331651] [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0 [ 7920.332278] [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0 [ 7920.332952] [<ffffffffa471b57c>] ? __do_page_fault+0x1bc/0x4f0 [ 7920.333913] [<ffffffffa422df51>] user_path_at+0x11/0x20 [ 7920.335027] [<ffffffffa4243d2a>] SyS_getxattr+0x4a/0xc0 [ 7920.336126] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.336980] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.337720] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.338301] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.338928] INFO: task rm:1977 blocked for more than 120 seconds. [ 7920.339503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.340232] rm D ffff8e6e05499fa0 0 1977 15734 0x00000080 [ 7920.340937] Call Trace: [ 7920.341181] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.341784] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.342374] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.342857] [<ffffffffa422d265>] do_rmdir+0x165/0x220 [ 7920.343352] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.343972] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.344586] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.345204] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.345816] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.346444] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.347063] [<ffffffffa422e555>] SyS_unlinkat+0x25/0x40 [ 7920.347565] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.348130] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.348747] INFO: task mv:2144 blocked for more than 120 seconds. [ 7920.349323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.350049] mv D ffff8e6e05498000 0 2144 15700 0x00000080 [ 7920.350756] Call Trace: [ 7920.351005] [<ffffffffa422a2f2>] ? path_lookupat+0x122/0x8b0 [ 7920.351547] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.352148] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.352728] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.353222] [<ffffffffa42276e5>] lock_rename+0x55/0xe0 [ 7920.353713] [<ffffffffa422d81f>] SYSC_renameat2+0x22f/0x5a0 [ 7920.354255] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.354871] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.355489] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.356110] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.356732] [<ffffffffa422ea0e>] SyS_renameat2+0xe/0x10 [ 7920.357247] [<ffffffffa422ea4e>] SyS_rename+0x1e/0x20 [ 7920.357731] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.358303] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.358930] INFO: task ln:2148 blocked for more than 120 seconds. [ 7920.359504] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.360254] ln D ffff8e6df7cbeeb0 0 2148 15720 0x00000080 [ 7920.360973] Call Trace: [ 7920.361221] [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.361812] [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.362402] [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f [ 7920.362896] [<ffffffffa470bcde>] lookup_slow+0x33/0xa7 [ 7920.363390] [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0 [ 7920.363924] [<ffffffffa41944fb>] ? unlock_page+0x2b/0x30 [ 7920.364444] [<ffffffffa41c13f1>] ? do_read_fault.isra.60+0x121/0x1a0 [ 7920.365048] [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0 [ 7920.365610] [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.366146] [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0 [ 7920.366671] [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0 [ 7920.367232] [<ffffffffa41c7c3d>] ? handle_mm_fault+0x39d/0x9b0 [ 7920.367777] [<ffffffffa4125092>] ? from_kgid_munged+0x12/0x20 [ 7920.368329] [<ffffffffa422df51>] user_path_at+0x11/0x20 [ 7920.368821] [<ffffffffa4220f23>] vfs_fstatat+0x63/0xc0 [ 7920.369344] [<ffffffffa42214f1>] SYSC_newlstat+0x31/0x60 [ 7920.369857] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.370495] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.371115] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.371737] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.372363] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.372987] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.373604] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.374250] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.374872] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.375495] [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.376114] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.376736] [<ffffffffa422177e>] SyS_newlstat+0xe/0x10 [ 7920.377240] [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21 [ 7920.377806] [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146 [ 8022.948194] Lustre: lustre-OST0004-osc-ffff8e6e2480f800: Connection restored to 10.9.4.117@tcp (at 10.9.4.117@tcp) [ 8023.080282] LustreError: 4670:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4 [ 8023.080306] LustreError: 4664:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4
From the second client’s (vm10) console log, we see similar calls as in the first client’s call stack
[ 7722.357867] Lustre: DEBUG MARKER: == racer test 1: racer on clients: trevis-10vm10,trevis-10vm9.trevis.whamcloud.com DURATION=300 ====== 12:19:23 (1536495563) [ 7722.541747] Lustre: DEBUG MARKER: DURATION=300 MDSCOUNT=1 OSTCOUNT=7 RACER_ENABLE_REMOTE_DIRS=false RACER_ENABLE_STRIPED_DIRS=false RACER_ENABLE_MIGRATION=false RACER_ENABLE_PFL=true RACER_ENABLE_DOM=true RACER_ENABLE_FLR=true LFS=/usr/bin/lfs /usr/lib64/lustre/t [ 7722.542686] Lustre: DEBUG MARKER: DURATION=300 MDSCOUNT=1 OSTCOUNT=7 RACER_ENABLE_REMOTE_DIRS=false RACER_ENABLE_STRIPED_DIRS=false RACER_ENABLE_MIGRATION=false RACER_ENABLE_PFL=true RACER_ENABLE_DOM=true RACER_ENABLE_FLR=true LFS=/usr/bin/lfs /usr/lib64/lustre/t [ 7726.494344] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200007164:0x60:0x0], use llapi_layout_get_by_path() [ 7920.259874] INFO: task dir_create.sh:14445 blocked for more than 120 seconds. [ 7920.260874] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.261634] dir_create.sh D ffff9bac39240fd0 0 14445 14433 0x00000080 [ 7920.262414] Call Trace: [ 7920.262749] [<ffffffffc0e427e2>] ? ll_dcompare+0x72/0x2e0 [lustre] [ 7920.263384] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.264027] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.264623] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.265151] [<ffffffff88f0bcde>] lookup_slow+0x33/0xa7 [ 7920.265692] [<ffffffff88a2927f>] link_path_walk+0x80f/0x8b0 [ 7920.266253] [<ffffffff88a2b75d>] ? do_last+0x66d/0x12c0 [ 7920.266782] [<ffffffff88a2c465>] path_openat+0xb5/0x640 [ 7920.267314] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.267848] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.268392] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.268943] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.269595] [<ffffffff88a1a44e>] SyS_open+0x1e/0x20 [ 7920.270104] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.270693] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.271383] INFO: task ls:23458 blocked for more than 120 seconds. [ 7920.271998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.272770] ls D ffff9bac1fd82f70 0 23458 14490 0x00000080 [ 7920.273503] Call Trace: [ 7920.273774] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.274406] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.275021] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.275525] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.276048] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.276582] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.277251] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.277784] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.278318] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.278858] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.279515] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.280033] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.280614] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.281265] INFO: task ls:23462 blocked for more than 120 seconds. [ 7920.281868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.282636] ls D ffff9bac1fc13f40 0 23462 14490 0x00000080 [ 7920.283377] Call Trace: [ 7920.283638] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.284279] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.284885] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.285399] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.285922] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.286448] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.287110] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.287631] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.288173] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.288717] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.289361] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.289878] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.290470] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.291141] INFO: task ls:23464 blocked for more than 120 seconds. [ 7920.291746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.292497] ls D ffff9bac20318000 0 23464 14490 0x00000080 [ 7920.293256] Call Trace: [ 7920.293500] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.294150] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.294767] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.295280] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.295790] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.296327] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.296990] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.297510] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.298056] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.298587] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.299256] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.299775] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.300367] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.301015] INFO: task ls:23466 blocked for more than 120 seconds. [ 7920.301612] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.302395] ls D ffff9bac2031af70 0 23466 14490 0x00000080 [ 7920.303136] Call Trace: [ 7920.303397] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.304035] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.304632] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.305155] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.305673] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.306200] [<ffffffff88a2cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.306762] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.307431] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.307963] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.308493] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.309037] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.309683] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.310196] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.310786] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.311447] INFO: task getfattr:1379 blocked for more than 120 seconds. [ 7920.312100] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.312876] getfattr D ffff9bac1d229fa0 0 1379 14538 0x00000080 [ 7920.313611] Call Trace: [ 7920.313884] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.314513] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.315128] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.315631] [<ffffffff88f0bcde>] lookup_slow+0x33/0xa7 [ 7920.316162] [<ffffffff88a2aa08>] path_lookupat+0x838/0x8b0 [ 7920.316722] [<ffffffff88a42b12>] ? generic_getxattr+0x52/0x70 [ 7920.317294] [<ffffffff88a42f6a>] ? vfs_getxattr+0x8a/0xb0 [ 7920.317855] [<ffffffff889f7a35>] ? kmem_cache_alloc+0x35/0x1f0 [ 7920.318436] [<ffffffff88a2cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.319001] [<ffffffff88a2aaab>] filename_lookup+0x2b/0xc0 [ 7920.319544] [<ffffffff88a2dee7>] user_path_at_empty+0x67/0xc0 [ 7920.320131] [<ffffffff88f1b57c>] ? __do_page_fault+0x1bc/0x4f0 [ 7920.320718] [<ffffffff88a2df51>] user_path_at+0x11/0x20 [ 7920.321247] [<ffffffff88a43d2a>] SyS_getxattr+0x4a/0xc0 [ 7920.321777] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.322439] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.323091] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.323689] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.324352] INFO: task ln:2464 blocked for more than 120 seconds. [ 7920.324955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.325711] ln D ffff9bac21380fd0 0 2464 14485 0x00000080 [ 7920.326458] Call Trace: [ 7920.326712] [<ffffffff88a2a2f2>] ? path_lookupat+0x122/0x8b0 [ 7920.327296] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.327932] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.328532] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.329053] [<ffffffff88a2abc5>] filename_create+0x85/0x180 [ 7920.329608] [<ffffffff889f7a35>] ? kmem_cache_alloc+0x35/0x1f0 [ 7920.330201] [<ffffffff88a2cd4f>] ? getname_flags+0x4f/0x1a0 [ 7920.330775] [<ffffffff88a2cdc4>] ? getname_flags+0xc4/0x1a0 [ 7920.331338] [<ffffffff88a2d071>] user_path_create+0x41/0x60 [ 7920.331898] [<ffffffff88a2e5e8>] SyS_symlinkat+0x58/0x110 [ 7920.332441] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.333091] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.333755] [<ffffffff88a2e6b6>] SyS_symlink+0x16/0x20 [ 7920.334279] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.334869] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.335512] INFO: task ls:2582 blocked for more than 120 seconds. [ 7920.336114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.336890] ls D ffff9bac1d329fa0 0 2582 14455 0x00000080 [ 7920.337619] Call Trace: [ 7920.337894] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.338519] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.339135] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.339640] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.340160] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.340700] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.341373] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.341902] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.342438] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.342981] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.343618] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.344142] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.344757] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.345402] INFO: task ls:2583 blocked for more than 120 seconds. [ 7920.346003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.346774] ls D ffff9bac1d32dee0 0 2583 14455 0x00000080 [ 7920.347511] Call Trace: [ 7920.347787] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.348417] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.349033] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.349539] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.350059] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.350587] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.351271] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.351805] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.352339] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.352881] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.353524] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.354050] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.354631] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 7920.355301] INFO: task ls:2584 blocked for more than 120 seconds. [ 7920.355903] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7920.356685] ls D ffff9bac1e540fd0 0 2584 14455 0x00000080 [ 7920.357413] Call Trace: [ 7920.357688] [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70 [ 7920.358315] [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0 [ 7920.358927] [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f [ 7920.359432] [<ffffffff88a2b37f>] do_last+0x28f/0x12c0 [ 7920.359955] [<ffffffff88a2c487>] path_openat+0xd7/0x640 [ 7920.360484] [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre] [ 7920.361166] [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0 [ 7920.361698] [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170 [ 7920.362231] [<ffffffff88a1a327>] do_sys_open+0x137/0x240 [ 7920.362768] [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146 [ 7920.363411] [<ffffffff88a1a464>] SyS_openat+0x14/0x20 [ 7920.363931] [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21 [ 7920.364512] [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146 [ 8022.975327] LustreError: 4302:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4 [ 8022.975389] LustreError: 4303:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4 [ 8022.975622] LustreError: 4305:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4
Description continued in the first comment due to JIRA description line limits.
Attachments
Issue Links
- is duplicated by
-
LU-12091 DNE/DOM: client evictions -108
-
- Resolved
-
-
LU-12121 DOM: BRW to missing obj?
-
- Resolved
-
- is related to
-
LU-12037 Possible DNE issue leading to hung filesystem
-
- Resolved
-
-
LU-12321 Unlink speed needs to be improved in DoM configuration
-
- Resolved
-
-
LU-6758 racer test_1: test failed to respond and timed out
-
- Closed
-
-
LU-11751 racer deadlocks due to DOM glimpse request
-
- Closed
-
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
Latest patchset of https://review.whamcloud.com/34071/ introduced the use of lu_env_find (due to aa82cc83
LU-12034obdclass: put all service's env on the list ) and thus it doesn't compile on b2_12 anymore. I wanted to rebase our branch against 2.12.2 and also use the patch that landed master (patchset 20) instead of patchset 12 that we are currently using in production. What is your recommendation? Or could you please backport it to b2_12? Thanks much!