Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11359

racer test 1 times out with client hung in dir_create.sh, ls, … and MDS in ldlm_completion_ast()

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.13.0, Lustre 2.12.3
    • Lustre 2.12.0, Lustre 2.10.7, Lustre 2.12.1
    • 3
    • 9223372036854775807

    Description

      Looking at the logs for https://testing.whamcloud.com/test_sets/ada6008e-b433-11e8-b86b-52540065bddc, we see that racer test_1 hangs in in ls, dir_create.sh. On both clients, we see several call to dir_create.sh, multiple ls calls, getfattr, ln and others blocked/hung.

      From the first client’s (vm9) console log, we see many calls hung ending with “lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4”

      [ 7722.701554] Lustre: DEBUG MARKER: DURATION=300 			MDSCOUNT=1 OSTCOUNT=7			RACER_ENABLE_REMOTE_DIRS=false 			RACER_ENABLE_STRIPED_DIRS=false 			RACER_ENABLE_MIGRATION=false 			RACER_ENABLE_PFL=true 			RACER_ENABLE_DOM=true 			RACER_ENABLE_FLR=true 			LFS=/usr/bin/lfs 			/usr/lib64/lustre/t
      [ 7722.706903] Lustre: DEBUG MARKER: DURATION=300 			MDSCOUNT=1 OSTCOUNT=7			RACER_ENABLE_REMOTE_DIRS=false 			RACER_ENABLE_STRIPED_DIRS=false 			RACER_ENABLE_MIGRATION=false 			RACER_ENABLE_PFL=true 			RACER_ENABLE_DOM=true 			RACER_ENABLE_FLR=true 			LFS=/usr/bin/lfs 			/usr/lib64/lustre/t
      [ 7726.780425] 3[19667]: segfault at 0 ip 0000000000403e5f sp 00007fff05523690 error 6 in 3 (deleted)[400000+6000]
      [ 7738.946240] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200007163:0x167:0x0], use llapi_layout_get_by_path()
      [ 7858.011870] Lustre: lustre-OST0004-osc-ffff8e6e2480f800: Connection to lustre-OST0004 (at 10.9.4.117@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [ 7858.015216] LustreError: 167-0: lustre-OST0004-osc-ffff8e6e2480f800: This client was evicted by lustre-OST0004; in progress operations using this service will fail.
      [ 7858.019049] LustreError: 1426:0:(ldlm_resource.c:1146:ldlm_resource_complain()) lustre-OST0004-osc-ffff8e6e2480f800: namespace resource [0x832e:0x0:0x0].0x0 (ffff8e6e0ce026c0) refcount nonzero (2) after lock cleanup; forcing cleanup.
      [ 7920.249204] INFO: task dir_create.sh:15717 blocked for more than 120 seconds.
      [ 7920.250063] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.250808] dir_create.sh   D ffff8e6e02b14f10     0 15717  15658 0x00000080
      [ 7920.251566] Call Trace:
      [ 7920.251882]  [<ffffffffc0db47e2>] ? ll_dcompare+0x72/0x2e0 [lustre]
      [ 7920.252506]  [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.253246]  [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.253901]  [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f
      [ 7920.254400]  [<ffffffffa470bcde>] lookup_slow+0x33/0xa7
      [ 7920.254914]  [<ffffffffa422927f>] link_path_walk+0x80f/0x8b0
      [ 7920.255597]  [<ffffffffa422b75d>] ? do_last+0x66d/0x12c0
      [ 7920.256108]  [<ffffffffa422c465>] path_openat+0xb5/0x640
      [ 7920.256614]  [<ffffffffa422e01d>] do_filp_open+0x4d/0xb0
      [ 7920.257118]  [<ffffffffa423b4c4>] ? __alloc_fd+0xc4/0x170
      [ 7920.257637]  [<ffffffffa421a327>] do_sys_open+0x137/0x240
      [ 7920.258159]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.258783]  [<ffffffffa421a44e>] SyS_open+0x1e/0x20
      [ 7920.259260]  [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21
      [ 7920.259819]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.260552] INFO: task cp:32379 blocked for more than 120 seconds.
      [ 7920.261136] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.261877] cp              D ffff8e6e2611af70     0 32379  15806 0x00000080
      [ 7920.262581] Call Trace:
      [ 7920.262857]  [<ffffffffc0df4cc5>] ? ll_i2suppgid+0x15/0x40 [lustre]
      [ 7920.263466]  [<ffffffffc0df4d14>] ? ll_i2gids+0x24/0xb0 [lustre]
      [ 7920.264031]  [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.264635]  [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.265222]  [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f
      [ 7920.265711]  [<ffffffffc0de785c>] ll_setattr_raw+0xb6c/0xee0 [lustre]
      [ 7920.266332]  [<ffffffffc0de7c3c>] ll_setattr+0x6c/0xd0 [lustre]
      [ 7920.266897]  [<ffffffffa4239f14>] notify_change+0x2c4/0x420
      [ 7920.267428]  [<ffffffffa424fab9>] utimes_common+0xd9/0x1c0
      [ 7920.267952]  [<ffffffffa424fcde>] do_utimes+0x13e/0x180
      [ 7920.268452]  [<ffffffffa424fe24>] SyS_utimensat+0x64/0xb0
      [ 7920.268966]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.269585]  [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21
      [ 7920.270156]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.270775] INFO: task cp:434 blocked for more than 120 seconds.
      [ 7920.271344] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.272068] cp              D ffff8e6df7cb8000     0   434  15776 0x00000080
      [ 7920.272778] Call Trace:
      [ 7920.273028]  [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.273636]  [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.274225]  [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f
      [ 7920.274721]  [<ffffffffc0e14bc3>] vvp_io_write_start+0x603/0x810 [lustre]
      [ 7920.275506]  [<ffffffffc0978377>] ? cl_lock_request+0x67/0x1f0 [obdclass]
      [ 7920.276191]  [<ffffffffc097a228>] cl_io_start+0x68/0x130 [obdclass]
      [ 7920.276803]  [<ffffffffc097c611>] cl_io_loop+0x131/0xc70 [obdclass]
      [ 7920.277417]  [<ffffffffc0dc9db2>] ll_file_io_generic+0x4e2/0xd10 [lustre]
      [ 7920.278064]  [<ffffffffc0dcab32>] ll_file_aio_write+0x372/0x540 [lustre]
      [ 7920.278699]  [<ffffffffc0dcada4>] ll_file_write+0xa4/0x170 [lustre]
      [ 7920.279300]  [<ffffffffa421b490>] vfs_write+0xc0/0x1f0
      [ 7920.279776]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.280408]  [<ffffffffa421c2bf>] SyS_write+0x7f/0xf0
      [ 7920.280894]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.281512]  [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21
      [ 7920.282096]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.282719] INFO: task getfattr:1159 blocked for more than 120 seconds.
      [ 7920.283347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.284090] getfattr        D ffff8e6df815af70     0  1159  15746 0x00000080
      [ 7920.284799] Call Trace:
      [ 7920.285054]  [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.285660]  [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.286244]  [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f
      [ 7920.286726]  [<ffffffffa470bcde>] lookup_slow+0x33/0xa7
      [ 7920.287231]  [<ffffffffa42291d4>] link_path_walk+0x764/0x8b0
      [ 7920.287770]  [<ffffffffc094c943>] ? class_handle2object+0xc3/0x1c0 [obdclass]
      [ 7920.288453]  [<ffffffffa422a24a>] path_lookupat+0x7a/0x8b0
      [ 7920.288981]  [<ffffffffa41f7b9b>] ? kmem_cache_alloc+0x19b/0x1f0
      [ 7920.289547]  [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0
      [ 7920.290096]  [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0
      [ 7920.290624]  [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0
      [ 7920.291185]  [<ffffffffa4125092>] ? from_kgid_munged+0x12/0x20
      [ 7920.291740]  [<ffffffffa422142f>] ? cp_new_stat+0x14f/0x180
      [ 7920.292278]  [<ffffffffa422df51>] user_path_at+0x11/0x20
      [ 7920.292775]  [<ffffffffa4220f23>] vfs_fstatat+0x63/0xc0
      [ 7920.293282]  [<ffffffffa422148e>] SYSC_newstat+0x2e/0x60
      [ 7920.293776]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.294407]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.295029]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.295645]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.296272]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.296895]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.297512]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.298132]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.298746]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.299373]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.300000]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.300616]  [<ffffffffa422176e>] SyS_newstat+0xe/0x10
      [ 7920.301103]  [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21
      [ 7920.301668]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.302299] INFO: task getfattr:1656 blocked for more than 120 seconds.
      [ 7920.302919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.303653] getfattr        D ffff8e6e06b6eeb0     0  1656  15679 0x00000080
      [ 7920.304366] Call Trace:
      [ 7920.304606]  [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.305221]  [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.305790]  [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f
      [ 7920.306292]  [<ffffffffa470bcde>] lookup_slow+0x33/0xa7
      [ 7920.306784]  [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0
      [ 7920.307321]  [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0
      [ 7920.307887]  [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0
      [ 7920.308418]  [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0
      [ 7920.308946]  [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0
      [ 7920.309494]  [<ffffffffa422df51>] user_path_at+0x11/0x20
      [ 7920.309998]  [<ffffffffa4243f68>] SyS_listxattr+0x48/0xb0
      [ 7920.310506]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.311125]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.311749]  [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21
      [ 7920.312320]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.312939] INFO: task chmod:1674 blocked for more than 120 seconds.
      [ 7920.313533] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.314270] chmod           D ffff8e6e02b12f70     0  1674  15675 0x00000080
      [ 7920.314990] Call Trace:
      [ 7920.315238]  [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.315829]  [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.316424]  [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f
      [ 7920.316921]  [<ffffffffa470bcde>] lookup_slow+0x33/0xa7
      [ 7920.317417]  [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0
      [ 7920.317955]  [<ffffffffa406814e>] ? kvm_clock_get_cycles+0x1e/0x20
      [ 7920.318549]  [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0
      [ 7920.319125]  [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0
      [ 7920.319657]  [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0
      [ 7920.320204]  [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0
      [ 7920.320749]  [<ffffffffa471b57c>] ? __do_page_fault+0x1bc/0x4f0
      [ 7920.321316]  [<ffffffffa422df51>] user_path_at+0x11/0x20
      [ 7920.321814]  [<ffffffffa4219d07>] SyS_fchmodat+0x47/0xc0
      [ 7920.322322]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.322942]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.323558]  [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21
      [ 7920.324124]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.324743] INFO: task getfattr:1834 blocked for more than 120 seconds.
      [ 7920.325372] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.326101] getfattr        D ffff8e6df7068000     0  1834  15711 0x00000080
      [ 7920.326802] Call Trace:
      [ 7920.327060]  [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.327661]  [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.328249]  [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f
      [ 7920.328745]  [<ffffffffa470bcde>] lookup_slow+0x33/0xa7
      [ 7920.329254]  [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0
      [ 7920.329778]  [<ffffffffa4242b12>] ? generic_getxattr+0x52/0x70
      [ 7920.330338]  [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0
      [ 7920.331029]  [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0
      [ 7920.331651]  [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0
      [ 7920.332278]  [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0
      [ 7920.332952]  [<ffffffffa471b57c>] ? __do_page_fault+0x1bc/0x4f0
      [ 7920.333913]  [<ffffffffa422df51>] user_path_at+0x11/0x20
      [ 7920.335027]  [<ffffffffa4243d2a>] SyS_getxattr+0x4a/0xc0
      [ 7920.336126]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.336980]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.337720]  [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21
      [ 7920.338301]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.338928] INFO: task rm:1977 blocked for more than 120 seconds.
      [ 7920.339503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.340232] rm              D ffff8e6e05499fa0     0  1977  15734 0x00000080
      [ 7920.340937] Call Trace:
      [ 7920.341181]  [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.341784]  [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.342374]  [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f
      [ 7920.342857]  [<ffffffffa422d265>] do_rmdir+0x165/0x220
      [ 7920.343352]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.343972]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.344586]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.345204]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.345816]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.346444]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.347063]  [<ffffffffa422e555>] SyS_unlinkat+0x25/0x40
      [ 7920.347565]  [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21
      [ 7920.348130]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.348747] INFO: task mv:2144 blocked for more than 120 seconds.
      [ 7920.349323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.350049] mv              D ffff8e6e05498000     0  2144  15700 0x00000080
      [ 7920.350756] Call Trace:
      [ 7920.351005]  [<ffffffffa422a2f2>] ? path_lookupat+0x122/0x8b0
      [ 7920.351547]  [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.352148]  [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.352728]  [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f
      [ 7920.353222]  [<ffffffffa42276e5>] lock_rename+0x55/0xe0
      [ 7920.353713]  [<ffffffffa422d81f>] SYSC_renameat2+0x22f/0x5a0
      [ 7920.354255]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.354871]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.355489]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.356110]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.356732]  [<ffffffffa422ea0e>] SyS_renameat2+0xe/0x10
      [ 7920.357247]  [<ffffffffa422ea4e>] SyS_rename+0x1e/0x20
      [ 7920.357731]  [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21
      [ 7920.358303]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.358930] INFO: task ln:2148 blocked for more than 120 seconds.
      [ 7920.359504] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.360254] ln              D ffff8e6df7cbeeb0     0  2148  15720 0x00000080
      [ 7920.360973] Call Trace:
      [ 7920.361221]  [<ffffffffa4714f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.361812]  [<ffffffffa4712d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.362402]  [<ffffffffa47120ef>] mutex_lock+0x1f/0x2f
      [ 7920.362896]  [<ffffffffa470bcde>] lookup_slow+0x33/0xa7
      [ 7920.363390]  [<ffffffffa422aa08>] path_lookupat+0x838/0x8b0
      [ 7920.363924]  [<ffffffffa41944fb>] ? unlock_page+0x2b/0x30
      [ 7920.364444]  [<ffffffffa41c13f1>] ? do_read_fault.isra.60+0x121/0x1a0
      [ 7920.365048]  [<ffffffffa41f7b79>] ? kmem_cache_alloc+0x179/0x1f0
      [ 7920.365610]  [<ffffffffa422cd4f>] ? getname_flags+0x4f/0x1a0
      [ 7920.366146]  [<ffffffffa422aaab>] filename_lookup+0x2b/0xc0
      [ 7920.366671]  [<ffffffffa422dee7>] user_path_at_empty+0x67/0xc0
      [ 7920.367232]  [<ffffffffa41c7c3d>] ? handle_mm_fault+0x39d/0x9b0
      [ 7920.367777]  [<ffffffffa4125092>] ? from_kgid_munged+0x12/0x20
      [ 7920.368329]  [<ffffffffa422df51>] user_path_at+0x11/0x20
      [ 7920.368821]  [<ffffffffa4220f23>] vfs_fstatat+0x63/0xc0
      [ 7920.369344]  [<ffffffffa42214f1>] SYSC_newlstat+0x31/0x60
      [ 7920.369857]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.370495]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.371115]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.371737]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.372363]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.372987]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.373604]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.374250]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.374872]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.375495]  [<ffffffffa47206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.376114]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.376736]  [<ffffffffa422177e>] SyS_newlstat+0xe/0x10
      [ 7920.377240]  [<ffffffffa4720795>] system_call_fastpath+0x1c/0x21
      [ 7920.377806]  [<ffffffffa47206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 8022.948194] Lustre: lustre-OST0004-osc-ffff8e6e2480f800: Connection restored to 10.9.4.117@tcp (at 10.9.4.117@tcp)
      [ 8023.080282] LustreError: 4670:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4
      [ 8023.080306] LustreError: 4664:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4
      

      From the second client’s (vm10) console log, we see similar calls as in the first client’s call stack

      [ 7722.357867] Lustre: DEBUG MARKER: == racer test 1: racer on clients: trevis-10vm10,trevis-10vm9.trevis.whamcloud.com DURATION=300 ====== 12:19:23 (1536495563)
      [ 7722.541747] Lustre: DEBUG MARKER: DURATION=300 			MDSCOUNT=1 OSTCOUNT=7			RACER_ENABLE_REMOTE_DIRS=false 			RACER_ENABLE_STRIPED_DIRS=false 			RACER_ENABLE_MIGRATION=false 			RACER_ENABLE_PFL=true 			RACER_ENABLE_DOM=true 			RACER_ENABLE_FLR=true 			LFS=/usr/bin/lfs 			/usr/lib64/lustre/t
      [ 7722.542686] Lustre: DEBUG MARKER: DURATION=300 			MDSCOUNT=1 OSTCOUNT=7			RACER_ENABLE_REMOTE_DIRS=false 			RACER_ENABLE_STRIPED_DIRS=false 			RACER_ENABLE_MIGRATION=false 			RACER_ENABLE_PFL=true 			RACER_ENABLE_DOM=true 			RACER_ENABLE_FLR=true 			LFS=/usr/bin/lfs 			/usr/lib64/lustre/t
      [ 7726.494344] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200007164:0x60:0x0], use llapi_layout_get_by_path()
      [ 7920.259874] INFO: task dir_create.sh:14445 blocked for more than 120 seconds.
      [ 7920.260874] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.261634] dir_create.sh   D ffff9bac39240fd0     0 14445  14433 0x00000080
      [ 7920.262414] Call Trace:
      [ 7920.262749]  [<ffffffffc0e427e2>] ? ll_dcompare+0x72/0x2e0 [lustre]
      [ 7920.263384]  [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.264027]  [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.264623]  [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f
      [ 7920.265151]  [<ffffffff88f0bcde>] lookup_slow+0x33/0xa7
      [ 7920.265692]  [<ffffffff88a2927f>] link_path_walk+0x80f/0x8b0
      [ 7920.266253]  [<ffffffff88a2b75d>] ? do_last+0x66d/0x12c0
      [ 7920.266782]  [<ffffffff88a2c465>] path_openat+0xb5/0x640
      [ 7920.267314]  [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0
      [ 7920.267848]  [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170
      [ 7920.268392]  [<ffffffff88a1a327>] do_sys_open+0x137/0x240
      [ 7920.268943]  [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.269595]  [<ffffffff88a1a44e>] SyS_open+0x1e/0x20
      [ 7920.270104]  [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21
      [ 7920.270693]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.271383] INFO: task ls:23458 blocked for more than 120 seconds.
      [ 7920.271998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.272770] ls              D ffff9bac1fd82f70     0 23458  14490 0x00000080
      [ 7920.273503] Call Trace:
      [ 7920.273774]  [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.274406]  [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.275021]  [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f
      [ 7920.275525]  [<ffffffff88a2b37f>] do_last+0x28f/0x12c0
      [ 7920.276048]  [<ffffffff88a2c487>] path_openat+0xd7/0x640
      [ 7920.276582]  [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre]
      [ 7920.277251]  [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0
      [ 7920.277784]  [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170
      [ 7920.278318]  [<ffffffff88a1a327>] do_sys_open+0x137/0x240
      [ 7920.278858]  [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.279515]  [<ffffffff88a1a464>] SyS_openat+0x14/0x20
      [ 7920.280033]  [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21
      [ 7920.280614]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.281265] INFO: task ls:23462 blocked for more than 120 seconds.
      [ 7920.281868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.282636] ls              D ffff9bac1fc13f40     0 23462  14490 0x00000080
      [ 7920.283377] Call Trace:
      [ 7920.283638]  [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.284279]  [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.284885]  [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f
      [ 7920.285399]  [<ffffffff88a2b37f>] do_last+0x28f/0x12c0
      [ 7920.285922]  [<ffffffff88a2c487>] path_openat+0xd7/0x640
      [ 7920.286448]  [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre]
      [ 7920.287110]  [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0
      [ 7920.287631]  [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170
      [ 7920.288173]  [<ffffffff88a1a327>] do_sys_open+0x137/0x240
      [ 7920.288717]  [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.289361]  [<ffffffff88a1a464>] SyS_openat+0x14/0x20
      [ 7920.289878]  [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21
      [ 7920.290470]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.291141] INFO: task ls:23464 blocked for more than 120 seconds.
      [ 7920.291746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.292497] ls              D ffff9bac20318000     0 23464  14490 0x00000080
      [ 7920.293256] Call Trace:
      [ 7920.293500]  [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.294150]  [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.294767]  [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f
      [ 7920.295280]  [<ffffffff88a2b37f>] do_last+0x28f/0x12c0
      [ 7920.295790]  [<ffffffff88a2c487>] path_openat+0xd7/0x640
      [ 7920.296327]  [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre]
      [ 7920.296990]  [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0
      [ 7920.297510]  [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170
      [ 7920.298056]  [<ffffffff88a1a327>] do_sys_open+0x137/0x240
      [ 7920.298587]  [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.299256]  [<ffffffff88a1a464>] SyS_openat+0x14/0x20
      [ 7920.299775]  [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21
      [ 7920.300367]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.301015] INFO: task ls:23466 blocked for more than 120 seconds.
      [ 7920.301612] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.302395] ls              D ffff9bac2031af70     0 23466  14490 0x00000080
      [ 7920.303136] Call Trace:
      [ 7920.303397]  [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.304035]  [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.304632]  [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f
      [ 7920.305155]  [<ffffffff88a2b37f>] do_last+0x28f/0x12c0
      [ 7920.305673]  [<ffffffff88a2c487>] path_openat+0xd7/0x640
      [ 7920.306200]  [<ffffffff88a2cd4f>] ? getname_flags+0x4f/0x1a0
      [ 7920.306762]  [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre]
      [ 7920.307431]  [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0
      [ 7920.307963]  [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170
      [ 7920.308493]  [<ffffffff88a1a327>] do_sys_open+0x137/0x240
      [ 7920.309037]  [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.309683]  [<ffffffff88a1a464>] SyS_openat+0x14/0x20
      [ 7920.310196]  [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21
      [ 7920.310786]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.311447] INFO: task getfattr:1379 blocked for more than 120 seconds.
      [ 7920.312100] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.312876] getfattr        D ffff9bac1d229fa0     0  1379  14538 0x00000080
      [ 7920.313611] Call Trace:
      [ 7920.313884]  [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.314513]  [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.315128]  [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f
      [ 7920.315631]  [<ffffffff88f0bcde>] lookup_slow+0x33/0xa7
      [ 7920.316162]  [<ffffffff88a2aa08>] path_lookupat+0x838/0x8b0
      [ 7920.316722]  [<ffffffff88a42b12>] ? generic_getxattr+0x52/0x70
      [ 7920.317294]  [<ffffffff88a42f6a>] ? vfs_getxattr+0x8a/0xb0
      [ 7920.317855]  [<ffffffff889f7a35>] ? kmem_cache_alloc+0x35/0x1f0
      [ 7920.318436]  [<ffffffff88a2cd4f>] ? getname_flags+0x4f/0x1a0
      [ 7920.319001]  [<ffffffff88a2aaab>] filename_lookup+0x2b/0xc0
      [ 7920.319544]  [<ffffffff88a2dee7>] user_path_at_empty+0x67/0xc0
      [ 7920.320131]  [<ffffffff88f1b57c>] ? __do_page_fault+0x1bc/0x4f0
      [ 7920.320718]  [<ffffffff88a2df51>] user_path_at+0x11/0x20
      [ 7920.321247]  [<ffffffff88a43d2a>] SyS_getxattr+0x4a/0xc0
      [ 7920.321777]  [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.322439]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.323091]  [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21
      [ 7920.323689]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.324352] INFO: task ln:2464 blocked for more than 120 seconds.
      [ 7920.324955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.325711] ln              D ffff9bac21380fd0     0  2464  14485 0x00000080
      [ 7920.326458] Call Trace:
      [ 7920.326712]  [<ffffffff88a2a2f2>] ? path_lookupat+0x122/0x8b0
      [ 7920.327296]  [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.327932]  [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.328532]  [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f
      [ 7920.329053]  [<ffffffff88a2abc5>] filename_create+0x85/0x180
      [ 7920.329608]  [<ffffffff889f7a35>] ? kmem_cache_alloc+0x35/0x1f0
      [ 7920.330201]  [<ffffffff88a2cd4f>] ? getname_flags+0x4f/0x1a0
      [ 7920.330775]  [<ffffffff88a2cdc4>] ? getname_flags+0xc4/0x1a0
      [ 7920.331338]  [<ffffffff88a2d071>] user_path_create+0x41/0x60
      [ 7920.331898]  [<ffffffff88a2e5e8>] SyS_symlinkat+0x58/0x110
      [ 7920.332441]  [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.333091]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.333755]  [<ffffffff88a2e6b6>] SyS_symlink+0x16/0x20
      [ 7920.334279]  [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21
      [ 7920.334869]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.335512] INFO: task ls:2582 blocked for more than 120 seconds.
      [ 7920.336114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.336890] ls              D ffff9bac1d329fa0     0  2582  14455 0x00000080
      [ 7920.337619] Call Trace:
      [ 7920.337894]  [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.338519]  [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.339135]  [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f
      [ 7920.339640]  [<ffffffff88a2b37f>] do_last+0x28f/0x12c0
      [ 7920.340160]  [<ffffffff88a2c487>] path_openat+0xd7/0x640
      [ 7920.340700]  [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre]
      [ 7920.341373]  [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0
      [ 7920.341902]  [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170
      [ 7920.342438]  [<ffffffff88a1a327>] do_sys_open+0x137/0x240
      [ 7920.342981]  [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.343618]  [<ffffffff88a1a464>] SyS_openat+0x14/0x20
      [ 7920.344142]  [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21
      [ 7920.344757]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.345402] INFO: task ls:2583 blocked for more than 120 seconds.
      [ 7920.346003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.346774] ls              D ffff9bac1d32dee0     0  2583  14455 0x00000080
      [ 7920.347511] Call Trace:
      [ 7920.347787]  [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.348417]  [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.349033]  [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f
      [ 7920.349539]  [<ffffffff88a2b37f>] do_last+0x28f/0x12c0
      [ 7920.350059]  [<ffffffff88a2c487>] path_openat+0xd7/0x640
      [ 7920.350587]  [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre]
      [ 7920.351271]  [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0
      [ 7920.351805]  [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170
      [ 7920.352339]  [<ffffffff88a1a327>] do_sys_open+0x137/0x240
      [ 7920.352881]  [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.353524]  [<ffffffff88a1a464>] SyS_openat+0x14/0x20
      [ 7920.354050]  [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21
      [ 7920.354631]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 7920.355301] INFO: task ls:2584 blocked for more than 120 seconds.
      [ 7920.355903] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7920.356685] ls              D ffff9bac1e540fd0     0  2584  14455 0x00000080
      [ 7920.357413] Call Trace:
      [ 7920.357688]  [<ffffffff88f14f49>] schedule_preempt_disabled+0x29/0x70
      [ 7920.358315]  [<ffffffff88f12d07>] __mutex_lock_slowpath+0xc7/0x1d0
      [ 7920.358927]  [<ffffffff88f120ef>] mutex_lock+0x1f/0x2f
      [ 7920.359432]  [<ffffffff88a2b37f>] do_last+0x28f/0x12c0
      [ 7920.359955]  [<ffffffff88a2c487>] path_openat+0xd7/0x640
      [ 7920.360484]  [<ffffffffc0e4e929>] ? ll_file_data_put+0x89/0x180 [lustre]
      [ 7920.361166]  [<ffffffff88a2e01d>] do_filp_open+0x4d/0xb0
      [ 7920.361698]  [<ffffffff88a3b4c4>] ? __alloc_fd+0xc4/0x170
      [ 7920.362231]  [<ffffffff88a1a327>] do_sys_open+0x137/0x240
      [ 7920.362768]  [<ffffffff88f206d5>] ? system_call_after_swapgs+0xa2/0x146
      [ 7920.363411]  [<ffffffff88a1a464>] SyS_openat+0x14/0x20
      [ 7920.363931]  [<ffffffff88f20795>] system_call_fastpath+0x1c/0x21
      [ 7920.364512]  [<ffffffff88f206e1>] ? system_call_after_swapgs+0xae/0x146
      [ 8022.975327] LustreError: 4302:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4
      [ 8022.975389] LustreError: 4303:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4
      [ 8022.975622] LustreError: 4305:0:(file.c:4366:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200007161:0x1:0x0] error: rc = -4
      

      Description continued in the first comment due to JIRA description line limits.

      Attachments

        Issue Links

          Activity

            [LU-11359] racer test 1 times out with client hung in dir_create.sh, ls, … and MDS in ldlm_completion_ast()

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35197/
            Subject: LU-11359 mdt: fix mdt_dom_discard_data() timeouts
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: e5810126b3fb488a3fed37e085e3ca4ae585324c

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35197/ Subject: LU-11359 mdt: fix mdt_dom_discard_data() timeouts Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: e5810126b3fb488a3fed37e085e3ca4ae585324c

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35197
            Subject: LU-11359 mdt: fix mdt_dom_discard_data() timeouts
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 69bc9c24f5ea8e9a952ad5ab76dc5e36f6b24b66

            gerrit Gerrit Updater added a comment - Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35197 Subject: LU-11359 mdt: fix mdt_dom_discard_data() timeouts Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: 69bc9c24f5ea8e9a952ad5ab76dc5e36f6b24b66

            Ah, cool. Thanks!

            sthiell Stephane Thiell added a comment - Ah, cool. Thanks!

            LU-12034 is being ported to b2_12, Stephane - Note the LTS12 tag there.

            pfarrell Patrick Farrell (Inactive) added a comment - LU-12034 is being ported to b2_12, Stephane - Note the LTS12 tag there.

            Latest patchset of https://review.whamcloud.com/34071/ introduced the use of lu_env_find (due to aa82cc83 LU-12034 obdclass: put all service's env on the list ) and thus it doesn't compile on b2_12 anymore. I wanted to rebase our branch against 2.12.2 and also use the patch that landed master (patchset 20) instead of patchset 12 that we are currently using in production. What is your recommendation? Or could you please backport it to b2_12? Thanks much!

            sthiell Stephane Thiell added a comment - Latest patchset of https://review.whamcloud.com/34071/ introduced the use of lu_env_find (due to aa82cc83 LU-12034 obdclass: put all service's env on the list ) and thus it doesn't compile on b2_12 anymore. I wanted to rebase our branch against 2.12.2 and also use the patch that landed master (patchset 20) instead of patchset 12 that we are currently using in production. What is your recommendation? Or could you please backport it to b2_12? Thanks much!

            Sorry missed the message. I have tested the LU-11359 patch but not LU-11285.

            simmonsja James A Simmons added a comment - Sorry missed the message. I have tested the LU-11359 patch but not LU-11285 .
            pjones Peter Jones added a comment -

            Landed for 2.13

            pjones Peter Jones added a comment - Landed for 2.13

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34071/
            Subject: LU-11359 mdt: fix mdt_dom_discard_data() timeouts
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 9c028e74c2202a8a481557c4cb22225734aaf19f

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34071/ Subject: LU-11359 mdt: fix mdt_dom_discard_data() timeouts Project: fs/lustre-release Branch: master Current Patch Set: Commit: 9c028e74c2202a8a481557c4cb22225734aaf19f
            pjones Peter Jones added a comment -

            James

            Have you tested the current patch (and LU-11285)?

            Peter

            pjones Peter Jones added a comment - James Have you tested the current patch (and LU-11285 )? Peter
            simmonsja James A Simmons added a comment - - edited

            We are also seeing this issue for our production system. I want to add it can be reproduced with running the io500 script.

            simmonsja James A Simmons added a comment - - edited We are also seeing this issue for our production system. I want to add it can be reproduced with running the io500 script.

            People

              tappro Mikhail Pershin
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: