Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2652

lov_io.c:222:lov_sub_get()) ASSERTION( stripe < lio->lis_stripe_count ) failed

Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 3
    • 6194

    Description

      HAd this crash running sanityn test 12:

      [45366.990436] Lustre: DEBUG MARKER: == sanityn test 12: test lock ordering (link, stat, unlink) ============= 10:01:13 (1358607673)
      [45367.146837] Lustre: DEBUG MARKER: start dir: /mnt/lustre/lockdir=144115205255725085 file: /mnt/lustre/lockdir/lockfile=144115205255725083
      [45445.727378] LustreError: 20512:0:(lov_io.c:222:lov_sub_get()) ASSERTION( stripe < lio->lis_stripe_count ) failed: 
      [45445.728276] LustreError: 20512:0:(lov_io.c:222:lov_sub_get()) LBUG
      [45445.728751] Pid: 20512, comm: statmany
      [45445.729089] 
      [45445.729090] Call Trace:
      [45445.729696]  [<ffffffffa0a7a915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [45445.730184]  [<ffffffffa0a7af17>] lbug_with_loc+0x47/0xb0 [libcfs]
      [45445.730673]  [<ffffffffa14041ba>] lov_sub_get+0x4aa/0x690 [lov]
      [45445.731129]  [<ffffffffa1400132>] lov_sublock_env_get+0xd2/0x140 [lov]
      [45445.731612]  [<ffffffffa1401841>] lov_sublock_alloc+0xf1/0x450 [lov]
      [45445.732086]  [<ffffffffa1402a2c>] lov_lock_init_raid0+0x3ec/0xe50 [lov]
      [45445.732632]  [<ffffffffa13fa2ae>] lov_lock_init+0x1e/0x60 [lov]
      [45445.733135]  [<ffffffffa0bec52c>] cl_lock_hold_mutex+0x34c/0x660 [obdclass]
      [45445.733652]  [<ffffffffa0bec9a2>] cl_lock_request+0x62/0x270 [obdclass]
      [45445.734143]  [<ffffffffa090b1f9>] cl_glimpse_lock+0x179/0x480 [lustre]
      [45445.734635]  [<ffffffffa090ba65>] cl_glimpse_size0+0x1a5/0x1d0 [lustre]
      [45445.735142]  [<ffffffffa08c5298>] ll_inode_revalidate_it+0x198/0x1c0 [lustre]
      [45445.735748]  [<ffffffffa08c5309>] ll_getattr_it+0x49/0x170 [lustre]
      [45445.736230]  [<ffffffffa08c5467>] ll_getattr+0x37/0x40 [lustre]
      [45445.736719]  [<ffffffff81214fa3>] ? security_inode_getattr+0x23/0x30
      [45445.737183]  [<ffffffff81180891>] vfs_getattr+0x51/0x80
      [45445.737613]  [<ffffffff810385d8>] ? pvclock_clocksource_read+0x58/0xd0
      [45445.738070]  [<ffffffff81180920>] vfs_fstatat+0x60/0x80
      [45445.738508]  [<ffffffff81180a6b>] vfs_stat+0x1b/0x20
      [45445.738919]  [<ffffffff81180a94>] sys_newstat+0x24/0x50
      [45445.739332]  [<ffffffff8109b05a>] ? do_gettimeofday+0x1a/0x50
      [45445.740967]  [<ffffffff8100c5b5>] ? math_state_restore+0x45/0x60
      [45445.741415]  [<ffffffff814fb72e>] ? do_device_not_available+0xe/0x10
      [45445.741875]  [<ffffffff8100befb>] ? device_not_available+0x1b/0x20
      [45445.742317]  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      [45445.742763] 
      [45445.744348] Kernel panic - not syncing: LBUG
      

      Crashdump is in /exports/crashdumps/192.168.10.217-2013-01-19-10\:02\:36

      Attachments

        Issue Links

          Activity

            [LU-2652] lov_io.c:222:lov_sub_get()) ASSERTION( stripe < lio->lis_stripe_count ) failed
            jhammond John Hammond added a comment -

            Saw this running racer today on 2.4.51-3-g9f5eea8. Reproduces much more readily if you do:

            # llmount.sh
            # cd /mnt/lustre
            # touch 0 1
            # while true; do
                lfs swap_layouts $((RANDOM % 2)) $((RANDOM % 2))
            done &
            # while true; do
                lfs swap_layouts $((RANDOM % 2)) $((RANDOM % 2))
            done &
            

            The same is true of LU-2766.

            jhammond John Hammond added a comment - Saw this running racer today on 2.4.51-3-g9f5eea8. Reproduces much more readily if you do: # llmount.sh # cd /mnt/lustre # touch 0 1 # while true; do lfs swap_layouts $((RANDOM % 2)) $((RANDOM % 2)) done & # while true; do lfs swap_layouts $((RANDOM % 2)) $((RANDOM % 2)) done & The same is true of LU-2766 .
            green Oleg Drokin added a comment -

            Patches landed, issue no longer visible

            green Oleg Drokin added a comment - Patches landed, issue no longer visible
            jay Jinshan Xiong (Inactive) added a comment - - edited patches are at: http://review.whamcloud.com/5157 , http://review.whamcloud.com/5158 , http://review.whamcloud.com/5159 . Oleg is verifying them.

            I've known the root cause of the problem, need time to cook a patch

            jay Jinshan Xiong (Inactive) added a comment - I've known the root cause of the problem, need time to cook a patch
            [  756.853670] LustreError: 25456:0:(lov_object.c:595:lov_layout_change()) ASSERTION( list_empty(&hdr->coh_locks) ) failed: 
            [  756.854210] LustreError: 25456:0:(lov_object.c:595:lov_layout_change()) LBUG
            [  756.854498] Pid: 25456, comm: ll_sa_25403
            [  756.854724] 
            [  756.854725] Call Trace:
            [  756.855090]  [<ffffffffa0449915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            [  756.855355]  [<ffffffffa0449f17>] lbug_with_loc+0x47/0xb0 [libcfs]
            [  756.855617]  [<ffffffffa0a711af>] lov_layout_change+0x2df/0x350 [lov]
            [  756.855874]  [<ffffffffa0a714b7>] lov_conf_set+0x297/0x6a0 [lov]
            [  756.856124]  [<ffffffffa045a401>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            [  756.856384]  [<ffffffffa04629eb>] ? cfs_hash_add_unique+0x1b/0x40 [libcfs]
            [  756.856807]  [<ffffffffa061ad08>] cl_conf_set+0x58/0x100 [obdclass]
            [  756.857073]  [<ffffffffa0dd71c8>] ll_layout_conf+0x78/0x250 [lustre]
            [  756.857334]  [<ffffffffa0de62ea>] ll_layout_lock_set+0x40a/0xd30 [lustre]
            [  756.857599]  [<ffffffffa0947320>] ? lmv_set_lock_data+0x100/0x480 [lmv]
            [  756.857862]  [<ffffffffa0de73ef>] ll_layout_refresh+0x7df/0xd80 [lustre]
            [  756.858128]  [<ffffffffa0e0a560>] ? ll_md_blocking_ast+0x0/0x720 [lustre]
            [  756.858411]  [<ffffffffa0765370>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
            [  756.858685]  [<ffffffffa0e2d2ed>] vvp_io_fini+0xfd/0x1b0 [lustre]
            [  756.858939]  [<ffffffffa0a7b7ad>] ? lov_io_fini+0x16d/0x3d0 [lov]
            [  756.859202]  [<ffffffffa062c807>] cl_io_fini+0x77/0x260 [obdclass]
            [  756.859460]  [<ffffffffa0e24985>] cl_glimpse_size0+0xc5/0x1d0 [lustre]
            [  756.859725]  [<ffffffffa0e1cde2>] ll_agl_trigger+0xd2/0x350 [lustre]
            [  756.859983]  [<ffffffffa0e233a7>] ll_statahead_thread+0x2d7/0xf40 [lustre]
            [  756.860245]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
            [  756.860501]  [<ffffffffa0e230d0>] ? ll_statahead_thread+0x0/0xf40 [lustre]
            [  756.860776]  [<ffffffff8100c14a>] child_rip+0xa/0x20
            [  756.861012]  [<ffffffffa0e230d0>] ? ll_statahead_thread+0x0/0xf40 [lustre]
            [  756.861279]  [<ffffffffa0e230d0>] ? ll_statahead_thread+0x0/0xf40 [lustre]
            [  756.861537]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
            [  756.861769] 
            [  756.885413] Kernel panic - not syncing: LBUG
            
            jay Jinshan Xiong (Inactive) added a comment - [ 756.853670] LustreError: 25456:0:(lov_object.c:595:lov_layout_change()) ASSERTION( list_empty(&hdr->coh_locks) ) failed: [ 756.854210] LustreError: 25456:0:(lov_object.c:595:lov_layout_change()) LBUG [ 756.854498] Pid: 25456, comm: ll_sa_25403 [ 756.854724] [ 756.854725] Call Trace: [ 756.855090] [<ffffffffa0449915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [ 756.855355] [<ffffffffa0449f17>] lbug_with_loc+0x47/0xb0 [libcfs] [ 756.855617] [<ffffffffa0a711af>] lov_layout_change+0x2df/0x350 [lov] [ 756.855874] [<ffffffffa0a714b7>] lov_conf_set+0x297/0x6a0 [lov] [ 756.856124] [<ffffffffa045a401>] ? libcfs_debug_msg+0x41/0x50 [libcfs] [ 756.856384] [<ffffffffa04629eb>] ? cfs_hash_add_unique+0x1b/0x40 [libcfs] [ 756.856807] [<ffffffffa061ad08>] cl_conf_set+0x58/0x100 [obdclass] [ 756.857073] [<ffffffffa0dd71c8>] ll_layout_conf+0x78/0x250 [lustre] [ 756.857334] [<ffffffffa0de62ea>] ll_layout_lock_set+0x40a/0xd30 [lustre] [ 756.857599] [<ffffffffa0947320>] ? lmv_set_lock_data+0x100/0x480 [lmv] [ 756.857862] [<ffffffffa0de73ef>] ll_layout_refresh+0x7df/0xd80 [lustre] [ 756.858128] [<ffffffffa0e0a560>] ? ll_md_blocking_ast+0x0/0x720 [lustre] [ 756.858411] [<ffffffffa0765370>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] [ 756.858685] [<ffffffffa0e2d2ed>] vvp_io_fini+0xfd/0x1b0 [lustre] [ 756.858939] [<ffffffffa0a7b7ad>] ? lov_io_fini+0x16d/0x3d0 [lov] [ 756.859202] [<ffffffffa062c807>] cl_io_fini+0x77/0x260 [obdclass] [ 756.859460] [<ffffffffa0e24985>] cl_glimpse_size0+0xc5/0x1d0 [lustre] [ 756.859725] [<ffffffffa0e1cde2>] ll_agl_trigger+0xd2/0x350 [lustre] [ 756.859983] [<ffffffffa0e233a7>] ll_statahead_thread+0x2d7/0xf40 [lustre] [ 756.860245] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [ 756.860501] [<ffffffffa0e230d0>] ? ll_statahead_thread+0x0/0xf40 [lustre] [ 756.860776] [<ffffffff8100c14a>] child_rip+0xa/0x20 [ 756.861012] [<ffffffffa0e230d0>] ? ll_statahead_thread+0x0/0xf40 [lustre] [ 756.861279] [<ffffffffa0e230d0>] ? ll_statahead_thread+0x0/0xf40 [lustre] [ 756.861537] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [ 756.861769] [ 756.885413] Kernel panic - not syncing: LBUG
            green Oleg Drokin added a comment -

            Sometimes instead of the crash I am getting this sort of a locked up thread:

            .111182] Lustre: DEBUG MARKER: == sanityn test 12: test lock ordering (link, sta
            .378607] Lustre: DEBUG MARKER: start dir: /mnt/lustre/lockdir=144115205255725085
            .812526] LNet: Service thread pid 12451 was inactive for 40.00s. The thread migh
            .813279] Pid: 12451, comm: mdt00_001
            .813563] 
            .813564] Call Trace:
            .813979]  [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30
            .814234]  [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0
            .814507]  [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10
            .814762]  [<ffffffffa0e00771>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
            .815052]  [<ffffffffa053184d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc]
            .815338]  [<ffffffffa052d360>] ? ldlm_expired_completion_wait+0x0/0x260 [ptlrpc]
            .815791]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
            .816062]  [<ffffffffa0530f98>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
            .816351]  [<ffffffffa0531370>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
            .816692]  [<ffffffffa09dc830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
            .816980]  [<ffffffffa09dedc0>] mdt_object_lock0+0x360/0xb10 [mdt]
            .817278]  [<ffffffffa09dc830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
            .817584]  [<ffffffffa0531370>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
            .817878]  [<ffffffffa09df614>] mdt_object_lock+0x14/0x20 [mdt]
            .818158]  [<ffffffffa09fef3a>] mdt_reint_link+0x3fa/0x910 [mdt]
            .818459]  [<ffffffffa0559c5e>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
            .818918]  [<ffffffffa09faa91>] mdt_reint_rec+0x41/0xe0 [mdt]
            .819194]  [<ffffffffa09f4133>] mdt_reint_internal+0x4f3/0x7e0 [mdt]
            .819495]  [<ffffffffa09f4464>] mdt_reint+0x44/0xe0 [mdt]
            .819764]  [<ffffffffa09e4782>] mdt_handle_common+0x8e2/0x1680 [mdt]
            .820052]  [<ffffffffa0a19ed5>] mds_regular_handle+0x15/0x20 [mdt]
            .820352]  [<ffffffffa0568483>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc]
            .820872]  [<ffffffffa0e0065e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
            .821172]  [<ffffffffa0561221>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
            .821474]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
            .821758]  [<ffffffffa056af8d>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc]
            .822055]  [<ffffffffa056a450>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
            .822333]  [<ffffffff8100c14a>] child_rip+0xa/0x20
            .822608]  [<ffffffffa056a450>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
            .822904]  [<ffffffffa056a450>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc]
            .823181]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
            .823438] 
            .. (this thread actually finished after 100 seconds) ...
            .864250] INFO: task statmany:14503 blocked for more than 120 seconds.
            .865404] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this messag
            .867239] statmany      D 0000000000000000  2608 14503  14484 0x00000000
            .868449]  ffff880055affb10 0000000000000086 ffff880055affad8 ffff880055affad4
            .869752]  ffff880055affaf8 ffff8800bcc23080 ffff8800063167c0 0000000000000000
            .870198]  ffff8800aa03e8b8 ffff880055afffd8 000000000000fba8 ffff8800aa03e8b8
            .870647] Call Trace:
            .870842]  [<ffffffff814fabbd>] rwsem_down_failed_common+0x8d/0x1d0
            .871102]  [<ffffffff81054aaa>] ? enqueue_task_fair+0x14a/0x4e0
            .871356]  [<ffffffff81096d6a>] ? sched_clock_cpu+0x6a/0x110
            .871610]  [<ffffffff814fad56>] rwsem_down_read_failed+0x26/0x30
            .871867]  [<ffffffff8127c104>] call_rwsem_down_read_failed+0x14/0x30
            .872156]  [<ffffffff814f9ec7>] ? down_read+0x37/0x40
            .872474]  [<ffffffffa0717cba>] lov_lsm_addref+0x2a/0x130 [lov]
            .872796]  [<ffffffffa0717dea>] lov_lsm_get+0x2a/0x40 [lov]
            .873063]  [<ffffffffa0b28c95>] ccc_inode_lsm_get+0x15/0x20 [lustre]
            .873332]  [<ffffffffa0addb83>] ll_merge_lvb+0x53/0x5e0 [lustre]
            .873600]  [<ffffffff814f919e>] ? mutex_unlock+0xe/0x10
            .873863]  [<ffffffffa0f6dd0a>] ? cl_lock_mutex_put+0x7a/0x90 [obdclass]
            .874149]  [<ffffffffa0b283a8>] cl_glimpse_lock+0x328/0x480 [lustre]
            .874445]  [<ffffffffa0b28a65>] cl_glimpse_size0+0x1a5/0x1d0 [lustre]
            .874735]  [<ffffffffa0ae2298>] ll_inode_revalidate_it+0x198/0x1c0 [lustre]
            .875036]  [<ffffffffa0ae2309>] ll_getattr_it+0x49/0x170 [lustre]
            .875322]  [<ffffffffa0ae2467>] ll_getattr+0x37/0x40 [lustre]
            .875598]  [<ffffffff81214fa3>] ? security_inode_getattr+0x23/0x30
            .875875]  [<ffffffff81180891>] vfs_getattr+0x51/0x80
            .876154]  [<ffffffff810385d8>] ? pvclock_clocksource_read+0x58/0xd0
            .876439]  [<ffffffff81180920>] vfs_fstatat+0x60/0x80
            .876696]  [<ffffffff81180a6b>] vfs_stat+0x1b/0x20
            .876947]  [<ffffffff81180a94>] sys_newstat+0x24/0x50
            .877203]  [<ffffffff8109b05a>] ? do_gettimeofday+0x1a/0x50
            .877473]  [<ffffffff8100c5b5>] ? math_state_restore+0x45/0x60
            .877751]  [<ffffffff814fb72e>] ? do_device_not_available+0xe/0x10
            .878001]  [<ffffffff8100befb>] ? device_not_available+0x1b/0x20
            .878277]  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
            .878552] INFO: task ll_imp_inval:14515 blocked for more than 120 seconds.
            .878838] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this messag
            .879297] ll_imp_inval  D 0000000000000003  4640 14515      2 0x00000000
            .879607]  ffff8800b5f31980 0000000000000046 0000000000000000 ffffffff810385d8
            .880107]  ffff8800ffffffff 00075e5906b4fe40 ffff8800ffffffff ffff8800b5f31980
            .880590]  ffff8800ac106af8 ffff8800b5f31fd8 000000000000fba8 ffff8800ac106af8
            .881064] Call Trace:
            .881268]  [<ffffffff810385d8>] ? pvclock_clocksource_read+0x58/0xd0
            .881548]  [<ffffffff814fabbd>] rwsem_down_failed_common+0x8d/0x1d0
            .881807]  [<ffffffff814fad23>] rwsem_down_write_failed+0x23/0x30
            .882054]  [<ffffffff8127c133>] call_rwsem_down_write_failed+0x13/0x20
            .882339]  [<ffffffff814f9e85>] ? down_write+0x45/0x50
            .882605]  [<ffffffffa0716e44>] lov_conf_lock+0x34/0xc0 [lov]
            .882877]  [<ffffffffa0717268>] lov_conf_set+0x48/0x6a0 [lov]
            .883163]  [<ffffffffa0f65aad>] ? cl_env_nested_get+0x5d/0xc0 [obdclass]
            .883465]  [<ffffffffa0e189eb>] ? cfs_hash_add_unique+0x1b/0x40 [libcfs]
            .883771]  [<ffffffffa0f63d08>] cl_conf_set+0x58/0x100 [obdclass]
            .884076]  [<ffffffffa0adb1c8>] ll_layout_conf+0x78/0x250 [lustre]
            .884365]  [<ffffffffa0b0e947>] ll_md_blocking_ast+0x3e7/0x720 [lustre]
            .884655]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
            .884936]  [<ffffffffa051106c>] ldlm_cancel_callback+0x6c/0x1a0 [ptlrpc]
            .885243]  [<ffffffffa052afda>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
            .885561]  [<ffffffffa052f8cb>] ldlm_cli_cancel+0x5b/0x360 [ptlrpc]
            .885848]  [<ffffffffa0519c1d>] cleanup_resource+0x12d/0x300 [ptlrpc]
            .886128]  [<ffffffffa0519e1f>] ldlm_resource_clean+0x2f/0x60 [ptlrpc]
            .886993]  [<ffffffffa0e167af>] cfs_hash_for_each_relax+0x17f/0x360 [libcfs]
            .887434]  [<ffffffffa0519df0>] ? ldlm_resource_clean+0x0/0x60 [ptlrpc]
            .887717]  [<ffffffffa0519df0>] ? ldlm_resource_clean+0x0/0x60 [ptlrpc]
            .887997]  [<ffffffffa0e1817f>] cfs_hash_for_each_nolock+0x7f/0x1c0 [libcfs]
            .888463]  [<ffffffffa051685e>] ldlm_namespace_cleanup+0x2e/0xc0 [ptlrpc]
            .888800]  [<ffffffffa04bbc81>] mdc_import_event+0x1c1/0xa30 [mdc]
            .889143]  [<ffffffffa057993f>] ptlrpc_invalidate_import+0x29f/0x8c0 [ptlrpc]
            .889596]  [<ffffffffa0e10401>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            .889870]  [<ffffffffa0579f60>] ? ptlrpc_invalidate_import_thread+0x0/0x2e0 [ptlr
            .890341]  [<ffffffffa0579faf>] ptlrpc_invalidate_import_thread+0x4f/0x2e0 [ptlrp
            .890807]  [<ffffffff8100c14a>] child_rip+0xa/0x20
            .891077]  [<ffffffffa0579f60>] ? ptlrpc_invalidate_import_thread+0x0/0x2e0 [ptlr
            .892918]  [<ffffffffa0579f60>] ? ptlrpc_invalidate_import_thread+0x0/0x2e0 [ptlr
            .893398]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
            (these two threads never complete)
            
            green Oleg Drokin added a comment - Sometimes instead of the crash I am getting this sort of a locked up thread: .111182] Lustre: DEBUG MARKER: == sanityn test 12: test lock ordering (link, sta .378607] Lustre: DEBUG MARKER: start dir: /mnt/lustre/lockdir=144115205255725085 .812526] LNet: Service thread pid 12451 was inactive for 40.00s. The thread migh .813279] Pid: 12451, comm: mdt00_001 .813563] .813564] Call Trace: .813979] [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30 .814234] [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0 .814507] [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10 .814762] [<ffffffffa0e00771>] cfs_waitq_timedwait+0x11/0x20 [libcfs] .815052] [<ffffffffa053184d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc] .815338] [<ffffffffa052d360>] ? ldlm_expired_completion_wait+0x0/0x260 [ptlrpc] .815791] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 .816062] [<ffffffffa0530f98>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc] .816351] [<ffffffffa0531370>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] .816692] [<ffffffffa09dc830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] .816980] [<ffffffffa09dedc0>] mdt_object_lock0+0x360/0xb10 [mdt] .817278] [<ffffffffa09dc830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] .817584] [<ffffffffa0531370>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] .817878] [<ffffffffa09df614>] mdt_object_lock+0x14/0x20 [mdt] .818158] [<ffffffffa09fef3a>] mdt_reint_link+0x3fa/0x910 [mdt] .818459] [<ffffffffa0559c5e>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc] .818918] [<ffffffffa09faa91>] mdt_reint_rec+0x41/0xe0 [mdt] .819194] [<ffffffffa09f4133>] mdt_reint_internal+0x4f3/0x7e0 [mdt] .819495] [<ffffffffa09f4464>] mdt_reint+0x44/0xe0 [mdt] .819764] [<ffffffffa09e4782>] mdt_handle_common+0x8e2/0x1680 [mdt] .820052] [<ffffffffa0a19ed5>] mds_regular_handle+0x15/0x20 [mdt] .820352] [<ffffffffa0568483>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc] .820872] [<ffffffffa0e0065e>] ? cfs_timer_arm+0xe/0x10 [libcfs] .821172] [<ffffffffa0561221>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc] .821474] [<ffffffff81051f73>] ? __wake_up+0x53/0x70 .821758] [<ffffffffa056af8d>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc] .822055] [<ffffffffa056a450>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] .822333] [<ffffffff8100c14a>] child_rip+0xa/0x20 .822608] [<ffffffffa056a450>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] .822904] [<ffffffffa056a450>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] .823181] [<ffffffff8100c140>] ? child_rip+0x0/0x20 .823438] .. (this thread actually finished after 100 seconds) ... .864250] INFO: task statmany:14503 blocked for more than 120 seconds. .865404] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this messag .867239] statmany D 0000000000000000 2608 14503 14484 0x00000000 .868449] ffff880055affb10 0000000000000086 ffff880055affad8 ffff880055affad4 .869752] ffff880055affaf8 ffff8800bcc23080 ffff8800063167c0 0000000000000000 .870198] ffff8800aa03e8b8 ffff880055afffd8 000000000000fba8 ffff8800aa03e8b8 .870647] Call Trace: .870842] [<ffffffff814fabbd>] rwsem_down_failed_common+0x8d/0x1d0 .871102] [<ffffffff81054aaa>] ? enqueue_task_fair+0x14a/0x4e0 .871356] [<ffffffff81096d6a>] ? sched_clock_cpu+0x6a/0x110 .871610] [<ffffffff814fad56>] rwsem_down_read_failed+0x26/0x30 .871867] [<ffffffff8127c104>] call_rwsem_down_read_failed+0x14/0x30 .872156] [<ffffffff814f9ec7>] ? down_read+0x37/0x40 .872474] [<ffffffffa0717cba>] lov_lsm_addref+0x2a/0x130 [lov] .872796] [<ffffffffa0717dea>] lov_lsm_get+0x2a/0x40 [lov] .873063] [<ffffffffa0b28c95>] ccc_inode_lsm_get+0x15/0x20 [lustre] .873332] [<ffffffffa0addb83>] ll_merge_lvb+0x53/0x5e0 [lustre] .873600] [<ffffffff814f919e>] ? mutex_unlock+0xe/0x10 .873863] [<ffffffffa0f6dd0a>] ? cl_lock_mutex_put+0x7a/0x90 [obdclass] .874149] [<ffffffffa0b283a8>] cl_glimpse_lock+0x328/0x480 [lustre] .874445] [<ffffffffa0b28a65>] cl_glimpse_size0+0x1a5/0x1d0 [lustre] .874735] [<ffffffffa0ae2298>] ll_inode_revalidate_it+0x198/0x1c0 [lustre] .875036] [<ffffffffa0ae2309>] ll_getattr_it+0x49/0x170 [lustre] .875322] [<ffffffffa0ae2467>] ll_getattr+0x37/0x40 [lustre] .875598] [<ffffffff81214fa3>] ? security_inode_getattr+0x23/0x30 .875875] [<ffffffff81180891>] vfs_getattr+0x51/0x80 .876154] [<ffffffff810385d8>] ? pvclock_clocksource_read+0x58/0xd0 .876439] [<ffffffff81180920>] vfs_fstatat+0x60/0x80 .876696] [<ffffffff81180a6b>] vfs_stat+0x1b/0x20 .876947] [<ffffffff81180a94>] sys_newstat+0x24/0x50 .877203] [<ffffffff8109b05a>] ? do_gettimeofday+0x1a/0x50 .877473] [<ffffffff8100c5b5>] ? math_state_restore+0x45/0x60 .877751] [<ffffffff814fb72e>] ? do_device_not_available+0xe/0x10 .878001] [<ffffffff8100befb>] ? device_not_available+0x1b/0x20 .878277] [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b .878552] INFO: task ll_imp_inval:14515 blocked for more than 120 seconds. .878838] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this messag .879297] ll_imp_inval D 0000000000000003 4640 14515 2 0x00000000 .879607] ffff8800b5f31980 0000000000000046 0000000000000000 ffffffff810385d8 .880107] ffff8800ffffffff 00075e5906b4fe40 ffff8800ffffffff ffff8800b5f31980 .880590] ffff8800ac106af8 ffff8800b5f31fd8 000000000000fba8 ffff8800ac106af8 .881064] Call Trace: .881268] [<ffffffff810385d8>] ? pvclock_clocksource_read+0x58/0xd0 .881548] [<ffffffff814fabbd>] rwsem_down_failed_common+0x8d/0x1d0 .881807] [<ffffffff814fad23>] rwsem_down_write_failed+0x23/0x30 .882054] [<ffffffff8127c133>] call_rwsem_down_write_failed+0x13/0x20 .882339] [<ffffffff814f9e85>] ? down_write+0x45/0x50 .882605] [<ffffffffa0716e44>] lov_conf_lock+0x34/0xc0 [lov] .882877] [<ffffffffa0717268>] lov_conf_set+0x48/0x6a0 [lov] .883163] [<ffffffffa0f65aad>] ? cl_env_nested_get+0x5d/0xc0 [obdclass] .883465] [<ffffffffa0e189eb>] ? cfs_hash_add_unique+0x1b/0x40 [libcfs] .883771] [<ffffffffa0f63d08>] cl_conf_set+0x58/0x100 [obdclass] .884076] [<ffffffffa0adb1c8>] ll_layout_conf+0x78/0x250 [lustre] .884365] [<ffffffffa0b0e947>] ll_md_blocking_ast+0x3e7/0x720 [lustre] .884655] [<ffffffff81051f73>] ? __wake_up+0x53/0x70 .884936] [<ffffffffa051106c>] ldlm_cancel_callback+0x6c/0x1a0 [ptlrpc] .885243] [<ffffffffa052afda>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc] .885561] [<ffffffffa052f8cb>] ldlm_cli_cancel+0x5b/0x360 [ptlrpc] .885848] [<ffffffffa0519c1d>] cleanup_resource+0x12d/0x300 [ptlrpc] .886128] [<ffffffffa0519e1f>] ldlm_resource_clean+0x2f/0x60 [ptlrpc] .886993] [<ffffffffa0e167af>] cfs_hash_for_each_relax+0x17f/0x360 [libcfs] .887434] [<ffffffffa0519df0>] ? ldlm_resource_clean+0x0/0x60 [ptlrpc] .887717] [<ffffffffa0519df0>] ? ldlm_resource_clean+0x0/0x60 [ptlrpc] .887997] [<ffffffffa0e1817f>] cfs_hash_for_each_nolock+0x7f/0x1c0 [libcfs] .888463] [<ffffffffa051685e>] ldlm_namespace_cleanup+0x2e/0xc0 [ptlrpc] .888800] [<ffffffffa04bbc81>] mdc_import_event+0x1c1/0xa30 [mdc] .889143] [<ffffffffa057993f>] ptlrpc_invalidate_import+0x29f/0x8c0 [ptlrpc] .889596] [<ffffffffa0e10401>] ? libcfs_debug_msg+0x41/0x50 [libcfs] .889870] [<ffffffffa0579f60>] ? ptlrpc_invalidate_import_thread+0x0/0x2e0 [ptlr .890341] [<ffffffffa0579faf>] ptlrpc_invalidate_import_thread+0x4f/0x2e0 [ptlrp .890807] [<ffffffff8100c14a>] child_rip+0xa/0x20 .891077] [<ffffffffa0579f60>] ? ptlrpc_invalidate_import_thread+0x0/0x2e0 [ptlr .892918] [<ffffffffa0579f60>] ? ptlrpc_invalidate_import_thread+0x0/0x2e0 [ptlr .893398] [<ffffffff8100c140>] ? child_rip+0x0/0x20 (these two threads never complete)

            People

              jay Jinshan Xiong (Inactive)
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: