Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • None
    • Lustre 2.5.0, Lustre 2.7.0, Lustre 2.5.3
    • None
    • client and server: lustre-b2_5 RHEL6 build #2
    • 3
    • 11021

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/758c2064-3456-11e3-9356-52540035b04c.

      The sub-test test_1 failed with the following error:

      test failed to respond and timed out

      client console shows:

      08:50:18:Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-13vm1.lab.whamcloud.com,client-13vm2 DURATION=900 == 08:49:53 (1381592993)
      08:50:18:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
      08:50:19:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 
      08:54:39:Lustre: lustre-OST0004-osc-ffff880032581c00: Connection to lustre-OST0004 (at 10.10.4.212@tcp) was lost; in progress operations using this service will wait for recovery to complete
      08:54:42:LustreError: 11-0: lustre-OST0004-osc-ffff880032581c00: Communicating with 10.10.4.212@tcp, operation ldlm_enqueue failed with -107.
      08:54:44:LustreError: 167-0: lustre-OST0004-osc-ffff880032581c00: This client was evicted by lustre-OST0004; in progress operations using this service will fail.
      08:54:44:Lustre: 26160:0:(llite_lib.c:2484:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.10.4.211@tcp:/lustre/fid: [0x200000403:0x158b:0x0]/ may get corrupted (rc -108)
      08:54:45:Lustre: 26161:0:(llite_lib.c:2484:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.10.4.211@tcp:/lustre/fid: [0x200000402:0x185b:0x0]//racer/11/11/11 may get corrupted (rc -108)
      08:54:45:Lustre: lustre-OST0004-osc-ffff880032581c00: Connection restored to lustre-OST0004 (at 10.10.4.212@tcp)
      09:01:24:INFO: task dir_create.sh:7007 blocked for more than 120 seconds.
      09:01:25:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      09:01:26:dir_create.sh D 0000000000000000     0  7007   6999 0x00000080
      09:01:27: ffff880029a7fbd8 0000000000000086 525971bf00000000 0000000000057b80
      09:01:27: 00001b5f00000000 00000d1d00000000 ffff880029a7fc38 ffffffffa08e9d3d
      09:01:27: ffff88004badf098 ffff880029a7ffd8 000000000000fb88 ffff88004badf098
      09:01:27:Call Trace:
      09:01:27: [<ffffffffa08e9d3d>] ? ll_revalidate_it+0x48d/0x1b20 [lustre]
      09:01:27: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
      09:01:27: [<ffffffff8119af58>] ? __d_lookup+0xd8/0x150
      09:01:27: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
      09:01:28: [<ffffffff811907ab>] do_lookup+0x11b/0x230
      09:01:28: [<ffffffff81190acd>] __link_path_walk+0x20d/0x1030
      09:01:28: [<ffffffff81191b7a>] path_walk+0x6a/0xe0
      09:01:28: [<ffffffff81192d91>] do_filp_open+0x201/0xdc0
      09:01:29: [<ffffffff8104759c>] ? __do_page_fault+0x1ec/0x480
      09:01:30: [<ffffffff81014f79>] ? init_fpu+0x59/0xc0
      09:01:30: [<ffffffff8119f922>] ? alloc_fd+0x92/0x160
      09:01:30: [<ffffffff8117e249>] do_sys_open+0x69/0x140
      09:01:31: [<ffffffff8117e360>] sys_open+0x20/0x30
      09:01:31: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      09:01:31:INFO: task dir_create.sh:7016 blocked for more than 120 seconds.
      09:01:31:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      09:01:32:dir_create.sh D 0000000000000000     0  7016   6999 0x00000080
      09:01:32: ffff88007cdefbd8 0000000000000086 525971bf00000000 000000000009639c
      09:01:32: 00001b6800000000 00000d1d00000000 ffff88007cdefc38 ffffffffa08e9d3d
      09:01:32: ffff88007cdedab8 ffff88007cdeffd8 000000000000fb88 ffff88007cdedab8
      09:01:32:Call Trace:
      09:01:32: [<ffffffffa08e9d3d>] ? ll_revalidate_it+0x48d/0x1b20 [lustre]
      09:01:33: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
      09:01:34: [<ffffffff8119af58>] ? __d_lookup+0xd8/0x150
      09:01:34: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
      09:01:35: [<ffffffff811907ab>] do_lookup+0x11b/0x230
      09:01:35: [<ffffffff81190acd>] __link_path_walk+0x20d/0x1030
      09:01:35: [<ffffffff81191b7a>] path_walk+0x6a/0xe0
      09:01:36: [<ffffffff81192d91>] do_filp_open+0x201/0xdc0
      09:01:36: [<ffffffff8104759c>] ? __do_page_fault+0x1ec/0x480
      09:01:36: [<ffffffff81014f79>] ? init_fpu+0x59/0xc0
      09:01:38: [<ffffffff8119f922>] ? alloc_fd+0x92/0x160
      09:01:38: [<ffffffff8117e249>] do_sys_open+0x69/0x140
      09:01:39: [<ffffffff8117e360>] sys_open+0x20/0x30
      09:01:39: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      09:01:39:INFO: task rm:14589 blocked for more than 120 seconds.
      09:01:39:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      09:01:39:rm            D 0000000000000000     0 14589   7055 0x00000080
      09:01:40: ffff880068037e08 0000000000000086 ffff8800103e4040 ffff8800103e4040
      09:01:40: ffff8800103e4040 ffff880068169040 ffff8800103e4040 0000001000000286
      09:01:40: ffff8800103e45f8 ffff880068037fd8 000000000000fb88 ffff8800103e45f8
      09:01:40:Call Trace:
      09:01:40: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
      09:01:41: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
      09:01:41: [<ffffffff81192367>] do_rmdir+0xb7/0x120
      09:01:42: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
      09:01:43: [<ffffffff811923fd>] sys_unlinkat+0x2d/0x40
      09:01:43: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      09:01:43:INFO: task mv:14594 blocked for more than 120 seconds.
      09:01:44:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      09:01:46:mv            D 0000000000000000     0 14594   7009 0x00000080
      09:01:46: ffff8800376c7d08 0000000000000082 ffff8800376c7d18 ffffffff81190a15
      09:01:47: 0000003f1ec64000 ffffea00016d5268 ffff8800376c7d08 ffff880018523540
      09:01:47: ffff880018523af8 ffff8800376c7fd8 000000000000fb88 ffff880018523af8
      09:01:47:Call Trace:
      09:01:47: [<ffffffff81190a15>] ? __link_path_walk+0x155/0x1030
      09:01:50: [<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
      09:01:50: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
      09:01:50: [<ffffffff8118f541>] ? path_put+0x31/0x40
      09:01:51: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
      09:01:51: [<ffffffff8118e9b3>] lock_rename+0x63/0xe0
      09:01:51: [<ffffffff81191f9f>] sys_renameat+0x11f/0x240
      09:01:52: [<ffffffff81186b64>] ? cp_new_stat+0xe4/0x100
      09:01:53: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
      09:01:53: [<ffffffff811920db>] sys_rename+0x1b/0x20
      09:01:53: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      09:01:53:INFO: task mv:14603 blocked for more than 120 seconds.
      09:01:54:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      09:01:54:mv            D 0000000000000000     0 14603   7038 0x00000080
      09:01:54: ffff88005fc1dd08 0000000000000082 ffff88005fc1dd18 ffffffff81190a15
      09:01:54: 0000003f1ec64000 ffffea00016d5268 ffff88005fc1dd08 ffff88002d8d4040
      09:01:54: ffff88002d8d45f8 ffff88005fc1dfd8 000000000000fb88 ffff88002d8d45f8
      09:01:54:Call Trace:
      09:01:55: [<ffffffff81190a15>] ? __link_path_walk+0x155/0x1030
      09:01:56: [<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
      09:01:57: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
      09:01:57: [<ffffffff8118f541>] ? path_put+0x31/0x40
      09:01:58: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
      09:01:58: [<ffffffff8118e98f>] lock_rename+0x3f/0xe0
      09:01:58: [<ffffffff81191f9f>] sys_renameat+0x11f/0x240
      09:01:58: [<ffffffff81186b64>] ? cp_new_stat+0xe4/0x100
      09:01:58: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
      09:01:58: [<ffffffff811920db>] sys_rename+0x1b/0x20
      09:01:59: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      09:01:59:INFO: task ln:14613 blocked for more than 120 seconds.
      09:02:00:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      09:02:01:ln            D 0000000000000000     0 14613   7011 0x00000080
      09:02:02: ffff88005c457dd8 0000000000000082 ffff88005e721bb8 0000000000000000
      09:02:02: ffff88005c457d88 ffff88007c2f4ae0 ffff88007c2f4ae0 ffff88007c2f4ae0
      09:02:03: ffff88007c2f5098 ffff88005c457fd8 000000000000fb88 ffff88007c2f5098
      09:02:03:Call Trace:
      09:02:03: [<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
      09:02:03: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
      09:02:03: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
      09:02:03: [<ffffffff8118ebb0>] lookup_create+0x30/0xd0
      09:02:04: [<ffffffff811928d9>] sys_symlinkat+0x99/0x120
      09:02:05: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
      09:02:07: [<ffffffff81192976>] sys_symlink+0x16/0x20
      09:02:07: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      09:02:08:INFO: task rm:14615 blocked for more than 120 seconds.
      09:02:08:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      09:02:09:rm            D 0000000000000000     0 14615   7046 0x00000080
      09:02:09: ffff88006088de08 0000000000000086 ffff880019cb7540 ffff880019cb7540
      09:02:10: ffff880019cb7540 ffff88005dd3c1b8 ffff880019cb7540 000000108119cfaf
      09:02:11: ffff880019cb7af8 ffff88006088dfd8 000000000000fb88 ffff880019cb7af8
      09:02:12:Call Trace:
      09:02:13: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
      09:02:13: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
      09:02:14: [<ffffffff81192367>] do_rmdir+0xb7/0x120
      09:02:14: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
      09:02:15: [<ffffffff811923fd>] sys_unlinkat+0x2d/0x40
      09:02:15: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      09:02:15:INFO: task rm:14625 blocked for more than 120 seconds.
      09:02:16:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      09:02:16:rm            D 0000000000000000     0 14625   7026 0x00000080
      09:02:17: ffff8800376c5e08 0000000000000086 ffff88005e07f500 ffff88005e07f500
      09:02:17: ffff88005e07f500 ffff8800102690c0 ffff88005e07f500 0000001000000286
      09:02:17: ffff88005e07fab8 ffff8800376c5fd8 000000000000fb88 ffff88005e07fab8
      09:02:17:Call Trace:
      09:02:19: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
      09:02:20: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
      09:02:20: [<ffffffff81192367>] do_rmdir+0xb7/0x120
      09:02:20: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
      09:02:20: [<ffffffff811923fd>] sys_unlinkat+0x2d/0x40
      09:02:20: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      09:02:21:INFO: task mv:14630 blocked for more than 120 seconds.
      09:02:21:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      09:02:22:mv            D 0000000000000000     0 14630   7018 0x00000080
      09:02:24: ffff8800670b5d08 0000000000000082 ffff8800670b5d18 ffffffff81190a15
      09:02:25: 0000003f1ec64000 ffffea00016d5268 ffff8800670b5c98 ffff8800103dd540
      09:02:25: ffff8800103ddaf8 ffff8800670b5fd8 000000000000fb88 ffff8800103ddaf8
      09:02:26:Call Trace:
      09:02:26: [<ffffffff81190a15>] ? __link_path_walk+0x155/0x1030
      09:02:26: [<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
      09:02:28: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
      09:02:29: [<ffffffff8118f541>] ? path_put+0x31/0x40
      09:02:30: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
      09:02:30: [<ffffffff8118e98f>] lock_rename+0x3f/0xe0
      09:02:30: [<ffffffff81191f9f>] sys_renameat+0x11f/0x240
      09:02:30: [<ffffffff81186b64>] ? cp_new_stat+0xe4/0x100
      09:02:30: [<ffffffff81186e3e>] ? vfs_lstat+0x1e/0x20
      09:02:30: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
      09:02:31: [<ffffffff811920db>] sys_rename+0x1b/0x20
      09:02:31: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      09:02:32:INFO: task ls:14643 blocked for more than 120 seconds.
      09:02:33:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      09:02:34:ls            D 0000000000000000     0 14643   7031 0x00000080
      09:02:34: ffff88005d631748 0000000000000086 ffff88005d631728 ffffffffa0807116
      09:02:34: 0000000000000000 ffffffffa086b240 ffff88005f9e6400 ffffffffa07e04c0
      09:02:35: ffff88007a273058 ffff88005d631fd8 000000000000fb88 ffff88007a273058
      09:02:35:Call Trace:
      09:02:35: [<ffffffffa0807116>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
      09:02:35: [<ffffffffa07e04c0>] ? lustre_msg_string+0x0/0x290 [ptlrpc]
      09:02:36: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
      09:02:36: [<ffffffffa0493167>] ? mdc_open_pack+0x247/0x260 [mdc]
      09:02:37: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
      09:02:38: [<ffffffffa0497a7d>] mdc_enqueue+0x22d/0x1a10 [mdc]
      09:02:38: [<ffffffffa049945d>] mdc_intent_lock+0x1fd/0x64a [mdc]
      09:02:39: [<ffffffffa0932740>] ? ll_md_blocking_ast+0x0/0x790 [lustre]
      09:02:39: [<ffffffffa07bf520>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
      09:02:39: [<ffffffffa0495d71>] ? mdc_lock_match+0xb1/0x160 [mdc]
      09:02:39: [<ffffffffa03d85a3>] lmv_intent_open+0x213/0x8d0 [lmv]
      09:02:41: [<ffffffffa0932740>] ? ll_md_blocking_ast+0x0/0x790 [lustre]
      09:02:42: [<ffffffffa03be7cb>] ? lmv_lock_match+0x1bb/0x470 [lmv]
      09:02:42: [<ffffffffa062ba15>] ? class_handle2object+0x95/0x190 [obdclass]
      09:02:42: [<ffffffffa03d8f1b>] lmv_intent_lock+0x2bb/0x380 [lmv]
      09:02:42: [<ffffffffa0932740>] ? ll_md_blocking_ast+0x0/0x790 [lustre]
      09:02:42: [<ffffffffa0931e8d>] ? ll_i2gids+0x3d/0xd0 [lustre]
      09:02:42: [<ffffffffa0918c9d>] ? ll_prep_md_op_data+0x10d/0x3b0 [lustre]
      09:02:43: [<ffffffffa08e9b25>] ll_revalidate_it+0x275/0x1b20 [lustre]
      09:02:43: [<ffffffffa0932740>] ? ll_md_blocking_ast+0x0/0x790 [lustre]
      09:02:43: [<ffffffffa08eb503>] ll_revalidate_nd+0x133/0x3e0 [lustre]
      09:02:43: [<ffffffff811906f6>] do_lookup+0x66/0x230
      09:02:44: [<ffffffff81190ff4>] __link_path_walk+0x734/0x1030
      09:02:44: [<ffffffffa03c511d>] ? lmv_clear_open_replay_data+0x2ad/0x6d0 [lmv]
      09:02:45: [<ffffffffa07d06b0>] ? ptlrpc_req_finished+0x10/0x20 [ptlrpc]
      09:02:45: [<ffffffff81191b7a>] path_walk+0x6a/0xe0
      09:02:45: [<ffffffff81191d4b>] do_path_lookup+0x5b/0xa0
      09:02:45: [<ffffffff81182830>] ? get_empty_filp+0xa0/0x180
      09:02:46: [<ffffffff81192c8b>] do_filp_open+0xfb/0xdc0
      09:02:46: [<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
      09:02:46: [<ffffffff8119f922>] ? alloc_fd+0x92/0x160
      09:02:46: [<ffffffff8117e249>] do_sys_open+0x69/0x140
      09:02:46: [<ffffffff8100c535>] ? math_state_restore+0x45/0x60
      09:02:46: [<ffffffff8117e360>] sys_open+0x20/0x30
      09:02:47: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      09:02:47:Lustre: lustre-MDT0000-mdc-ffff880032581c00: Connection to lustre-MDT0000 (at 10.10.4.211@tcp) was lost; in progress operations using this service will wait for recovery to complete
      09:02:47:LustreError: 14642:0:(ldlm_request.c:134:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1381593535, 200s ago), entering recovery for lustre-MDT0000_UUID@10.10.4.211@tcp ns: lustre-MDT0000-mdc-ffff880032581c00 lock: ffff880067089b40/0x19bd7e0c7fd16370 lrc: 4/1,0 mode: --/PR res: [0x200000400:0x1:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x31861536cea3e045 expref: -99 pid: 14642 timeout: 0 lvb_type: 0
      

      Attachments

        Issue Links

          Activity

            [LU-4105] Failure on test suite racer test_1
            pjones Peter Jones added a comment -

            duplicate of LU-5144

            pjones Peter Jones added a comment - duplicate of LU-5144
            sarah Sarah Liu added a comment - an instance on master: https://testing.hpdd.intel.com/test_sets/49fc050a-44b7-11e4-bb5a-5254006e85c2
            yujian Jian Yu added a comment -

            As per Oleg, this will be fixed in LU-5144.
            The patch for Lustre b2_5 branch is in http://review.whamcloud.com/10918.

            yujian Jian Yu added a comment - As per Oleg, this will be fixed in LU-5144 . The patch for Lustre b2_5 branch is in http://review.whamcloud.com/10918 .
            yujian Jian Yu added a comment -
            yujian Jian Yu added a comment - Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/82/ Distro/Arch: RHEL6.5/x86_64 MDSCOUNT=2 The same failure occurred: https://testing.hpdd.intel.com/test_sets/750a9454-28cf-11e4-85c7-5254006e85c2
            yujian Jian Yu added a comment -

            Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/80/
            Distro/Arch: RHEL6.5/x86_64

            The same failure occurred while running racer test:

            00:34:43:Lustre: DEBUG MARKER: == racer test 1: racer on clients: shadow-23vm1,shadow-23vm5.shadow.whamcloud.com DURATION=900 == 06:19:27 (1408256367)
            00:34:43:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
            00:34:43:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 
            00:34:43:INFO: task dir_create.sh:16222 blocked for more than 120 seconds.
            00:34:43:      Not tainted 2.6.32-431.23.3.el6.x86_64 #1
            00:34:43:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            00:34:43:dir_create.sh D 0000000000000001     0 16222  16215 0x00000084
            00:34:43: ffff880051103b98 0000000000000086 53f04c1a00000001 0000000000061e97
            00:34:43: 00003f5e00000000 00000d2700000000 ffff880051103bf8 ffffffffa0fc5d3d
            00:34:43: ffff88005078c638 ffff880051103fd8 000000000000fbc8 ffff88005078c638
            00:34:43:Call Trace:
            00:34:43: [<ffffffffa0fc5d3d>] ? ll_revalidate_it+0x48d/0x1b20 [lustre]
            00:34:43: [<ffffffff8152a36e>] __mutex_lock_slowpath+0x13e/0x180
            00:34:43: [<ffffffff811a4198>] ? __d_lookup+0xd8/0x150
            00:34:43: [<ffffffff8152a20b>] mutex_lock+0x2b/0x50
            00:34:43: [<ffffffff811989fb>] do_lookup+0x11b/0x230
            00:34:43: [<ffffffff81199150>] __link_path_walk+0x200/0x1000
            00:34:43: [<ffffffff8119a20a>] path_walk+0x6a/0xe0
            00:34:43: [<ffffffff8119b9ea>] do_filp_open+0x1fa/0xd20
            00:34:43: [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
            00:34:43: [<ffffffff81016c71>] ? fpu_finit+0x21/0x40
            00:34:43: [<ffffffff8128f87a>] ? strncpy_from_user+0x4a/0x90
            00:34:43: [<ffffffff811a8bd2>] ? alloc_fd+0x92/0x160
            00:34:43: [<ffffffff81185c39>] do_sys_open+0x69/0x140
            00:34:43: [<ffffffff81185d50>] sys_open+0x20/0x30
            00:34:43: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            

            Maloo report: https://testing.hpdd.intel.com/test_sets/3ce2436e-26aa-11e4-9de1-5254006e85c2

            yujian Jian Yu added a comment - Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/80/ Distro/Arch: RHEL6.5/x86_64 The same failure occurred while running racer test: 00:34:43:Lustre: DEBUG MARKER: == racer test 1: racer on clients: shadow-23vm1,shadow-23vm5.shadow.whamcloud.com DURATION=900 == 06:19:27 (1408256367) 00:34:43:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 00:34:43:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 00:34:43:INFO: task dir_create.sh:16222 blocked for more than 120 seconds. 00:34:43: Not tainted 2.6.32-431.23.3.el6.x86_64 #1 00:34:43:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 00:34:43:dir_create.sh D 0000000000000001 0 16222 16215 0x00000084 00:34:43: ffff880051103b98 0000000000000086 53f04c1a00000001 0000000000061e97 00:34:43: 00003f5e00000000 00000d2700000000 ffff880051103bf8 ffffffffa0fc5d3d 00:34:43: ffff88005078c638 ffff880051103fd8 000000000000fbc8 ffff88005078c638 00:34:43:Call Trace: 00:34:43: [<ffffffffa0fc5d3d>] ? ll_revalidate_it+0x48d/0x1b20 [lustre] 00:34:43: [<ffffffff8152a36e>] __mutex_lock_slowpath+0x13e/0x180 00:34:43: [<ffffffff811a4198>] ? __d_lookup+0xd8/0x150 00:34:43: [<ffffffff8152a20b>] mutex_lock+0x2b/0x50 00:34:43: [<ffffffff811989fb>] do_lookup+0x11b/0x230 00:34:43: [<ffffffff81199150>] __link_path_walk+0x200/0x1000 00:34:43: [<ffffffff8119a20a>] path_walk+0x6a/0xe0 00:34:43: [<ffffffff8119b9ea>] do_filp_open+0x1fa/0xd20 00:34:43: [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50 00:34:43: [<ffffffff81016c71>] ? fpu_finit+0x21/0x40 00:34:43: [<ffffffff8128f87a>] ? strncpy_from_user+0x4a/0x90 00:34:43: [<ffffffff811a8bd2>] ? alloc_fd+0x92/0x160 00:34:43: [<ffffffff81185c39>] do_sys_open+0x69/0x140 00:34:43: [<ffffffff81185d50>] sys_open+0x20/0x30 00:34:43: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Maloo report: https://testing.hpdd.intel.com/test_sets/3ce2436e-26aa-11e4-9de1-5254006e85c2
            sarah Sarah Liu added a comment -

            another instance:

            https://testing.hpdd.intel.com/test_sets/ba82eb98-14ee-11e4-bb6a-5254006e85c2

            04:36:20:LustreError: 12291:0:(file.c:3105:ll_migrate()) lustre: migrate 5 , but fid [0x0:0x0:0x0] is insane
            04:36:21:LustreError: 12993:0:(file.c:3105:ll_migrate()) lustre: migrate 1 , but fid [0x0:0x0:0x0] is insane
            04:36:21:LustreError: 12993:0:(file.c:3105:ll_migrate()) Skipped 4 previous similar messages
            04:36:22:LustreError: 14629:0:(file.c:3105:ll_migrate()) lustre: migrate 17 , but fid [0x0:0x0:0x0] is insane
            04:36:22:LustreError: 14629:0:(file.c:3105:ll_migrate()) Skipped 5 previous similar messages
            04:36:22:LustreError: 15942:0:(file.c:3105:ll_migrate()) lustre: migrate 7 , but fid [0x0:0x0:0x0] is insane
            04:36:22:LustreError: 15942:0:(file.c:3105:ll_migrate()) Skipped 2 previous similar messages
            04:36:24:LustreError: 16166:0:(file.c:3105:ll_migrate()) lustre: migrate 4 , but fid [0x0:0x0:0x0] is insane
            04:36:24:LustreError: 16166:0:(file.c:3105:ll_migrate()) Skipped 5 previous similar messages
            04:36:24:LustreError: 16510:0:(file.c:3105:ll_migrate()) lustre: migrate 13 , but fid [0x0:0x0:0x0] is insane
            04:36:24:LustreError: 16510:0:(file.c:3105:ll_migrate()) Skipped 6 previous similar messages
            04:36:25:LustreError: 20445:0:(file.c:3105:ll_migrate()) lustre: migrate 7 , but fid [0x0:0x0:0x0] is insane
            04:36:25:LustreError: 20445:0:(file.c:3105:ll_migrate()) Skipped 13 previous similar messages
            04:36:25:LustreError: 28207:0:(file.c:3105:ll_migrate()) lustre: migrate 0 , but fid [0x0:0x0:0x0] is insane
            04:36:25:LustreError: 28207:0:(file.c:3105:ll_migrate()) Skipped 27 previous similar messages
            04:36:27:LustreError: 2474:0:(lmv_intent.c:239:lmv_revalidate_slaves()) lustre-clilmv-ffff88007b1c9c00: nlink 0 < 2 corrupt stripe 0 [0x3c0000403:0xbf1:0x0]:[0x3c0000403:0xbf1:0x0]
            04:38:32:LustreError: 2474:0:(llite_lib.c:2419:ll_prep_inode()) new_inode -fatal: rc -5
            04:38:33:LNet: Host 10.2.4.243 reset our connection while we were sending data; it may have rebooted.
            04:38:33:Lustre: 2431:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1406288230/real 1406288230]  req@ffff88006b84dc00 x1474599723026556/t0(0) o400->lustre-MDT0000-mdc-ffff88007b1c9c00@10.2.4.243@tcp:12/10 lens 224/224 e 0 to 1 dl 1406288414 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            04:38:33:LustreError: 166-1: MGC10.2.4.243@tcp: Connection to MGS (at 10.2.4.243@tcp) was lost; in progress operations using this service will fail
            04:38:33:Lustre: lustre-MDT0001-mdc-ffff88007b1c9c00: Connection to lustre-MDT0001 (at 10.2.4.243@tcp) was lost; in progress operations using this service will wait for recovery to complete
            04:38:33:Lustre: 2431:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
            04:38:33:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406288230/real 1406288230]  req@ffff88006c138400 x1474599723026628/t0(0) o250->MGC10.2.4.243@tcp@10.2.4.243@tcp:26/25 lens 400/544 e 0 to 1 dl 1406288236 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            04:38:33:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
            04:38:34:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406288235/real 1406288235]  req@ffff88006b947c00 x1474599723026648/t0(0) o38->lustre-MDT0000-mdc-ffff88007b1c9c00@10.2.4.243@tcp:12/10 lens 400/544 e 0 to 1 dl 1406288246 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            04:38:34:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
            04:38:34:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1406288240/real 0]  req@ffff88006e028400 x1474599723026712/t0(0) o250->MGC10.2.4.243@tcp@10.2.4.243@tcp:26/25 lens 400/544 e 0 to 1 dl 1406288251 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            04:38:35:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 1 previous similar message
            04:38:35:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1406288250/real 0]  req@ffff8800700cc800 x1474599723026836/t0(0) o38->lustre-MDT0000-mdc-ffff88007b1c9c00@10.2.4.243@tcp:12/10 lens 400/544 e 0 to 1 dl 1406288266 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            04:38:35:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
            04:38:35:Lustre: 26740:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406288091/real 1406288091]  req@ffff88006d299400 x1474599722652016/t0(0) o101->lustre-MDT0000-mdc-ffff88007aaad800@10.2.4.243@tcp:12/10 lens 576/1296 e 4 to 1 dl 1406288275 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1
            04:38:35:Lustre: 26740:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 1 previous similar message
            04:38:35:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406288280/real 1406288280]  req@ffff880067819400 x1474599723027200/t0(0) o250->MGC10.2.4.243@tcp@10.2.4.243@tcp:26/25 lens 400/544 e 0 to 1 dl 1406288301 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            04:38:35:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 12 previous similar messages
            04:38:36:INFO: task dir_create.sh:10472 blocked for more than 120 seconds.
            04:38:36:      Not tainted 2.6.32-431.20.3.el6.x86_64 #1
            04:38:36:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            04:38:36:dir_create.sh D 0000000000000001     0 10472  10446 0x00000080
            04:38:37: ffff880071c5fb98 0000000000000086 0000004b00000000 ffffffffa0bad040
            04:38:37: 0000000000000098 0020000000000080 53d240db00000001 000000000000f756
            04:38:37: ffff880072a46638 ffff880071c5ffd8 000000000000fbc8 ffff880072a46638
            04:38:39:Call Trace:
            04:38:39: [<ffffffff81529f9e>] __mutex_lock_slowpath+0x13e/0x180
            04:38:40: [<ffffffff811a4028>] ? __d_lookup+0xd8/0x150
            04:38:40: [<ffffffff81529e3b>] mutex_lock+0x2b/0x50
            04:38:40: [<ffffffff8119888b>] do_lookup+0x11b/0x230
            04:38:41: [<ffffffff81198fe0>] __link_path_walk+0x200/0x1000
            04:38:41: [<ffffffff8119a09a>] path_walk+0x6a/0xe0
            04:38:41: [<ffffffff8119b87a>] do_filp_open+0x1fa/0xd20
            04:38:42: [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
            04:38:42: [<ffffffff81016c71>] ? fpu_finit+0x21/0x40
            04:38:42: [<ffffffff8128f70a>] ? strncpy_from_user+0x4a/0x90
            04:38:43: [<ffffffff811a8a62>] ? alloc_fd+0x92/0x160
            04:38:43: [<ffffffff81185ba9>] do_sys_open+0x69/0x140
            04:38:43: [<ffffffff81185cc0>] sys_open+0x20/0x30
            04:38:43: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            04:38:44:INFO: task dir_create.sh:10692 blocked for more than 120 seconds.
            04:38:44:      Not tainted 2.6.32-431.20.3.el6.x86_64 #1
            04:38:44:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            04:38:46:dir_create.sh D 0000000000000001     0 10692  10456 0x00000080
            04:38:47: ffff880070315b98 0000000000000082 0000004b42547b0a ffffffffa0bad040
            04:38:47: 0000000000000098 0020000000000080 53d2413500000001 00000000000c4627
            04:38:47: ffff880070313ab8 ffff880070315fd8 000000000000fbc8 ffff880070313ab8
            04:38:49:Call Trace:
            04:38:49: [<ffffffff81529f9e>] __mutex_lock_slowpath+0x13e/0x180
            04:38:49: [<ffffffff811a4028>] ? __d_lookup+0xd8/0x150
            04:38:50: [<ffffffff81529e3b>] mutex_lock+0x2b/0x50
            04:38:50: [<ffffffff8119888b>] do_lookup+0x11b/0x230
            04:38:50: [<ffffffff81198fe0>] __link_path_walk+0x200/0x1000
            04:38:50: [<ffffffff8119a09a>] path_walk+0x6a/0xe0
            04:38:51: [<ffffffff8119b87a>] do_filp_open+0x1fa/0xd20
            04:38:51: [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
            04:38:52: [<ffffffff81016c71>] ? fpu_finit+0x21/0x40
            04:38:52: [<ffffffff8128f70a>] ? strncpy_from_user+0x4a/0x90
            04:38:53: [<ffffffff811a8a62>] ? alloc_fd+0x92/0x160
            04:38:53: [<ffffffff81185ba9>] do_sys_open+0x69/0x140
            04:38:53: [<ffffffff81185cc0>] sys_open+0x20/0x30
            04:38:53: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            
            sarah Sarah Liu added a comment - another instance: https://testing.hpdd.intel.com/test_sets/ba82eb98-14ee-11e4-bb6a-5254006e85c2 04:36:20:LustreError: 12291:0:(file.c:3105:ll_migrate()) lustre: migrate 5 , but fid [0x0:0x0:0x0] is insane 04:36:21:LustreError: 12993:0:(file.c:3105:ll_migrate()) lustre: migrate 1 , but fid [0x0:0x0:0x0] is insane 04:36:21:LustreError: 12993:0:(file.c:3105:ll_migrate()) Skipped 4 previous similar messages 04:36:22:LustreError: 14629:0:(file.c:3105:ll_migrate()) lustre: migrate 17 , but fid [0x0:0x0:0x0] is insane 04:36:22:LustreError: 14629:0:(file.c:3105:ll_migrate()) Skipped 5 previous similar messages 04:36:22:LustreError: 15942:0:(file.c:3105:ll_migrate()) lustre: migrate 7 , but fid [0x0:0x0:0x0] is insane 04:36:22:LustreError: 15942:0:(file.c:3105:ll_migrate()) Skipped 2 previous similar messages 04:36:24:LustreError: 16166:0:(file.c:3105:ll_migrate()) lustre: migrate 4 , but fid [0x0:0x0:0x0] is insane 04:36:24:LustreError: 16166:0:(file.c:3105:ll_migrate()) Skipped 5 previous similar messages 04:36:24:LustreError: 16510:0:(file.c:3105:ll_migrate()) lustre: migrate 13 , but fid [0x0:0x0:0x0] is insane 04:36:24:LustreError: 16510:0:(file.c:3105:ll_migrate()) Skipped 6 previous similar messages 04:36:25:LustreError: 20445:0:(file.c:3105:ll_migrate()) lustre: migrate 7 , but fid [0x0:0x0:0x0] is insane 04:36:25:LustreError: 20445:0:(file.c:3105:ll_migrate()) Skipped 13 previous similar messages 04:36:25:LustreError: 28207:0:(file.c:3105:ll_migrate()) lustre: migrate 0 , but fid [0x0:0x0:0x0] is insane 04:36:25:LustreError: 28207:0:(file.c:3105:ll_migrate()) Skipped 27 previous similar messages 04:36:27:LustreError: 2474:0:(lmv_intent.c:239:lmv_revalidate_slaves()) lustre-clilmv-ffff88007b1c9c00: nlink 0 < 2 corrupt stripe 0 [0x3c0000403:0xbf1:0x0]:[0x3c0000403:0xbf1:0x0] 04:38:32:LustreError: 2474:0:(llite_lib.c:2419:ll_prep_inode()) new_inode -fatal: rc -5 04:38:33:LNet: Host 10.2.4.243 reset our connection while we were sending data; it may have rebooted. 04:38:33:Lustre: 2431:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1406288230/real 1406288230] req@ffff88006b84dc00 x1474599723026556/t0(0) o400->lustre-MDT0000-mdc-ffff88007b1c9c00@10.2.4.243@tcp:12/10 lens 224/224 e 0 to 1 dl 1406288414 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 04:38:33:LustreError: 166-1: MGC10.2.4.243@tcp: Connection to MGS (at 10.2.4.243@tcp) was lost; in progress operations using this service will fail 04:38:33:Lustre: lustre-MDT0001-mdc-ffff88007b1c9c00: Connection to lustre-MDT0001 (at 10.2.4.243@tcp) was lost; in progress operations using this service will wait for recovery to complete 04:38:33:Lustre: 2431:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 3 previous similar messages 04:38:33:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406288230/real 1406288230] req@ffff88006c138400 x1474599723026628/t0(0) o250->MGC10.2.4.243@tcp@10.2.4.243@tcp:26/25 lens 400/544 e 0 to 1 dl 1406288236 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 04:38:33:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 3 previous similar messages 04:38:34:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406288235/real 1406288235] req@ffff88006b947c00 x1474599723026648/t0(0) o38->lustre-MDT0000-mdc-ffff88007b1c9c00@10.2.4.243@tcp:12/10 lens 400/544 e 0 to 1 dl 1406288246 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 04:38:34:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 2 previous similar messages 04:38:34:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1406288240/real 0] req@ffff88006e028400 x1474599723026712/t0(0) o250->MGC10.2.4.243@tcp@10.2.4.243@tcp:26/25 lens 400/544 e 0 to 1 dl 1406288251 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 04:38:35:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 1 previous similar message 04:38:35:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1406288250/real 0] req@ffff8800700cc800 x1474599723026836/t0(0) o38->lustre-MDT0000-mdc-ffff88007b1c9c00@10.2.4.243@tcp:12/10 lens 400/544 e 0 to 1 dl 1406288266 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 04:38:35:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 2 previous similar messages 04:38:35:Lustre: 26740:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406288091/real 1406288091] req@ffff88006d299400 x1474599722652016/t0(0) o101->lustre-MDT0000-mdc-ffff88007aaad800@10.2.4.243@tcp:12/10 lens 576/1296 e 4 to 1 dl 1406288275 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1 04:38:35:Lustre: 26740:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 1 previous similar message 04:38:35:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1406288280/real 1406288280] req@ffff880067819400 x1474599723027200/t0(0) o250->MGC10.2.4.243@tcp@10.2.4.243@tcp:26/25 lens 400/544 e 0 to 1 dl 1406288301 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 04:38:35:Lustre: 2429:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 12 previous similar messages 04:38:36:INFO: task dir_create.sh:10472 blocked for more than 120 seconds. 04:38:36: Not tainted 2.6.32-431.20.3.el6.x86_64 #1 04:38:36:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 04:38:36:dir_create.sh D 0000000000000001 0 10472 10446 0x00000080 04:38:37: ffff880071c5fb98 0000000000000086 0000004b00000000 ffffffffa0bad040 04:38:37: 0000000000000098 0020000000000080 53d240db00000001 000000000000f756 04:38:37: ffff880072a46638 ffff880071c5ffd8 000000000000fbc8 ffff880072a46638 04:38:39:Call Trace: 04:38:39: [<ffffffff81529f9e>] __mutex_lock_slowpath+0x13e/0x180 04:38:40: [<ffffffff811a4028>] ? __d_lookup+0xd8/0x150 04:38:40: [<ffffffff81529e3b>] mutex_lock+0x2b/0x50 04:38:40: [<ffffffff8119888b>] do_lookup+0x11b/0x230 04:38:41: [<ffffffff81198fe0>] __link_path_walk+0x200/0x1000 04:38:41: [<ffffffff8119a09a>] path_walk+0x6a/0xe0 04:38:41: [<ffffffff8119b87a>] do_filp_open+0x1fa/0xd20 04:38:42: [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50 04:38:42: [<ffffffff81016c71>] ? fpu_finit+0x21/0x40 04:38:42: [<ffffffff8128f70a>] ? strncpy_from_user+0x4a/0x90 04:38:43: [<ffffffff811a8a62>] ? alloc_fd+0x92/0x160 04:38:43: [<ffffffff81185ba9>] do_sys_open+0x69/0x140 04:38:43: [<ffffffff81185cc0>] sys_open+0x20/0x30 04:38:43: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b 04:38:44:INFO: task dir_create.sh:10692 blocked for more than 120 seconds. 04:38:44: Not tainted 2.6.32-431.20.3.el6.x86_64 #1 04:38:44:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 04:38:46:dir_create.sh D 0000000000000001 0 10692 10456 0x00000080 04:38:47: ffff880070315b98 0000000000000082 0000004b42547b0a ffffffffa0bad040 04:38:47: 0000000000000098 0020000000000080 53d2413500000001 00000000000c4627 04:38:47: ffff880070313ab8 ffff880070315fd8 000000000000fbc8 ffff880070313ab8 04:38:49:Call Trace: 04:38:49: [<ffffffff81529f9e>] __mutex_lock_slowpath+0x13e/0x180 04:38:49: [<ffffffff811a4028>] ? __d_lookup+0xd8/0x150 04:38:50: [<ffffffff81529e3b>] mutex_lock+0x2b/0x50 04:38:50: [<ffffffff8119888b>] do_lookup+0x11b/0x230 04:38:50: [<ffffffff81198fe0>] __link_path_walk+0x200/0x1000 04:38:50: [<ffffffff8119a09a>] path_walk+0x6a/0xe0 04:38:51: [<ffffffff8119b87a>] do_filp_open+0x1fa/0xd20 04:38:51: [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50 04:38:52: [<ffffffff81016c71>] ? fpu_finit+0x21/0x40 04:38:52: [<ffffffff8128f70a>] ? strncpy_from_user+0x4a/0x90 04:38:53: [<ffffffff811a8a62>] ? alloc_fd+0x92/0x160 04:38:53: [<ffffffff81185ba9>] do_sys_open+0x69/0x140 04:38:53: [<ffffffff81185cc0>] sys_open+0x20/0x30 04:38:53: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            sarah Sarah Liu added a comment -

            hit same error here:
            https://maloo.whamcloud.com/test_sets/931f9f5c-80dc-11e3-81ba-52540035b04c

            server and client: lustre-master build #1837 RHEL6 ldiskfs DNE mode

            client 1 console

            00:03:08:INFO: task mv:22208 blocked for more than 120 seconds.
            00:03:09:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            00:03:10:mv            D 0000000000000001     0 22208   3780 0x00000080
            00:03:10: ffff8800536c7d08 0000000000000082 ffff8800536c7d18 ffffffff81190a45
            00:03:10: ffff880036d42500 ffff8800483bffa0 ffffffff8100bb8e ffff8800536c7d08
            00:03:11: ffff88006c4a2638 ffff8800536c7fd8 000000000000fb88 ffff88006c4a2638
            00:03:11:Call Trace:
            00:03:11: [<ffffffff81190a45>] ? __link_path_walk+0x155/0x1030
            00:03:11: [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
            00:03:13: [<ffffffff810521eb>] ? mutex_spin_on_owner+0x9b/0xc0
            00:03:14: [<ffffffff8150f81e>] __mutex_lock_slowpath+0x13e/0x180
            00:03:14: [<ffffffff8118f571>] ? path_put+0x31/0x40
            00:03:14: [<ffffffff8150f6bb>] mutex_lock+0x2b/0x50
            00:03:14: [<ffffffff8118ea37>] lock_rename+0xb7/0xe0
            00:03:14: [<ffffffff81191fcf>] sys_renameat+0x11f/0x240
            00:03:14: [<ffffffff81186b94>] ? cp_new_stat+0xe4/0x100
            00:03:15: [<ffffffff81186ea6>] ? sys_newlstat+0x36/0x50
            00:03:15: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
            00:03:15: [<ffffffff8119210b>] sys_rename+0x1b/0x20
            00:03:15: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            00:03:15:INFO: task rm:23013 blocked for more than 120 seconds.
            00:03:15:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            00:03:16:rm            D 0000000000000001     0 23013   3800 0x00000080
            00:03:16: ffff8800483a1e08 0000000000000086 ffff88006ce8c080 ffff88006ce8c080
            00:03:17: ffff88006ce8c080 ffff88007ba2d000 ffff88006ce8c080 00000010a0a57caf
            00:03:17: ffff88006ce8c638 ffff8800483a1fd8 000000000000fb88 ffff88006ce8c638
            00:03:18:Call Trace:
            00:03:18: [<ffffffff8118f571>] ? path_put+0x31/0x40
            00:03:18: [<ffffffff8150f81e>] __mutex_lock_slowpath+0x13e/0x180
            00:03:18: [<ffffffff8150f6bb>] mutex_lock+0x2b/0x50
            00:03:18: [<ffffffff811921b9>] do_unlinkat+0xa9/0x1b0
            00:03:18: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
            00:03:19: [<ffffffff81192422>] sys_unlinkat+0x22/0x40
            00:03:20: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            00:03:20:INFO: task mv:23427 blocked for more than 120 seconds.
            00:03:20:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            00:03:20:mv            D 0000000000000001     0 23427   3790 0x00000080
            00:03:21: ffff880041c23d08 0000000000000086 ffff880041c23d18 ffffffff81190a45
            00:03:21: 00007fae6043f000 ffffea0000c33ff0 ffff880041c23c98 ffff88006edd8080
            00:03:21: ffff88006edd8638 ffff880041c23fd8 000000000000fb88 ffff88006edd8638
            00:03:22:Call Trace:
            00:03:23: [<ffffffff81190a45>] ? __link_path_walk+0x155/0x1030
            00:03:23: [<ffffffff811a2100>] ? mntput_no_expire+0x30/0x110
            00:03:23: [<ffffffff8150f81e>] __mutex_lock_slowpath+0x13e/0x180
            00:03:24: [<ffffffff8118f571>] ? path_put+0x31/0x40
            00:03:24: [<ffffffff8150f6bb>] mutex_lock+0x2b/0x50
            00:03:25: [<ffffffff8118e9bf>] lock_rename+0x3f/0xe0
            00:03:25: [<ffffffff81191fcf>] sys_renameat+0x11f/0x240
            00:03:25: [<ffffffff81186b94>] ? cp_new_stat+0xe4/0x100
            00:03:25: [<ffffffff81186e6e>] ? vfs_lstat+0x1e/0x20
            00:03:26: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
            00:03:27: [<ffffffff8119210b>] sys_rename+0x1b/0x20
            00:03:27: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            
            sarah Sarah Liu added a comment - hit same error here: https://maloo.whamcloud.com/test_sets/931f9f5c-80dc-11e3-81ba-52540035b04c server and client: lustre-master build #1837 RHEL6 ldiskfs DNE mode client 1 console 00:03:08:INFO: task mv:22208 blocked for more than 120 seconds. 00:03:09:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 00:03:10:mv D 0000000000000001 0 22208 3780 0x00000080 00:03:10: ffff8800536c7d08 0000000000000082 ffff8800536c7d18 ffffffff81190a45 00:03:10: ffff880036d42500 ffff8800483bffa0 ffffffff8100bb8e ffff8800536c7d08 00:03:11: ffff88006c4a2638 ffff8800536c7fd8 000000000000fb88 ffff88006c4a2638 00:03:11:Call Trace: 00:03:11: [<ffffffff81190a45>] ? __link_path_walk+0x155/0x1030 00:03:11: [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20 00:03:13: [<ffffffff810521eb>] ? mutex_spin_on_owner+0x9b/0xc0 00:03:14: [<ffffffff8150f81e>] __mutex_lock_slowpath+0x13e/0x180 00:03:14: [<ffffffff8118f571>] ? path_put+0x31/0x40 00:03:14: [<ffffffff8150f6bb>] mutex_lock+0x2b/0x50 00:03:14: [<ffffffff8118ea37>] lock_rename+0xb7/0xe0 00:03:14: [<ffffffff81191fcf>] sys_renameat+0x11f/0x240 00:03:14: [<ffffffff81186b94>] ? cp_new_stat+0xe4/0x100 00:03:15: [<ffffffff81186ea6>] ? sys_newlstat+0x36/0x50 00:03:15: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200 00:03:15: [<ffffffff8119210b>] sys_rename+0x1b/0x20 00:03:15: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b 00:03:15:INFO: task rm:23013 blocked for more than 120 seconds. 00:03:15:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 00:03:16:rm D 0000000000000001 0 23013 3800 0x00000080 00:03:16: ffff8800483a1e08 0000000000000086 ffff88006ce8c080 ffff88006ce8c080 00:03:17: ffff88006ce8c080 ffff88007ba2d000 ffff88006ce8c080 00000010a0a57caf 00:03:17: ffff88006ce8c638 ffff8800483a1fd8 000000000000fb88 ffff88006ce8c638 00:03:18:Call Trace: 00:03:18: [<ffffffff8118f571>] ? path_put+0x31/0x40 00:03:18: [<ffffffff8150f81e>] __mutex_lock_slowpath+0x13e/0x180 00:03:18: [<ffffffff8150f6bb>] mutex_lock+0x2b/0x50 00:03:18: [<ffffffff811921b9>] do_unlinkat+0xa9/0x1b0 00:03:18: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200 00:03:19: [<ffffffff81192422>] sys_unlinkat+0x22/0x40 00:03:20: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b 00:03:20:INFO: task mv:23427 blocked for more than 120 seconds. 00:03:20:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 00:03:20:mv D 0000000000000001 0 23427 3790 0x00000080 00:03:21: ffff880041c23d08 0000000000000086 ffff880041c23d18 ffffffff81190a45 00:03:21: 00007fae6043f000 ffffea0000c33ff0 ffff880041c23c98 ffff88006edd8080 00:03:21: ffff88006edd8638 ffff880041c23fd8 000000000000fb88 ffff88006edd8638 00:03:22:Call Trace: 00:03:23: [<ffffffff81190a45>] ? __link_path_walk+0x155/0x1030 00:03:23: [<ffffffff811a2100>] ? mntput_no_expire+0x30/0x110 00:03:23: [<ffffffff8150f81e>] __mutex_lock_slowpath+0x13e/0x180 00:03:24: [<ffffffff8118f571>] ? path_put+0x31/0x40 00:03:24: [<ffffffff8150f6bb>] mutex_lock+0x2b/0x50 00:03:25: [<ffffffff8118e9bf>] lock_rename+0x3f/0xe0 00:03:25: [<ffffffff81191fcf>] sys_renameat+0x11f/0x240 00:03:25: [<ffffffff81186b94>] ? cp_new_stat+0xe4/0x100 00:03:25: [<ffffffff81186e6e>] ? vfs_lstat+0x1e/0x20 00:03:26: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200 00:03:27: [<ffffffff8119210b>] sys_rename+0x1b/0x20 00:03:27: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

            People

              wc-triage WC Triage
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: