Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0
-
OpenSFS cluster with two MDSs (one MDT each), three OSSs (two OSTs each), and three clients. Lustre master, tag 2.6.90, build #2734
-
3
-
16514
Description
Racer test_1 failed with the error 'test_1 failed with 4'. Logs are at https://testing.hpdd.intel.com/test_sets/525741e4-6a21-11e4-aeb4-5254006e85c2
From the client test_log, there are many lines like the following:
c13: /usr/lib64/lustre/tests/racer/racer.sh: line 70: ./file_delxattr.sh: No such file or directory
c13: /usr/lib64/lustre/tests/racer/racer.sh: line 70: ./file_truncate.sh: No such file or directory
c13: /usr/lib64/lustre/tests/racer/racer.sh: line 70: ./file_chmod.sh: No such file or directory
c13: /usr/lib64/lustre/tests/racer/racer.sh: line 70: ./file_chown.sh: No such file or directory
c13: /usr/lib64/lustre/tests/racer/racer.sh: line 70: ./file_truncate.sh: No such file or directory
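To gauge how widespread these errors are, the test_log lines can be tallied per client and missing script. The following is a minimal sketch, assuming the log lines have exactly the form shown above; the function name `count_missing_scripts` is hypothetical and not part of the Lustre test suite.

```python
import re
from collections import Counter

# Matches lines of the form:
#   c13: /usr/lib64/lustre/tests/racer/racer.sh: line 70: ./file_chmod.sh: No such file or directory
MISSING_RE = re.compile(
    r"^(?P<node>\S+): \S+racer\.sh: line \d+: "
    r"\./(?P<script>\S+): No such file or directory$"
)


def count_missing_scripts(log_text):
    """Count 'No such file or directory' errors per (node, script) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MISSING_RE.match(line.strip())
        if match:
            counts[(match.group("node"), match.group("script"))] += 1
    return counts
```

Calling `count_missing_scripts(open("test_log").read()).most_common()` (the file name is an assumption) would list the most frequently missing scripts first.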
From the client dmesg log:
LustreError: 29864:0:(file.c:3040:ll_migrate()) scratch: migrate 1 , but fid [0x0:0x0:0x0] is insane
LustreError: 29864:0:(file.c:3040:ll_migrate()) Skipped 1 previous similar message
LustreError: 29864:0:(file.c:3040:ll_migrate()) scratch: migrate 9 , but fid [0x0:0x0:0x0] is insane
LustreError: 29864:0:(file.c:3040:ll_migrate()) Skipped 2 previous similar messages
LustreError: 1562:0:(file.c:3040:ll_migrate()) scratch: migrate 2 , but fid [0x0:0x0:0x0] is insane
LustreError: 1562:0:(file.c:3040:ll_migrate()) Skipped 3 previous similar messages
LustreError: 3290:0:(file.c:3040:ll_migrate()) scratch: migrate 2 , but fid [0x0:0x0:0x0] is insane
LustreError: 3290:0:(file.c:3040:ll_migrate()) Skipped 12 previous similar messages
INFO: task dir_create.sh:27528 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dir_create.sh D 0000000000000004 0 27528 27501 0x00000080
 ffff8805c6993b98 0000000000000086 0000004b00000000 ffffffffa130f683
 000000000000009b 0020000000000080 545dced000000004 00000000000e4a17
 ffff88081325bab8 ffff8805c6993fd8 000000000000fbc8 ffff88081325bab8
Call Trace:
 [<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff811a4148>] ? __d_lookup+0xd8/0x150
 [<ffffffff8152a45b>] mutex_lock+0x2b/0x50
 [<ffffffff811989ab>] do_lookup+0x11b/0x230
 [<ffffffff81199100>] __link_path_walk+0x200/0x1000
 [<ffffffff8119a1ba>] path_walk+0x6a/0xe0
 [<ffffffff8119b99a>] do_filp_open+0x1fa/0xd20
 [<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
 [<ffffffff81016c71>] ? fpu_finit+0x21/0x40
 [<ffffffff8128f83a>] ? strncpy_from_user+0x4a/0x90
 [<ffffffff811a8b82>] ? alloc_fd+0x92/0x160
 [<ffffffff81185be9>] do_sys_open+0x69/0x140
 [<ffffffff81185d00>] sys_open+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
...
INFO: task ls:662 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D 0000000000000005 0 662 27725 0x00000080
 ffff880507b91a78 0000000000000082 0000029600000000 ffff880800aff0b8
 0000000000000000 0000000000000001 ffff880812126000 ffff8807fe90b088
 ffff880544ddd058 ffff880507b91fd8 000000000000fbc8 ffff880544ddd058
Call Trace:
 [<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff811a28ef>] ? d_free+0x3f/0x60
 [<ffffffff8152a45b>] mutex_lock+0x2b/0x50
 [<ffffffff811989ab>] do_lookup+0x11b/0x230
 [<ffffffff811996a4>] __link_path_walk+0x7a4/0x1000
 [<ffffffffa12e22d0>] ? ll_follow_link+0x350/0xdb0 [lustre]
 [<ffffffff81199bdf>] __link_path_walk+0xcdf/0x1000
 [<ffffffff8119a1ba>] path_walk+0x6a/0xe0
 [<ffffffff8119a3cb>] filename_lookup+0x6b/0xc0
 [<ffffffff81226d56>] ? security_file_alloc+0x16/0x20
 [<ffffffff8119b8a4>] do_filp_open+0x104/0xd20
 [<ffffffff811a28ef>] ? d_free+0x3f/0x60
 [<ffffffff8128f83a>] ? strncpy_from_user+0x4a/0x90
 [<ffffffff811a8b82>] ? alloc_fd+0x92/0x160
 [<ffffffff81185be9>] do_sys_open+0x69/0x140
 [<ffffffff81185d00>] sys_open+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
...
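When comparing several hung-task reports like the ones in this dmesg excerpt, it can help to reduce each report to its command, PID, and reliable stack frames. The following is a rough sketch, assuming the kernel's usual "INFO: task NAME:PID blocked for more than N seconds." header and "[<addr>] symbol+0xOFF/0xLEN" frame format seen in the excerpt; the function name `summarize_hung_tasks` is hypothetical. Frames prefixed with "?" are skipped because the kernel marks them as unreliable.

```python
import re

# Header printed by the kernel hung-task detector.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)

# One stack frame, e.g. "[<ffffffff8152a45b>] mutex_lock+0x2b/0x50".
# A leading "? " marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<unsure>\? )?(?P<sym>\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_hung_tasks(dmesg_text):
    """Return one dict per hung-task report with comm, pid, and reliable frames."""
    reports = []
    headers = list(HUNG_RE.finditer(dmesg_text))
    for i, header in enumerate(headers):
        # A report body runs until the next header or the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(dmesg_text)
        body = dmesg_text[header.end():end]
        frames = [
            frame.group("sym")
            for frame in FRAME_RE.finditer(body)
            if not frame.group("unsure")
        ]
        reports.append(
            {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "frames": frames,
            }
        )
    return reports
```

Because the patterns are searched across the whole text rather than line by line, the sketch should work whether or not the log has kept its original line breaks. In both reports in the excerpt, the first reliable frames are `__mutex_lock_slowpath`, `mutex_lock`, and `do_lookup`.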