[LU-6273] Hard Failover replay-dual test_17: Failover OST mount hang Created: 24/Feb/15 Updated: 23/Sep/15 Resolved: 22/Sep/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | zfs |
| Environment: |
client and server: lustre-master build # 2856 |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 17590 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/0429703c-ba58-11e4-8053-5254006e85c2.

The sub-test test_17 failed with the following error:

test failed to respond and timed out

OST dmesg:

Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.2.4.161@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 120 previous similar messages
INFO: task mount.lustre:3630 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.lustre D 0000000000000000 0 3630 3629 0x00000080
ffff88006edf9718 0000000000000082 0000000000000000 ffff88006ef82040
ffff88006edf9698 ffffffff81055783 ffff88007e4c2ad8 ffff880002216880
ffff88006ef825f8 ffff88006edf9fd8 000000000000fbc8 ffff88006ef825f8
Call Trace:
[<ffffffff81055783>] ? set_next_buddy+0x43/0x50
[<ffffffff8152a595>] schedule_timeout+0x215/0x2e0
[<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
[<ffffffff8152a213>] wait_for_common+0x123/0x180
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa090cd00>] ? client_lwp_config_process+0x0/0x1948 [obdclass]
[<ffffffff8152a32d>] wait_for_completion+0x1d/0x20
[<ffffffffa0898e14>] llog_process_or_fork+0x354/0x540 [obdclass]
[<ffffffffa0899014>] llog_process+0x14/0x30 [obdclass]
[<ffffffffa08c81d4>] class_config_parse_llog+0x1e4/0x330 [obdclass]
[<ffffffffa10314f2>] mgc_process_log+0xeb2/0x1970 [mgc]
[<ffffffffa102b1f0>] ? mgc_blocking_ast+0x0/0x810 [mgc]
[<ffffffffa0ad0860>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[<ffffffffa1032ef8>] mgc_process_config+0x658/0x1210 [mgc]
[<ffffffffa08d9383>] lustre_process_log+0x7e3/0x1130 [obdclass]
[<ffffffffa07891c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa08d514f>] ? server_name2fsname+0x6f/0x90 [obdclass]
[<ffffffffa0907496>] server_start_targets+0x12b6/0x1af0 [obdclass]
[<ffffffffa0783818>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa08dbfe6>] ? lustre_start_mgc+0x4b6/0x1e00 [obdclass]
[<ffffffffa07891c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa08d3390>] ? class_config_llog_handler+0x0/0x1a70 [obdclass]
[<ffffffffa090c255>] server_fill_super+0xbe5/0x1690 [obdclass]
[<ffffffffa0783818>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa08dde90>] lustre_fill_super+0x560/0xa80 [obdclass]
[<ffffffffa08dd930>] ? lustre_fill_super+0x0/0xa80 [obdclass]
[<ffffffff8118c56f>] get_sb_nodev+0x5f/0xa0
[<ffffffffa08d4ee5>] lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff8118bbcb>] vfs_kern_mount+0x7b/0x1b0
[<ffffffff8118bd72>] do_kern_mount+0x52/0x130
[<ffffffff8119e972>] ? vfs_ioctl+0x22/0xa0
[<ffffffff811ad74b>] do_mount+0x2fb/0x930
[<ffffffff811ade10>] sys_mount+0x90/0xe0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424565647/real 1424565647] req@ffff880070449080 x1493765169611180/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.2.4.158@tcp:12/10 lens 400/544 e 0 to 1 dl 1424565672 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424565712/real 1424565712] req@ffff880070449680 x1493765169611316/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.2.4.158@tcp:12/10 lens 400/544 e 0 to 1 dl 1424565737 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
INFO: task mount.lustre:3630 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.lustre D 0000000000000000 0 3630 3629 0x00000080
ffff88006edf9718 0000000000000082 0000000000000000 ffff88006ef82040
ffff88006edf9698 ffffffff81055783 ffff88007e4c2ad8 ffff880002216880
ffff88006ef825f8 ffff88006edf9fd8 000000000000fbc8 ffff88006ef825f8
Call Trace:
[<ffffffff81055783>] ? set_next_buddy+0x43/0x50
[<ffffffff8152a595>] schedule_timeout+0x215/0x2e0
[<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
[<ffffffff8152a213>] wait_for_common+0x123/0x180
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa090cd00>] ? client_lwp_config_process+0x0/0x1948 [obdclass]
[<ffffffff8152a32d>] wait_for_completion+0x1d/0x20
[<ffffffffa0898e14>] llog_process_or_fork+0x354/0x540 [obdclass]
[<ffffffffa0899014>] llog_process+0x14/0x30 [obdclass]
[<ffffffffa08c81d4>] class_config_parse_llog+0x1e4/0x330 [obdclass]
[<ffffffffa10314f2>] mgc_process_log+0xeb2/0x1970 [mgc]
[<ffffffffa102b1f0>] ? mgc_blocking_ast+0x0/0x810 [mgc]
[<ffffffffa0ad0860>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[<ffffffffa1032ef8>] mgc_process_config+0x658/0x1210 [mgc]
[<ffffffffa08d9383>] lustre_process_log+0x7e3/0x1130 [obdclass]
[<ffffffffa07891c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa08d514f>] ? server_name2fsname+0x6f/0x90 [obdclass]
[<ffffffffa0907496>] server_start_targets+0x12b6/0x1af0 [obdclass]
[<ffffffffa0783818>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa08dbfe6>] ? lustre_start_mgc+0x4b6/0x1e00 [obdclass]
[<ffffffffa07891c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa08d3390>] ? class_config_llog_handler+0x0/0x1a70 [obdclass]
[<ffffffffa090c255>] server_fill_super+0xbe5/0x1690 [obdclass]
[<ffffffffa0783818>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa08dde90>] lustre_fill_super+0x560/0xa80 [obdclass]
[<ffffffffa08dd930>] ? lustre_fill_super+0x0/0xa80 [obdclass]
[<ffffffff8118c56f>] get_sb_nodev+0x5f/0xa0
[<ffffffffa08d4ee5>] lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff8118bbcb>] vfs_kern_mount+0x7b/0x1b0
[<ffffffff8118bd72>] do_kern_mount+0x52/0x130
[<ffffffff8119e972>] ? vfs_ioctl+0x22/0xa0
[<ffffffff811ad74b>] do_mount+0x2fb/0x930
[<ffffffff811ade10>] sys_mount+0x90/0xe0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.2.4.156@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 304 previous similar messages
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424565842/real 1424565842] req@ffff880070449c80 x1493765169611592/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.2.4.158@tcp:12/10 lens 400/544 e 0 to 1 dl 1424565867 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
INFO: task mount.lustre:3630 blocked for more than 120 seconds.
|
| Comments |
| Comment by Oleg Drokin [ 24/Feb/15 ] |
|
So the real problem here is that the MDS has crashed and now the OSTs cannot reconnect to it anymore. |
| Comment by Jodi Levi (Inactive) [ 24/Feb/15 ] |
|
Mike, |
| Comment by Oleg Drokin [ 24/Feb/15 ] |
|
Additionally, even with the MDS/MGS node still down, OSTs absolutely must be able to start as long as they have a cached copy of the config. |
| Comment by Sarah Liu [ 04/Mar/15 ] |
|
Another instance was seen in recovery-double-scale: https://testing.hpdd.intel.com/test_sets/85116226-c170-11e4-bef2-5254006e85c2

OST dmesg:

Lustre: 27409:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
INFO: task mount.lustre:27983 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.8.1.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.lustre D 0000000000000000 0 27983 27982 0x00000080
ffff88005ca1f718 0000000000000082 000000000000ebc8 ffff88005ed70640
ffff88007e4c2aa0 ffff88005ed70080 ffff88005ca1f698 ffffffff810588b3
ffff88007e4c2ad8 ffff880002215900 ffff88005ed70638 ffff88005ca1ffd8
Call Trace:
[<ffffffff810588b3>] ? set_next_buddy+0x43/0x50
[<ffffffff81041e98>] ? pvclock_clocksource_read+0x58/0xd0
[<ffffffff8152b185>] schedule_timeout+0x215/0x2e0
[<ffffffff8106d175>] ? enqueue_entity+0x125/0x450
[<ffffffff8105dea4>] ? check_preempt_wakeup+0x1a4/0x260
[<ffffffff8106d59b>] ? enqueue_task_fair+0xfb/0x100
[<ffffffff8152ae03>] wait_for_common+0x123/0x180
[<ffffffff81064b90>] ? default_wake_function+0x0/0x20
[<ffffffffa09127c0>] ? client_lwp_config_process+0x0/0x1912 [obdclass]
[<ffffffff8152af1d>] wait_for_completion+0x1d/0x20
[<ffffffffa089ecd4>] llog_process_or_fork+0x354/0x540 [obdclass]
[<ffffffffa089eed4>] llog_process+0x14/0x30 [obdclass]
[<ffffffffa08cddf4>] class_config_parse_llog+0x1e4/0x330 [obdclass]
[<ffffffffa10344f2>] mgc_process_log+0xeb2/0x1970 [mgc]
[<ffffffffa102e1f0>] ? mgc_blocking_ast+0x0/0x810 [mgc]
[<ffffffffa0ad5800>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[<ffffffffa1035ef8>] mgc_process_config+0x658/0x1210 [mgc]
[<ffffffff81174d4c>] ? __kmalloc+0x21c/0x230
[<ffffffffa08defa3>] lustre_process_log+0x7e3/0x1130 [obdclass]
[<ffffffffa0906e9c>] ? server_find_mount+0xbc/0x160 [obdclass]
[<ffffffff811749f3>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
[<ffffffffa08dad6f>] ? server_name2fsname+0x6f/0x90 [obdclass]
[<ffffffffa090cf56>] server_start_targets+0x12b6/0x1af0 [obdclass]
[<ffffffffa08e1bdb>] ? lustre_start_mgc+0x48b/0x1e00 [obdclass]
[<ffffffffa08d8fb0>] ? class_config_llog_handler+0x0/0x1a70 [obdclass]
[<ffffffffa0911d15>] server_fill_super+0xbe5/0x1690 [obdclass]
[<ffffffffa08e3ab0>] lustre_fill_super+0x560/0xa80 [obdclass]
[<ffffffffa08e3550>] ? lustre_fill_super+0x0/0xa80 [obdclass]
[<ffffffff811917af>] get_sb_nodev+0x5f/0xa0
[<ffffffffa08dab05>] lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff81190deb>] vfs_kern_mount+0x7b/0x1b0
[<ffffffff81190f92>] do_kern_mount+0x52/0x130
[<ffffffff811a3c12>] ? vfs_ioctl+0x22/0xa0
[<ffffffff811b2b9b>] do_mount+0x2fb/0x930
[<ffffffff811b3260>] sys_mount+0x90/0xe0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Lustre: 27409:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1425299784/real 1425299784] req@ffff88005ee57380 x1494534964904508/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.1.4.205@tcp:12/10 lens 400/544 e 0 to 1 dl 1425299809 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 27409:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.1.4.198@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 40 previous similar messages
Lustre: 27409:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425299849/real 1425299849] req@ffff88005c0a69c0 x1494534964904636/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.1.4.205@tcp:12/10 lens 400/544 e 0 to 1 dl 1425299874 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 27409:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
INFO: task mount.lustre:27983 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.8.1.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.lustre D 0000000000000000 0 27983 27982 0x00000080
|
| Comment by Jian Yu [ 11/Mar/15 ] |
|
This is blocking ZFS hard failover testing on master branch: |
| Comment by Joseph Gmitter (Inactive) [ 18/Aug/15 ] |
|
Hi Mike, |
| Comment by Mikhail Pershin [ 18/Aug/15 ] |
|
yes, I am working on this |
| Comment by Mikhail Pershin [ 19/Aug/15 ] |
|
This looks like a bug with lw_client detection in target_handle_connect(). I think this patch should do the job: |
| Comment by Joseph Gmitter (Inactive) [ 20/Aug/15 ] |
|
Looks like that patch needs to be rebased to move forward. |
| Comment by Mikhail Pershin [ 20/Aug/15 ] |
|
in fact I think this is fixed already by commit f1d81db9376965e302ecc05e10be220d72b2f04a from |
| Comment by Jian Yu [ 24/Aug/15 ] |
|
On master branch:
RHEL 7.1 with ZFS:
RHEL 6.6 with ZFS:
RHEL 7.1 with ldiskfs:
RHEL 6.6 with ldiskfs:

The failure also occurred in ZFS test sessions on the master branch, but not in ldiskfs test sessions. |
| Comment by Mikhail Pershin [ 25/Aug/15 ] |
|
It looks like the OST is unable to switch to the failover MDS after all and keeps trying to reach the failed MDS. That means either there is a problem in its import connection logic, or the local config doesn't provide information about the failover MDS. |
| Comment by Jian Yu [ 04/Sep/15 ] |
|
Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_6/2 (2.6.0)

The failure also occurred on the Lustre 2.6.0 release: |
| Comment by Mikhail Pershin [ 06/Sep/15 ] |
|
The problem still looks related to the LWP device: it is not able to switch to the failover server for some reason. Meanwhile the local config llog exists on the OST and contains the failover NID, so the problem might be inside the LWP code. It is also not clear why this happens only on ZFS setups. |
| Comment by Niu Yawei (Inactive) [ 06/Sep/15 ] |
|
Call Trace:
[<ffffffff81055783>] ? set_next_buddy+0x43/0x50
[<ffffffff8152a595>] schedule_timeout+0x215/0x2e0
[<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
[<ffffffff8152a213>] wait_for_common+0x123/0x180
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa090cd00>] ? client_lwp_config_process+0x0/0x1948 [obdclass]
[<ffffffff8152a32d>] wait_for_completion+0x1d/0x20
[<ffffffffa0898e14>] llog_process_or_fork+0x354/0x540 [obdclass]
[<ffffffffa0899014>] llog_process+0x14/0x30 [obdclass]
[<ffffffffa08c81d4>] class_config_parse_llog+0x1e4/0x330 [obdclass]
[<ffffffffa10314f2>] mgc_process_log+0xeb2/0x1970 [mgc]
[<ffffffffa102b1f0>] ? mgc_blocking_ast+0x0/0x810 [mgc]
[<ffffffffa0ad0860>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[<ffffffffa1032ef8>] mgc_process_config+0x658/0x1210 [mgc]
[<ffffffffa08d9383>] lustre_process_log+0x7e3/0x1130 [obdclass]
[<ffffffffa07891c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa08d514f>] ? server_name2fsname+0x6f/0x90 [obdclass]
[<ffffffffa0907496>] server_start_targets+0x12b6/0x1af0 [obdclass]
[<ffffffffa0783818>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa08dbfe6>] ? lustre_start_mgc+0x4b6/0x1e00 [obdclass]
[<ffffffffa07891c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa08d3390>] ? class_config_llog_handler+0x0/0x1a70 [obdclass]
[<ffffffffa090c255>] server_fill_super+0xbe5/0x1690 [obdclass]
[<ffffffffa0783818>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa08dde90>] lustre_fill_super+0x560/0xa80 [obdclass]
[<ffffffffa08dd930>] ? lustre_fill_super+0x0/0xa80 [obdclass]
[<ffffffff8118c56f>] get_sb_nodev+0x5f/0xa0
[<ffffffffa08d4ee5>] lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff8118bbcb>] vfs_kern_mount+0x7b/0x1b0
[<ffffffff8118bd72>] do_kern_mount+0x52/0x130
[<ffffffff8119e972>] ? vfs_ioctl+0x22/0xa0
[<ffffffff811ad74b>] do_mount+0x2fb/0x930
[<ffffffff811ade10>] sys_mount+0x90/0xe0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

The mount thread is waiting on processing of log records; let's see what the log process thread is doing:

llog_process_ S 0000000000000001 0 28768 2 0x00000080
ffff880058535970 0000000000000046 ffff880058535960 ffffffffa0c04f4c
ffff8800585359d0 000000007f823240 0000000000000000 ffff8800590f4511
0000004958535910 ffffffffa0cccece ffff880058f285f8 ffff880058535fd8
Call Trace:
[<ffffffffa0c04f4c>] ? ptlrpc_unregister_reply+0x6c/0x810 [ptlrpc]
[<ffffffff8152b222>] schedule_timeout+0x192/0x2e0
[<ffffffff81087540>] ? process_timeout+0x0/0x10
[<ffffffffa0c07ba9>] ptlrpc_set_wait+0x319/0xa20 [ptlrpc]
[<ffffffffa0bfd2c0>] ? ptlrpc_interrupted_set+0x0/0x110 [ptlrpc]
[<ffffffff81064c00>] ? default_wake_function+0x0/0x20
[<ffffffffa0c13da5>] ? lustre_msg_set_jobid+0xf5/0x130 [ptlrpc]
[<ffffffffa0c08331>] ptlrpc_queue_wait+0x81/0x220 [ptlrpc]
[<ffffffffa0e7455b>] fld_client_rpc+0x15b/0x510 [fld]
[<ffffffff81297869>] ? simple_strtoul+0x9/0x10
[<ffffffffa0e7b024>] fld_update_from_controller+0x1c4/0x570 [fld]
[<ffffffffa124f8e8>] ofd_register_lwp_callback+0xa8/0x5a0 [ofd]
[<ffffffffa0a0c84f>] lustre_lwp_connect+0xacf/0xd10 [obdclass]
[<ffffffffa0a0db25>] lustre_lwp_setup+0x8c5/0xc60 [obdclass]
[<ffffffffa09dbc08>] ? target_name2index+0x78/0xc0 [obdclass]
[<ffffffffa0a0fa07>] client_lwp_config_process+0x1357/0x1da0 [obdclass]
[<ffffffffa099c35a>] llog_process_thread+0x94a/0xfc0 [obdclass]
[<ffffffffa099d515>] llog_process_thread_daemonize+0x45/0x70 [obdclass]
[<ffffffffa099d4d0>] ? llog_process_thread_daemonize+0x0/0x70 [obdclass]
[<ffffffff8109e78e>] kthread+0x9e/0xc0
[<ffffffff8100c28a>] child_rip+0xa/0x20
[<ffffffff8109e6f0>] ? kthread+0x0/0xc0
[<ffffffff8100c280>] ? child_rip+0x0/0x20

It's trying to query the FLDB from MDT0, but unfortunately the FLD RPC timed out somehow. Though I'm not quite sure of the timeout reason, I think we shouldn't do that in ofd_register_lwp_callback() in the first place, because it will block the OST mount process when MDT0 isn't started. Di, could you take a look? Is there any way to avert that problem? |
| Comment by Niu Yawei (Inactive) [ 07/Sep/15 ] |
|
I see why the LWP on the OST can't connect to the failover MDT now. On OST mount, we process the client config log to set up the LWP: the LWP is first set up and connected to the primary MDT0 (step 1), and only later are the failover connections added (step 2). If the mount process is blocked on step 1, it will never have a chance to add the failover connection, and the LWP won't ever be able to switch to the failover node.

Unfortunately, the process can indeed be blocked on step 1, see lustre_lwp_setup() -> lustre_lwp_connect():

rc = obd_connect(&env, &exp, lwp, uuid, data, NULL);
if (rc != 0) {
	CERROR("%s: connect failed: rc = %d\n", lwp->obd_name, rc);
} else {
	if (unlikely(lwp->obd_lwp_export != NULL))
		class_export_put(lwp->obd_lwp_export);
	lwp->obd_lwp_export = class_export_get(exp);
	lustre_notify_lwp_list(exp);
}

lustre_notify_lwp_list() is called after obd_connect() returns successfully (which means the CONNECT request could be sent, not that the connection is established). lustre_notify_lwp_list() calls ofd_register_lwp_callback() to send the FLD RPC; note that at this point it tries to send the RPC to the primary MDT0 (the connection to the failover MDT0 hasn't been added yet). Let's look at fld_client_rpc():

	if (rc != 0) {
		if (imp->imp_state != LUSTRE_IMP_CLOSED && !imp->imp_deactive) {
			/* Since LWP is not replayable, so it will keep
			 * trying unless umount happens, otherwise it would
			 * cause unecessary failure of the application. */
			ptlrpc_req_finished(req);
			rc = 0;
			goto again;
		}
		GOTO(out_req, rc);
	}

It's trying to send the RPC to the primary MDT0 again and again in a loop. In the end, the mount process is blocked here forever (before the failover connection is added). I'll think further about how to resolve this. Di, any thoughts? Thanks. |
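To make the failure mode above concrete, here is a minimal standalone userspace sketch (plain C, not Lustre code; target_reachable(), fld_update() and setup_lwp() are hypothetical stand-ins) of a config-processing step that synchronously calls a helper which retries forever, the same shape as the lustre_lwp_connect() -> lustre_notify_lwp_list() -> ofd_register_lwp_callback() -> fld_client_rpc() chain described in the comments. If the peer never answers, the setup step never returns, so the later "add failover connection" step is never reached.

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for "is the primary MDT0 reachable?". */
static bool target_reachable(void)
{
	return false;		/* MDT0 is down, as in the failover test */
}

/* Same shape as the fld_client_rpc() retry loop: try again until it works. */
static int fld_update(void)
{
again:
	if (!target_reachable()) {
		sleep(1);	/* "RPC" timed out, retry */
		goto again;	/* never gives up while still mounted */
	}
	return 0;
}

/* Same shape as the LWP setup step driven from the client config llog. */
static int setup_lwp(void)
{
	/* the connect request was queued successfully, and then the
	 * registered callback is invoked synchronously: */
	return fld_update();	/* blocks for as long as MDT0 never answers */
}

int main(void)
{
	printf("processing config llog: setting up LWP\n");
	setup_lwp();				/* mount thread stuck here */
	printf("adding failover connection\n");	/* never reached */
	return 0;
}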
| Comment by Di Wang [ 07/Sep/15 ] |
|
Hmm, I actually met this problem in one of my patch tests. See this failure: https://testing.hpdd.intel.com/test_sets/e2a60dca-3ff5-11e5-afd4-5254006e85c2

We probably should set the no_delay and allow_replay flags for the FLD update RPC. So I think this patch should fix the problem:

diff --git a/lustre/fld/fld_request.c b/lustre/fld/fld_request.c
index a59ab95..c82b098 100644
--- a/lustre/fld/fld_request.c
+++ b/lustre/fld/fld_request.c
@@ -399,6 +399,14 @@ again:
req_capsule_set_size(&req->rq_pill, &RMF_GENERIC_DATA,
RCL_SERVER, PAGE_CACHE_SIZE);
+
+ /* This might happen before the import state becomes to FULL,
+ * let's set allow_replay for this request to avoid deadlock
+ * see LU-6273 */
+ req->rq_allow_replay = 1;
+ /* This will always use LWP connection, let's send the req
+ * with no_delay flags, see above */
+ req->rq_no_delay = 1;
break;
default:
rc = -EINVAL;
I will push the fix to http://review.whamcloud.com/#/c/15275/ |
| Comment by Mikhail Pershin [ 07/Sep/15 ] |
|
Yes, that looks like the reason, but I still wonder why ldiskfs setups are not affected. Di, IIRC you did the local FLDB work in the past; shouldn't the OST have a local copy of the FLDB? |
| Comment by Di Wang [ 07/Sep/15 ] |
|
I discussed with Niu a bit, and this patch is probably not enough according to Niu's comment above. We probably should move this FLDB update (fld_update_from_controller) to a separate thread, otherwise the LWP config log processing will not continue and the LWP will never be able to connect (see the sketch below). Yes, there is an FLDB locally on each target; this FLDB update only happens during first initialization (checking the lsf_new flag) or on upgrade from 2.4 (DNE env) to the current DNE version (2.8). And I do not know why it only happens on ZFS. It may be because fld_index_init is slow on ZFS, so the OST is restarted before fld_index_init finishes. |
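For illustration only, here is a minimal userspace sketch of the "separate thread" alternative described above, assuming the FLDB update is safe to run asynchronously (plain C with pthreads, not Lustre code; fld_update_worker() and setup_lwp() are hypothetical stand-ins, and the actual fix is in the patches referenced below). The blocking retry loop is handed to a detached worker, so config log processing, and hence the mount, can continue while the worker keeps retrying in the background.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in: the primary MDT0 never answers in this scenario. */
static bool target_reachable(void)
{
	return false;
}

/* The same retry-until-success update, but now run by a worker thread. */
static void *fld_update_worker(void *arg)
{
	(void)arg;
	while (!target_reachable())
		sleep(1);	/* keep retrying in the background */
	return NULL;
}

/* LWP setup step: start the update asynchronously and return at once. */
static int setup_lwp(void)
{
	pthread_t tid;

	if (pthread_create(&tid, NULL, fld_update_worker, NULL) != 0)
		return -1;
	pthread_detach(tid);	/* the mount thread does not wait for it */
	return 0;
}

int main(void)
{
	printf("processing config llog: setting up LWP\n");
	setup_lwp();				/* returns immediately */
	printf("adding failover connection\n");	/* now reachable */
	sleep(3);	/* give the worker a moment before exiting */
	return 0;
}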
| Comment by Mikhail Pershin [ 08/Sep/15 ] |
|
I think we can just move lustre_notify_lwp_list() to lustre_start_lwp(), right after lsi_lwp_started is set to 1. At that point the llog has been processed and the LWP should be fully set up. |
| Comment by Gerrit Updater [ 08/Sep/15 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/16303 |
| Comment by Di Wang [ 08/Sep/15 ] |
|
"I think we can just move the lustre_notify_lwp_list() to the lustre_start_lwp(), right after lsi_lwp_started is set to 1. At this point the llog is processed and lwp should be fully setup."

No, lustre_start_lwp() does not guarantee the LWP will be set up by then, because MDT0 might not have been added to the client config log yet; for example, the OST might be started before MDT0. |
| Comment by Gerrit Updater [ 08/Sep/15 ] |
|
Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/16304 |
| Comment by Gerrit Updater [ 22/Sep/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16303/ |
| Comment by Peter Jones [ 22/Sep/15 ] |
|
Landed for 2.8 |