[LU-2823] large-scale test_3a hung: LBUG: ASSERTION(ergo(!obd->obd_recovering, diff >= 0)) failed: lustre-OST0001: 1013476 - 1024134 = -10658
Created: 16/Feb/13  Updated: 17/Apr/13  Resolved: 17/Apr/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jian Yu | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Tag: v1_8_9_WC1_RC1 |
||
| Severity: | 3 |
| Rank (Obsolete): | 6834 |
| Description |
|
The large-scale test_3a hung as follows:

1 : Starting failover on mds
Failing mds on node client-20-ib
CMD: client-20-ib grep -c /mnt/mds' ' /proc/mounts
Stopping /mnt/mds (opts:)
CMD: client-20-ib umount -d /mnt/mds
CMD: client-20-ib lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
affected facets: mds
df pid is 29886
Failover mds to client-20-ib
05:15:08 (1360934108) waiting for client-20-ib network 900 secs ...
05:15:08 (1360934108) network interface is UP
CMD: client-20-ib hostname
Starting mds: -o user_xattr,acl /dev/lvm-MDS/P1 /mnt/mds
CMD: client-20-ib mkdir -p /mnt/mds; mount -t lustre -o user_xattr,acl /dev/lvm-MDS/P1 /mnt/mds
CMD: client-20-ib PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/tests:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/lib64/lustre/utils:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey::/usr/lib64/lustre/tests/mpi NAME=autotest_config sh rpc.sh set_default_debug \"-1\" \" 0xffb7e3ff\" 32
client-20-ib: lnet.debug=-1
client-20-ib: lnet.subsystem_debug=0xffb7e3ff
client-20-ib: lnet.debug_mb=32
CMD: client-20-ib e2label /dev/lvm-MDS/P1
Started lustre-MDT0000
affected facets: mds
CMD: client-20-ib PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/tests:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/lib64/lustre/utils:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey::/usr/lib64/lustre/tests/mpi NAME=autotest_config sh rpc.sh _wait_recovery_complete *.lustre-MDT0000.recovery_status 200
client-20-ib: *.lustre-MDT0000.recovery_status status: RECOVERING
client-20-ib: Waiting 195 secs for *.lustre-MDT0000.recovery_status recovery done. status: RECOVERING
client-20-ib: *.lustre-MDT0000.recovery_status status: COMPLETE
CMD: client-20-ib lctl get_param -n *.lustre-MDT0000.recovery_status
RECOVERY TIME: NFILES=50000 number of clients: 1 recovery_duration: 2

Console log on Client (client-22-ib) showed that:

05:15:13:Lustre: MGC192.168.4.20@o2ib: Connection restored to service MGS using nid 192.168.4.20@o2ib.
05:15:13:Lustre: Skipped 5 previous similar messages
05:18:15:INFO: task mdsrate:9347 blocked for more than 120 seconds.
05:18:15:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
05:18:15:mdsrate D 0000000000000002 0 9347 9346 0x00000080
05:18:15: ffff8802f9f71d38 0000000000000082 ffff8802181a4200 0000000000000001
05:18:15: ffff88029d9e2018 0000000000640c40 ffffffff8100bb8e ffff8802f9f71d38
05:18:15: ffff880313dcfab8 ffff8802f9f71fd8 000000000000fb88 ffff880313dcfab8
05:18:15:Call Trace:
05:18:15: [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
05:18:15: [<ffffffff8104dfbf>] ? mutex_spin_on_owner+0x9f/0xc0
05:18:15: [<ffffffff814eb2ae>] __mutex_lock_slowpath+0x13e/0x180
05:18:15: [<ffffffff8115eb22>] ? kmem_cache_alloc+0x182/0x190
05:18:15: [<ffffffff814eb14b>] mutex_lock+0x2b/0x50
05:18:15: [<ffffffff81186e4e>] do_filp_open+0x2be/0xd60
05:18:15: [<ffffffff8104338c>] ? __do_page_fault+0x1ec/0x480
05:18:15: [<ffffffff811b9cae>] ? ep_poll+0x12e/0x330
05:18:15: [<ffffffff81193272>] ? alloc_fd+0x92/0x160
05:18:15: [<ffffffff81173a39>] do_sys_open+0x69/0x140
05:18:15: [<ffffffff81173b50>] sys_open+0x20/0x30
05:18:15: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Console log on MDS (client-20-ib) showed that:

05:15:03:Lustre: DEBUG MARKER: 1 : Starting failover on mds
05:15:03:Lustre: DEBUG MARKER: grep -c /mnt/mds' ' /proc/mounts
05:15:03:Lustre: DEBUG MARKER: umount -d /mnt/mds
05:15:03:Lustre: Failing over lustre-MDT0000
05:15:03:Lustre: Skipped 17 previous similar messages
05:15:03:LustreError: 3780:0:(osc_create.c:595:osc_create()) lustre-OST0000-osc: oscc recovery failed: -4
05:15:03:LustreError: 3782:0:(osc_create.c:595:osc_create()) lustre-OST0001-osc: oscc recovery failed: -4
05:15:03:LustreError: 3782:0:(lov_obd.c:1153:lov_clear_orphans()) error in orphan recovery on OST idx 1/7: rc = -4
05:15:03:LustreError: 3781:0:(osc_create.c:595:osc_create()) lustre-OST0002-osc: oscc recovery failed: -4
05:15:03:LustreError: 3782:0:(mds_lov.c:1057:__mds_lov_synchronize()) lustre-OST0001_UUID failed at mds_lov_clear_orphans: -4
05:15:03:LustreError: 3782:0:(mds_lov.c:1066:__mds_lov_synchronize()) lustre-OST0001_UUID sync failed -4, deactivating
05:15:03:LustreError: 3785:0:(osc_create.c:595:osc_create()) lustre-OST0005-osc: oscc recovery failed: -4
05:15:03:LustreError: 3786:0:(osc_create.c:595:osc_create()) lustre-OST0006-osc: oscc recovery failed: -4
05:15:03:LustreError: 6245:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
05:15:03:LustreError: 6245:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Skipped 1 previous similar message
05:15:03:LustreError: 6245:0:(ldlm_request.c:1597:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
05:15:03:LustreError: 6245:0:(ldlm_request.c:1597:ldlm_cli_cancel_list()) Skipped 1 previous similar message
05:15:03:LustreError: 3780:0:(lov_obd.c:1153:lov_clear_orphans()) error in orphan recovery on OST idx 0/7: rc = -4
05:15:03:LustreError: 3784:0:(osc_create.c:595:osc_create()) lustre-OST0004-osc: oscc recovery failed: -4
05:15:03:LustreError: 3784:0:(mds_lov.c:1057:__mds_lov_synchronize()) lustre-OST0004_UUID failed at mds_lov_clear_orphans: -4
05:15:03:LustreError: 3784:0:(mds_lov.c:1057:__mds_lov_synchronize()) Skipped 3 previous similar messages
05:15:03:LustreError: 3784:0:(mds_lov.c:1066:__mds_lov_synchronize()) lustre-OST0004_UUID sync failed -4, deactivating
05:15:03:LustreError: 3784:0:(mds_lov.c:1066:__mds_lov_synchronize()) Skipped 3 previous similar messages
05:15:03:LustreError: 3783:0:(osc_create.c:595:osc_create()) lustre-OST0003-osc: oscc recovery failed: -4
05:15:03:LustreError: 3478:0:(mds_open.c:442:mds_create_objects()) error creating objects for inode 524309: rc = -5
05:15:03:LustreError: 3478:0:(mds_open.c:827:mds_finish_open()) mds_create_objects: rc = -5
05:15:03:Lustre: 3478:0:(mds_reint.c:257:mds_finish_transno()) commit transaction for disconnected client 3e5acf3c-82ca-17ee-2609-2dd7736ba638: rc -5
<~snip~>
05:31:03:Lustre: lustre-OST0001-osc: Connection to service lustre-OST0001 via nid 192.168.4.21@o2ib was lost; in progress operations using this service will wait for recovery to complete.
05:31:03:LustreError: 11-0: an error occurred while communicating with 192.168.4.21@o2ib. The ost_connect operation failed with -16

Console log on OSS (client-21-ib) showed that:

05:14:54:Lustre: DEBUG MARKER: 1 : Starting failover on mds
05:15:16:Lustre: Service thread pid 14788 completed after 122.91s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
05:15:16:Lustre: Skipped 1 previous similar message
05:15:16:Lustre: 14830:0:(ldlm_lib.c:820:target_handle_connect()) lustre-OST0000: received new MDS connection from NID 192.168.4.20@o2ib, removing former export from same NID
05:15:16:Lustre: 14830:0:(ldlm_lib.c:820:target_handle_connect()) Skipped 16 previous similar messages
05:15:16:Lustre: lustre-OST0000: received MDS connection from 192.168.4.20@o2ib
05:15:16:Lustre: 14857:0:(filter.c:3129:filter_destroy_precreated()) lustre-OST0002: deleting orphan objects from 1013476 to 1017190, orphan objids won't be reused any more.
05:15:16:Lustre: Skipped 11 previous similar messages
05:15:16:LustreError: 11-0: an error occurred while communicating with 192.168.4.20@o2ib. The obd_ping operation failed with -107
05:15:16:LustreError: 166-1: MGC192.168.4.20@o2ib: Connection to service MGS via nid 192.168.4.20@o2ib was lost; in progress operations using this service will fail.
05:15:16:Lustre: 9791:0:(import.c:855:ptlrpc_connect_interpret()) MGS@MGC192.168.4.20@o2ib_0 changed server handle from 0x3e8c8b7064aac07f to 0x3e8c8b7064b57220
05:15:16:Lustre: MGC192.168.4.20@o2ib: Reactivating import
05:15:27:LustreError: 14877:0:(filter.c:3120:filter_destroy_precreated()) lustre-OST0003: destroy_in_progress already cleared
05:15:27:Lustre: Service thread pid 14809 completed after 137.70s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
05:15:27:LustreError: 14872:0:(filter.c:3120:filter_destroy_precreated()) lustre-OST0000: destroy_in_progress already cleared
05:15:27:Lustre: Service thread pid 14787 completed after 140.41s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
05:15:38:LustreError: 14884:0:(filter.c:3120:filter_destroy_precreated()) lustre-OST0004: destroy_in_progress already cleared
05:15:38:Lustre: Service thread pid 14810 completed after 148.92s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
05:15:38:LustreError: 14865:0:(filter.c:3120:filter_destroy_precreated()) lustre-OST0005: destroy_in_progress already cleared
05:15:38:LustreError: 17526:0:(filter.c:3234:filter_handle_precreate()) ASSERTION(ergo(!obd->obd_recovering, diff >= 0)) failed: lustre-OST0001: 1013476 - 1024134 = -10658
05:15:38:LustreError: 17526:0:(filter.c:3234:filter_handle_precreate()) LBUG
05:15:38:Pid: 17526, comm: ll_ost_creat_04
05:15:38:
05:15:38:Call Trace:
05:15:38: [<ffffffff8afc26a1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
05:15:38: [<ffffffff8afc2bda>] lbug_with_loc+0x7a/0xd0 [libcfs]
05:15:38: [<ffffffff8b4782c6>] filter_create+0x1186/0x17d0 [obdfilter]
05:15:38: [<ffffffff8afcacfd>] libcfs_debug_vmsg2+0x70d/0x970 [libcfs]
05:15:38: [<ffffffff8b15cec9>] lustre_pack_reply+0x29/0xb0 [ptlrpc]
05:15:38: [<ffffffff8b44d801>] ost_handle+0x1281/0x55c0 [ost]
05:15:38: [<ffffffff8afc7868>] libcfs_ip_addr2str+0x38/0x40 [libcfs]
05:15:38: [<ffffffff8b168874>] ptlrpc_server_handle_request+0x984/0xe00 [ptlrpc]
05:15:38: [<ffffffff8b169f16>] ptlrpc_main+0xf16/0x10e0 [ptlrpc]
05:15:38: [<ffffffff8005dfc1>] child_rip+0xa/0x11
05:15:38: [<ffffffff8b169000>] ptlrpc_main+0x0/0x10e0 [ptlrpc]
05:15:38: [<ffffffff8005dfb7>] child_rip+0x0/0x11

Maloo report: https://maloo.whamcloud.com/test_sets/bbb3f1bc-779a-11e2-987d-52540035b04c |
| Comments |
| Comment by Peter Jones [ 16/Feb/13 ] |
|
Niu

Is this a duplicate of ?

Peter |
| Comment by Jian Yu [ 16/Feb/13 ] |
|
The same test, run by autotest, passed over the TCP network: https://maloo.whamcloud.com/test_sets/ad2ba008-7683-11e2-bc2f-52540035b04c |
| Comment by Niu Yawei (Inactive) [ 16/Feb/13 ] |
|
Looks like it's not a duplicate of |
| Comment by Niu Yawei (Inactive) [ 16/Feb/13 ] |
|
The story looks like this: the obdfilter was very slow for some reason, which left the OST thread stuck in orphan cleanup for a very long time:

Lustre: Service thread pid 14810 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 14810, comm: ll_ost_32

Call Trace:
[<ffffffff8006ed48>] do_gettimeofday+0x40/0x90
[<ffffffff800155b4>] sync_buffer+0x0/0x3f
[<ffffffff800637de>] io_schedule+0x3f/0x67
[<ffffffff800155ef>] sync_buffer+0x3b/0x3f
[<ffffffff80063a0a>] __wait_on_bit+0x40/0x6e
[<ffffffff800155b4>] sync_buffer+0x0/0x3f
[<ffffffff80063aa4>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff800a3c15>] wake_bit_function+0x0/0x23
[<ffffffff800175ee>] ll_rw_block+0x8c/0xab
[<ffffffff8b3dde61>] ldiskfs_bread+0x51/0x80 [ldiskfs]
[<ffffffff8b3eee05>] ldiskfs_find_entry+0x1c5/0x670 [ldiskfs]
[<ffffffff8afcacfd>] libcfs_debug_vmsg2+0x70d/0x970 [libcfs]
[<ffffffff8b3f0097>] ldiskfs_lookup+0x57/0x200 [ldiskfs]
[<ffffffff8003722b>] __lookup_hash+0x10b/0x130
[<ffffffff800ed617>] lookup_one_len+0x53/0x61
[<ffffffff8b46d3ed>] filter_fid2dentry+0x42d/0x740 [obdfilter]
[<ffffffff8b21d18f>] filter_quota_adjust+0x27f/0x2b0 [lquota]
[<ffffffff8000d585>] dput+0x2c/0x114
[<ffffffff8b475304>] filter_destroy+0x154/0x1f90 [obdfilter]
[<ffffffff8006474d>] __down_failed_trylock+0x35/0x3a
[<ffffffff8b477df9>] filter_create+0xcb9/0x17d0 [obdfilter]
[<ffffffff8b15cec9>] lustre_pack_reply+0x29/0xb0 [ptlrpc]
[<ffffffff8b44d801>] ost_handle+0x1281/0x55c0 [ost]
[<ffffffff8afc7868>] libcfs_ip_addr2str+0x38/0x40 [libcfs]
[<ffffffff8b168874>] ptlrpc_server_handle_request+0x984/0xe00 [ptlrpc]
[<ffffffff8b169f16>] ptlrpc_main+0xf16/0x10e0 [ptlrpc]
[<ffffffff8005dfc1>] child_rip+0xa/0x11
[<ffffffff8b169000>] ptlrpc_main+0x0/0x10e0 [ptlrpc]
[<ffffffff8005dfb7>] child_rip+0x0/0x11

And the orphan recovery RPC from the MDS was interrupted:

LustreError: 3782:0:(osc_create.c:595:osc_create()) lustre-OST0001-osc: oscc recovery failed: -4
LustreError: 3782:0:(lov_obd.c:1153:lov_clear_orphans()) error in orphan recovery on OST idx 1/7: rc = -4
LustreError: 3781:0:(osc_create.c:595:osc_create()) lustre-OST0002-osc: oscc recovery failed: -4
LustreError: 3782:0:(mds_lov.c:1057:__mds_lov_synchronize()) lustre-OST0001_UUID failed at mds_lov_clear_orphans: -4
LustreError: 3782:0:(mds_lov.c:1066:__mds_lov_synchronize()) lustre-OST0001_UUID sync failed -4, deactivating

After the MDS failover, a new orphan recovery request was sent, but it was skipped because the obdfilter was still working on the previous orphan cleanup:

LustreError: 14884:0:(filter.c:3120:filter_destroy_precreated()) lustre-OST0004: destroy_in_progress already cleared
Lustre: Service thread pid 14810 completed after 148.92s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).

So the oscc_last_id on the MDS was never synced with the last_id on the obdfilter, and the assertion was triggered when creating objects, because the id to create was smaller than the last_id on the obdfilter. |
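For readers unfamiliar with the macro, ergo(a, b) in libcfs simply means "a implies b", so outside recovery the id the MDS asks the OST to precreate up to must not fall behind the OST's own last_id. The following minimal, self-contained sketch (not the actual filter_handle_precreate() code; the variable names are simplified stand-ins) reproduces the arithmetic from the LBUG message:

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for Lustre's ergo(): "a implies b". */
#define ergo(a, b) (!(a) || (b))

int main(void)
{
        bool obd_recovering = false;         /* MDT recovery was already COMPLETE */
        long long mds_last_id = 1013476;     /* id the MDS asked the OST to precreate up to */
        long long ost_last_id = 1024134;     /* the OST's own last precreated object id */
        long long diff = mds_last_id - ost_last_id;

        printf("diff = %lld\n", diff);       /* -10658, exactly as in the LBUG message */

        /* Outside recovery the MDS should never request ids behind the OST's
         * last_id; because oscc_last_id was never re-synced after the skipped
         * orphan cleanup, this check fails here (and LBUGs on the real OST). */
        assert(ergo(!obd_recovering, diff >= 0));
        return 0;
}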
| Comment by Niu Yawei (Inactive) [ 16/Feb/13 ] |
|
I don't see any reason why we should skip the orphan recovery in filter_destroy_precreated():

if (!filter->fo_destroy_in_progress) {
        CERROR("%s: destroy_in_progress already cleared\n",
               exp->exp_obd->obd_name);
        RETURN(0);
}

That can leave the MDS out of sync with the obdfilter about last_id. Redoing the orphan destroy, on the other hand, won't hurt anybody.

Patch for b1_8: http://review.whamcloud.com/5448 |
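For illustration only, here is a minimal, self-contained model of the control-flow change described above. It is not the Lustre code itself (the authoritative change is the Gerrit patch linked above), and the struct and function names are made up for the example:

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the obdfilter state relevant to this bug. */
struct filter_state { bool destroy_in_progress; };

/* Stand-in for the real orphan destroy; as noted above, redoing it is harmless. */
static void destroy_orphans(struct filter_state *f) { (void)f; }

/* Old behaviour: bail out when an earlier pass already cleared the flag, so a
 * repeated orphan-cleanup request does nothing and the MDS's oscc_last_id is
 * never brought back in sync with the OST's last_id. */
static int destroy_precreated_old(struct filter_state *f)
{
        if (!f->destroy_in_progress) {
                fprintf(stderr, "destroy_in_progress already cleared\n");
                return 0;
        }
        destroy_orphans(f);
        f->destroy_in_progress = false;
        return 0;
}

/* Direction of the fix: note the cleared flag but redo the orphan destroy
 * anyway, so both sides converge on the same last_id. */
static int destroy_precreated_new(struct filter_state *f)
{
        if (!f->destroy_in_progress)
                fprintf(stderr, "destroy_in_progress already cleared\n");
        destroy_orphans(f);
        f->destroy_in_progress = false;
        return 0;
}

int main(void)
{
        struct filter_state f = { .destroy_in_progress = false };

        destroy_precreated_old(&f);   /* skips the cleanup entirely */
        destroy_precreated_new(&f);   /* still logs, but cleans up again */
        return 0;
}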
| Comment by Jian Yu [ 17/Feb/13 ] |
|
Lustre Tag: v1_8_9_WC1_RC1
MGS/MDS node: fat-amd-2-ib

The large-scale test passed over the IB network on the same distro/arch and Lustre build in a manual run:
| Comment by Niu Yawei (Inactive) [ 17/Apr/13 ] |
|
Patch landed for 1.8.9.