[LU-10067] LBUG mdt_handler.c:222:mdt_lock_pdo_mode() Created: 03/Oct/17 Updated: 09/Nov/17 Resolved: 09/Nov/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | James Casper | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: | trevis, full DNE |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
https://testing.hpdd.intel.com/test_sessions/00511056-db38-4995-a4f8-b35856fa09f7

racer test_1: the test failed to respond and timed out. From the MDS console:

06:18:07:[19681.903127] LustreError: 22680:0:(mdt_handler.c:222:mdt_lock_pdo_mode()) ASSERTION( lh->mlh_pdo_mode == LCK_MINMODE ) failed:
06:18:07:[19681.908979] LustreError: 22680:0:(mdt_handler.c:222:mdt_lock_pdo_mode()) LBUG
06:18:07:[19681.912061] Pid: 22680, comm: mdt00_037
06:18:07:[19681.914860]
06:18:07:[19681.914860] Call Trace:
06:18:07:[19681.920079] [<ffffffffc06817ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
06:18:07:[19681.923062] [<ffffffffc068183c>] lbug_with_loc+0x4c/0xb0 [libcfs]
06:18:07:[19681.926146] [<ffffffffc11b59db>] mdt_object_local_lock+0xa4b/0xad0 [mdt]
06:18:07:[19681.928946] [<ffffffffc11b56db>] ? mdt_object_local_lock+0x74b/0xad0 [mdt]
06:18:07:[19681.931741] [<ffffffffc11a5f10>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
06:18:07:[19681.934380] [<ffffffffc0e33770>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
06:18:07:[19681.937367] [<ffffffffc11b5ad0>] mdt_object_lock_internal+0x70/0x330 [mdt]
06:18:07:[19681.940203] [<ffffffffc11b5ad0>] ? mdt_object_lock_internal+0x70/0x330 [mdt]
06:18:07:[19681.943129] [<ffffffffc11b5db0>] mdt_object_lock+0x20/0x30 [mdt]
06:18:07:[19681.945893] [<ffffffffc11fa3f4>] mdt_lock_objects_in_linkea+0x748/0xa68 [mdt]
06:18:07:[19681.948705] [<ffffffffc11ca148>] mdt_reint_migrate_internal.isra.38+0x8c8/0x16d0 [mdt]
06:18:07:[19681.951559] [<ffffffffc0692002>] ? cfs_hash_bd_from_key+0x32/0xb0 [libcfs]
06:18:07:[19681.954250] [<ffffffffc0c41788>] ? lu_object_put+0x148/0x3d0 [obdclass]
06:18:07:[19681.956946] [<ffffffffc11cb1b5>] mdt_reint_rename_or_migrate.isra.39+0x265/0x860 [mdt]
06:18:07:[19681.959450] [<ffffffffc11c0181>] ? mdt_root_squash+0x21/0x430 [mdt]
06:18:07:[19681.961948] [<ffffffff8132c212>] ? strlcpy+0x42/0x60
06:18:07:[19681.964254] [<ffffffffc11cb7c0>] mdt_reint_migrate+0x10/0x20 [mdt]
06:18:07:[19681.966606] [<ffffffffc11cf790>] mdt_reint_rec+0x80/0x210 [mdt]
06:18:07:[19681.969134] [<ffffffffc11b131b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
06:18:07:[19681.971704] [<ffffffffc11bcda7>] mdt_reint+0x67/0x140 [mdt]
06:18:07:[19681.974204] [<ffffffffc0ecc225>] tgt_request_handle+0x925/0x1370 [ptlrpc]
06:18:07:[19681.976507] [<ffffffffc0e750c6>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
06:18:07:[19681.979113] [<ffffffff810ba588>] ? __wake_up_common+0x58/0x90
06:18:07:[19681.981580] [<ffffffffc0e78862>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
06:18:07:[19681.983929] [<ffffffff81029557>] ? __switch_to+0xd7/0x510
06:18:07:[19681.986432] [<ffffffff816a8f00>] ? __schedule+0x330/0x8b0
06:18:07:[19681.988691] [<ffffffffc0e77dd0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
06:18:07:[19681.990751] [<ffffffff810b098f>] kthread+0xcf/0xe0
06:18:07:[19681.993085] [<ffffffff810b08c0>] ? kthread+0x0/0xe0
06:18:07:[19681.995015] [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
06:18:07:[19681.997259] [<ffffffff810b08c0>] ? kthread+0x0/0xe0
06:18:07:[19681.999174]
06:18:07:[19682.000888] Kernel panic - not syncing: LBUG
06:18:07:[19682.001870] CPU: 1 PID: 22680 Comm: mdt00_037 Tainted: P OE ------------ 3.10.0-693.1.1.el7_lustre.x86_64 #1
06:18:07:[19682.001870] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
06:18:07:[19682.001870] ffff8800664d1f00 0000000068d88604 ffff88004cf8b898 ffffffff816a3d6d
06:18:07:[19682.001870] ffff88004cf8b918 ffffffff8169dc54 ffffffff00000008 ffff88004cf8b928
06:18:07:[19682.001870] ffff88004cf8b8c8 0000000068d88604 0000000068d88604 ffff88007fd0f8b8
06:18:07:[19682.001870] Call Trace:
06:18:07:[19682.001870] [<ffffffff816a3d6d>] dump_stack+0x19/0x1b
06:18:07:[19682.001870] [<ffffffff8169dc54>] panic+0xe8/0x20d
06:18:07:[19682.001870] [<ffffffffc0681854>] lbug_with_loc+0x64/0xb0 [libcfs]
06:18:07:[19682.001870] [<ffffffffc11b59db>] mdt_object_local_lock+0xa4b/0xad0 [mdt]
06:18:07:[19682.001870] [<ffffffffc11b56db>] ? mdt_object_local_lock+0x74b/0xad0 [mdt]
06:18:07:[19682.001870] [<ffffffffc11a5f10>] ? mdt_obd_reconnect+0x1d0/0x1d0 [mdt]
06:18:07:[19682.001870] [<ffffffffc0e33770>] ? ldlm_expired_completion_wait+0x240/0x240 [ptlrpc]
06:18:07:[19682.001870] [<ffffffffc11b5ad0>] mdt_object_lock_internal+0x70/0x330 [mdt]
06:18:07:[19682.001870] [<ffffffffc11b5ad0>] ? mdt_object_lock_internal+0x70/0x330 [mdt]
06:18:07:[19682.001870] [<ffffffffc11b5db0>] mdt_object_lock+0x20/0x30 [mdt]
06:18:07:[19682.001870] [<ffffffffc11fa3f4>] mdt_lock_objects_in_linkea+0x748/0xa68 [mdt]
06:18:07:[19682.001870] [<ffffffffc11ca148>] mdt_reint_migrate_internal.isra.38+0x8c8/0x16d0 [mdt]
06:18:07:[19682.001870] [<ffffffffc0692002>] ? cfs_hash_bd_from_key+0x32/0xb0 [libcfs]
06:18:07:[19682.001870] [<ffffffffc0c41788>] ? lu_object_put+0x148/0x3d0 [obdclass]
06:18:07:[19682.001870] [<ffffffffc11cb1b5>] mdt_reint_rename_or_migrate.isra.39+0x265/0x860 [mdt]
06:18:07:[19682.001870] [<ffffffffc11c0181>] ? mdt_root_squash+0x21/0x430 [mdt]
06:18:07:[19682.001870] [<ffffffff8132c212>] ? strlcpy+0x42/0x60
06:18:07:[19682.001870] [<ffffffffc11cb7c0>] mdt_reint_migrate+0x10/0x20 [mdt]
06:18:07:[19682.001870] [<ffffffffc11cf790>] mdt_reint_rec+0x80/0x210 [mdt]
06:18:07:[19682.001870] [<ffffffffc11b131b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
06:18:07:[19682.001870] [<ffffffffc11bcda7>] mdt_reint+0x67/0x140 [mdt]
06:18:07:[19682.001870] [<ffffffffc0ecc225>] tgt_request_handle+0x925/0x1370 [ptlrpc]
06:18:07:[19682.001870] [<ffffffffc0e750c6>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
06:18:07:[19682.001870] [<ffffffff810ba588>] ? __wake_up_common+0x58/0x90
06:18:07:[19682.001870] [<ffffffffc0e78862>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
06:18:07:[19682.001870] [<ffffffff81029557>] ? __switch_to+0xd7/0x510
06:18:07:[19682.001870] [<ffffffff816a8f00>] ? __schedule+0x330/0x8b0
06:18:07:[19682.001870] [<ffffffffc0e77dd0>] ? ptlrpc_register_service+0xe80/0xe80 [ptlrpc]
06:18:07:[19682.001870] [<ffffffff810b098f>] kthread+0xcf/0xe0
06:18:07:[19682.001870] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
06:18:07:[19682.001870] [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
06:18:07:[19682.001870] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 |
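The failed assertion states an invariant: when mdt_lock_pdo_mode() runs, the lock handle's mlh_pdo_mode must still be LCK_MINMODE, i.e. no PDO mode has been chosen for that handle yet. In this trace it is reached on the migrate path (mdt_reint_migrate_internal() -> mdt_lock_objects_in_linkea() -> mdt_object_lock() -> mdt_object_local_lock()). One way such an invariant can trip is a handle being locked a second time without being re-initialised. The C sketch that follows is a hypothetical, simplified model of that invariant only: sketch_lock_handle, sketch_lock_handle_init() and sketch_lock_pdo_mode() are invented names, not Lustre APIs, and the sketch makes no claim about what patch 29597 actually changed.

```c
#include <stdbool.h>

/* Hypothetical, simplified lock modes; not Lustre's enum ldlm_mode. */
enum sketch_lock_mode {
	SKETCH_MINMODE = 0, /* plays the role of LCK_MINMODE: "not chosen yet" */
	SKETCH_PR,
	SKETCH_PW,
	SKETCH_CW,
};

/* Hypothetical stand-in for struct mdt_lock_handle (two fields only). */
struct sketch_lock_handle {
	enum sketch_lock_mode reg_mode; /* mode requested by the caller */
	enum sketch_lock_mode pdo_mode; /* parallel-directory-op mode */
};

/* Prepare a handle for a fresh lock request. */
static void sketch_lock_handle_init(struct sketch_lock_handle *lh,
				    enum sketch_lock_mode mode)
{
	lh->reg_mode = mode;
	lh->pdo_mode = SKETCH_MINMODE;
}

/*
 * Pick a PDO mode. Enforces the same precondition as the failed
 * ASSERTION(lh->mlh_pdo_mode == LCK_MINMODE), but returns false instead
 * of calling LBUG so the behaviour can be tested.
 */
static bool sketch_lock_pdo_mode(struct sketch_lock_handle *lh)
{
	if (lh->pdo_mode != SKETCH_MINMODE)
		return false; /* handle already used and never re-initialised */

	/* Placeholder choice; the real selection logic is not modelled here. */
	lh->pdo_mode = (lh->reg_mode == SKETCH_PW) ? SKETCH_CW : SKETCH_PR;
	return true;
}
```

Calling sketch_lock_pdo_mode() twice on the same handle returns false the second time (the point where the real code would LBUG); calling sketch_lock_handle_init() in between restores the invariant.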
| Comments |
| Comment by James Nunez (Inactive) [ 03/Oct/17 ] |
|
In the test_log, we see:

CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_27457
trevis-6vm4: Warning: Permanently added 'trevis-6vm4' (ECDSA) to the list of known hosts.
trevis-6vm4: Warning: Permanently added 'trevis-6vm3' (ECDSA) to the list of known hosts.
CMD: trevis-6vm4 /usr/sbin/lctl snapshot_list -F lustre 2>/dev/null
CMD: trevis-6vm4 /usr/sbin/lctl snapshot_list -F lustre 2>/dev/null
trevis-6vm4: Warning: Permanently added 'trevis-6vm8' (ECDSA) to the list of known hosts.
CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_29957
CMD: trevis-6vm4 /usr/sbin/lctl snapshot_list -F lustre 2>/dev/null
CMD: trevis-6vm4 /usr/sbin/lctl snapshot_destroy -F lustre -n lss_29957 -f
CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_12954
trevis-6vm4: Can't lock the snapshot config file /etc/ldev.conf (2): Resource temporarily unavailable
CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_2287
trevis-6vm4: Can't lock the snapshot config file /etc/ldev.conf (2): Resource temporarily unavailable
CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_14963
trevis-6vm4: Can't lock the snapshot config file /etc/ldev.conf (2): Resource temporarily unavailable
CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_12585
trevis-6vm4: Can't lock the snapshot config file /etc/ldev.conf (2): Resource temporarily unavailable
trevis-6vm4: Connection closed by UNKNOWN port 65535
…

Although racer test_1 hangs frequently during autotesting and often shows “Resource temporarily unavailable” messages in the test log, the LBUG has not appeared in the MDS logs of previous racer runs. |
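“Resource temporarily unavailable” is the strerror() text for EAGAIN on Linux, which is also the errno (EWOULDBLOCK, equal to EAGAIN there) that a non-blocking flock() reports when another open file already holds an exclusive lock. Assuming, without having checked the lctl sources, that snapshot_create takes such a non-blocking lock on its config file, overlapping snapshot commands issued by racer would be expected to fail in the way the log shows. The sketch uses only standard Linux calls (open(), flock(), close()); try_lock_config() is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper. Assumption: the snapshot tool serialises access to
 * its config file with a non-blocking exclusive flock(); this has not been
 * checked against the lctl sources.
 *
 * Returns an fd that holds the lock, or -1 with errno set. If another open
 * file description already holds the lock, errno is EWOULDBLOCK, which on
 * Linux equals EAGAIN ("Resource temporarily unavailable").
 */
static int try_lock_config(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
		int saved_errno = errno;
		close(fd);           /* close() may clobber errno */
		errno = saved_errno;
		return -1;
	}
	return fd; /* the lock is released when this descriptor is closed */
}
```

A second try_lock_config() on the same path fails with errno == EWOULDBLOCK until the descriptor returned by the first call is closed.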
| Comment by James Nunez (Inactive) [ 04/Oct/17 ] |
|
Hongchao,

Would you please comment on this issue?

James |
| Comment by Gerrit Updater [ 13/Oct/17 ] |
|
Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/29597 |
| Comment by Gerrit Updater [ 09/Nov/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29597/ |
| Comment by Minh Diep [ 09/Nov/17 ] |
|
Landed for 2.11 |