[LU-10293] racer test_1: ASSERTION( list_empty(&lock->l_bl_ast) ) failed
Created: 28/Nov/17
Updated: 22/Feb/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.2
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9997 Suspicious assert check in ldlm_cli_c... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/1603c320-d468-11e7-8027-52540065bddc.

The sub-test test_1 failed with the following error:

Timeout occurred after 659 mins, last suite running was racer, restarting cluster to continue tests

MDS 2/4 console

[35958.074210] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == racer test 1: racer on clients: onyx-37vm1,onyx-37vm2 DURATION=900 ================================ 20:00:57 \(1511841657\)
[35958.119902] Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-37vm1,onyx-37vm2 DURATION=900 ================================ 20:00:57 (1511841657)
[36006.265976] Lustre: lustre-MDT0003: Client 4f4d252f-b32f-6403-c049-6cb9c41dce2f (at 10.2.8.81@tcp) reconnecting
[36161.131025] Lustre: lustre-MDT0001: Client 130c094c-47a0-0fa7-2a3e-e6aa71c19c09 (at 10.2.8.81@tcp) reconnecting
[36161.131036] Lustre: Skipped 1 previous similar message
[36223.548420] Lustre: lustre-MDT0001: Client 130c094c-47a0-0fa7-2a3e-e6aa71c19c09 (at 10.2.8.81@tcp) reconnecting
[36223.548434] Lustre: Skipped 1 previous similar message
[36294.232151] Lustre: lustre-MDT0001: Client 130c094c-47a0-0fa7-2a3e-e6aa71c19c09 (at 10.2.8.81@tcp) reconnecting
[36294.232154] Lustre: Skipped 1 previous similar message
[36352.302688] Lustre: lustre-MDT0003: Client 130c094c-47a0-0fa7-2a3e-e6aa71c19c09 (at 10.2.8.81@tcp) reconnecting
[36428.713708] Lustre: lustre-MDT0003: Client 130c094c-47a0-0fa7-2a3e-e6aa71c19c09 (at 10.2.8.81@tcp) reconnecting
[36428.713714] Lustre: Skipped 1 previous similar message
[36458.255367] Lustre: lustre-MDT0003: Client 4f4d252f-b32f-6403-c049-6cb9c41dce2f (at 10.2.8.81@tcp) reconnecting
[36487.286869] LustreError: 22735:0:(ldlm_request.c:1391:ldlm_cli_cancel()) ASSERTION( list_empty(&lock->l_bl_ast) ) failed: 
[36487.288485] LustreError: 22735:0:(ldlm_request.c:1391:ldlm_cli_cancel()) LBUG
[36487.289339] Pid: 22735, comm: mdt00_002
[36487.290026] 
[36487.290026] Call Trace:
[36487.291042]  [<ffffffff81019b19>] dump_trace+0x59/0x310
[36487.292134]  [<ffffffffa07326ca>] libcfs_call_trace+0x4a/0x60 [libcfs]
[36487.293012]  [<ffffffffa0732741>] lbug_with_loc+0x41/0xa0 [libcfs]
[36487.294205]  [<ffffffffa0bac704>] ldlm_cli_cancel+0x354/0x370 [ptlrpc]
[36487.295346]  [<ffffffffa102b5f6>] mdt_remote_blocking_ast+0x156/0x540 [mdt]
[36487.295393]  [<ffffffffa0bb7385>] ldlm_handle_bl_callback+0xc5/0x3e0 [ptlrpc]
[36487.295479]  [<ffffffffa0b8be36>] ldlm_lock_decref_internal+0x186/0x770 [ptlrpc]
[36487.295519]  [<ffffffffa0b8c509>] ldlm_lock_decref_and_cancel+0x79/0x140 [ptlrpc]
[36487.295532]  [<ffffffffa1038ccf>] mdt_object_unlock+0xdf/0x390 [mdt]
[36487.295547]  [<ffffffffa103ce92>] mdt_object_unlock_put+0x12/0x100 [mdt]
[36487.295560]  [<ffffffffa107b771>] mdt_lock_objects_in_linkea+0x880/0x9cf [mdt]
[36487.295574]  [<ffffffffa104d15c>] mdt_reint_migrate_internal.isra.38+0x
[    3.010940] RPC: Registered named UNIX socket transport module.
[    3.010941] RPC: Registered udp transport module.
[    3.010942] RPC: Registered tcp transport module.
[    3.010942] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    3.164356] ACPI: bus type USB registered
[    3.164383] usbcore: registered new interface driver usbfs
[    3.164391] usbcore: registered new interface driver hub
[    3.164401] usbcore: registered new device driver usb
[    3.167321] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    3.176228] virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
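
When triaging further occurrences of this failure, it can help to pull the assertion site out of a saved console log automatically. The following is a minimal sketch, assuming the LustreError line layout seen in the console output above (timestamp, pid, source file, line number, function). The names `LUSTRE_ERR`, `find_assertions`, and `SAMPLE` are hypothetical and are not part of Lustre or Maloo.

```python
import re

# Assumed layout, taken from the console lines in this ticket:
# [timestamp] LustreError: pid:0:(file:line:function()) message
LUSTRE_ERR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:\s+"
    r"(?P<pid>\d+):\d+:\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<msg>.*)"
)


def find_assertions(console_text):
    """Return one dict per LustreError line that reports a failed ASSERTION."""
    hits = []
    for raw in console_text.splitlines():
        m = LUSTRE_ERR.search(raw)
        # Skip non-matching lines and LustreError lines without an assertion (e.g. LBUG).
        if m and "ASSERTION(" in m.group("msg"):
            hits.append({
                "timestamp": float(m.group("ts")),
                "pid": int(m.group("pid")),
                "file": m.group("file"),
                "line": int(m.group("line")),
                "function": m.group("func"),
                "message": m.group("msg").strip(),
            })
    return hits


# Three lines copied from the MDS console output in this ticket.
SAMPLE = """\
[36458.255367] Lustre: lustre-MDT0003: Client 4f4d252f-b32f-6403-c049-6cb9c41dce2f (at 10.2.8.81@tcp) reconnecting
[36487.286869] LustreError: 22735:0:(ldlm_request.c:1391:ldlm_cli_cancel()) ASSERTION( list_empty(&lock->l_bl_ast) ) failed:
[36487.288485] LustreError: 22735:0:(ldlm_request.c:1391:ldlm_cli_cancel()) LBUG
"""

if __name__ == "__main__":
    for hit in find_assertions(SAMPLE):
        print(hit)
```

Grouping the results by file, line, and function across several console logs is one way to check whether a new racer failure hits the same assertion in ldlm_cli_cancel() or a different one.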


 Comments   
Comment by Sarah Liu [ 28/Nov/17 ]

Client and server: 2.10.2 RC1, SLES12 SP3, DNE

Comment by Jinshan Xiong (Inactive) [ 28/Nov/17 ]

This problem should have been fixed in the FLR branch by https://review.whamcloud.com/29080

Comment by Bob Glossman (Inactive) [ 07/Dec/17 ]

Another occurrence on b2_10 with SLES12 SP2:
https://testing.hpdd.intel.com/test_sets/883161dc-db9d-11e7-9c63-52540065bddc

Generated at Sat Feb 10 02:33:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.