[LU-12304] replay-single test_62: 'unlinkmany /mnt/lustre/d62.replay-single/f62.replay-single failed'
Created: 15/May/19
Updated: 14/Jul/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.2
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Patrick Farrell (Inactive)
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Recent failure in 2.12.2 testing:

https://testing.whamcloud.com/test_sets/81a8b6dc-fdf0-11e8-b837-52540065bddc
CMD: onyx-30vm4 lctl set_param fail_loc=0
fail_loc=0
unlink(/mnt/lustre/d62.replay-single/f62.replay-single-0) error: No such file or directory
total: 0 unlinks in 0 seconds: -nan unlinks/second
replay-single test_62: @@@@@@ FAIL: unlinkmany /mnt/lustre/d62.replay-single/f62.replay-single failed

Earlier hit, erroneously attached to LU-11762:

https://testing.whamcloud.com/test_sets/81a8b6dc-fdf0-11e8-b837-52540065bddc



 Comments   
Comment by Patrick Farrell (Inactive) [ 15/May/19 ]

jamesanunez highlighted this slightly scary log snippet:

"We're seeing something similar with replay-single test 62 for ldiskfs/DNE for 2.12.2 RC1 at https://testing.whamcloud.com/test_sets/78994818-753c-11e9-a6f9-52540065bddc . We see the following in the client 2 dmesg"

[64633.042303] Lustre: DEBUG MARKER: == replay-single test 0d: expired recovery with no clients =========================================== 09:24:46 (1557653086)
[64633.892025] Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname)
[64634.219536] Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi
[64647.402309] LustreError: 166-1: MGC10.2.4.96@tcp: Connection to MGS (at 10.2.4.96@tcp) was lost; in progress operations using this service will fail
[64655.316668] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-30vm4.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[64655.543605] Lustre: DEBUG MARKER: onyx-30vm4.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[64657.419063] Lustre: Evicted from MGS (at 10.2.4.96@tcp) after server handle changed from 0x4e283717d3799726 to 0x4e283717d3799d54
[64657.421383] LustreError: 17190:0:(import.c:1267:ptlrpc_connect_interpret()) lustre-MDT0000_UUID went back in time (transno 4295093706 was previously committed, server now claims 4295093699)!  See https://bugzilla.lustre.org/show_bug.cgi?id=9646
[64837.930298] LustreError: 11-0: lustre-MDT0000-mdc-ffff91839c315000: operation mds_reint to node 10.2.4.96@tcp failed: rc = -107
[64842.704776] LustreError: 167-0: lustre-MDT0000-mdc-ffff91839c315000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[64843.836846] Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null 
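
To make the "went back in time" line in the dmesg excerpt above easier to spot in other client logs, here is a minimal Python sketch that extracts the two transaction numbers and reports the gap between them. The function name and the regular expression are illustrative assumptions based only on the message format shown in this ticket. They are not part of Lustre or its test framework, and the message wording may differ between Lustre versions.

```python
import re

# Assumed message format, copied from the dmesg line quoted in this ticket.
# Other Lustre versions may word this message differently.
WENT_BACK = re.compile(
    r"(?P<target>\S+) went back in time \(transno (?P<committed>\d+) "
    r"was previously committed, server now claims (?P<claimed>\d+)\)"
)


def find_transno_regressions(lines):
    """Return (target, committed, claimed, gap) for each matching log line.

    gap is committed - claimed: how many transaction numbers the server
    appears to have moved backwards by, according to the client.
    """
    results = []
    for line in lines:
        match = WENT_BACK.search(line)
        if match:
            committed = int(match.group("committed"))
            claimed = int(match.group("claimed"))
            results.append(
                (match.group("target"), committed, claimed, committed - claimed)
            )
    return results


# The line from the client 2 dmesg quoted above.
sample = (
    "[64657.421383] LustreError: 17190:0:(import.c:1267:ptlrpc_connect_interpret()) "
    "lustre-MDT0000_UUID went back in time (transno 4295093706 was previously "
    "committed, server now claims 4295093699)!  "
    "See https://bugzilla.lustre.org/show_bug.cgi?id=9646"
)

for target, committed, claimed, gap in find_transno_regressions([sample]):
    print(f"{target}: committed={committed} claimed={claimed} gap={gap}")
```

For the line in this ticket the reported gap is 7 (4295093706 minus 4295093699). Whether that gap is related to the unlinkmany failure in test_62 is not established here.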