[LU-13804] LustreError (osd_index.c:1201:osd_dir_delete()) lquake-MDT0000: failed to destroy agent object (0) for the entry data049: rc = -22 Created: 20/Jul/20  Updated: 20/Oct/23  Resolved: 22/Jul/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Olaf Faaland Assignee: Lai Siyao
Resolution: Not a Bug Votes: 0
Labels: llnl
Environment:

Lustre 2.10.8 and 2.12.5
RHEL 7.8


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The jet1 console log reported the following during an I/O SWL (plus a few other miscellaneous jobs) on Opal:

Jul 15 09:51:54 jet1 kernel: LustreError: 1391:0:(osd_index.c:1201:osd_dir_delete()) lquake-MDT0000: failed to destroy agent object (0) for the entry data049: rc = -22
Jul 15 09:51:54 jet1 kernel: LustreError: 1391:0:(osd_index.c:1201:osd_dir_delete()) Skipped 1 previous similar message
Jul 15 09:51:54 jet1 kernel: LustreError: 17478:0:(osd_index.c:1201:osd_dir_delete()) lquake-MDT0000: failed to destroy agent object (0) for the entry data026: rc = -22
Jul 15 09:51:54 jet1 kernel: LustreError: 17478:0:(osd_index.c:1201:osd_dir_delete()) Skipped 75 previous similar messages
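When triaging messages like these, it can help to tally which entries were affected, which error code was returned, and how many messages were suppressed. The following is a minimal sketch, not a Lustre tool. The regular expressions were written against the sample lines above only, and the names summarize, MSG_RE, SKIP_RE, and SAMPLE are hypothetical. It assumes rc is a negated errno value, as is conventional for kernel return codes, so rc = -22 decodes to EINVAL.

```python
import errno
import re
from collections import Counter

# Patterns written against the sample console lines in this ticket only;
# other Lustre versions may format the message differently.
MSG_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\(osd_index\.c:\d+:osd_dir_delete\(\)\) "
    r"(?P<target>\S+): failed to destroy agent object \(\d+\) "
    r"for the entry (?P<entry>\S+): rc = (?P<rc>-?\d+)"
)
SKIP_RE = re.compile(
    r"\(osd_index\.c:\d+:osd_dir_delete\(\)\) "
    r"Skipped (?P<n>\d+) previous similar messages?"
)


def summarize(lines):
    """Return (entries, rc_counts, skipped) for osd_dir_delete() messages."""
    entries, rc_counts, skipped = [], Counter(), 0
    for line in lines:
        m = MSG_RE.search(line)
        if m:
            entries.append((m["target"], m["entry"]))
            rc = int(m["rc"])
            # Assumption: rc is a negated errno value (e.g. -22 -> EINVAL).
            rc_counts[errno.errorcode.get(-rc, str(rc))] += 1
            continue
        s = SKIP_RE.search(line)
        if s:
            # Messages the console reported as skipped rather than printed.
            skipped += int(s["n"])
    return entries, rc_counts, skipped


# Two of the lines from the console log above, used as sample input.
SAMPLE = [
    "Jul 15 09:51:54 jet1 kernel: LustreError: 1391:0:(osd_index.c:1201:osd_dir_delete()) "
    "lquake-MDT0000: failed to destroy agent object (0) for the entry data049: rc = -22",
    "Jul 15 09:51:54 jet1 kernel: LustreError: 17478:0:(osd_index.c:1201:osd_dir_delete()) "
    "Skipped 75 previous similar messages",
]

if __name__ == "__main__":
    entries, rc_counts, skipped = summarize(SAMPLE)
    print(entries, dict(rc_counts), skipped)
```

Feeding the full console log to summarize() instead of SAMPLE would give a per-entry list that could be compared against the directories known to have been created under 2.10.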

No console log messages on Opal appeared to correspond: there were none at 09:51, and no unusual messages at all.

Testing under both Lustre 2.10 and Lustre 2.12 included creating striped directories via lfs mkdir -i3 -c4 <target>. Some of these directories were likely created under 2.10 and deleted under 2.12.

Before this occurred, the jet servers had been upgraded from Lustre 2.10 to Lustre 2.12, then downgraded to 2.10, and then upgraded to 2.12 again. Significant I/O was performed between each version change, and changelog users were deregistered and the logs cleared.



 Comments   
Comment by Olaf Faaland [ 20/Jul/20 ]

The osd_dir_delete() CERROR was added after 2.10 branched. The commit is

  • c0a455e LU-10190 osd-zfs: create agent object for remote object

Based on the commit message, I believe it's possible that creating remote objects under Lustre 2.10 and then deleting them under 2.12 would trigger this error message.

Reaching this error message does not result in osd_dir_delete() returning early or returning an error.

So this seems harmless (i.e., it does not indicate damage to the file system), and it also seems to be explained by my downgrade-then-upgrade sequence.
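To make the reasoning above concrete, here is a toy model of the "log and continue" control flow being described. It is not the Lustre source and does not reproduce osd_dir_delete(). The functions, dictionary keys, and the assumption that an entry created under 2.10 simply has no agent object to destroy are all hypothetical illustrations.

```python
import logging

log = logging.getLogger("toy_osd")


def destroy_agent_object(entry):
    """Hypothetical stand-in: returns -22 (-EINVAL) when no agent object exists."""
    return 0 if entry.get("has_agent") else -22


def dir_delete(directory, name):
    """Toy model: remove `name`; a failed agent cleanup is logged, not returned."""
    entry = directory[name]
    if entry.get("remote"):
        rc = destroy_agent_object(entry)
        if rc != 0:
            # The failure is only reported; there is no early return here.
            log.error(
                "failed to destroy agent object for the entry %s: rc = %d",
                name, rc,
            )
    # The directory entry is removed regardless of the cleanup result.
    del directory[name]
    return 0


if __name__ == "__main__":
    # Hypothetical remote entry created before agent objects existed.
    directory = {"data049": {"remote": True, "has_agent": False}}
    rc = dir_delete(directory, "data049")
    print(rc, directory)
```

In this model the caller sees success and the entry is gone, while the console still shows an error, which matches the behavior described above.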

Please confirm, thanks.

Comment by Olaf Faaland [ 20/Jul/20 ]

Note that this was performed on our staging file system, not a production one, but we are about to put Lustre 2.12.5 on all our file systems, so I'm working through error messages to minimize risk.

Comment by Peter Jones [ 21/Jul/20 ]

Lai

Could you please advise?

Thanks

Peter

Comment by Lai Siyao [ 22/Jul/20 ]

Yes, it's exactly what you described.

Comment by Olaf Faaland [ 22/Jul/20 ]

Thank you
