[LU-5129] CLONE - failed to update accounting ZAP for user Created: 01/Jun/14  Updated: 02/Jun/14  Resolved: 02/Jun/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Christopher Morrone Assignee: Niu Yawei (Inactive)
Resolution: Duplicate Votes: 0
Labels: mn4, mq213
Environment:

Lustre 2.4.0-19chaos


Attachments: Text File 0001-LU-4345-quota-debug-patch.patch     Text File debug_patch-v1.patch    
Issue Links:
Related
is related to LU-4345 failed to update accounting ZAP for user Resolved
is related to LU-4504 User quota problem after Lustre upgra... Resolved
Severity: 3
Rank (Obsolete): 14154

 Description   

We are using lustre 2.4.0-19chaos on our servers running with the ZFS OSD. On some of the OSS nodes we are seeing messages like this:

Nov  6 00:06:29 stout8 kernel: LustreError: 14909:0:(osd_object.c:973:osd_attr_set()) fsrzb-OST0007: failed to update accounting ZAP for user 132245 (-2)
Nov  6 00:06:29 stout8 kernel: LustreError: 14909:0:(osd_object.c:973:osd_attr_set()) Skipped 5 previous similar messages
Nov  6 00:06:38 stout16 kernel: LustreError: 15266:0:(osd_object.c:973:osd_attr_set()) fsrzb-OST000f: failed to update accounting ZAP for user 122392 (-2)
Nov  6 00:06:38 stout16 kernel: LustreError: 15266:0:(osd_object.c:973:osd_attr_set()) Skipped 3 previous similar messages
Nov  6 00:06:40 stout12 kernel: LustreError: 15801:0:(osd_object.c:973:osd_attr_set()) fsrzb-OST000b: failed to update accounting ZAP for user 122708 (-2)
Nov  6 00:06:40 stout12 kernel: LustreError: 15801:0:(osd_object.c:973:osd_attr_set()) Skipped 4 previous similar messages
Nov  7 00:31:36 porter31 kernel: LustreError: 7704:0:(osd_object.c:973:osd_attr_set()) lse-OST001f: failed to update accounting ZAP for user 54916 (-2)
Nov  7 02:53:05 porter19 kernel: LustreError: 9380:0:(osd_object.c:973:osd_attr_set()) lse-OST0013: failed to update accounting ZAP for user 7230 (-2)
Dec  3 12:01:21 stout7 kernel: Lustre: Skipped 3 previous similar messages
Dec  3 13:52:30 stout4 kernel: LustreError: 15806:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST0003: failed to update accounting ZAP for user 1752876224 (-2)
Dec  3 13:52:30 stout4 kernel: LustreError: 15806:0:(osd_object.c:967:osd_attr_set()) Skipped 3 previous similar messages
Dec  3 13:52:30 stout1 kernel: LustreError: 15324:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST0000: failed to update accounting ZAP for user 1752876224 (-2)
Dec  3 13:52:30 stout1 kernel: LustreError: 15784:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST0000: failed to update accounting ZAP for user 1752876224 (-2)
Dec  3 13:52:30 stout14 kernel: LustreError: 16345:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST000d: failed to update accounting ZAP for user 1752876224 (-2)
Dec  3 13:52:30 stout12 kernel: LustreError: 32355:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST000b: failed to update accounting ZAP for user 1752876224 (-2)
Dec  3 13:52:30 stout2 kernel: LustreError: 15145:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST0001: failed to update accounting ZAP for user 1752876224 (-2)
Dec  3 13:52:30 stout10 kernel: LustreError: 14570:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST0009: failed to update accounting ZAP for user 1752876224 (-2)

First of all, these messages are terrible. If you look at osd_attr_set() there are four exactly identical messages that are printed. Ok, granted, we can look them up by line number. But even better would be to make them unique.

So looking them up by line numbers 967 and 973, it would appear that we have hit at least the first two of the "filed to update accounting ZAP for user" messages.

Note that the UID numbers do not look correct to me. Many of them are clearly not in the valid UID range. But then I don't completely understand what is going on here yet.



 Comments   
Comment by Jodi Levi (Inactive) [ 01/Jun/14 ]

This ticket will track the follow on work for this issue that will be addressed in 2.7

Comment by Andreas Dilger [ 02/Jun/14 ]

In particular the patch http://review.whamcloud.com/7157 needs to be refreshed.

Comment by Andreas Dilger [ 02/Jun/14 ]

Sorry, I didn't look closely enough. 7157 is tracked under LU-2600.

Generated at Sat Feb 10 01:48:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.