Details
- Bug
- Resolution: Fixed
- Critical
- None
- Lustre 2.4.0-19chaos
- 3
- 11907
Description
We are using lustre 2.4.0-19chaos on our servers running with the ZFS OSD. On some of the OSS nodes we are seeing messages like this:
Nov 6 00:06:29 stout8 kernel: LustreError: 14909:0:(osd_object.c:973:osd_attr_set()) fsrzb-OST0007: failed to update accounting ZAP for user 132245 (-2)
Nov 6 00:06:29 stout8 kernel: LustreError: 14909:0:(osd_object.c:973:osd_attr_set()) Skipped 5 previous similar messages
Nov 6 00:06:38 stout16 kernel: LustreError: 15266:0:(osd_object.c:973:osd_attr_set()) fsrzb-OST000f: failed to update accounting ZAP for user 122392 (-2)
Nov 6 00:06:38 stout16 kernel: LustreError: 15266:0:(osd_object.c:973:osd_attr_set()) Skipped 3 previous similar messages
Nov 6 00:06:40 stout12 kernel: LustreError: 15801:0:(osd_object.c:973:osd_attr_set()) fsrzb-OST000b: failed to update accounting ZAP for user 122708 (-2)
Nov 6 00:06:40 stout12 kernel: LustreError: 15801:0:(osd_object.c:973:osd_attr_set()) Skipped 4 previous similar messages
Nov 7 00:31:36 porter31 kernel: LustreError: 7704:0:(osd_object.c:973:osd_attr_set()) lse-OST001f: failed to update accounting ZAP for user 54916 (-2)
Nov 7 02:53:05 porter19 kernel: LustreError: 9380:0:(osd_object.c:973:osd_attr_set()) lse-OST0013: failed to update accounting ZAP for user 7230 (-2)
Dec 3 12:01:21 stout7 kernel: Lustre: Skipped 3 previous similar messages
Dec 3 13:52:30 stout4 kernel: LustreError: 15806:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST0003: failed to update accounting ZAP for user 1752876224 (-2)
Dec 3 13:52:30 stout4 kernel: LustreError: 15806:0:(osd_object.c:967:osd_attr_set()) Skipped 3 previous similar messages
Dec 3 13:52:30 stout1 kernel: LustreError: 15324:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST0000: failed to update accounting ZAP for user 1752876224 (-2)
Dec 3 13:52:30 stout1 kernel: LustreError: 15784:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST0000: failed to update accounting ZAP for user 1752876224 (-2)
Dec 3 13:52:30 stout14 kernel: LustreError: 16345:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST000d: failed to update accounting ZAP for user 1752876224 (-2)
Dec 3 13:52:30 stout12 kernel: LustreError: 32355:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST000b: failed to update accounting ZAP for user 1752876224 (-2)
Dec 3 13:52:30 stout2 kernel: LustreError: 15145:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST0001: failed to update accounting ZAP for user 1752876224 (-2)
Dec 3 13:52:30 stout10 kernel: LustreError: 14570:0:(osd_object.c:967:osd_attr_set()) fsrzb-OST0009: failed to update accounting ZAP for user 1752876224 (-2)
First of all, these messages are terrible. If you look at osd_attr_set(), there are four identical messages that can be printed. OK, granted, we can look them up by line number, but it would be even better to make them unique.
So, looking them up by line numbers 967 and 973, it would appear that we have hit at least the first two of the "failed to update accounting ZAP for user" messages.
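As a rough illustration of what "unique" could look like, here is a minimal, self-contained sketch that adds a short step label to the message so that each call site can be told apart without looking up a line number. The helper name acct_err_msg() and the step labels are hypothetical and are not taken from the Lustre source; the real code would go through Lustre's own error-logging macro rather than snprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format an accounting-ZAP error with a "step" label
 * that identifies which update failed. The label strings are chosen by the
 * caller; nothing here reflects the actual layout of osd_attr_set().
 */
static int acct_err_msg(char *buf, size_t len, const char *dev,
                        const char *step, unsigned long id, int rc)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "%s: failed to update accounting ZAP (%s) for user %lu (%d)",
                    dev, step, id, rc);
}
```

Calling acct_err_msg(buf, sizeof(buf), "fsrzb-OST0007", "old owner", 132245UL, -2) would then yield a line that names the failing step directly, and a second call site could pass a different label such as "new owner".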
Note that the UID numbers do not look correct to me. Many of them are clearly not in the valid UID range. But then I don't completely understand what is going on here yet.
Ahh OK. I remember now.
It looks to me like you're failing to call dmu_tx_commit() after dsl_sync_task_nowait(). The commit is responsible for dropping all the dnode holds and notifying any waiters. Without it the holds are just going to accumulate and I'd expect to see exactly what you're describing.
My suggestion would be to take a good look at the spa_history_log() function in zfs. It's a fairly nice example of how to go about this. In that case they create a tx per history update, since there aren't that many of them. In the Lustre case I agree it's probably a good idea to batch them as you're doing. However, in the commit callback I'd still strongly suggest allocating a dedicated tx for this purpose. I think it would make the code more readable and make it easier to verify that you've constructed the tx properly. There's a nice comment in include/sys/dmu.h describing what you can and cannot do when constructing a tx. If you break any of those rules you're likely to see some strange problems.
The racy comment was just a subjective feeling I had about the code. There's so much non-private state you're depending on that it's hard to look at the code and convince yourself it's safe. A good example of this is the reuse of the ot_tx. It's hard to know exactly what state the tx is in when you enter the function, and depending on what state that is, there are certain things you must not do. If you were to create a new tx here it would be much clearer.
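To make the point about commit and dedicated transactions concrete, here is a toy model in plain C. It is only a sketch: struct mock_tx and the mock_tx_*() functions are hypothetical stand-ins and not the real DMU API, and the only rules modelled are the ones described above (holds are taken before the tx is assigned, and commit is what drops them). It shows how a tx that is created locally has a known state at every step, and how skipping the commit leaves holds outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a transaction; NOT the real dmu_tx_t. */
struct mock_tx {
    int  holds;      /* holds taken by this tx and not yet dropped */
    bool assigned;   /* true once the tx has been assigned */
    bool committed;  /* true once the tx has been committed */
};

/* Global count of holds that nobody has dropped yet. */
static int mock_outstanding_holds;

static void mock_tx_create(struct mock_tx *tx)
{
    tx->holds = 0;
    tx->assigned = false;
    tx->committed = false;
}

/* Modelled rule: holds may only be added before the tx is assigned. */
static void mock_tx_hold(struct mock_tx *tx)
{
    assert(!tx->assigned);
    tx->holds++;
    mock_outstanding_holds++;
}

static void mock_tx_assign(struct mock_tx *tx)
{
    assert(!tx->assigned);
    tx->assigned = true;
}

/* Commit drops every hold taken by this tx. */
static void mock_tx_commit(struct mock_tx *tx)
{
    assert(tx->assigned && !tx->committed);
    mock_outstanding_holds -= tx->holds;
    tx->holds = 0;
    tx->committed = true;
}

/*
 * One accounting update with its own, freshly created tx. Because the tx
 * is local, its state on entry is never in doubt. The flag exists only to
 * demonstrate what happens when the commit is forgotten.
 */
static void acct_update_dedicated(bool remember_commit)
{
    struct mock_tx tx;

    mock_tx_create(&tx);
    mock_tx_hold(&tx);
    mock_tx_assign(&tx);
    /* ... the deferred work would be queued here ... */
    if (remember_commit)
        mock_tx_commit(&tx);
}
```

With the commit in place, mock_outstanding_holds returns to zero after each update; without it, the count grows by one per call, which is the accumulation described above.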