The issue was hit with Lustre code that does not have LU-15880 landed. That code contains an incorrect integer conversion when tgt_cb_last_committed() calls dt_reserve_or_free_quota():
dt_reserve_or_free_quota(&temp_env, th->th_dev,
                         th->th_reserved_quota.qrr_type,
                         th->th_reserved_quota.qrr_id.qid_uid,
                         th->th_reserved_quota.qrr_id.qid_gid,
                         -th->th_reserved_quota.qrr_count,
                         false);
The "count" parameter in the dt_reserve_or_free_quota() prototype is only a 4-byte "int", while it is used to pass a 64-bit value down the call stack. When the amount of quota space to be released is large enough, the sign bit of the value can be lost. The underlying functions use the sign of the passed value to determine whether the operation is a quota "reservation" or "freeing". Eventually, in qsd_reserve_or_free_quota(), this causes the wrong function to be called at transaction commit time: qsd_op_begin0() instead of qsd_op_end0().
Here are the values from a crash dump:
LU-15880 fixes that by getting rid of the "int count" parameter in dt_reserve_or_free_quota(), so the issue should not be reproducible with the LU-15880 patches in place.
Closing as a dup of LU-15880.