  1. Lustre
  2. LU-16780

zfs's osd_sync() doesn't wait for commit callbacks


Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      The ZFS osd_sync() (implementing dt_sync()) can return before all related commit callbacks have been processed. This results in an inconsistent quota state: quota "usage" (read in lquota_disk_read()) already reflects the actual on-disk number, but "pending" is out of date because it is only updated from the commit callback. Space that has already reached disk can therefore be counted in both "usage" and "pending", and
      qsd_acquire_local() eventually returns EDQUOT:

      	/* use latest usage */
      	usage = lqe->lqe_usage;
      	/* take pending write into account */
      	usage += lqe->lqe_pending_write;
      	if (space + usage <= lqe->lqe_granted - lqe->lqe_pending_rel) {
      		lqe->lqe_pending_write += space;
      		lqe->lqe_waiting_write -= space;
      		rc = 0;
      	} else if (lqe->lqe_edquot &&
      		   (lqe->lqe_edquot_time > ktime_get_seconds() - 5)) {
      		rc = -EDQUOT;
      	} else {
      		rc = -EAGAIN;
      	}
      

      This is a snippet from the log confirming the problem:

      00040000:04000000:1.0:1682597449.976673:0:27241:0:(qsd_entry.c:253:qsd_refresh_usage()) $$$ disk usage: 0  qsd:lustre-MDT0001 qtype:usr id:60000 enforced:1 granted: 1024 pending:952 waiting:1 req:0 usage: 0 qunit:1024 qtune:512 edquot:1 default:no
      00040000:04000000:1.0:1682597449.994977:0:7285:0:(qsd_entry.c:253:qsd_refresh_usage()) $$$ disk usage: 219  qsd:lustre-MDT0001 qtype:usr id:60000 enforced:1 granted: 1024 pending:953 waiting:1 req:0 usage: 219 qunit:1024 qtune:512 edquot:1 default:no
      00040000:04000000:1.0:1682597450.084402:0:6415:0:(qsd_entry.c:253:qsd_refresh_usage()) $$$ disk usage: 879  qsd:lustre-MDT0001 qtype:usr id:60000 enforced:1 granted: 1024 pending:299 waiting:1 req:0 usage: 879 qunit:1024 qtune:512 edquot:1 default:no
      00040000:04000000:1.0:1682597450.094358:0:6415:0:(qsd_entry.c:253:qsd_refresh_usage()) $$$ disk usage: 879  qsd:lustre-MDT0001 qtype:usr id:60000 enforced:1 granted: 1024 pending:74 waiting:1 req:0 usage: 879 qunit:1024 qtune:512 edquot:1 default:no
      00040000:04000000:1.0:1682597450.186265:0:7285:0:(qsd_entry.c:253:qsd_refresh_usage()) $$$ disk usage: 953  qsd:lustre-MDT0001 qtype:usr id:60000 enforced:1 granted: 1024 pending:74 waiting:1 req:0 usage: 953 qunit:1024 qtune:512 edquot:1 default:no
      ...
      00040000:04000000:1.0:1682597450.186948:0:7285:0:(qsd_handler.c:774:qsd_op_begin0()) $$$ acquire quota failed:-122  qsd:lustre-MDT0001 qtype:usr id:60000 enforced:1 granted: 1024 pending:74 waiting:1 req:0 usage: 953 qunit:1024 qtune:512 edquot:1 default:no
      00040000:00000001:1.0:1682597450.186950:0:7285:0:(qsd_handler.c:830:qsd_op_begin0()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      ...
      00040000:04000000:1.0:1682597450.310321:0:6415:0:(qsd_entry.c:253:qsd_refresh_usage()) $$$ disk usage: 953  qsd:lustre-MDT0001 qtype:usr id:60000 enforced:1 granted: 1024 pending:0 waiting:0 req:0 usage: 953 qunit:1024 qtune:512 edquot:1 default:no
      

          People

            wc-triage WC Triage
            bzzz Alex Zhuravlev