Lustre / LU-10740

replay-single test_2d: FAIL: checkstat -v /mnt/lustre/d2d.replay-single check failed

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0
    • Severity: 3

    Description

      2.7 (IEEL)-based server, 2.10.58 client

      == replay-single test 2d: setdirstripe replay ======================================================== 00:25:51 (1519604751)
      UUID                   1K-blocks        Used   Available Use% Mounted on
      lustre-MDT0000_UUID       344152        2752      317200   1% /mnt/lustre[MDT:0]
      lustre-MDT0001_UUID       344152        2492      317460   1% /mnt/lustre[MDT:1]
      lustre-OST0000_UUID       664400       17712      597688   3% /mnt/lustre[OST:0]
      lustre-OST0001_UUID       664400       17712      597688   3% /mnt/lustre[OST:1]
      
      filesystem_summary:      1328800       35424     1195376   3% /mnt/lustre
      
      Failing mds1 on fre1209
      Stopping /mnt/lustre-mds1 (opts:) on fre1209
      pdsh@fre1211: fre1209: ssh exited with exit code 1
      reboot facets: mds1
      Failover mds1 to fre1209
      00:26:02 (1519604762) waiting for fre1209 network 900 secs ...
      00:26:02 (1519604762) network interface is UP
      mount facets: mds1
      Starting mds1: -o rw,user_xattr  /dev/vdb /mnt/lustre-mds1
      pdsh@fre1211: fre1209: ssh exited with exit code 1
      pdsh@fre1211: fre1209: ssh exited with exit code 1
      Started lustre-MDT0000
      fre1212: fre1212: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
      fre1211: fre1211: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
      fre1212: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
      fre1211: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
      Can't lstat /mnt/lustre/d2d.replay-single: No such file or directory
       replay-single test_2d: @@@@@@ FAIL: checkstat -v  /mnt/lustre/d2d.replay-single check failed 
       
      

      100% reproducible with review-dne-zfs-part-4

    Attachments

    Issue Links

    Activity


            bzzz Alex Zhuravlev added a comment -

            I think this is a dup of LU-10143

            bzzz Alex Zhuravlev added a comment -

            adilger, sorry, I missed your question. I'm using a ramdisk.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33106/
            Subject: LU-10740 tests: disable tests for replay-dne-zfs-part-4
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 16e92e2d01a71c2a97cae89c70c58abf409c12c0

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33106
            Subject: LU-10740 tests: disable tests for replay-dne-zfs-part-4
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a7675ca33ec579525b4d39ad114b54ca3b462a62

            adilger Andreas Dilger added a comment -

            Alex, is your testing running with an HDD, SSD, or ramdisk? On an HDD I can't see how a TXG commit could ever complete in less than 4x the seek time (just for the überblock writes, 2x each at the start and end of the device), i.e. roughly 4 × 8-10 ms ≈ 32-40 ms.

            bzzz Alex Zhuravlev added a comment - edited

            the patch I used:

            @@ -276,6 +276,17 @@ static int zfs_osd_mntdev_seq_show(struct seq_file *m, void *data)
             }
             LPROC_SEQ_FOPS_RO(zfs_osd_mntdev);
             
            +struct osd_sync_cb_data {
            +	ktime_t time;
            +};
            +
            +static void osd_sync_cb(void *cb_data, int error)
            +{
            +	struct osd_sync_cb_data *cb = cb_data;
            +	printk("sync took %llu usec\n", ktime_us_delta(ktime_get(), cb->time));
            +	OBD_FREE_PTR(cb);
            +}
            +
             static ssize_t
             lprocfs_osd_force_sync_seq_write(struct file *file, const char __user *buffer,
             				size_t count, loff_t *off)
            @@ -288,6 +299,19 @@ lprocfs_osd_force_sync_seq_write(struct file *file, const char __user *buffer,
             	rc = lu_env_init(&env, LCT_LOCAL);
             	if (rc)
             		return rc;
            +	{
            +		struct osd_device  *osd = osd_dt_dev(dt);
            +		struct osd_sync_cb_data *cb;
            +		dmu_tx_t *tx;
            +
            +		OBD_ALLOC_PTR(cb);
            +		cb->time = ktime_get();
            +		tx = dmu_tx_create(osd->od_os);
            +		dmu_tx_assign(tx, TXG_WAIT);
            +		dmu_tx_callback_register(tx, osd_sync_cb, cb);
            +		dmu_tx_commit(tx);
            +	}
            

            i.e. I registered a commit callback and simply calculate and printk the elapsed time when it is called.

            txg_wait_synced() asks the sync thread to initiate a TXG commit:

            	if (tx->tx_sync_txg_waiting < txg)
            		tx->tx_sync_txg_waiting = txg;
            ...
            		cv_broadcast(&tx->tx_sync_more_cv);
            

             

            adilger Andreas Dilger added a comment - edited

            Alex, what are you using to force a TXG commit in your tests, and are you sure that the commit is actually making it to disk? It seems very fast in some cases, 0.1ms for a commit...

            static int osd_trans_stop(const struct lu_env *env, struct dt_device *dt,
                                      struct thandle *th)
            {
                    if (sync) {
                            if (osd_txg_sync_delay_us < 0)
                                    txg_wait_synced(dmu_objset_pool(osd->od_os), txg);
                            else
                                    udelay(osd_txg_sync_delay_us);
                    }
            

            My understanding is that txg_wait_synced() will just wait for the TXG commit to happen; it can't necessarily force a TXG commit the way jbd2_journal_stop() can. While we don't want every thread forcing a TXG commit, IMHO the jbd2 code has it right: wait a short time for more threads to join the transaction, then force the commit. The length of the wait should be based on the current storage commit time, as jbd2_journal_stop() does, so that it is fast on flash and slower on disks, to be most efficient.
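
            [Editor's note: a minimal sketch of the batching idea described above. The helper osd_txg_batched_sync() and the od_avg_commit_ns running average are hypothetical illustrations, not actual Lustre fields or the jbd2 code.]

            /* Hypothetical helper in the spirit of jbd2_journal_stop(): sleep for
             * roughly one average commit time so other threads can join the open
             * TXG, then wait for it to sync. */
            static void osd_txg_batched_sync(struct osd_device *osd, uint64_t txg)
            {
                    /* od_avg_commit_ns: assumed per-device running average of
                     * recent TXG commit times, maintained elsewhere */
                    u64 avg_ns = atomic64_read(&osd->od_avg_commit_ns);

                    if (avg_ns) {
                            ktime_t expires = ktime_add_ns(ktime_get(), avg_ns);

                            /* short wait on flash, longer on spinning disks */
                            set_current_state(TASK_UNINTERRUPTIBLE);
                            schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
                    }

                    txg_wait_synced(dmu_objset_pool(osd->od_os), txg);
            }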


            bzzz Alex Zhuravlev added a comment -

            Andreas, I added some simple debugging - print the time it takes to commit a TXG upon a forced commit - and triggered it many times during sanity.sh:

            sync took 8276 usec
            sync took 26489 usec
            sync took 15471 usec
            sync took 8526 usec
            sync took 36187 usec
            sync took 16634 usec
            sync took 283 usec
            sync took 8142 usec
            sync took 7335 usec
            sync took 609 usec
            sync took 641 usec
            sync took 660 usec
            sync took 9278 usec
            sync took 396 usec
            sync took 17179 usec
            sync took 401 usec
            sync took 455 usec
            sync took 403 usec
            sync took 532 usec
            sync took 424 usec
            sync took 9057 usec
            sync took 444 usec
            sync took 11116 usec
            sync took 16007 usec
            sync took 16611 usec
            sync took 680 usec
            sync took 240 usec
            sync took 289 usec
            sync took 256 usec
            sync took 289 usec
            sync took 311 usec
            sync took 7353 usec
            sync took 285 usec
            sync took 9369 usec
            sync took 13867 usec
            sync took 275 usec
            sync took 6440 usec
            sync took 327 usec
            sync took 14827 usec
            sync took 15640 usec
            sync took 17490 usec
            sync took 113 usec
            sync took 524 usec
            sync took 268 usec
            sync took 375 usec
            sync took 15034 usec
            sync took 297 usec
            sync took 247 usec
            sync took 401 usec
            sync took 9492 usec

            laisiyao Lai Siyao added a comment -

            The FID sequence file is updated in sync mode, so upon an MDS failure it won't allocate a duplicate FID.

            replay-single.sh test_2d failed because it happens to allocate a new super sequence before the failover; if it is run separately it can pass.

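            [Editor's note: for reference, the usual Lustre test-framework convention for running a single subtest standalone (assuming a configured test setup, run from the lustre/tests directory) is:]

            ONLY=2d sh replay-single.sh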

            adilger Andreas Dilger added a comment -

            One problem we have with ZFS TXG sync is that it doesn't force a commit, it only blocks the thread until the TXG commit happens. That should be equivalent from the client's point of view, but it is possible the test doesn't expect this?

            Until we have ZIL support, it would be nice to have a way to force a TXG commit, especially on flash devices that have no seek latency.
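
            [Editor's note: to make the wait-only semantics concrete, a minimal sketch using the dmu_tx API, assuming an open objset os. This mirrors the pattern used in the debugging patch above but is not the actual osd-zfs code.]

            /* Create an (empty) tx, note which TXG it lands in, and wait for
             * that TXG to sync.  Nothing here shortens the sync thread's own
             * timer - the thread can only wait for, not force, the commit. */
            dmu_tx_t *tx = dmu_tx_create(os);

            if (dmu_tx_assign(tx, TXG_WAIT) == 0) {
                    uint64_t txg = dmu_tx_get_txg(tx);

                    dmu_tx_commit(tx);
                    txg_wait_synced(dmu_objset_pool(os), txg);
            } else {
                    dmu_tx_abort(tx);
            }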

    People

      Assignee: laisiyao Lai Siyao
      Reporter: egryaznova Elena Gryaznova
      Votes: 0
      Watchers: 7

    Dates

      Created:
      Updated:
      Resolved: