Lustre / LU-10740

replay-single test_2d: FAIL: checkstat -v /mnt/lustre/d2d.replay-single check failed

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0
    • Severity: 3

    Description

      2.7 (IEEL)-based server, 2.10.58 client

      == replay-single test 2d: setdirstripe replay ======================================================== 00:25:51 (1519604751)
      UUID                   1K-blocks        Used   Available Use% Mounted on
      lustre-MDT0000_UUID       344152        2752      317200   1% /mnt/lustre[MDT:0]
      lustre-MDT0001_UUID       344152        2492      317460   1% /mnt/lustre[MDT:1]
      lustre-OST0000_UUID       664400       17712      597688   3% /mnt/lustre[OST:0]
      lustre-OST0001_UUID       664400       17712      597688   3% /mnt/lustre[OST:1]
      
      filesystem_summary:      1328800       35424     1195376   3% /mnt/lustre
      
      Failing mds1 on fre1209
      Stopping /mnt/lustre-mds1 (opts:) on fre1209
      pdsh@fre1211: fre1209: ssh exited with exit code 1
      reboot facets: mds1
      Failover mds1 to fre1209
      00:26:02 (1519604762) waiting for fre1209 network 900 secs ...
      00:26:02 (1519604762) network interface is UP
      mount facets: mds1
      Starting mds1: -o rw,user_xattr  /dev/vdb /mnt/lustre-mds1
      pdsh@fre1211: fre1209: ssh exited with exit code 1
      pdsh@fre1211: fre1209: ssh exited with exit code 1
      Started lustre-MDT0000
      fre1212: fre1212: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
      fre1211: fre1211: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
      fre1212: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
      fre1211: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
      Can't lstat /mnt/lustre/d2d.replay-single: No such file or directory
       replay-single test_2d: @@@@@@ FAIL: checkstat -v  /mnt/lustre/d2d.replay-single check failed 
       
      

      100% reproducible with review-dne-zfs-part-4

    Attachments

    Issue Links

    Activity


            bzzz Alex Zhuravlev added a comment -

            I think this is a dup of LU-10143

            bzzz Alex Zhuravlev added a comment -

            adilger, sorry, I missed your question. I'm using a ramdisk.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33106/
            Subject: LU-10740 tests: disable tests for replay-dne-zfs-part-4
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 16e92e2d01a71c2a97cae89c70c58abf409c12c0

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33106
            Subject: LU-10740 tests: disable tests for replay-dne-zfs-part-4
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a7675ca33ec579525b4d39ad114b54ca3b462a62

            adilger Andreas Dilger added a comment -

            Alex, is your testing running with an HDD, SSD, or ramdisk? On an HDD I can't see how a TXG commit could ever complete in less than 4x the seek time (just for the überblock writes, 2x each at the start and end of the device), i.e. roughly 4 × 8-10 ms ≈ 32-40 ms.

            bzzz Alex Zhuravlev added a comment - edited

            the patch I used:

            @@ -276,6 +276,17 @@ static int zfs_osd_mntdev_seq_show(struct seq_file *m, void *data)
             }
             LPROC_SEQ_FOPS_RO(zfs_osd_mntdev);
             
            +struct osd_sync_cb_data {
            +	ktime_t time;
            +};
            +
            +static void osd_sync_cb(void *cb_data, int error)
            +{
            +	struct osd_sync_cb_data *cb = cb_data;
            +	printk("sync took %llu usec\n", ktime_us_delta(ktime_get(), cb->time));
            +	OBD_FREE_PTR(cb);
            +}
            +
             static ssize_t
             lprocfs_osd_force_sync_seq_write(struct file *file, const char __user *buffer,
             				size_t count, loff_t *off)
            @@ -288,6 +299,19 @@ lprocfs_osd_force_sync_seq_write(struct file *file, const char __user *buffer,
             	rc = lu_env_init(&env, LCT_LOCAL);
             	if (rc)
             		return rc;
            +	{
            +		struct osd_device  *osd = osd_dt_dev(dt);
            +		struct osd_sync_cb_data *cb;
            +		dmu_tx_t *tx;
            +
            +		OBD_ALLOC_PTR(cb);
            +		cb->time = ktime_get();
            +		tx = dmu_tx_create(osd->od_os);
            +		dmu_tx_assign(tx, TXG_WAIT);
            +		dmu_tx_callback_register(tx, osd_sync_cb, cb);
            +		dmu_tx_commit(tx);
            +	}
            

            i.e. I registered a commit callback and simply calculate and printk the elapsed time when it is called.

            txg_wait_synced() asks the sync thread to initiate a TXG commit:

            	if (tx->tx_sync_txg_waiting < txg)
            		tx->tx_sync_txg_waiting = txg;
            ...
            		cv_broadcast(&tx->tx_sync_more_cv);
            

             

            adilger Andreas Dilger added a comment - edited

            Alex, what are you using to force a TXG commit in your tests, and are you sure that the commit is actually making it to disk? It seems very fast in some cases, 0.1ms for a commit...

            static int osd_trans_stop(const struct lu_env *env, struct dt_device *dt,
                                      struct thandle *th)
            {
                    if (sync) {
                            if (osd_txg_sync_delay_us < 0)
                                    txg_wait_synced(dmu_objset_pool(osd->od_os), txg);
                            else
                                    udelay(osd_txg_sync_delay_us);
                    }
            

            My understanding is that txg_wait_synced() will just wait for the TXG commit to happen; it can't necessarily force a TXG commit the way jbd2_journal_stop() can. While we don't want every thread forcing a TXG commit, IMHO the jbd2 code has it right: wait a short time for more threads to join the transaction, then force the commit. The length of the wait should be based on the current storage commit time, as jbd2_journal_stop() does, so that it is fast on flash and slower on disks, to be most efficient.
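
            [Editor's note: a minimal sketch of the batching idea described above. The helper osd_txg_batched_sync() and the od_avg_commit_ns running average are hypothetical illustrations, not actual Lustre fields or the jbd2 code.]

            /* Hypothetical helper in the spirit of jbd2_journal_stop(): sleep for
             * roughly one average commit time so other threads can join the open
             * TXG, then wait for it to sync. */
            static void osd_txg_batched_sync(struct osd_device *osd, uint64_t txg)
            {
                    /* od_avg_commit_ns: assumed per-device running average of
                     * recent TXG commit times, maintained elsewhere */
                    u64 avg_ns = atomic64_read(&osd->od_avg_commit_ns);

                    if (avg_ns) {
                            ktime_t expires = ktime_add_ns(ktime_get(), avg_ns);

                            /* short wait on flash, longer on spinning disks */
                            set_current_state(TASK_UNINTERRUPTIBLE);
                            schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
                    }

                    txg_wait_synced(dmu_objset_pool(osd->od_os), txg);
            }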


            bzzz Alex Zhuravlev added a comment -

            Andreas, I added some simple debugging - print the time it takes to commit a TXG upon a forced commit - and triggered it many times during sanity.sh:

            sync took 8276 usec
            sync took 26489 usec
            sync took 15471 usec
            sync took 8526 usec
            sync took 36187 usec
            sync took 16634 usec
            sync took 283 usec
            sync took 8142 usec
            sync took 7335 usec
            sync took 609 usec
            sync took 641 usec
            sync took 660 usec
            sync took 9278 usec
            sync took 396 usec
            sync took 17179 usec
            sync took 401 usec
            sync took 455 usec
            sync took 403 usec
            sync took 532 usec
            sync took 424 usec
            sync took 9057 usec
            sync took 444 usec
            sync took 11116 usec
            sync took 16007 usec
            sync took 16611 usec
            sync took 680 usec
            sync took 240 usec
            sync took 289 usec
            sync took 256 usec
            sync took 289 usec
            sync took 311 usec
            sync took 7353 usec
            sync took 285 usec
            sync took 9369 usec
            sync took 13867 usec
            sync took 275 usec
            sync took 6440 usec
            sync took 327 usec
            sync took 14827 usec
            sync took 15640 usec
            sync took 17490 usec
            sync took 113 usec
            sync took 524 usec
            sync took 268 usec
            sync took 375 usec
            sync took 15034 usec
            sync took 297 usec
            sync took 247 usec
            sync took 401 usec
            sync took 9492 usec

            laisiyao Lai Siyao added a comment -

            The FID sequence file is updated in sync mode, so upon an MDS failure it won't allocate a duplicate FID.

            replay-single.sh test_2d failed because it happens to allocate a new super sequence before the failover; if it is run separately it can pass.

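            [Editor's note: for reference, the usual Lustre test-framework convention for running a single subtest standalone (assuming a configured test setup, run from the lustre/tests directory) is:]

            ONLY=2d sh replay-single.sh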

            adilger Andreas Dilger added a comment -

            One problem we have with ZFS TXG sync is that it doesn't force a commit, it only blocks the thread until the TXG commit happens. That should be equivalent from the client's point of view, but it is possible the test doesn't expect this?

            Until we have ZIL support, it would be nice to have a way to force a TXG commit, especially on flash devices that have no seek latency.
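
            [Editor's note: to make the wait-only semantics concrete, a minimal sketch using the dmu_tx API, assuming an open objset os. This mirrors the pattern used in the debugging patch above but is not the actual osd-zfs code.]

            /* Create an (empty) tx, note which TXG it lands in, and wait for
             * that TXG to sync.  Nothing here shortens the sync thread's own
             * timer - the thread can only wait for, not force, the commit. */
            dmu_tx_t *tx = dmu_tx_create(os);

            if (dmu_tx_assign(tx, TXG_WAIT) == 0) {
                    uint64_t txg = dmu_tx_get_txg(tx);

                    dmu_tx_commit(tx);
                    txg_wait_synced(dmu_objset_pool(os), txg);
            } else {
                    dmu_tx_abort(tx);
            }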

    People

      Assignee: laisiyao Lai Siyao
      Reporter: egryaznova Elena Gryaznova
      Votes: 0
      Watchers: 7

    Dates

      Created:
      Updated:
      Resolved: