Lustre / LU-6087

sanity test_300b: mtime not change after create

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Lustre 2.7.0
    • Lustre 2.7.0
    • None
    • 3
    • 16943

    Description

      This issue was created by maloo for John Hammond <john.hammond@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/25cf25bc-9661-11e4-854d-5254006e85c2.

      The sub-test test_300b failed with the following error:

      mtime not change after create
      

      Please provide additional information about the failure here.

      Info required for matching: sanity 300b
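For context, the failure message comes from a check of the form "a create inside a directory must advance that directory's mtime". A minimal local sketch of that kind of check is below; it uses a plain directory on a local filesystem, whereas the real test_300b operates on a striped Lustre directory and the sanity.sh code is more involved.

```shell
# Sketch of the check behind "mtime not change after create":
# creating an entry in a directory must advance the parent's mtime.
dir=$(mktemp -d)
mtime_before=$(stat -c %Y "$dir")   # parent mtime, seconds since epoch
sleep 1                             # ensure even a 1s-granularity clock ticks
touch "$dir/f0"                     # create an entry in the directory
mtime_after=$(stat -c %Y "$dir")
if [ "$mtime_after" -le "$mtime_before" ]; then
  echo "mtime not change after create"
else
  echo "mtime advanced as expected"
fi
rm -rf "$dir"
```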

    Attachments

    Issue Links

    Activity


            jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13473/
            Subject: LU-6087 lod: use correct attrs in striped directory create
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4109062f3d6999353672e1d685b09bf38f9d6c37

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/13473
            Subject: LU-6087 lod: use correct attrs in striped directory create
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 518b4efade578cceabcd37f7730c1229889631bc
            jhammond John Hammond added a comment - edited

            > It would be interesting to change the clock on the MDS and OSS to be wildly out of sync with the client and each other, and then run sanity and sanityN (it is still running on a single node, with two mountpoints) to see what fails. That would make it much more obvious where the bad timestamps are coming from.

            I have 4 VMs, each set to a different day of the week.

            t0: -0day = Tue, mdt1
            t1: -1day = Mon, mdt2,3,4
            t2: -2day = Sun, ost*
            t3: -3day = Sat, client
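A dry-run sketch of how that per-VM skew could be applied (the host names and one-day-per-node offsets come from the table above; the `date -s` invocation is an assumption about how the clocks were set, so the ssh commands are only echoed here, not executed):

```shell
# Echo (rather than run) one clock-setting command per VM; offsets
# follow the table: t0 gets -0 days ... t3 gets -3 days.
cmds=$(for i in 0 1 2 3; do
  echo ssh "root@t$i" date -s "'now - $i days'"
done)
printf '%s\n' "$cmds"
```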
            

            Here is what I ran on t3 (the client).

            (
              pdsh -Rssh -w "root@t[0-3]" umount -a -t ldiskfs
              (export CONFIG=/root/local.sh; source $CONFIG; sh $LUSTRE/tests/llmount.sh) &> /dev/null
              echo BEFORE
              pdsh -Rssh -w "root@t[0-3]" date | sort
              lfs mkdir -c4 /mnt/lustre/d0
              echo AFTER
              stat /mnt/lustre/d0
              pdsh -Rssh -w "root@t[0-3]" date | sort
              (export CONFIG=/root/local.sh; source $CONFIG; sh $LUSTRE/tests/llmountcleanup.sh) &> /dev/null
            
              ssh t0 insmod /root/lustre-release/ldiskfs/ldiskfs.ko
              ssh t1 insmod /root/lustre-release/ldiskfs/ldiskfs.ko
              ssh t0 mkdir -p /tmp/mdt1
              ssh t1 mkdir -p /tmp/mdt{2,3,4}
              ssh t0 mount /tmp/lustre-mdt1 /tmp/mdt1 -o loop,ro -t ldiskfs
              ssh t1 mount /tmp/lustre-mdt2 /tmp/mdt2 -o loop,ro -t ldiskfs
              ssh t1 mount /tmp/lustre-mdt3 /tmp/mdt3 -o loop,ro -t ldiskfs
              ssh t1 mount /tmp/lustre-mdt4 /tmp/mdt4 -o loop,ro -t ldiskfs
            
              ssh t0 stat /tmp/mdt1/ROOT/d0 /tmp/mdt1/ROOT/d0/*
              ssh t1 stat /tmp/mdt{2,3,4}/REMOTE_PARENT_DIR/*
            ) 2>&1 | tee mkdir.out
            

            Here is the edited output. As you would expect after a create, the atime, ctime, and mtime were equal for each inode stat'ed, so I show only one timestamp per inode.

            Host times before the create:
            	t0: Tue Jan 20 13:29:00 CST 2015
            	t1: Mon Jan 19 13:28:58 CST 2015
            	t2: Sun Jan 18 13:28:59 CST 2015
            	t3: Sat Jan 17 13:28:59 CST 2015
            /mnt/lustre/d0
            	2015-01-20 13:29:01.000000000 -0600 # Time seen by client, wrong, comes from t0.
            /tmp/mdt1/ROOT/d0
            	2015-01-17 13:28:59.000000000 -0600 # Correct.
            /tmp/mdt1/ROOT/d0/[0x400000401:0x1:0x0]:0
            	2015-01-20 13:29:01.139000330 -0600 # Wrong, from t0.
            /tmp/mdt1/ROOT/d0/[0x440000400:0x1:0x0]:1
            	2015-01-20 13:29:01.139000330 -0600 # Wrong but does not matter, from t0.
            /tmp/mdt1/ROOT/d0/[0x480000400:0x1:0x0]:2
            	2015-01-20 13:29:01.139000330 -0600 # ...
            /tmp/mdt1/ROOT/d0/[0x4c0000400:0x1:0x0]:3
            	2015-01-20 13:29:01.139000330 -0600 # ...
            /tmp/mdt2/REMOTE_PARENT_DIR/0x440000400:0x1:0x0
            	2015-01-17 13:28:59.000000000 -0600 # Correct.
            /tmp/mdt3/REMOTE_PARENT_DIR/0x480000400:0x1:0x0
            	2015-01-17 13:28:59.000000000 -0600 # Correct.
            /tmp/mdt4/REMOTE_PARENT_DIR/0x4c0000400:0x1:0x0
            	2015-01-17 13:28:59.000000000 -0600 # Correct.
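A one-timestamp-per-inode summary like the one above can be produced mechanically from the full stat output; here is a sketch with two sample records inlined for illustration (the real run would parse mkdir.out):

```shell
# Reduce `stat` output to one Modify timestamp per file, as in the
# edited summary. Two sample records are inlined; the real data lived
# in mkdir.out.
sample='  File: `/mnt/lustre/d0'\''
Modify: 2015-01-20 13:29:01.000000000 -0600
  File: `/tmp/mdt1/ROOT/d0'\''
Modify: 2015-01-17 13:28:59.000000000 -0600'
summary=$(printf '%s\n' "$sample" |
  awk '/File:/ { name = $2 } /^Modify:/ { print name, $2, $3 }')
printf '%s\n' "$summary"
```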
            

            Here is the full output:

            BEFORE
            t0: Tue Jan 20 13:29:00 CST 2015
            t1: Mon Jan 19 13:28:58 CST 2015
            t2: Sun Jan 18 13:28:59 CST 2015
            t3: Sat Jan 17 13:28:59 CST 2015
            AFTER
              File: `/mnt/lustre/d0'
              Size: 16384     	Blocks: 8          IO Block: 4096   directory
            Device: 2c54f966h/743766374d	Inode: 288230393331580929  Links: 2
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2015-01-20 13:29:01.000000000 -0600
            Modify: 2015-01-20 13:29:01.000000000 -0600
            Change: 2015-01-20 13:29:01.000000000 -0600
            t0: Tue Jan 20 13:29:01 CST 2015
            t1: Mon Jan 19 13:28:59 CST 2015
            t2: Sun Jan 18 13:29:00 CST 2015
            t3: Sat Jan 17 13:29:00 CST 2015
              File: `/tmp/mdt1/ROOT/d0'
              Size: 4096      	Blocks: 8          IO Block: 4096   directory
            Device: 700h/1792d	Inode: 62558       Links: 6
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2015-01-17 13:28:59.000000000 -0600
            Modify: 2015-01-17 13:28:59.000000000 -0600
            Change: 2015-01-17 13:28:59.000000000 -0600
              File: `/tmp/mdt1/ROOT/d0/[0x400000401:0x1:0x0]:0'
              Size: 4096      	Blocks: 8          IO Block: 4096   directory
            Device: 700h/1792d	Inode: 31302       Links: 2
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2015-01-20 13:29:01.139000330 -0600
            Modify: 2015-01-20 13:29:01.139000330 -0600
            Change: 2015-01-20 13:29:01.139000330 -0600
              File: `/tmp/mdt1/ROOT/d0/[0x440000400:0x1:0x0]:1'
              Size: 4096      	Blocks: 8          IO Block: 4096   directory
            Device: 700h/1792d	Inode: 62559       Links: 2
            Access: (0000/d---------)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2015-01-20 13:29:01.139000330 -0600
            Modify: 2015-01-20 13:29:01.139000330 -0600
            Change: 2015-01-20 13:29:01.139000330 -0600
              File: `/tmp/mdt1/ROOT/d0/[0x480000400:0x1:0x0]:2'
              Size: 4096      	Blocks: 8          IO Block: 4096   directory
            Device: 700h/1792d	Inode: 62560       Links: 2
            Access: (0000/d---------)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2015-01-20 13:29:01.139000330 -0600
            Modify: 2015-01-20 13:29:01.139000330 -0600
            Change: 2015-01-20 13:29:01.139000330 -0600
              File: `/tmp/mdt1/ROOT/d0/[0x4c0000400:0x1:0x0]:3'
              Size: 4096      	Blocks: 8          IO Block: 4096   directory
            Device: 700h/1792d	Inode: 62561       Links: 2
            Access: (0000/d---------)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2015-01-20 13:29:01.139000330 -0600
            Modify: 2015-01-20 13:29:01.139000330 -0600
            Change: 2015-01-20 13:29:01.139000330 -0600
              File: `/tmp/mdt2/REMOTE_PARENT_DIR/0x440000400:0x1:0x0'
              Size: 4096      	Blocks: 8          IO Block: 4096   directory
            Device: 700h/1792d	Inode: 31302       Links: 2
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2015-01-17 13:28:59.000000000 -0600
            Modify: 2015-01-17 13:28:59.000000000 -0600
            Change: 2015-01-17 13:28:59.000000000 -0600
              File: `/tmp/mdt3/REMOTE_PARENT_DIR/0x480000400:0x1:0x0'
              Size: 4096      	Blocks: 8          IO Block: 4096   directory
            Device: 701h/1793d	Inode: 31302       Links: 2
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2015-01-17 13:28:59.000000000 -0600
            Modify: 2015-01-17 13:28:59.000000000 -0600
            Change: 2015-01-17 13:28:59.000000000 -0600
              File: `/tmp/mdt4/REMOTE_PARENT_DIR/0x4c0000400:0x1:0x0'
              Size: 4096      	Blocks: 8          IO Block: 4096   directory
            Device: 702h/1794d	Inode: 62552       Links: 2
            Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2015-01-17 13:28:59.000000000 -0600
            Modify: 2015-01-17 13:28:59.000000000 -0600
            Change: 2015-01-17 13:28:59.000000000 -0600
            

            So we are wrongly using a server timestamp for the stripe 0 agent inode, which affects the client-visible timestamps.

            jhammond John Hammond added a comment -

            In my local testing http://review.whamcloud.com/#/c/13396/ did not fix this issue. I still saw about 20% failure.

            di.wang Di Wang added a comment -

            I checked the code, and it seems the ctime/mtime received from the client are set correctly on every stripe.


            niu Niu Yawei (Inactive) added a comment -

            Perhaps the striped directory code mistakenly used the server time somehow? I haven't found it in the code yet. Di, could you take a look when you have time?

            The current timestamp update policy doesn't make sense to me; I think we should fix it anyway: http://review.whamcloud.com/#/c/13396/

            gerrit Gerrit Updater added a comment -

            Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/13396
            Subject: LU-6087 mdd: update timestamps arbitrarily
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 216d148039da83c9af16c76b15f86ad940cf29e0

            adilger Andreas Dilger added a comment -

            I think this is just hiding an underlying bug. With a single client, all the timestamps for files should come from the client and not the server. The fact that syncing the clocks between the client and server VMs fixes the problem only points further toward a server timestamp being used somewhere.

            It would be interesting to change the clock on the MDS and OSS to be wildly out of sync with the client and each other, and then run sanity and sanityN (it is still running on a single node, with two mountpoints) to see what fails. That would make it much more obvious where the bad timestamps are coming from.
            jhammond John Hammond added a comment -

            On 4 local VMs I had about 0.5 second drift between the client and mdt0:

            $ time pdsh -l root -Rssh -w t[0-3] date '+%s.%N'
            t0: 1420658127.115688630 # mdt0
            t1: 1420658127.471434603 # mdt1-3
            t2: 1420658127.660439409 # ost0-7
            t3: 1420658126.647549504 # client
            
            real    0m0.133s
            user    0m0.048s
            sys     0m0.024s
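The "about 0.5 second" figure can be checked against the samples above by subtracting the client (t3) timestamp from the mdt0 (t0) timestamp; a quick awk sketch over those two values:

```shell
# Recompute the client (t3) vs mdt0 (t0) offset from the timestamps
# sampled above; awk does the floating-point subtraction.
t0=1420658127.115688630   # mdt0
t3=1420658126.647549504   # client
drift=$(awk -v a="$t0" -v b="$t3" 'BEGIN { printf "%.3f", a - b }')
echo "client lags mdt0 by $drift s"
```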
            

            With 1 second sleeps I saw 6 failures in 10 runs of 300b.

            With 2 second sleeps I saw no failures in 10 runs of 300b.

            Then I synced my VMs:

            $ pdsh -l root -Rssh -w t[0-3] ntpdate ntp1.sbcglobal.net
            t3:  7 Jan 13:16:48 ntpdate[25263]: step time server 68.94.156.17 offset 15.446993 sec
            t2:  7 Jan 13:16:48 ntpdate[15588]: step time server 68.94.156.17 offset 14.437126 sec
            t0:  7 Jan 13:16:48 ntpdate[10770]: step time server 68.94.156.17 offset 14.969598 sec
            t1:  7 Jan 13:16:48 ntpdate[12428]: step time server 68.94.156.17 offset 14.613771 sec
            $ time pdsh -l root -Rssh -w t[0-3] date '+%s.%N'
            t0: 1420658219.028891881
            t1: 1420658219.031173939
            t3: 1420658219.031840292
            t2: 1420658219.036044339
            
            real    0m0.135s
            user    0m0.048s
            sys     0m0.028s
            

            With 1 second sleeps I saw no failures in 10 runs of 300b. I also see no failures when running everything on a single node.


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: