Details

    • Technical task
    • Resolution: Duplicate
    • Blocker
    • Lustre 2.6.0
    • Lustre 2.5.0
    • Patches submitted to autotest
    • 9548

    Description

      from https://maloo.whamcloud.com/test_sets/0afc2c56-fc86-11e2-8ce2-52540035b04c

      This sanity-hsm test_21 seems to be failing frequently right now.
      'wrong block number' is one of the errors seen.

      test_21 	
      
          Error: 'wrong block number'
          Failure Rate: 33.00% of last 100 executions [all branches] 
      
      == sanity-hsm test 21: Simple release tests == 23:18:20 (1375510700)
      2+0 records in
      2+0 records out
      2097152 bytes (2.1 MB) copied, 0.353933 s, 5.9 MB/s
       sanity-hsm test_21: @@@@@@ FAIL: wrong block number 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4202:error_noexit()
      
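      For context, here is a hypothetical sketch of the pattern test_21 exercises (the real logic lives in sanity-hsm.sh; the file path is illustrative, and GNU coreutils stat is assumed):

      ```shell
      # Hypothetical reconstruction of the test_21 setup: write a 2 MB file
      # with dd conv=fsync, then record its size and allocated block count.
      # The actual test compares st_blocks before and after an HSM release.
      f=/tmp/lu3700_demo            # illustrative path, not the test's real file
      dd if=/dev/zero of="$f" bs=1M count=2 conv=fsync 2>/dev/null
      size=$(stat -c %s "$f")       # bytes
      blocks=$(stat -c %b "$f")     # 512-byte blocks, per stat(2)
      echo "size=$size blocks=$blocks"
      rm -f "$f"
      ```

      On a local filesystem both numbers are stable immediately after the dd completes; the failures discussed below stem from Lustre-on-ZFS reporting a block count that is still changing.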

      Attachments

      Issue Links

      Activity

            [LU-3700] sanity-hsm test_21 Error: 'wrong block number'

            I am closing this ticket per email comments from Andreas:
            LU-3700 has been worked around for now at the test level, so it can probably be closed. The other bugs (LU-4388 and LU-4389) are tracking the root cause of the LU-3700 failure.

            Cheers, Andreas

            jlevi Jodi Levi (Inactive) added a comment
            yong.fan nasf (Inactive) added a comment - Another failure instance: https://maloo.whamcloud.com/test_sets/459d8ae0-6e86-11e3-b713-52540035b04c

            Patch http://review.whamcloud.com/8575 was landed, so hopefully this test will now pass. The related failures (IMHO even more serious) still need to be fixed.

            adilger Andreas Dilger added a comment

            No OST_SYNC from fsync() is LU-4388
            New blockcount from OST_SYNC not being kept is LU-4389

            utopiabound Nathaniel Clark added a comment

            Looks like two different bugs.

            The conv=fsync option should indeed result in a single OST_SYNC RPC sent at the end of the writes. I suspect that this was previously skipped because OST writes were always synchronous (so fsync() was a no-op), and the async journal commit feature was developed on b1_8 and this wasn't fixed in the CLIO code when it landed. It should be noted that the Lustre OST_SYNC allows syncing a range of data on a single object, so the VFS sync_page_range() method should map its range into the RPC, and the server side should extract that range from the RPC (it might already do this).

            The second problem about the client not updating the blocks count based on reply values should also be investigated. I expect that the ZFS block count is not updated by the time the write is submitted, so it doesn't reply with the new block count to the client. However, the subsequent OST_SYNC should result in the right blocks count being returned to the client and then being cached under the DLM lock.

            adilger Andreas Dilger added a comment
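            The distinction between the two dd modes discussed above can be observed without Lustre; a minimal sketch (temporary paths are illustrative):

            ```shell
            # conv=fsync: buffered writes followed by a single fsync() before close.
            # oflag=sync: the output file is opened with O_SYNC, so every write is
            # synchronous. Per the comments in this ticket, on Lustre conv=fsync
            # currently triggers only an MDS_SYNC, while oflag=sync does generate
            # OST_SYNC RPCs.
            f1=/tmp/lu3700_convfsync     # illustrative paths
            f2=/tmp/lu3700_oflagsync
            dd if=/dev/zero of="$f1" bs=1M count=2 conv=fsync 2>/dev/null
            dd if=/dev/zero of="$f2" bs=1M count=2 oflag=sync 2>/dev/null
            stat -c '%n size=%s blocks=%b' "$f1" "$f2"
            rm -f "$f1" "$f2"
            ```

            Both produce identical file contents; only the sync behavior on the wire differs.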
            utopiabound Nathaniel Clark added a comment (edited)

            conv=fsync causes an MDS_SYNC to be sent to the MDT (but no syncs are sent to the OSTs); it does not cause the OST_BRW_ASYNC flag to be cleared in the OST_WRITE, so the OST does not think it needs to sync the data to disk.

            Even after changing conv=fsync to oflag=sync (which causes OST_SYNCs to be sent), here is the wire traffic:
            OST_WRITE sent (OST_BRW_ASYNC is set), returns 1 block
            OST_SYNC sent, returns 2053 blocks
            OST_WRITE sent (OST_BRW_ASYNC is set), returns 2053 blocks
            OST_SYNC sent, returns 4101 blocks (the correct amount)

            A stat of the file at this point only shows 2053 blocks (the last OST_WRITE amount).

            For ldiskfs the wire traffic is:
            OST_WRITE sent (OST_BRW_ASYNC is set), returns 2048 blocks
            OST_SYNC sent, returns 2048 blocks
            OST_WRITE sent (OST_BRW_ASYNC is set), returns 4096 blocks (the correct amount)
            OST_SYNC sent, returns 4096 blocks

            Looking at the client code, when the OST_SYNC reply is processed, the oa (the returned object attributes) is ignored.

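            The stale st_blocks symptom described above can be illustrated generically on a local filesystem (non-Lustre sketch; `sync FILE` requires coreutils 8.24 or newer):

            ```shell
            # Buffered write, then stat before and after forcing the data to disk.
            # On Lustre/ZFS the post-sync count is what the OST_SYNC reply carries;
            # per the comment above, the client currently discards it rather than
            # caching it under the DLM lock.
            f=/tmp/lu3700_blocks          # illustrative path
            dd if=/dev/zero of="$f" bs=1M count=2 2>/dev/null
            before=$(stat -c %b "$f")
            sync "$f"                     # flush just this file (coreutils >= 8.24)
            after=$(stat -c %b "$f")
            echo "blocks before sync: $before, after: $after"
            rm -f "$f"
            ```

            On most local filesystems the two numbers already match; on ZFS-backed Lustre the pre-sync number is the stale one that stat ends up reporting.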

            BTW, since my change #8467, which also enables full debug logs, has landed already, we may be able to use logs gathered during recent failures of sanity-hsm/test_20 with the "wrong block number after archive: …" symptom to troubleshoot.

            bfaccini Bruno Faccini (Inactive) added a comment

            I didn't see this comment before I inspected the patch. However, this implies a different bug in ZFS. Namely, the "small_file" helper in sanity-hsm is creating files with conv=fsync, so the fsync on close should flush all the blocks from the client cache and onto disk on the OST. After that point the blocks count for the file should not change. Likewise, if the locks are cancelled at the client, it should flush all the dirty blocks from cache to disk on the OST.

            This makes me wonder if ZFS is implementing fsync properly at the server.

            adilger Andreas Dilger added a comment

            My previous patch actually skipped the wrong test.

            http://review.whamcloud.com/8575

            utopiabound Nathaniel Clark added a comment

            ZFS reports the number of blocks actually written to disk so far, not the number of blocks the file would use once fully written, so the count changes after the write completes in ZFS. It's not a locking or consistency issue in Lustre.

            utopiabound Nathaniel Clark added a comment
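            Given that behavior, a test-level workaround (which is what ultimately happened here, per the closing comment) could poll st_blocks until it stops changing. An illustrative sketch, not the actual patch from review 8575:

            ```shell
            # Poll st_blocks until it stabilizes, so a lazily updated block count
            # (as seen on ZFS-backed OSTs) doesn't fail a before/after comparison.
            wait_blocks_settle() {
                local f=$1 prev=-1 cur tries=0
                while [ $tries -lt 10 ]; do
                    cur=$(stat -c %b "$f")
                    [ "$cur" = "$prev" ] && break
                    prev=$cur
                    sleep 0.1          # brief pause between samples
                    tries=$((tries + 1))
                done
                echo "$cur"
            }
            f=/tmp/lu3700_settle       # illustrative path
            dd if=/dev/zero of="$f" bs=1M count=2 conv=fsync 2>/dev/null
            echo "settled blocks: $(wait_blocks_settle "$f")"
            rm -f "$f"
            ```

            The bounded retry count keeps the test from hanging if the count never converges; on a local filesystem it settles on the first comparison.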

            People

              bfaccini Bruno Faccini (Inactive)
              keith Keith Mannthey (Inactive)
              Votes: 0
              Watchers: 9
