Details
- Technical task
- Resolution: Duplicate
- Blocker
- Lustre 2.5.0
- Patches submitted to autotest
- 9548
Description
from https://maloo.whamcloud.com/test_sets/0afc2c56-fc86-11e2-8ce2-52540035b04c
sanity-hsm test 21 seems to be failing a lot right now.
'wrong block number' is one of the errors seen.
test_21 Error: 'wrong block number' Failure Rate: 33.00% of last 100 executions [all branches]
== sanity-hsm test 21: Simple release tests == 23:18:20 (1375510700)
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.353933 s, 5.9 MB/s
sanity-hsm test_21: @@@@@@ FAIL: wrong block number
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4202:error_noexit()
Issue Links
- is related to LU-3704 sanity-hsm test_21 - test_24 Error: 'could not release file' (Resolved)
Looks like two different bugs.
The conv=fsync option should indeed result in a single OST_SYNC RPC being sent at the end of the writes. I suspect that this was previously skipped because OST writes were always synchronous (so fsync() was a no-op); the async journal commit feature was developed on b1_8, and this wasn't fixed in the CLIO code when it landed. Note that the Lustre OST_SYNC allows syncing a range of data on a single object, so the VFS sync_page_range() method should pass its range into the RPC, and the server side should extract that range from the RPC (it might already do this).
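For reference, the write pattern under discussion can be approximated outside the test framework. The following is only a sketch of what dd with conv=fsync does (write all records, then call fsync() once before closing), not the sanity-hsm test itself. The 1 MiB record size is inferred from the "2+0 records" and 2097152 bytes in the log; the function name and the use of random data as input are assumptions made for illustration.

```python
import os
import tempfile

RECORD_SIZE = 1024 * 1024  # 1 MiB; two records match the 2097152 bytes in the log
RECORD_COUNT = 2


def write_then_fsync(path, record_size=RECORD_SIZE, count=RECORD_COUNT):
    """Write `count` records, then issue a single fsync() before closing.

    Hypothetical helper: it mimics the dd conv=fsync pattern, where no
    sync is requested per write and one fsync() follows the last write.
    Returns (st_size, st_blocks) as seen right after the fsync().
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            buf = os.urandom(record_size)  # input data source is an assumption
            written = 0
            while written < len(buf):      # os.write() may write less than asked
                written += os.write(fd, buf[written:])
        os.fsync(fd)                       # the one sync at the end of the writes
        st = os.fstat(fd)
    finally:
        os.close(fd)
    return st.st_size, st.st_blocks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        size, blocks = write_then_fsync(os.path.join(tmp, "f"))
        print("size:", size, "blocks (512-byte units):", blocks)
```

Run against a file on a Lustre client mount, the reported block count after the fsync() is the value of interest here; on a local filesystem the sketch only demonstrates the call sequence.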
The second problem, the client not updating the block count based on reply values, should also be investigated. I expect that the ZFS block count has not been updated by the time the write is submitted, so the server doesn't reply with the new block count to the client. However, the subsequent OST_SYNC should result in the right block count being returned to the client and then cached under the DLM lock.
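The expected behaviour can be illustrated with a toy model. This is purely a sketch of the hypothesis above, not Lustre or ZFS code: the class names, the 512-byte block unit, and the assumption that allocation only becomes visible at sync time are all made up for illustration.

```python
from dataclasses import dataclass

BLOCK_UNIT = 512  # assumed unit for the block count, as in st_blocks


@dataclass
class ToyObject:
    """Hypothetical server-side object whose block count lags behind writes."""
    size: int = 0
    allocated_blocks: int = 0  # what the backend reports right now
    pending_bytes: int = 0     # written, but not yet reflected in the count

    def write(self, nbytes):
        """Accept a write and reply with the (possibly stale) block count."""
        self.size += nbytes
        self.pending_bytes += nbytes
        return self.allocated_blocks

    def sync(self):
        """Flush pending data and reply with the updated block count."""
        self.allocated_blocks += -(-self.pending_bytes // BLOCK_UNIT)  # ceiling division
        self.pending_bytes = 0
        return self.allocated_blocks


@dataclass
class ToyClient:
    """Hypothetical client that caches the block count from each reply."""
    cached_blocks: int = 0

    def write(self, obj, nbytes):
        self.cached_blocks = obj.write(nbytes)

    def fsync(self, obj):
        self.cached_blocks = obj.sync()


if __name__ == "__main__":
    obj, client = ToyObject(), ToyClient()
    for _ in range(2):
        client.write(obj, 1024 * 1024)
    print("after writes:", client.cached_blocks)  # still the stale value
    client.fsync(obj)
    print("after sync:  ", client.cached_blocks)  # updated by the sync reply
```

In this model the client sees a block count of 0 after both writes and 4096 after the sync, which is the sequence the comment expects from a working OST_SYNC reply. If the client never sends the sync, or ignores the value in the reply, the cached count stays stale.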