[LU-2304] Test failure sanityn test_16: dual-mount fsx data read error
Created: 08/Nov/12  Updated: 14/Dec/12  Resolved: 08/Dec/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0, Lustre 2.1.4

Type: Bug
Priority: Blocker
Reporter: Maloo
Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed
Votes: 0
Labels: NFBlocker

Issue Links:
Related
is related to LU-2452 parallel-scale test_write_append_trun... Closed
Severity: 3
Rank (Obsolete): 5513

Description

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/9c6f1590-2978-11e2-8600-52540035b04c.

The sub-test test_16 failed with the following error in the test output:

Chance of close/open is 1 in 50
Seed set to 2417
fd 0: /mnt/lustre/f.sanityn.16
fd 1: /mnt/lustre2/f.sanityn.16
1: 1352342417.730543 MAPWRITE 0x32aa5e thru 0x33283c (0x7ddf bytes)
2: 1352342417.751680 READ     0x1f84c9 thru 0x200ba0 (0x86d8 bytes)
3: 1352342417.872135 WRITE    0x1b3674 thru 0x1bed98 (0xb725 bytes)
4: 1352342417.881507 MAPREAD  0xf1843 thru 0xf5284 (0x3a42 bytes)
5: 1352342417.893255 READ     0x224e35 thru 0x230af5 (0xbcc1 bytes)
6: 1352342417.900307 TRUNCATE UP	from 0x33283d to 0x6979fe
7: 1352342417.918382 WRITE    0x5f4a1 thru 0x6e924 (0xf484 bytes)
8: 1352342417.967363 WRITE    0x4b29fa thru 0x4c0ff5 (0xe5fc bytes)
9: 1352342417.977098 MAPREAD  0x4164fe thru 0x4207e6 (0xa2e9 bytes)
10: 1352342418.100538 WRITE    0x2fad28 thru 0x303bb0 (0x8e89 bytes)
11: 1352342418.201485 TRUNCATE DOWN	from 0x6979fe to 0x258c7a
12: 1352342418.219849 MAPREAD  0xfa3d6 thru 0x100b0f (0x673a bytes)
13: 1352342418.347736 WRITE    0x303e61 thru 0x30d252 (0x93f2 bytes) HOLE
14: 1352342418.353891 MAPWRITE 0x4e321 thru 0x5ce11 (0xeaf1 bytes)
15: 1352342418.394662 WRITE    0x896fe thru 0x93ead (0xa7b0 bytes)	***WWWW
16: 1352342418.400602 MAPWRITE 0x44df8f thru 0x452e00 (0x4e72 bytes)
17: 1352342418.419486 WRITE    0x25ac40 thru 0x25c836 (0x1bf7 bytes)
18: 1352342418.423533 WRITE    0x45f04d thru 0x4698a9 (0xa85d bytes) HOLE
19: 1352342418.483128 TRUNCATE DOWN	from 0x4698aa to 0x17f453
20: 1352342418.725636 MAPWRITE 0x9302ea thru 0x93fc06 (0xf91d bytes)
21: 1352342418.747719 MAPREAD  0x222ff1 thru 0x232993 (0xf9a3 bytes)
22: 1352342418.800646 MAPREAD  0x4a069d thru 0x4a6124 (0x5a88 bytes)
23: 1352342418.826136 WRITE    0x20567a thru 0x20c51b (0x6ea2 bytes)
24: 1352342418.885348 WRITE    0x2e5f90 thru 0x2e63cd (0x43e bytes)
25: 1352342418.893594 MAPREAD  0x93b057 thru 0x93fc06 (0x4bb0 bytes)
26: 1352342418.954895 MAPWRITE 0x3ed692 thru 0x3f6eaa (0x9819 bytes)
27: 1352342418.998428 MAPREAD  0x32aa46 thru 0x32ef20 (0x44db bytes)
28: 1352342419.095306 READ     0x917b3 thru 0x97e3f (0x668d bytes)	***RRRR***
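
When reading a dump like the one above, it can help to list which earlier writes touch the byte range of the failed read. The following is a minimal sketch, assuming the log line layout shown in this ticket; `parse_ops`, `writes_overlapping`, and `SAMPLE` are hypothetical names for illustration and are not part of fsx or the Lustre test suite. It only handles READ/WRITE/MAPREAD/MAPWRITE lines and ignores TRUNCATE entries.

```python
import re

# Matches data operations in the fsx log above, e.g.
# "15: 1352342418.394662 WRITE    0x896fe thru 0x93ead (0xa7b0 bytes)"
OP_RE = re.compile(
    r"^(\d+):\s+[\d.]+\s+(MAPWRITE|MAPREAD|WRITE|READ)\s+"
    r"0x([0-9a-f]+) thru 0x([0-9a-f]+)"
)


def parse_ops(log_text):
    """Return (op_number, op_name, start, end) for each read/write line."""
    ops = []
    for line in log_text.splitlines():
        m = OP_RE.match(line.strip())
        if m:
            ops.append((int(m.group(1)), m.group(2),
                        int(m.group(3), 16), int(m.group(4), 16)))
    return ops


def writes_overlapping(ops, start, end):
    """Return write-type ops whose inclusive byte range intersects [start, end]."""
    return [op for op in ops
            if op[1] in ("WRITE", "MAPWRITE") and op[2] <= end and op[3] >= start]


# A few lines copied from the log in this ticket (assumed layout).
SAMPLE = """\
14: 1352342418.353891 MAPWRITE 0x4e321 thru 0x5ce11 (0xeaf1 bytes)
15: 1352342418.394662 WRITE    0x896fe thru 0x93ead (0xa7b0 bytes)\t***WWWW
16: 1352342418.400602 MAPWRITE 0x44df8f thru 0x452e00 (0x4e72 bytes)
28: 1352342419.095306 READ     0x917b3 thru 0x97e3f (0x668d bytes)\t***RRRR***
"""

ops = parse_ops(SAMPLE)
failed = ops[-1]  # the final READ is the operation that reported the error
for num, name, start, end in writes_overlapping(ops, failed[2], failed[3]):
    print(f"op {num} {name} 0x{start:x}-0x{end:x} overlaps the failed read")
```

On these sample lines only op 15 is reported, which appears consistent with the `***WWWW` marker on that line in the dump.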

Info required for matching: sanityn 16



Comments
Comment by Jinshan Xiong (Inactive) [ 08/Nov/12 ]

I will take a look at this.

Comment by Andreas Dilger [ 08/Nov/12 ]

Also failed in:
https://maloo.whamcloud.com/sub_tests/210f9e8e-17ac-11e2-ad4e-52540035b04c
https://maloo.whamcloud.com/sub_tests/c946a9c4-fe2c-11e1-a707-52540035b04c

Comment by Andreas Dilger [ 08/Nov/12 ]

Debugging patch to print the fd number in the log dump for multi-fd fsx: http://review.whamcloud.com/4498

Comment by Bob Glossman (Inactive) [ 19/Nov/12 ]

Also failed in:
https://maloo.whamcloud.com/sub_tests/231187a6-3093-11e2-9075-52540035b04c

Comment by Jinshan Xiong (Inactive) [ 20/Nov/12 ]

From the log, it seems the lock was canceled but NO write RPC was issued. I'm reproducing this issue on toro; if I can't, I will work out a debug patch for this problem.

Comment by Jinshan Xiong (Inactive) [ 20/Nov/12 ]

It turns out this problem is due to cl_lock again: a [0,EOF) truncate lock was matched, so LDLM_FL_DISCARD_DATA was (wrongly) transmitted to cancel a write-mode lock, which then caused data corruption. This problem is easier to see on wide-stripe files. I will work out a patch soon.
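
The effect described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Lustre code: `ToyClient`, `cancel_lock`, and `run` are hypothetical names, the "backing store" is a plain dict, and the `discard_data` argument stands in for a cancel that carries LDLM_FL_DISCARD_DATA. It shows why dropping dirty pages on cancel is harmless when the range really is being truncated away, but loses data when applied to a write lock whose pages are still needed.

```python
class ToyClient:
    """Toy stand-in for one client mount holding cached dirty pages."""

    def __init__(self, backing):
        self.backing = backing  # shared dict: page index -> bytes (the "server")
        self.dirty = {}         # pages written locally, not yet flushed

    def write(self, page, data):
        # Writes land in the local cache first, under the write lock.
        self.dirty[page] = data

    def cancel_lock(self, discard_data):
        """Give up the write lock; return the number of pages flushed.

        discard_data=True mimics a cancel where cached pages are dropped
        without writeback. discard_data=False flushes dirty pages first.
        """
        flushed = 0
        if not discard_data:
            self.backing.update(self.dirty)
            flushed = len(self.dirty)
        self.dirty.clear()
        return flushed


def run(discard_data):
    """Write on one mount, cancel the lock, then read via the other mount."""
    backing = {}
    writer = ToyClient(backing)
    writer.write(0x91, b"new")
    flushed = writer.cancel_lock(discard_data)
    seen_by_other_mount = backing.get(0x91, b"\0")
    return flushed, seen_by_other_mount


print(run(discard_data=False))  # dirty page flushed before the lock goes away
print(run(discard_data=True))   # page dropped, no write issued, reader sees zeros
```

In the second call nothing is flushed and the second mount reads zeros, which mirrors the earlier observation that the lock was canceled but no write RPC was issued.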

Comment by Jinshan Xiong (Inactive) [ 21/Nov/12 ]

Patch is at: http://review.whamcloud.com/4651

Comment by Peter Jones [ 08/Dec/12 ]

Landed for 2.4
