[LU-11155] hidden failures of sanity 804 test Created: 18/Jul/18  Updated: 11/Oct/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Alexander Zarochentsev Assignee: Hongchao Zhang
Resolution: Unresolved Votes: 0
Labels: dne

Issue Links:
Related
is related to LU-11142 test-framework.sh run_e2fsck masks re... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test 804 never fails in recent test runs:

https://testing.whamcloud.com/sub_tests/query?utf8=%E2%9C%93&warn%5Bnotice%5D=&test_set_script_id=f9516376-32bc-11e0-aaee-52540025f9ae&sub_test_script_id=fb652518-c4a9-11e7-8027-52540065bddc&query_bugs=&builds=&hosts=&commit_id=&horizon=&window%5Bstart_date%5D=2018-01-01&window%5Bend_date%5D=2018-07-17&os_type_id=&distribution_type_id=&architecture_type_id=&file_system_type_id=&lustre_branch_id=24a6947e-04a9-11e1-bb5f-52540025f9af&network_type_id=&commit=Update+results&num_results=250

 

The results are either green (PASS) or yellow (SKIP).

However, a closer look at the test output reveals e2fsck errors that were not caught.

For example, the first green entry at the top:

 

https://testing.whamcloud.com/sub_tests/6e7f8e5e-eeb1-11e7-8c23-52540065bddc

and its full test log:

https://testing.whamcloud.com/test_logs/6f699440-eeb1-11e7-8c23-52540065bddc/show_text

contains e2fsck complaints about filesystem corruption:

 

trevis-4vm4: e2fsck 1.42.13.wc6 (05-Feb-2017)
trevis-4vm4: [QUOTA WARNING] Usage inconsistent for ID 0:actual (14385152, 401) != expected (14381056, 400)
trevis-4vm4: [QUOTA WARNING] Usage inconsistent for ID 500:actual (278528, 1) != expected (282624, 2)
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 288k/136k (123k/166k), time:  0.12/ 0.04/ 0.02
Pass 1: I/O read: 51MB, write: 0MB, rate: 410.09MB/s
Pass 2: Checking directory structure
Entry '..' in .../??? (8487) has deleted/unused inode 8486.  Clear? no

Pass 2: Memory used: 288k/272k (77k/212k), time:  0.01/ 0.00/ 0.00
Pass 2: I/O read: 3MB, write: 0MB, rate: 523.83MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 288k/272k (78k/211k), time:  0.14/ 0.04/ 0.03
Unconnected directory inode 8487 (.../???)
Connect to /lost+found? no

'..' in ... (8487) is ... (8486), should be <The NULL inode> (0).
Fix? no

Pass 3: Memory used: 288k/272k (75k/214k), time:  0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 1432.66MB/s
Pass 4: Checking reference counts
Inode 213 ref count is 23, should be 22.  Fix? no

Inode 8487 ref count is 4, should be 3.  Fix? no

Unattached inode 8505
Connect to /lost+found? no

Pass 4: Memory used: 288k/0k (70k/219k), time:  0.02/ 0.02/ 0.00
Pass 4: I/O read: 1MB, write: 0MB, rate: 51.02MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 288k/0k (68k/221k), time:  0.00/ 0.00/ 0.00
Pass 5: I/O read: 1MB, write: 0MB, rate: 290.78MB/s
Update quota info for quota type 1? no


lustre-MDT0000: ********** WARNING: Filesystem still has errors **********


         411 inodes used (0.05%, out of 838864)
          29 non-contiguous files (7.1%)
           3 non-contiguous directories (0.7%)
             # of inodes with ind/dind/tind blocks: 22/0/0
      235546 blocks used (44.93%, out of 524288)
           0 bad blocks
           1 large file

         187 regular files
         209 directories
           0 character device files
           0 block device files
           0 fifos
  4294967294 links
           6 symbolic links (6 fast symbolic links)
           0 sockets
------------
         400 files
Memory used: 288k/0k (68k/221k), time:  0.17/ 0.07/ 0.03
I/O read: 54MB, write: 0MB, rate: 325.54MB/s

 

Another test 804 run:

https://testing.whamcloud.com/sub_tests/f2aca174-efe9-11e7-8c43-52540065bddc
and its test log https://testing.whamcloud.com/test_logs/f36d7cf0-efe9-11e7-8c43-52540065bddc/show_text :

Pass 3: Checking directory connectivity
Peak memory: Memory used: 288k/272k (79k/210k), time:  0.24/ 0.04/ 0.02
Unconnected directory inode 8512 (.../???)
Connect to /lost+found? no

'..' in ... (8512) is ... (8511), should be <The NULL inode> (0).
Fix? no

Pass 3: Memory used: 288k/272k (76k/213k), time:  0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 6622.52MB/s
Pass 4: Checking reference counts
Inode 213 ref count is 23, should be 22.  Fix? no

Inode 8512 ref count is 4, should be 3.  Fix? no

Unattached inode 8530
Connect to /lost+found? no

I am not sure the filesystem is actually damaged in test_804 itself, and I could not reproduce the failures locally.

I already filed LU-11142 for test-framework.sh::run_e2fsck changes.
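
For illustration only, here is a minimal Python sketch of how errors like the ones quoted above could be flagged in a saved test log. It is not the run_e2fsck logic from test-framework.sh (that is tracked in LU-11142). The function names and the two regular expressions are hypothetical, chosen to match the log excerpts in this ticket. The exit status bits are the ones documented in the e2fsck(8) man page.

```python
import re

# Exit status bits as documented in the e2fsck(8) man page.
E2FSCK_EXIT_BITS = {
    1: "errors corrected",
    2: "errors corrected, reboot needed",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "canceled by user request",
    128: "shared library error",
}

# Hypothetical patterns, derived only from the log excerpts in this ticket.
STILL_HAS_ERRORS = re.compile(r"WARNING: Filesystem still has errors")
DECLINED_FIX = re.compile(r"\?\s+no\s*$")  # e.g. "Fix? no", "Clear? no"


def decode_exit_status(status):
    """Return the meaning of every bit set in an e2fsck exit status."""
    return [text for bit, text in sorted(E2FSCK_EXIT_BITS.items())
            if status & bit]


def find_unfixed_problems(log_text):
    """Return log lines showing problems e2fsck reported but did not fix."""
    hits = []
    for line in log_text.splitlines():
        if STILL_HAS_ERRORS.search(line) or DECLINED_FIX.search(line):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    # Sample lines copied from the first log excerpt above.
    sample = "\n".join([
        "Pass 4: Checking reference counts",
        "Inode 213 ref count is 23, should be 22.  Fix? no",
        "Unattached inode 8505",
        "Connect to /lost+found? no",
        "lustre-MDT0000: ********** WARNING: Filesystem still has errors **********",
    ])
    for problem in find_unfixed_problems(sample):
        print(problem)
    print(decode_exit_status(4))
```

A check of this kind would treat any returned line, or any exit status with bit 4 set, as a test failure rather than a PASS.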



 Comments   
Comment by Peter Jones [ 08/Aug/18 ]

Hongchao

Can you please investigate?

Thanks

Peter

Comment by Hongchao Zhang [ 10/Aug/18 ]

This failure could be caused by earlier tests in sanity, especially those that perform failover.
