[LU-2243] sanity test 223 lockup Created: 28/Oct/12  Updated: 05/Nov/12  Resolved: 05/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: File lu2243.txt.gz     File lu2243.txt.gz    
Severity: 3
Rank (Obsolete): 5306

 Description   

Sanity test 223 just hung for me overnight:

== sanity test 223: osc reenqueue if without AGL lock granted ========================= 05:05:29 (1351415129)
total: 10 creates in 0.05 seconds: 207.45 creates/second
fail_loc=0x31b

In dmesg we see:

[23752.752386] Lustre: DEBUG MARKER: == sanity test 223: osc reenqueue if without AGL lock granted ========================= 05:05:29 (1351415129)
[23752.998114] Lustre: DEBUG MARKER: cancel_lru_locks mdc start
[23753.324413] Lustre: DEBUG MARKER: cancel_lru_locks mdc stop
[23753.355089] Lustre: DEBUG MARKER: cancel_lru_locks osc start
[23753.778938] Lustre: DEBUG MARKER: cancel_lru_locks osc stop
[23753.949412] Lustre: *** cfs_fail_loc=31b, val=0***
[23753.954921] Lustre: *** cfs_fail_loc=31b, val=0***
[23754.458383] Lustre: *** cfs_fail_loc=31b, val=0***
[23754.459018] Lustre: Skipped 218 previous similar messages
[23755.466633] Lustre: *** cfs_fail_loc=31b, val=0***
[23755.467565] Lustre: Skipped 453 previous similar messages
[23757.474975] Lustre: *** cfs_fail_loc=31b, val=0***
[23757.475763] Lustre: Skipped 870 previous similar messages
[23761.483329] Lustre: *** cfs_fail_loc=31b, val=0***
[23761.484489] Lustre: Skipped 1846 previous similar messages
[23769.488618] Lustre: *** cfs_fail_loc=31b, val=0***
[23769.489239] Lustre: Skipped 3626 previous similar messages
[23785.498116] Lustre: *** cfs_fail_loc=31b, val=0***
[23785.498750] Lustre: Skipped 7390 previous similar messages
[23817.504664] Lustre: *** cfs_fail_loc=31b, val=0***
[23817.505302] Lustre: Skipped 14498 previous similar messages
[23881.515154] Lustre: *** cfs_fail_loc=31b, val=0***
[23881.515911] Lustre: Skipped 28874 previous similar messages
[24009.521279] Lustre: *** cfs_fail_loc=31b, val=0***
[24009.522050] Lustre: Skipped 57944 previous similar messages
[24265.530087] Lustre: *** cfs_fail_loc=31b, val=0***
[24265.530861] Lustre: Skipped 116106 previous similar messages
[24777.537716] Lustre: *** cfs_fail_loc=31b, val=0***
[24777.538350] Lustre: Skipped 230115 previous similar messages
[25377.546223] Lustre: *** cfs_fail_loc=31b, val=0***
[25377.546842] Lustre: Skipped 269071 previous similar messages
[25977.554606] Lustre: *** cfs_fail_loc=31b, val=0***
[25977.555370] Lustre: Skipped 268666 previous similar messages
[26577.561567] Lustre: *** cfs_fail_loc=31b, val=0***
[26577.562207] Lustre: Skipped 268156 previous similar messages
[27177.569705] Lustre: *** cfs_fail_loc=31b, val=0***
[27177.570331] Lustre: Skipped 267088 previous similar messages
[27777.578689] Lustre: *** cfs_fail_loc=31b, val=0***
[27777.579310] Lustre: Skipped 268369 previous similar messages
[28377.585314] Lustre: *** cfs_fail_loc=31b, val=0***
[28377.585949] Lustre: Skipped 266675 previous similar messages
[28977.597867] Lustre: *** cfs_fail_loc=31b, val=0***
[28977.598488] Lustre: Skipped 268815 previous similar messages
[29577.600562] Lustre: *** cfs_fail_loc=31b, val=0***
[29577.601574] Lustre: Skipped 268170 previous similar messages
[30177.610466] Lustre: *** cfs_fail_loc=31b, val=0***
[30177.611086] Lustre: Skipped 269189 previous similar messages
[30777.619453] Lustre: *** cfs_fail_loc=31b, val=0***
[30777.620405] Lustre: Skipped 268770 previous similar messages
[31377.631747] Lustre: *** cfs_fail_loc=31b, val=0***
[31377.632384] Lustre: Skipped 268322 previous similar messages
[31977.632423] Lustre: *** cfs_fail_loc=31b, val=0***
[31977.633127] Lustre: Skipped 269037 previous similar messages
[32577.641103] Lustre: *** cfs_fail_loc=31b, val=0***
[32577.641830] Lustre: Skipped 270733 previous similar messages
[33177.649053] Lustre: *** cfs_fail_loc=31b, val=0***
[33177.649688] Lustre: Skipped 268753 previous similar messages
[33777.657873] Lustre: *** cfs_fail_loc=31b, val=0***
[33777.658509] Lustre: Skipped 268431 previous similar messages
[34377.665257] Lustre: *** cfs_fail_loc=31b, val=0***
[34377.665888] Lustre: Skipped 269508 previous similar messages
[34977.672950] Lustre: *** cfs_fail_loc=31b, val=0***
[34977.673591] Lustre: Skipped 268557 previous similar messages
[35577.682906] Lustre: *** cfs_fail_loc=31b, val=0***
[35577.683531] Lustre: Skipped 268901 previous similar messages
[36177.688874] Lustre: *** cfs_fail_loc=31b, val=0***
[36177.689564] Lustre: Skipped 270506 previous similar messages
[36777.698725] Lustre: *** cfs_fail_loc=31b, val=0***
[36777.699347] Lustre: Skipped 271424 previous similar messages
[37377.705699] Lustre: *** cfs_fail_loc=31b, val=0***
[37377.706331] Lustre: Skipped 269142 previous similar messages
[37977.713160] Lustre: *** cfs_fail_loc=31b, val=0***
[37977.713781] Lustre: Skipped 269425 previous similar messages
[38577.722217] Lustre: *** cfs_fail_loc=31b, val=0***
[38577.722839] Lustre: Skipped 267177 previous similar messages
[39177.729286] Lustre: *** cfs_fail_loc=31b, val=0***
[39177.729999] Lustre: Skipped 269502 previous similar messages
[39777.742248] Lustre: *** cfs_fail_loc=31b, val=0***
[39777.742901] Lustre: Skipped 268396 previous similar messages
[40377.746108] Lustre: *** cfs_fail_loc=31b, val=0***
[40377.746728] Lustre: Skipped 269319 previous similar messages
[40977.752869] Lustre: *** cfs_fail_loc=31b, val=0***
[40977.753696] Lustre: Skipped 268510 previous similar messages
[41577.761215] Lustre: *** cfs_fail_loc=31b, val=0***
[41577.761897] Lustre: Skipped 268610 previous similar messages
[42177.770401] Lustre: *** cfs_fail_loc=31b, val=0***
[42177.771032] Lustre: Skipped 268705 previous similar messages
[42777.779077] Lustre: *** cfs_fail_loc=31b, val=0***
[42777.779727] Lustre: Skipped 269748 previous similar messages
[43377.788792] Lustre: *** cfs_fail_loc=31b, val=0***
[43377.789409] Lustre: Skipped 268091 previous similar messages
[43977.795030] Lustre: *** cfs_fail_loc=31b, val=0***
[43977.795857] Lustre: Skipped 269256 previous similar messages
[44577.805860] Lustre: *** cfs_fail_loc=31b, val=0***
[44577.806478] Lustre: Skipped 268173 previous similar messages
[45177.817530] Lustre: *** cfs_fail_loc=31b, val=0***
[45177.818148] Lustre: Skipped 267868 previous similar messages
[45777.825993] Lustre: *** cfs_fail_loc=31b, val=0***
[45777.826617] Lustre: Skipped 268294 previous similar messages
[46377.838321] Lustre: *** cfs_fail_loc=31b, val=0***
[46377.838961] Lustre: Skipped 270992 previous similar messages
[46977.842412] Lustre: *** cfs_fail_loc=31b, val=0***
[46977.843150] Lustre: Skipped 269106 previous similar messages
[47577.849210] Lustre: *** cfs_fail_loc=31b, val=0***
[47577.849893] Lustre: Skipped 270012 previous similar messages
[48177.859006] Lustre: *** cfs_fail_loc=31b, val=0***
[48177.860243] Lustre: Skipped 267790 previous similar messages
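The reporting interval in the log above appears to double (about 0.5 s, 1 s, 2 s and so on up to 512 s) and then settle at 600 s. After that, every window suppresses roughly 268,000 messages. The following Python sketch estimates the fail_loc hit rate from consecutive "Skipped N previous similar messages" lines. It is an illustrative aid and not part of Lustre. The helper names `skip_events` and `hits_per_second` are hypothetical, and the sample input is a few lines copied from the log above.

```python
import re

# Sample input: a few lines copied verbatim from the dmesg excerpt above.
LOG = """\
[24777.537716] Lustre: *** cfs_fail_loc=31b, val=0***
[24777.538350] Lustre: Skipped 230115 previous similar messages
[25377.546223] Lustre: *** cfs_fail_loc=31b, val=0***
[25377.546842] Lustre: Skipped 269071 previous similar messages
[25977.554606] Lustre: *** cfs_fail_loc=31b, val=0***
[25977.555370] Lustre: Skipped 268666 previous similar messages
"""

# Matches "[<seconds>] Lustre: Skipped <N> previous similar messages".
SKIP_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] Lustre: Skipped (\d+) previous similar messages"
)


def skip_events(log):
    """Return a list of (timestamp_seconds, skipped_count) tuples (hypothetical helper)."""
    events = []
    for line in log.splitlines():
        m = SKIP_RE.match(line)
        if m:
            events.append((float(m.group(1)), int(m.group(2))))
    return events


def hits_per_second(events):
    """Approximate hits/s for each window between consecutive "Skipped" lines.

    The count on line i covers the messages suppressed since the previous
    printed message, so N_i / (t_i - t_{i-1}) approximates the rate. The one
    printed message per window is ignored, so this slightly undercounts.
    """
    rates = []
    for (t_prev, _), (t_cur, skipped) in zip(events, events[1:]):
        rates.append(skipped / (t_cur - t_prev))
    return rates


if __name__ == "__main__":
    for rate in hits_per_second(skip_events(LOG)):
        print(f"{rate:.1f} hits/s")
```

On the 600 s windows this works out to roughly 450 fail_loc hits per second, sustained for hours. That is consistent with the client retrying in a tight loop rather than backing off or giving up.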


 Comments   
Comment by Oleg Drokin [ 28/Oct/12 ]

Here's the lctl dk output from the node when I discovered it in its sorry state.

Comment by Oleg Drokin [ 03/Nov/12 ]

Again, overnight two more nodes entered this state.
Attached is a log at debug level -1. It clearly shows that clio spins somewhere inside, retrying those failed AGL requests even though it supposedly should not.

I don't think it's a matter of adding a one-shot option for the fail_loc, either.

Comment by Oleg Drokin [ 04/Nov/12 ]

FanYong advises that this bug is triggered by the patch in http://review.whamcloud.com/#change,4316

Comment by nasf (Inactive) [ 04/Nov/12 ]

This failure is related to the patch http://review.whamcloud.com/#change,4316

Comment by Oleg Drokin [ 05/Nov/12 ]

OK, this is an invalid report, since an unlanded patch caused it.

Generated at Sat Feb 10 01:23:34 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.