[LU-7306] OST reported as good but not used after error Created: 14/Oct/15  Updated: 05/Aug/20  Resolved: 05/Aug/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Paul Kline (Inactive) Assignee: chroma triage
Resolution: Incomplete Votes: 0
Labels: None
Environment:

Running on LHC with build https://jenkins.iml.intel.com:8080/job/chroma/7714/


Attachments: File chroma-diagnostics_20151014T064740_lotus-23.iml.intel.com.tar.lzma     File chroma-diagnostics_20151014T064803_lotus-24.iml.intel.com.tar.lzma     File chroma-diagnostics_20151014T064824_lotus-25.iml.intel.com.tar.lzma     File chroma-diagnostics_20151014T064836_lotus-26.iml.intel.com.tar.lzma     File chroma-diagnostics_20151014T064851_lotus-27.iml.intel.com.tar.lzma     File lotus-21vm9.iml.intel.com-messages    
Severity: 3
Project: Hydra
Rank (Obsolete): 9223372036854775807

 Description   

After an error occurred on OST0003, IML reports the OST as good, but it is not used for write operations:

Log error:

Oct 14 05:20:13 lotus-27 pengine[15316]:   notice: process_pe_message: Calculated Transition 56: /var/lib/pacemaker/pengine/pe-input-56.bz2
Oct 14 05:20:13 lotus-27 crmd[15317]:   notice: te_rsc_command: Initiating action 18: monitor masterfs-OST0003_16129c_monitor_5000 on lotus-27.iml.intel.com (local)
Oct 14 05:20:13 lotus-27 crmd[15317]:   notice: te_rsc_command: Initiating action 21: monitor masterfs-OST0007_730503_monitor_5000 on lotus-27.iml.intel.com (local)
Oct 14 05:20:13 lotus-27 crmd[15317]:   notice: te_rsc_command: Initiating action 24: monitor masterfs-OST0001_1db880_monitor_5000 on lotus-27.iml.intel.com (local)
Oct 14 05:20:13 lotus-27 crmd[15317]:   notice: te_rsc_command: Initiating action 27: monitor masterfs-OST0000_8d9981_monitor_5000 on lotus-26.iml.intel.com
Oct 14 05:20:13 lotus-27 crmd[15317]:   notice: te_rsc_command: Initiating action 28: start masterfs-OST0005_49bf3b_start_0 on lotus-27.iml.intel.com (local)
Oct 14 05:20:13 lotus-27.iml.intel.com kernel: Lustre: masterfs-OST0001: precreate FID 0x0:275010873 is over 100000 larger than the LAST_ID 0x0:0, only precreating the last 10000 objects.
Oct 14 05:20:13 lotus-27.iml.intel.com kernel: LustreError: 39256:0:(ost_handler.c:170:ost_validate_obdo()) masterfs-OST0003: client 10.14.80.179@tcp sent bad object 0x0:0: rc = -71
Oct 14 05:20:13 lotus-27 crmd[15317]:   notice: process_lrm_event: Operation masterfs-OST0007_730503_monitor_5000: ok (node=lotus-27.iml.intel.com, call=42, rc=0, cib-update=113, confirmed=false)
Oct 14 05:20:13 lotus-27 crmd[15317]:   notice: process_lrm_event: Operation masterfs-OST0001_1db880_monitor_5000: ok (node=lotus-27.iml.intel.com, call=43, rc=0, cib-update=114, confirmed=false)
Oct 14 05:20:13 lotus-27 crmd[15317]:   notice: process_lrm_event: Operation masterfs-OST0003_16129c_monitor_5000: ok (node=lotus-27.iml.intel.com, call=41, rc=0, cib-update=115, confirmed=false)
Oct 14 05:20:14 lotus-27.iml.intel.com kernel: LDISKFS-fs (dm-8): mounted filesystem with ordered data mode. quota=on. Opts:

Output of lfs df:

[root@lotus-21vm9 ~]# lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
masterfs-MDT0000_UUID   491695680       81228   458689168   0% /mnt/masterfs[MDT:0]
masterfs-OST0000_UUID   653933816     3217728   617782340   1% /mnt/masterfs[OST:0]
masterfs-OST0001_UUID   653933816     3217984   617781060   1% /mnt/masterfs[OST:1]
masterfs-OST0002_UUID   653933816     3216704   617783364   1% /mnt/masterfs[OST:2]
masterfs-OST0003_UUID   653933816       71860   620936672   0% /mnt/masterfs[OST:3]
masterfs-OST0004_UUID   653933816     3218752   617781316   1% /mnt/masterfs[OST:4]
masterfs-OST0005_UUID   653933816     3216704   617783364   1% /mnt/masterfs[OST:5]
masterfs-OST0006_UUID   653933816     3218752   617781316   1% /mnt/masterfs[OST:6]
masterfs-OST0007_UUID   653933816     3217728   617782340   1% /mnt/masterfs[OST:7]

filesystem summary:   5231470528    22596212  4945411772   0% /mnt/masterfs
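
The imbalance above (OST0003 at 71860 KiB used, every other OST at roughly 3.2 GiB) can be detected mechanically. The following is a minimal sketch, not part of the original report: the function names, the regular expression, and the 10% of median threshold are illustrative assumptions, and the sample text is an excerpt of the rows shown above.

```python
import re
from statistics import median

# Matches one OST row of `lfs df` output, for example:
# masterfs-OST0003_UUID   653933816       71860   620936672   0% /mnt/masterfs[OST:3]
OST_ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+\d+%\s+\S+\[OST:\d+\]$")


def parse_ost_usage(lfs_df_text):
    """Return {ost_uuid: used_1k_blocks} for every OST row; MDT rows are skipped."""
    usage = {}
    for line in lfs_df_text.splitlines():
        match = OST_ROW.match(line.strip())
        if match:
            usage[match.group(1)] = int(match.group(3))
    return usage


def underused_osts(usage, ratio=0.1):
    """Return OSTs whose used space is below ratio * median used space.

    The 0.1 ratio is an arbitrary illustrative threshold (assumption).
    """
    if not usage:
        return []
    threshold = ratio * median(usage.values())
    return sorted(name for name, used in usage.items() if used < threshold)


# Excerpt of the `lfs df` output from this ticket.
SAMPLE = """\
masterfs-MDT0000_UUID   491695680       81228   458689168   0% /mnt/masterfs[MDT:0]
masterfs-OST0000_UUID   653933816     3217728   617782340   1% /mnt/masterfs[OST:0]
masterfs-OST0001_UUID   653933816     3217984   617781060   1% /mnt/masterfs[OST:1]
masterfs-OST0002_UUID   653933816     3216704   617783364   1% /mnt/masterfs[OST:2]
masterfs-OST0003_UUID   653933816       71860   620936672   0% /mnt/masterfs[OST:3]
masterfs-OST0004_UUID   653933816     3218752   617781316   1% /mnt/masterfs[OST:4]
"""

suspects = underused_osts(parse_ost_usage(SAMPLE))
print(suspects)
```

A low value here only indicates that the OST received far fewer writes than its peers; it does not identify the cause.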


 Comments   
Comment by Paul Kline (Inactive) [ 14/Oct/15 ]

Reset striping and checked OST status; OST0003 is still not being used:

[root@lotus-21vm9 ~]# lfs check osts
masterfs-OST0002-osc-ffff88007dbf6c00: active
masterfs-OST0001-osc-ffff88007dbf6c00: active
masterfs-OST0006-osc-ffff88007dbf6c00: active
masterfs-OST0003-osc-ffff88007dbf6c00: active
masterfs-OST0004-osc-ffff88007dbf6c00: active
masterfs-OST0007-osc-ffff88007dbf6c00: active
masterfs-OST0000-osc-ffff88007dbf6c00: active
masterfs-OST0005-osc-ffff88007dbf6c00: active 
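
Since every target above reports "active", this check alone does not reveal the problem with OST0003. A minimal sketch of parsing this output, should it need to be combined with usage figures in a script, might look as follows. The function names are hypothetical, and the sketch assumes the "device: state" line format shown above.

```python
def parse_lfs_check(text):
    """Map each target name (e.g. 'masterfs-OST0003') to its reported state."""
    status = {}
    for line in text.splitlines():
        device, sep, state = line.strip().partition(": ")
        if not sep:
            continue  # skip blank or unrecognized lines
        # Device names look like masterfs-OST0003-osc-ffff88007dbf6c00;
        # keep only the part before the client-side "-osc-" suffix.
        target = device.split("-osc-")[0]
        status[target] = state.strip()
    return status


def inactive_targets(status):
    """Return the targets whose reported state is anything other than 'active'."""
    return sorted(name for name, state in status.items() if state != "active")


# Excerpt of the `lfs check osts` output from this comment.
SAMPLE_CHECK = """\
masterfs-OST0002-osc-ffff88007dbf6c00: active
masterfs-OST0003-osc-ffff88007dbf6c00: active
masterfs-OST0005-osc-ffff88007dbf6c00: active 
"""

check_status = parse_lfs_check(SAMPLE_CHECK)
print(inactive_targets(check_status))
```

For the output in this comment the list of inactive targets is empty, which matches the symptom reported: the OST looks healthy to the client but receives no new objects.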
Comment by Brad Hoagland (Inactive) [ 15/Oct/15 ]

HYD-Triage: Needs to be converted to an LDEV ticket.

Generated at Sat Feb 10 02:07:45 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.