Details
-
Bug
-
Resolution: Cannot Reproduce
-
Blocker
-
None
-
None
-
lola
build: build: 2.7.62-28-g0754bc8, 0754bc8f2623bea184111af216f7567608db35b6; soakbuild '20151104.1'
-
3
-
9223372036854775807
Description
An error occurred during soak testing of build '20151104.1' on cluster lola (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151104.1). MDTs are formatted with ldiskfs and OSTs with zfs as the storage backend. DNE is enabled. MDSes are configured in an HA failover configuration.
OSS nodes are neither restarted nor failed over.
Symptom:
- OSS node (lola-3) shows high load due to a large number of blocked processes. No iowait, and no high disk load with long queue and wait times, can be seen
- The list of blocked processes can be seen from the 'w' and 't' sysrq-triggers initiated at Nov 5 08:19:12 PST 2015 and 08:23:3 PST 2015, respectively (see attached messages file)
- Problems most likely started at Nov 4, 18:50
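For anyone checking a node for the same symptom, the following is a minimal sketch (not part of the original soak tooling) that lists processes in uninterruptible sleep (state D), which is what typically drives load up without iowait. It assumes a procps-style `ps` that accepts `-eo state,pid,comm`; the function names and the sample thread names in the usage note are hypothetical.

```python
import subprocess


def blocked_processes(ps_output):
    """Return (pid, command) pairs for processes whose state is D.

    Expects text in the layout produced by `ps -eo state,pid,comm`:
    one process per line, with state, PID and command name columns.
    """
    blocked = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        state, pid, comm = fields
        # The header line is skipped because its PID column is not numeric.
        if state.startswith("D") and pid.isdigit():
            blocked.append((int(pid), comm.strip()))
    return blocked


def snapshot():
    """Run ps on the local node and return its blocked processes.

    Assumption: a procps-style ps is available on the PATH.
    """
    out = subprocess.run(
        ["ps", "-eo", "state,pid,comm"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return blocked_processes(out)
```

Calling `blocked_processes` on captured text such as `"D 1001 ll_ost_io_example"` (a made-up thread name) yields `[(1001, "ll_ost_io_example")]`; `snapshot()` does the same against the live node. This only shows which tasks are blocked; the stack traces still come from the sysrq 'w' and 't' output in the messages file.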
See the attached messages file and debug log file (lustre-log.1446691819.85273.bz2). 220 additional debug log files have been written and can be provided on demand.
The issue was not reproduced.