Details
-
Question/Request
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 2.5.1
-
None
-
CentOS release 6.4, lustre client 2.5.1, lustre server 2.6.32-358.23.2.el6_lustre.es52.x86_64
Description
We are facing an issue with Lustre clients (compute nodes).
PBS jobs are being killed because of Lustre errors on the scratch file system. The scratch area is defined in PBS, so when the Lustre error occurs the PBS service shuts down and the jobs are killed.
The Lustre errors from the messages log of one compute node are shown below. These are from compute node 34 (cn034), but the same errors appear on other nodes as well.
------------------------------------------------------------------------------------------
OSS/MDS Server Error-
Oct 11 21:38:03 cn034 kernel: Lustre: Lustre: Build Version: 2.5.1.ddn1-g45c890c-PRISTINE-2.6.32-431.el6.x86_64
Oct 11 21:38:03 cn034 kernel: LNet: Added LNI 10.20.30.34@o2ib [8/256/0/180]
Oct 11 21:38:03 cn034 kernel: Lustre: Layout lock feature supported.
Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0004-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0003-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
Oct 11 21:38:03 cn034 kernel: LustreError: Skipped 1 previous similar message
-----------------------------------------------------------------------------
------------------------------------------------------------------------------
Compute Nodes Error:-
Oct 11 21:37:59 cn034 modprobe: FATAL: Error inserting padlock_sha (/lib/modules/2.6.32-431.el6.x86_64/kernel/drivers/crypto/padlock-sha.ko): No such device
Oct 11 21:38:03 cn034 kernel: Lustre: Lustre: Build Version: 2.5.1.ddn1-g45c890c-PRISTINE-2.6.32-431.el6.x86_64
Oct 11 21:38:03 cn034 kernel: LNet: Added LNI 10.20.30.34@o2ib [8/256/0/180]
Oct 11 21:38:03 cn034 kernel: Lustre: Layout lock feature supported.
Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0004-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0003-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
-------------------------------------------------------------------------------------
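For reference, the -19 in the messages above is a negated Linux errno value (19 is ENODEV, "No such device"). The following is only a minimal sketch, assuming the log line format quoted above, for pulling the target, peer NID, operation, and error code out of such lines when checking many compute nodes. The function name and the field names (target, peer, op, rc) are illustrative and are not Lustre terminology.

```python
import errno
import re

# Pattern for the "LustreError: 11-0" lines quoted above. The group names
# are hypothetical labels chosen for this sketch.
LINE_RE = re.compile(
    r"LustreError: 11-0: (?P<target>\S+): Communicating with (?P<peer>\S+), "
    r"operation (?P<op>\w+) failed with (?P<rc>-?\d+)"
)


def parse_connect_error(line):
    """Return a dict describing one failed-operation line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    return {
        "target": m.group("target"),
        "peer": m.group("peer"),
        "op": m.group("op"),
        "rc": rc,
        # Kernel return codes are negated errno values, so -19 maps to errno 19.
        "errno_name": errno.errorcode.get(abs(rc), "UNKNOWN"),
    }


sample = (
    "Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: "
    "scratch-OST0004-osc-ffff880c05bbbc00: Communicating with "
    "10.20.30.103@o2ib, operation ost_connect failed with -19."
)
info = parse_connect_error(sample)
print(info["target"], info["peer"], info["op"], info["rc"], info["errno_name"])
```

Grouping the parsed results by peer NID would show whether all failures point at the same server (here 10.20.30.103@o2ib) or at several.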
Log files from MDS1, MDS2, and the compute nodes are attached. Please let me know if you need more details.
Looking forward to your support on this.
Thank you
Attachments
Issue Links
- is related to
- LU-6664 (ost_handler.c:1765:ost_blocking_ast()) Error -2 syncing data on lock cancel (Resolved)