[LU-7308] LustreError: 16956:0:(ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on lock cancel Created: 16/Oct/15  Updated: 12/Jul/21  Resolved: 12/Jul/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.1
Fix Version/s: None

Type: Question/Request Priority: Minor
Reporter: Amol Thute (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

CentOS release 6.4, Lustre client 2.5.1, Lustre server kernel 2.6.32-358.23.2.el6_lustre.es52.x86_64


Attachments: lctl.txt, mds1-dmesg.txt, mds1-messages.txt, mds2-dmesg.txt, node34-logs.txt, node41-dmesg.txt, node41-logs.txt
Issue Links:
Related
is related to LU-6664 (ost_handler.c:1765:ost_blocking_ast(... Resolved

 Description   

We are facing an issue with the Lustre clients (compute nodes).

PBS jobs are being killed because of Lustre errors on the scratch file system. The scratch area is defined in PBS, and when these Lustre errors occur the PBS service shuts down and the jobs are killed.

The Lustre errors below are from the messages log of a compute node. This excerpt is from compute node 34 (cn034), but the same errors occur on other nodes as well.

------------------------------------------------------------------------------------------
OSS/MDS Server Errors:

Oct 11 21:38:03 cn034 kernel: Lustre: Lustre: Build Version: 2.5.1.ddn1-g45c890c-PRISTINE-2.6.32-431.el6.x86_64
Oct 11 21:38:03 cn034 kernel: LNet: Added LNI 10.20.30.34@o2ib [8/256/0/180]
Oct 11 21:38:03 cn034 kernel: Lustre: Layout lock feature supported.
Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0004-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0003-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
Oct 11 21:38:03 cn034 kernel: LustreError: Skipped 1 previous similar message
-----------------------------------------------------------------------------

------------------------------------------------------------------------------
Compute Node Errors:

Oct 11 21:37:59 cn034 modprobe: FATAL: Error inserting padlock_sha (/lib/modules/2.6.32-431.el6.x86_64/kernel/drivers/crypto/padlock-sha.ko): No such device
Oct 11 21:38:03 cn034 kernel: Lustre: Lustre: Build Version: 2.5.1.ddn1-g45c890c-PRISTINE-2.6.32-431.el6.x86_64
Oct 11 21:38:03 cn034 kernel: LNet: Added LNI 10.20.30.34@o2ib [8/256/0/180]
Oct 11 21:38:03 cn034 kernel: Lustre: Layout lock feature supported.
Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0004-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
Oct 11 21:38:03 cn034 kernel: LustreError: 11-0: scratch-OST0003-osc-ffff880c05bbbc00: Communicating with 10.20.30.103@o2ib, operation ost_connect failed with -19.
-------------------------------------------------------------------------------------
I have attached log files from MDS1, MDS2, and the compute nodes. Please let me know if you need more details.

Looking forward to your support on this.

Thank you


Generated at Sat Feb 10 02:07:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.