Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.4.0, Lustre 2.5.0
- 3
- Orion
- 2852
Description
When running IOR write tests against our Orion test system, I am seeing many Lustre error messages of the form:
Jan 4 16:24:52 zwicky142 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
Jan 4 16:24:52 zwicky146 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
Jan 4 16:24:53 zwicky150 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
Jan 4 16:24:53 zwicky125 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
Jan 4 16:24:53 zwicky123 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
Jan 4 16:24:53 zwicky151 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
Jan 4 16:24:53 zwicky115 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
Jan 4 16:24:54 zwicky147 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
These are all messages from the clients, with 10.1.1.211 being the address of our MDS.
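To gauge how widespread the messages are, the client logs can be tallied by peer, operation, and error code. The following is only a rough sketch: the regular expression is an assumption derived from the sample lines above rather than a general Lustre log grammar, and `summarize` is a hypothetical helper name. It relies on the fact that a return code of -2 corresponds to errno 2 (ENOENT) on Linux.

```python
import errno
import re
from collections import Counter

# Assumed pattern, based solely on the sample messages quoted in this report.
LINE_RE = re.compile(
    r"(?P<host>\S+) kernel: LustreError: 11-0: "
    r"an error occurred while communicating with (?P<peer>\S+)\. "
    r"The (?P<op>\w+) operation failed with (?P<rc>-?\d+)"
)


def summarize(lines):
    """Count matching messages per (peer, operation, errno name)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not one of the messages of interest
        rc = int(m.group("rc"))
        # Kernel return codes are negative errno values, so -2 maps to ENOENT.
        name = errno.errorcode.get(abs(rc), "UNKNOWN")
        counts[(m.group("peer"), m.group("op"), name)] += 1
    return counts


SAMPLE = [
    "Jan 4 16:24:52 zwicky142 kernel: LustreError: 11-0: an error occurred "
    "while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create "
    "operation failed with -2",
    "Jan 4 16:24:52 zwicky146 kernel: LustreError: 11-0: an error occurred "
    "while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create "
    "operation failed with -2",
]

if __name__ == "__main__":
    for (peer, op, name), n in summarize(SAMPLE).items():
        print(f"{n} x {op} -> {name} from {peer}")
```

In practice the input would be the clients' syslog files rather than the two sample lines embedded here.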
The clients are running:
$ rpm -qa | grep lustre-modules
lustre-modules-2.1.0-15chaos_2.6.32_220.1chaos.ch5.x86_64.x86_64
And the servers:
# zwicky1 /root > rpm -qa | grep lustre-orion-modules
lustre-orion-modules-2.2.49.50-2chaos_2.6.32_220.2.1.1chaos.ch5.x86_64.x86_64
Issue Links
- is related to LU-2611 Unified Target (Resolved)
Chris, what are the current spamming messages? The llog code has already undergone significant restructuring, and I can't find either llog_origin_handle_create() or "operation failed" in the current code.
I'm happy to land a patch to quiet the spammy messages, but the unified target patch is 1500 lines. At this point we are down to the last few blockers and are trying to limit the size and complexity of landed patches.