[LU-1039] data corruption in check_set Created: 26/Jan/12  Updated: 22/Dec/12  Resolved: 02/Mar/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 2.1.4
Fix Version/s: Lustre 2.2.0

Type: Bug Priority: Blocker
Reporter: Alexey Lyashkov Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None
Environment:

any lustre


Attachments: Text File 0001-MRP-303-handle-bulk-IO-errors-correctly.patch    
Issue Links:
Duplicate
is duplicated by LU-2260 2.1.3 Client LBUG Resolved
Sub-Tasks:
Key      Summary                                   Type            Status    Assignee
LU-1791  sanity.sh test_224b takes too long to...  Technical task  Resolved  WC Triage
Severity: 3
Rank (Obsolete): 4702

 Description   

ost_brw_read() sets the number of bytes read as rq_status, which confuses the check_set() function.
This is easy to see when checksumming is enabled.
I found it while testing a fix to avoid a panic in check_set() caused by request reordering and a failed bulk read.
The attached patch solves both issues, but it breaks the request flags policy and does not resend the bulk request, so it can serve only as a short-term solution until check_set() is cleaned up.
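
The ambiguity described above can be illustrated with a minimal sketch. This is not Lustre code: struct demo_reply, naive_check_failed() and strict_check_failed() are hypothetical names, and the sketch only assumes that one status field carries either a negative errno or a positive byte count.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reply for illustration only (not a Lustre structure).
 * "status" is overloaded: negative = -errno, positive = bytes read. */
struct demo_reply {
    int status;
};

/* A checker that assumes "any non-zero status is a failure" misreads
 * a successful 4096-byte read as an error. */
static bool naive_check_failed(const struct demo_reply *r)
{
    return r->status != 0;
}

/* A checker that is aware of the overloading treats only negative
 * values as errors and leaves positive byte counts alone. */
static bool strict_check_failed(const struct demo_reply *r)
{
    return r->status < 0;
}
```

With status set to 4096, the naive checker reports a failure while the strict one does not; with status set to -EIO, both report a failure. Whether check_set() makes exactly this assumption is not stated in the report, so the sketch shows only the general hazard.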



 Comments   
Comment by Alexey Lyashkov [ 26/Jan/12 ]

Jan 26 05:58:31 rhel6-64 kernel: Lustre: DEBUG MARKER: cancel_lru_locks osc stop
LustreError: 4233:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=515 ***
LustreError: 4233:0:(events.c:201:client_bulk_callback()) event type 5, status -5, desc ffff880050adab90
LustreError: 133-1: lustre-OST0001-osc-ffff880053338348: BAD READ CHECKSUM: from 0@lo via 0@<0:0> inode [0x0:0x0:0x0] object 18499/0 extent [0-4095]
LustreError: 4233:0:(osc_request.c:1660:osc_brw_fini_request()) client ffffffff, server 6706be76, cksum_type 4
LustreError: 4233:0:(osc_request.c:1749:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8800855368a0 x1392037348770001/t0(0) o3->lustre-OST0001-osc-ffff880053338348@0@lo:6/4 lens 456/400 e 0 to 0 dl 1327550318 ref 2 fl Interpret:R/0/0 rc 4096/4096
Jan 26 05:58:31 rhel6-64 kernel: LustreError: 4233:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=515 ***
Jan 26 05:58:31 rhel6-64 kernel: LustreError: 4233:0:(events.c:201:client_bulk_callback()) event type 5, status -5, desc ffff880050adab90
Jan 26 05:58:31 rhel6-64 kernel: LustreError: 133-1: lustre-OST0001-osc-ffff880053338348: BAD READ CHECKSUM: from 0@lo via 0@<0:0> inode [0x0:0x0:0x0] object 18499/0 extent [0-4095]
Jan 26 05:58:31 rhel6-64 kernel: LustreError: 4233:0:(osc_request.c:1660:osc_brw_fini_request()) client ffffffff, server 6706be76, cksum_type 4
Jan 26 05:58:31 rhel6-64 kernel: LustreError: 4233:0:(osc_request.c:1749:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff8800855368a0 x1392037348770001/t0(0) o3->lustre-OST0001-osc-ffff880053338348@0@lo:6/4 lens 456/400 e 0 to 0 dl 1327550318 ref 2 fl Interpret:R/0/0 rc 4096/4096
LustreError: 133-1: lustre-OST0001-osc-ffff880053338348: BAD READ CHECKSUM: from 0@lo via 0@<0:0> inode [0x0:0x0:0x0] object 18499/0 extent [0-4095]
LustreError: 4233:0:(osc_request.c:1660:osc_brw_fini_request()) client ffffffff, server 6706be76, cksum_type 4
LustreError: 4233:0:(osc_request.c:1749:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff880083a3b5b8 x1392037348770002/t0(0) o3->lustre-OST0001-osc-ffff880053338348@0@lo:6/4 lens 456/400 e 0 to 0 dl 1327550324 ref 2 fl Interpret:R/0/0 rc 4096/4096
Jan 26 05:58:37 rhel6-64 kernel: LustreError: 133-1: lustre-OST0001-osc-ffff880053338348: BAD READ CHECKSUM: from 0@lo via 0@<0:0> inode [0x0:0x0:0x0] object 18499/0 extent [0-4095]
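
In the log above, the bulk callback reports status -5 (-EIO) while the request still carries rc 4096/4096, and the checksum comparison then fails (client ffffffff, server 6706be76). The following is a rough sketch of one possible ordering of those checks, looking at the bulk transfer status before comparing checksums. It is not taken from the patch: demo_read_result, demo_action and demo_read_finish() are hypothetical names, and what a caller should do on a bulk failure is left open.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical outcomes for illustration only. */
enum demo_action { DEMO_DONE, DEMO_RESEND, DEMO_FAIL };

/* Hypothetical summary of a finished bulk read (not a Lustre structure). */
struct demo_read_result {
    int      bulk_status;   /* 0 on success, -errno if the bulk transfer failed */
    uint32_t client_cksum;  /* checksum computed over the received pages */
    uint32_t server_cksum;  /* checksum reported by the server */
};

/* Check the transfer status first: if the bulk data never arrived,
 * a checksum over the local pages says nothing about the server data. */
static enum demo_action demo_read_finish(const struct demo_read_result *r)
{
    if (r->bulk_status < 0)
        return DEMO_FAIL;       /* surface the bulk error to the caller */
    if (r->client_cksum != r->server_cksum)
        return DEMO_RESEND;     /* data arrived but does not match */
    return DEMO_DONE;
}
```

Fed the values from the log (bulk_status -EIO, checksums ffffffff and 6706be76), this sketch returns DEMO_FAIL instead of reaching the checksum comparison.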

Comment by Peter Jones [ 26/Jan/12 ]

Shadow

Thanks for the report. Could you please upload your patch into gerrit? That will help us with reviewing and landing this fix.

Thanks

Peter

Comment by Alexey Lyashkov [ 26/Jan/12 ]

remote: New Changes:
remote: http://review.whamcloud.com/2023

Comment by Peter Jones [ 26/Jan/12 ]

Thanks Shadow!

Comment by Peter Jones [ 26/Jan/12 ]

Ah, but could you please fix the formatting error in the summary line - thanks!

Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ptlrpc/client.c
  • lustre/tests/sanity.sh
  • lustre/ptlrpc/events.c
  • lustre/ost/ost_handler.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
  • lustre/ost/ost_handler.c
  • lustre/ptlrpc/client.c
  • lustre/ptlrpc/events.c
Comment by Peter Jones [ 02/Mar/12 ]

Landed for 2.2

Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ptlrpc/events.c
  • lustre/include/obd_support.h
  • lustre/tests/sanity.sh
  • lustre/ptlrpc/client.c
  • lustre/ost/ost_handler.c
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ost/ost_handler.c
  • lustre/ptlrpc/events.c
  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
  • lustre/ptlrpc/client.c
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ost/ost_handler.c
  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
  • lustre/ptlrpc/client.c
  • lustre/ptlrpc/events.c
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » i686,client,el5,ofa #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/include/obd_support.h
  • lustre/ost/ost_handler.c
  • lustre/ptlrpc/events.c
  • lustre/tests/sanity.sh
  • lustre/ptlrpc/client.c
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » i686,server,el5,ofa #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ptlrpc/client.c
  • lustre/tests/sanity.sh
  • lustre/ost/ost_handler.c
  • lustre/ptlrpc/events.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » x86_64,client,el6,ofa #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
  • lustre/ptlrpc/client.c
  • lustre/ost/ost_handler.c
  • lustre/ptlrpc/events.c
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » x86_64,server,el6,ofa #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/tests/sanity.sh
  • lustre/ost/ost_handler.c
  • lustre/ptlrpc/client.c
  • lustre/ptlrpc/events.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ost/ost_handler.c
  • lustre/include/obd_support.h
  • lustre/ptlrpc/client.c
  • lustre/ptlrpc/events.c
  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ptlrpc/client.c
  • lustre/ptlrpc/events.c
  • lustre/include/obd_support.h
  • lustre/ost/ost_handler.c
  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ptlrpc/client.c
  • lustre/tests/sanity.sh
  • lustre/ptlrpc/events.c
  • lustre/include/obd_support.h
  • lustre/ost/ost_handler.c
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ptlrpc/client.c
  • lustre/ost/ost_handler.c
  • lustre/include/obd_support.h
  • lustre/ptlrpc/events.c
  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/tests/sanity.sh
  • lustre/ost/ost_handler.c
  • lustre/ptlrpc/client.c
  • lustre/ptlrpc/events.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ost/ost_handler.c
  • lustre/ptlrpc/events.c
  • lustre/tests/sanity.sh
  • lustre/ptlrpc/client.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ptlrpc/client.c
  • lustre/ptlrpc/events.c
  • lustre/tests/sanity.sh
  • lustre/ost/ost_handler.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » i686,client,el6,ofa #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/include/obd_support.h
  • lustre/ost/ost_handler.c
  • lustre/tests/sanity.sh
  • lustre/ptlrpc/events.c
  • lustre/ptlrpc/client.c
Comment by Build Master (Inactive) [ 02/Mar/12 ]

Integrated in lustre-master » i686,server,el6,ofa #498
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/include/obd_support.h
  • lustre/ost/ost_handler.c
  • lustre/ptlrpc/client.c
  • lustre/tests/sanity.sh
  • lustre/ptlrpc/events.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,client,el5,inkernel #340
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ost/ost_handler.c
  • lustre/include/obd_support.h
  • lustre/ptlrpc/client.c
  • lustre/tests/sanity.sh
  • lustre/ptlrpc/events.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,client,el6,inkernel #340
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ptlrpc/client.c
  • lustre/ptlrpc/events.c
  • lustre/ost/ost_handler.c
  • lustre/include/obd_support.h
  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,server,el5,inkernel #340
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ptlrpc/events.c
  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
  • lustre/ptlrpc/client.c
  • lustre/ost/ost_handler.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,server,el6,inkernel #340
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/tests/sanity.sh
  • lustre/ptlrpc/events.c
  • lustre/ptlrpc/client.c
  • lustre/include/obd_support.h
  • lustre/ost/ost_handler.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,client,el5,inkernel #340
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/tests/sanity.sh
  • lustre/ptlrpc/client.c
  • lustre/include/obd_support.h
  • lustre/ptlrpc/events.c
  • lustre/ost/ost_handler.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,server,el5,inkernel #340
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/ptlrpc/client.c
  • lustre/include/obd_support.h
  • lustre/ost/ost_handler.c
  • lustre/tests/sanity.sh
  • lustre/ptlrpc/events.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,client,el6,inkernel #340
LU-1039 ptlrpc: handle bulk IO errors correctly. (Revision c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf)

Result = SUCCESS
Oleg Drokin : c9590221dc43dd5e7a7ede389f0a7d9cf566e5bf
Files :

  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
  • lustre/ptlrpc/client.c
  • lustre/ptlrpc/events.c
  • lustre/ost/ost_handler.c
Comment by Emoly Liu [ 09/Nov/12 ]

b2_1 port at http://review.whamcloud.com/4499

Comment by Emoly Liu [ 21/Nov/12 ]

b2_1 port has been successfully cherry-picked as 6ba8b7b5d4fbf8d123adbb6b870abf9995eb39cb.

Generated at Sat Feb 10 01:12:54 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.