[LU-15933] client hang with NULL pointer dereference, at iov_iter_advance Created: 11/Jun/22  Updated: 04/Oct/22  Resolved: 06/Jul/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0, Lustre 2.15.1

Type: Bug Priority: Major
Reporter: Zhenyu Xu Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None
Environment:

client kernel version >= 5.13


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Lustre clients hang with the following call trace.

      BUG: kernel NULL pointer dereference, address: 0000000000000008
      RIP: 0010:iov_iter_advance+0x1ef/0x260
      Call Trace:
       vvp_io_rw_lock+0x240/0x7e0 [lustre]
       vvp_io_read_lock+0x3f/0xd0 [lustre]
       cl_io_lock+0x62/0x2d0 [obdclass]
       cl_io_loop+0x8b/0x1f0 [obdclass]
       ll_file_io_generic+0x436/0xd60 [lustre]
       ll_file_read_iter+0x42c/0x5d0 [lustre]
       generic_file_splice_read+0xf7/0x1a0
       do_splice_to+0x81/0xb0
       splice_direct_to_actor+0xbc/0x230
       ? opipe_prep.part.0+0xb0/0xb0
       do_splice_direct+0x89/0xd0
       do_sendfile+0x303/0x450



 Comments   
Comment by Zhenyu Xu [ 11/Jun/22 ]

The configure-time compile test for the iov_iter type member is:

LB_CHECK_COMPILE([if iov_iter has member type],
iov_iter_has_type_member, [
        #include <linux/uio.h>
],[
        struct iov_iter iter = { .type = ITER_KVEC };
        (void)iter;
],[
        AC_DEFINE(HAVE_IOV_ITER_HAS_TYPE_MEMBER, 1,
                [if iov_iter has member type])
])

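However, kernel commit 8cd54c1c84803 (v5.13) renamed the member ->type to ->iter_type, so this test no longer compiles on those kernels and HAVE_IOV_ITER_HAS_TYPE_MEMBER is left undefined. I think that is what caused the problem.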

 struct iov_iter {
-       /*
-        * Bit 0 is the read/write bit, set if we're writing.
-        * Bit 1 is the BVEC_FLAG_NO_REF bit, set if type is a bvec and
-        * the caller isn't expecting to drop a page reference when done.
-        */
-       unsigned int type;
+       u8 iter_type;
+       bool data_source;
        size_t iov_offset;
        size_t count;
        union {
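
To illustrate the difference between the two layouts in the diff above, here is a minimal userspace sketch. It is not kernel or Lustre code: the struct names, helper names, and the OLD_FLAG_MASK constant are hypothetical, and the structs only imitate the fields shown in the diff. It assumes, following the removed comment, that the old ->type word keeps flags in bits 0 and 1, while the new layout stores the flavour in ->iter_type and the direction in ->data_source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of the pre-change layout: one word holds the iterator
 * flavour plus flag bits (bit 0 = read/write, bit 1 = BVEC_FLAG_NO_REF). */
struct old_iov_iter {
	unsigned int type;
	size_t iov_offset;
	size_t count;
};

/* Hypothetical mock of the post-change layout: flavour and direction are
 * separate members, so no masking is needed. */
struct new_iov_iter {
	uint8_t iter_type;
	bool data_source;
	size_t iov_offset;
	size_t count;
};

/* Assumption taken from the removed comment: bits 0 and 1 are flags. */
#define OLD_FLAG_MASK 0x3u

/* Old layout: strip the flag bits to recover the flavour. */
static inline unsigned int old_iter_kind(const struct old_iov_iter *i)
{
	return i->type & ~OLD_FLAG_MASK;
}

/* Old layout: bit 0 is set when the iterator is used for writing. */
static inline bool old_iter_is_write(const struct old_iov_iter *i)
{
	return (i->type & 0x1u) != 0;
}

/* New layout: the flavour is read directly from ->iter_type. */
static inline unsigned int new_iter_kind(const struct new_iov_iter *i)
{
	return i->iter_type;
}

/* New layout: the direction is read directly from ->data_source. */
static inline bool new_iter_is_write(const struct new_iov_iter *i)
{
	return i->data_source;
}
```

Code that reads ->type directly cannot build against the new layout, so a compatibility layer has to select one accessor or the other at configure time, which is why the result of the compile test matters.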

Comment by Gerrit Updater [ 11/Jun/22 ]

"Bobi Jam <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/47601
Subject: LU-15933 libcfs: fix configure check for iov_iter member
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c0cf40a3d0846ca5c0a72dd2022e5b9c7f46c52a

Comment by Gerrit Updater [ 24/Jun/22 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47756
Subject: LU-15933 libcfs: fix configure check for iov_iter member
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: d48d845e3c906a2090c6409926360362bd922c55

Comment by Gerrit Updater [ 27/Jun/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47601/
Subject: LU-15933 libcfs: fix configure check for iov_iter member
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ae53ef4a6ab0491c01b723322f45c931e22f6140

Comment by Gerrit Updater [ 06/Jul/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47756/
Subject: LU-15933 libcfs: fix configure check for iov_iter member
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 8a2a5f3ffea68b0866e983302ccab152b12207c9
