Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.16.0
    • None
    • 3
    • 9223372036854775807

    Description

      Hit a crash with a new racer rename test in v4 of patch https://review.whamcloud.com/47643 "LU-15913 tests: add rename stress test via racer" that was still using complex file layouts (I've removed most complex layouts in v5 to focus on actual rename testing for LU-17426). It looks like an ldiskfs credits issue with the large file layout (note xattr_set: 2/103/0), as I don't see any crashes with ZFS.
      https://testing.whamcloud.com/gerrit-janitor/38859/results.html

      Lustre: lod_parse_striping()) lustre-MDT0000-mdtlov: EXTENSION flags=40 set on component[2]=1 of non-SEL file [0x200000402:0x2402:0x0] with magic=0xbd60bd0
      [few seconds pass, but included to show SEL is definitely in use]
      Lustre: osd_trans_exec_check()) lustre-MDT0000: opcode 7: used 14, used now 2, reserved 12
      Lustre: osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
      Lustre: osd_trans_dump_creds())   attr_set: 1/1/0, xattr_set: 2/103/0
      Lustre: osd_trans_dump_creds())   write: 3/12/14, punch: 0/0/0, quota 1/3/2
      Lustre: osd_trans_dump_creds())   insert: 0/0/0, delete: 0/0/0
      Lustre: osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 0/0/0
      LustreError: osd_trans_exec_check()) LBUG
      Pid: 14124, comm: mdt00_019 3.10.0-7.9-debug #1 SMP Sat Mar 26 23:28:42 EDT 2022
      Call Trace:
       libcfs_call_trace+0x90/0xf0 [libcfs]
       lbug_with_loc+0x4c/0xa0 [libcfs]
       osd_write+0x929/0xcc0 [osd_ldiskfs]
       dt_record_write+0x33/0x130 [obdclass]
       llog_osd_write_rec+0xfdb/0x1c80 [obdclass]
       llog_write_rec+0x290/0x590 [obdclass]
       llog_cat_add_rec+0x1d9/0xa50 [obdclass]
       llog_add+0x17f/0x1f0 [obdclass]
       osp_sync_add+0x1f9/0x760 [osp]
       osp_attr_set+0x3cd/0x680 [osp]
       lod_sub_attr_set+0x1d7/0x500 [lod]
       lod_obj_stripe_attr_set_cb+0x40/0x100 [lod]
       lod_obj_for_each_stripe+0x12d/0x310 [lod]
       lod_attr_set+0x51a/0xb60 [lod]
       lod_layout_change+0x367/0x3f0 [lod]
       dt_layout_change+0x20/0xc0 [mdd]
       mdd_layout_change+0x1712/0x1db0 [mdd]
       mdt_layout_change+0x2bf/0x450 [mdt]
       mdt_intent_layout+0x910/0xeb0 [mdt]
       mdt_intent_opc+0x1c8/0xc50 [mdt]
       mdt_intent_policy+0xfa/0x460 [mdt]
       ldlm_lock_enqueue+0x3b1/0xbb0 [ptlrpc]
       ldlm_handle_enqueue+0x359/0x17c0 [ptlrpc]
       tgt_enqueue+0x68/0x240 [ptlrpc]
       tgt_request_handle+0x74e/0x19d0 [ptlrpc]
       ptlrpc_server_handle_request+0x26e/0xcf0 [ptlrpc]
       ptlrpc_main+0xc37/0x16d0 [ptlrpc]
      

      Attachments

        Issue Links

          Activity

            [LU-17448] LBUG in racer with layout change
            pjones Peter Jones added a comment -

            Merged for 2.16

            pjones Peter Jones added a comment - Merged for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54302/
            Subject: LU-17448 lod: don't skip uninited components
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 35b1076aef8fbb2840de2b831765a20ec937d034

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54302/ Subject: LU-17448 lod: don't skip uninited components Project: fs/lustre-release Branch: master Current Patch Set: Commit: 35b1076aef8fbb2840de2b831765a20ec937d034

            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54302
            Subject: LU-17448 lod: don't skip uninited components
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 23a4bee801d878d3eb0769cc6c8dede716a428e8

            gerrit Gerrit Updater added a comment - "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54302 Subject: LU-17448 lod: don't skip uninited components Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 23a4bee801d878d3eb0769cc6c8dede716a428e8

            Yes, I've seen this several times in my racer rename test:
            https://review.whamcloud.com/47643
            It looks like a client-side deadlock, because none of the server threads have a stack trace of interest, AFAICS.

            adilger Andreas Dilger added a comment - Yes, I've seen this several times in my racer rename test: https://review.whamcloud.com/47643 It looks like a client-side deadlock, because none of the server threads have a stack trace of interest, AFAICS.

            also, sometimes I see very strange "deadlock":

            schedule,rwsem_down_write_slowpath,do_rmdir
            	PIDs(2): "rm":41084 "rm":41766 
            
            schedule,schedule_preempt_disabled,__mutex_lock,lock_rename,do_renameat2,__x64_sys_rename
            	PIDs(7): "mrename":41114 "mrename":41184 "mrename":41248 "mrename":41291 "mrename":41406 "mrename":41519 "mrename":41556 
            
            schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,do_readlinkat,__x64_sys_readlink
            	PIDs(1): "lfs":41134 
            
            schedule,rwsem_down_write_slowpath,vfs_rename,do_renameat2,__x64_sys_rename
            	PIDs(1): "mrename":41171 
            
            schedule,rwsem_down_write_slowpath,lock_rename,do_renameat2,__x64_sys_rename
            	PIDs(4): "mrename":41177 "mrename":41192 "mrename":41198 "mrename":41202 
            
            schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,path_openat,do_filp_open,do_sys_openat2,do_sys_open
            	PIDs(2): "lfs":41186 "lfs":41213 
            
            schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,vfs_statx,__se_sys_newlstat
            	PIDs(2): "lfs":41895 "lfs":47609 
            
            schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,vfs_statx,__se_sys_newfstatat
            	PIDs(1): "rm":42331 
            
            schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,vfs_statx,__se_sys_statx
            	PIDs(1): "lfs":43870 
            
            schedule,rwsem_down_write_slowpath,filename_create,do_mkdirat
            	PIDs(3): "mkdir":44262 "mkdir":44273 "mkdir":47557 
            
            bzzz Alex Zhuravlev added a comment - also, sometimes I see very strange "deadlock": schedule,rwsem_down_write_slowpath,do_rmdir PIDs(2): "rm" :41084 "rm" :41766 schedule,schedule_preempt_disabled,__mutex_lock,lock_rename,do_renameat2,__x64_sys_rename PIDs(7): "mrename" :41114 "mrename" :41184 "mrename" :41248 "mrename" :41291 "mrename" :41406 "mrename" :41519 "mrename" :41556 schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,do_readlinkat,__x64_sys_readlink PIDs(1): "lfs" :41134 schedule,rwsem_down_write_slowpath,vfs_rename,do_renameat2,__x64_sys_rename PIDs(1): "mrename" :41171 schedule,rwsem_down_write_slowpath,lock_rename,do_renameat2,__x64_sys_rename PIDs(4): "mrename" :41177 "mrename" :41192 "mrename" :41198 "mrename" :41202 schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,path_openat,do_filp_open,do_sys_openat2,do_sys_open PIDs(2): "lfs" :41186 "lfs" :41213 schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,vfs_statx,__se_sys_newlstat PIDs(2): "lfs" :41895 "lfs" :47609 schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,vfs_statx,__se_sys_newfstatat PIDs(1): "rm" :42331 schedule,schedule_preempt_disabled,rwsem_down_read_slowpath,down_read,lookup_slow,walk_component,path_lookupat,filename_lookup,vfs_statx,__se_sys_statx PIDs(1): "lfs" :43870 schedule,rwsem_down_write_slowpath,filename_create,do_mkdirat PIDs(3): "mkdir" :44262 "mkdir" :44273 "mkdir" :47557

            Hit this again on racer test_1, so it isn't just related to the new rename test_2 I added. It also has to do with a layout change:
            https://testing.whamcloud.com/gerrit-janitor/38926/testresults/racer-special3-ldiskfs-centos7_x86_64-centos7_x86_64/

            adilger Andreas Dilger added a comment - Hit this again on racer test_1, so it isn't just related to the new rename test_2 I added. It also has to do with a layout change: https://testing.whamcloud.com/gerrit-janitor/38926/testresults/racer-special3-ldiskfs-centos7_x86_64-centos7_x86_64/

            People

              bzzz Alex Zhuravlev
              adilger Andreas Dilger
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: