Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-14781

sanity-flr test 70: mirror create and split race crash

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.15.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for wangshilong <wshilong@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/515ea0f6-6502-4f1c-96ad-2df998d95993

      [ 6188.499816] Lustre: DEBUG MARKER: == sanity-flr test 70: mirror create and split race ================================================== 03:31:01 (1624246261)
      [ 6231.305351] BUG: unable to handle kernel NULL pointer dereference at (null)
      [ 6231.306707] IP: [<ffffffffc130bb3d>] osp_object_free+0x4d/0x490 [osp]
      [ 6231.307887] PGD 0
      [ 6231.308240] Oops: 0000 1 SMP
      [ 6231.308838] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc iosf_mbi crc32_pclmul ppdev ghash_clmulni_intel dm_mod aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev virtio_balloon pcspkr i2c_piix4 parport_pc parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk ata_piix libata 8139too crct10dif_pclmul crct10dif_common crc32c_intel serio_raw
      [ 6231.321323] virtio_pci 8139cp virtio_ring mii virtio floppy [last unloaded: dm_flakey]
      [ 6231.322546] CPU: 1 PID: 23593 Comm: mdt00_003 Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.21.1.el7_lustre.ddn15.x86_64 #1
      [ 6231.324404] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 6231.325248] task: ffff9e48d5310000 ti: ffff9e48d5098000 task.ti: ffff9e48d5098000
      [ 6231.326340] RIP: 0010:[<ffffffffc130bb3d>] [<ffffffffc130bb3d>] osp_object_free+0x4d/0x490 [osp]
      [ 6231.327661] RSP: 0018:ffff9e48d509b7c0 EFLAGS: 00010217
      [ 6231.328447] RAX: ffff9e48e0eadae8 RBX: 0000000000000000 RCX: dead000000000200
      [ 6231.329493] RDX: ffff9e48d509b7f8 RSI: ffff9e48e0eadb08 RDI: ffff9e48e0eadab8
      [ 6231.330537] RBP: ffff9e48d509b7e8 R08: ffff9e48e0eadb20 R09: 0000000000000105
      [ 6231.331579] R10: 0000000000000195 R11: 000000000007ffff R12: ffff9e48e0eadb20
      [ 6231.332623] R13: ffff9e48e0eadbd8 R14: 00001fffffffffff R15: ffff9e48e0eadab8
      [ 6231.333687] FS: 0000000000000000(0000) GS:ffff9e48ffd00000(0000) knlGS:0000000000000000
      [ 6231.334876] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6231.335722] CR2: 0000000000000000 CR3: 0000000077340000 CR4: 00000000001606e0
      [ 6231.336779] Call Trace:
      [ 6231.337803] [<ffffffffc0a94ba2>] lu_object_free.isra.30+0xf2/0x170 [obdclass]
      [ 6231.338948] [<ffffffffc0a989a6>] lu_object_find_at+0x496/0x930 [obdclass]
      [ 6231.340080] [<ffffffffa59870d2>] ? mutex_lock+0x12/0x2f
      [ 6231.341069] [<ffffffffc1220fcb>] ? lod_putref+0x6b/0x980 [lod]
      [ 6231.341954] [<ffffffffc121f5f0>] ? lod_free_comp_buffer+0x2b0/0x460 [lod]
      [ 6231.342971] [<ffffffffc1225ca4>] lod_initialize_objects+0x3e4/0xba0 [lod]
      [ 6231.343982] [<ffffffffc1226af3>] lod_parse_striping+0x693/0xc20 [lod]
      [ 6231.344946] [<ffffffffc1227332>] lod_striping_load+0x2b2/0x660 [lod]
      [ 6231.345940] [<ffffffffc123d31b>] lod_declare_destroy+0x12b/0x600 [lod]
      [ 6231.346920] [<ffffffffc123e62a>] ? lod_declare_attr_set+0xba/0x6b0 [lod]
      [ 6231.348138] [<ffffffffc12bd209>] ? mdd_env_info+0x19/0x90 [mdd]
      [ 6231.349034] [<ffffffffc12a52d1>] mdd_declare_finish_unlink+0x91/0x210 [mdd]
      [ 6231.350078] [<ffffffffc12b50cf>] mdd_unlink+0x48f/0xab0 [mdd]
      [ 6231.351294] [<ffffffffc11322c2>] mdt_reint_unlink+0xc32/0x1550 [mdt]
      [ 6231.353207] [<ffffffffc0d79e27>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
      [ 6231.354201] [<ffffffffc1136c43>] mdt_reint_rec+0x83/0x210 [mdt]
      [ 6231.355116] [<ffffffffc1111f01>] mdt_reint_internal+0x6e1/0xb00 [mdt]
      [ 6231.356087] [<ffffffffc111e2f7>] mdt_reint+0x67/0x140 [mdt]
      [ 6231.357167] [<ffffffffc0de748e>] tgt_request_handle+0xaee/0x15f0 [ptlrpc]
      [ 6231.358336] [<ffffffffc0853297>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 6231.359350] [<ffffffffc0d897cb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [ 6231.360536] [<ffffffffa52d32c3>] ? __wake_up+0x13/0x20
      [ 6231.361346] [<ffffffffc0d8d134>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      [ 6231.362274] [<ffffffffa52d4c1e>] ? finish_task_switch+0x4e/0x1c0
      [ 6231.363178] [<ffffffffa598899f>] ? __schedule+0x3af/0x860
      [ 6231.364013] [<ffffffffc0d8c600>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
      [ 6231.365129] [<ffffffffa52c5da1>] kthread+0xd1/0xe0
      [ 6231.365862] [<ffffffffa52c5cd0>] ? insert_kthread_work+0x40/0x40
      [ 6231.366775] [<ffffffffa5995df7>] ret_from_fork_nospec_begin+0x21/0x21
      [ 6231.367753] [<ffffffffa52c5cd0>] ? insert_kthread_work+0x40/0x40
      [ 6231.368653] Code: 8d 7f 50 4d 8d af 20 01 00 00 49 be ff ff ff ff ff 1f 00 00 e8 95 df 78 ff 48 89 df e8 8d 8e 78 ff 49 8b 9f 20 01 00 00 4c 39 eb <4c> 8b 23 48 89 df 0f 85 ce 00 00 00 e9 9b 01 00 00 66 90 4c 01
      [ 6231.373352] RIP [<ffffffffc130bb3d>] osp_object_free+0x4d/0x490 [osp]
      [ 6231.374350] RSP <ffff9e48d509b7c0>
      [ 6231.374883] CR2: 0000000000000000

      Attachments

        Issue Links

          Activity

            [LU-14781] sanity-flr test 70: mirror create and split race crash
            pjones Peter Jones added a comment -

            Landed for 2.15

            pjones Peter Jones added a comment - Landed for 2.15

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45442/
            Subject: LU-14781 osp: osp_object_free access NULL pointer
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 86e6d90f69d8c34f6af4c6e7eb7cad3e60b7278e

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45442/ Subject: LU-14781 osp: osp_object_free access NULL pointer Project: fs/lustre-release Branch: master Current Patch Set: Commit: 86e6d90f69d8c34f6af4c6e7eb7cad3e60b7278e

            "Bobi Jam <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/45442
            Subject: LU-14781 osp: osp_object_free access NULL pointer
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 20dde5a8d428b3f9bf2d0421b333a09545be1c65

            gerrit Gerrit Updater added a comment - "Bobi Jam <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/45442 Subject: LU-14781 osp: osp_object_free access NULL pointer Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 20dde5a8d428b3f9bf2d0421b333a09545be1c65
            bobijam Zhenyu Xu added a comment -

            another hit (https://testing.whamcloud.com/test_sets/8e467190-cbc4-43f6-8b42-89c274f86735) reveals that osp_object free could still access NULL memory

            bobijam Zhenyu Xu added a comment - another hit ( https://testing.whamcloud.com/test_sets/8e467190-cbc4-43f6-8b42-89c274f86735)  reveals that osp_object free could still access NULL memory
            pjones Peter Jones added a comment -

            Landed for 2.15

            pjones Peter Jones added a comment - Landed for 2.15

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44055/
            Subject: LU-14781 osp: osp object header could be NULL
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 90dab62d635666001b8b4e2b7aa8cc1f23bc6f5b

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44055/ Subject: LU-14781 osp: osp object header could be NULL Project: fs/lustre-release Branch: master Current Patch Set: Commit: 90dab62d635666001b8b4e2b7aa8cc1f23bc6f5b
            bobijam Zhenyu Xu added a comment -

            The stack trace of the Oops shows that osp_object_free encountered NULL pointer

            Call Trace:
            [ 6532.245084]  [] lu_object_free.isra.30+0xf2/0x170 [obdclass]
            [ 6532.246905]  [] lu_object_find_at+0x496/0x930 [obdclass]
            [ 6532.251132]  [] lod_initialize_objects+0x3e4/0xba0 [lod]
            [ 6532.252240]  [] lod_parse_striping+0x693/0xc20 [lod]
            [ 6532.253293]  [] lod_striping_load+0x2b2/0x660 [lod]
            [ 6532.254347]  [] lod_declare_destroy+0x12b/0x600 [lod]
            [ 6532.258781]  [] mdd_declare_finish_unlink+0x91/0x210 [mdd]
            [ 6532.259911]  [] mdd_unlink+0x48f/0xab0 [mdd]
            [ 6532.260957]  [] mdt_reint_unlink+0xc32/0x1550 [mdt]
            [ 6532.263377]  [] mdt_reint_rec+0x83/0x210 [mdt]
            [ 6532.264354]  [] mdt_reint_internal+0x6e1/0xb00 [mdt]
            [ 6532.265410]  [] mdt_reint+0x67/0x140 [mdt]
            [ 6532.266424]  [] tgt_request_handle+0xaee/0x15f0 [ptlrpc]
            [ 6532.268678]  [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
            [ 6532.270797]  [] ptlrpc_main+0xb34/0x1470 [ptlrpc]
            [ 6532.273053]  [] kthread+0xd1/0xe0
            [ 6532.274830]  [] ret_from_fork_nospec_begin+0x21/0x21
            [ 6532.276848] Code: 8d 7f 50 4d 8d af 20 01 00 00 49 be ff ff ff ff ff 1f 00 00 e8 75 e1 79 ff 48 89 df e8 6d 90 79 ff 49 8b 9f 20 01 00 00 4c 39 eb  8b 23 48 89 df 0f 85 ce 00 00 00 e9 9b 01 00 00 66 90 4c 01 
            [ 6532.281936] RIP  [] osp_object_free+0x4d/0x490 [osp]
            [ 6532.283028]  RSP 
            [ 6532.283602] CR2: 0000000000000000
            

            And

             
            (gdb) l *(osp_object_free+0x4d)
            0x7b6d is in osp_object_free (/root/work/lustre/lustre/osp/osp_object.c:2296).
            2289		struct lu_object_header	*h = o->lo_header;
            2290		struct osp_xattr_entry *oxe;
            2291		struct osp_xattr_entry *tmp;
            2292		int			count;
            2293	
            2294		dt_object_fini(&obj->opo_obj);
            2295		lu_object_header_fini(h);
            2296		list_for_each_entry_safe(oxe, tmp, &obj->opo_xattr_list, oxe_list) {
            2297			list_del(&oxe->oxe_list);
            2298			count = atomic_read(&oxe->oxe_ref);
            2299			LASSERTF(count == 1,
            2300				 "Still has %d users on the xattr entry %.*s\n",
            

            So the object's lu_object_header is NULL? Which leads me to

            osp_object_alloc()
            111 static struct lu_object *osp_object_alloc(const struct lu_env *env,             
             112                                           const struct lu_object_header *hdr,   
             113                                           struct lu_device *d)                  
             114 {                                                                               
             115         struct lu_object_header *h = NULL;                                      
             116         struct osp_object       *o;                                             
             117         struct lu_object        *l;                                             
             118                                                                                 
             119         OBD_SLAB_ALLOC_PTR_GFP(o, osp_object_kmem, GFP_NOFS);                   
             120         if (o != NULL) {                                                        
             121                 l = &o->opo_obj.do_lu;                                          
             122                                                                                 
             123                 /* If hdr is NULL, it means the object is not built             
             124                  * from the top dev(MDT/OST), usually it happens when           
             125                  * building striped object, like data object on MDT or          
             126                  * striped object for directory */                              
             127                 if (hdr == NULL) {                                              
             128                         h = &o->opo_header;                                     
             129                         lu_object_header_init(h);                               
             130                         dt_object_init(&o->opo_obj, h, d);                      
             131                         lu_object_add_top(h, l);                                
             132                 } else {                                                        
             133                         dt_object_init(&o->opo_obj, h, d);                              
             134                 }                                                               
             135                                                                                 
             136                 l->lo_ops = &osp_lu_obj_ops;                                    
             137                                                                                 
             138                 return l;                                                       
             139         } else {                                                                
             140                 return NULL;                                                    
             141         }                                                                       
             142 }                   
            

            It seems to me that line 133 is faulty, it should use hdr instead of h.

            bobijam Zhenyu Xu added a comment - The stack trace of the Oops shows that osp_object_free encountered NULL pointer Call Trace: [ 6532.245084] [] lu_object_free.isra.30+0xf2/0x170 [obdclass] [ 6532.246905] [] lu_object_find_at+0x496/0x930 [obdclass] [ 6532.251132] [] lod_initialize_objects+0x3e4/0xba0 [lod] [ 6532.252240] [] lod_parse_striping+0x693/0xc20 [lod] [ 6532.253293] [] lod_striping_load+0x2b2/0x660 [lod] [ 6532.254347] [] lod_declare_destroy+0x12b/0x600 [lod] [ 6532.258781] [] mdd_declare_finish_unlink+0x91/0x210 [mdd] [ 6532.259911] [] mdd_unlink+0x48f/0xab0 [mdd] [ 6532.260957] [] mdt_reint_unlink+0xc32/0x1550 [mdt] [ 6532.263377] [] mdt_reint_rec+0x83/0x210 [mdt] [ 6532.264354] [] mdt_reint_internal+0x6e1/0xb00 [mdt] [ 6532.265410] [] mdt_reint+0x67/0x140 [mdt] [ 6532.266424] [] tgt_request_handle+0xaee/0x15f0 [ptlrpc] [ 6532.268678] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [ 6532.270797] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [ 6532.273053] [] kthread+0xd1/0xe0 [ 6532.274830] [] ret_from_fork_nospec_begin+0x21/0x21 [ 6532.276848] Code: 8d 7f 50 4d 8d af 20 01 00 00 49 be ff ff ff ff ff 1f 00 00 e8 75 e1 79 ff 48 89 df e8 6d 90 79 ff 49 8b 9f 20 01 00 00 4c 39 eb 8b 23 48 89 df 0f 85 ce 00 00 00 e9 9b 01 00 00 66 90 4c 01 [ 6532.281936] RIP [] osp_object_free+0x4d/0x490 [osp] [ 6532.283028] RSP [ 6532.283602] CR2: 0000000000000000 And (gdb) l *(osp_object_free+0x4d) 0x7b6d is in osp_object_free (/root/work/lustre/lustre/osp/osp_object.c:2296). 2289 struct lu_object_header *h = o->lo_header; 2290 struct osp_xattr_entry *oxe; 2291 struct osp_xattr_entry *tmp; 2292 int count; 2293 2294 dt_object_fini(&obj->opo_obj); 2295 lu_object_header_fini(h); 2296 list_for_each_entry_safe(oxe, tmp, &obj->opo_xattr_list, oxe_list) { 2297 list_del(&oxe->oxe_list); 2298 count = atomic_read(&oxe->oxe_ref); 2299 LASSERTF(count == 1, 2300 "Still has %d users on the xattr entry %.*s\n" , So the object's lu_object_header is NULL? Which leads me to osp_object_alloc() 111 static struct lu_object *osp_object_alloc( const struct lu_env *env, 112 const struct lu_object_header *hdr, 113 struct lu_device *d) 114 { 115 struct lu_object_header *h = NULL; 116 struct osp_object *o; 117 struct lu_object *l; 118 119 OBD_SLAB_ALLOC_PTR_GFP(o, osp_object_kmem, GFP_NOFS); 120 if (o != NULL) { 121 l = &o->opo_obj.do_lu; 122 123 /* If hdr is NULL, it means the object is not built 124 * from the top dev(MDT/OST), usually it happens when 125 * building striped object, like data object on MDT or 126 * striped object for directory */ 127 if (hdr == NULL) { 128 h = &o->opo_header; 129 lu_object_header_init(h); 130 dt_object_init(&o->opo_obj, h, d); 131 lu_object_add_top(h, l); 132 } else { 133 dt_object_init(&o->opo_obj, h, d); 134 } 135 136 l->lo_ops = &osp_lu_obj_ops; 137 138 return l; 139 } else { 140 return NULL; 141 } 142 } It seems to me that line 133 is faulty, it should use hdr instead of h.
            bzzz Alex Zhuravlev added a comment - on master https://testing.whamcloud.com/test_sessions/c74ca3d7-6927-41c3-abca-65bf5108eb71
            gerrit Gerrit Updater added a comment - - edited

            Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/44055
            Subject: LU-14781 osp: init proper lu_object_header
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 2
            Commit: 051cd86ccd395283473e9edd66448694bb7001e6

            gerrit Gerrit Updater added a comment - - edited Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/44055 Subject: LU-14781 osp: init proper lu_object_header Project: fs/lustre-release Branch: master Current Patch Set: 2 Commit: 051cd86ccd395283473e9edd66448694bb7001e6

            People

              bobijam Zhenyu Xu
              bobijam Zhenyu Xu
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: