LU-10595: Use after free in mgc_process_cfg_log


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.0
    • Affects Version/s: Lustre 2.11.0
    • Labels: None
    • Severity: 3

    Description

      Just had two or three instances of this crash on current master:

      [ 9714.914126] Lustre: DEBUG MARKER: == conf-sanity test 80: mgc import reconnect race ==================================================== 18:53:56 (1517442836)
      [ 9726.765708] Lustre: MGS: Connection restored to MGC192.168.123.219@tcp_0 (at 0@lo)
      [ 9726.769718] Lustre: Skipped 5 previous similar messages
      [ 9728.389908] Lustre: Setting parameter lustre-MDT0000.mdt.identity_upcall in log lustre-MDT0000
      [ 9728.392105] Lustre: Skipped 4 previous similar messages
      [ 9728.916729] Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space
      [ 9728.950481] Lustre: lustre-MDT0000: new disk, initializing
      [ 9731.840889] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
      [ 9732.428932] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400]:0:mdt
      [ 9763.590945] Lustre: lustre-OST0000: new disk, initializing
      [ 9763.610198] Lustre: srv-lustre-OST0000: No data found on store. Initialize space
      [ 9763.611457] Lustre: Skipped 1 previous similar message
      [ 9764.792295] Lustre: lustre-OST0000: Imperative Recovery not enabled, recovery window 60-180
      [ 9767.273735] Lustre: 25479:0:(genops.c:1759:obd_export_evict_by_uuid()) MGS: evicting e9e169f7-485c-0d53-9888-f0d9fa325f2a at adminstrative request
      [ 9769.188319] LustreError: 166-1: MGC192.168.123.219@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
      [ 9769.190799] Lustre: *** cfs_fail_loc=906, val=2147483648***
      [ 9775.191600] Lustre: MGS: Received new LWP connection from 0@lo, removing former export from same NID
      [ 9775.194433] Lustre: Evicted from MGS (at 192.168.123.219@tcp) after server handle changed from 0xe819d599f6959e7b to 0xe819d599f6959ff5
      [ 9810.655763] Lustre: lustre-OST0001: new disk, initializing
      [ 9810.680854] Lustre: srv-lustre-OST0001: No data found on store. Initialize space
      [ 9811.278451] Lustre: lustre-MDT0000: Connection restored to 192.168.123.219@tcp (at 0@lo)
      [ 9811.279949] Lustre: Skipped 6 previous similar messages
      [ 9811.492792] Lustre: lustre-OST0001: Imperative Recovery not enabled, recovery window 60-180
      [ 9815.865193] Lustre: server umount lustre-MDT0000 complete
      [ 9823.266665] Lustre: 31033:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1517442938/real 1517442938]  req@ffff8802f4f8ec00 x1591153820441296/t0(0) o400->lustre-MDT0000-lwp-OST0000@0@lo:12/10 lens 224/224 e 0 to 1 dl 1517442945 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [ 9823.270325] Lustre: lustre-MDT0000-lwp-OST0000: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      [ 9860.012641] Lustre: 21310:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1517442938/real 1517442938]  req@ffff8802ecd8bc00 x1591153820441264/t0(0) o101->MGC192.168.123.219@tcp@0@lo:26/25 lens 328/344 e 0 to 1 dl 1517442982 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
      [ 9860.061544] Lustre: 21310:0:(client.c:2100:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
      [ 9860.062498] LustreError: 166-1: MGC192.168.123.219@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
      [ 9860.064815] LustreError: 21310:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
      [ 9860.319901] Lustre: server umount lustre-OST0000 complete
      [ 9880.065686] BUG: unable to handle kernel paging request at ffff8800a25b1e20
      [ 9880.066512] IP: [<ffffffffa01b863d>] mgc_process_cfg_log+0x5d/0x1030 [mgc]
      [ 9880.067345] PGD 2e75067 PUD 33fa01067 PMD 33f8ee067 PTE 80000000a25b1060
      [ 9880.068112] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 9880.068885] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea ata_generic sysfillrect sysimgblt pata_acpi ttm drm_kms_helper ata_piix i2c_piix4 drm libata serio_raw virtio_balloon virtio_console pcspkr i2c_core virtio_blk floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
      [ 9880.075609] CPU: 13 PID: 21310 Comm: ll_cfg_requeue Tainted: P        W  OE  ------------   3.10.0-debug #2
      [ 9880.076999] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 9880.077705] task: ffff8800aa5a8200 ti: ffff8802d2dc0000 task.ti: ffff8802d2dc0000
      [ 9880.079168] RIP: 0010:[<ffffffffa01b863d>]  [<ffffffffa01b863d>] mgc_process_cfg_log+0x5d/0x1030 [mgc]
      [ 9880.080517] RSP: 0018:ffff8802d2dc3ca8  EFLAGS: 00010286
      [ 9880.081028] RAX: ffff8800a25b1800 RBX: 0000000000000000 RCX: 0000000000000000
      [ 9880.081505] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8802d1a7e9c0
      [ 9880.083252] RBP: ffff8802d2dc3d08 R08: 0000000000000000 R09: 0000000000000000
      [ 9880.083807] R10: ffff8802a4dbc2ca R11: 0000000000000000 R12: ffff8802d6659e00
      [ 9880.084362] R13: ffff8802a17cd000 R14: ffff88032397a1c0 R15: ffff8800aa5a8200
      [ 9880.084884] FS:  0000000000000000(0000) GS:ffff88033e5a0000(0000) knlGS:0000000000000000
      [ 9880.085839] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 9880.086372] CR2: ffff8800a25b1e20 CR3: 00000000bb3ff000 CR4: 00000000000006e0
      [ 9880.086984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 9880.087508] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [ 9880.088054] Stack:
      [ 9880.088518]  0000000000000246 00000001d6659e00 ffffffffa01c4538 0000000000000246
      [ 9880.090260]  ffffffffa01c4520 ffff8800aa5a8200 ffff880200000001 0000000000000000
      [ 9880.091773]  ffff8800aa5a8200 ffff8802a17cd000 ffff8802d6659e00 ffff8800aa5a8200
      [ 9880.092920] Call Trace:
      [ 9880.093433]  [<ffffffffa01bc673>] mgc_process_log+0x3d3/0x8a0 [mgc]
      [ 9880.093947]  [<ffffffff810b7cc0>] ? wake_up_state+0x20/0x20
      [ 9880.094467]  [<ffffffffa01bea70>] mgc_requeue_thread+0x2c0/0x870 [mgc]
      [ 9880.095022]  [<ffffffff810b7cc0>] ? wake_up_state+0x20/0x20
      [ 9880.095795]  [<ffffffffa01be7b0>] ? mgc_process_config+0x1400/0x1400 [mgc]
      [ 9880.096607]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [ 9880.097352]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
      [ 9880.098189]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [ 9880.099016]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
      [ 9880.099840] Code: 04 00 00 4d 85 e4 0f 84 5a 0d 00 00 41 8b 84 24 c0 00 00 00 83 f8 01 0f 84 17 0d 00 00 49 8b 44 24 30 48 85 c0 0f 84 6b 04 00 00 <4c> 8b b8 20 06 00 00 48 8b 3d c5 8d ae e2 ba 38 00 00 00 be 50 
      [ 9880.103060] RIP  [<ffffffffa01b863d>] mgc_process_cfg_log+0x5d/0x1030 [mgc]
      
      (gdb) l *(mgc_process_cfg_log+0x5d)
      0x363d is in mgc_process_cfg_log (/home/green/git/lustre-release/lustre/mgc/mgc_request.c:1913).
      1908	
      1909		LASSERT(cld);
      1910		LASSERT(mutex_is_locked(&cld->cld_lock));
      1911	
      1912		if (cld->cld_cfg.cfg_sb)
      1913			lsi = s2lsi(cld->cld_cfg.cfg_sb);
      1914	
      1915		OBD_ALLOC_PTR(env);
      1916		if (env == NULL)
      1917			RETURN(-ENOMEM);
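      The faulting access matches that dereference. CR2 - RAX = 0x620, and the faulting instruction in the Code bytes (4c 8b b8 20 06 00 00, i.e. mov 0x620(%rax),%r15) lines up with s2lsi() reading sb->s_fs_info through the cached cfg_sb pointer. With DEBUG_PAGEALLOC the freed superblock's page is unmapped, so the stale read oopses immediately instead of returning garbage. The timeline fits a use after free: the server umounts complete at 9815/9860, and at 9880 the ll_cfg_requeue thread (the same PID 21310 that logged "failed processing log: -5" at 9860) reprocesses the config log and dereferences a cfg_sb that nothing kept alive. Below is a minimal userland sketch of that shape of race; the fake_* names are hypothetical stand-ins rather than Lustre code, and the "fixed" variant is only one illustrative way to close the window, not necessarily the patch that landed for this ticket:

      /* Minimal userland sketch (not Lustre code); compile with:
       * cc -pthread uaf_sketch.c.  All fake_* names are invented. */
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct fake_lustre_sb_info {
              int lsi_flags;
      };

      struct fake_super_block {
              struct fake_lustre_sb_info *s_fs_info;  /* what s2lsi() reads */
      };

      struct fake_config_llog_data {
              pthread_mutex_t cld_lock;
              struct fake_super_block *cfg_sb;  /* cached pointer, nothing pinned */
      };

      /* Requeue-thread side, shaped like mgc_process_cfg_log() above:
       * check cfg_sb under cld_lock, then dereference it.  cld_lock
       * serializes log processing but does not keep the superblock alive. */
      static void *fake_requeue_thread(void *arg)
      {
              struct fake_config_llog_data *cld = arg;

              pthread_mutex_lock(&cld->cld_lock);
              if (cld->cfg_sb) {
                      /* analogue of: lsi = s2lsi(cld->cld_cfg.cfg_sb) */
                      struct fake_lustre_sb_info *lsi = cld->cfg_sb->s_fs_info;

                      printf("lsi_flags=%d\n", lsi->lsi_flags);  /* stale read */
              }
              pthread_mutex_unlock(&cld->cld_lock);
              return NULL;
      }

      /* Buggy umount side: frees the superblock without detaching it from
       * the config log first, leaving cld->cfg_sb dangling. */
      static void fake_server_umount(struct fake_super_block *sb)
      {
              free(sb->s_fs_info);
              free(sb);
      }

      /* One illustrative way to close the window: detach cfg_sb under the
       * same cld_lock the requeue thread takes, before freeing the sb. */
      static void fake_server_umount_fixed(struct fake_config_llog_data *cld,
                                           struct fake_super_block *sb)
      {
              pthread_mutex_lock(&cld->cld_lock);
              cld->cfg_sb = NULL;  /* requeue thread now skips the dereference */
              pthread_mutex_unlock(&cld->cld_lock);
              free(sb->s_fs_info);
              free(sb);
      }

      int main(void)
      {
              struct fake_super_block *sb = calloc(1, sizeof(*sb));
              struct fake_config_llog_data cld = {
                      .cld_lock = PTHREAD_MUTEX_INITIALIZER,
                      .cfg_sb = sb,
              };
              pthread_t t;

              sb->s_fs_info = calloc(1, sizeof(*sb->s_fs_info));

              /* Umount wins the race before the requeue thread runs; calling
               * fake_server_umount_fixed(&cld, sb) here instead is safe. */
              fake_server_umount(sb);

              pthread_create(&t, NULL, fake_requeue_thread, &cld);
              pthread_join(t, NULL);
              return 0;
      }

      Note that cld_lock by itself cannot help here: the crashing path already holds it (see the LASSERT above), so any fix has to order the superblock teardown against that lock or pin the superblock across log processing.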
      

People

    Assignee: Alexander Boyko (aboyko)
    Reporter: Oleg Drokin (green)