Lustre / LU-12010

qsd_upd_thread use after free on server shutdown


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0
    • Labels: None
    • Severity: 3

    Description

      This crash happens from time to time on server shutdown for me:

      [170816.014563] Lustre: server umount lustre-MDT0000 complete
      [170820.769998] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      [170820.770106] Lustre: lustre-MDT0001-osp-MDT0002: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      [170820.774258] Lustre: Skipped 3 previous similar messages
      [170820.775154] Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
      [170820.775163] Lustre: Skipped 4 previous similar messages
      [170820.788095] LustreError: Skipped 2 previous similar messages
      [170821.231402] LustreError: 26613:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8802eab93b40 x1626148381397504/t0(0) o41->lustre-MDT0002-osp-MDT0001@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      [170821.235073] LustreError: 26613:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message
      [170822.774970] Lustre: 26620:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550816084/real 1550816084]  req@ffff8802fa0f7b40 x1626148381397248/t0(0) o400->MGC192.168.123.169@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1550816091 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [170822.781474] LustreError: 166-1: MGC192.168.123.169@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
      [170823.045059] BUG: unable to handle kernel paging request at ffff8802b8f5ddcc
      [170823.045059] IP: [<ffffffff813fb685>] do_raw_spin_lock+0x5/0xa0
      [170823.045059] PGD 241b067 PUD 33ebfa067 PMD 33ea32067 PTE 80000002b8f5d060
      [170823.055931] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [170823.055931] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic sb_edac edac_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr i2c_piix4 virtio_balloon virtio_console ip_tables rpcsec_gss_krb5 ata_generic drm_kms_helper pata_acpi ttm crct10dif_pclmul drm ata_piix drm_panel_orientation_quirks crct10dif_common serio_raw crc32c_intel i2c_core libata virtio_blk floppy [last unloaded: libcfs]
      [170823.055931] 
      [170823.055931] CPU: 7 PID: 27695 Comm: lquota_wb_lustr Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.6-debug #1
      [170823.055931] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [170823.055931] task: ffff8802c3b30c40 ti: ffff8802aa504000 task.ti: ffff8802aa504000
      [170823.055931] RIP: 0010:[<ffffffff813fb685>]  [<ffffffff813fb685>] do_raw_spin_lock+0x5/0xa0
      [170823.055931] RSP: 0018:ffff8802aa507d70  EFLAGS: 00010002
      [170823.055931] RAX: 0000000000000246 RBX: 0000000000000246 RCX: 0000000000000000
      [170823.055931] RDX: ffff8802aa507fd8 RSI: 0000000000000003 RDI: ffff8802b8f5ddc8
      [170823.055931] RBP: ffff8802aa507d78 R08: 0000000000000010 R09: ffff8802daa98700
      [170823.055931] R10: 0000000000000000 R11: 000000000000000f R12: ffff8802b8f5dc00
      [170823.055931] R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
      [170823.055931] FS:  0000000000000000(0000) GS:ffff88033dbc0000(0000) knlGS:0000000000000000
      [170823.055931] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [170823.055931] CR2: ffff8802b8f5ddcc CR3: 0000000001c10000 CR4: 00000000001607e0
      [170823.055931] Call Trace:
      [170823.055931]  [<ffffffff817b9e30>] _raw_spin_lock_irqsave+0x30/0x40
      [170823.055931]  [<ffffffff810c2db3>] __wake_up+0x23/0x50
      [170823.055931]  [<ffffffffa0acfbb2>] qsd_upd_thread+0xb02/0xc90 [lquota]
      [170823.055931]  [<ffffffff817b6cd0>] ? __schedule+0x410/0xa00
      [170823.055931]  [<ffffffff810caae0>] ? wake_up_state+0x20/0x20
      [170823.055931]  [<ffffffffa0acf0b0>] ? qsd_upd_add+0xf0/0xf0 [lquota]
      [170823.055931]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
      [170823.055931]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
      [170823.055931]  [<ffffffff817c4c5d>] ret_from_fork_nospec_begin+0x7/0x21
      [170823.055931]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
      

      The faulting address resolves to:

      0x13be0 is in qsd_upd_thread (/home/green/git/lustre-release/lustre/quota/qsd_writeback.c:512).
      507				qsd_start_reint_thread(qsd->qsd_type_array[qtype]);
      508		}
      509		lu_env_fini(env);
      510		OBD_FREE_PTR(env);
      511		thread_set_flags(thread, SVC_STOPPED);
      512		wake_up(&thread->t_ctl_waitq);
      

      So by the time execution reaches the wake_up() at line 512, the thread structure has already been freed, clearly a race: presumably the stopping side wakes up as soon as thread_set_flags() publishes SVC_STOPPED at line 511, observes the flag, and frees the structure embedding the ptlrpc_thread before the wake_up() on line 512 dereferences it.
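      For illustration only (this sketch is not from the ticket, and my_service, worker_fn, svc_stop_racy and svc_stop_safe are hypothetical names), the pattern implied by the trace looks roughly like this: the worker publishes SVC_STOPPED and then touches the shared structure once more in wake_up(), while the stopping side is free to release that structure as soon as the flag becomes visible.

      #include <linux/kthread.h>
      #include <linux/sched.h>
      #include <linux/slab.h>
      #include <linux/wait.h>

      #define SVC_STOPPING    0x1
      #define SVC_STOPPED     0x2

      /* Hypothetical stand-in for the structure embedding the ptlrpc_thread. */
      struct my_service {
              wait_queue_head_t       ms_waitq;
              unsigned long           ms_flags;
      };

      /* Worker, mirroring the tail of qsd_upd_thread(): flag first, wake second. */
      static int worker_fn(void *arg)
      {
              struct my_service *svc = arg;

              /* ... main loop, exits once it sees SVC_STOPPING ... */

              svc->ms_flags = SVC_STOPPED;    /* (1) the waiter can observe this ...   */
              wake_up(&svc->ms_waitq);        /* (2) ... and free svc before this runs */
              return 0;
      }

      /* Racy teardown: mirrors the shutdown path implied by the ticket. */
      static void svc_stop_racy(struct my_service *svc)
      {
              svc->ms_flags = SVC_STOPPING;
              wake_up(&svc->ms_waitq);
              /* wait_event() may return in the window between (1) and (2) above, */
              wait_event(svc->ms_waitq, svc->ms_flags & SVC_STOPPED);
              kfree(svc);     /* ... so the worker's pending wake_up() hits freed memory */
      }

      /*
       * One conventional fix: let kthread_stop() do the synchronization.  It
       * returns only after worker_fn() has fully exited (the worker's loop
       * must poll kthread_should_stop() instead of a private flag), so
       * freeing svc afterwards cannot race with a late wake_up().
       */
      static void svc_stop_safe(struct my_service *svc, struct task_struct *task)
      {
              kthread_stop(task);     /* blocks until worker_fn() has returned */
              kfree(svc);
      }

      A flag-plus-waitqueue protocol can also be made safe by waiting on a struct completion instead: unlike a bare waitqueue, the completion API is documented (Documentation/scheduler/completion.rst) to let the waiter free the containing object as soon as wait_for_completion() returns.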

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Oleg Drokin (green)
