Lustre / LU-15059

Setting several tbf rules at the same time causes crashes

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0

    Description

      I was trying to reproduce another TBF bug (LU-15056) when I triggered this one.

      Reproducer

      start rule_name1 uid={500} rate=100 & \
      start rule_name2 uid={1000} rate=2000 &
      

      Crash

      [ 5940.060061] BUG: unable to handle kernel paging request at 00000000deadbeef
      [ 5940.060112] IP: [<ffffffffa978f100>] strlen+0x0/0x30
      [ 5940.060157] PGD 80000000d084d067 PUD 0
      [ 5940.060188] Oops: 0000 [#1] SMP 
      [ 5940.060198] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) mbcache jbd2 libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc iosf_mbi crc32_pclmul ppdev ghash_clmulni_intel snd_intel8x0 snd_ac97_codec ac97_bus snd_seq aesni_intel snd_seq_device snd_pcm lrw gf128mul glue_helper ablk_helper cryptd snd_timer snd pcspkr sg soundcore parport_pc vboxguest(OE) i2c_piix4 parport video ip_tables xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci ttm libahci crct10dif_pclmul crct10dif_common
      [ 5940.060571]  drm ata_piix serio_raw crc32c_intel libata e1000 drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
      [ 5940.060641] CPU: 3 PID: 8460 Comm: lctl Kdump: loaded Tainted: G           OE  ------------   3.10.0-1127.8.2.el7_lustre.x86_64 #1
      [ 5940.060682] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [ 5940.060722] task: ffff8bac04d9d230 ti: ffff8bac9a8f8000 task.ti: ffff8bac9a8f8000
      [ 5940.060768] RIP: 0010:[<ffffffffa978f100>]  [<ffffffffa978f100>] strlen+0x0/0x30
      [ 5940.060786] RSP: 0018:ffff8bac9a8fbe20  EFLAGS: 00010246
      [ 5940.060798] RAX: 0000000000000001 RBX: ffff8baccc8dc000 RCX: 0000000000000000
      [ 5940.060813] RDX: 0000000000000713 RSI: 0000000000000000 RDI: 00000000deadbeef
      [ 5940.060828] RBP: ffff8bac9a8fbe38 R08: 000000000000076b R09: 0000000000000000
      [ 5940.060843] R10: 0000000000000000 R11: 000000000000000f R12: 00000000deadbeef
      [ 5940.060858] R13: 0000000000000000 R14: 0000000000000003 R15: ffff8bac741ea240
      [ 5940.060874] FS:  00007f56cf31c740(0000) GS:ffff8bacdfd80000(0000) knlGS:0000000000000000
      [ 5940.060891] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5940.060904] CR2: 00000000deadbeef CR3: 0000000009fb6000 CR4: 00000000000606e0
      [ 5940.060921] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 5940.060960] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 5940.060997] Call Trace:
      [ 5940.061054]  [<ffffffffc0e867f3>] ? nrs_tbf_generic_cmd_fini+0x53/0x100 [ptlrpc]
      [ 5940.061118]  [<ffffffffc0e86905>] nrs_tbf_cmd_fini.part.30+0x65/0xd0 [ptlrpc]
      [ 5940.061607]  [<ffffffffc0e8a91d>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0xbdd/0x1030 [ptlrpc]
      [ 5940.062062]  [<ffffffffa964d1b0>] vfs_write+0xc0/0x1f0
      [ 5940.062579]  [<ffffffffa9b92e15>] ? system_call_after_swapgs+0xa2/0x13a
      [ 5940.063031]  [<ffffffffa964df7f>] SyS_write+0x7f/0xf0
      [ 5940.063540]  [<ffffffffa9b92e15>] ? system_call_after_swapgs+0xa2/0x13a
      [ 5940.063979]  [<ffffffffa9b92ed2>] system_call_fastpath+0x25/0x2a
      [ 5940.064482]  [<ffffffffa9b92e15>] ? system_call_after_swapgs+0xa2/0x13a
      [ 5940.064923] Code: 89 f8 48 89 e5 f6 82 20 ad c5 a9 20 74 15 0f 1f 44 00 00 48 83 c0 01 0f b6 10 f6 82 20 ad c5 a9 20 75 f0 5d c3 66 0f 1f 44 00 00 <80> 3f 00 55 48 89 e5 74 15 48 89 f8 0f 1f 40 00 48 83 c0 01 80
      [ 5940.066420] RIP  [<ffffffffa978f100>] strlen+0x0/0x30
      [ 5940.066856]  RSP <ffff8bac9a8fbe20>
      [ 5940.067317] CR2: 00000000deadbeef
      

      Analysis
      After some investigation, it seems that the crash is caused by a double kfree of the "cmd->u.tc_start.ts_conds_str" pointer in nrs_tbf_generic_cmd_fini(). Two different PIDs (8459 and 8460) free the same address:

      00000100:00000010:3.0:1633132460.865076:0:8460:0:(nrs_tbf.c:1770:nrs_tbf_expression_free()) kfreed 'expr': 48 at ffff8bac99516340.
      00000100:00000010:1.0:1633132460.865077:0:8459:0:(nrs_tbf.c:1808:nrs_tbf_generic_cmd_fini()) kfreed 'cmd->u.tc_start.ts_conds_str': 35 at ffff8bac99516f40.  <--------------------------
      00000100:00000010:3.0:1633132460.865077:0:8460:0:(nrs_tbf.c:1786:nrs_tbf_conjunction_free()) kfreed 'conjunction': 32 at ffff8bac90937840.
      00000100:00000010:1.0:1633132460.865078:0:8459:0:(nrs_tbf.c:3641:ptlrpc_lprocfs_nrs_tbf_rule_seq_write()) kfreed 'cmd': 144 at ffff8baccc8dc000.
      00000100:00000010:1.0:1633132460.865078:0:8459:0:(nrs_tbf.c:3643:ptlrpc_lprocfs_nrs_tbf_rule_seq_write()) kfreed 'kernbuf': 4096 at ffff8bacc96eb000.
      00000100:00000010:3.0:1633132460.865078:0:8460:0:(nrs_tbf.c:1808:nrs_tbf_generic_cmd_fini()) kfreed 'cmd->u.tc_start.ts_conds_str': 35 at ffff8bac99516f40.  <--------------------------
      

      This is possible because of the following code:

      static ssize_t
      ptlrpc_lprocfs_nrs_tbf_rule_seq_write(struct file *file,
                                            const char __user *buffer,
                                            size_t count, loff_t *off)
      {
              struct seq_file           *m = file->private_data;
              struct ptlrpc_service     *svc = m->private;
              char                      *kernbuf;
              char                      *val;
              int                        rc;
              static struct nrs_tbf_cmd *cmd;                                         <----------------
              enum ptlrpc_nrs_queue_type queue = PTLRPC_NRS_QUEUE_BOTH;
      ...
      
              cmd = nrs_tbf_parse_cmd(val, length, nrs_tbf_type_flag(svc, queue));    <----------------
              if (IS_ERR(cmd))
                      GOTO(out_free_kernbuff, rc = PTR_ERR(cmd));
      

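      The static "cmd" pointer is shared by every caller of the function, so it is overwritten by the second thread before the first thread has finished with it. Both threads then run the cleanup path on the same command, which frees ts_conds_str twice.
      I don't see why this pointer should be static: it is allocated and freed within the ptlrpc_lprocfs_nrs_tbf_rule_seq_write() function.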
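
      To illustrate the analysis above, here is a minimal userspace sketch that replays the interleaving by hand, without threads. It is not the Lustre code: struct demo_cmd, shared_cmd, buggy_parse(), buggy_fini(), fixed_fini(), replay_buggy() and replay_fixed() are all hypothetical names, and the cleanup only increments a counter instead of calling kfree, so the double cleanup can be observed safely.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct nrs_tbf_cmd; it only counts cleanups. */
struct demo_cmd {
	int fini_calls;	/* times the cleanup path ran on this command */
};

/* Plays the role of the function-local "static struct nrs_tbf_cmd *cmd":
 * a single slot shared by every caller of the write handler. */
static struct demo_cmd *shared_cmd;

/* Buggy handler, split in two so the interleaving can be replayed by hand. */
static void buggy_parse(struct demo_cmd *fresh)
{
	shared_cmd = fresh;		/* a second writer overwrites the first */
}

static void buggy_fini(void)
{
	shared_cmd->fini_calls++;	/* cleans up whatever the slot holds now */
}

/* Fixed handler: the pointer is an ordinary local owned by each caller. */
static void fixed_fini(struct demo_cmd *cmd)
{
	cmd->fini_calls++;
}

/* Replay: writer 1 parses, writer 2 parses, then both clean up.
 * out[0] receives the cleanup count of the first command,
 * out[1] the cleanup count of the second. */
void replay_buggy(int out[2])
{
	struct demo_cmd c1 = { 0 }, c2 = { 0 };

	buggy_parse(&c1);	/* writer 1 */
	buggy_parse(&c2);	/* writer 2 overwrites the shared slot */
	buggy_fini();		/* writer 1 cleans up c2 */
	buggy_fini();		/* writer 2 cleans up c2 again */
	out[0] = c1.fini_calls;
	out[1] = c2.fini_calls;
	shared_cmd = NULL;	/* do not keep a pointer to dead stack data */
}

/* Same sequence, but each writer keeps its own non-static pointer. */
void replay_fixed(int out[2])
{
	struct demo_cmd c1 = { 0 }, c2 = { 0 };
	struct demo_cmd *cmd1 = &c1;	/* writer 1's own pointer */
	struct demo_cmd *cmd2 = &c2;	/* writer 2's own pointer */

	fixed_fini(cmd1);
	fixed_fini(cmd2);
	out[0] = c1.fini_calls;
	out[1] = c2.fini_calls;
}
```

      In the buggy replay the first command is never cleaned up and the second is cleaned up twice, which matches the two "kfreed" lines for the same address in the debug log. With a per-caller pointer each command is cleaned up exactly once.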

            People

              eaujames Etienne Aujames