Details
- Bug
- Resolution: Fixed
- Minor
Description
I was trying to reproduce another TBF bug (LU-15056) when I triggered this one.
reproducer
start rule_name1 uid={500} rate=100 & \
start rule_name2 uid={1000} rate=2000 &
crash
[ 5940.060061] BUG: unable to handle kernel paging request at 00000000deadbeef
[ 5940.060112] IP: [<ffffffffa978f100>] strlen+0x0/0x30
[ 5940.060157] PGD 80000000d084d067 PUD 0
[ 5940.060188] Oops: 0000 [#1] SMP
[ 5940.060198] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) mbcache jbd2 libcfs(OE) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc iosf_mbi crc32_pclmul ppdev ghash_clmulni_intel snd_intel8x0 snd_ac97_codec ac97_bus snd_seq aesni_intel snd_seq_device snd_pcm lrw gf128mul glue_helper ablk_helper cryptd snd_timer snd pcspkr sg soundcore parport_pc vboxguest(OE) i2c_piix4 parport video ip_tables xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci ttm libahci crct10dif_pclmul crct10dif_common
[ 5940.060571] drm ata_piix serio_raw crc32c_intel libata e1000 drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
[ 5940.060641] CPU: 3 PID: 8460 Comm: lctl Kdump: loaded Tainted: G OE ------------ 3.10.0-1127.8.2.el7_lustre.x86_64 #1
[ 5940.060682] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 5940.060722] task: ffff8bac04d9d230 ti: ffff8bac9a8f8000 task.ti: ffff8bac9a8f8000
[ 5940.060768] RIP: 0010:[<ffffffffa978f100>] [<ffffffffa978f100>] strlen+0x0/0x30
[ 5940.060786] RSP: 0018:ffff8bac9a8fbe20 EFLAGS: 00010246
[ 5940.060798] RAX: 0000000000000001 RBX: ffff8baccc8dc000 RCX: 0000000000000000
[ 5940.060813] RDX: 0000000000000713 RSI: 0000000000000000 RDI: 00000000deadbeef
[ 5940.060828] RBP: ffff8bac9a8fbe38 R08: 000000000000076b R09: 0000000000000000
[ 5940.060843] R10: 0000000000000000 R11: 000000000000000f R12: 00000000deadbeef
[ 5940.060858] R13: 0000000000000000 R14: 0000000000000003 R15: ffff8bac741ea240
[ 5940.060874] FS: 00007f56cf31c740(0000) GS:ffff8bacdfd80000(0000) knlGS:0000000000000000
[ 5940.060891] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5940.060904] CR2: 00000000deadbeef CR3: 0000000009fb6000 CR4: 00000000000606e0
[ 5940.060921] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5940.060960] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5940.060997] Call Trace:
[ 5940.061054] [<ffffffffc0e867f3>] ? nrs_tbf_generic_cmd_fini+0x53/0x100 [ptlrpc]
[ 5940.061118] [<ffffffffc0e86905>] nrs_tbf_cmd_fini.part.30+0x65/0xd0 [ptlrpc]
[ 5940.061607] [<ffffffffc0e8a91d>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0xbdd/0x1030 [ptlrpc]
[ 5940.062062] [<ffffffffa964d1b0>] vfs_write+0xc0/0x1f0
[ 5940.062579] [<ffffffffa9b92e15>] ? system_call_after_swapgs+0xa2/0x13a
[ 5940.063031] [<ffffffffa964df7f>] SyS_write+0x7f/0xf0
[ 5940.063540] [<ffffffffa9b92e15>] ? system_call_after_swapgs+0xa2/0x13a
[ 5940.063979] [<ffffffffa9b92ed2>] system_call_fastpath+0x25/0x2a
[ 5940.064482] [<ffffffffa9b92e15>] ? system_call_after_swapgs+0xa2/0x13a
[ 5940.064923] Code: 89 f8 48 89 e5 f6 82 20 ad c5 a9 20 74 15 0f 1f 44 00 00 48 83 c0 01 0f b6 10 f6 82 20 ad c5 a9 20 75 f0 5d c3 66 0f 1f 44 00 00 <80> 3f 00 55 48 89 e5 74 15 48 89 f8 0f 1f 40 00 48 83 c0 01 80
[ 5940.066420] RIP [<ffffffffa978f100>] strlen+0x0/0x30
[ 5940.066856] RSP <ffff8bac9a8fbe20>
[ 5940.067317] CR2: 00000000deadbeef
Analysis
After some investigation, it seems that the crash is caused by a double kfree of the "cmd->u.tc_start.ts_conds_str" pointer in nrs_tbf_generic_cmd_fini(). Two lctl threads (PIDs 8459 and 8460) free the same address:
00000100:00000010:3.0:1633132460.865076:0:8460:0:(nrs_tbf.c:1770:nrs_tbf_expression_free()) kfreed 'expr': 48 at ffff8bac99516340.
00000100:00000010:1.0:1633132460.865077:0:8459:0:(nrs_tbf.c:1808:nrs_tbf_generic_cmd_fini()) kfreed 'cmd->u.tc_start.ts_conds_str': 35 at ffff8bac99516f40. <--------------------------
00000100:00000010:3.0:1633132460.865077:0:8460:0:(nrs_tbf.c:1786:nrs_tbf_conjunction_free()) kfreed 'conjunction': 32 at ffff8bac90937840.
00000100:00000010:1.0:1633132460.865078:0:8459:0:(nrs_tbf.c:3641:ptlrpc_lprocfs_nrs_tbf_rule_seq_write()) kfreed 'cmd': 144 at ffff8baccc8dc000.
00000100:00000010:1.0:1633132460.865078:0:8459:0:(nrs_tbf.c:3643:ptlrpc_lprocfs_nrs_tbf_rule_seq_write()) kfreed 'kernbuf': 4096 at ffff8bacc96eb000.
00000100:00000010:3.0:1633132460.865078:0:8460:0:(nrs_tbf.c:1808:nrs_tbf_generic_cmd_fini()) kfreed 'cmd->u.tc_start.ts_conds_str': 35 at ffff8bac99516f40. <--------------------------
This is possible because of the following code:
static ssize_t
ptlrpc_lprocfs_nrs_tbf_rule_seq_write(struct file *file,
				      const char __user *buffer,
				      size_t count, loff_t *off)
{
	struct seq_file *m = file->private_data;
	struct ptlrpc_service *svc = m->private;
	char *kernbuf;
	char *val;
	int rc;
	static struct nrs_tbf_cmd *cmd; <----------------
	enum ptlrpc_nrs_queue_type queue = PTLRPC_NRS_QUEUE_BOTH;
	...
	cmd = nrs_tbf_parse_cmd(val, length, nrs_tbf_type_flag(svc, queue)); <----------------
	if (IS_ERR(cmd))
		GOTO(out_free_kernbuff, rc = PTR_ERR(cmd));
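Because "cmd" is static, it is shared by every caller of the function. When two writes run concurrently, the second thread overwrites the pointer stored by the first, and both threads then finalize and free the same command.
I don't see why this pointer should be static: it is allocated and freed within the ptlrpc_lprocfs_nrs_tbf_rule_seq_write() function.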
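The pattern that avoids the race can be illustrated with a small userspace sketch. This is not the Lustre patch: every demo_* name below is hypothetical, malloc/free stand in for the kernel allocation macros, and two pthreads stand in for the two concurrent lctl writers. The only point it makes is that an automatic (non-static) "cmd" gives each caller its own command, so each allocation is freed exactly once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct nrs_tbf_cmd; only one field is modelled. */
struct demo_cmd {
	char *conds_str;
};

/* Hypothetical parser: returns a command that owns a copy of the rule text. */
static struct demo_cmd *demo_parse_cmd(const char *rule)
{
	struct demo_cmd *cmd = malloc(sizeof(*cmd));
	size_t len = strlen(rule) + 1;

	if (cmd == NULL)
		return NULL;
	cmd->conds_str = malloc(len);
	if (cmd->conds_str == NULL) {
		free(cmd);
		return NULL;
	}
	memcpy(cmd->conds_str, rule, len);
	return cmd;
}

/* Releases the command and everything it owns. */
static void demo_cmd_fini(struct demo_cmd *cmd)
{
	assert(cmd != NULL);
	free(cmd->conds_str);
	free(cmd);
}

/* Write-handler sketch: "cmd" is automatic, so it is private to each call. */
int demo_rule_write(const char *rule)
{
	struct demo_cmd *cmd;	/* deliberately NOT static */

	cmd = demo_parse_cmd(rule);
	if (cmd == NULL)
		return -1;
	/* ... the rule would be applied here ... */
	demo_cmd_fini(cmd);
	return 0;
}

/* Thread entry point: returns NULL on success, the argument on failure. */
static void *demo_writer(void *arg)
{
	const char *rule = arg;

	return demo_rule_write(rule) == 0 ? NULL : arg;
}

/* Runs two writers concurrently and returns the number of failures. */
int demo_run_two_writers(void)
{
	const char *rules[2] = {
		"start rule_name1 uid={500} rate=100",
		"start rule_name2 uid={1000} rate=2000",
	};
	pthread_t tid[2];
	int created[2] = { 0, 0 };
	int failures = 0;
	int i;

	for (i = 0; i < 2; i++) {
		if (pthread_create(&tid[i], NULL, demo_writer,
				   (void *)rules[i]) != 0)
			failures++;
		else
			created[i] = 1;
	}
	for (i = 0; i < 2; i++) {
		void *ret = NULL;

		if (!created[i])
			continue;
		if (pthread_join(tid[i], &ret) != 0 || ret != NULL)
			failures++;
	}
	return failures;
}
```

Calling demo_run_two_writers() from a main() and building with -pthread exercises both writers at once. Declaring "cmd" static in demo_rule_write() would reintroduce the shared pointer described above.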
Issue Links
- is related to LU-17452 fix interop sanityn tests with b2_15 (Resolved)
I pushed a fix in LU-17452.