Details
Bug
Resolution: Fixed
Major
Lustre 2.11.0
3
Description
[ 880.747314] Lustre: *** cfs_fail_loc=198, val=0***
[ 880.747550] BUG: unable to handle kernel NULL pointer dereference at 0000000000000298
[ 880.747600] IP: [<ffffffff816b68a9>] _raw_read_lock+0x9/0x20
[ 880.747603] PGD 0
[ 880.747605] Oops: 0002 [#1] SMP
[ 880.747650] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt ppdev fb_sys_fops drm virtio_balloon joydev pcspkr i2c_piix4 parport_pc i2c_core parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_net virtio_blk ata_piix libata serio_raw virtio_pci virtio_ring virtio floppy
[ 880.747654] CPU: 1 PID: 11067 Comm: mdt_rdpg00_002 Tainted: G OE ------------ 3.10.0-693.21.1.x3.1.143.x86_64 #1
[ 880.747655] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 880.747656] task: ffff88008d7abf40 ti: ffff8800a7b70000 task.ti: ffff8800a7b70000
[ 880.747660] RIP: 0010:[<ffffffff816b68a9>] [<ffffffff816b68a9>] _raw_read_lock+0x9/0x20
[ 880.747667] RSP: 0018:ffff8800a7b737b0 EFLAGS: 00010213
[ 880.747668] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 880.747669] RDX: ffff8800a7b73810 RSI: 0000000000000000 RDI: 0000000000000298
[ 880.747670] RBP: ffff8800a7b737b0 R08: ffff8800a7b73914 R09: 0000000000000000
[ 880.747671] R10: ffffffffc0ad529e R11: 0000000000000000 R12: 0000000000000000
[ 880.747672] R13: ffff8800a7b73810 R14: ffff8800a7b73880 R15: ffffffffc0699500
[ 880.747674] FS: 0000000000000000(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
[ 880.747676] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 880.747677] CR2: 0000000000000298 CR3: 00000000360ce000 CR4: 00000000000006e0
[ 880.747704] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 880.747724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 880.747725] Call Trace:
[ 880.747830] [<ffffffffc050181a>] ldiskfs_es_lookup_extent+0x2a/0x180 [ldiskfs]
[ 880.747841] [<ffffffffc04cf06d>] ldiskfs_map_blocks+0x5d/0x700 [ldiskfs]
[ 880.747918] [<ffffffffc0a8c35c>] ? qsd_op_end+0x7c/0x6e0 [lquota]
[ 880.748111] [<ffffffffc0645169>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[ 880.748122] [<ffffffffc04cf775>] ldiskfs_getblk+0x65/0x200 [ldiskfs]
[ 880.748131] [<ffffffffc04cf937>] ldiskfs_bread+0x27/0xc0 [ldiskfs]
[ 880.748206] [<ffffffffc0aefe16>] iam_node_read+0x66/0x100 [osd_ldiskfs]
[ 880.748222] [<ffffffffc0af2d7d>] iam_lfix_guess+0x2d/0xd0 [osd_ldiskfs]
[ 880.748225] [<ffffffff816b24b2>] ? mutex_lock+0x12/0x2f
[ 880.748235] [<ffffffffc0aef5bc>] iam_container_setup+0x5c/0x120 [osd_ldiskfs]
[ 880.748248] [<ffffffffc0ad552c>] osd_index_try+0x49c/0x690 [osd_ldiskfs]
[ 880.748255] [<ffffffffc0a75abd>] lquota_disk_slv_find_create+0x71d/0x850 [lquota]
[ 880.748271] [<ffffffffc0a9b3c5>] qmt_pool_new_conn+0x2f5/0x360 [lquota]
[ 880.748280] [<ffffffffc0a93dcc>] qmt_intent_policy+0x65c/0xe50 [lquota]
[ 880.748608] [<ffffffffc089e0d0>] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc]
[ 880.748709] [<ffffffffc0c860da>] mdt_intent_opc+0x21a/0xae0 [mdt]
[ 880.748754] [<ffffffffc08a2550>] ? lustre_swab_ldlm_policy_data+0x30/0x30 [ptlrpc]
[ 880.748776] [<ffffffffc0645169>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[ 880.748795] [<ffffffffc0c8df63>] mdt_intent_policy+0x1a3/0x360 [mdt]
[ 880.748824] [<ffffffffc0851f0e>] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
[ 880.748887] [<ffffffffc043d67e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
[ 880.748922] [<ffffffffc087a753>] ldlm_handle_enqueue0+0x8f3/0x13e0 [ptlrpc]
[ 880.748959] [<ffffffffc08a25d0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[ 880.749059] [<ffffffffc0900b32>] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 880.749103] [<ffffffffc09044da>] tgt_request_handle+0x92a/0x13b0 [ptlrpc]
I think the following sequence of events happened:
- the scrub 1b test called scrub_prep
- scrub_prep set fail_loc 0x198 on the MDT (the "cfs_fail_loc=198" line at the top of the log)
- this prevented the osd object from being inserted into the OI, which broke the osd_fid_lookup() logic used to find the quota slave (slv) file
- osd_fid_lookup() therefore returned an osd object whose oo_inode was NULL (0x0)
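- shortly afterwards, osd_index_try() hit the BUG: through iam_container_setup(), ldiskfs_bread() and ldiskfs_es_lookup_extent() it tried to read-lock &LDISKFS_I(inode)->i_es_lock on that NULL inode, which faulted in _raw_read_lock() at address 0x298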
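The sequence above can be modelled in a few lines of user-space C. This is a hypothetical sketch, not Lustre code and not necessarily the fix that was merged: the mock_* structures, the mock_fail_loc_set flag and the -ENOENT return value are all assumptions made for illustration. It shows how a lookup that skips OI insertion hands back an object with a NULL inode, and how a guard in the index setup path could refuse such an object instead of walking down into the block-mapping code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real structures;
 * only the fields needed to show the failure mode are modelled. */
struct mock_inode {
	int i_es_lock; /* stands in for the extent-status rwlock */
};

struct mock_osd_object {
	struct mock_inode *oo_inode; /* NULL when the lookup failed */
};

/* Hypothetical flag modelling the fail_loc: when set, the object is
 * never inserted into the OI, so the lookup cannot find its inode. */
static bool mock_fail_loc_set;

static struct mock_inode mock_slv_inode;

/* Models osd_fid_lookup(): yields an object whose oo_inode is NULL
 * when the OI insertion was skipped. */
static void mock_fid_lookup(struct mock_osd_object *obj)
{
	obj->oo_inode = mock_fail_loc_set ? NULL : &mock_slv_inode;
}

/* Models osd_index_try() with an assumed guard: bail out on an object
 * that has no inode rather than reaching the code that locks
 * LDISKFS_I(inode)->i_es_lock. */
static int mock_index_try(const struct mock_osd_object *obj)
{
	if (obj->oo_inode == NULL)
		return -ENOENT; /* assumed error code */
	/* Unguarded code would go on to lock the inode's i_es_lock here. */
	return 0;
}

/* Walks through both cases: normal lookup, then lookup under fail_loc. */
static void mock_demo(void)
{
	struct mock_osd_object obj;

	mock_fail_loc_set = false;
	mock_fid_lookup(&obj);
	assert(obj.oo_inode != NULL);
	assert(mock_index_try(&obj) == 0);

	mock_fail_loc_set = true; /* models setting fail_loc 0x198 */
	mock_fid_lookup(&obj);
	assert(obj.oo_inode == NULL);
	assert(mock_index_try(&obj) == -ENOENT); /* guarded: no oops */
}
```

Checking oo_inode at the top of the index setup keeps the failure local to the caller, so the quota code sees an error return instead of the kernel faulting deep inside ldiskfs.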