LFSCK 4: improve LFSCK performance
(LU-6361)
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Technical task | Priority: | Critical |
| Reporter: | James Nunez (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | lfsck | ||
| Environment: |
OpenSFS cluster running Lustre 2.7.0-RC-4 build #29 with two MDSs (two MDTs each), three OSSs (two OSTs each), and three clients. |
| Rank (Obsolete): | 17773 | ||||
| Description |
|
While running the stability test from the LFSCK Phase 3 test plan, the primary MDS, which hosts MDT0 and MDT2, crashed. The stability test creates 150 directories with 10,000 objects each (files, striped directories, links, etc.). Then one process runs the namespace LFSCK repeatedly, another process deletes the 150 directories and their contents, and all other processes create directories containing files, striped directories, etc. The first namespace LFSCK run on all four MDTs completes. The second time LFSCK is started, the primary MDS crashes.

After the MDS comes back, the LFSCK status on MDT0000 is “crashed”:
do_facet mds1 lctl get_param -n mdd.scratch-MDT0000.lfsck_namespace
status = crashed

dmesg on the restarted MDS shows the following errors:
Lustre: scratch-MDT0002-osp-MDT0000: Connection restored to scratch-MDT0002 (at 0@lo)
Lustre: scratch-MDT0002: Recovery over after 0:05, of 9 clients 9 recovered and 0 were evicted.
LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166586, ql: 1, comp: 8, conn: 9, next: 4328166588, last_committed: 4328166570)
LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166599, ql: 1, comp: 8, conn: 9, next: 4328166601, last_committed: 4328166570)
LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166607, ql: 1, comp: 8, conn: 9, next: 4328166609, last_committed: 4328166570)
LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166613, ql: 1, comp: 8, conn: 9, next: 4328166615, last_committed: 4328166570)
Lustre: scratch-MDT0000-osp-MDT0002: Connection restored to scratch-MDT0000 (at 0@lo)
Lustre: scratch-MDT0000: Recovery over after 0:35, of 9 clients 9 recovered and 0 were evicted.
From the vmcore-dmesg:
<1>BUG: unable to handle kernel NULL pointer dereference at 00000000000000ae
<1>IP: [<ffffffffa0dee512>] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
<4>PGD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 9
<4>Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) ldiskfs(U) sha512_generic sha256_generic crc32c_intel jbd2 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode iTCO_wdt iTCO_vendor_support serio_raw mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core ses enclosure sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
<4>
<4>Pid: 8092, comm: lfsck_namespace Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
<4>RIP: 0010:[<ffffffffa0dee512>] [<ffffffffa0dee512>] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
<4>RSP: 0018:ffff880b446cdaa0 EFLAGS: 00010246
<4>RAX: 0000000000000000 RBX: ffff8807e2a16900 RCX: ffff880a0bedeb14
<4>RDX: ffff8801f990d070 RSI: ffff8807e2a16900 RDI: ffff880191c11e40
<4>RBP: ffff880b446cdb30 R08: fffffffffffffffe R09: ffffffffa0dee430
<4>R10: 0000000000000000 R11: 0000000000000002 R12: ffff880191c11e40
<4>R13: ffff880191c11e40 R14: ffff880a0bedeb14 R15: ffff880a0bedeb14
<4>FS: 0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00000000000000ae CR3: 0000000001a85000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process lfsck_namespace (pid: 8092, threadinfo ffff880b446cc000, task ffff880c28df0040)
<4>Stack:
<4> ffff8801ac526000 ffff880a0bedeb14 000000000000001a ffff8801f990d350
<4><d> ffff880a0bedeae8 0000000000004000 ffff880b446cdb30 ffffffff8128daa4
<4><d> ffff8801f990d070 ffff880b446cdb40 ffff880b446cdb00 ffffffffa05e22c3
<4>Call Trace:
<4> [<ffffffff8128daa4>] ? snprintf+0x34/0x40
<4> [<ffffffffa05e22c3>] ? fld_server_lookup+0x53/0x330 [fld]
<4> [<ffffffffa0eee082>] lfsck_namespace_check_exist+0xd2/0x410 [lfsck]
<4> [<ffffffffa0f24fc6>] lfsck_namespace_handle_striped_master+0x1b6/0xb50 [lfsck]
<4> [<ffffffffa0868931>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
<4> [<ffffffffa0ef1532>] lfsck_namespace_assistant_handler_p1+0xb52/0x2310 [lfsck]
<4> [<ffffffff81170b79>] ? __drain_alien_cache+0x89/0xa0
<4> [<ffffffffa0ee16e6>] lfsck_assistant_engine+0x496/0x1de0 [lfsck]
<4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa0ee1250>] ? lfsck_assistant_engine+0x0/0x1de0 [lfsck]
<4> [<ffffffff8109abf6>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: 05 04 af 04 00 a3 16 00 00 48 c7 05 05 af 04 00 00 00 00 00 c7 05 f3 ae 04 00 01 00 00 00 e8 76 5c 76 ff 4c 8b 45 90 48 8b 43 40 <0f> b7 80 ae 00 00 00 25 00 f0 00 00 3d 00 40 00 00 0f 85 75 0a
<1>RIP [<ffffffffa0dee512>] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
<4> RSP <ffff880b446cdaa0>
<4>CR2: 00000000000000ae

I will upload the vmcore. |
| Comments |
| Comment by James Nunez (Inactive) [ 07/Mar/15 ] |
|
vmcore and vmcore-dmesg.txt are at uploads/ |
| Comment by James Nunez (Inactive) [ 07/Mar/15 ] |
|
On one of the OSTs:
Lustre: MGC192.168.2.125@o2ib: Connection restored to MGS (at 192.168.2.125@o2ib)
Lustre: Skipped 88 previous similar messages
LustreError: 167-0: scratch-MDT0002-lwp-OST0005: This client was evicted by scratch-MDT0002; in progress operations using this service will fail.
LustreError: Skipped 69 previous similar messages
Lustre: scratch-OST0005: deleting orphan objects from 0x0:686571 to 0x0:686625
Lustre: scratch-OST0004: deleting orphan objects from 0x0:686568 to 0x0:686625
LustreError: 3986:0:(ofd_grant.c:183:ofd_grant_sanity_check()) ofd_statfs: tot_granted 262912 != fo_tot_granted 85590784
LustreError: 3986:0:(ofd_grant.c:189:ofd_grant_sanity_check()) ofd_statfs: tot_dirty 0 != fo_tot_dirty 1048576 |
| Comment by nasf (Inactive) [ 08/Mar/15 ] |
|
In some cases, when LFSCK locates an object via its FID, it does not check whether the object exists; later use of that object may then access a NULL local object (the inode, for ldiskfs). Part of the issue has been fixed by the patch http://review.whamcloud.com/#/c/13993/, but that is not enough. I will make another patch for the remaining issues. |
| Comment by Gerrit Updater [ 08/Mar/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/14009 |
| Comment by Gerrit Updater [ 28/Apr/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14009/ |