[LU-9725] Mount commands don't return for targets in LFS with DNE and 3 MDTs Created: 30/Jun/17  Updated: 19/Dec/18  Resolved: 14/Aug/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.10.1, Lustre 2.11.0

Type: Bug Priority: Major
Reporter: Tom Nabarro (Inactive) Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File chroma-agent-console.log.txt     Text File chroma-agent.log.txt     Text File job_scheduler.log.txt     Text File messages.txt     HTML File sysrq-t     Text File yum.log.txt    
Issue Links:
Related
is related to LU-8066 Move lustre procfs handling to sysfs ... Open
is related to LU-9376 Recovery bug exposed during sanity 10... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

kernel version: 3.10.0-514.21.1.el7_lustre.x86_64
lustre version: 2.10.0_RC1-1.el7
OS: CentOS Linux release 7.3.1611 (Core)

The failure consistently occurs in test_filesystem_dne.py test_md0_undeleteable() during IML SSI automated test runs against Lustre b2_10

This is the only test we have that creates a filesystem with 3 MDTs.

When recreating the filesystem through IML (outside of the test infrastructure) in a similar configuration with an MGS, 3 MDTs and 1 OST, the mount commands for all other targets return successfully, but the OST mount command never returns.

While the MDT mount commands are being issued, there is a lot of activity in the kernel messages log, including multiple LustreErrors and stack traces, warnings of high CPU usage, and then:

kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [lwp_notify_fs1-:13630]

This is on an ldiskfs-only filesystem with DNE enabled. The OST mount command used is shown below; the MDT mount commands have a similar format:

mount -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk5 /mnt/fs1-OST0000

The following gists show excerpts from the /var/log/messages log during instances of this type of failure (MDT mounting in DNE):

https://gist.github.com/tanabarr/1adb35a7e7da2581be79df8f45417411
https://gist.github.com/tanabarr/70d3bfa66c4fc474b82c7c02adcda511
https://gist.github.com/tanabarr/9f54584621aacfdeb3899f59687cb918

The last gist link is an extended excerpt giving more contextual log information regarding the attempted mounting of the MDTs and the subsequent CPU load warnings. The entire logfile for that failure instance (in addition to other IML related log files) is attached to this ticket.

original IML ticket: https://github.com/intel-hpdd/intel-manager-for-lustre/issues/108



 Comments   
Comment by Peter Jones [ 30/Jun/17 ]

Niu

Can you please advise on this one?

Thanks

Peter

Comment by Brian Murrell (Inactive) [ 10/Jul/17 ]

Here's the stuck thread from a full sysrq-t log from a node in such a state:

[65093.076421] ll_mgs_0002     D ffff880023613a30     0 28707      2 0x00000080
[65093.077960]  ffff8800236139e8 0000000000000046 ffff880079ef4e70 ffff880023613fd8
[65093.079596]  ffff880023613fd8 ffff880023613fd8 ffff880079ef4e70 ffff8800366b8800
[65093.081208]  000000000000000e ffff8800366b8890 ffff8800366b8828 ffff880023613a30
[65093.082782] Call Trace:
[65093.083839]  [<ffffffff8168c849>] schedule+0x29/0x70
[65093.085105]  [<ffffffffa0f01875>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[65093.086556]  [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
[65093.087929]  [<ffffffffa0efa4d3>] jbd2_journal_stop+0x343/0x3d0 [jbd2]
[65093.089347]  [<ffffffffa0f9210b>] ? __ldiskfs_handle_dirty_metadata+0x8b/0x220 [ldiskfs]
[65093.090905]  [<ffffffffa0efaf02>] ? jbd2_journal_get_write_access+0x32/0x40 [jbd2]
[65093.092447]  [<ffffffffa0f91c5c>] __ldiskfs_journal_stop+0x3c/0xb0 [ldiskfs]
[65093.093918]  [<ffffffffa0ab532e>] osd_trans_stop+0x18e/0x830 [osd_ldiskfs]
[65093.095407]  [<ffffffffa0acddfb>] ? osd_write+0x15b/0x5b0 [osd_ldiskfs]
[65093.096956]  [<ffffffffa087ec73>] ? lu_context_init+0xd3/0x1f0 [obdclass]
[65093.098633]  [<ffffffffa0b41262>] mgs_ir_update+0x2e2/0xb70 [mgs]
[65093.100328]  [<ffffffffa0b21d6f>] mgs_target_reg+0x77f/0x1370 [mgs]
[65093.102179]  [<ffffffffa107ce7f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
[65093.104399]  [<ffffffffa107d001>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
[65093.106275]  [<ffffffffa10de895>] tgt_request_handle+0x915/0x1360 [ptlrpc]
[65093.108552]  [<ffffffffa1088133>] ptlrpc_server_handle_request+0x233/0xa90 [ptlrpc]
[65093.110875]  [<ffffffffa1085928>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[65093.112320]  [<ffffffff810c54f2>] ? default_wake_function+0x12/0x20
[65093.113727]  [<ffffffff810ba628>] ? __wake_up_common+0x58/0x90
[65093.115111]  [<ffffffffa108c110>] ptlrpc_main+0xaa0/0x1dd0 [ptlrpc]
[65093.116585]  [<ffffffffa108b670>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
[65093.118178]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[65093.119471]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[65093.120843]  [<ffffffff81697798>] ret_from_fork+0x58/0x90
[65093.122169]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140


Comment by Niu Yawei (Inactive) [ 10/Jul/17 ]

The stack trace posted by Brian indicates an MGS thread waiting for a journal commit; that is not necessarily a problem unless it waits forever.

As for the CPU soft lockup problem:

Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [lwp_notify_fs1-:13630]
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache libcfs(OE) snd_intel8x0 snd_ac97_codec ppdev ac97_bus snd_seq snd_seq_device sg pcspkr virtio_balloon snd_pcm parport_pc parport snd_timer snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi virtio_blk virtio_net virtio_scsi cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw floppy drm virtio_pci virtio_ring
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: virtio ata_piix libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: CPU: 1 PID: 13630 Comm: lwp_notify_fs1- Tainted: P           OEL ------------   3.10.0-514.21.1.el7_lustre.x86_64 #1
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: task: ffff880078071f60 ti: ffff8800388b8000 task.ti: ffff8800388b8000
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: RIP: 0010:[<ffffffff81327649>]  [<ffffffff81327649>] __write_lock_failed+0x9/0x20
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: RSP: 0018:ffff8800388bbe40  EFLAGS: 00000297
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: RAX: ffff880038e1f800 RBX: ffff8800388bbe18 RCX: 0000000000000000
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: RDX: 000000000000002e RSI: ffff88003c4c3ec4 RDI: ffff880046ca1384
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: RBP: ffff8800388bbe40 R08: 0000000000019b20 R09: ffffffffa065a9a1
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: R10: ffff88007fd19b20 R11: ffffea0000e21500 R12: ffff8800388bbdd0
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: R13: ffff880044384d80 R14: ffff8800388bbe18 R15: 0000000000000028
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: CR2: 00007ff777278f30 CR3: 00000000019be000 CR4: 00000000000006e0
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: Stack:
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: ffff8800388bbe50 ffffffff8168e827 ffff8800388bbe70 ffffffffa0f4795a
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: ffff88003c4c3e80 ffff880038e1fc00 ffff8800388bbe98 ffffffffa0674dda
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: ffff880038e1fc00 ffff880035b1f900 ffff880035b1f9b0 ffff8800388bbec0
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: Call Trace:
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [<ffffffff8168e827>] _raw_write_lock+0x17/0x20
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [<ffffffffa0f4795a>] qsd_conn_callback+0x5a/0x160 [lquota]
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [<ffffffffa0674dda>] lustre_notify_lwp_list+0xba/0x100 [obdclass]
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [<ffffffffa1385af6>] lwp_notify_main+0x56/0xc0 [osp]
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [<ffffffffa1385aa0>] ? lwp_import_event+0xb0/0xb0 [osp]
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [<ffffffff81697798>] ret_from_fork+0x58/0x90
Jun 29 08:16:06 lotus-32vm7.lotus.hpdd.lab.intel.com kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

It looks like the thread is blocked on the spinlock taken in qsd_conn_callback() -> write_lock(&qsd->qsd_lock); I'll investigate further.
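For illustration, here is a minimal kernel-module sketch of the failure shape in the trace above. This is not the Lustre code; all names (demo_lock, holder_fn, writer_fn) are made up. If an rwlock_t is never released by its holder, a writer calling write_lock() spins in the x86 slow path __write_lock_failed with preemption disabled until the NMI watchdog reports a soft lockup:

/* demo_rwlock_stall.c - illustrative sketch only, NOT the Lustre code */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/delay.h>

static DEFINE_RWLOCK(demo_lock);

static int holder_fn(void *unused)
{
    read_lock(&demo_lock);          /* acquired and, due to a "bug", never released */
    while (!kthread_should_stop())
        ;                           /* burn CPU; read_unlock() is never reached */
    read_unlock(&demo_lock);
    return 0;
}

static int writer_fn(void *unused)
{
    msleep(100);                    /* let the holder take the lock first */
    write_lock(&demo_lock);         /* spins here forever -> soft lockup on this CPU */
    write_unlock(&demo_lock);
    return 0;
}

static int __init demo_init(void)
{
    kthread_run(holder_fn, NULL, "demo_holder");
    kthread_run(writer_fn, NULL, "demo_writer");
    return 0;
}
module_init(demo_init);
MODULE_LICENSE("GPL");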

Comment by Gerrit Updater [ 11/Jul/17 ]

Niu Yawei (yawei.niu@intel.com) uploaded a new patch: https://review.whamcloud.com/27987
Subject: LU-9725 lwp: wait on deregister
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a21f2bbb8b7a7c76528d461e06072b5b9759be43

Comment by Brian Murrell (Inactive) [ 13/Jul/17 ]

niu Could you add a brief summary to this ticket about what conditions cause this bug to happen? The description seems to suggest that just a simple-ish configuration of 1 MGS, 3 MDTs and 1 OST will cause this to happen when the OST is started. Is there something more subtle about this configuration that causes this bug and that we could avoid, so as not to hit it?

Comment by Niu Yawei (Inactive) [ 13/Jul/17 ]

There is a race that can result in a CPU hang in qsd_conn_callback() when starting/shutting down servers; it is not configuration related.
I presume this is a rare issue that can't be steadily reproduced, am I right?
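
For illustration, here is a generic sketch of a "wait on deregister" scheme of the sort the patch subject above ("LU-9725 lwp: wait on deregister") suggests. This is not the actual Lustre implementation, and every name here (lwp_cb, lwp_cb_notify, lwp_cb_deregister) is hypothetical. The point is that deregistration must first stop new callbacks and then wait for callbacks already running before the state the callback locks can be torn down; otherwise the callback can spin on stale state, as in the traces above:

#include <linux/spinlock.h>
#include <linux/atomic.h>
#include <linux/completion.h>

struct lwp_cb {
    rwlock_t          lock;   /* what the notify callback takes */
    atomic_t          refs;   /* 1 for the registration + 1 per running callback */
    struct completion done;   /* signalled when refs drops to 0 */
};

static void lwp_cb_init(struct lwp_cb *cb)
{
    rwlock_init(&cb->lock);
    atomic_set(&cb->refs, 1);
    init_completion(&cb->done);
}

static void lwp_cb_put(struct lwp_cb *cb)
{
    if (atomic_dec_and_test(&cb->refs))
        complete(&cb->done);
}

/* Runs from the LWP notify thread on connect/disconnect events. */
static void lwp_cb_notify(struct lwp_cb *cb)
{
    atomic_inc(&cb->refs);
    write_lock(&cb->lock);
    /* ... react to the import event ... */
    write_unlock(&cb->lock);
    lwp_cb_put(cb);
}

/* Called during server shutdown, before cb (and its lock) is freed. */
static void lwp_cb_deregister(struct lwp_cb *cb)
{
    /* first unlink cb from the notifier list so no new notify can start,
     * then drop the registration reference and drain in-flight callbacks */
    lwp_cb_put(cb);
    wait_for_completion(&cb->done);
}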

Comment by Brian Murrell (Inactive) [ 13/Jul/17 ]

We can reproduce it fairly frequently.

Frequently enough that we had to disable a few tests that were fairly reliably reproducing it.

Comment by Niu Yawei (Inactive) [ 14/Jul/17 ]

Could you share what kind of tests can reliably reproduce it? Does the problem always happen when starting/stopping servers? Could you help verify whether the patch really solves the problem? Thanks in advance.

Comment by Gerrit Updater [ 19/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27987/
Subject: LU-9725 lwp: wait on deregister
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5d5702a3ec24cd1bc7effbadb13d272fa51dff05

Comment by Peter Jones [ 19/Jul/17 ]

So, Niu's patch has landed but have we confirmed that this fix meets the needs of the reporter?

Comment by Brian Murrell (Inactive) [ 19/Jul/17 ]

have we confirmed that this fix meets the needs of the reporter

Not yet. It's on our TODO list but there are some things that have to happen first before we are able to test IML with a review build.  I'm working on those right now.

Comment by Peter Jones [ 19/Jul/17 ]

If we expedite landing it to b2_10 would that help?

Comment by Brian Murrell (Inactive) [ 19/Jul/17 ]

Might not be necessary.  I might have enough pieces in place today to be able to install a review build of Lustre in IML.

Comment by Brian Murrell (Inactive) [ 20/Jul/17 ]

I'm afraid I have to renege.  We need the client support that is in the stack of reviews that I have outstanding in order to get a successful IML installation.

So verifying this issue is going to be blocked by getting that stack landed.

Comment by Gerrit Updater [ 20/Jul/17 ]

Jian Yu (jian.yu@intel.com) uploaded a new patch: https://review.whamcloud.com/28161
Subject: LU-9725 lwp: wait on deregister
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 183c78264bb067d49df9b76901a67ab631c2d751

Comment by Gerrit Updater [ 21/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28161/
Subject: LU-9725 lwp: wait on deregister
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: ab0d0c35b3de894f38f8941380d137848751c6eb

Comment by Peter Jones [ 21/Jul/17 ]

Brian

Are you now able to verify this fix?

Peter

Comment by Brian Murrell (Inactive) [ 21/Jul/17 ]

pjones: Not yet, I'm afraid. We need the stack of 3 packaging patches in combination with this fix to get a functional IML (with Lustre 2.10) system up with which to test the fix for this ticket. Those three patches appear to be in progress (even if perhaps partly contentious), so I am hopeful.

Comment by Minh Diep [ 22/Jul/17 ]

landed in 2.11

Comment by Brian Murrell (Inactive) [ 25/Jul/17 ]

I'm afraid I don't think the patch fixed the problem.

Using:

Lustre: Build Version: 2.10.0_5_gbb3c407

which I believe should have the fix in it, I'm still seeing:

[ 1604.696255] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [lwp_notify_test:25841]
[ 1604.703478] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device ppdev snd_pcm sg snd_timer pcspkr virtio_balloon snd i2c_piix4 soundcore parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi virtio_blk virtio_net cirrus drm_kms_helper virtio_scsi syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata i2c_core serio_raw floppy virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
[ 1604.785347] CPU: 0 PID: 25841 Comm: lwp_notify_test Tainted: P OE ------------ 3.10.0-514.21.1.el7_lustre.x86_64 #1
[ 1604.798164] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1604.804864] task: ffff88001ac3edd0 ti: ffff88001a204000 task.ti: ffff88001a204000
[ 1604.812330] RIP: 0010:[<ffffffff81327649>] [<ffffffff81327649>] __write_lock_failed+0x9/0x20
[ 1604.820634] RSP: 0018:ffff88001a207e40 EFLAGS: 00000206
[ 1604.828340] RAX: ffff880020eb5400 RBX: ffff88001a207e18 RCX: 0000000000000000
[ 1604.834839] RDX: 0000000000000016 RSI: ffff88001ce8ecc7 RDI: ffff88007a9c1984
[ 1604.842076] RBP: ffff88001a207e40 R08: 0000000000019b20 R09: ffffffffa066f9c1
[ 1604.848577] R10: ffff88007fc19b20 R11: ffffea00006d0400 R12: ffff88001a207dd0
[ 1604.854735] R13: ffff88001ad083c0 R14: ffff88001a207e18 R15: 0000000000000028
[ 1604.861290] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 1604.867586] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1604.873392] CR2: 00007fe96aa680e0 CR3: 000000001acfd000 CR4: 00000000000006f0
[ 1604.879558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[ 1604.885757] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
[ 1604.891532] Stack: 
[ 1604.896912] ffff88001a207e50 ffffffff8168e827 ffff88001a207e70 ffffffffa0ff295a 
[ 1604.903591] ffff88001ce8ec80 ffff880020eb6400 ffff88001a207e98 ffffffffa0689eca 
[ 1604.910593] ffff880020eb6400 ffff880020a03000 ffff880020a030b0 ffff88001a207ec0 
[ 1604.916651] Call Trace: 
[ 1604.922930] [<ffffffff8168e827>] _raw_write_lock+0x17/0x20 
[ 1604.929711] [<ffffffffa0ff295a>] qsd_conn_callback+0x5a/0x160 [lquota] 
[ 1604.935637] [<ffffffffa0689eca>] lustre_notify_lwp_list+0xba/0x100 [obdclass]
[ 1604.941153] [<ffffffffa14d8af6>] lwp_notify_main+0x56/0xc0 [osp]
[ 1604.946248] [<ffffffffa14d8aa0>] ? lwp_import_event+0xb0/0xb0 [osp]
[ 1604.951715] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[ 1604.956861] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[ 1604.962093] [<ffffffff81697798>] ret_from_fork+0x58/0x90
[ 1604.967130] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[ 1604.972471] Code: 66 90 48 89 01 31 c0 66 66 90 c3 b8 f2 ff ff ff 66 66 90 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 f0 ff 07 f3 90 <83> 3f 01 75 f9 f0 ff 0f 75 f1 5d c3 66 66 2e 0f 1f 84 00 00 00 

 

Comment by Peter Jones [ 26/Jul/17 ]

Lai

Could you please advise on this one?

Thanks

Peter

Comment by Lai Siyao [ 01/Aug/17 ]

Brian, is it true that 2.10.0 includes this fix? Even tag 2.10.50 doesn't include it; can you test with a master build?

Comment by Brian Murrell (Inactive) [ 01/Aug/17 ]

Brian, is it true 2.10.0 include this fix?

No, 2.10.0 does not include this fix, but b2_10 does, and that is what we are testing with. You can see from the comment above that we tested with 2.10.0_5_gbb3c407, a build that contains this commit and is 5 commits newer than the landed patch for this ticket. So we definitely did test the patch from this ticket.

even tag 2.10.50 doesn't include this, can you test with master build?

We cannot test with master due to issues Jenkins has trying to act as an HTTP server. But given the above, we really shouldn't need to test master, since we have already tested the patch on b2_10.

Comment by Lai Siyao [ 01/Aug/17 ]

Thanks Brian, I'll continue checking the code.

Comment by Brian Murrell (Inactive) [ 01/Aug/17 ]

laisiyao: No problem. Let me know if there is anything else I can do to help.

Comment by Gerrit Updater [ 04/Aug/17 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/28356
Subject: LU-9725 quota: always deregister lwp
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1a5df6bc53e8356d8ae83d4031ea4397ebc03af3

Comment by Lai Siyao [ 04/Aug/17 ]

Brian, I uploaded a patch, could you help verify it?

Comment by Gerrit Updater [ 04/Aug/17 ]

Brian J. Murrell (brian.murrell@intel.com) uploaded a new patch: https://review.whamcloud.com/28357
Subject: LU-9725 quota: always deregister lwp
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 728454ee8f85d3074124933d0d83e42f10515500

Comment by Brian Murrell (Inactive) [ 04/Aug/17 ]

laisiyao: Sure.  I will do a cherry-pick to b2_10 and test from there.

Comment by James A Simmons [ 04/Aug/17 ]

I just tested this patch, and this is the bug that was preventing my debugfs port: because the lwp was not being fully deregistered, the debugfs kobjects were not freed, so when the MDT was mounted a second time it failed because the debugfs files already existed. You can't register a debugfs file twice.
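
For illustration, here is a minimal kernel-module sketch of that constraint. This is not the Lustre code, and all names (lu9725_demo, demo_fops) are made up: debugfs refuses to create a file whose name already exists under the same parent, so an entry that was never removed at teardown blocks the next registration.

#include <linux/module.h>
#include <linux/debugfs.h>
#include <linux/err.h>

static struct dentry *demo_dir;

static const struct file_operations demo_fops = {
    .owner = THIS_MODULE,
};

static int __init demo_init(void)
{
    struct dentry *second;

    demo_dir = debugfs_create_dir("lu9725_demo", NULL);
    debugfs_create_file("stats", 0444, demo_dir, NULL, &demo_fops);

    /* Simulates a second mount re-registering a file that was never removed:
     * this create fails (NULL or ERR_PTR(-EEXIST), depending on kernel version). */
    second = debugfs_create_file("stats", 0444, demo_dir, NULL, &demo_fops);
    pr_info("lu9725_demo: second create %s\n",
            IS_ERR_OR_NULL(second) ? "failed as expected" : "unexpectedly succeeded");
    return 0;
}

static void __exit demo_exit(void)
{
    /* the fix amounts to always cleaning up these entries on teardown */
    debugfs_remove_recursive(demo_dir);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");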

Comment by Gerrit Updater [ 13/Aug/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28356/
Subject: LU-9725 quota: always deregister lwp
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ce8ca7d3564439285a56982430f380354b697f68

Comment by Peter Jones [ 14/Aug/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 18/Aug/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28357/
Subject: LU-9725 quota: always deregister lwp
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: e53c0fbeefc1c29d7b5256c6a4cc6ead96ae41e8
