[LU-3270] ptlrpcd strnlen crash trying to log a message Created: 03/May/13  Updated: 10/Sep/18  Resolved: 21/Nov/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: Lai Siyao
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Duplicate
is duplicated by LU-6439 sanity test_120g: panic on client Resolved
Related
is related to LU-8312 (statahead.c:351:do_sa_entry_fini()) ... Open
is related to LU-4624 ll_stop_statahead deadlock Resolved
is related to LU-4451 Kernel Oops with NFS reexport using m... Resolved
Severity: 3
Rank (Obsolete): 8107

 Description   

Just hit this (use after free, I think) after a few days of recovery-small:

[307783.003678] Lustre: DEBUG MARKER: == recovery-small test complete, duration 1814 sec == 08:30:02 (1367584202)
[307783.153758] BUG: unable to handle kernel paging request at ffff880097114fa8
[307783.154227] IP: [<ffffffff8127cc39>] strnlen+0x9/0x40
[307783.154604] PGD 1a26063 PUD 501067 PMD 5ba067 PTE 8000000097114163
[307783.155028] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[307783.155408] last sysfs file: /sys/devices/system/cpu/possible
[307783.155791] CPU 0 
[307783.155847] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
[307783.156905] 
[307783.156905] Pid: 17616, comm: ptlrpcd_3 Not tainted 2.6.32-rhe6.4-debug #2 Bochs Bochs
[307783.156905] RIP: 0010:[<ffffffff8127cc39>]  [<ffffffff8127cc39>] strnlen+0x9/0x40
[307783.156905] RSP: 0018:ffff8800af8e9980  EFLAGS: 00010206
[307783.156905] RAX: ffffffff817a6cc1 RBX: ffff8800b00e1000 RCX: 0000000000000001
[307783.156905] RDX: ffff880097114fa8 RSI: 0000000000000017 RDI: ffff880097114fa8
[307783.156905] RBP: ffff8800af8e9980 R08: 000000000000002e R09: 0000000000000053
[307783.156905] R10: 0000000000000001 R11: 0000000000000104 R12: ffff8800b00e0a4b
[307783.156905] R13: ffff880097114fa8 R14: 00000000ffffffff R15: 0000000000000000
[307783.156905] FS:  00007fbed06dc700(0000) GS:ffff880006200000(0000) knlGS:0000000000000000
[307783.156905] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[307783.156905] CR2: ffff880097114fa8 CR3: 0000000001a25000 CR4: 00000000000006f0
[307783.156905] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[307783.156905] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[307783.156905] Process ptlrpcd_3 (pid: 17616, threadinfo ffff8800af8e8000, task ffff8800b08aa5c0)
[307783.156905] Stack:
[307783.156905]  ffff8800af8e99c0 ffffffff8127def0 ffff880099facdf0 ffff8800b00e0a4b
[307783.156905] <d> ffffffffa13328e0 ffffffffa13328df ffff8800af8e9b40 ffff8800b00e1000
[307783.156905] <d> ffff8800af8e9a60 ffffffff8127f338 0000000000000004 0000000affffffff
[307783.156905] Call Trace:
[307783.156905]  [<ffffffff8127def0>] string+0x40/0x100
[307783.156905]  [<ffffffff8127f338>] vsnprintf+0x218/0x5e0
[307783.156905]  [<ffffffffa0a9127b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
[307783.156905]  [<ffffffffa0aa1b6b>] libcfs_debug_vmsg2+0x2fb/0xab0 [libcfs]
[307783.156905]  [<ffffffffa0aa2361>] libcfs_debug_msg+0x41/0x50 [libcfs]
[307783.156905]  [<ffffffffa0a9cde8>] ? libcfs_log_return+0x28/0x40 [libcfs]
[307783.156905]  [<ffffffffa132c2e5>] mdc_finish_intent_lock+0x535/0x7f0 [mdc]
[307783.156905]  [<ffffffffa0a9cde8>] ? libcfs_log_return+0x28/0x40 [libcfs]
[307783.156905]  [<ffffffffa132d388>] ? mdc_finish_enqueue+0x208/0x1180 [mdc]
[307783.156905]  [<ffffffffa132e626>] mdc_intent_getattr_async_interpret+0x326/0x4c0 [mdc]
[307783.156905]  [<ffffffffa0ea46b2>] ptlrpc_check_set+0x2b2/0x1da0 [ptlrpc]
[307783.156905]  [<ffffffffa0ed1e1b>] ptlrpcd_check+0x55b/0x590 [ptlrpc]
[307783.156905]  [<ffffffffa0ed2363>] ptlrpcd+0x233/0x390 [ptlrpc]
[307783.156905]  [<ffffffff8105ad10>] ? default_wake_function+0x0/0x20
[307783.156905]  [<ffffffffa0ed2130>] ? ptlrpcd+0x0/0x390 [ptlrpc]
[307783.156905]  [<ffffffff8100c10a>] child_rip+0xa/0x20
[307783.156905]  [<ffffffffa0ed2130>] ? ptlrpcd+0x0/0x390 [ptlrpc]
[307783.156905]  [<ffffffffa0ed2130>] ? ptlrpcd+0x0/0x390 [ptlrpc]
[307783.156905]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
[307783.156905] Code: 66 90 48 83 c2 01 80 3a 00 75 f7 48 89 d0 48 29 f8 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 f6 48 89 e5 74 2e <80> 3f 00 74 29 48 83 ee 01 48 89 f8 eb 12 66 0f 1f 84 00 00 00 

Crashdump and modules are in /exports/crashdumps/192.168.10.217-2013-05-03-08\:30\:08/
source tag is 20130430



 Comments   
Comment by Oleg Drokin [ 05/May/13 ]

Hit it once more

[111421.423886] Lustre: DEBUG MARKER: == recovery-small test complete, duration 1966 sec == 21:24:24 (1367717064)
[111426.185618] BUG: unable to handle kernel paging request at ffff8800968dafa8
[111426.185917] IP: [<ffffffff8127cc39>] strnlen+0x9/0x40
[111426.186162] PGD 1a26063 PUD 501067 PMD 5b6067 PTE 80000000968da163
[111426.186438] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[111426.186686] last sysfs file: /sys/devices/system/cpu/possible
[111426.186934] CPU 7 
[111426.186970] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
[111426.189508] 
[111426.189508] Pid: 26777, comm: ptlrpcd_1 Not tainted 2.6.32-rhe6.4-debug #2 Bochs Bochs
[111426.189508] RIP: 0010:[<ffffffff8127cc39>]  [<ffffffff8127cc39>] strnlen+0x9/0x40
[111426.189508] RSP: 0018:ffff8800b787d980  EFLAGS: 00010202
[111426.189508] RAX: ffffffff817a6cc1 RBX: ffff8800650bc000 RCX: 0000000000000001
[111426.189508] RDX: ffff8800968dafa8 RSI: 000000000000001a RDI: ffff8800968dafa8
[111426.189508] RBP: ffff8800b787d980 R08: 000000000000002e R09: 0000000000000053
[111426.189508] R10: 0000000000000001 R11: 000000000000006e R12: ffff8800650bb205
[111426.189508] R13: ffff8800968dafa8 R14: 00000000ffffffff R15: 0000000000000000
[111426.189508] FS:  00007fbb99048700(0000) GS:ffff8800063c0000(0000) knlGS:0000000000000000
[111426.189508] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[111426.189508] CR2: ffff8800968dafa8 CR3: 0000000001a25000 CR4: 00000000000006e0
[111426.189508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[111426.189508] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[111426.189508] Process ptlrpcd_1 (pid: 26777, threadinfo ffff8800b787c000, task ffff8800b706e440)
[111426.189508] Stack:
[111426.189508]  ffff8800b787d9c0 ffffffff8127def0 ffff8800b06c3df0 ffff8800650bb205
[111426.189508] <d> ffffffffa08488e0 ffffffffa08488df ffff8800b787db40 ffff8800650bc000
[111426.189508] <d> ffff8800b787da60 ffffffff8127f338 0000000000000004 0000000affffffff
[111426.189508] Call Trace:
[111426.189508]  [<ffffffff8127def0>] string+0x40/0x100
[111426.189508]  [<ffffffff8127f338>] vsnprintf+0x218/0x5e0
[111426.189508]  [<ffffffffa08b627b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
[111426.189508]  [<ffffffffa08c6b6b>] libcfs_debug_vmsg2+0x2fb/0xab0 [libcfs]
[111426.189508]  [<ffffffffa08c7361>] libcfs_debug_msg+0x41/0x50 [libcfs]
[111426.189508]  [<ffffffffa08c1de8>] ? libcfs_log_return+0x28/0x40 [libcfs]
[111426.189508]  [<ffffffffa08422e5>] mdc_finish_intent_lock+0x535/0x7f0 [mdc]
[111426.189508]  [<ffffffffa08c1de8>] ? libcfs_log_return+0x28/0x40 [libcfs]
[111426.189508]  [<ffffffffa0843388>] ? mdc_finish_enqueue+0x208/0x1180 [mdc]
[111426.189508]  [<ffffffffa0844626>] mdc_intent_getattr_async_interpret+0x326/0x4c0 [mdc]
[111426.189508]  [<ffffffffa06346b2>] ptlrpc_check_set+0x2b2/0x1da0 [ptlrpc]
[111426.189508]  [<ffffffffa0661e2b>] ptlrpcd_check+0x55b/0x590 [ptlrpc]
[111426.189508]  [<ffffffffa0662373>] ptlrpcd+0x233/0x390 [ptlrpc]
[111426.189508]  [<ffffffff8105ad10>] ? default_wake_function+0x0/0x20
[111426.189508]  [<ffffffffa0662140>] ? ptlrpcd+0x0/0x390 [ptlrpc]
[111426.189508]  [<ffffffff8100c10a>] child_rip+0xa/0x20
[111426.189508]  [<ffffffffa0662140>] ? ptlrpcd+0x0/0x390 [ptlrpc]
[111426.189508]  [<ffffffffa0662140>] ? ptlrpcd+0x0/0x390 [ptlrpc]
[111426.189508]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
[111426.189508] Code: 66 90 48 83 c2 01 80 3a 00 75 f7 48 89 d0 48 29 f8 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 f6 48 89 e5 74 2e <80> 3f 00 74 29 48 83 ee 01 48 89 f8 eb 12 66 0f 1f 84 00 00 00 

Crashdump in /exports/crashdumps/192.168.10.223-2013-05-04-21\:24\:32
source tag master-20130505

Comment by Oleg Drokin [ 15/May/13 ]

Well, this one seems to have turned into a very frequent occurence for me, so I dug into it some more.
The root reason is in statahead code.

statahead thread creates a statahead request and pushes it to ptlrpcd, then after a while it decides to exit before the reply to that request is received. sa_entry refcount at 1 then dropped to 0 (decreased from 2 at ll_statehaed_one):

static void ll_sa_entry_cleanup(struct ll_statahead_info *sai,
                                 struct ll_sa_entry *entry)
{
        struct md_enqueue_info *minfo = entry->se_minfo;
        struct ptlrpc_request  *req   = entry->se_req;

        if (minfo) {
                entry->se_minfo = NULL;
                ll_intent_release(&minfo->mi_it);
                iput(minfo->mi_dir);
                OBD_FREE_PTR(minfo);
        }

        if (req) {
                entry->se_req = NULL;
                ptlrpc_req_finished(req);
        }
}

This code frees minfo that is still accessed in mdc_intent_getattr_async_interpret:

static int mdc_intent_getattr_async_interpret(const struct lu_env *env,
                                              struct ptlrpc_request *req,
                                              void *args, int rc)
{
        struct mdc_getattr_args  *ga = args;
        struct obd_export        *exp = ga->ga_exp;
        struct md_enqueue_info   *minfo = ga->ga_minfo;
...
        rc = mdc_finish_intent_lock(exp, req, &minfo->mi_data, it, lockh);
...
}

mdc_finish_intent_lock tries to print op_data from minfo and dies deep in vsprintf stepping on freed memory.

And here's a bit from the log while I have it:

00000100:00000040:6.0:1368112087.663178:0:6676:0:(ptlrpcd.c:271:ptlrpcd_add_req()) @@@ add req [ffff88005d64f7f0] to pc [ptlrpcd_7:7]  req@ffff88005d64f7f0 x1434567312028328/t0(0) o101->lustre-MDT0000-mdc-ffff8800682927f0@0@lo:12/10 lens 600/1080 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
00000080:00400000:6.0:1368112087.664185:0:6676:0:(statahead.c:1279:ll_statahead_thread()) stop agl thread: [pid 6676]
00000080:00000001:6.0:1368112087.664247:0:6676:0:(statahead.c:554:ll_sai_put()) Process leaving
00000080:00400000:6.0:1368112087.664247:0:6676:0:(statahead.c:1306:ll_statahead_thread()) statahead thread stopped: [pid 6676] [parent d51]
…
00000002:00100000:2.0:1368112087.664825:0:19458:0:(mdc_locks.c:579:mdc_finish_enqueue()) @@@ op: 8 disposition: 7, status: 0  req@ffff88005d64f7f0 x1434567312028328/t0(0) o101->lustre-MDT0000-mdc-ffff8800682927f0@0@lo:12/10 lens 600/536 e 0 to 0 dl 1368112104 ref 1 fl Interpret:R/0/0 rc 301/301
Comment by Oleg Drokin [ 15/May/13 ]

here's my patch created after discussion with Fan Yong: http://review.whamcloud.com/6348

Comment by nasf (Inactive) [ 16/May/13 ]

Hi Oleg, I found that the race may be NOT exist. Sorry I did NOT realise that when discussed with you.

static void ll_sa_entry_cleanup(struct ll_statahead_info *sai,
                                 struct ll_sa_entry *entry)
{
==>        struct md_enqueue_info *minfo = entry->se_minfo;
        struct ptlrpc_request  *req   = entry->se_req;

==>        if (minfo) {
                entry->se_minfo = NULL;
                ll_intent_release(&minfo->mi_it);
                iput(minfo->mi_dir);
                OBD_FREE_PTR(minfo);
        }

        if (req) {
                entry->se_req = NULL;
                ptlrpc_req_finished(req);
        }
}

That means if the statahead thread can free the "minfo", then the "entry->se_minfo" must be set. But in the whole logic, only the "ll_statahead_interpret()" can set the "entry->se_minfo" which is called by "mdc_intent_getattr_async_interpret()". But before "mdc_intent_getattr_async_interpret" calls "ll_statahead_interpret()" to set "entry->se_minfo", it has already hold reference count on the "entry". So at that time, the statahead thread should not trigger entry release processing.

static void ll_sa_entry_put(struct ll_statahead_info *sai,
                             struct ll_sa_entry *entry)
{
==>        if (cfs_atomic_dec_and_test(&entry->se_refcount)) {
                CDEBUG(D_READA, "free sa entry %.*s(%p) index "LPU64"\n",
                       entry->se_qstr.len, entry->se_qstr.name, entry,
                       entry->se_index);

                LASSERT(cfs_list_empty(&entry->se_link));
                LASSERT(cfs_list_empty(&entry->se_list));
                LASSERT(ll_sa_entry_unhashed(entry));

                ll_sa_entry_cleanup(sai, entry);
                if (entry->se_inode)
                        iput(entry->se_inode);

                OBD_FREE(entry, entry->se_size);
                cfs_atomic_dec(&sai->sai_cache_count);
        }
}

So there should be other issues we did not found yet...

Comment by Oleg Drokin [ 16/May/13 ]

Yes, you are right.
per our skype discussion, the real issue is the freeing of entry->name that mdc_finish_intent_lock tries to print afterwards.

Comment by nasf (Inactive) [ 17/May/13 ]

This is the new patch:

http://review.whamcloud.com/#change,6372

Comment by Lai Siyao [ 17/May/13 ]

Separating statahead thread (exit) from statahead RPCs makes code complicated. I'd prefer statahead thread wait for all inflight statahead RPCs finish, and clean them up and then exit. By this way statahead is not guaranteed to quit when ll_file_release() is called, but this should not be an issue. I'll implement this and if all goes well, I'll commit a patch separately.

Comment by Oleg Drokin [ 18/May/13 ]

also one of the solutions could be to just embedd the name into opdata instead of just hte pointeer.
That would also resolve (part of) LU3306 and probably some more bugs.

Comment by Lai Siyao [ 20/May/13 ]

My patch is on http://review.whamcloud.com/#change,6392

Comment by Ann Koehler (Inactive) [ 21/Nov/13 ]

The patch http://review.whamcloud.com/#change,6392 has not caused any regressions when run through Cray's vers system test suite.

We adopted the patch because we encountered the LBUG:

> 14:06:40 LustreError: 16861:0:(statahead.c:351:do_sa_entry_fini()) ASSERTION( !ll_sa_entry_unhashed(entry) ) failed: 
> 14:06:40 LustreError: 16861:0:(statahead.c:351:do_sa_entry_fini()) LBUG
> 14:06:40 Pid: 16861, comm: DeletionService
> 14:06:40 Call Trace:
> 14:06:40 [<ffffffff81005db9>] try_stack_unwind+0x169/0x1b0
> 14:06:40 [<ffffffff81004849>] dump_trace+0x89/0x450
> 14:06:40 [<ffffffffa02278d7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
> 14:06:40 [<ffffffffa0227e37>] lbug_with_loc+0x47/0xc0 [libcfs]
> 14:06:40 [<ffffffffa0888b02>] ll_sai_unplug+0x4b2/0x580 [lustre]
> 14:06:40 [<ffffffffa088b45b>] do_statahead_enter+0x32b/0x1900 [lustre]
> 14:06:40 [<ffffffffa08792bc>] ll_lookup_it+0x9cc/0xb00 [lustre]
> 14:06:40 [<ffffffffa087947c>] ll_lookup_nd+0x8c/0x3e0 [lustre]
> 14:06:41 [<ffffffff8114dacc>] d_alloc_and_lookup+0x4c/0x80
> 14:06:41 [<ffffffff8114f13e>] do_lookup+0x2ae/0x3b0
> 14:06:41 [<ffffffff811519b3>] path_lookupat+0xc3/0x5f0
> 14:06:41 [<ffffffff81151f15>] do_path_lookup+0x35/0xd0
> 14:06:41 [<ffffffff81152c63>] user_path_at_empty+0x83/0xb0
> 14:06:41 [<ffffffff81152ca1>] user_path_at+0x11/0x20
> 14:06:41 [<ffffffff81147905>] vfs_fstatat+0x55/0x90
> 14:06:41 [<ffffffff81147a6b>] vfs_stat+0x1b/0x20
> 14:06:41 [<ffffffff81147a94>] sys_newstat+0x24/0x50
> 14:06:41 [<ffffffff8138ce6b>] system_call_fastpath+0x16/0x1b
> 14:06:41 [<00002aaaab5dc325>] 0x2aaaab5dc325
> 14:06:41 Kernel panic - not syncing: LBUG

Analysis of the associated dump showed that lli_sai was not NULL on entry to do_statahead_enter() but was NULL at the time of the ASSERTION. In the version of Lustre being run, do_statahead_enter() did not acquire the ll_sa_lock allowing the lli_sai to be changed. Patch 6392 adds lock/unlock calls on ll_sa_lock. With the patch, the LBUG has not been reproduced. The rest of the patch looks like it might have a positive effect on a ll_sa hang that we see occasionally but the verdict is still out.

Comment by Patrick Farrell (Inactive) [ 04/Dec/13 ]

This is the cover message from my Gerrit review, placed here so it's more readable:

We hit the following assertion while running with this patch:

> 02:54:14 LustreError: 17617:0:(statahead.c:1660:do_statahead_enter()) can't start ll_sa thread, rc: -2080
> 02:54:14 LustreError: 17617:0:(statahead.c:473:ll_sai_put()) ASSERTION( lli->u.d.d_opendir_pid == 0 ) failed:
> 02:54:14 LustreError: 17617:0:(statahead.c:473:ll_sai_put()) LBUG
> 02:54:14 Pid: 17617, comm: du
> 02:54:14 Call Trace:
> 02:54:14 [<ffffffff81005eb9>] try_stack_unwind+0x169/0x1b0
> 02:54:14 [<ffffffff81004919>] dump_trace+0x89/0x450
> 02:54:14 [<ffffffffa022a8d7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
> 02:54:14 [<ffffffffa022ae37>] lbug_with_loc+0x47/0xc0 [libcfs]
> 02:54:14 [<ffffffffa085fd14>] ll_sai_put+0x574/0x6e0 [lustre]
> 02:54:14 [<ffffffffa0862e48>] do_statahead_enter+0x1598/0x19c0 [lustre]
> 02:54:14 [<ffffffffa08003d0>] ll_revalidate_it+0xb10/0x1b10 [lustre]
> 02:54:14 [<ffffffffa08015ce>] ll_revalidate_nd+0x1fe/0x3e0 [lustre]
> 02:54:14 [<ffffffff81158f64>] do_lookup+0x134/0x3b0
> 02:54:14 [<ffffffff8115b913>] path_lookupat+0xc3/0x5f0
> 02:54:14 [<ffffffff8115be75>] do_path_lookup+0x35/0xd0
> 02:54:14 [<ffffffff8115cbc3>] user_path_at_empty+0x83/0xb0
> 02:54:14 [<ffffffff8115cc01>] user_path_at+0x11/0x20
> 02:54:14 [<ffffffff81151775>] vfs_fstatat+0x55/0x90
> 02:54:14 [<ffffffff811517d4>] sys_newfstatat+0x24/0x50
> 02:54:14 [<ffffffff813c836b>] system_call_fastpath+0x16/0x1b
> 02:54:14 [<00002aaaaad977ee>] 0x2aaaaad977ee

This assertion was added in the statahead patch we took in the patch from 803507.

(Note: #define lli_opendir_pid u.d.d_opendir_pid)

In particular, without the patch, ll_sai_put sets lli->u.d.d_opendir_pid to 0. The patch changes it to assert that lli->u.d.d_opendir_pid is 0.

Looking at some of the uses of ll_sai_put, we can see that lli->u.d.d_opendir_pid is sometimes set to 0 before ll_sai_put is called called, but also that this is not done in do_statahead_enter.

The inline comments give more detail. I've pointed out some of the ll_sai_put calls that appear problematic, but it's also called from other places and I'm not sure the PID is necessarily zero in all of them.

Comment by Lai Siyao [ 05/Dec/13 ]

Patrick, you hit that assert because statahead is not started successfully. By design, sai has a refcount in allocation, and it will be put in ll_stop_statahead(), which will set opendir_pid to 0. And in normal situation, ll_stop_statahead() will be called in file release or statahead ratio is too low. So it means this function will be called before sai put its last refcount, which means this assert should be valid.

I'll update the patch to handle this corner case.

Comment by Ann Koehler (Inactive) [ 05/Dec/13 ]

"you hit that assert because statahead is not started successfully"

Any idea why statahead may not have gotten started? The clue is probably this error message:

(statahead.c:1660:do_statahead_enter()) can't start ll_sa thread, rc: -2080

We do have a dump for the LBUG Patrick reported. I just haven't had time to do the analysis. Any guesses would be appreciated.

Comment by Lai Siyao [ 06/Dec/13 ]

Hmm, 2080 doesn't look to be a valid error number. What kernel version are you using?

Comment by Ann Koehler (Inactive) [ 06/Dec/13 ]

The client, where the LBUG occurred, was running 2.5 on SLES 11 SP3 (Linux 3.0.82).
Servers were Lustre 2.5 on CentOS (2.6.32).

Comment by Phil Schwan (Inactive) [ 28/Dec/13 ]

For what it's worth, we've also hit the "can't start ll_sa thread" bug four times, though with a slightly different followup assertion:

LustreError: 14399:0:(statahead.c:1698:do_statahead_enter()) can't start ll_sa thread: rc: -2752
LustreError: 14399:0:(statahead.c:1704:do_statahead_enter()) ASSERTION( lli->u.d.d_sai == ((void *)0) ) failed

I don't have all four crashes, but I believe that rc was different each time.

Lustre: Lustre: Build Version: 2.5.0-RC1--PRISTINE-2.6.32-279.14.1.el6.x86_64
Linux hnod0032 2.6.32-279.19.1.el6.x86_64 #1 SMP Wed Dec 19 07:05:20 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Comment by Ann Koehler (Inactive) [ 07/Feb/14 ]

With patchset 20 backported to 2.5, an ll_agl_XXXX thread occasionally hits a general protection fault in __wake_up. In all cases, the corresponding ll_sa_XXXX thread has terminated. Looks like the sai struct has been deallocated so the ll_agl_XXXX thread panics when it goes to access it.

Dump uploaded to: ftp.whamcloud.com:/upload/LU-3270/LU-3270_ll_agl_gpf.tgz
Client is Lustre 2.5 with LU-3270 PS20 running on SLES11 SP3

crash> ps | grep ll_sa
  20979      2   2  ffff88080f12f7e0  IN   0.0       0      0  [ll_sa_20914]
crash> ps | grep ll_agl
> 20978      2   0  ffff8807d9783080  RU   0.0       0      0  [ll_agl_20915]
  20980      2   3  ffff88080f12f040  IN   0.0       0      0  [ll_agl_20914]
crash> bt
PID: 20978  TASK: ffff8807d9783080  CPU: 0   COMMAND: "ll_agl_20915"
 #0 [ffff8807d9c1fcc0] die at ffffffff81005f6b
 #1 [ffff8807d9c1fcf0] do_general_protection at ffffffff8100361a
 #2 [ffff8807d9c1fd20] general_protection at ffffffff813628df
    [exception RIP: __wake_up_common+49]
    RIP: ffffffff8102fc91  RSP: ffff8807d9c1fdd0  RFLAGS: 00010096
    RAX: 5a5a5a5a5a5a5a5a  RBX: ffff88080095b118  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: 0000000000000003  RDI: ffff88080095b118
    RBP: ffff8807d9c1fe00   R8: 5a5a5a5a5a5a5a42   R9: 0000000000000000
    R10: 0000000000000001  R11: 0000000000000001  R12: 0000000000000000
    R13: 0000000000000001  R14: ffff88080095b120  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #3 [ffff8807d9c1fe08] __wake_up at ffffffff81032b68
 #4 [ffff8807d9c1fe48] ll_agl_thread at ffffffffa0795aa2 [lustre]
 #5 [ffff8807d9c1fee8] kthread at ffffffff8106610e
 #6 [ffff8807d9c1ff48] kernel_thread_helper at ffffffff81364034

crash> ll_inode_info ffff8806325a9580 | grep magic
  lli_inode_magic = 287116773,
crash> ll_inode_info ffff8806325a9580 | grep sai
      d_sai = 0x0,
Comment by Ann Koehler (Inactive) [ 23/Apr/14 ]

>> Any idea why statahead may not have gotten started? The clue is probably this error message:
>>
>> (statahead.c:1660:do_statahead_enter()) can't start ll_sa thread, rc: -2080

Finally figured out the cause of this bug. The 64-bit output of kthread_run is being assigned to an int and truncated. In the cases where a valid task pointer, when interpreted as an int, falls between 0 and -4095 it is treated as an error. The code in ll_start_agl handles the return from kthread_run correctly.

Comment by Lai Siyao [ 24/Apr/14 ]

Thank you, I'll fix it in related patch soon.

Comment by Ryan Haasken [ 30/Apr/14 ]

Lai, I have created LU-4985 for the bug about incorrect handling of the kthread_run return value in start_statahead_thread. That bug is also for other calls to kthread_run which are handled incorrectly.

Comment by Lai Siyao [ 04/May/14 ]

Thanks Ryan, and as is pointed out in LU-4985, this has been fixed by LU-3498, which has been landed to master, and waiting to land on 2.5.

Comment by James A Simmons [ 27/Aug/14 ]

Has this work been delayed until 2.8?

Comment by James A Simmons [ 08/Sep/14 ]

Refreshed http://review.whamcloud.com/#/c/9663. Patch is ready for inspection.

Comment by James A Simmons [ 03/Nov/14 ]

Many patches have been landed. Only one patch left at http://review.whamcloud.com/#/c/9667. Please inspect.

Comment by Gerrit Updater [ 19/Nov/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/9667/
Subject: LU-3270 statahead: small fixes and cleanup
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 672ab0e00d6148330561d21ccff9331124914d35

Comment by James A Simmons [ 20/Nov/14 ]

Last patch has been merged. This ticket can be closed.

Comment by Jodi Levi (Inactive) [ 21/Nov/14 ]

Patches landed to Master.

Comment by Gerrit Updater [ 02/Dec/14 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/12901
Subject: LU-3270 statahead: refactor statahead code
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: 2cccaaee80e60ca7cf1fcd701fe7705198099164

Generated at Sat Feb 10 01:32:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.