[LU-11969] sanity-lfsck test_36c: (5) MDS3 is not the expected 'completed' Created: 14/Feb/19  Updated: 10/Apr/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/52b653d4-3018-11e9-bd83-52540065bddc

test_36c failed with the following error on the MDS1 log:

lfsck           S ffff8a82a228d140     0  8776      2 0x00000080
Call Trace:
schedule+0x29/0x70
lfsck_double_scan_generic+0x22e/0x2c0 [lfsck]
lfsck_layout_master_double_scan+0x30/0x1e0 [lfsck]
lfsck_double_scan+0x5f/0x210 [lfsck]
lfsck_master_engine+0x4c6/0x1370 [lfsck]
kthread+0xd1/0xe0

umount          S ffff8a829b391040     0 13270  13269 0x00000080
Call Trace:
schedule+0x29/0x70
lfsck_stop+0x485/0x5c0 [lfsck]
mdd_iocontrol+0x327/0xb40 [mdd]
mdt_device_fini+0x75/0x930 [mdt]
class_cleanup+0x862/0xbd0 [obdclass]
class_process_config+0x65c/0x2830 [obdclass]
class_manual_cleanup+0x1c6/0x710 [obdclass]
server_put_super+0x8de/0xcd0 [obdclass]
generic_shutdown_super+0x6d/0x100
kill_anon_super+0x12/0x20
lustre_kill_super+0x32/0x50 [obdclass]
deactivate_locked_super+0x4e/0x70
deactivate_super+0x46/0x60
cleanup_mnt+0x3f/0x80
__cleanup_mnt+0x12/0x20
task_work_run+0xbb/0xe0
do_notify_resume+0xa5/0xc0

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-lfsck test_36c - (5) MDS3 is not the expected 'completed'



 Comments   
Comment by Patrick Farrell (Inactive) [ 14/Feb/19 ]

Master:

https://testing.whamcloud.com/test_sessions/b9c7607a-51ca-4bc2-89bb-accfb41935ef

Comment by Patrick Farrell (Inactive) [ 14/Feb/19 ]

For whenever an lfsck-familiar person takes a crack at this, here are a few notes.

This looks to be a race condition, but I'm not sure at all how to fix it.

Here are the log snippets:

MDT0002 master is 8467

00100000:10000000:0.0:1550117810.581394:0:8467:0:(lfsck_lib.c:2584:lfsck_post_generic()) lustre-MDT0002-osd: waiting for assistant to do lfsck_layout post, rc = 1
00100000:10000000:0.0:1550117810.581416:0:8467:0:(lfsck_lib.c:2614:lfsck_double_scan_generic()) lustre-MDT0002-osd: waiting for assistant to do lfsck_layout double_scan, status 2
00100000:10000000:0.0:1550117810.581420:0:8469:0:(lfsck_engine.c:1665:lfsck_assistant_engine()) lustre-MDT0002-osd: lfsck_layout LFSCK assistant thread post

8469 is the child thread...
00100000:10000000:0.0:1550117810.584905:0:8469:0:(lfsck_engine.c:1684:lfsck_assistant_engine()) lustre-MDT0002-osd: LFSCK assistant notified others for lfsck_layout post: rc = 0
00100000:10000000:0.0:1550117810.584914:0:8469:0:(lfsck_engine.c:1702:lfsck_assistant_engine()) lustre-MDT0002-osd: LFSCK assistant sync before the second-stage scaning

8467 again:
00100000:10000000:0.0:1550117810.585025:0:8467:0:(lfsck_lib.c:2624:lfsck_double_scan_generic()) lustre-MDT0002-osd: the assistant has done lfsck_layout double_scan, status 0
And this completes normally.

MDT0000, 8464 is master:
00100000:10000000:0.0:1550117810.584824:0:8464:0:(lfsck_lib.c:2584:lfsck_post_generic()) lustre-MDT0000-osd: waiting for assistant to do lfsck_layout post, rc = 1
00100000:10000000:0.0:1550117810.584831:0:8464:0:(lfsck_lib.c:2596:lfsck_post_generic()) lustre-MDT0000-osd: the assistant has done lfsck_layout post, rc = 1
00100000:10000000:0.0:1550117810.584841:0:8464:0:(lfsck_layout.c:5881:lfsck_layout_master_post()) lustre-MDT0000-osd: layout LFSCK master post done: rc = 0
Assistant is 8466:
00100000:10000000:1.0:1550117810.584842:0:8466:0:(lfsck_engine.c:1665:lfsck_assistant_engine()) lustre-MDT0000-osd: lfsck_layout LFSCK assistant thread post
00100000:10000000:0.0:1550117810.584843:0:8464:0:(lfsck_lib.c:2614:lfsck_double_scan_generic()) lustre-MDT0000-osd: waiting for assistant to do lfsck_layout double_scan, status 2

This thread (8464, the master thread here) doesn't wake up again.

Notice that the "assistant thread post" message appears *before* the master thread logs that it is waiting for the lfsck_layout double_scan...

Various things happen, then eventually:
00100000:10000000:1.0:1550117810.585975:0:8466:0:(lfsck_engine.c:1684:lfsck_assistant_engine()) lustre-MDT0000-osd: LFSCK assistant notified others for lfsck_layout post: rc = 0

The MDT0000 assistant thread (8466) never logs the equivalent of this message (shown here as logged by the MDT0002 assistant):
00100000:10000000:0.0:1550117810.584914:0:8469:0:(lfsck_engine.c:1702:lfsck_assistant_engine()) lustre-MDT0002-osd: LFSCK assistant sync before the second-stage scaning

And the master thread (8464) never wakes up.

The condition for doing "second-stage scanning" (in the assistant) is:

                if (lad->lad_to_double_scan) {
                        lad->lad_to_double_scan = 0;
                        atomic_inc(&lfsck->li_double_scan_count); 

Which is set in the master thread:

lfsck_double_scan_generic:
        if (status != LS_SCANNING_PHASE2)
                lad->lad_exit = 1;
        else
                lad->lad_to_double_scan = 1;        

        CDEBUG(D_LFSCK, "%s: waiting for assistant to do %s double_scan, "
               "status %d\n",
               lfsck_lfsck2name(com->lc_lfsck), lad->lad_name, status);

        wake_up_all(&athread->t_ctl_waitq);
        l_wait_event(mthread->t_ctl_waitq,
                     lad->lad_in_double_scan ||
                     thread_is_stopped(athread),
                     &lwi); 

 

Basically, it looks like on MDT0000 the assistant thread completed the first-phase scanning before the master thread had set this variable and gone to sleep, and the assistant thread never enters the second-stage scan path that would wake the master thread, because it has already passed that point.
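To make the suspected interleaving easier to follow, here is a minimal pthread sketch of that check-once / lost-wakeup pattern. It is an illustration only, not the Lustre code: the names to_double_scan / in_double_scan / exit_requested loosely mirror the lfsck_assistant_data flags, the condition variables stand in for the mthread/athread t_ctl_waitq waitqueues, and the thread_is_stopped() fallback and l_wait_event() handling are omitted.

        /*
         * Illustration only: a tiny pthread model of the suspected race.
         * Names mirror the lfsck flags but this is NOT the Lustre code.
         */
        #include <pthread.h>
        #include <unistd.h>

        static pthread_mutex_t lock         = PTHREAD_MUTEX_INITIALIZER;
        static pthread_cond_t  master_wq    = PTHREAD_COND_INITIALIZER; /* mthread->t_ctl_waitq */
        static pthread_cond_t  assistant_wq = PTHREAD_COND_INITIALIZER; /* athread->t_ctl_waitq */
        static int to_double_scan;      /* set by master, consumed by assistant */
        static int in_double_scan;      /* set by assistant, awaited by master  */
        static int exit_requested;      /* would be set on the stop/error paths */

        /* Assistant: after phase 1 it checks to_double_scan exactly once.  If
         * that check runs before the master sets the flag, the assistant moves
         * on to a later wait that no longer looks at to_double_scan at all. */
        static void *assistant_engine(void *arg)
        {
                pthread_mutex_lock(&lock);
                if (to_double_scan) {                   /* the one-shot check */
                        to_double_scan = 0;
                        in_double_scan = 1;
                        pthread_cond_broadcast(&master_wq);
                }
                while (!exit_requested)                 /* flag not re-checked here */
                        pthread_cond_wait(&assistant_wq, &lock);
                pthread_mutex_unlock(&lock);
                return NULL;
        }

        /* Master: sets the flag, wakes the assistant queue, then sleeps until
         * in_double_scan is set -- which now never happens. */
        static void *master_engine(void *arg)
        {
                sleep(1);                               /* lose the race on purpose */
                pthread_mutex_lock(&lock);
                to_double_scan = 1;
                pthread_cond_broadcast(&assistant_wq);  /* wake_up_all(&athread->t_ctl_waitq) */
                while (!in_double_scan)                 /* l_wait_event(mthread->t_ctl_waitq, ...) */
                        pthread_cond_wait(&master_wq, &lock);
                pthread_mutex_unlock(&lock);
                return NULL;
        }

        int main(void)
        {
                pthread_t a, m;

                pthread_create(&a, NULL, assistant_engine, NULL);
                pthread_create(&m, NULL, master_engine, NULL);
                pthread_join(m, NULL);  /* never returns: both threads stay asleep */
                pthread_join(a, NULL);
                return 0;
        }

Built with "gcc -pthread", this hangs by design: the assistant is woken by the master's broadcast, but its wait condition no longer involves to_double_scan, so it goes back to sleep, and the master then waits forever for in_double_scan. That matches the pair of sleeping lfsck and lfsck_layout threads in the console dumps. Any eventual fix presumably needs to either serialize the flag hand-off or have the assistant re-check lad_to_double_scan after being woken; the sketch is only meant to show why the current ordering can wedge both threads.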

Comment by Bruno Faccini (Inactive) [ 10/Apr/19 ]

I think I have triggered the same issue during an autotest session at https://testing.whamcloud.com/test_sessions/3955c094-f818-44fb-877f-de6d0d9fd73a . This time it happened during sanity-lfsck/test_20a.

I haven't gone through the debug log to check that it is the same scenario Patrick has already detailed above, but the following stacks, found dumped afterwards on the MDS console, clearly indicate a very similar situation:

[13917.818020] lfsck           S ffff988690c00000     0  4111      2 0x00000080
[13917.819302] Call Trace:
[13917.819740]  [<ffffffffa9b67c49>] schedule+0x29/0x70
[13917.820677]  [<ffffffffc121766e>] lfsck_double_scan_generic+0x22e/0x2c0 [lfsck]
[13917.822075]  [<ffffffffa94d67b0>] ? wake_up_state+0x20/0x20
[13917.823103]  [<ffffffffc123ec90>] lfsck_layout_master_double_scan+0x30/0x1e0 [lfsck]
[13917.824458]  [<ffffffffc1217f0f>] lfsck_double_scan+0x5f/0x210 [lfsck]
[13917.825637]  [<ffffffffc0cb0631>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
[13917.826884]  [<ffffffffc121cee6>] lfsck_master_engine+0x4c6/0x1370 [lfsck]
[13917.828107]  [<ffffffffa94d67b0>] ? wake_up_state+0x20/0x20
[13917.829110]  [<ffffffffc121ca20>] ? lfsck_master_oit_engine+0x1510/0x1510 [lfsck]
[13917.830420]  [<ffffffffa94c1c31>] kthread+0xd1/0xe0
[13917.831359]  [<ffffffffa94c1b60>] ? insert_kthread_work+0x40/0x40
[13917.832429]  [<ffffffffa9b74c37>] ret_from_fork_nospec_begin+0x21/0x21
[13917.833567]  [<ffffffffa94c1b60>] ? insert_kthread_work+0x40/0x40
[13917.834656] lfsck_layout    S ffff988690c030c0     0  4117      2 0x00000080
[13917.835969] Call Trace:
[13917.836415]  [<ffffffffc0b93f17>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[13917.837620]  [<ffffffffa9b67c49>] schedule+0x29/0x70
[13917.838516]  [<ffffffffc121eedd>] lfsck_assistant_engine+0x114d/0x2090 [lfsck]
[13917.839784]  [<ffffffffa94e0eee>] ? dequeue_task_fair+0x41e/0x660
[13917.840872]  [<ffffffffa942a59e>] ? __switch_to+0xce/0x580
[13917.841898]  [<ffffffffa94d67b0>] ? wake_up_state+0x20/0x20
[13917.842895]  [<ffffffffc121dd90>] ? lfsck_master_engine+0x1370/0x1370 [lfsck]
[13917.844140]  [<ffffffffa94c1c31>] kthread+0xd1/0xe0
[13917.845022]  [<ffffffffa94c1b60>] ? insert_kthread_work+0x40/0x40
[13917.846099]  [<ffffffffa9b74c37>] ret_from_fork_nospec_begin+0x21/0x21
[13917.847282]  [<ffffffffa94c1b60>] ? insert_kthread_work+0x40/0x40

..........................

[13917.925158] umount          S ffff9886a3696180     0 26668  26667 0x00000080
[13917.926436] Call Trace:
[13917.926880]  [<ffffffffa9b67c49>] schedule+0x29/0x70
[13917.927743]  [<ffffffffc1213925>] lfsck_stop+0x485/0x5c0 [lfsck]
[13917.928894]  [<ffffffffa94d67b0>] ? wake_up_state+0x20/0x20
[13917.929952]  [<ffffffffc146b7c7>] mdd_iocontrol+0x327/0xb40 [mdd]
[13917.931049]  [<ffffffffc0cb0509>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[13917.932411]  [<ffffffffc12e2d15>] mdt_device_fini+0x75/0x930 [mdt]
[13917.933540]  [<ffffffffc0ccfc73>] ? lu_context_init+0xd3/0x1f0 [obdclass]
[13917.934807]  [<ffffffffc0cbd542>] class_cleanup+0x862/0xbd0 [obdclass]
[13917.935987]  [<ffffffffc0cbe53c>] class_process_config+0x65c/0x2830 [obdclass]
[13917.937251]  [<ffffffffc0b93f17>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[13917.938544]  [<ffffffffc0cc08d6>] class_manual_cleanup+0x1c6/0x720 [obdclass]
[13917.939862]  [<ffffffffc0cf0c8e>] server_put_super+0x8de/0xcd0 [obdclass]
[13917.941058]  [<ffffffffa9643dbd>] generic_shutdown_super+0x6d/0x100
[13917.942172]  [<ffffffffa96441b2>] kill_anon_super+0x12/0x20
[13917.943187]  [<ffffffffc0cc34f2>] lustre_kill_super+0x32/0x50 [obdclass]
[13917.944362]  [<ffffffffa964456e>] deactivate_locked_super+0x4e/0x70
[13917.945473]  [<ffffffffa9644cf6>] deactivate_super+0x46/0x60
[13917.946538]  [<ffffffffa966327f>] cleanup_mnt+0x3f/0x80
[13917.947481]  [<ffffffffa9663312>] __cleanup_mnt+0x12/0x20
[13917.948446]  [<ffffffffa94be79b>] task_work_run+0xbb/0xe0
[13917.949416]  [<ffffffffa942bc65>] do_notify_resume+0xa5/0xc0
[13917.950440]  [<ffffffffa9b75124>] int_signal+0x12/0x17