Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11961

nodemap_config_set_active()) ASSERTION( config->nmc_default_nodemap ) failed

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.14.0
    • Lustre 2.13.0
    • None
    • 3
    • 9223372036854775807

    Description

      Looks like LU-8824 came back, this was reported in latest master testing:

      [ 4228.011703] Lustre: DEBUG MARKER: == sanity-sec test 26: test transferring very large nodemap ========================================== 06:23:44 (1549952624)
      [ 4271.913653] LustreError: 11659:0:(nodemap_handler.c:1663:nodemap_config_set_active()) ASSERTION( config->nmc_default_nodemap ) failed: 
      [ 4271.915235] LustreError: 11659:0:(nodemap_handler.c:1663:nodemap_config_set_active()) LBUG
      [ 4271.916117] Pid: 11659, comm: ll_cfg_requeue 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Wed Feb 6 06:53:01 UTC 2019
      [ 4271.917149] Call Trace:
      [ 4271.917451]  [<ffffffffc09417cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 4271.918283]  [<ffffffffc094187c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 4271.918961]  [<ffffffffc0e0a6f7>] nodemap_config_set_active+0x2a7/0x2e0 [ptlrpc]
      [ 4271.920041]  [<ffffffffc0e13498>] nodemap_config_set_active_mgc+0x38/0x1e0 [ptlrpc]
      [ 4271.920969]  [<ffffffffc0cf2e12>] mgc_process_recover_nodemap_log+0x352/0xe40 [mgc]
      [ 4271.921830]  [<ffffffffc0cf588e>] mgc_process_log+0x7be/0x8f0 [mgc]
      [ 4271.922527]  [<ffffffffc0cf7938>] mgc_requeue_thread+0x2c8/0x870 [mgc]
      [ 4271.923365]  [<ffffffffa22c1c31>] kthread+0xd1/0xe0
      [ 4271.923935]  [<ffffffffa2974c37>] ret_from_fork_nospec_end+0x0/0x39
      [ 4271.924670]  [<ffffffffffffffff>] 0xffffffffffffffff
      

      https://testing.whamcloud.com/test_sessions/a68fa72d-794e-45bb-ac5b-114917878f9c

      Attachments

        Issue Links

          Activity

            [LU-11961] nodemap_config_set_active()) ASSERTION( config->nmc_default_nodemap ) failed
            pjones Peter Jones added a comment -

            Landed for 2.14

            pjones Peter Jones added a comment - Landed for 2.14

            I am not sure it is the same issue, but it would need to have a proper dump in order to investigate the origin of the NULL pointer with the crash utility.
            Does Maloo save the crash file somewhere?

            sebastien Sebastien Buisson added a comment - I am not sure it is the same issue, but it would need to have a proper dump in order to investigate the origin of the NULL pointer with the crash utility. Does Maloo save the crash file somewhere?

            I don't see the assertion, but is the crash at https://testing.whamcloud.com/test_sets/caac4a91-55b0-45ab-af99-356db88af611 the same or related to this ticket?

            On the OSS,

            [74388.707838] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
            [74388.709261] PGD 0 P4D 0 
            [74388.709696] Oops: 0000 [#1] SMP PTI
            [74388.710293] CPU: 0 PID: 7056 Comm: ll_cfg_requeue Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-240.1.1.el8_lustre.x86_64 #1
            [74388.712339] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [74388.713761] RIP: 0010:nodemap_config_set_active_mgc+0x92/0x210 [ptlrpc]
            [74388.714844] Code: 00 48 8d 58 f0 48 3d 40 ac 22 c1 0f 84 9f 00 00 00 49 c7 c7 a0 2f 25 c1 49 c7 c6 c8 b8 1d c1 49 c7 c5 30 17 18 c1 48 8b 73 08 <4c> 8b 66 08 4d 85 e4 74 3a 49 8b 54 24 08 f6 02 02 75 30 48 c7 c7
            [74388.717857] RSP: 0018:ffffa43f40b8fd10 EFLAGS: 00010287
            [74388.718712] RAX: ffff8e2db75694d0 RBX: ffff8e2db75694c0 RCX: 0000000000000000
            [74388.719871] RDX: ffff8e2d8ed10000 RSI: 0000000000000000 RDI: ffffffffc122ac20
            [74388.721023] RBP: ffff8e2dac4c1500 R08: ffff8e2dbda2f160 R09: ffff8e2d4fc02a00
            [74388.722184] R10: ffff8e2d88aa3000 R11: 0000000000000001 R12: ffff8e2d77ef4800
            [74388.723342] R13: ffffffffc1181730 R14: ffffffffc11db8c8 R15: ffffffffc1252fa0
            [74388.724496] FS:  0000000000000000(0000) GS:ffff8e2dbda00000(0000) knlGS:0000000000000000
            [74388.725804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [74388.726748] CR2: 0000000000000008 CR3: 0000000078c0a006 CR4: 00000000000606f0
            [74388.727905] Call Trace:
            [74388.728378]  mgc_process_recover_nodemap_log+0xe14/0xeb0 [mgc]
            [74388.729344]  mgc_process_log+0x7a7/0x840 [mgc]
            [74388.730088]  mgc_requeue_thread+0x29a/0x720 [mgc]
            [74388.730896]  ? finish_wait+0x80/0x80
            [74388.731494]  ? mgc_process_config+0xe10/0xe10 [mgc]
            [74388.732314]  kthread+0x112/0x130
            [74388.732869]  ? kthread_flush_work_fn+0x10/0x10
            [74388.733627]  ret_from_fork+0x35/0x40
            
            jamesanunez James Nunez (Inactive) added a comment - I don't see the assertion, but is the crash at https://testing.whamcloud.com/test_sets/caac4a91-55b0-45ab-af99-356db88af611 the same or related to this ticket? On the OSS, [74388.707838] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [74388.709261] PGD 0 P4D 0 [74388.709696] Oops: 0000 [#1] SMP PTI [74388.710293] CPU: 0 PID: 7056 Comm: ll_cfg_requeue Kdump: loaded Tainted: G OE --------- - - 4.18.0-240.1.1.el8_lustre.x86_64 #1 [74388.712339] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [74388.713761] RIP: 0010:nodemap_config_set_active_mgc+0x92/0x210 [ptlrpc] [74388.714844] Code: 00 48 8d 58 f0 48 3d 40 ac 22 c1 0f 84 9f 00 00 00 49 c7 c7 a0 2f 25 c1 49 c7 c6 c8 b8 1d c1 49 c7 c5 30 17 18 c1 48 8b 73 08 <4c> 8b 66 08 4d 85 e4 74 3a 49 8b 54 24 08 f6 02 02 75 30 48 c7 c7 [74388.717857] RSP: 0018:ffffa43f40b8fd10 EFLAGS: 00010287 [74388.718712] RAX: ffff8e2db75694d0 RBX: ffff8e2db75694c0 RCX: 0000000000000000 [74388.719871] RDX: ffff8e2d8ed10000 RSI: 0000000000000000 RDI: ffffffffc122ac20 [74388.721023] RBP: ffff8e2dac4c1500 R08: ffff8e2dbda2f160 R09: ffff8e2d4fc02a00 [74388.722184] R10: ffff8e2d88aa3000 R11: 0000000000000001 R12: ffff8e2d77ef4800 [74388.723342] R13: ffffffffc1181730 R14: ffffffffc11db8c8 R15: ffffffffc1252fa0 [74388.724496] FS: 0000000000000000(0000) GS:ffff8e2dbda00000(0000) knlGS:0000000000000000 [74388.725804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [74388.726748] CR2: 0000000000000008 CR3: 0000000078c0a006 CR4: 00000000000606f0 [74388.727905] Call Trace: [74388.728378] mgc_process_recover_nodemap_log+0xe14/0xeb0 [mgc] [74388.729344] mgc_process_log+0x7a7/0x840 [mgc] [74388.730088] mgc_requeue_thread+0x29a/0x720 [mgc] [74388.730896] ? finish_wait+0x80/0x80 [74388.731494] ? mgc_process_config+0xe10/0xe10 [mgc] [74388.732314] kthread+0x112/0x130 [74388.732869] ? kthread_flush_work_fn+0x10/0x10 [74388.733627] ret_from_fork+0x35/0x40

            might be related?

            Lustre: DEBUG MARKER: == sanity test 60c: unlink file when mds full ======================================================== 22:23:16 (1586020996)
            BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
            PGD 27f329067 P4D 27f329067 PUD 231175067 PMD 0 
            Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
            CPU: 1 PID: 9139 Comm: ll_cfg_requeue Tainted: P        W  O   X --------- ---  4.18.0 #4
            Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
            RIP: 0010:nodemap_config_set_active_mgc+0x77/0x1d0 [ptlrpc]
            Code: e8 4e 5d 10 e1 48 8b 05 c7 4f 0b 00 48 8d 58 f0 48 3d 60 ea 69 a0 0f 84 92 00 00 00 49 bc 01 00 00 00 47 04 00 00 48 8b 73 08 <48> 8b 6e 08 48 85 ed 74 39 48 8b 45 08 f6 00 02 75 30 48 c7 c7 a0
            RSP: 0018:ffff880288eabd80 EFLAGS: 00010287
            RAX: ffff88027c4c0c50 RBX: ffff88027c4c0c40 RCX: 0000000000000001
            RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000282
            RBP: ffff880254872000 R08: 0000000000000000 R09: ffff880258d3e600
            R10: 0000000000000000 R11: 0000000000000000 R12: 0000044700000001
            R13: ffff88026b8b4588 R14: ffff8802a6122ec0 R15: ffff8802a6122e00
            FS:  0000000000000000(0000) GS:ffff8802b2200000(0000) knlGS:0000000000000000
            CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            CR2: 0000000000000008 CR3: 000000024dd81000 CR4: 00000000000006a0
            Call Trace:
             mgc_process_log+0x266c/0x26c0 [mgc]
             mgc_requeue_thread+0x2ad/0x720 [mgc]
             ? wait_woken+0x90/0x90
             kthread+0x105/0x140
             ? mgc_process_config+0x17c0/0x17c0 [mgc]
             ? kthread_flush_work_fn+0x10/0x10
             ret_from_fork+0x24/0x30
            
            bzzz Alex Zhuravlev added a comment - might be related? Lustre: DEBUG MARKER: == sanity test 60c: unlink file when mds full ======================================================== 22:23:16 (1586020996) BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 PGD 27f329067 P4D 27f329067 PUD 231175067 PMD 0 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 9139 Comm: ll_cfg_requeue Tainted: P W O X --------- --- 4.18.0 #4 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 RIP: 0010:nodemap_config_set_active_mgc+0x77/0x1d0 [ptlrpc] Code: e8 4e 5d 10 e1 48 8b 05 c7 4f 0b 00 48 8d 58 f0 48 3d 60 ea 69 a0 0f 84 92 00 00 00 49 bc 01 00 00 00 47 04 00 00 48 8b 73 08 <48> 8b 6e 08 48 85 ed 74 39 48 8b 45 08 f6 00 02 75 30 48 c7 c7 a0 RSP: 0018:ffff880288eabd80 EFLAGS: 00010287 RAX: ffff88027c4c0c50 RBX: ffff88027c4c0c40 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000282 RBP: ffff880254872000 R08: 0000000000000000 R09: ffff880258d3e600 R10: 0000000000000000 R11: 0000000000000000 R12: 0000044700000001 R13: ffff88026b8b4588 R14: ffff8802a6122ec0 R15: ffff8802a6122e00 FS: 0000000000000000(0000) GS:ffff8802b2200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000024dd81000 CR4: 00000000000006a0 Call Trace: mgc_process_log+0x266c/0x26c0 [mgc] mgc_requeue_thread+0x2ad/0x720 [mgc] ? wait_woken+0x90/0x90 kthread+0x105/0x140 ? mgc_process_config+0x17c0/0x17c0 [mgc] ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x24/0x30

            The patch that landed might not be enough to fix the issue reported here. But it has been a while since I have seen this assertion, so I am not sure if it is still occurring on master.

            sebastien Sebastien Buisson added a comment - The patch that landed might not be enough to fix the issue reported here. But it has been a while since I have seen this assertion, so I am not sure if it is still occurring on master.
            pjones Peter Jones added a comment -

            Is work still ongoing under this ticket?

            pjones Peter Jones added a comment - Is work still ongoing under this ticket?

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34245/
            Subject: LU-11961 nodemap: nodemap_create() handles default nodemap
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ebea7c87f758d843cf71fb5873b2955f6fbf9044

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34245/ Subject: LU-11961 nodemap: nodemap_create() handles default nodemap Project: fs/lustre-release Branch: master Current Patch Set: Commit: ebea7c87f758d843cf71fb5873b2955f6fbf9044

            I don't think it's related but shouldn't the LASSERT(active_config != config) in nodemap_config_set_active come after the lock?

            jfilizetti Jeremy Filizetti added a comment - I don't think it's related but shouldn't the LASSERT(active_config != config) in nodemap_config_set_active come after the lock?

            Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/34245
            Subject: LU-11961 nodemap: nodemap_create() handles default nodemap
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6d8057712ecac178f7edfebcd838faa8e44b86d4

            gerrit Gerrit Updater added a comment - Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/34245 Subject: LU-11961 nodemap: nodemap_create() handles default nodemap Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 6d8057712ecac178f7edfebcd838faa8e44b86d4

            I have noticed that nmc_default_nodemap is modified outside of nodemap_create(). Not sure that it is causing the problem reported in this ticket, but it should be fixed anyway.

            sebastien Sebastien Buisson added a comment - I have noticed that nmc_default_nodemap is modified outside of nodemap_create(). Not sure that it is causing the problem reported in this ticket, but it should be fixed anyway.

            People

              sebastien Sebastien Buisson
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: