LU-15000: MDS crashes with (osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed


    Description

      I previously reported some issues when adding OSTs in LU-14695, and I suspect we were also hit by LU-13356, which left the MGS unable to process new OSTs. We recently upgraded all Oak servers to 2.12.7 plus the single patch from LU-13356. I'm opening this ticket in this new context, where the MGS seems to work better so far, but now the issue appears to be on the MDS side.

      We have 300+ OSTs. When adding new OSTs, the first ones succeed, but after 3 or 4 new OSTs, all MDSs crash with the following backtrace (you can see the new OST being initialized on an OSS, followed by an MDS crash):

      Sep 09 07:59:33 oak-io6-s2 kernel: md/raid:md45: raid level 6 active with 10 out of 10 devices, algorithm 2
      Sep 09 07:59:33 oak-io6-s2 kernel: md45: detected capacity change from 0 to 112003075014656
      Sep 09 07:59:33 oak-io6-s2 kernel: LDISKFS-fs (md45): file extents enabled, maximum tree depth=5
      Sep 09 07:59:34 oak-io6-s2 kernel: LDISKFS-fs (md45): mounted filesystem with ordered data mode. Opts: errors=remount-ro
      Sep 09 07:59:34 oak-io6-s2 kernel: LDISKFS-fs (md45): file extents enabled, maximum tree depth=5
      Sep 09 07:59:34 oak-io6-s2 kernel: LDISKFS-fs (md45): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
      Sep 09 07:59:38 oak-io6-s2 kernel: Lustre: oak-OST013d: new disk, initializing
      Sep 09 07:59:38 oak-io6-s2 kernel: Lustre: srv-oak-OST013d: No data found on store. Initialize space
      Sep 09 07:59:38 oak-io6-s2 kernel: Lustre: oak-OST013d: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
      Sep 09 07:59:43 oak-md1-s1 kernel: LustreError: 24251:0:(osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed: 
      Sep 09 07:59:43 oak-md1-s1 kernel: LustreError: 24251:0:(osp_dev.c:1404:osp_obd_connect()) LBUG
      Sep 09 07:59:43 oak-md1-s1 kernel: Pid: 24251, comm: llog_process_th 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
      Sep 09 07:59:43 oak-md1-s1 kernel: Call Trace:
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffc0cac7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffc0cac87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffc1a2beb6>] osp_obd_connect+0x3c6/0x400 [osp]
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffc194b04e>] lod_add_device+0xa8e/0x19a0 [lod]
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffc1946895>] lod_process_config+0x13b5/0x1510 [lod]
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffc0ef18b2>] class_process_config+0x2142/0x2830 [obdclass]
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffc0ef3b79>] class_config_llog_handler+0x819/0x1520 [obdclass]
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffc0eb65af>] llog_process_thread+0x85f/0x1a10 [obdclass]
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffc0eb8174>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffa62c5c21>] kthread+0xd1/0xe0
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffa6994ddd>] ret_from_fork_nospec_begin+0x7/0x21
      Sep 09 07:59:43 oak-md1-s1 kernel:  [<ffffffffffffffff>] 0xffffffffffffffff
      

      Here is another example where we can see llog errors for OSTs that are actually added (they are properly registered on the MGS) before the MDS crashes:

      Sep  9 08:21:17 oak-md1-s1 kernel: Lustre: oak-MDT0001: Client 59a1cc5a-bdc6-102a-1ccb-b551d993bacc (at 10.51.3.32@o2ib3) reconnecting
      Sep  9 08:21:18 oak-md1-s1 kernel: Lustre: oak-MDT0001: Client ea26edda-b4c7-1625-cd1a-c65d4487d437 (at 10.51.3.68@o2ib3) reconnecting
      Sep  9 08:21:18 oak-md1-s1 kernel: Lustre: Skipped 5 previous similar messages
      Sep  9 08:21:20 oak-md1-s1 kernel: Lustre: oak-MDT0001: Client 38d27508-3ea0-383f-8501-005bb0842ebd (at 10.51.4.12@o2ib3) reconnecting
      Sep  9 08:21:20 oak-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
      Sep  9 08:21:35 oak-md1-s1 kernel: Lustre: 7555:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
      Sep  9 08:21:35 oak-md1-s1 kernel: LustreError: 7555:0:(genops.c:556:class_register_device()) oak-OST0144-osc-MDT0001: already exists, won't add
      Sep  9 08:21:35 oak-md1-s1 kernel: LustreError: 7555:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
      Sep  9 08:21:35 oak-md1-s1 kernel: Lustre:    cmd=cf001 0:oak-OST0144-osc-MDT0001  1:osp  2:oak-MDT0001-mdtlov_UUID  
      Sep  9 08:21:35 oak-md1-s1 kernel: LustreError: 3775:0:(mgc_request.c:599:do_requeue()) failed processing log: -17
      Sep  9 08:21:35 oak-md1-s1 kernel: Lustre:    cmd=cf001 0:oak-OST0144-osc-MDT0002  1:osp  2:oak-MDT0002-mdtlov_UUID  
      Sep  9 08:21:44 oak-md1-s1 kernel: LustreError: 7569:0:(obd_config.c:461:class_setup()) Device 635 already setup (type osp)
      Sep  9 08:21:44 oak-md1-s1 kernel: LustreError: 7569:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
      Sep  9 08:21:44 oak-md1-s1 kernel: LustreError: 7569:0:(obd_config.c:1835:class_config_llog_handler()) Skipped 1 previous similar message
      Sep  9 08:21:44 oak-md1-s1 kernel: Lustre:    cmd=cf003 0:oak-OST0144-osc-MDT0001  1:oak-OST0144_UUID  2:10.0.2.103@o2ib5  
      Sep  9 08:21:44 oak-md1-s1 kernel: LustreError: 3775:0:(mgc_request.c:599:do_requeue()) failed processing log: -17
      Sep  9 08:21:44 oak-md1-s1 kernel: LustreError: 3775:0:(mgc_request.c:599:do_requeue()) Skipped 1 previous similar message
      Sep  9 08:21:48 oak-md1-s1 kernel: LustreError: 7571:0:(obd_config.c:461:class_setup()) Device 316 already setup (type osp)
      Sep  9 08:21:48 oak-md1-s1 kernel: LustreError: 7571:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
      Sep  9 08:21:48 oak-md1-s1 kernel: Lustre:    cmd=cf003 0:oak-OST0144-osc-MDT0002  1:oak-OST0144_UUID  2:10.0.2.103@o2ib5  
      Sep  9 08:21:48 oak-md1-s1 kernel: LustreError: 3775:0:(mgc_request.c:599:do_requeue()) failed processing log: -17
      Sep  9 08:21:55 oak-md1-s1 kernel: LustreError: 7574:0:(osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed: 
      Sep  9 08:21:55 oak-md1-s1 kernel: LustreError: 7574:0:(osp_dev.c:1404:osp_obd_connect()) LBUG
      Sep  9 08:21:55 oak-md1-s1 kernel: Pid: 7574, comm: llog_process_th 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
      Sep  9 08:21:56 oak-md1-s1 kernel: Call Trace:
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0b057cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0b0587c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffffc18baeb6>] osp_obd_connect+0x3c6/0x400 [osp]
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffffc17da04e>] lod_add_device+0xa8e/0x19a0 [lod]
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffffc17d5895>] lod_process_config+0x13b5/0x1510 [lod]
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0d4f8b2>] class_process_config+0x2142/0x2830 [obdclass]
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0d51b79>] class_config_llog_handler+0x819/0x1520 [obdclass]
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0d145af>] llog_process_thread+0x85f/0x1a10 [obdclass]
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0d16174>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffff9dcc5c21>] kthread+0xd1/0xe0
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffff9e394ddd>] ret_from_fork_nospec_begin+0x7/0x21
      Sep  9 08:21:56 oak-md1-s1 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
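
      As a side note, even though the MDS crashes, the new OST really is registered on the MGS; this can be double-checked from the MGS node by searching the client configuration log for the new OST, e.g. (same llog_print command as used for the attached dumps below):

      lctl --device MGS llog_print oak-client | grep OST0144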
      

      In the trace below, we can see that the MDS crashes happen at the same time on multiple MDS nodes:

      oak-io6-s2: Sep 09 08:37:59 oak-io6-s2 kernel: md/raid:md53: device dm-494 operational as raid disk 1
      oak-io6-s2: Sep 09 08:37:59 oak-io6-s2 kernel: md/raid:md53: raid level 6 active with 10 out of 10 devices, algorithm 2
      oak-io6-s2: Sep 09 08:37:59 oak-io6-s2 kernel: md53: detected capacity change from 0 to 112003075014656
      oak-io6-s2: Sep 09 08:38:00 oak-io6-s2 kernel: LDISKFS-fs (md53): file extents enabled, maximum tree depth=5
      oak-io6-s2: Sep 09 08:38:00 oak-io6-s2 kernel: LDISKFS-fs (md53): mounted filesystem with ordered data mode. Opts: errors=remount-ro
      oak-io6-s2: Sep 09 08:38:01 oak-io6-s2 kernel: LDISKFS-fs (md53): file extents enabled, maximum tree depth=5
      oak-io6-s2: Sep 09 08:38:01 oak-io6-s2 kernel: LDISKFS-fs (md53): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
      oak-io6-s2: Sep 09 08:38:06 oak-io6-s2 kernel: Lustre: oak-OST0145: new disk, initializing
      oak-io6-s2: Sep 09 08:38:06 oak-io6-s2 kernel: Lustre: srv-oak-OST0145: No data found on store. Initialize space
      oak-io6-s2: Sep 09 08:38:06 oak-io6-s2 kernel: Lustre: oak-OST0145: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: LustreError: 7744:0:(osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed: 
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: LustreError: 7744:0:(osp_dev.c:1404:osp_obd_connect()) LBUG
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: Pid: 7744, comm: llog_process_th 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: LustreError: 5232:0:(osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed: 
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: LustreError: 5232:0:(osp_dev.c:1404:osp_obd_connect()) LBUG
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: Pid: 5232, comm: llog_process_th 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: Call Trace:
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffc097b7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: Call Trace:
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffc097b87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffffc06667cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffffc066687c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffc172deb6>] osp_obd_connect+0x3c6/0x400 [osp]
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffffc1311eb6>] osp_obd_connect+0x3c6/0x400 [osp]
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffffc1bd404e>] lod_add_device+0xa8e/0x19a0 [lod]
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffc164d04e>] lod_add_device+0xa8e/0x19a0 [lod]
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffc1648895>] lod_process_config+0x13b5/0x1510 [lod]
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffffc1bcf895>] lod_process_config+0x13b5/0x1510 [lod]
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffffc13df8b2>] class_process_config+0x2142/0x2830 [obdclass]
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffc0c3a8b2>] class_process_config+0x2142/0x2830 [obdclass]
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffc0c3cb79>] class_config_llog_handler+0x819/0x1520 [obdclass]
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffffc13e1b79>] class_config_llog_handler+0x819/0x1520 [obdclass]
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffffc13a45af>] llog_process_thread+0x85f/0x1a10 [obdclass]
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffc0bff5af>] llog_process_thread+0x85f/0x1a10 [obdclass]
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffc0c01174>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffffc13a6174>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffff9bec5c21>] kthread+0xd1/0xe0
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffbd8c5c21>] kthread+0xd1/0xe0
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffbdf94ddd>] ret_from_fork_nospec_begin+0x7/0x21
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffff9c594ddd>] ret_from_fork_nospec_begin+0x7/0x21
      oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel:  [<ffffffffffffffff>] 0xffffffffffffffff
      oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel:  [<ffffffffffffffff>] 0xffffffffffffffff
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: LustreError: 5224:0:(osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed: 
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: LustreError: 5224:0:(osp_dev.c:1404:osp_obd_connect()) LBUG
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: Pid: 5224, comm: llog_process_th 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: Call Trace:
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffc09f07cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffc09f087c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffc1b38eb6>] osp_obd_connect+0x3c6/0x400 [osp]
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffc1a5804e>] lod_add_device+0xa8e/0x19a0 [lod]
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffc1a53895>] lod_process_config+0x13b5/0x1510 [lod]
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffc13228b2>] class_process_config+0x2142/0x2830 [obdclass]
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffc1324b79>] class_config_llog_handler+0x819/0x1520 [obdclass]
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffc12e75af>] llog_process_thread+0x85f/0x1a10 [obdclass]
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffc12e9174>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffae4c5c21>] kthread+0xd1/0xe0
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffaeb94ddd>] ret_from_fork_nospec_begin+0x7/0x21
      oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel:  [<ffffffffffffffff>] 0xffffffffffffffff
      

      After rebooting the crashed MDSs, the last added OSTs work fine (after MDS recovery, they allocate their "super-sequence" and are then operational). But every time I add more OSTs to the filesystem, all MDSs crash with this backtrace. This looks similar to LU-9699.

      I'm attaching the output of

      lctl --device MGS llog_print <FILE>
      

      as <FILE>.llog.txt, for FILE = oak-client and oak-MDT0000 through oak-MDT0005,
      in case anyone can spot a problem somewhere.
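
      For reference, the dumps were produced with a loop roughly like this, run on the MGS node:

      # dump each configuration llog from the MGS to a text file
      for f in oak-client oak-MDT0000 oak-MDT0001 oak-MDT0002 oak-MDT0003 oak-MDT0004 oak-MDT0005; do
          lctl --device MGS llog_print $f > $f.llog.txt
      done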

      Attachments

        1. llogdiff
          297 kB
        2. oak-client.llog.txt
          187 kB
        3. oak-MDT0000.after-ost331
          426 kB
        4. oak-MDT0000.backup-20210923
          425 kB
        5. oak-MDT0000.llog.txt
          201 kB
        6. oak-MDT0001.llog.txt
          199 kB
        7. oak-MDT0002.llog.txt
          199 kB
        8. oak-MDT0003.llog.txt
          198 kB
        9. oak-MDT0004.llog.txt
          198 kB
        10. oak-MDT0005.llog.txt
          198 kB

          Activity

            sthiell Stephane Thiell added a comment -

            @Peter Jones could you please consider this patch for inclusion in LTS releases (at least for 2.15)?

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47515
            Subject: LU-15000 llog: read canceled records in llog_backup
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 57b085cb1068c1a7a7a35376ccf1db10f489c6a9

            gerrit Gerrit Updater added a comment - "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47515 Subject: LU-15000 llog: read canceled records in llog_backup Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: 57b085cb1068c1a7a7a35376ccf1db10f489c6a9
            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46552/
            Subject: LU-15000 llog: read canceled records in llog_backup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d8e2723b4e9409954846939026c599b0b1170e6e

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46552/ Subject: LU-15000 llog: read canceled records in llog_backup Project: fs/lustre-release Branch: master Current Patch Set: Commit: d8e2723b4e9409954846939026c599b0b1170e6e
            eaujames Etienne Aujames added a comment - - edited

            Hi Stephane,
            That's great news.
            I will add some reviewers to land the patch.


            sthiell Stephane Thiell added a comment -

            Hi Etienne,
            Today, we applied your patch on the MGS/MDSs (on top of 2.12.7) and we were then able to add 16 OSTs without issue, so your patch worked for us. This will also allow us to consider using llog_cancel/del_ost again in the future.
            Thank you!

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46552
            Subject: LU-15000 llog: read canceled records in llog_backup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0c5471748b18d40893309aad3943b8aed5e91f1a

            gerrit Gerrit Updater added a comment - "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46552 Subject: LU-15000 llog: read canceled records in llog_backup Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 0c5471748b18d40893309aad3943b8aed5e91f1a

            eaujames Etienne Aujames added a comment -

            @Stephane Thiell the issue with correcting the MGS configs is that the target copies have last indexes lower than the original ones. Clients/targets are only notified of "new" records while mounted, so they will ignore indexes between the original and the copy.

            One solution could be to artificially generate records after correcting the MGS (with "lctl conf_param <param>" and "lctl conf_param -d <param>") until the corrected configs' index reaches the old configs' last index. Then clients/targets should take the new records into account (e.g. adding new OSTs).
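
            For illustration, a rough sketch of that idea (assuming "lctl conf_param"/"lctl conf_param -d" are available on this MGS and that sys.timeout is an acceptable throwaway parameter; this is not a vetted procedure):

            # run on the MGS: append records to the config llogs by setting and
            # then deleting a harmless permanent parameter, repeating until the
            # last index catches up with the old one
            lctl conf_param oak.sys.timeout=100
            lctl conf_param -d oak.sys.timeout
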
            asmadeus Dominique Martinet added a comment - - edited

            Hi again, sorry for the spam. Talking with Stéphane gave me another idea: if we can't copy the offsets reliably, we can just get rid of them.

            In mgc_llog_local_copy() the MDS first makes a local copy with llog_backup(env, obd, lctxt, lctxt, logname, temp_log); (note it uses lctxt twice, which is LLOG_CONFIG_ORIG_CTXT).

            This process is identical and will get rid of any cancelled record, so if we just do the same on the MGS for any log that has holes when it starts, we won't need to worry about these anymore.

            It's not perfect either, though: we still have the problem while the MGS is running, immediately after cancel events, so I guess we need a better time to do that... asynchronously, a while after any cancel event? Hmm. Well, just a wild idea in case we can't get the copy right. I still intend to look further this weekend.


            asmadeus Dominique Martinet added a comment -

            Hi Etienne, thanks for the patch!

            That's pretty much what I had in mind, yes. This came back to haunt me a bit while I slept, and it's not perfect because of padding: we can't control when each side will add its padding (because of length differences), so the last index written might be 1 too big and we might now have the opposite problem (a bigger last index means we'll skip one record we should have processed).

            So I've been thinking it's safer to keep the holes and just append small dummy records at the end until we get the right last index, to fake it up. It's safer because the dummy records, being small, reduce the need to add padding, but unfortunately it's still not perfect: if padding is required at the last dummy record appended, we will overrun the last index again.

            I can't think of any safe copy mechanism that preserves indices right now, as we don't know the size of the skipped records nor the offset within the log of the entry immediately after a hole, and unless we can reproduce that we won't be able to match the automatic padding. I plan to look further this way this weekend.

            Stéphane, this is not a problem: clients don't have a copy of the MDTs' local config logs, so we can just restart the MDTs and they'll get a new copy, which will fix all our problems. If we want to play it safe we can also backport the tunefs --nolocallogs patch, but even without it the servers will try to get a new copy and throw away the old one if the copy worked, so it should work as far as I understand. That won't impact clients more than a normal failover, so if we can get this right it is a good solution.
            sthiell Stephane Thiell added a comment - - edited

            Hi Etienne,
            Thank you!
            Re: your last comment, we were thinking of doing that (copying each MDT's local backup to the MGS), which could fix the immediate issue on the MDTs when adding new OSTs, but wouldn't that mess up mounted clients in that case, especially after we add more OSTs? Are clients' configs based on indexes? Our main goal is to avoid re-mounting all clients while still being able to add new OSTs, which is tricky because of this index problem.


            People

              Assignee: tappro Mikhail Pershin
              Reporter: sthiell Stephane Thiell
              Votes: 0
              Watchers: 8
