[LU-11761] blocked MDT mount and high CPU usage from lodXXXX_recYYYY threads

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Affects Version/s: Lustre 2.12.0
    • Labels: None
    • Environment: CentOS 7.6, kernel 3.10.0-957.1.3.el7_lustre.x86_64, Lustre 2.12.0 RC2
    • Severity: 3

    Description

      Another issue hit while testing 2.12.0 RC2... the MDT mounts never seem to complete and the following threads take 100% CPU:

      PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
       20953 root 20 0 0 0 0 R 100.0 0.0 27:00.33 lod0002_rec0001
       20954 root 20 0 0 0 0 R 100.0 0.0 27:00.34 lod0002_rec0003
      

      This is on fir-md1-s1, which handles MDT0 and MDT2 on this test system.

      sysrq-t shows:

      Dec 11 09:50:13 fir-md1-s1 kernel: lod0002_rec0001 R  running task        0 20953      2 0x00000080
      Dec 11 09:50:13 fir-md1-s1 kernel: Call Trace:
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0d3cc5d>] ? keys_fini+0x2d/0x1d0 [obdclass]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0d3ce2b>] lu_context_fini+0x2b/0xa0 [obdclass]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0d3d0da>] lu_env_init+0x1a/0x30 [obdclass]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0f19b68>] ptlrpc_set_wait+0x7d8/0x8d0 [ptlrpc]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0d515e5>] ? lustre_get_jobid+0x185/0x2e0 [obdclass]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0d09f3c>] ? obd_get_request_slot+0x3c/0x280 [obdclass]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0f19ce3>] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc088e334>] fld_client_rpc+0x104/0x540 [fld]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0892f5f>] fld_server_lookup+0x15f/0x320 [fld]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc1684587>] lod_fld_lookup+0x327/0x510 [lod]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc16997dd>] lod_object_init+0x7d/0x3c0 [lod]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0d3dfd5>] lu_object_alloc+0xe5/0x320 [obdclass]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0d3e2e6>] lu_object_find_at+0x76/0x280 [obdclass]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0d3f78d>] dt_locate_at+0x1d/0xb0 [obdclass]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0d02b4c>] llog_osd_open+0xfc/0xf30 [obdclass]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0d3e789>] ? lu_object_put+0x279/0x3d0 [obdclass]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc0ceff20>] llog_open+0x140/0x3d0 [obdclass]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc16bdeed>] lod_sub_prep_llog+0x14d/0x783 [lod]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc16837ab>] lod_sub_recovery_thread+0x1cb/0xc80 [lod]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffc16835e0>] ? lod_obd_get_info+0x9d0/0x9d0 [lod]
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffa7ac1c31>] kthread+0xd1/0xe0
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffa7ac1b60>] ? insert_kthread_work+0x40/0x40
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffa8174c24>] ret_from_fork_nospec_begin+0xe/0x21
      Dec 11 09:50:13 fir-md1-s1 kernel:  [<ffffffffa7ac1b60>] ? insert_kthread_work+0x40/0x40
      

      and

      Dec 11 09:44:24 fir-md1-s1 kernel: lod0002_rec0003 R  running task        0 20954      2 0x00000080
      Dec 11 09:44:24 fir-md1-s1 kernel: Call Trace:
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0d3cfa3>] ? lu_context_init+0xd3/0x1f0 [obdclass]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0d3ceba>] ? lu_env_fini+0x1a/0x30 [obdclass]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0f19b68>] ? ptlrpc_set_wait+0x7d8/0x8d0 [ptlrpc]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0d515e5>] ? lustre_get_jobid+0x185/0x2e0 [obdclass]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0d09f3c>] ? obd_get_request_slot+0x3c/0x280 [obdclass]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0f19ce3>] ? ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc088e421>] ? fld_client_rpc+0x1f1/0x540 [fld]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0892f5f>] ? fld_server_lookup+0x15f/0x320 [fld]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc1684587>] ? lod_fld_lookup+0x327/0x510 [lod]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc16997dd>] ? lod_object_init+0x7d/0x3c0 [lod]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0d3dfd5>] ? lu_object_alloc+0xe5/0x320 [obdclass]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0d3e2e6>] ? lu_object_find_at+0x76/0x280 [obdclass]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0d3f78d>] ? dt_locate_at+0x1d/0xb0 [obdclass]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0d02b4c>] ? llog_osd_open+0xfc/0xf30 [obdclass]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0d3e789>] ? lu_object_put+0x279/0x3d0 [obdclass]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc0ceff20>] ? llog_open+0x140/0x3d0 [obdclass]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc16bdeed>] ? lod_sub_prep_llog+0x14d/0x783 [lod]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc16837ab>] ? lod_sub_recovery_thread+0x1cb/0xc80 [lod]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffc16835e0>] ? lod_obd_get_info+0x9d0/0x9d0 [lod]
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffa7ac1c31>] ? kthread+0xd1/0xe0
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffa7ac1b60>] ? insert_kthread_work+0x40/0x40
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffa8174c24>] ? ret_from_fork_nospec_begin+0xe/0x21
      Dec 11 09:44:24 fir-md1-s1 kernel:  [<ffffffffa7ac1b60>] ? insert_kthread_work+0x40/0x40
      

      Mount commands are stuck, even when using -o abort_recov.

      I took a crash dump just in case you're interested.

      I believe it's a regression from earlier 2.11.x versions...

      HTH,
      Stephane

          Activity


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35661/
            Subject: LU-11761 fld: let's caller to retry FLD_QUERY
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 80eb31f5367f5b879b544a61a938a8acd7de9cf5
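
            For context: the patch subject ("fld: let's caller to retry FLD_QUERY") lines up with the stack traces in the description, where the lodXXXX_recYYYY threads sit in fld_client_rpc() while opening update llogs during recovery, and it suggests that a failed FLD_QUERY should be handed back to the caller so it can back off and retry instead of spinning. The following is a minimal C sketch of that caller-retry pattern, assuming simplified interfaces; fld_query_once(), fld_lookup_with_retry() and their signatures are hypothetical placeholders, not Lustre's actual code.

            /*
             * Minimal sketch only -- not the actual LU-11761 patch. It shows the
             * caller-retry idea named in the patch subject: instead of the FLD
             * lookup path retrying a failed FLD_QUERY RPC in a tight loop (which
             * can pin a lod recovery thread at 100% CPU), the query helper returns
             * the error and the caller backs off before trying again.
             * fld_query_once() and fld_lookup_with_retry() are hypothetical
             * placeholders, not Lustre interfaces.
             */
            #include <errno.h>
            #include <stdbool.h>
            #include <stdint.h>
            #include <unistd.h>

            /* One FLD_QUERY round trip: resolve a FID sequence to an MDT index.
             * Stub for illustration: pretend the first few queries fail transiently. */
            static int fld_query_once(uint64_t seq, uint32_t *mdt_index)
            {
                    static int attempts;

                    (void)seq;
                    if (++attempts < 3)
                            return -EAGAIN;      /* transient: peer MDT not ready yet */
                    *mdt_index = 0;
                    return 0;
            }

            /*
             * Caller-side retry: sleep between attempts and honour a stop flag so
             * the recovery thread neither busy-spins nor blocks forever.
             */
            static int fld_lookup_with_retry(uint64_t seq, uint32_t *mdt_index,
                                             volatile bool *stop_requested)
            {
                    while (!*stop_requested) {
                            int rc = fld_query_once(seq, mdt_index);

                            if (rc == 0)
                                    return 0;            /* lookup succeeded */
                            if (rc != -EAGAIN && rc != -ENODEV)
                                    return rc;           /* hard error: give up */
                            sleep(1);                    /* back off, then retry */
                    }
                    return -EINTR;                       /* recovery was aborted */
            }

            int main(void)
            {
                    volatile bool stop = false;
                    uint32_t mdt_index = 0;

                    /* 0x200000400 is just an example sequence number. */
                    return fld_lookup_with_retry(0x200000400ULL, &mdt_index, &stop) ? 1 : 0;
            }

            The sleep-based back-off here is only one possible design; the real patches linked in this ticket handle the retry within Lustre's own RPC and recovery machinery.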

            sthiell Stephane Thiell added a comment -

            Great! With this patch on top of current b2_12 (so quite a lot of changes for us, not only this patch), the recovery worked on all 4 MDTs:

            fir-md1-s2: Sep 05 05:59:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery over after 5:26, of 1378 clients 1378 recovered and 0 were evicted.
            fir-md1-s2: Sep 05 05:59:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: Recovery over after 0:32, of 1378 clients 1378 recovered and 0 were evicted.
            fir-md1-s1: Sep 05 05:59:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery over after 5:26, of 1379 clients 1379 recovered and 0 were evicted.
            fir-md1-s1: Sep 05 05:59:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Recovery over after 0:43, of 1379 clients 1379 recovered and 0 were evicted.
            

            A minor thing: two MDTs took some more time to mount, but after a ~300s timeout they came up:

            [ 4563.995907] LustreError: 39383:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567688019, 300s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8fb0c2240480/0x5731634ee5f506c6 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 3 type: PLN flags: 0x1000000000000 nid: local remote: 0x98816ce1399443f3 expref: -99 pid: 39383 timeout: 0 lvb_type: 0
            [ 4564.033287] LustreError: 40296:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8fb0c1334600) refcount nonzero (2) after lock cleanup; forcing cleanup.
            [ 4564.052847] Lustre: MGC10.0.10.51@o2ib7: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
            [ 4564.062155] Lustre: Skipped 1 previous similar message
            [ 4564.070150] Lustre: fir-MDT0003: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
            [ 4564.323008] Lustre: fir-MDD0003: changelog on
            [ 4564.332139] Lustre: fir-MDT0003: in recovery but waiting for the first client to connect
            [ 4564.429981] Lustre: fir-MDT0003: Will be in recovery for at least 2:30, or until 1378 clients reconnect
            

            In case you want to investigate this timeout issue, the kernel logs of the successful start of 2.12.2_119 on our two MDSes are attached as:
            fir-md1-s1_2.12.2_119_SRCC.log and fir-md1-s2_2.12.2_119_SRCC.log

            But no 100% loaded lod* threads anymore. Not even a single eviction. This is awesome. Thanks all!

            pjones Peter Jones added a comment -

            Yes, it is quite likely this will end up in 2.12.3.

            sthiell Stephane Thiell added a comment - edited

            Is there any chance to land this one in 2.12.3?


            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35661
            Subject: LU-11761 fld: let's caller to retry FLD_QUERY
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: fe3d5156ac95e58c5fc62ab503dcff3619ccfbe0
            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34962/
            Subject: LU-11761 fld: let's caller to retry FLD_QUERY
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e3f6111dfd1c6f2266d0beef67e5a7514a6965d0

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: sthiell Stephane Thiell
              Votes: 0
              Watchers: 7
