Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8857

sptlrpc_target_local_read_conf() missing llog context

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0
    • Lustre 2.8.0
    • lustre-2.8.0_5.chaos
      kernel-3.10.0-510.0.0.2chaos.ch6.x86_64
      zfs-0.7.0-0.5llnl.ch6.x86_64
      DNE file system with 16 MDTs
    • 3
    • 9223372036854775807

    Description

      Booting the MDTs at the same time, MDT0001 (on zinc2) failed to exit recovery. The console log reports
      Console log shows:

      2016-11-15 16:29:26 [ 2960.562708] LustreError: 144169:0 (sec_config.c:1103:sptlrpc_target_local_read_conf()) missing llog context
      

      about 20 seconds after Lustre reported Build Version in log. No intervening errors.

      Dumped the debug logs from that MDT. The thread that reported the error recorded the following in the debug log:

         1 2016-11-15 16:29:26.845220 00000020:01000000:24.0::0:144169:0:(obd_config.c:1487:class_config_llog_handler()) Marker, inst_flg=0x8 mark_flg=0x1
         2 2016-11-15 16:29:26.845227 00000020:00000080:24.0::0:144169:0:(obd_config.c:1148:class_process_config()) processing cmd: cf010
         3 2016-11-15 16:29:26.845228 00000020:00000080:24.0::0:144169:0:(obd_config.c:1218:class_process_config()) marker 8 (0x1) lsh-MDT0001-mdtl lov setup
         4 2016-11-15 16:29:26.845230 00000020:01000000:24.0::0:144169:0:(obd_config.c:1562:class_config_llog_handler()) For 2.x interoperability, rename obd type from lov to lod (lsh-MDT0001)
         5 2016-11-15 16:29:26.845231 00000020:00000080:24.0::0:144169:0:(obd_config.c:1148:class_process_config()) processing cmd: cf001
         6 2016-11-15 16:29:26.845233 00000020:00000080:24.0::0:144169:0:(obd_config.c:362:class_attach()) attach type lod name: lsh-MDT0001-mdtlov uuid: lsh-MDT0001-mdtlov_UUID
         7 2016-11-15 16:29:26.857650 00000020:00000080:24.0::0:144169:0:(genops.c:371:class_newdev()) Adding new device lsh-MDT0001-mdtlov (ffff887f01a68000)
         8 2016-11-15 16:29:26.857653 00000020:00000080:24.0::0:144169:0:(obd_config.c:432:class_attach()) OBD: dev 3 attached type lod with refcount 1
         9 2016-11-15 16:29:26.857656 00000020:00000080:24.0::0:144169:0:(obd_config.c:1148:class_process_config()) processing cmd: cf003
        10 2016-11-15 16:29:26.857701 00080000:01000000:24.0::0:144169:0:(osd_handler.c:1248:osd_obd_connect()) connect #1
        11 2016-11-15 16:29:26.857703 00000020:00000080:24.0::0:144169:0:(genops.c:1167:class_connect()) connect: client lsh-MDT0001-osd_UUID, cookie 0xd50fdc70539570e9
        12 2016-11-15 16:29:26.857727 00000020:00000080:24.0::0:144169:0:(obd_config.c:552:class_setup()) finished setup of obd lsh-MDT0001-mdtlov (uuid lsh-MDT0001-mdtlov_UUID)
        13 2016-11-15 16:29:26.857730 00000020:01000000:24.0::0:144169:0:(obd_config.c:1487:class_config_llog_handler()) Marker, inst_flg=0x2 mark_flg=0x2
        14 2016-11-15 16:29:26.857731 00000020:00000080:24.0::0:144169:0:(obd_config.c:1148:class_process_config()) processing cmd: cf010
        15 2016-11-15 16:29:26.857731 00000020:00000080:24.0::0:144169:0:(obd_config.c:1218:class_process_config()) marker 8 (0x2) lsh-MDT0001-mdtl lov setup
        16 2016-11-15 16:29:26.857732 00000020:01000000:24.0::0:144169:0:(obd_config.c:1487:class_config_llog_handler()) Marker, inst_flg=0x0 mark_flg=0x1
        17 2016-11-15 16:29:26.857733 00000020:00000080:24.0::0:144169:0:(obd_config.c:1148:class_process_config()) processing cmd: cf010
        18 2016-11-15 16:29:26.857734 00000020:00000080:24.0::0:144169:0:(obd_config.c:1218:class_process_config()) marker 9 (0x1) lsh-MDT0001 add mdt
        19 2016-11-15 16:29:26.857735 00000020:00000080:24.0::0:144169:0:(obd_config.c:1148:class_process_config()) processing cmd: cf001
        20 2016-11-15 16:29:26.857735 00000020:00000080:24.0::0:144169:0:(obd_config.c:362:class_attach()) attach type mdt name: lsh-MDT0001 uuid: lsh-MDT0001_UUID
        21 2016-11-15 16:29:26.857754 00000020:00000080:24.0::0:144169:0:(genops.c:371:class_newdev()) Adding new device lsh-MDT0001 (ffff887f01a68f28)
        22 2016-11-15 16:29:26.857755 00000020:00000080:24.0::0:144169:0:(obd_config.c:432:class_attach()) OBD: dev 4 attached type mdt with refcount 1
        23 2016-11-15 16:29:26.857756 00000020:00000080:24.0::0:144169:0:(obd_config.c:1148:class_process_config()) processing cmd: cf007
        24 2016-11-15 16:29:26.857757 00000020:00000080:24.0::0:144169:0:(obd_config.c:1176:class_process_config()) mountopt: profile lsh-MDT0001 osc lsh-MDT0001-mdtlov mdc (null)
        25 2016-11-15 16:29:26.857758 00000020:01000000:24.0::0:144169:0:(obd_config.c:873:class_add_profile()) Add profile lsh-MDT0001
        26 2016-11-15 16:29:26.857760 00000020:00000080:24.0::0:144169:0:(obd_config.c:1148:class_process_config()) processing cmd: cf003
        27 2016-11-15 16:29:26.857772 00000020:01000004:24.0::0:144169:0:(obd_mount_server.c:175:server_get_mount()) get mount ffff883f70083800 from lsh-MDT0001, refs=2
        28 2016-11-15 16:29:26.857776 00000020:00000080:24.0::0:144169:0:(obd_config.c:362:class_attach()) attach type mdd name: lsh-MDD0001 uuid: lsh-MDD0001_UUID
        29 2016-11-15 16:29:26.868373 00000020:00000080:24.0::0:144169:0:(genops.c:371:class_newdev()) Adding new device lsh-MDD0001 (ffff887f01a69e50)
        30 2016-11-15 16:29:26.868376 00000020:00000080:24.0::0:144169:0:(obd_config.c:432:class_attach()) OBD: dev 5 attached type mdd with refcount 1
        31 2016-11-15 16:29:26.868397 00000004:01000000:24.0::0:144169:0:(lod_dev.c:1687:lod_obd_connect()) connect #0
        32 2016-11-15 16:29:26.868399 00000020:00000080:24.0::0:144169:0:(genops.c:1167:class_connect()) connect: client lsh-MDT0001-mdtlov_UUID, cookie 0xd50fdc70539570fe
        33 2016-11-15 16:29:26.868412 00000020:00000080:24.0::0:144169:0:(obd_config.c:552:class_setup()) finished setup of obd lsh-MDD0001 (uuid lsh-MDD0001_UUID)
        34 2016-11-15 16:29:26.868420 00000004:01000000:24.0::0:144169:0:(mdd_device.c:1178:mdd_obd_connect()) connect #0
        35 2016-11-15 16:29:26.868422 00000020:00000080:24.0::0:144169:0:(genops.c:1167:class_connect()) connect: client lsh-MDD0001_UUID, cookie 0xd50fdc7053957105
        36 2016-11-15 16:29:26.868423 00080000:01000000:24.0::0:144169:0:(osd_handler.c:1248:osd_obd_connect()) connect #2
        37 2016-11-15 16:29:26.868425 00000020:00000080:24.0::0:144169:0:(genops.c:1167:class_connect()) connect: client lsh-MDT0001-osd_UUID, cookie 0xd50fdc705395710c
        38 2016-11-15 16:29:26.920911 02000000:00020000:24.0::0:144169:0:(sec_config.c:1103:sptlrpc_target_local_read_conf()) missing llog context
        39 2016-11-15 16:29:26.931932 00000020:00000080:24.0::0:144169:0:(obd_config.c:552:class_setup()) finished setup of obd lsh-MDT0001 (uuid lsh-MDT0001_UUID)
      

      Attachments

        Activity

          [LU-8857] sptlrpc_target_local_read_conf() missing llog context
          pjones Peter Jones added a comment -

          Landed for 2.10

          pjones Peter Jones added a comment - Landed for 2.10

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24119/
          Subject: LU-8857 config: refactor sptlrpc config process
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: df00ee15534419070c559acdb913898fc9dc3a00

          gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24119/ Subject: LU-8857 config: refactor sptlrpc config process Project: fs/lustre-release Branch: master Current Patch Set: Commit: df00ee15534419070c559acdb913898fc9dc3a00
          ofaaland Olaf Faaland added a comment -

          Updated priority which had been set to Critical when I thought the error message was related to recovery problems, and not updated after I learned otherwise.

          ofaaland Olaf Faaland added a comment - Updated priority which had been set to Critical when I thought the error message was related to recovery problems, and not updated after I learned otherwise.

          Hi all,
          Can we get some review of Lai's patch?
          thanks,
          Olaf

          ofaaland Olaf Faaland added a comment - Hi all, Can we get some review of Lai's patch? thanks, Olaf

          Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/24119
          Subject: LU-8857 llog: init llog context for target
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 2cd184c00ed99b250ce4ee636b78716738f29bdd

          gerrit Gerrit Updater added a comment - Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/24119 Subject: LU-8857 llog: init llog context for target Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 2cd184c00ed99b250ce4ee636b78716738f29bdd
          ofaaland Olaf Faaland added a comment -

          Lai,
          OK, thanks for the update.

          ofaaland Olaf Faaland added a comment - Lai, OK, thanks for the update.
          laisiyao Lai Siyao added a comment -

          “missing log context" looks to be a bug introduced in server code refactor long ago, and after I tweaked the code a bit it's reproduced, but I still need to understand the refactor more to fix it.

          laisiyao Lai Siyao added a comment - “missing log context" looks to be a bug introduced in server code refactor long ago, and after I tweaked the code a bit it's reproduced, but I still need to understand the refactor more to fix it.

          Removed topllnl label because the log message does not reflect an actual error.

          ofaaland Olaf Faaland added a comment - Removed topllnl label because the log message does not reflect an actual error.

          Di,
          Thanks. The servers have been restarted while I looked into the message, so I lost the state you are looking for. I'll use this ticket for a patch to change the debug mask on the "missing llog context" message.

          ofaaland Olaf Faaland added a comment - Di, Thanks. The servers have been restarted while I looked into the message, so I lost the state you are looking for. I'll use this ticket for a patch to change the debug mask on the "missing llog context" message.
          di.wang Di Wang added a comment -

          Olaf: You can ignore "missing llog context", that is noise, which should be un-related with the endless recovery.

          According to the debug log, it looks like MDT0001 is keep waiting for all of clients to connect. Olaf, all clients and other MDTs are up at that moment? and also it seems recovery debug mask (D_HA 0x00080000) is not enabled?

          0010000:02000400:0.0:1479257111.133403:0:96245:0:(ldlm_lib.c:779:target_handle_reconnect()) lsh-MDT0001: Client lsh-MDT0008-mdtlov_UUID (at 172.19.3.9@o2ib600) reconnecting, waiting for 1187 clients in recovery for 0:00
          00000100:02000000:0.0:1479257111.133417:0:96245:0:(import.c:1539:ptlrpc_import_recovery_state_machine()) lsh-MDT0001: Connection restored to  (at 172.19.3.9@o2ib600)
          00010000:02000400:6.0:1479257111.467229:0:143823:0:(ldlm_lib.c:779:target_handle_reconnect()) lsh-MDT0001: Client lsh-MDT0005-mdtlov_UUID (at 172.19.3.6@o2ib600) reconnecting, waiting for 1187 clients in recovery for 0:00
          

          Olaf: could you please also post the stack trace for "tgt_recover_1"? thanks.

          di.wang Di Wang added a comment - Olaf: You can ignore "missing llog context", that is noise, which should be un-related with the endless recovery. According to the debug log, it looks like MDT0001 is keep waiting for all of clients to connect. Olaf, all clients and other MDTs are up at that moment? and also it seems recovery debug mask (D_HA 0x00080000) is not enabled? 0010000:02000400:0.0:1479257111.133403:0:96245:0:(ldlm_lib.c:779:target_handle_reconnect()) lsh-MDT0001: Client lsh-MDT0008-mdtlov_UUID (at 172.19.3.9@o2ib600) reconnecting, waiting for 1187 clients in recovery for 0:00 00000100:02000000:0.0:1479257111.133417:0:96245:0:(import.c:1539:ptlrpc_import_recovery_state_machine()) lsh-MDT0001: Connection restored to (at 172.19.3.9@o2ib600) 00010000:02000400:6.0:1479257111.467229:0:143823:0:(ldlm_lib.c:779:target_handle_reconnect()) lsh-MDT0001: Client lsh-MDT0005-mdtlov_UUID (at 172.19.3.6@o2ib600) reconnecting, waiting for 1187 clients in recovery for 0:00 Olaf: could you please also post the stack trace for "tgt_recover_1"? thanks.

          People

            laisiyao Lai Siyao
            ofaaland Olaf Faaland
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: