Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13069

osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING ) failed: 308 changes, 111 in progress, 0 in flight

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.14.0
    • Lustre 2.13.0
    • 3
    • 9223372036854775807

    Description

      osp llog processing stops before umount, because it doesn't see some records at catalog

      REFORMAT=yes PTLDEBUG="-malloc-trace" SUBSYSTEM_DEBUG="+log+osc+osd-rpc-lnet-ldlm" DEBUG_SIZE=200 ONLY=135 sh lustre/tests/sanity.sh
       
      [ 3647.083597] Lustre: DEBUG MARKER: == sanity test 135: Race catalog processing ========================================================== 07:07:08 (1573646828)
      [ 3669.794725] Lustre: lustre-OST0000-osc-ffff88003711b800: disconnect after 24s idle
      [ 4257.120915] LNet: 20833:0:(lib-move.c:3914:lnet_parse_put()) Dropping PUT from 12345-10.0.2.9@tcp portal 4 match 1650088281856656 offset 0 length 192: 4
      [ 4600.368617] Lustre: lustre-OST0001-osc-ffff88003711b800: reconnect after 931s idle
      [ 4600.369232] Lustre: lustre-OST0000: Connection restored to 10.0.2.9@tcp (at 10.0.2.9@tcp)
      [ 4600.369235] Lustre: Skipped 5 previous similar messages
      [ 4640.122168] LNet: 20832:0:(lib-move.c:3914:lnet_parse_put()) Dropping PUT from 12345-10.0.2.9@tcp portal 16 match 1650088287536976 offset 0 length 192: 4
      [ 4896.183677] Lustre: 23913:0:(llog_cat.c:918:llog_cat_process_or_fork()) lustre-OST0001-osc-MDT0000: catlog [0x5:0x1:0x0] crosses index zero
      [ 4896.975199] Lustre: 23911:0:(llog_cat.c:918:llog_cat_process_or_fork()) lustre-OST0000-osc-MDT0000: catlog [0x2:0x1:0x0] crosses index zero
      [ 4896.975203] Lustre: 23911:0:(llog_cat.c:918:llog_cat_process_or_fork()) Skipped 1 previous similar message
      [ 4898.956196] Lustre: 23911:0:(llog_cat.c:918:llog_cat_process_or_fork()) lustre-OST0000-osc-MDT0000: catlog [0x2:0x1:0x0] crosses index zero
      [ 4898.956201] Lustre: 23911:0:(llog_cat.c:918:llog_cat_process_or_fork()) Skipped 3 previous similar messages
      [ 4901.807494] Lustre: 23913:0:(llog_cat.c:918:llog_cat_process_or_fork()) lustre-OST0001-osc-MDT0000: catlog [0x5:0x1:0x0] crosses index zero
      [ 4901.807504] Lustre: 23913:0:(llog_cat.c:918:llog_cat_process_or_fork()) Skipped 3 previous similar messages
      [ 4906.198002] Lustre: 23913:0:(llog_cat.c:918:llog_cat_process_or_fork()) lustre-OST0001-osc-MDT0000: catlog [0x5:0x1:0x0] crosses index zero
      [ 4906.198007] Lustre: 23913:0:(llog_cat.c:918:llog_cat_process_or_fork()) Skipped 9 previous similar messages
      [ 4914.323032] Lustre: 23913:0:(llog_cat.c:918:llog_cat_process_or_fork()) lustre-OST0001-osc-MDT0000: catlog [0x5:0x1:0x0] crosses index zero
      [ 4914.323037] Lustre: 23913:0:(llog_cat.c:918:llog_cat_process_or_fork()) Skipped 15 previous similar messages
      [ 4931.388799] Lustre: 23911:0:(llog_cat.c:918:llog_cat_process_or_fork()) lustre-OST0000-osc-MDT0000: catlog [0x2:0x1:0x0] crosses index zero
      [ 4931.388804] Lustre: 23911:0:(llog_cat.c:918:llog_cat_process_or_fork()) Skipped 31 previous similar messages
      [ 4963.746106] Lustre: 23911:0:(llog_cat.c:918:llog_cat_process_or_fork()) lustre-OST0000-osc-MDT0000: catlog [0x2:0x1:0x0] crosses index zero
      [ 4963.746112] Lustre: 23911:0:(llog_cat.c:918:llog_cat_process_or_fork()) Skipped 55 previous similar messages
      [ 5027.798533] Lustre: 23913:0:(llog_cat.c:918:llog_cat_process_or_fork()) lustre-OST0001-osc-MDT0000: catlog [0x5:0x1:0x0] crosses index zero
      [ 5027.798541] Lustre: 23913:0:(llog_cat.c:918:llog_cat_process_or_fork()) Skipped 121 previous similar messages
      [ 5156.633800] Lustre: 23913:0:(llog_cat.c:918:llog_cat_process_or_fork()) lustre-OST0001-osc-MDT0000: catlog [0x5:0x1:0x0] crosses index zero
      [ 5156.633808] Lustre: 23913:0:(llog_cat.c:918:llog_cat_process_or_fork()) Skipped 245 previous similar messages
      [ 5218.830741] Lustre: DEBUG MARKER: == sanity test complete, duration 1597 sec =========================================================== 07:33:20 (1573648400)
      [ 5219.039200] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
      [ 5228.383189] LustreError: 23913:0:(osp_sync.c:1293:osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING ) failed: 308 changes, 111 in progress, 0 in flight
      [ 5228.383242] LustreError: 23913:0:(osp_sync.c:1293:osp_sync_thread()) LBUG
      [ 5228.383260] Pid: 23913, comm: osp-syn-1-0 3.10.0-693.21.1.x3.1.9.x86_64 #1 SMP Tue Jun 26 09:38:31 PDT 2018
      [ 5228.383261] Call Trace:
      [ 5228.383271]  [<ffffffff8103a212>] save_stack_trace_tsk+0x22/0x40
      [ 5228.383275]  [<ffffffffc055f7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 5228.383283]  [<ffffffffc055f87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 5228.383288]  [<ffffffffc11ea71b>] osp_sync_thread+0xa1b/0xa70 [osp]
      [ 5228.383296]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      

      and test 136

      [ 1303.271639] Lustre: 20681:0:(llog_cat.c:919:llog_cat_process_or_fork()) lustre-OST0000-osc-MDT0000: catlog [0x2:0x1:0x0] crosses index zero
      [ 1336.588370] LustreError: 20681:0:(osp_sync.c:1293:osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING ) failed: 2 changes, 971 in progress, 4 in flight
      [ 1336.591561] LustreError: 20681:0:(osp_sync.c:1293:osp_sync_thread()) LBUG
      Log
      
      00000040:00080000:3.0:1575562250.250807:0:20681:0:(llog.c:665:llog_process_thread()) index: 438, lh_last_idx: 786 synced_idx: 0 lgh_last_idx: 787
      00000040:00080000:3.0:1575562250.250808:0:20681:0:(llog_cat.c:814:llog_cat_process_common()) processing log [0xf85d7:0x1:0x0]:0 at index 438 of catalog [0x2:0x1:0x0]
      00000040:00080000:3.0:1575562250.250855:0:20681:0:(llog.c:822:llog_process_or_fork()) Processing [0xf85d7:0x1:0x0] flags 0x005 startcat -1054886720 startidx -1 first_idx -1 last_idx -1
      00000004:00080000:1.0:1575562250.250966:0:20043:0:(osp_sync.c:564:osp_sync_interpret()) reply req ffff880043637500/2, rc 0, transno 32314
      00002000:00080000:2.0:1575562250.251029:0:21009:0:(ofd_dev.c:1798:ofd_destroy_hdl()) lustre-OST0000: Destroy object 0x0:1316016 count 1
      00000004:00080000:3.0:1575562250.251034:0:20041:0:(osp_sync.c:564:osp_sync_interpret()) reply req ffff880050e1c380/2, rc 0, transno 32315
      00000004:00080000:3.0:1575562250.251039:0:20681:0:(osp_sync.c:1166:osp_sync_process_queues()) 349 changes, 1363 in progress, 6 in flight
      00000040:00080000:3.0:1575562250.251040:0:20681:0:(llog.c:715:llog_process_thread()) stop processing plain 0xf85d7:1:0 index 2 count 2
      00000040:00080000:3.0:1575562250.251042:0:20681:0:(llog.c:715:llog_process_thread()) stop processing catalog 0x2:1:0 index 438 count 1713
      00000040:00000400:3.0:1575562250.251045:0:20681:0:(llog_cat.c:919:llog_cat_process_or_fork()) lustre-OST0000-osc-MDT0000: catlog [0x2:0x1:0x0] crosses index zero
      00000040:00080000:3.0:1575562250.251051:0:20681:0:(llog.c:822:llog_process_or_fork()) Processing [0x2:0x1:0x0] flags 0x012 startcat 439 startidx 0 first_idx 439 last_idx 787
      00000040:00080000:3.0:1575562250.251057:0:20681:0:(llog.c:665:llog_process_thread()) index: 440, lh_last_idx: 787 synced_idx: 0 lgh_last_idx: 787
      00000040:00080000:3.0:1575562250.251058:0:20681:0:(llog_cat.c:814:llog_cat_process_common()) processing log [0xf85d9:0x1:0x0]:0 at index 440 of catalog [0x2:0x1:0x0]
      

      So 438 was processed, and then loop restart and next index is 440.
      Catalog dump shows the same thing.

      Number of records: 973
      Target uuid :
      -----------------------
      #439 (064)id=[0xf85d8:0x1:0x0]:0 path=oi.1/0x1:0xf85d8:0x0
      #788 (064)id=[0xf8735:0x1:0x0]:0 path=oi.1/0x1:0xf8735:0x0
      #10327 (064)id=[0xfac78:0x1:0x0]:0 path=oi.1/0x1:0xfac78:0x0
      #10328 (064)id=[0xfac79:0x1:0x0]:0 path=oi.1/0x1:0xfac79:0x0
      #10329 (064)id=[0xfac7a:0x1:0x0]:0 path=oi.1/0x1:0xfac7a:0x0
      #10330 (064)id=[0xfac7b:0x1:0x0]:0 path=oi.1/0x1:0xfac7b:0x0
      #10331 (064)id=[0xfac7c:0x1:0x0]:0 path=oi.1/0x1:0xfac7c:0x0
      

      The root case lay on a loop restart, for a wrapped catalog processing starts from a next index and it increase a last processed. But llog_process_thread do the one more increasing for a incoming index.

        d->opd_sync_last_catalog_idx = 0;
              do {
                      int     size;
       
                      wrapped = (llh->lgh_hdr->llh_cat_idx >= llh->lgh_last_idx &&
                                 llh->lgh_hdr->llh_count > 1);
       
                      rc = llog_cat_process(&env, llh, osp_sync_process_queues, d,
                                            d->opd_sync_last_catalog_idx, 0);
       
                      size = OBD_FAIL_PRECHECK(OBD_FAIL_CAT_RECORDS) ?
                             cfs_fail_val : (LLOG_HDR_BITMAP_SIZE(llh->lgh_hdr) - 1);
                      /* processing reaches catalog bottom */
                      if (d->opd_sync_last_catalog_idx == size)
                              d->opd_sync_last_catalog_idx = LLOG_CAT_FIRST;
                      else if (wrapped)
                              /* If catalog is wrapped we can`t predict last index of
                               * processing because lgh_last_idx could be changed.
                               * Starting form the next one */
                              d->opd_sync_last_catalog_idx++;
       
              } while (rc == 0 && (wrapped ||
                                   d->opd_sync_last_catalog_idx == LLOG_CAT_FIRST));
      
        if (cd != NULL) {
                      last_called_index = cd->lpcd_first_idx;
                      index = cd->lpcd_first_idx + 1;
              }
      

      Attachments

        Activity

          People

            aboyko Alexander Boyko
            aboyko Alexander Boyko
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: