  Lustre
  LU-12717

ASSERTION( !lod_obj_is_striped(child) ) failed

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.8
    • Component/s: None
    • Environment: Clients: 2.12.0, CentOS 7.6
    • Severity: 3

    Description

      LBUG today on oak-MDT0000, never seen this one before. We have had some big data transfers using dsync running on Sherlock (2.12.0 clients). Might be related, or not.

      [4954375.921845] LustreError: 15102:0:(tgt_handler.c:628:process_req_last_xid()) @@@ Unexpected xid 5d6425ffe4140 vs. last_xid 5d6425ffe418f
        req@ffffa1597f41f200 x1642955450237248/t0(0) o101->98bbe778-4f70-8a89-d80e-d6a8120c693b@10.8.2.23@o2ib6:663/0 lens 736/0 e 0 to 0 dl 1567111883 ref 1 fl Interpret:/2/ffffffff rc 0/-1
      [4954542.487326] LustreError: 15290:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
      [4954542.517377] LustreError: 15290:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 3754300 previous similar messages
      [4954874.316190] LustreError: 15347:0:(lod_object.c:3919:lod_ah_init()) ASSERTION( !lod_obj_is_striped(child) ) failed: 
      [4954874.351112] LustreError: 15347:0:(lod_object.c:3919:lod_ah_init()) LBUG
      [4954874.373452] Pid: 15347, comm: mdt01_049 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Mon Oct 8 11:21:37 PDT 2018
      [4954874.406359] Call Trace:
      [4954874.414973]  [<ffffffffc08af7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [4954874.437035]  [<ffffffffc08af87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [4954874.459664]  [<ffffffffc135a89f>] lod_ah_init+0x23f/0xde0 [lod]
      [4954874.479751]  [<ffffffffc13d306b>] mdd_object_make_hint+0xcb/0x190 [mdd]
      [4954874.502388]  [<ffffffffc13bed50>] mdd_create_data+0x330/0x730 [mdd]
      [4954874.523606]  [<ffffffffc129140c>] mdt_mfd_open+0xc5c/0xe70 [mdt]
      [4954874.544523]  [<ffffffffc1291b9b>] mdt_finish_open+0x57b/0x690 [mdt]
      [4954874.565743]  [<ffffffffc1293478>] mdt_reint_open+0x17c8/0x3190 [mdt]
      [4954874.587229]  [<ffffffffc1288cb3>] mdt_reint_rec+0x83/0x210 [mdt]
      [4954874.607567]  [<ffffffffc126a19b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [4954874.630197]  [<ffffffffc126a6c2>] mdt_intent_reint+0x162/0x430 [mdt]
      [4954874.651677]  [<ffffffffc126d4cb>] mdt_intent_opc+0x1eb/0xaf0 [mdt]
      [4954874.672619]  [<ffffffffc1275d68>] mdt_intent_policy+0x138/0x320 [mdt]
      [4954874.694668]  [<ffffffffc0be82dd>] ldlm_lock_enqueue+0x38d/0x980 [ptlrpc]
      [4954874.719320]  [<ffffffffc0c11c03>] ldlm_handle_enqueue0+0xa83/0x1670 [ptlrpc]
      [4954874.743104]  [<ffffffffc0c977f2>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [4954874.764026]  [<ffffffffc0c9b72a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [4954874.787245]  [<ffffffffc0c4404b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [4954874.813872]  [<ffffffffc0c47792>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [4954874.835628]  [<ffffffff8babdf21>] kthread+0xd1/0xe0
      [4954874.852252]  [<ffffffff8c1255f7>] ret_from_fork_nospec_end+0x0/0x39
      [4954874.873448]  [<ffffffffffffffff>] 0xffffffffffffffff
      [4954874.890366] Kernel panic - not syncing: LBUG
      

      I do have a crash dump if you're interested. MDT failover was smooth, so it's not a big deal:

      Aug 29 14:04:49 oak-md1-s1 kernel: Lustre: oak-MDT0000: Recovery over after 0:55, of 1464 clients 1464 recovered and 0 were evicted.
      

       

      Attachments

        Activity

          [LU-12717] ASSERTION( !lod_obj_is_striped(child) ) failed

          sthiell Stephane Thiell added a comment -

          Hi! This issue hit us again today, even though we're now using SSDs on all of Oak's MDTs. I see that Lai's patch above (https://review.whamcloud.com/36100) was almost ready to land and even had Andreas' approval. It would probably be too much effort to port it to 2.10.8 (which we're still running on Oak), but would it be possible to look at the patch again so that it can land in master? That way, this rare issue would be avoided in the future. Thanks!

          Apr 30 15:28:38 oak-md1-s2 kernel: LustreError: 9033:0:(lod_object.c:3919:lod_ah_init()) ASSERTION( !lod_obj_is_striped(child) ) failed: 
          Apr 30 15:28:38 oak-md1-s2 kernel: LustreError: 9033:0:(lod_object.c:3919:lod_ah_init()) LBUG
          Apr 30 15:28:38 oak-md1-s2 kernel: Pid: 9033, comm: mdt01_088 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
          Apr 30 15:28:38 oak-md1-s2 kernel: Call Trace:
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc0ddb7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc0ddb87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc174fb4f>] lod_ah_init+0x23f/0xde0 [lod]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc17cc09b>] mdd_object_make_hint+0xcb/0x190 [mdd]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc17b7d50>] mdd_create_data+0x330/0x730 [mdd]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc168a3fc>] mdt_mfd_open+0xc5c/0xe70 [mdt]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc168ab8b>] mdt_finish_open+0x57b/0x690 [mdt]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc168d09d>] mdt_reint_open+0x23fd/0x3190 [mdt]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc1681ca3>] mdt_reint_rec+0x83/0x210 [mdt]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc166318b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc16636b2>] mdt_intent_reint+0x162/0x430 [mdt]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc16664bb>] mdt_intent_opc+0x1eb/0xaf0 [mdt]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc166ed58>] mdt_intent_policy+0x138/0x320 [mdt]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc123d2dd>] ldlm_lock_enqueue+0x38d/0x980 [ptlrpc]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc1266c03>] ldlm_handle_enqueue0+0xa83/0x1670 [ptlrpc]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc12ec892>] tgt_enqueue+0x62/0x210 [ptlrpc]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc12f07ca>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc129905b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffc129c7a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffff866c2e81>] kthread+0xd1/0xe0
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffff86d77c37>] ret_from_fork_nospec_end+0x0/0x39
          Apr 30 15:28:38 oak-md1-s2 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
          

          sthiell Stephane Thiell added a comment -

          Hi Lai,

          Thanks! As we're still running 2.10.8 on Oak, this patch will apparently require some backporting:

          Making all in .
          /tmp/rpmbuild-lustre-sthiell-SRrpJSP1/BUILD/lustre-2.10.8_4_g05af3ab/lustre/lod/lod_object.c: In function 'lod_invalidate':
          /tmp/rpmbuild-lustre-sthiell-SRrpJSP1/BUILD/lustre-2.10.8_4_g05af3ab/lustre/lod/lod_object.c:4883:2: error: implicit declaration of function 'lod_striping_free' [-Werror=implicit-function-declaration]
            lod_striping_free(env, lod_dt_obj(dt));
            ^
          cc1: all warnings being treated as errors
          

          In any case, we're in the process of migrating MDT0 to an SSD-backed RAID-10 volume (offline device-level copy + resize2fs). We'll do the same for MDT1 later (as soon as I get more SSDs). Meanwhile, we have tested NRS TBF to limit the number of ops/s; it has been helpful, and we didn't crash again even with dsync running, but of course the performance is not optimal.
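
          For reference, NRS TBF is configured through the MDS service's NRS parameters; the following is a minimal sketch assuming the nid-based rule syntax of a 2.10-era lctl (rule name, NID pattern and rate are illustrative, not the exact values used on Oak):

            # enable the TBF policy (nid-based) on the main MDT service
            lctl set_param mds.MDS.mdt.nrs_policies="tbf nid"
            # cap the RPC rate for the client NIDs doing the bulk transfers
            lctl set_param mds.MDS.mdt.nrs_tbf_rule="start dsync_limit nid={10.8.*.*@o2ib6} rate=5000"
            # inspect the active rules
            lctl get_param mds.MDS.mdt.nrs_tbf_rule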

          laisiyao Lai Siyao added a comment -

          It may be related to the journal being full: when the journal fills up, the LOV setting transaction may fail, but the striping allocated in LOV declare_set is not freed, so the next LOV setting triggers the lod_obj_is_striped() assertion.

          laisiyao Lai Siyao added a comment -

          Hi Stephane, I uploaded a patch; you can apply it and see whether dsync works with it.


          gerrit Gerrit Updater added a comment -

          Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36100
          Subject: LU-12717 mdd: free striping upon LOV setting error
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 47dd1cae1c02f63840cb9e301239917ca2138de9
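
          For anyone wanting to try this change locally, it can be fetched from Gerrit roughly as follows (the ref path is assumed from standard Gerrit conventions; confirm the current patch set on the review page):

            git clone git://git.whamcloud.com/fs/lustre-release.git
            cd lustre-release
            # change 36100, patch set 1 (per this comment); Gerrit refs end in /<patch set>
            git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/00/36100/1
            git checkout FETCH_HEAD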


          sthiell Stephane Thiell added a comment -

          Another hint:

          [root@oak-md1-s2 127.0.0.1-2019-08-31-18:02:59]# grep -c __jbd2_log_wait_for_space foreach_bt-crash-oak-md1-s2-2019-08-31-18-02-59 
          290
          

          That doesn't sound good... Still, it shouldn't LBUG. Let me know what you think.
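
          If journal pressure is the suspect, the configured journal size and jbd2 activity on the MDT backing device can be checked with standard tools; a sketch, with the device name as a placeholder:

            # configured journal size on the ldiskfs MDT device
            dumpe2fs -h /dev/mapper/mdt0 2>/dev/null | grep -i journal
            # live jbd2 transaction statistics for that device
            cat /proc/fs/jbd2/dm-*/info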


          sthiell Stephane Thiell added a comment -

          Hi Lai,

          A colleague has been playing with it for the last few days to perform large parallel transfers. dsync is part of mpiFileUtils:

          man: https://github.com/hpc/mpifileutils/blob/master/doc/rst/dsync.1.rst
          source: https://github.com/hpc/mpifileutils/blob/master/src/dsync/dsync.c

          I think what makes it different (vs. rsync/fpsync) is that it first removes all files from the destination, then creates directories and empty files, then copies the data, and finally sets permissions on all copied files. Below is an example run:

             0: [2019-09-02T08:25:31] Walking source path                                 
             0: [2019-09-02T08:25:31] Walking /scratch/groups/pritch                      
             0: [2019-09-02T08:54:15] Items walked 863997                                 
             0: [2019-09-02T09:19:38] Items walked 7289486                                
             0: [2019-09-02T09:19:46] Items walked 9080593                                
             0: [2019-09-02T09:19:55] Items walked 9334035                                
             0: [2019-09-02T09:20:05] Items walked 9564552                                
             0: [2019-09-02T09:20:20] Items walked 9746523                                
             0: [2019-09-02T09:20:30] Items walked 9956014                                
             0: [2019-09-02T09:20:46] Items walked 10128637                               
             0: [2019-09-02T09:21:10] Items walked 10389513                               
             0: [2019-09-02T09:21:20] Items walked 10528663                               
             0: [2019-09-02T09:21:33] Items walked 10528663                               
             0: [2019-09-02T09:21:43] Items walked 10528663                               
             0: [2019-09-02T09:21:54] Items walked 10528663                               
             0: [2019-09-02T09:22:03] Walked 10528663 items in 3391.952981 seconds (3104.012072 files/sec)
             0: [2019-09-02T09:22:03] Walking destination path                            
             0: [2019-09-02T09:22:03] Walking /oak/stanford/groups/pritch/scratch.copy/groups/pritch
             0: [2019-09-02T09:39:47] Items walked 59941                                  
             0: [2019-09-02T09:40:00] Items walked 8726652                                
             0: [2019-09-02T09:40:08] Items walked 8811009
             0: [2019-09-02T09:40:20] Removing 10404883 items                             
             0: [2019-09-02T09:40:26] level=26 min=0 max=1 sum=3 rate=7.486081 secs=0.400744
             0: [2019-09-02T09:40:27] level=25 min=0 max=1 sum=185 rate=403.045488 secs=0.459005
             0: [2019-09-02T09:40:27] level=24 min=0 max=1 sum=783 rate=1699.719168 secs=0.460664
             0: [2019-09-02T09:40:28] level=23 min=3 max=4 sum=3176 rate=5708.211666 secs=0.556391
             0: [2019-09-02T09:40:29] level=22 min=7 max=8 sum=7293 rate=11232.519041 secs=0.649276
             0: [2019-09-02T09:40:29] level=21 min=8 max=9 sum=8477 rate=11066.037630 secs=0.766038
             0: [2019-09-02T09:40:31] level=20 min=23 max=24 sum=24537 rate=13170.356973 secs=1.863047
             0: [2019-09-02T09:40:33] level=19 min=44 max=45 sum=45833 rate=22867.105954 secs=2.004320
             0: [2019-09-02T09:40:36] level=18 min=76 max=77 sum=78818 rate=27849.612712 secs=2.830129
             0: [2019-09-02T09:40:41] level=17 min=138 max=139 sum=142200 rate=32225.704622 secs=4.412627
             0: [2019-09-02T09:41:04] level=16 min=625 max=626 sum=640878 rate=27090.786318 secs=23.656678
             0: [2019-09-02T09:42:44] level=15 min=611 max=612 sum=625755 rate=6303.651528 secs=99.268654
             0: [2019-09-02T09:47:07] level=14 min=1291 max=1292 sum=1322053 rate=5012.764106 secs=263.737326
             0: [2019-09-02T10:16:42] level=13 min=4246 max=4247 sum=4348287 rate=2450.833574 secs=1774.207374
             0: [2019-09-02T10:34:13] level=12 min=2252 max=2253 sum=2306862 rate=2193.813409 secs=1051.530632
             0: [2019-09-02T10:36:53] level=11 min=771 max=772 sum=790333 rate=4943.302018 secs=159.879570
             0: [2019-09-02T10:36:56] level=10 min=56 max=57 sum=58175 rate=25132.111486 secs=2.314768
             0: [2019-09-02T10:36:56] level=9 min=1 max=2 sum=1225 rate=1446.678564 secs=0.846767
             0: [2019-09-02T10:36:57] level=8 min=0 max=1 sum=10 rate=14.656902 secs=0.682272
             0: [2019-09-02T10:36:57] Removed 10404883 items in 3397.492401 seconds (3062.518402 items/sec)
             0: [2019-09-02T10:36:58] Copying items to destination                        
             0: [2019-09-02T10:36:58] Copying to /oak/stanford/groups/pritch/scratch.copy/groups/pritch
             0: [2019-09-02T10:36:58] Items: 10408855                                     
             0: [2019-09-02T10:36:58]   Directories: 104                                  
             0: [2019-09-02T10:36:58]   Files: 10408751                                   
             0: [2019-09-02T10:36:58]   Links: 0                                          
             0: [2019-09-02T10:36:58] Data: 40.035 TB (4.033 MB per file)                 
             0: [2019-09-02T10:37:10] Creating directories.                               
             0: [2019-09-02T10:37:10]   level=4 min=0 max=0 sum=0 rate=0.000000/sec secs=0.017780
             0: [2019-09-02T10:37:10]   level=5 min=0 max=0 sum=0 rate=0.000000/sec secs=0.012852
             0: [2019-09-02T10:37:10]   level=6 min=0 max=0 sum=0 rate=0.000000/sec secs=0.022612
          …
             0: [2019-09-02T10:37:12] Created 104 directories in 2.017779 seconds (51.541817 items/sec)
             0: [2019-09-02T10:37:12] Creating files.    
             0: [2019-09-02T11:12:10] Created 10408751 items in 2097.987293 seconds (4961.303166 items/sec)
             0: [2019-09-02T11:12:10] Copying data.                                       
             0: [2019-09-02T14:12:18] Copy data: 40.035 TB (44018758689546 bytes)         
             0: [2019-09-02T14:12:18] Copy rate: 3.793 GB/s (44018758689546 bytes in 10807.864147 seconds)
             0: [2019-09-02T14:12:18] Syncing data to disk.                               
             0: [2019-09-02T14:12:32] Sync completed in 14.066516 seconds.                
             0: [2019-09-02T14:12:32] Setting ownership, permissions, and timestamps.     
             0: [2019-09-02T14:32:09] Updated 10408855 items in 1177.167491 seconds (8842.288869 items/sec)
             0: [2019-09-02T14:32:09] Syncing directory updates to disk.                  
             0: [2019-09-02T14:32:22] Sync completed in 13.326690 seconds.                
             0: [2019-09-02T14:32:23] Started: Sep-02-2019,10:36:58                       
             0: [2019-09-02T14:32:23] Completed: Sep-02-2019,14:32:22                     
             0: [2019-09-02T14:32:23] Seconds: 14124.547                                  
             0: [2019-09-02T14:32:23] Items: 10408855                                     
             0: [2019-09-02T14:32:23]   Directories: 104                                  
             0: [2019-09-02T14:32:23]   Files: 10408751                                   
             0: [2019-09-02T14:32:23]   Links: 0                                          
             0: [2019-09-02T14:32:23] Data: 40.035 TB (44018758689546 bytes)              
             0: [2019-09-02T14:32:23] Rate: 2.902 GB/s (44018758689546 bytes in 14124.547 seconds)
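
          (For reference, a run like the above is launched via MPI; the invocation below is a sketch with an illustrative rank count, not the exact command used:)

              mpirun -np 64 dsync /scratch/groups/pritch /oak/stanford/groups/pritch/scratch.copy/groups/pritch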
          

          Oak doesn't use SSDs on the MDTs at the moment, and I fear that the JBD2 journal (4 GB) was full at some point. I'm investigating a file-level backup/restore-to-SSD option (we would also like to move from 512-byte to 1024-byte inodes for PFL later).
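
          For the reformat-to-SSD path, the larger inode size would be set at mkfs time; a minimal sketch, with the fsname/index grounded in this ticket but the target device as a placeholder and the MGS nid omitted:

            mkfs.lustre --reformat --mdt --fsname=oak --index=0 \
                --mkfsoptions="-I 1024" /dev/mapper/new_ssd_mdt0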

          laisiyao Lai Siyao added a comment -

          Stephane, I'm not familiar with dsync. Where can I get its source code, and how do you use it?


          sthiell Stephane Thiell added a comment -

          You can see the heavy load at the moment of the crash from the vmcore I uploaded:

                KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.27.2.el7_lustre.pl1.x86_64/vmlinux
              DUMPFILE: vmcore-oak-md1-s2-2019-08-31-18-02-59  [PARTIAL DUMP]
                  CPUS: 24
                  DATE: Sat Aug 31 18:02:35 2019
                UPTIME: 1 days, 11:25:03
          LOAD AVERAGE: 425.53, 389.13, 349.79
                 TASKS: 1533
              NODENAME: oak-md1-s2
               RELEASE: 3.10.0-957.27.2.el7_lustre.pl1.x86_64
               VERSION: #1 SMP Mon Aug 5 15:28:37 PDT 2019
               MACHINE: x86_64  (3399 Mhz)
                MEMORY: 127.8 GB
                 PANIC: "Kernel panic - not syncing: LBUG"
                   PID: 33146
               COMMAND: "mdt00_037"
                  TASK: ffff8e69f5a8d140  [THREAD_INFO: ffff8e6957ae8000]
                   CPU: 2
                 STATE: TASK_RUNNING (PANIC)
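
          (The summary above looks like the crash utility's sys output; the attached per-thread backtrace file can be reproduced roughly along these lines:)

            crash /usr/lib/debug/lib/modules/3.10.0-957.27.2.el7_lustre.pl1.x86_64/vmlinux \
                  vmcore-oak-md1-s2-2019-08-31-18-02-59
            crash> foreach bt > foreach_bt-crash-oak-md1-s2-2019-08-31-18-02-59.log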
          
          sthiell Stephane Thiell added a comment - edited

          Thanks Peter.

          We lost the original crash dump because we were in the process of updating the kernel on Oak, but we had another occurrence of the same crash with the new kernel. So I just uploaded the latest vmcore as vmcore-oak-md1-s2-2019-08-31-18-02-59. New kernel debuginfo RPMs have also been uploaded to the WC FTP server as kernel-debuginfo-3.10.0-957.27.2.el7_lustre.pl1.x86_64.rpm and kernel-debuginfo-common-x86_64-3.10.0-957.27.2.el7_lustre.pl1.x86_64.rpm. For convenience, I'm attaching the output of foreach bt as foreach_bt-crash-oak-md1-s2-2019-08-31-18-02-59.log and the vmcore dmesg as vmcore-dmesg-oak-md1-s2-2019-08-31-18-02-59.txt.

          Lai, thanks for looking into this. I wanted to add that it happened under heavy load, both in terms of read/write bandwidth on the OSTs and metadata load. Big dsync parallel transfers were running along with other user jobs doing unusually large sequential reads (plink2). My (2-cent) theory is that the OSTs were so loaded that this MDT couldn't keep up allocating new stripes on them, and we hit that bug. But let me know what you think and what else we can provide. Thanks much!


          People

            Assignee: laisiyao Lai Siyao
            Reporter: sthiell Stephane Thiell
            Votes: 0
            Watchers: 7
