Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.14.0
    • lustre-master-ib #404
    • 3
    • 9223372036854775807

    Description

      One MDS hung during mount during the failover process.

      soak-9 console

      [ 3961.086008] mount.lustre    D ffff8f5730291070     0  5206   5205 0x00000082
      [ 3961.093940] Call Trace:
      [ 3961.096752]  [<ffffffffc1333360>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
      [ 3961.105419]  [<ffffffff99380a09>] schedule+0x29/0x70
      [ 3961.110980]  [<ffffffff9937e511>] schedule_timeout+0x221/0x2d0
      [ 3961.117509]  [<ffffffff98ce10f6>] ? select_task_rq_fair+0x5a6/0x760
      [ 3961.124565]  [<ffffffffc1333360>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
      [ 3961.133226]  [<ffffffff99380dbd>] wait_for_completion+0xfd/0x140
      [ 3961.139955]  [<ffffffff98cdb4c0>] ? wake_up_state+0x20/0x20
      [ 3961.146222]  [<ffffffffc12f8b84>] llog_process_or_fork+0x254/0x520 [obdclass]
      [ 3961.154226]  [<ffffffffc12f8e64>] llog_process+0x14/0x20 [obdclass]
      [ 3961.161271]  [<ffffffffc132b055>] class_config_parse_llog+0x125/0x350 [obdclass]
      [ 3961.169552]  [<ffffffffc15beaf8>] mgc_process_cfg_log+0x788/0xc40 [mgc]
      [ 3961.176961]  [<ffffffffc15c223f>] mgc_process_log+0x3bf/0x920 [mgc]
      [ 3961.184004]  [<ffffffffc1333360>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
      [ 3961.192673]  [<ffffffffc15c3cc3>] mgc_process_config+0xc63/0x1870 [mgc]
      [ 3961.200110]  [<ffffffffc1336f27>] lustre_process_log+0x2d7/0xad0 [obdclass]
      [ 3961.207925]  [<ffffffffc136a064>] server_start_targets+0x12d4/0x2970 [obdclass]
      [ 3961.216133]  [<ffffffffc1339fe7>] ? lustre_start_mgc+0x257/0x2420 [obdclass]
      [ 3961.224020]  [<ffffffff98e23db6>] ? kfree+0x106/0x140
      [ 3961.229698]  [<ffffffffc1333360>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
      [ 3961.238396]  [<ffffffffc136c7cc>] server_fill_super+0x10cc/0x1890 [obdclass]
      [ 3961.246314]  [<ffffffffc133cd88>] lustre_fill_super+0x498/0x990 [obdclass]
      [ 3961.254033]  [<ffffffffc133c8f0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
      [ 3961.262511]  [<ffffffff98e4e7df>] mount_nodev+0x4f/0xb0
      [ 3961.268390]  [<ffffffffc1334d98>] lustre_mount+0x18/0x20 [obdclass]
      [ 3961.275401]  [<ffffffff98e4f35e>] mount_fs+0x3e/0x1b0
      [ 3961.281064]  [<ffffffff98e6d507>] vfs_kern_mount+0x67/0x110
      [ 3961.287299]  [<ffffffff98e6fc5f>] do_mount+0x1ef/0xce0
      [ 3961.293070]  [<ffffffff98e4737a>] ? __check_object_size+0x1ca/0x250
      [ 3961.300073]  [<ffffffff98e250ec>] ? kmem_cache_alloc_trace+0x3c/0x200
      [ 3961.307276]  [<ffffffff98e70a93>] SyS_mount+0x83/0xd0
      [ 3961.312939]  [<ffffffff9938dede>] system_call_fastpath+0x25/0x2a
      [ 3961.319665]  [<ffffffff9938de21>] ? system_call_after_swapgs+0xae/0x146
      [ 4024.321554] Lustre: soaked-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
      [ 4024.360505] Lustre: soaked-MDT0001: in recovery but waiting for the first client to connect
      [ 4025.087731] Lustre: soaked-MDT0001: Will be in recovery for at least 2:30, or until 27 clients reconnect
      
      

      Attachments

        1. lustre-log.1588133843.6068-soak-8
          124.12 MB
        2. soak-11.log-051120
          944 kB
        3. soak-9.log-20200419.gz
          184 kB
        4. trace-8
          1002 kB
        5. trace-s-11-051120
          997 kB
        6. trace-soak8
          976 kB


          Activity

            [LU-13469] MDS hung during mount

            bzzz Alex Zhuravlev added a comment -

            Sorry, that's not quite enough information.
            It would be very helpful if you could start the test and then grab logs (let's start with messages and/or consoles) from all the nodes; see the collection sketch after this comment.
            One interesting thing from the attached log:

            [ 1279.175117] sd 0:0:1:1: task abort: SUCCESS scmd(ffff99512626abc0)
            [ 1279.182085] sd 0:0:1:1: attempting task abort! scmd(ffff99512626aa00)
            [ 1279.189301] sd 0:0:1:1: [sdi] tag#96 CDB: Write(16) 8a 00 00 00 00 00 02 a8 01 90 00 00 00 08 00 00
            [ 1279.199423] scsi target0:0:1: handle(0x0009), sas_address(0x50080e52ff4f0004), phy(0)
            [ 1279.208168] scsi target0:0:1: enclosure logical id(0x500605b005d6e9a0), slot(3) 
            [ 1279.367751] sd 0:0:1:1: task abort: SUCCESS scmd(ffff99512626aa00)
            [ 1279.374697] sd 0:0:1:1: attempting task abort! scmd(ffff99512626a840)
            [ 1279.381918] sd 0:0:1:1: [sdi] tag#95 CDB: Write(16) 8a 00 00 00 00 00 02 a8 01 70 00 00 00 08 00 00
            [ 1279.392037] scsi target0:0:1: handle(0x0009), sas_address(0x50080e52ff4f0004), phy(0)
            

            I guess this shouldn't happen during this test?
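
            For reference, a minimal collection sketch; the node list and destination directory are placeholders, not the actual soak configuration:

            # hypothetical log-collection loop; adjust node names and output paths as needed
            for node in soak-8 soak-9 soak-10 soak-11; do
                ssh root@$node dmesg > /tmp/${node}.dmesg           # current kernel ring buffer
                scp root@$node:/var/log/messages /tmp/${node}.messages
            done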

            sarah Sarah Liu added a comment -

            There are 2 kinds of MDS fault injections; I think when the crash happened, it was in the middle of mds_failover (a rough sketch of the sequence follows this list):
            1. mds1 failover
            reboot mds1
            mount the disks on the failover pair mds2
            after mds1 is back up, fail the disks back to mds1

            2. mds restart
            similar to mds failover, except the disks are not mounted on the failover pair; instead, wait and mount the disks back once the server is up
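
            A rough sketch of the mds1 failover sequence above; device and mount-point names are placeholders, not the actual soak configuration:

            # hypothetical mds_failover sequence (names are illustrative only)
            ssh mds1 reboot                                       # 1. reboot the primary MDS
            ssh mds2 mount -t lustre /dev/mdt0_dev /mnt/mdt0      # 2. mount its target on the failover pair
            # ... wait until mds1 is back up ...
            ssh mds2 umount /mnt/mdt0                             # 3. fail the target back to mds1
            ssh mds1 mount -t lustre /dev/mdt0_dev /mnt/mdt0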


            bzzz Alex Zhuravlev added a comment -

            Thanks. Looking at the logs, there were lots of invalidations in OSP, which shouldn't be common - regular failover shouldn't cause this.
            Can you please explain what the test is doing?

            sarah Sarah Liu added a comment -

            I just uploaded the Lustre log and trace of soak-8, with panic_on_lbug=0. Please let me know if anything else is needed.


            bzzz Alex Zhuravlev added a comment -

            sarah, I don't think there is any relation here. I think you can either modify the source, or set panic_on_lbug=0 in the scripts or in the modules conf file.
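
            As a sketch of the modules-conf option, assuming the standard libcfs module parameter name (the file path is just a convention):

            # /etc/modprobe.d/libcfs.conf - takes effect the next time the libcfs module is loaded
            options libcfs libcfs_panic_on_lbug=0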

            sarah Sarah Liu added a comment -

            Hi Alex,

            I am having a weird issue when setting panic_on_lbug=0 permanently on soak-8 (the MGS). Here is what I did:
            1. lctl set_param -P panic_on_lbug=0
            2. umount, remount as ldiskfs, and check the config log - the value was set to 0
            3. mount Lustre and check again - panic_on_lbug is still 1, it didn't change (a possible cross-check is sketched at the end of this comment)

            I am not sure whether this is related to the llog issue here; can you please check? Do you need any logs for this? If it is unrelated, I will create a new ticket, and may need to delete the bad entries and restart.

            Thanks
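
            One possible way to cross-check both values, assuming the MGS is mounted on this node; the commands below are only a sketch, not output from soak:

            lctl get_param panic_on_lbug           # runtime value on the running server
            lctl --device MGS llog_print params    # entries recorded by 'lctl set_param -P' (log name per the -P convention)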

            sarah Sarah Liu added a comment -

            Hi Alex, I will restart with debug enabled.


            bzzz Alex Zhuravlev added a comment -

            Would it be possible to grab full Lustre logs with libcfs_panic_on_lbug=0?
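
            A minimal sketch of grabbing the full debug log on the affected server; the debug mask and buffer size below are only suggestions:

            lctl set_param panic_on_lbug=0                    # keep the node alive after the LBUG
            lctl set_param debug=-1                           # enable all debug masks (very verbose)
            lctl set_param debug_mb=1024                      # enlarge the trace buffer; size is a guess
            # ... reproduce the failure ...
            lctl dk /tmp/lustre-log.$(hostname).$(date +%s)   # dump the kernel debug buffer to a file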

            sarah Sarah Liu added a comment -

            The LBUG seems to be easy to reproduce on soak; I have come across it 2 or 3 times so far.


            bzzz Alex Zhuravlev added a comment -

            I think this shares a root cause with LU-13195 - basically the llog got corrupted, which then leads to different symptoms.
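
            If it is the same corruption, one way to inspect a suspect llog offline is llog_reader; this is only a sketch, and the exact llog file on the target would need to be identified first:

            # hypothetical inspection of an llog on a stopped target mounted as ldiskfs
            mount -t ldiskfs /dev/mdt0_dev /mnt/ldiskfs              # device name is a placeholder
            llog_reader /mnt/ldiskfs/CONFIGS/soaked-MDT0000 > /tmp/mdt0000-config-llog.txt
            umount /mnt/ldiskfs
            # the path above shows a config llog as an example; the update llogs involved in the
            # LBUG live in other directories on the MDT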

            sarah Sarah Liu added a comment - edited

            I restarted the test and the system crashed overnight on 3 MDSs. There are crash dumps available on spirit for soak-8/9/10; the path is:
            /scratch/dumps/soak-9.spirit.whamcloud.com/10.10.1.109-2020-04-22-01:59:08
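
            For reference, a typical way to open one of these dumps with the crash utility, assuming the matching kernel-debuginfo is installed and the standard kdump layout (paths are illustrative):

            crash /usr/lib/debug/lib/modules/3.10.0-1062.9.1.el7_lustre.x86_64/vmlinux \
                  /scratch/dumps/soak-9.spirit.whamcloud.com/10.10.1.109-2020-04-22-01:59:08/vmcore
            # inside crash:  bt  -> backtrace of the panicking task,  log -> kernel ring buffer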

            Last login: Tue Apr 21 12:05:20 on ttys001
            DDN11368M:~ wliu$ ssh w3liu@ssh-1.spirit.whamcloud.int
            Last login: Tue Apr 21 18:50:40 2020 from host-64-47-132-181.masergy.com
            [w3liu@ssh-1 ~]$ ssh root@soak-16
            The authenticity of host 'soak-16 (<no hostip for proxy command>)' can't be established.
            RSA key fingerprint is 37:1f:14:66:49:de:a8:f9:fa:ab:c7:2d:28:0f:26:44.
            Are you sure you want to continue connecting (yes/no)? yes
            Warning: Permanently added 'soak-16' (RSA) to the list of known hosts.
            Password: 
            Last login: Tue Apr 21 17:06:58 2020 from ssh-1.spirit.whamcloud.com
            [root@soak-16 ~]# ssh soak-9
            Last login: Tue Mar  3 23:55:21 2020 from soak-16.spirit.whamcloud.com
            [root@soak-9 ~]# ls /tmp/
            krb5ccmachine_SPIRIT.WHAMCLOUD.COM                                                      systemd-private-ba1d46467241463c9bf777e964af7eaa-ntpd.service-yRleX7
            openldap-tlsmc-certs--F2A4EB14DB359420C21982FE2480932C55B52B238FF1E9079F379137D27D13CC  systemd-private-e3a3d4c94d1741a68674eb88c50144a6-ntpd.service-5xNWmx
            [root@soak-9 ~]# exit
            logout
            Connection to soak-9 closed.
            [root@soak-16 ~]# ssh soak-8
            Last login: Mon Apr  6 19:47:28 2020 from soak-16.spirit.whamcloud.com
            [root@soak-8 ~]# ls /tmp/
            krb5ccmachine_SPIRIT.WHAMCLOUD.COM                                                      systemd-private-c062f40ef20c422c98b9375d108e2d78-ntpd.service-bpcY2l
            openldap-tlsmc-certs--283FB28B7CD85BA8C6BD6F2F1837458C8301AFC4036ADCD22B40A473F4F87DAE  systemd-private-d5bd51fa26ff4fc3aab039698e87f37f-ntpd.service-wXZA7q
            systemd-private-1faf9075d15b445eab312529a1bc7e2e-ntpd.service-oOzCQD                    systemd-private-f83975b64fc54f1498329d9f871d565d-ntpd.service-tdxy0M
            [root@soak-8 ~]# exit
            logout
            Connection to soak-8 closed.
            [root@soak-16 ~]# screen -x
            [detached from 12070.pts-0.soak-16]
            [root@soak-16 ~]# exit
            logout
            Connection to soak-16 closed.
            [w3liu@ssh-1 ~]$ ls
            aeon          ddn-681             HP259  lu-13050.stack  savelog     soak-conf        soak-homedir.tgz  soak-toolbin  tmp          toolbin.tgz
            backup-frank  download-build-bin  logs   repo            soak-5.tgz  soak-format.tgz  soak-merge        test-git      tmp-scripts
            [w3liu@ssh-1 ~]$ mkdir lu-13469
            [w3liu@ssh-1 ~]$ cd lu-13469/
            [w3liu@ssh-1 lu-13469]$ scp soak-16://scratch/logs/console/soak-9.log-20200419.tgz .
            The authenticity of host 'soak-16 (<no hostip for proxy command>)' can't be established.
            RSA key fingerprint is 37:1f:14:66:49:de:a8:f9:fa:ab:c7:2d:28:0f:26:44.
            Are you sure you want to continue connecting (yes/no)? yes
            Warning: Permanently added 'soak-16' (RSA) to the list of known hosts.
            scp: //scratch/logs/console/soak-9.log-20200419.tgz: No such file or directory
            [w3liu@ssh-1 lu-13469]$ ssh soak-16
            The authenticity of host 'soak-16 (<no hostip for proxy command>)' can't be established.
            RSA key fingerprint is 37:1f:14:66:49:de:a8:f9:fa:ab:c7:2d:28:0f:26:44.
            Are you sure you want to continue connecting (yes/no)? yes
            Warning: Permanently added 'soak-16' (RSA) to the list of known hosts.
            Last login: Mon Apr 20 21:08:14 2020 from ssh-1.spirit.whamcloud.com
            [w3liu@soak-16 ~]$ exit
            logout
            Connection to soak-16 closed.
            [w3liu@ssh-1 lu-13469]$ ssh root@soak-16
            The authenticity of host 'soak-16 (<no hostip for proxy command>)' can't be established.
            RSA key fingerprint is 37:1f:14:66:49:de:a8:f9:fa:ab:c7:2d:28:0f:26:44.
            Are you sure you want to continue connecting (yes/no)? yes
            Warning: Permanently added 'soak-16' (RSA) to the list of known hosts.
            Password: 
            Last login: Tue Apr 21 20:26:23 2020 from ssh-1.spirit.whamcloud.com
            [root@soak-16 ~]# screen -x
            [detached from 12070.pts-0.soak-16]
            [root@soak-16 ~]# exit
            logout
            Connection to soak-16 closed.
            [w3liu@ssh-1 lu-13469]$ scp root@soak-16:/scratch/logs/console/soak-9.log-20200419.tgz .
            The authenticity of host 'soak-16 (<no hostip for proxy command>)' can't be established.
            RSA key fingerprint is 37:1f:14:66:49:de:a8:f9:fa:ab:c7:2d:28:0f:26:44.
            Are you sure you want to continue connecting (yes/no)? yes
            Warning: Permanently added 'soak-16' (RSA) to the list of known hosts.
            Password: 
            scp: /scratch/logs/console/soak-9.log-20200419.tgz: No such file or directory
            [w3liu@ssh-1 lu-13469]$ scp root@soak-16:/scratch/logs/console/soak-9.log-20200419.gz .
            The authenticity of host 'soak-16 (<no hostip for proxy command>)' can't be established.
            RSA key fingerprint is 37:1f:14:66:49:de:a8:f9:fa:ab:c7:2d:28:0f:26:44.
            Are you sure you want to continue connecting (yes/no)? yes
            Warning: Permanently added 'soak-16' (RSA) to the list of known hosts.
            Password: 
            soak-9.log-20200419.gz                                                                                                          100%  184KB 183.6KB/s   00:00    
            [w3liu@ssh-1 lu-13469]$ ssh root@soak-16
            The authenticity of host 'soak-16 (<no hostip for proxy command>)' can't be established.
            RSA key fingerprint is 37:1f:14:66:49:de:a8:f9:fa:ab:c7:2d:28:0f:26:44.
            Are you sure you want to continue connecting (yes/no)? yes
            Warning: Permanently added 'soak-16' (RSA) to the list of known hosts.
            Password: 
            Last login: Tue Apr 21 20:29:06 2020 from ssh-1.spirit.whamcloud.com
            [root@soak-16 ~]# screen -x
            
            [338152.399548] Lustre: Skipped 1 previous similar message
            [338153.078244] Lustre: soaked-MDT0000: Client 333b7f48-68da-4 (at 192.168.1.120@o2ib) reconnecting
            [338153.088076] Lustre: Skipped 6 previous similar messages
            [338154.126802] Lustre: soaked-MDT0000: Client b8a5f1df-2f21-4 (at 192.168.1.119@o2ib) reconnecting
            [338154.136639] Lustre: Skipped 8 previous similar messages
            [338533.965788] LNet: 57040:0:(o2iblnd_cb.c:3394:kiblnd_check_conns()) Timed out tx for 192.168.1.110@o2ib: 1 seconds
            [338533.977364] LNet: 57040:0:(o2iblnd_cb.c:3394:kiblnd_check_conns()) Skipped 146 previous similar messages
            [338610.961038] LNetError: 57040:0:(lib-move.c:3680:lnet_handle_recovery_reply()) peer NI (192.168.1.110@o2ib) recovery failed with -110
            [338610.974469] LNetError: 57040:0:(lib-move.c:3680:lnet_handle_recovery_reply()) Skipped 122 previous similar messages
            [338753.412960] Lustre: soaked-MDT0000: Client f59a1caf-b57f-4 (at 192.168.1.130@o2ib) reconnecting
            [338753.422834] Lustre: Skipped 2 previous similar messages
            [338753.428846] Lustre: soaked-MDT0000: Connection restored to 3a61c982-16a5-4 (at 192.168.1.130@o2ib)
            [338753.438969] Lustre: Skipped 20 previous similar messages
            [338760.544124] Lustre: soaked-MDT0000: Received new LWP connection from 192.168.1.110@o2ib, removing former export from same NID
            [339000.160995] Lustre: 57949:0:(mdd_device.c:1825:mdd_changelog_clear()) soaked-MDD0000: No entry for user 1
            [340067.766748] LustreError: 57125:0:(out_handler.c:918:out_tx_end()) soaked-MDT0000-osd: undo for /tmp/rpmbuild-lustre-jenkins-UoS8uXZU/BUILD/lustre-2.13.53_18_g
            6706bfa/lustre/ptlrpc/../../lustre/target/out_handler.c:453: rc = -524
            [340067.789495] LustreError: 57125:0:(out_handler.c:918:out_tx_end()) Skipped 299 previous similar messages
            [340131.819910] LustreError: 58494:0:(out_handler.c:918:out_tx_end()) soaked-MDT0000-osd: undo for /tmp/rpmbuild-lustre-jenkins-UoS8uXZU/BUILD/lustre-2.13.53_18_g
            6706bfa/lustre/ptlrpc/../../lustre/target/out_handler.c:453: rc = -524
            [340131.842686] LustreError: 58494:0:(out_handler.c:918:out_tx_end()) Skipped 1855 previous similar messages
            [340265.331826] LustreError: 57983:0:(out_handler.c:918:out_tx_end()) soaked-MDT0000-osd: undo for /tmp/rpmbuild-lustre-jenkins-UoS8uXZU/BUILD/lustre-2.13.53_18_g
            6706bfa/lustre/ptlrpc/../../lustre/target/out_handler.c:453: rc = -524
            [340265.354552] LustreError: 57983:0:(out_handler.c:918:out_tx_end()) Skipped 2651 previous similar messages
            [340419.033915] LustreError: 57214:0:(llog_cat.c:756:llog_cat_cancel_arr_rec()) soaked-MDT0003-osp-MDT0000: fail to cancel 1 llog-records: rc = -2
            [340419.048319] LustreError: 57214:0:(llog_cat.c:793:llog_cat_cancel_records()) soaked-MDT0003-osp-MDT0000: fail to cancel 1 of 1 llog-records: rc = -2
            [340424.618904] LustreError: 57214:0:(llog_cat.c:756:llog_cat_cancel_arr_rec()) soaked-MDT0003-osp-MDT0000: fail to cancel 1 llog-records: rc = -2
            [340424.633301] LustreError: 57214:0:(llog_cat.c:756:llog_cat_cancel_arr_rec()) Skipped 22 previous similar messages
            [340424.644781] LustreError: 57214:0:(llog_cat.c:793:llog_cat_cancel_records()) soaked-MDT0003-osp-MDT0000: fail to cancel 1 of 1 llog-records: rc = -2
            [340424.659646] LustreError: 57214:0:(llog_cat.c:793:llog_cat_cancel_records()) Skipped 22 previous similar messages
            [340432.347952] LustreError: 57214:0:(llog_cat.c:756:llog_cat_cancel_arr_rec()) soaked-MDT0003-osp-MDT0000: fail to cancel 1 llog-records: rc = -2
            ...
            [340475.463999] LustreError: 57214:0:(llog_cat.c:793:llog_cat_cancel_records()) soaked-MDT0003-osp-MDT0000: fail to cancel 1 of 1 llog-records: rc = -116
            [340475.479044] LustreError: 57214:0:(llog_cat.c:793:llog_cat_cancel_records()) Skipped 177 previous similar messages
            [340516.956460] LustreError: 57214:0:(llog_cat.c:756:llog_cat_cancel_arr_rec()) soaked-MDT0003-osp-MDT0000: fail to cancel 1 llog-records: rc = -2
            [340516.970860] LustreError: 57214:0:(llog_cat.c:756:llog_cat_cancel_arr_rec()) Skipped 294 previous similar messages
            [340516.982425] LustreError: 57214:0:(llog_cat.c:793:llog_cat_cancel_records()) soaked-MDT0003-osp-MDT0000: fail to cancel 1 of 1 llog-records: rc = -2
            [340516.997295] LustreError: 57214:0:(llog_cat.c:793:llog_cat_cancel_records()) Skipped 294 previous similar messages
            [340581.591455] LustreError: 57214:0:(llog_cat.c:756:llog_cat_cancel_arr_rec()) soaked-MDT0003-osp-MDT0000: fail to cancel 1 llog-records: rc = -2
            [340581.605835] LustreError: 57214:0:(llog_cat.c:756:llog_cat_cancel_arr_rec()) Skipped 172 previous similar messages
            [340581.617394] LustreError: 57214:0:(llog_cat.c:793:llog_cat_cancel_records()) soaked-MDT0003-osp-MDT0000: fail to cancel 1 of 1 llog-records: rc = -2
            [340581.632256] LustreError: 57214:0:(llog_cat.c:793:llog_cat_cancel_records()) Skipped 172 previous similar messages
            [340681.741732] LustreError: 58187:0:(llog_osd.c:616:llog_osd_write_rec()) soaked-MDT0003-osp-MDT0000: index 36369 already set in log bitmap
            [340681.755535] LustreError: 58187:0:(llog_osd.c:618:llog_osd_write_rec()) LBUG
            [340681.763417] Pid: 58187, comm: mdt_rdpg00_004 3.10.0-1062.9.1.el7_lustre.x86_64 #1 SMP Tue Feb 11 19:11:56 UTC 2020
            [340681.775080] Call Trace:
            [340681.777921]  [<ffffffffc1041e6c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
            [340681.785341]  [<ffffffffc1041f1c>] lbug_with_loc+0x4c/0xa0 [libcfs]
            [340681.792402]  [<ffffffffc10d34e5>] llog_osd_write_rec+0x1395/0x14b0 [obdclass]
            [340681.800519]  [<ffffffffc10c22c3>] llog_write_rec+0x273/0x540 [obdclass]
            [340681.808044]  [<ffffffffc10c79f0>] llog_cat_add_rec+0x220/0x880 [obdclass]
            [340681.815763]  [<ffffffffc10bf422>] llog_add+0x162/0x1d0 [obdclass]
            [340681.822701]  [<ffffffffc1490a01>] sub_updates_write+0x2e8/0xdd7 [ptlrpc]
            [340681.830372]  [<ffffffffc1472027>] top_trans_stop+0x827/0xbc0 [ptlrpc]
            [340681.837748]  [<ffffffffc19ab98c>] lod_trans_stop+0x25c/0x340 [lod]
            [340681.844795]  [<ffffffffc1a6623e>] mdd_trans_stop+0x2e/0x174 [mdd]
            [340681.851729]  [<ffffffffc1a59a07>] mdd_attr_set+0x6a7/0xdd0 [mdd]
            [340681.858558]  [<ffffffffc18e21b9>] mdt_mfd_close+0x6b9/0x860 [mdt]
            [340681.865523]  [<ffffffffc18e7a71>] mdt_close_internal+0x121/0x220 [mdt]
            [340681.872958]  [<ffffffffc18e7d91>] mdt_close+0x221/0x790 [mdt]
            [340681.879515]  [<ffffffffc145db9a>] tgt_request_handle+0x95a/0x1630 [ptlrpc]
            [340681.887378]  [<ffffffffc1402f76>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
            [340681.896074]  [<ffffffffc1407424>] ptlrpc_main+0xbb4/0x1550 [ptlrpc]
            [340681.903228]  [<ffffffff842c61f1>] kthread+0xd1/0xe0
            [340681.908808]  [<ffffffff8498dd37>] ret_from_fork_nospec_end+0x0/0x39
            [340681.915929]  [<ffffffffffffffff>] 0xffffffffffffffff
            [340681.921610] Kernel panic - not syncing: LBUG
            [340681.926473] CPU: 7 PID: 58187 Comm: mdt_rdpg00_004 Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.9.1.el7_lustre.x86_64 #1
            [340681.940752] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
            [340681.953368] Call Trace:
            [340681.956203]  [<ffffffff8497ac23>] dump_stack+0x19/0x1b
            [340681.962038]  [<ffffffff84974967>] panic+0xe8/0x21f
            [340681.967489]  [<ffffffffc1041f6b>] lbug_with_loc+0x9b/0xa0 [libcfs]
            [340681.974504]  [<ffffffffc10d34e5>] llog_osd_write_rec+0x1395/0x14b0 [obdclass]
            [340681.982585]  [<ffffffffc10c22c3>] llog_write_rec+0x273/0x540 [obdclass]
            [340681.990085]  [<ffffffffc10c79f0>] llog_cat_add_rec+0x220/0x880 [obdclass]
            [340681.997775]  [<ffffffffc10bf422>] llog_add+0x162/0x1d0 [obdclass]
            [340682.004696]  [<ffffffffc10edf19>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
            [340682.012803]  [<ffffffffc1490a01>] sub_updates_write+0x2e8/0xdd7 [ptlrpc]
            [340682.020422]  [<ffffffffc1475b43>] ? update_records_attr_set_pack+0xc3/0xf0 [ptlrpc]
            [340682.029124]  [<ffffffffc14906fd>] ? prepare_writing_updates.isra.12+0x139/0x14a [ptlrpc]
            [340682.038311]  [<ffffffffc1472027>] top_trans_stop+0x827/0xbc0 [ptlrpc]
            [340682.045606]  [<ffffffffc19ab98c>] lod_trans_stop+0x25c/0x340 [lod]
            [340682.052620]  [<ffffffffc1a6623e>] mdd_trans_stop+0x2e/0x174 [mdd]
            [340682.059528]  [<ffffffffc1a59a07>] mdd_attr_set+0x6a7/0xdd0 [mdd]
            [340682.066362]  [<ffffffffc13f5e97>] ? lustre_msg_add_version+0x27/0xa0 [ptlrpc]
            [340682.074451]  [<ffffffffc18e21b9>] mdt_mfd_close+0x6b9/0x860 [mdt]
            [340682.081377]  [<ffffffffc13f636f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
            [340682.089646]  [<ffffffffc18e7a71>] mdt_close_internal+0x121/0x220 [mdt]
            [340682.097039]  [<ffffffffc18e7d91>] mdt_close+0x221/0x790 [mdt]
            [340682.103593]  [<ffffffffc145db9a>] tgt_request_handle+0x95a/0x1630 [ptlrpc]
            [340682.111377]  [<ffffffffc102902e>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
            [340682.119391]  [<ffffffffc1402f76>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
            [340682.128070]  [<ffffffffc13fde2e>] ? ptlrpc_wait_event+0x12e/0x5b0 [ptlrpc]
            [340682.135852]  [<ffffffff842d3903>] ? __wake_up+0x13/0x20
            [340682.141814]  [<ffffffffc140254a>] ? ptlrpc_server_handle_req_in+0x92a/0x1100 [ptlrpc]
            [340682.150657]  [<ffffffff842c72e0>] ? wake_up_atomic_t+0x30/0x30
            [340682.157298]  [<ffffffffc1407424>] ptlrpc_main+0xbb4/0x1550 [ptlrpc]
            [340682.164440]  [<ffffffffc1406870>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
            [340682.172793]  [<ffffffff842c61f1>] kthread+0xd1/0xe0
            [340682.178335]  [<ffffffff842c6120>] ? insert_kthread_work+0x40/0x40
            [340682.185231]  [<ffffffff8498dd37>] ret_from_fork_nospec_begin+0x21/0x21
            [340682.192614]  [<ffffffff842c6120>] ? insert_kthread_work+0x40/0x40
            

            People

              bzzz Alex Zhuravlev
              sarah Sarah Liu
              Votes: 0
              Watchers: 7
