Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.16.0
3
9223372036854775807
Description
After loading the build from https://build.whamcloud.com/job/lustre-reviews/107525/ on both the servers and the clients of WR-SOAK, with no HA enabled, soak hit the following issue:
[63239.265986] Lustre: Lustre: Build Version: 2.15.90_24_ge863f95f
Both VMs on the 2nd controller rebooted; see the following errors in the console log:
vcontroller vm0
[63664.348413] Lustre: sfa18k03-OST0025: new connection from sfa18k03-MDT0003-mdtlov (cleaning up unused objects from 0x340000401:20963886 to 0x340000401:20968833)
[63664.813575] Lustre: sfa18k03-OST0026: new connection from sfa18k03-MDT0003-mdtlov (cleaning up unused objects from 0xc40000400:20942873 to 0xc40000400:20948833)
[63664.817519] Lustre: sfa18k03-OST0024: new connection from sfa18k03-MDT0003-mdtlov (cleaning up unused objects from 0x5c0000400:20671906 to 0x5c0000400:20675073)
[63670.858601] Lustre: sfa18k03-OST0024: new connection from sfa18k03-MDT0001-mdtlov (cleaning up unused objects from 0x5c0000401:19996851 to 0x5c0000401:19999185)
[63670.878393] Lustre: sfa18k03-OST001c: new connection from sfa18k03-MDT0001-mdtlov (cleaning up unused objects from 0x780000400:19109574 to 0x780000400:19111665)
[63705.015859] Lustre: sfa18k03-OST0029-osc-MDT0001: Connection restored to 172.25.80.53@tcp (at 172.25.80.53@tcp)
[63705.019483] Lustre: Skipped 14 previous similar messages
[63711.321154] LustreError: sfa18k03-OST0031-osc-MDT0001: operation ost_connect to node 172.25.80.53@tcp failed: rc = -19
[63711.323272] LustreError: Skipped 10 previous similar messages
[63936.604947] LDISKFS-fs (sdao): error count since last fsck: 38
[63936.604946] LDISKFS-fs (sdap): error count since last fsck: 2
[63936.604966] LDISKFS-fs (sdap): initial error at time 1723874458: ldiskfs_find_dest_de:2297
[63936.609560] LDISKFS-fs (sdao): initial error at time 1723660739: ldiskfs_find_dest_de:2297
[63936.610905] : inode 182190082
[63936.612597] : inode 73138214
[63936.614252] : block 5830242584
[63936.615147] : block 60217102
[63936.615989]
[63936.616849]
[63936.617671] LDISKFS-fs (sdap): last error at time 1723874458: ldiskfs_evict_inode:257
[63936.618286] LDISKFS-fs (sdao): last error at time 1723878231: ldiskfs_evict_inode:257
[63936.618893]
[63936.620394]
[75516.204047] LustreError: sfa18k03-OST002c: not available for connect from 172.25.80.50@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[75516.206751] LustreError: sfa18k03-OST002b-osc-MDT0001: operation ost_statfs to node 172.25.80.53@tcp failed: rc = -107
[75516.206773] Lustre: sfa18k03-MDT0003-osp-MDT0001: Connection to sfa18k03-MDT0003 (at 172.25.80.53@tcp) was lost; in progress operations using this service will wait for recovery to complete
[75516.206775] Lustre: Skipped 1 previous similar message
[75516.209189] LustreError: Skipped 238 previous similar messages
[75516.211099] LustreError: Skipped 10 previous similar messages
[  OK  ] Stopped target resource-agents dependencies.
Stopping Restore /run/initramfs on shutdown...
Stopping LVM event activation on device 8:0...
Stopping LVM event activation on device 8:48...
Stopping LVM event activation on device 252:3...
[  OK  ] Stopped target rpc_pipefs.target.
Unmounting RPC Pipe File System...
Stopping LVM event activation on device 8:32...
Stopping LVM event activation on device 252:2...
Stopping LVM event activation on device 8:64...
Stopping Hostname Service...
[  OK  ] Stopped t[  OK  ] Stopped irqbalance daemon.
[  OK  ] Stopped Self Monitoring and Reporting Technology (SMART) Daemon.
[  OK  ] Stopped Prometheus exporter for Lustre filesystem.
[  OK  ] Stopped Hostname Service.
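
As a reading aid for the console log above, here is a minimal sketch that decodes the two kinds of raw numbers in it: the `rc = -N` return codes and the `error at time` epoch values in the LDISKFS lines. The helper name `annotate` and the small errno table are illustrative and not part of Lustre. The table hardcodes the standard Linux errno meanings for the two codes seen here, and timestamps are rendered in UTC on the assumption that they are Unix epoch seconds.

```python
import re
from datetime import datetime, timezone

# Standard Linux errno values for the codes seen in this log. Hardcoded so
# the result does not depend on the platform running the script.
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}

RC_RE = re.compile(r"rc = (-?\d+)")
TIME_RE = re.compile(r"error at time (\d+)")


def annotate(line):
    """Return human-readable notes for one console-log line (hypothetical helper)."""
    notes = []
    m = RC_RE.search(line)
    if m:
        rc = int(m.group(1))
        # Kernel-style return codes are negated errno values.
        name, text = LINUX_ERRNO.get(-rc, ("unknown", "not in table"))
        notes.append(f"rc {rc}: {name} ({text})")
    m = TIME_RE.search(line)
    if m:
        # Assumption: the value is Unix epoch seconds; shown in UTC.
        ts = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
        notes.append(f"time {m.group(1)}: {ts.strftime('%Y-%m-%dT%H:%M:%SZ')}")
    return notes


if __name__ == "__main__":
    sample = [
        "LustreError: sfa18k03-OST0031-osc-MDT0001: operation ost_connect "
        "to node 172.25.80.53@tcp failed: rc = -19",
        "LustreError: sfa18k03-OST002b-osc-MDT0001: operation ost_statfs "
        "to node 172.25.80.53@tcp failed: rc = -107",
        "LDISKFS-fs (sdao): initial error at time 1723660739: ldiskfs_find_dest_de:2297",
        "LDISKFS-fs (sdao): last error at time 1723878231: ldiskfs_evict_inode:257",
    ]
    for line in sample:
        for note in annotate(line):
            print(note)
```

Read this way, the first LDISKFS error recorded on sdao dates from 2024-08-14 (UTC), while the last recorded errors on both sdao and sdap are from 2024-08-17 (UTC).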