[LU-6321] Clean downgrade from 2.7.0 to 2.6.0 failed: fail to init namespace LFSCK component: rc = -5 Created: 02/Mar/15 Updated: 04/Mar/15 Resolved: 04/Mar/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.7.0, Lustre 2.8.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Sarah Liu | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 17682 |
| Description |
|
1. Formatted and set up Lustre 2.6.0, then performed a clean upgrade of the system to 2.7.0: successful.
2. Downgraded the system to 2.6.0: mounting the system failed.

MDS shows:
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
LustreError: 33604:0:(lfsck_namespace.c:1786:lfsck_namespace_setup()) lustre-MDT0000-osd: fail to init namespace LFSCK component: rc = -5
LustreError: 33604:0:(mdd_device.c:1051:mdd_prepare()) lustre-MDD0000: failed to initialize lfsck: rc = -5
LustreError: 33604:0:(obd_mount_server.c:1769:server_fill_super()) Unable to start targets: -5
LustreError: 33712:0:(qsd_reint.c:54:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x1010000:0x0], rc:-5
Lustre: Failing over lustre-MDT0000
LustreError: 33712:0:(qsd_reint.c:54:qsd_reint_completion()) Skipped 1 previous similar message
Lustre: 33604:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324784/real 1425324784] req@ffff88081d695400 x1494561337639040/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1425324790 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: server umount lustre-MDT0000 complete
LustreError: 33604:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount (-5)
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
LustreError: 34030:0:(lfsck_namespace.c:1786:lfsck_namespace_setup()) lustre-MDT0000-osd: fail to init namespace LFSCK component: rc = -5
LustreError: 34030:0:(mdd_device.c:1051:mdd_prepare()) lustre-MDD0000: failed to initialize lfsck: rc = -5
LustreError: 34030:0:(obd_mount_server.c:1769:server_fill_super()) Unable to start targets: -5
LustreError: 34125:0:(qsd_reint.c:54:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5
LustreError: 34126:0:(qsd_reint.c:54:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x1010000:0x0], rc:-5
Lustre: Failing over lustre-MDT0000
Lustre: 34030:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324879/real 1425324879] req@ffff8804187e4800 x1494561337639168/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1425324885 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: server umount lustre-MDT0000 complete
LustreError: 34030:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount (-5)
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted

Client shows:
Setup mgs, mdt, osts
Starting mds1: -o user_xattr,acl /dev/sdb1 /mnt/mds1
Start of /dev/sdb1 on mds1 failed 5
Starting ost1: /dev/sdb1 /mnt/ost1
Start of /dev/sdb1 on ost1 failed 19
Starting client: onyx-28: -o user_xattr,flock onyx-25@tcp:/lustre /mnt/lustre
Lustre: 74562:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324792/real 1425324792] req@ffff8804364bec00 x1494557626728452/t0(0) o250->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 400/544 e 0 to 1 dl 1425324797 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 76865:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff8804364be800 x1494557626728456/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 76878:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff8804364be000 x1494557626728464/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Lustre: 74562:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324817/real 1425324817] req@ffff8804364be400 x1494557626728468/t0(0) o250->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 400/544 e 0 to 1 dl 1425324827 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 76865:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff8804364be800 x1494557626728460/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Lustre: Unmounted lustre-client
LustreError: 76865:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount (-5)
Starting client onyx-23.onyx.hpdd.intel.com,onyx-27,onyx-28: -o user_xattr,flock onyx-25@tcp:/lustre /mnt/lustre
Lustre: 74562:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324829/real 1425324829] req@ffff8804364be000 x1494557626728472/t0(0) o250->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 400/544 e 0 to 1 dl 1425324834 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 76949:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff8804364bec00 x1494557626728476/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 76962:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff8804364be400 x1494557626728484/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Lustre: 74562:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324854/real 1425324854] req@ffff8804364be000 x1494557626728488/t0(0) o250->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 400/544 e 0 to 1 dl 1425324864 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 76949:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff8804364bec00 x1494557626728480/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Lustre: Unmounted lustre-client
LustreError: 76949:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount (-5)
Using TIMEOUT=20
Lustre: DEBUG MARKER: Using TIMEOUT=20
jobstats not supported by server
disable quota as required
upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted |
| Comments |
| Comment by Oleg Drokin [ 02/Mar/15 ] |
|
Getting an MDS-side debug log with increased debug levels would be great, to better understand what failed and why. |
| Comment by Andreas Dilger [ 02/Mar/15 ] |
|
This may relate to " |
| Comment by nasf (Inactive) [ 03/Mar/15 ] |
|
Originally, the "lfsck_namespace" file stored both the namespace LFSCK statistics and the FIDs to be double-scanned. To improve namespace LFSCK performance (since Lustre 2.7), we split the single trace file into multiple files named "lfsck_namespace_xx". After that change, the original "lfsck_namespace" file only needs to record the namespace LFSCK statistics, so we made it a regular file, NOT an index file. When downgrading to Lustre 2.6, the old LFSCK expects an index trace file instead of a regular file, so the setup fails. There are two solutions for that: I prefer the latter solution. |
| Comment by Gerrit Updater [ 03/Mar/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13945 |
| Comment by nasf (Inactive) [ 03/Mar/15 ] |
|
Sarah, would you please help verify the above patch when you have time? Thanks! |
| Comment by Gerrit Updater [ 03/Mar/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13946 |
| Comment by Sarah Liu [ 04/Mar/15 ] |
|
Fan Yong, the patch works! |
| Comment by nasf (Inactive) [ 04/Mar/15 ] |
|
Thanks Sarah! |
| Comment by Gerrit Updater [ 04/Mar/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13946/ |
| Comment by Gerrit Updater [ 04/Mar/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13945/ |