[LU-9408] Client fails to mount with ZFS master (0.7.0) and Lustre master (2.9.56)

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.0
    • Severity: 3

    Description

      ZFS: zfs-0.7.0-rc3-225-g7a25f08
      SPL: spl-0.7.0-rc3-8-g481762f
      Lustre: v2_9_56_0-11-gbfa524f

      This is a straightforward setup: a single MDS with a combined MDT/MGT, and an OSS with a single OST.
      (Also reproduced with the MDT and MGT split, on a slightly earlier ZFS build.)

      This setup works without problems with ldiskfs backing.
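
      For reference, a sketch of the kind of commands behind this setup. The MDT lines are the ones quoted later in the comments; the OST pool, device names, and MGS NID are illustrative assumptions, not taken from the ticket:

      # MGS+MDT on the MDS (quoted verbatim in the comments below)
      zpool create -f -o ashift=12 -o cachefile=none mdt00 /dev/sdc /dev/sdd
      mkfs.lustre --reformat --backfstype=zfs --mgs --mdt --index=1 --fsname=scratch mdt00/mdt

      # OST on the OSS (assumed equivalent; pool, devices, and NID are placeholders)
      zpool create -f -o ashift=12 -o cachefile=none ost00 /dev/sde /dev/sdf
      mkfs.lustre --reformat --backfstype=zfs --ost --index=0 --fsname=scratch \
          --mgsnode=<mgs-nid>@tcp ost00/ost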

      Snippet from MDS log on initial mount:

      Apr 26 14:01:18 ieel-mds04 kernel: SPL: Loaded module v0.7.0-rc3_8_g481762f
      Apr 26 14:01:19 ieel-mds04 kernel: ZFS: Loaded module v0.7.0-rc3_225_g7a25f08, ZFS pool version 5000, ZFS filesystem version 5
      ...
      Apr 26 14:20:05 ieel-mds04 kernel: SPL: using hostid 0x7e3a4ec9
      Apr 26 14:20:52 ieel-mds04 kernel: Lustre: MGS: Connection restored to ec5ab9aa-e46a-19dd-47d3-1aa7d07fdc3f (at 0@lo)
      Apr 26 14:20:52 ieel-mds04 kernel: Lustre: srv-scratch-MDT0001: No data found on store. Initialize space
      Apr 26 14:20:52 ieel-mds04 kernel: Lustre: scratch-MDT0001: new disk, initializing
      Apr 26 14:20:52 ieel-mds04 kernel: Lustre: scratch-MDT0001: Imperative Recovery not enabled, recovery window 300-900
      Apr 26 14:20:52 ieel-mds04 kernel: LustreError: 21965:0:(osd_oi.c:497:osd_oid()) unsupported quota oid: 0x16
      Apr 26 14:20:52 ieel-mds04 kernel: LustreError: 22330:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-scratch-MDT0001: Allocated super-sequence failed: rc = -115
      Apr 26 14:20:52 ieel-mds04 kernel: LustreError: 22330:0:(fid_request.c:227:seq_client_alloc_seq()) cli-scratch-MDT0001: Can't allocate new meta-sequence,rc -115
      Apr 26 14:20:52 ieel-mds04 kernel: LustreError: 22330:0:(fid_request.c:383:seq_client_alloc_fid()) cli-scratch-MDT0001: Can't allocate new sequence: rc = -115
      Apr 26 14:20:52 ieel-mds04 kernel: LustreError: 22330:0:(lod_dev.c:419:lod_sub_recovery_thread()) scratch-MDT0001-osd getting update log failed: rc = -115
      

      OST Log on initial mount:

      Apr 26 14:20:56 ieel-oss03 kernel: SPL: Loaded module v0.7.0-rc3_8_g481762f
      Apr 26 14:20:58 ieel-oss03 kernel: ZFS: Loaded module v0.7.0-rc3_225_g7a25f08, ZFS pool version 5000, ZFS filesystem version 5
      Apr 26 14:21:08 ieel-oss03 kernel: SPL: using hostid 0x5d9bdb4b
      Apr 26 14:25:01 ieel-oss03 kernel: LNet: HW nodes: 1, HW CPU cores: 2, npartitions: 1
      Apr 26 14:25:01 ieel-oss03 kernel: alg: No test for adler32 (adler32-zlib)
      Apr 26 14:25:01 ieel-oss03 kernel: alg: No test for crc32 (crc32-table)
      Apr 26 14:25:01 ieel-oss03 kernel: Lustre: Lustre: Build Version: 2.9.56_11_gbfa524f
      Apr 26 14:25:01 ieel-oss03 kernel: LNet: Added LNI 192.168.56.22@tcp [8/256/0/180]
      Apr 26 14:25:01 ieel-oss03 kernel: LNet: Accept secure, port 988
      Apr 26 14:25:02 ieel-oss03 kernel: Lustre: scratch-OST0000: new disk, initializing
      Apr 26 14:25:02 ieel-oss03 kernel: Lustre: srv-scratch-OST0000: No data found on store. Initialize space
      Apr 26 14:25:02 ieel-oss03 kernel: Lustre: scratch-OST0000: Imperative Recovery not enabled, recovery window 300-900
      Apr 26 14:25:02 ieel-oss03 kernel: LustreError: 13214:0:(osd_oi.c:497:osd_oid()) unsupported quota oid: 0x16
      Apr 26 14:25:07 ieel-oss03 kernel: Lustre: scratch-OST0000: Connection restored to scratch-MDT0001-mdtlov_UUID (at 192.168.56.13@tcp)
      

      Client attempting to mount (messages):

      Apr 26 14:30:44 ieel-c03 kernel: LNet: HW CPU cores: 2, npartitions: 1
      Apr 26 14:30:44 ieel-c03 kernel: alg: No test for adler32 (adler32-zlib)
      Apr 26 14:30:44 ieel-c03 kernel: alg: No test for crc32 (crc32-table)
      Apr 26 14:30:49 ieel-c03 kernel: sha512_ssse3: Using AVX optimized SHA-512 implementation
      Apr 26 14:30:52 ieel-c03 kernel: Lustre: Lustre: Build Version: 2.8.0.51-1-PRISTINE-3.10.0-514.6.1.el7.x86_64
      Apr 26 14:30:52 ieel-c03 kernel: LNet: Added LNI 192.168.56.32@tcp [8/256/0/180]
      Apr 26 14:30:52 ieel-c03 kernel: LNet: Accept secure, port 988
      Apr 26 14:30:52 ieel-c03 kernel: LustreError: 2336:0:(lmv_obd.c:553:lmv_check_connect()) scratch-clilmv-ffff88003b98d800: no target configured for index 0.
      Apr 26 14:30:52 ieel-c03 kernel: LustreError: 2336:0:(llite_lib.c:265:client_common_fill_super()) cannot connect to scratch-clilmv-ffff88003b98d800: rc = -22
      Apr 26 14:30:52 ieel-c03 kernel: LustreError: 2363:0:(lov_obd.c:922:lov_cleanup()) scratch-clilov-ffff88003b98d800: lov tgt 0 not cleaned! deathrow=0, lovrc=1
      Apr 26 14:30:52 ieel-c03 kernel: Lustre: Unmounted scratch-client
      Apr 26 14:30:52 ieel-c03 kernel: LustreError: 2336:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount  (-22)
      

      The attached logs.tbz2 contains debug_kernel dumps.
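
      A quick cross-check for the "no target configured for index 0" error above; a minimal sketch, assuming the filesystem name scratch and that the MGS is running (standard lctl commands, but treat the exact parameter path as an assumption):

      # On the MGS: list the targets registered for the filesystem
      lctl get_param mgs.MGS.live.scratch

      # On the client: show which lmv/lov targets were actually set up
      lctl dl | grep scratch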

Activity

            yong.fan nasf (Inactive) added a comment -

            The osd_oid() function has been removed entirely by the patch:
            https://review.whamcloud.com/#/c/27093/

            wangshilong Wang Shilong (Inactive) added a comment -

            Andreas, Fan Yong is working on project quota for ZFS; I think those messages will also be removed once ZFS project quota is supported.

            adilger Andreas Dilger added a comment -

            It would be good to get a patch to quiet the spurious "osd_oid()) unsupported quota oid: 0x16" message at startup, since even I find it confusing and wonder whether something is wrong. We know this is for project quota, which isn't supported in ZFS yet.

            pjones Peter Jones added a comment -

            Thanks Nathaniel


            utopiabound Nathaniel Clark added a comment -

            If I format mdt00 with --index=0, everything works just fine. This could be closed as "Not a Bug", I guess, though it was kind of a strange one to debug.
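
            A sketch of the working variant, for completeness (same pool layout as the commands quoted below, only --index changes; this is not a command taken from the ticket):

            zpool create -f -o ashift=12 -o cachefile=none mdt00 /dev/sdc /dev/sdd
            mkfs.lustre --reformat --backfstype=zfs --mgs --mdt --index=0 --fsname=scratch mdt00/mdt
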
            ihara Shuichi Ihara (Inactive) added a comment - edited

            Ah, are you sure "--index=1" works without --index=0?

            Apr 26 14:20:52 ieel-mds04 kernel: Lustre: scratch-MDT0001: new disk, initializing
            Apr 26 14:20:52 ieel-mds04 kernel: Lustre: scratch-MDT0001: Imperative Recovery not enabled, recovery window 300-900

            It seems you also set up --index=1 without --index=0 in your original description.

            However, you have --index=0 for the MDT in your ldiskfs setup:

            8 UP osp scratch-OST0000-osc-MDT0000 scratch-MDT0000-mdtlov_UUID 5
            9 UP osp scratch-OST0001-osc-MDT0000 scratch-MDT0000-mdtlov_UUID 5

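            One way to double-check the index a ZFS-backed target was actually formatted with; a sketch, assuming the mdt00/mdt dataset used in this ticket (tunefs.lustre prints the stored parameters without changing anything):

            # Print the on-disk Lustre parameters, including the target index
            tunefs.lustre --dryrun mdt00/mdt

            # The same settings are also stored as ZFS user properties
            # (assumption: exact property names may vary by Lustre version)
            zfs get all mdt00/mdt | grep lustre:
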
            utopiabound Nathaniel Clark added a comment -

            zpool create -f -o ashift=12 -o cachefile=none mdt00 /dev/sdc /dev/sdd
            mkfs.lustre --reformat --backfstype=zfs --mgs --mdt --index=1 --fsname=scratch mdt00/mdt

            wangshilong Wang Shilong (Inactive) added a comment -

            Hi Nathaniel Clark,

            Could you show me your exact mkfs options, so that I can reproduce this here?

            Thanks,
            Shilong
            utopiabound Nathaniel Clark added a comment - edited

            ihara,

            Oh, you are correct, I've updated the client code to match the OSS/MDS but I didn't unload the old modules.

            Using the quoted version above, I get the same result: it works with ldiskfs but not with ZFS.

            I've got two setups running the same version of Lustre: 2.9.56_11_gbfa524f
            ZFS:

            [root@ieel-mds03 ~]# lctl dl
              0 UP osd-zfs scratch-MDT0001-osd scratch-MDT0001-osd_UUID 8
              1 UP mgs MGS MGS 7
              2 UP mgc MGC192.168.56.12@tcp 3e0eccdf-f828-338f-d3fc-2e717a638014 5
              3 UP mds MDS MDS_uuid 3
              4 UP lod scratch-MDT0001-mdtlov scratch-MDT0001-mdtlov_UUID 4
              5 UP mdt scratch-MDT0001 scratch-MDT0001_UUID 5
              6 UP mdd scratch-MDD0001 scratch-MDD0001_UUID 4
              7 UP osp scratch-OST0000-osc-MDT0001 scratch-MDT0001-mdtlov_UUID 5
            

            ldiskfs:

            [root@ieel-mds04 ~]# lctl dl
              0 UP osd-ldiskfs scratch-MDT0000-osd scratch-MDT0000-osd_UUID 10
              1 UP mgs MGS MGS 7
              2 UP mgc MGC192.168.56.13@tcp 0e5f0018-97cf-c2a4-4817-f51b7410ec7b 5
              3 UP mds MDS MDS_uuid 3
              4 UP lod scratch-MDT0000-mdtlov scratch-MDT0000-mdtlov_UUID 4
              5 UP mdt scratch-MDT0000 scratch-MDT0000_UUID 13
              6 UP mdd scratch-MDD0000 scratch-MDD0000_UUID 4
              7 UP qmt scratch-QMT0000 scratch-QMT0000_UUID 4
              8 UP osp scratch-OST0000-osc-MDT0000 scratch-MDT0000-mdtlov_UUID 5
              9 UP osp scratch-OST0001-osc-MDT0000 scratch-MDT0000-mdtlov_UUID 5
             10 UP lwp scratch-MDT0000-lwp-MDT0000 scratch-MDT0000-lwp-MDT0000_UUID 5
            

            Should qmt be missing from ZFS?


            ihara Shuichi Ihara (Inactive) added a comment -

            This confused me. As far as I can tell from your original description, the client version is "2.8.0.51-1", not the same version as the OSS/MDS (2.9.56_11_gbfa524f).

People

  Assignee: wangshilong Wang Shilong (Inactive)
  Reporter: utopiabound Nathaniel Clark
  Votes: 0
  Watchers: 7
