
implement index range lookup for osd-zfs.


    Description

      ZFS needs an index range lookup for DNE.
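To make the request concrete: an index range lookup returns only the entries whose keys fall in a given [start, end) window, rather than walking the whole index. A minimal user-space sketch of the semantics (illustrative only; the real osd-zfs implementation would drive a ZAP cursor, and all names here are mine):

```c
#include <stdlib.h>

/* Illustrative index entry: a 64-bit key (say, an OID within one FID
 * sequence) mapping to some value. */
struct idx_entry {
        unsigned long long key;
        int value;
};

static int cmp_entry(const void *a, const void *b)
{
        const struct idx_entry *x = a, *y = b;

        return (x->key > y->key) - (x->key < y->key);
}

/* Range lookup over a sorted index: call cb() for every entry whose key
 * falls in [start, end), and return how many entries were visited.  A
 * ZAP-backed implementation would instead position a cursor at 'start'
 * and advance it until the returned key leaves the range. */
static int idx_range_lookup(struct idx_entry *idx, int n,
                            unsigned long long start,
                            unsigned long long end,
                            void (*cb)(const struct idx_entry *))
{
        int i, visited = 0;

        qsort(idx, n, sizeof(*idx), cmp_entry);
        for (i = 0; i < n; i++) {
                if (idx[i].key < start)
                        continue;
                if (idx[i].key >= end)
                        break;
                cb(&idx[i]);
                visited++;
        }
        return visited;
}

/* Simple callback for demonstration: count visited entries. */
static int visits;
static void count_cb(const struct idx_entry *e)
{
        (void)e;
        visits++;
}
```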


          Activity

            [LU-2240] implement index range lookup for osd-zfs.

            Yes, I'd suggest removing them, and I'd suggest taking a snapshot just before that. Unfortunately I'm unable to reproduce the case locally:
            I can't generate such an image (I can't even find the code in gerrit that uses 0x200000007 for quota).

            bzzz Alex Zhuravlev added a comment

            Here's what I see on the MDS:

            # grove-mds2 /tmp/zfs > ls -li oi.3/0x200000003* oi.5/0x200000005* oi.6/0x200000006* oi.7/0x200000007* seq* quota*
               176 -rw-r--r-- 1 root root  8 Dec 31  1969 oi.3/0x200000003:0x1:0x0
               180 -rw-r--r-- 1 root root  0 Dec 31  1969 oi.3/0x200000003:0x3:0x0
            414212 -rw-r--r-- 1 root root  2 Dec 31  1969 oi.5/0x200000005:0x1:0x0
            414214 -rw-r--r-- 1 root root  2 Dec 31  1969 oi.5/0x200000005:0x2:0x0
            417923 -rw-r--r-- 1 root root  2 Dec 31  1969 oi.6/0x200000006:0x10000:0x0
            417924 -rw-r--r-- 1 root root  2 Dec 31  1969 oi.6/0x200000006:0x1010000:0x0
            417927 -rw-r--r-- 1 root root  2 Dec 31  1969 oi.6/0x200000006:0x1020000:0x0
            417926 -rw-r--r-- 1 root root  2 Dec 31  1969 oi.6/0x200000006:0x20000:0x0
            414209 -rw-r--r-- 1 root root  8 Dec 31  1969 oi.7/0x200000007:0x1:0x0
            414211 -rw-r--r-- 1 root root  2 Dec 31  1969 oi.7/0x200000007:0x3:0x0
            414213 -rw-r--r-- 1 root root  2 Dec 31  1969 oi.7/0x200000007:0x4:0x0
            414209 -rw-r--r-- 1 root root  8 Dec 31  1969 seq-200000007-lastid
               173 -rw-rw-rw- 1 root root 24 Dec 31  1969 seq_ctl
               174 -rw-rw-rw- 1 root root 24 Dec 31  1969 seq_srv
            
            oi.3/0x200000003:0x2:0x0:
            total 0
            
            oi.3/0x200000003:0x4:0x0:
            total 9
            417925 drwxr-xr-x 2 root root 2 Dec 31  1969 dt-0x0
            417922 drwxr-xr-x 2 root root 2 Dec 31  1969 md-0x0
            
            oi.3/0x200000003:0x5:0x0:
            total 9
            417923 -rw-r--r-- 1 root root 2 Dec 31  1969 0x10000
            417924 -rw-r--r-- 1 root root 2 Dec 31  1969 0x1010000
            
            oi.3/0x200000003:0x6:0x0:
            total 9
            417927 -rw-r--r-- 1 root root 2 Dec 31  1969 0x1020000
            417926 -rw-r--r-- 1 root root 2 Dec 31  1969 0x20000
            
            oi.7/0x200000007:0x2:0x0:
            total 18
            414211 -rw-r--r-- 1 root root 2 Dec 31  1969 0x10000
            414212 -rw-r--r-- 1 root root 2 Dec 31  1969 0x10000-MDT0000
            414213 -rw-r--r-- 1 root root 2 Dec 31  1969 0x1010000
            414214 -rw-r--r-- 1 root root 2 Dec 31  1969 0x1010000-MDT0000
            
            quota_master:
            total 9
            417925 drwxr-xr-x 2 root root 2 Dec 31  1969 dt-0x0
            417922 drwxr-xr-x 2 root root 2 Dec 31  1969 md-0x0
            
            quota_slave:
            total 18
            414211 -rw-r--r-- 1 root root 2 Dec 31  1969 0x10000
            414212 -rw-r--r-- 1 root root 2 Dec 31  1969 0x10000-MDT0000
            414213 -rw-r--r-- 1 root root 2 Dec 31  1969 0x1010000
            414214 -rw-r--r-- 1 root root 2 Dec 31  1969 0x1010000-MDT0000
            

            I'm somewhat guessing as to what the on-disk format is supposed to look like, but it does appear to be using the new quota sequence numbers (0x200000005ULL and 0x200000006ULL).

            So, does this mean I can go ahead and remove these files:

            # grove-mds2 /tmp/zfs > find . -inum 414209 -o -inum 414211 -o -inum 414213
            ./oi.7/0x200000007:0x3:0x0
            ./oi.7/0x200000007:0x4:0x0
            ./oi.7/0x200000007:0x2:0x0/0x1010000
            ./oi.7/0x200000007:0x2:0x0/0x10000
            ./oi.7/0x200000007:0x1:0x0
            ./seq-200000007-lastid
            ./quota_slave/0x1010000
            ./quota_slave/0x10000
            

            ?

            prakash Prakash Surya (Inactive) added a comment

            could you check whether your filesystem has been using new quota files now? they're supposed to be in the following sequences:
            FID_SEQ_QUOTA = 0x200000005ULL,
            FID_SEQ_QUOTA_GLB = 0x200000006ULL,

            if so, then it should be OK to just remove old quota files in 0x200000007 sequence.

            bzzz Alex Zhuravlev added a comment
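That check boils down to comparing a file's FID sequence against the two new quota values. A minimal sketch, assuming only the constants from the enum fid_seq excerpt quoted elsewhere in this ticket (the helper name is mine, not a Lustre API):

```c
#include <stdbool.h>

/* Sequence numbers from enum fid_seq, per the lustre_idl.h excerpt in
 * this ticket.  0x200000007ULL is where the stale quota files live. */
#define FID_SEQ_QUOTA     0x200000005ULL
#define FID_SEQ_QUOTA_GLB 0x200000006ULL

/* Illustrative helper: does this sequence belong to the new quota
 * file layout? */
static bool seq_is_new_quota(unsigned long long seq)
{
        return seq == FID_SEQ_QUOTA || seq == FID_SEQ_QUOTA_GLB;
}
```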

            seq-<SEQ>-lastid stores the last used ID in sequence <SEQ>

            bzzz Alex Zhuravlev added a comment
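Consistent with that, the seq-*-lastid files in the listings above are all 8 bytes, which fits a single 64-bit ID. Assuming little-endian on-disk storage (an assumption; the ticket does not show the actual encoding), decoding would look like:

```c
#include <stdint.h>
#include <stddef.h>

/* Decode an 8-byte lastid buffer as a little-endian 64-bit ID.
 * Byte order here is assumed, not confirmed by this ticket. */
static uint64_t decode_lastid(const unsigned char buf[8])
{
        uint64_t id = 0;
        size_t i;

        for (i = 0; i < 8; i++)
                id |= (uint64_t)buf[i] << (8 * i);
        return id;
}
```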

            Sigh, this is the difficulty with following the development branch - you are picking up all of the dirty laundry that is normally put away before the release is made. Typically, we don't want anyone to use development releases for long-lived filesystems for exactly this reason.

            Hopefully Mike or Alex can figure out something to resolve this easily.

            adilger Andreas Dilger added a comment

            Right, I confirmed in a VM that deleting oi.7/0x200000007:0x1:0x0 avoids the crash, and lets it successfully add the new root fid:

            $ ls -lid ./oi.7/0x200000007:0x1:0x0 ROOT
            177 drwxr-xr-x 158 root root 2 Dec  6 12:55 ./oi.7/0x200000007:0x1:0x0/
            177 drwxr-xr-x 158 root root 2 Dec  6 12:55 ROOT/
            

            I'm not sure what the seq-200000007-lastid is for, or if it's safe to remove its OI entry. The MDT for the production filesystem has a similar file, but in a non-colliding sequence:

            # grove-mds1 /mnt/mdtsnap > ls -lid seq-200000003-lastid  oi.3/0x200000003:0x1:0x0
            207172557 -rw-r--r-- 1 root root 8 Dec 31  1969 oi.3/0x200000003:0x1:0x0
            207172557 -rw-r--r-- 1 root root 8 Dec 31  1969 seq-200000003-lastid
            

            So hopefully we won't run into a problem there.

            It would be nice if the conversion code handled these collisions. But since there should be very few affected filesystems in the wild, we could probably live with a manual workaround.

            nedbass Ned Bass (Inactive) added a comment

            I'm curious if this is the commit that is biting us:

            commit 5b64ac7f7cf2767acb75b872eaffcf6d255d0501
            Author: Mikhail Pershin <tappro@whamcloud.com>
            Date:   Thu Oct 4 14:24:43 2012 +0400
            
                LU-1943 class: FID_SEQ_LOCAL_NAME set to the Orion value
                
                Keep the same numbers for Orion and master for compatibility
                
                Signed-off-by: Mikhail Pershin <tappro@whamcloud.com>
                Change-Id: I318eba9860be7849ee4a8d828cf27e5fb91164e9
                Reviewed-on: http://review.whamcloud.com/4179
                Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
                Tested-by: Hudson
                Tested-by: Maloo <whamcloud.maloo@gmail.com>
                Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
            
            diff --git a/lustre/include/lustre/lustre_idl.h b/lustre/include/lustre/lustre_idl.h
            index 4705c1d..bae42d0 100644
            --- a/lustre/include/lustre/lustre_idl.h
            +++ b/lustre/include/lustre/lustre_idl.h
            @@ -421,13 +421,12 @@ enum fid_seq {
                    /* sequence for local pre-defined FIDs listed in local_oid */
                     FID_SEQ_LOCAL_FILE = 0x200000001ULL,
                     FID_SEQ_DOT_LUSTRE = 0x200000002ULL,
            -        /* XXX 0x200000003ULL is reserved for FID_SEQ_LLOG_OBJ */
                    /* sequence is used for local named objects FIDs generated
                     * by local_object_storage library */
            +       FID_SEQ_LOCAL_NAME = 0x200000003ULL,
                     FID_SEQ_SPECIAL    = 0x200000004ULL,
                     FID_SEQ_QUOTA      = 0x200000005ULL,
                     FID_SEQ_QUOTA_GLB  = 0x200000006ULL,
            -       FID_SEQ_LOCAL_NAME = 0x200000007ULL,
                     FID_SEQ_NORMAL     = 0x200000400ULL,
                     FID_SEQ_LOV_DEFAULT= 0xffffffffffffffffULL
             };
            
            prakash Prakash Surya (Inactive) added a comment

            I think we're getting the -EEXIST (i.e. -17) error back from zap_add when we try inserting the new root fid (0x200000007:0x1:0x0) into the OI, since it already exists.

            prakash Prakash Surya (Inactive) added a comment
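The failure mode described above can be reproduced in miniature with a toy duplicate-key insert (a user-space sketch, not the real zap_add(); the names are mine, but the -EEXIST/-17 return matches the rc in the mount log):

```c
#include <errno.h>
#include <string.h>

#define OI_MAX 16

/* Toy object index keyed by FID string. */
struct toy_oi {
        const char *keys[OI_MAX];
        int nkeys;
};

/* Mimics the relevant zap_add() behavior: inserting a key that is
 * already present fails with -EEXIST (-17 on Linux), which is the rc
 * seen when the conversion re-inserts the root fid's key. */
static int toy_oi_insert(struct toy_oi *oi, const char *fid)
{
        int i;

        for (i = 0; i < oi->nkeys; i++)
                if (strcmp(oi->keys[i], fid) == 0)
                        return -EEXIST;   /* duplicate key */
        if (oi->nkeys >= OI_MAX)
                return -ENOSPC;           /* toy index is full */
        oi->keys[oi->nkeys++] = fid;
        return 0;
}
```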

            Mounting a snapshot of the MDT through the POSIX layer, I found that objects in the quota_slave directory and the file seq-200000007-lastid are using FID_SEQ_ROOT. Note the matching inode numbers:

            $ ls -li oi.7/0x200000007* seq-200000007-lastid quota_slave/
            414209 -rw-r--r-- 1 root root 8 Dec 31  1969 oi.7/0x200000007:0x1:0x0
            414211 -rw-r--r-- 1 root root 2 Dec 31  1969 oi.7/0x200000007:0x3:0x0
            414213 -rw-r--r-- 1 root root 2 Dec 31  1969 oi.7/0x200000007:0x4:0x0
            414209 -rw-r--r-- 1 root root 8 Dec 31  1969 seq-200000007-lastid
            
            oi.7/0x200000007:0x2:0x0:
            total 22K
            414211 -rw-r--r-- 1 root root 2 Dec 31  1969 0x10000
            414212 -rw-r--r-- 1 root root 2 Dec 31  1969 0x10000-MDT0000
            414213 -rw-r--r-- 1 root root 2 Dec 31  1969 0x1010000
            414214 -rw-r--r-- 1 root root 2 Dec 31  1969 0x1010000-MDT0000
            
            quota_slave/:
            total 22K
            414211 -rw-r--r-- 1 root root 2 Dec 31  1969 0x10000
            414212 -rw-r--r-- 1 root root 2 Dec 31  1969 0x10000-MDT0000
            414213 -rw-r--r-- 1 root root 2 Dec 31  1969 0x1010000
            414214 -rw-r--r-- 1 root root 2 Dec 31  1969 0x1010000-MDT0000
            
            nedbass Ned Bass (Inactive) added a comment

            And with D_OTHER enabled, I see this message:

            2013-03-20 14:37:14 Lustre: 58602:0:(osd_oi.c:720:osd_convert_root_to_new_seq()) lstest-MDT0000: /ROOT -> [0x200000001:0x6:0x0] -> 177
            
            prakash Prakash Surya (Inactive) added a comment

            I'm reopening this issue. I tried upgrading our Grove-Test filesystem to 2.3.62-4chaos but hit the following crash:

            Lustre: Lustre: Build Version: 2.3.62-4chaos-4chaos--PRISTINE-2.6.32-220.23.1.2chaos.ch5.x86_64
            LustreError: 58463:0:(osd_oi.c:784:osd_convert_root_to_new_seq()) lstest-MDT0000: can't convert to new fid: rc = -17
            BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
            
            crash> bt
            PID: 58463  TASK: ffff88200c0fc040  CPU: 9   COMMAND: "mount.lustre"
             #0 [ffff881ff51e1520] machine_kexec at ffffffff8103216b
             #1 [ffff881ff51e1580] crash_kexec at ffffffff810b8d12
             #2 [ffff881ff51e1650] oops_end at ffffffff814f2c00
             #3 [ffff881ff51e1680] no_context at ffffffff810423fb
             #4 [ffff881ff51e16d0] __bad_area_nosemaphore at ffffffff81042685
             #5 [ffff881ff51e1720] bad_area at ffffffff810427ae
             #6 [ffff881ff51e1750] __do_page_fault at ffffffff81042eb3
             #7 [ffff881ff51e1870] do_page_fault at ffffffff814f4bde
             #8 [ffff881ff51e18a0] page_fault at ffffffff814f1f95
                [exception RIP: list_del+12]
                RIP: ffffffff8127d75c  RSP: ffff881ff51e1958  RFLAGS: 00010292
                RAX: ffff881ff51e0000  RBX: 0000000000000010  RCX: ffff882017ff7400
                RDX: 0000000000000000  RSI: 0000000000000030  RDI: 0000000000000010
                RBP: ffff881ff51e1968   R8: ffff882018246500   R9: ffff88200f263c00
                R10: ffff8820178263a0  R11: 0000000000000000  R12: ffff881ff51e19f8
                R13: 00000000ffffffef  R14: ffff88200b877340  R15: ffff881ff51e19f8
                ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
             #9 [ffff881ff51e1970] arc_remove_prune_callback at ffffffffa042413c [zfs]
            #10 [ffff881ff51e1990] osd_device_fini at ffffffffa0d315a7 [osd_zfs]
            #11 [ffff881ff51e19b0] osd_device_alloc at ffffffffa0d31c90 [osd_zfs]
            #12 [ffff881ff51e19e0] obd_setup at ffffffffa0736a67 [obdclass]
            #13 [ffff881ff51e1aa0] class_setup at ffffffffa0736d78 [obdclass]
            #14 [ffff881ff51e1af0] class_process_config at ffffffffa073e28c [obdclass]
            #15 [ffff881ff51e1b80] do_lcfg at ffffffffa0746249 [obdclass]
            #16 [ffff881ff51e1c60] lustre_start_simple at ffffffffa0746614 [obdclass]
            #17 [ffff881ff51e1cc0] lustre_fill_super at ffffffffa0756883 [obdclass]
            #18 [ffff881ff51e1da0] get_sb_nodev at ffffffff8117ab1f
            #19 [ffff881ff51e1de0] lustre_get_sb at ffffffffa0741315 [obdclass]
            #20 [ffff881ff51e1e00] vfs_kern_mount at ffffffff8117a77b
            #21 [ffff881ff51e1e50] do_kern_mount at ffffffff8117a922
            #22 [ffff881ff51e1ea0] do_mount at ffffffff81198fa2
            #23 [ffff881ff51e1f20] sys_mount at ffffffff81199630
            #24 [ffff881ff51e1f80] system_call_fastpath at ffffffff8100b0f2
                RIP: 00007ffff771345a  RSP: 00007fffffff99d8  RFLAGS: 00010206
                RAX: 00000000000000a5  RBX: ffffffff8100b0f2  RCX: 0000000001000000
                RDX: 0000000000408087  RSI: 00007fffffffca48  RDI: 0000000000618480
                RBP: 0000000000000000   R8: 0000000000618670   R9: 0000000000000000
                R10: 0000000001000000  R11: 0000000000000206  R12: 000000000060bbd8
                R13: 000000000060bbd0  R14: 0000000000618670  R15: 0000000000000000
                ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b
            
            prakash Prakash Surya (Inactive) added a comment

            People

              bzzz Alex Zhuravlev
              di.wang Di Wang (Inactive)
              Votes: 0
              Watchers: 11