  1. Lustre
  2. LU-13845

Kernel crash on: lfs quota -u $(( (1<<32) -1))

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.5
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      The following commands systematically crash the MDS:

       

      lfs quota -u|g|p $(( (1<<32) -1)) ... 
      lfs setquota -u|g|p $(( (1<<32) -1)) ...

      The UID/GID value "(uid_t) -1" is a special value, used for example by chown() to mean "leave this ID unchanged". The valid UID/GID range is therefore 0 to 0xFFFFFFFE.
      Ref: https://systemd.io/UIDS-GIDS/

       

      So "lfs setquota" and "lfs quota" should treat 0xFFFFFFFF as an invalid quota ID.
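
A minimal sketch of the kind of check this asks for, assuming the ID arrives as a decimal string. The names parse_quota_id and QUOTA_ID_INVALID are hypothetical and are not taken from the Lustre source; the real tools also accept user and group names, which this sketch ignores.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical name, not a Lustre symbol: the reserved "(uid_t) -1" value. */
#define QUOTA_ID_INVALID UINT32_MAX

/*
 * Parse a decimal quota ID and accept only 0..0xFFFFFFFE.
 * Returns 0 and stores the ID in *id on success.
 * Returns -EINVAL for malformed input and -ERANGE for 0xFFFFFFFF or above.
 */
int parse_quota_id(const char *arg, uint32_t *id)
{
    char *end;
    unsigned long long val;

    /* strtoull() silently negates "-1", so reject a leading minus first. */
    if (arg == NULL || *arg == '\0' || *arg == '-')
        return -EINVAL;

    errno = 0;
    val = strtoull(arg, &end, 10);
    if (errno != 0 || *end != '\0')
        return -EINVAL;

    /* 0xFFFFFFFF is the reserved value, anything larger does not fit. */
    if (val >= QUOTA_ID_INVALID)
        return -ERANGE;

    *id = (uint32_t)val;
    return 0;
}
```

With this check, the value produced by $(( (1<<32) -1)), that is 4294967295, is rejected on the client before any request reaches the MDS, while 4294967294 is still accepted.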

      Here is the console output from the crash:

      [18075.268361] LustreError: Skipped 7 previous similar messages
      [19623.238824] BUG: unable to handle kernel NULL pointer dereference at 000000000000007e
      [19623.239412] IP: [<ffffffffba4bdb8d>] dquot_get_dqblk+0x6d/0x1f0
      [19623.240130] PGD 80000000d6a70067 PUD d6a71067 PMD 0
      [19623.240633] Oops: 0000 [#1] SMP 
      [19623.241189] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) mbcache jbd2 libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel snd_intel8x0 snd_ac97_codec ac97_bus snd_seq aesni_intel snd_seq_device lrw snd_pcm gf128mul glue_helper ablk_helper cryptd snd_timer sg snd pcspkr vboxguest(OE) soundcore i2c_piix4 parport_pc video parport ip_tables xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm ata_piix libahci crct10dif_pclmul crct10dif_common libata e1000 crc32c_intel
      [19623.245675] serio_raw drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dm_flakey]
      [19623.246811] CPU: 1 PID: 11998 Comm: lquota_wb_lustr Kdump: loaded Tainted: G OE ------------ 3.10.0-1127.10.1.el7_lustre_ajmes.x86_64 #1
      [19623.248020] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [19623.248674] task: ffff90bc594541c0 ti: ffff90bbae1ac000 task.ti: ffff90bbae1ac000
      [19623.249400] RIP: 0010:[<ffffffffba4bdb8d>] [<ffffffffba4bdb8d>] dquot_get_dqblk+0x6d/0x1f0
      [19623.250018] RSP: 0018:ffff90bbae1afcd0 EFLAGS: 00010246
      [19623.250558] RAX: 0000000000000000 RBX: ffff90bb44b0dac8 RCX: 0000000000000000
      [19623.251190] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffff90bb44b0db38
      [19623.251719] RBP: ffff90bbae1afce0 R08: ffff90bb9fc54000 R09: 0000000000000001
      [19623.252364] R10: ffff90bc5fc9ff50 R11: ffffe6a5846571c0 R12: ffffffffffffffea
      [19623.252942] R13: ffff90bb4103e200 R14: 00000000ffffffff R15: ffff90bb9fc5b000
      [19623.253460] FS: 0000000000000000(0000) GS:ffff90bc5fc80000(0000) knlGS:0000000000000000
      [19623.254049] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [19623.254555] CR2: 000000000000007e CR3: 00000000d8290000 CR4: 00000000000606e0
      [19623.255146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [19623.255638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [19623.256120] Call Trace:
      [19623.256597] [<ffffffffc108a945>] osd_acct_index_lookup+0x235/0x480 [osd_ldiskfs]
      [19623.257148] [<ffffffffc0fe071f>] lquota_disk_read+0x12f/0x390 [lquota]
      [19623.257621] [<ffffffffc0fe852a>] qsd_refresh_usage+0x6a/0x2f0 [lquota]
      [19623.258142] [<ffffffffc0fe8970>] qsd_lqe_read+0x1c0/0x5f0 [lquota]
      [19623.258592] [<ffffffffc0fe31e3>] lqe_locate_find+0x243/0x840 [lquota]
      [19623.259087] [<ffffffffc0ff0563>] qsd_upd_thread+0x853/0xc50 [lquota]
      [19623.259529] [<ffffffffc0fefd10>] ? qsd_upd_add+0xe0/0xe0 [lquota]
      [19623.260011] [<ffffffffba2c6691>] kthread+0xd1/0xe0
      [19623.260439] [<ffffffffba486ac0>] ? end_buffer_async_read+0x130/0x130
      [19623.260919] [<ffffffffba2c65c0>] ? insert_kthread_work+0x40/0x40
      [19623.261415] [<ffffffffba992d37>] ret_from_fork_nospec_begin+0x21/0x21
      [19623.261885] [<ffffffffba2c65c0>] ? insert_kthread_work+0x40/0x40
      [19623.262292] Code: 01 00 00 89 d1 31 c0 c1 e9 03 f6 c2 04 f3 48 ab 0f 85 20 01 00 00 f6 c2 02 0f 85 ff 00 00 00 83 e2 01 0f 85 e6 00 00 00 c6 03 01 <41> 83 bc 24 94 00 00 00 01 19 c0 83 e0 fd 83 c0 04 88 43 01 49
      [19623.263625] RIP [<ffffffffba4bdb8d>] dquot_get_dqblk+0x6d/0x1f0
      [19623.264099] RSP <ffff90bbae1afcd0>
      [19623.264485] CR2: 000000000000007e
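
One possible reading of the register dump above, offered as an assumption and not as a confirmed diagnosis: R12 holds 0xffffffffffffffea, which is -22 when read as a signed value (-EINVAL on Linux), and the instruction bytes at the <41> marker in the Code line decode to a compare against offset 0x94 from R12. Adding that offset to R12 gives 0x7e, which matches CR2. This would be consistent with an error-encoded pointer being dereferenced inside dquot_get_dqblk(). The helper names below are illustrative only.

```c
#include <stdint.h>

/* Reinterpret a 64-bit register value as a signed integer. */
int64_t reg_as_signed(uint64_t reg)
{
    return (int64_t)reg;
}

/* Effective address of a base + displacement memory operand. */
uint64_t effective_address(uint64_t base, uint32_t disp)
{
    /* Unsigned arithmetic wraps modulo 2^64, as the CPU does. */
    return base + disp;
}
```

Calling reg_as_signed(0xffffffffffffffeaULL) gives -22, and effective_address(0xffffffffffffffeaULL, 0x94) gives 0x7e, the faulting address reported in the oops.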

       

      I don't think I saw this crash when I developed the patch for LU-12549; I was working in a different environment.

            People

              Assignee: eaujames Etienne Aujames
              Reporter: eaujames Etienne Aujames