summary refs log tree commit diff
AgeCommit message (Collapse)Author
2020-03-27dm clone: Add overflow check for number of regionsNikos Tsironis
Add overflow check for clone->nr_regions variable, which holds the number of regions of the target. The overflow can occur with sufficiently large devices, if BITS_PER_LONG == 32. E.g., if the region size is 8 sectors (4K), the overflow would occur for device sizes > 34359738360 sectors (~16TB). This could result in multiple device sectors wrongly mapping to the same region number, due to the truncation from 64 bits to 32 bits, which would lead to data corruption. Fixes: 7431b7835f55 ("dm: add clone target") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-27dm clone: Fix handling of partial region discardsNikos Tsironis
There is a bug in the way dm-clone handles discards, which can lead to discarding the wrong blocks or trying to discard blocks beyond the end of the device. This could lead to data corruption, if the destination device indeed discards the underlying blocks, i.e., if the discard operation results in the original contents of a block to be lost. The root of the problem is the code that calculates the range of regions covered by a discard request and decides which regions to discard. Since dm-clone handles the device in units of regions, we don't discard parts of a region, only whole regions. The range is calculated as: rs = dm_sector_div_up(bio->bi_iter.bi_sector, clone->region_size); re = bio_end_sector(bio) >> clone->region_shift; , where 'rs' is the first region to discard and (re - rs) is the number of regions to discard. The bug manifests when we try to discard part of a single region, i.e., when we try to discard a block with size < region_size, and the discard request both starts at an offset with respect to the beginning of that region and ends before the end of the region. The root cause is the following comparison: if (rs == re) // skip discard and complete original bio immediately , which doesn't take into account that 'rs' might be greater than 're'. Thus, we then issue a discard request for the wrong blocks, instead of skipping the discard all together. Fix the check to also take into account the above case, so we don't end up discarding the wrong blocks. Also, add some range checks to dm_clone_set_region_hydrated() and dm_clone_cond_set_range(), which update dm-clone's region bitmap. Note that the aforementioned bug doesn't cause invalid memory accesses, because dm_clone_is_range_hydrated() returns True for this case, so the checks are just precautionary. Fixes: 7431b7835f55 ("dm: add clone target") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-27dm writecache: add cond_resched to avoid CPU hangsMikulas Patocka
Initializing a dm-writecache device can take a long time when the persistent memory device is large. Add cond_resched() to a few loops to avoid warnings that the CPU is stuck. Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm integrity: improve discard in journal modeMikulas Patocka
When we discard something that is present in the journal, we flush the journal first, so that discarded blocks are not overwritten by the journal content. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm integrity: add optional discard supportMikulas Patocka
Add an argument "allow_discards" that enables discard processing on dm-integrity device. Discards are only allowed to devices using internal hash. When a block is discarded the integrity tag is filled with DISCARD_FILLER (0xf6) bytes. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm integrity: allow resize of the integrity deviceMikulas Patocka
If the size of the underlying device changes, change the size of the integrity device too. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm integrity: factor out get_provided_data_sectors()Mikulas Patocka
Move code to a new function get_provided_data_sectors(). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm integrity: don't replay journal data past the end of the deviceMikulas Patocka
Following commits will make it possible to shrink or extend the device. If the device was shrunk, we don't want to replay journal data pointing past the end of the device. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm integrity: remove sector type castsMikulas Patocka
Since the commit 72deb455b5ec619ff043c30bc90025aa3de3cdda ("block: remove CONFIG_LBDAF") sector_t is always defined as unsigned long long. Delete the needless type casts in printk and avoids some warnings if DEBUG_PRINT is defined. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm integrity: fix a crash with unusually large tag sizeMikulas Patocka
If the user specifies tag size larger than HASH_MAX_DIGESTSIZE, there's a crash in integrity_metadata(). Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm zoned: remove duplicate nr_rnd_zones increase in dmz_init_zone()Bob Liu
zmd->nr_rnd_zones was increased twice by mistake. The other place it is increased in dmz_init_zone() is the only one needed: 1131 zmd->nr_useable_zones++; 1132 if (dmz_is_rnd(zone)) { 1133 zmd->nr_rnd_zones++; ^^^ Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm verity fec: fix memory leak in verity_fec_dtrShetty, Harshini X (EXT-Sony Mobile)
Fix below kmemleak detected in verity_fec_ctr. output_pool is allocated for each dm-verity-fec device. But it is not freed when dm-table for the verity target is removed. Hence free the output mempool in destructor function verity_fec_dtr. unreferenced object 0xffffffffa574d000 (size 4096): comm "init", pid 1667, jiffies 4294894890 (age 307.168s) hex dump (first 32 bytes): 8e 36 00 98 66 a8 0b 9b 00 00 00 00 00 00 00 00 .6..f........... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000060e82407>] __kmalloc+0x2b4/0x340 [<00000000dd99488f>] mempool_kmalloc+0x18/0x20 [<000000002560172b>] mempool_init_node+0x98/0x118 [<000000006c3574d2>] mempool_init+0x14/0x20 [<0000000008cb266e>] verity_fec_ctr+0x388/0x3b0 [<000000000887261b>] verity_ctr+0x87c/0x8d0 [<000000002b1e1c62>] dm_table_add_target+0x174/0x348 [<000000002ad89eda>] table_load+0xe4/0x328 [<000000001f06f5e9>] dm_ctl_ioctl+0x3b4/0x5a0 [<00000000bee5fbb7>] do_vfs_ioctl+0x5dc/0x928 [<00000000b475b8f5>] __arm64_sys_ioctl+0x70/0x98 [<000000005361e2e8>] el0_svc_common+0xa0/0x158 [<000000001374818f>] el0_svc_handler+0x6c/0x88 [<000000003364e9f4>] el0_svc+0x8/0xc [<000000009d84cec9>] 0xffffffffffffffff Fixes: a739ff3f543af ("dm verity: add support for forward error correction") Depends-on: 6f1c819c219f7 ("dm: convert to bioset_init()/mempool_init()") Cc: stable@vger.kernel.org Signed-off-by: Harshini Shetty <harshini.x.shetty@sony.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm writecache: optimize superblock writeMikulas Patocka
If we write a superblock in writecache_flush, we don't need to set bit and scan the bitmap for it - we can just write the superblock directly. Also, we can set the flag REQ_FUA on the write bio, so that we don't need to submit a flush bio afterwards. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm writecache: implement gradual cleanupMikulas Patocka
If a block is stored in the cache for too long, it will now be written to the underlying device and cleaned up. Add a new option "max_age" that specifies the maximum age of a block in milliseconds. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm writecache: implement the "cleaner" policyMikulas Patocka
The "flush" or "flush_on_suspend" messages flush the whole cache. However, these flushing methods can take some time and the process is left in an interruptible state during the flush. Implement a "cleaner" option that offers an alternate flushing method. When this option is activated (either by a message or in the constructor arguments), the cache will not promote new writes (however, writes to already cached blocks are promoted, to avoid data corruption due to misordered writes) and it will gradually writeback any cached data. The userspace can then monitor the cleaning process with "dmsetup status". When the number of cached bloks drops to zero, the userspace can unload the dm-writecache target and replace it with dm-linear or other targets. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm writecache: do direct write if the cache is fullMikulas Patocka
If the cache device is full, we do a direct write to the origin device. Note that we must not do it if the written block is already in the cache. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm integrity: print device name in integrity_metadata() error messageErich Eckner
Similar to f710126cfc89c8df477002a26dee8407eb0b4acd ("dm crypt: print device name in integrity error message"), this message should also better identify the device with the integrity failure. Signed-off-by: Erich Eckner <git@eckner.net> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-24dm crypt: use crypt_integrity_aead() helperYang Yingliang
Replace test_bit(CRYPT_MODE_INTEGRITY_AEAD, XXX) with crypt_integrity_aead(). Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-15Linux 5.6-rc6Linus Torvalds
2020-03-15Merge tag 'irq-urgent-2020-03-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single commit to handle an erratum in Cavium ThunderX to prevent access to GIC registers which are broken in the implementation" * tag 'irq-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3: Workaround Cavium erratum 38539 when reading GICD_TYPER2
2020-03-15Merge tag 'locking-urgent-2020-03-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull futex fix from Thomas Gleixner: "Fix for yet another subtle futex issue. The futex code used ihold() to prevent inodes from vanishing, but ihold() does not guarantee inode persistence. Replace the inode pointer with a per boot, machine wide, unique inode identifier. The second commit fixes the breakage of the hash mechanism which causes a 100% performance regression" * tag 'locking-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Unbreak futex hashing futex: Fix inode life-time issue
2020-03-15Merge tag 'x86-urgent-2020-03-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two fixes for x86: - Map EFI runtime service data as encrypted when SEV is enabled. Otherwise e.g. SMBIOS data cannot be properly decoded by dmidecode. - Remove the warning in the vector management code which triggered when a managed interrupt affinity changed outside of a CPU hotplug operation. The warning was correct until the recent core code change that introduced a CPU isolation feature which needs to migrate managed interrupts away from online CPUs under certain conditions to achieve the isolation" * tag 'x86-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vector: Remove warning on managed interrupt migration x86/ioremap: Map EFI runtime services data as encrypted for SEV
2020-03-15Merge tag 'perf-urgent-2020-03-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A pile of perf fixes: Kernel side: - AMD uncore driver: Replace the open coded sanity check with the core variant, which provides the correct error code and also leaves a hint in dmesg Tooling: - Fix the stdio input handling with glibc versions >= 2.28 - Unbreak the futex-wake benchmark which was reduced to 0 test threads due to the conversion to cpumaps - Initialize sigaction structs before invoking sys_sigactio() - Plug the mapfile memory leak in perf jevents - Fix off by one relative directory includes - Fix an undefined string comparison in perf diff" * tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag tools: Fix off-by 1 relative directory includes perf jevents: Fix leak of mapfile memory perf bench: Clear struct sigaction before sigaction() syscall perf bench futex-wake: Restore thread count default to online CPU count perf top: Fix stdio interface input handling with glibc 2.28+ perf diff: Fix undefined string comparision spotted by clang's -Wstring-compare perf symbols: Don't try to find a vmlinux file when looking for kernel modules perf bench: Share some global variables to fix build with gcc 10 perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files perf env: Do not return pointers to local variables perf tests bp_account: Make global variable static
2020-03-15Merge tag 'timers-urgent-2020-03-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix adding the missing time namespace adjustment in sys/sysinfo which caused sys/sysinfo to be inconsistent with /proc/uptime when read from a task inside a time namespace" * tag 'timers-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sys/sysinfo: Respect boottime inside time namespace
2020-03-15Merge tag 'ras-urgent-2020-03-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two RAS related fixes: - Shut down the per CPU thermal throttling poll work properly when a CPU goes offline. The missing shutdown caused the poll work to be migrated to a unbound worker which triggered warnings about the usage of smp_processor_id() in preemptible context - Fix the PPIN feature initialization which missed to enable the functionality when PPIN_CTL was enabled but the MSR locked against updates" * tag 'ras-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Fix logic and comments around MSR_PPIN_CTL x86/mce/therm_throt: Undo thermal polling properly on CPU offline
2020-03-15Merge tag 'efi-urgent-2020-03-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Thomas Gleixner: "Two EFI fixes: - Prevent a race and buffer overflow in the sysfs efivars interface which causes kernel memory corruption. - Add the missing NULL pointer checks in efivar_store_raw()" * tag 'efi-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: Add a sanity check to efivar_store_raw() efi: Fix a race and a buffer overflow while reading efivars via sysfs
2020-03-15Merge tag 'iommu-fixes-v5.6-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: - Intel VT-d fixes: - RCU list handling fixes - Replace WARN_TAINT with pr_warn + add_taint for reporting firmware issues - DebugFS fixes - Fix for hugepage handling in iova_to_phys implementation - Fix for handling VMD devices, which have a domain number which doesn't fit into 16 bits - Warning message fix - MSI allocation fix for iommu-dma code - Sign-extension fix for io page-table code - Fix for AMD-Vi to properly update the is-running bit when AVIC is used * tag 'iommu-fixes-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Populate debugfs if IOMMUs are detected iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTE iommu/vt-d: Ignore devices with out-of-spec domain number iommu/vt-d: Fix the wrong printing in RHSA parsing iommu/vt-d: Fix debugfs register reads iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint iommu/vt-d: dmar_parse_one_rmrr: replace WARN_TAINT with pr_warn + add_taint iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint iommu/vt-d: Silence RCU-list debugging warnings iommu/vt-d: Fix RCU-list bugs in intel_iommu_init() iommu/dma: Fix MSI reservation allocation iommu/io-pgtable-arm: Fix IOVA validation for 32-bit iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page iommu/vt-d: Fix RCU list debugging warnings
2020-03-15Merge tag 'irqchip-fixes-5.6-2' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - Add workaround for Cavium/Marvell ThunderX unimplemented GIC registers
2020-03-14Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "I2C has quite some regression fixes this time. One is also related to watchdogs, we have proper acks from Guenter for them" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: acpi: put device when verifying client fails misc: eeprom: at24: fix regulator underflow i2c: gpio: suppress error on probe defer macintosh: windfarm: fix MODINFO regression i2c: designware-pci: Fix BUG_ON during device removal i2c: i801: Do not add ICH_RES_IO_SMI for the iTCO_wdt device watchdog: iTCO_wdt: Make ICH_RES_IO_SMI optional watchdog: iTCO_wdt: Export vendorsupport
2020-03-14Merge tag 'arc-5.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - Fix __ALIGN_STR and __ALIGN to not use default junk padding - Misc Kconfig cleanups, header updates * tag 'arc-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: define __ALIGN_STR and __ALIGN symbols for ARC ARC: show_regs: reduce lines of output ARC: Replace <linux/clk-provider.h> by <linux/of_clk.h> ARC: fpu: fix randconfig build error reported by 0-day test service ARC: fix some Kconfig typos ARC: Cleanup old Kconfig IO scheduler options
2020-03-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Bugfixes for x86 and s390" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1 KVM: s390: Also reset registers in sync regs for initial cpu reset KVM: fix Kconfig menu text for -Werror KVM: x86: remove stale comment from struct x86_emulate_ctxt KVM: x86: clear stale x86_emulate_ctxt->intercept value KVM: SVM: Fix the svm vmexit code for WRMSR KVM: X86: Fix dereference null cpufreq policy
2020-03-14iommu/vt-d: Populate debugfs if IOMMUs are detectedMegha Dey
Currently, the intel iommu debugfs directory(/sys/kernel/debug/iommu/intel) gets populated only when DMA remapping is enabled (dmar_disabled = 0) irrespective of whether interrupt remapping is enabled or not. Instead, populate the intel iommu debugfs directory if any IOMMUs are detected. Cc: Dan Carpenter <dan.carpenter@oracle.com> Fixes: ee2636b8670b1 ("iommu/vt-d: Enable base Intel IOMMU debugfs support") Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A small collection of fixes. I'll make another sweep soon to look for more fixes for this -rc series. - Mark device node const in of_clk_get_parent APIs to ease landing changes in users later - Fix flag for Qualcomm SC7180 video clocks where we thought it would never turn off but actually hardware takes care of it - Remove disp_cc_mdss_rscc_ahb_clk on Qualcomm SC7180 SoCs because this clk is always on anyway - Correct some bad dt-binding numbers for i.MX8MN SoCs" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: imx8mn: Fix incorrect clock defines clk: qcom: dispcc: Remove support of disp_cc_mdss_rscc_ahb_clk clk: qcom: videocc: Update the clock flag for video_cc_vcodec0_core_clk of: clk: Make of_clk_get_parent_{count,name}() parameter const
2020-03-14Merge branch 'kvm-null-pointer-fix' into kvm-masterPaolo Bonzini
2020-03-14KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAsVitaly Kuznetsov
When an EVMCS enabled L1 guest on KVM will tries doing enlightened VMEnter with EVMCS GPA = 0 the host crashes because the evmcs_gpa != vmx->nested.hv_evmcs_vmptr condition in nested_vmx_handle_enlightened_vmptrld() will evaluate to false (as nested.hv_evmcs_vmptr is zeroed after init). The crash will happen on vmx->nested.hv_evmcs pointer dereference. Another problematic EVMCS ptr value is '-1' but it only causes host crash after nested_release_evmcs() invocation. The problem is exactly the same as with '0', we mistakenly think that the EVMCS pointer hasn't changed and thus nested.hv_evmcs_vmptr is valid. Resolve the issue by adding an additional !vmx->nested.hv_evmcs check to nested_vmx_handle_enlightened_vmptrld(), this way we will always be trying kvm_vcpu_map() when nested.hv_evmcs is NULL and this is supposed to catch all invalid EVMCS GPAs. Also, initialize hv_evmcs_vmptr to '0' in nested_release_evmcs() to be consistent with initialization where we don't currently set hv_evmcs_vmptr to '-1'. Cc: stable@vger.kernel.org Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14Merge tag 'kvm-s390-master-5.6-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: Fully do the CPU resets as intended With 7de3f1423ff9 ("KVM: s390: Add new reset vcpu API") we clarified the meaning of the reset ioctl to fully reset the CPU and not only the parts that can not be handled by userspace. Turns out that we missed some parts.
2020-03-14irqchip/gic-v3: Workaround Cavium erratum 38539 when reading GICD_TYPER2Marc Zyngier
Despite the architecture spec requiring that reserved registers in the GIC distributor memory map are RES0 (and thus are not allowed to generate an exception), the Cavium ThunderX (aka TX1) SoC explodes as such: [ 0.000000] GICv3: GIC: Using split EOI/Deactivate mode [ 0.000000] GICv3: 128 SPIs implemented [ 0.000000] GICv3: 0 Extended SPIs implemented [ 0.000000] Internal error: synchronous external abort: 96000210 [#1] SMP [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc4-00035-g3cf6a3d5725f #7956 [ 0.000000] Hardware name: cavium,thunder-88xx (DT) [ 0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO) [ 0.000000] pc : __raw_readl+0x0/0x8 [ 0.000000] lr : gic_init_bases+0x110/0x560 [ 0.000000] sp : ffff800011243d90 [ 0.000000] x29: ffff800011243d90 x28: 0000000000000000 [ 0.000000] x27: 0000000000000018 x26: 0000000000000002 [ 0.000000] x25: ffff8000116f0000 x24: ffff000fbe6a2c80 [ 0.000000] x23: 0000000000000000 x22: ffff010fdc322b68 [ 0.000000] x21: ffff800010a7a208 x20: 00000000009b0404 [ 0.000000] x19: ffff80001124dad0 x18: 0000000000000010 [ 0.000000] x17: 000000004d8d492b x16: 00000000f67eb9af [ 0.000000] x15: ffffffffffffffff x14: ffff800011249908 [ 0.000000] x13: ffff800091243ae7 x12: ffff800011243af4 [ 0.000000] x11: ffff80001126e000 x10: ffff800011243a70 [ 0.000000] x9 : 00000000ffffffd0 x8 : ffff80001069c828 [ 0.000000] x7 : 0000000000000059 x6 : ffff8000113fb4d1 [ 0.000000] x5 : 0000000000000001 x4 : 0000000000000000 [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 [ 0.000000] x1 : 0000000000000000 x0 : ffff8000116f000c [ 0.000000] Call trace: [ 0.000000] __raw_readl+0x0/0x8 [ 0.000000] gic_of_init+0x188/0x224 [ 0.000000] of_irq_init+0x200/0x3cc [ 0.000000] irqchip_init+0x1c/0x40 [ 0.000000] init_IRQ+0x160/0x1d0 [ 0.000000] start_kernel+0x2ec/0x4b8 [ 0.000000] Code: a8c47bfd d65f03c0 d538d080 d65f03c0 (b9400000) when reading the GICv4.1 GICD_TYPER2 register, which is unexpected... Work around it by adding a new quirk for the following variants: ThunderX: CN88xx OCTEON TX: CN83xx, CN81xx OCTEON TX2: CN93xx, CN96xx, CN98xx, CNF95xx* and use this flag to avoid accessing GICD_TYPER2. Note that all reserved registers (including redistributors and ITS) are impacted by this erratum, but that only GICD_TYPER2 has to be worked around so far. Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Robert Richter <rrichter@marvell.com> Tested-by: Mark Salter <msalter@redhat.com> Tested-by: Tim Harvey <tharvey@gateworks.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Robert Richter <rrichter@marvell.com> Link: https://lore.kernel.org/r/20191027144234.8395-11-maz@kernel.org Link: https://lore.kernel.org/r/20200311115649.26060-1-maz@kernel.org
2020-03-14KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirectNitesh Narayan Lal
Previously all fields of structure kvm_lapic_irq were not initialized before it was passed to kvm_bitmap_or_dest_vcpus(). Which will cause an issue when any of those fields are used for processing a request. For example not initializing the msi_redir_hint field before passing to the kvm_bitmap_or_dest_vcpus(), may lead to a misbehavior of kvm_apic_map_get_dest_lapic(). This will specifically happen when the kvm_lowest_prio_delivery() returns TRUE due to a non-zero garbage value of msi_redir_hint, which should not happen as the request belongs to APIC fixed delivery mode and we do not want to deliver the interrupt only to the lowest priority candidate. This patch initializes all the fields of kvm_lapic_irq based on the values of ioapic redirect_entry object before passing it on to kvm_bitmap_or_dest_vcpus(). Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> [Set level to false since the value doesn't really matter. Suggested by Vitaly Kuznetsov. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1Sean Christopherson
Enable ENCLS-exiting (and thus set vmcs.ENCLS_EXITING_BITMAP) only if the CPU supports SGX1. Per Intel's SDM, all ENCLS leafs #UD if SGX1 is not supported[*], i.e. intercepting ENCLS to inject a #UD is unnecessary. Avoiding ENCLS-exiting even when it is reported as supported by the CPU works around a reported issue where SGX is "hard" disabled after an S3 suspend/resume cycle, i.e. CPUID.0x7.SGX=0 and the VMCS field/control are enumerated as unsupported. While the root cause of the S3 issue is unknown, it's definitely _not_ a KVM (or kernel) bug, i.e. this is a workaround for what is most likely a hardware or firmware issue. As a bonus side effect, KVM saves a VMWRITE when first preparing vmcs01 and vmcs02. Note, SGX must be disabled in BIOS to take advantage of this workaround [*] The additional ENCLS CPUID check on SGX1 exists so that SGX can be globally "soft" disabled post-reset, e.g. if #MC bits in MCi_CTL are cleared. Soft disabled meaning disabling SGX without clearing the primary CPUID bit (in leaf 0x7) and without poking into non-SGX CPU paths, e.g. for the VMCS controls. Fixes: 0b665d304028 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest") Reported-by: Toni Spets <toni.spets@iki.fi> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTESuravee Suthikulpanit
Commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") accidentally left out the ir_data pointer when calling modity_irte_ga(), which causes the function amd_iommu_update_ga() to return prematurely due to struct amd_ir_data.ref is NULL and the "is_run" bit of IRTE does not get updated properly. This results in bad I/O performance since IOMMU AVIC always generate GA Log entry and notify IOMMU driver and KVM when it receives interrupt from the PCI pass-through device instead of directly inject interrupt to the vCPU. Fixes by passing ir_data when calling modify_irte_ga() as done previously. Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14iommu/vt-d: Ignore devices with out-of-spec domain numberDaniel Drake
VMD subdevices are created with a PCI domain ID of 0x10000 or higher. These subdevices are also handled like all other PCI devices by dmar_pci_bus_notifier(). However, when dmar_alloc_pci_notify_info() take records of such devices, it will truncate the domain ID to a u16 value (in info->seg). The device at (e.g.) 10000:00:02.0 is then treated by the DMAR code as if it is 0000:00:02.0. In the unlucky event that a real device also exists at 0000:00:02.0 and also has a device-specific entry in the DMAR table, dmar_insert_dev_scope() will crash on:   BUG_ON(i >= devices_cnt); That's basically a sanity check that only one PCI device matches a single DMAR entry; in this case we seem to have two matching devices. Fix this by ignoring devices that have a domain number higher than what can be looked up in the DMAR table. This problem was carefully diagnosed by Jian-Hong Pan. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Fixes: 59ce0515cdaf3 ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens") Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14iommu/vt-d: Fix the wrong printing in RHSA parsingZhenzhong Duan
When base address in RHSA structure doesn't match base address in each DRHD structure, the base address in last DRHD is printed out. This doesn't make sense when there are multiple DRHD units, fix it by printing the buggy RHSA's base address. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Fixes: fd0c8894893cb ("intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables") Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-13Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes, both in drivers: ipr and ufs" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ipr: Fix softlockup when rescanning devices in petitboot scsi: ufs: Fix possible unclocked access to auto hibern8 timer register
2020-03-13Merge tag 'nfs-for-5.6-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Anna Schumaker: "These are mostly fscontext fixes, but there is also one that fixes collisions seen in fscache: - Ensure the fs_context has the correct fs_type when mounting and submounting - Fix leaking of ctx->nfs_server.hostname - Add minor version to fscache key to prevent collisions" * tag 'nfs-for-5.6-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: nfs: add minor version to nfs_server_key for fscache NFS: Fix leak of ctx->nfs_server.hostname NFS: Don't hard-code the fs_type when submounting NFS: Ensure the fs_context has the correct fs_type before mounting
2020-03-13Merge tag 'fuse-fixes-5.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fix from Miklos Szeredi: "Fix an Oops introduced in v5.4" * tag 'fuse-fixes-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix stack use after return
2020-03-13Merge tag 'ovl-fixes-5.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Fix three bugs introduced in this cycle" * tag 'ovl-fixes-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix lockdep warning for async write ovl: fix some xino configurations ovl: fix lock in ovl_llseek()
2020-03-13Merge tag 'pm-5.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix cpupower utility build failures with -fno-common enabled (Mike Gilbert)" * tag 'pm-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpupower: avoid multiple definition with gcc -fno-common
2020-03-13Merge tag 'io_uring-5.6-2020-03-13' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fix from Jens Axboe: "Just a single fix here, improving the RCU callback ordering from last week. After a bit more perusing by Paul, he poked a hole in the original" * tag 'io_uring-5.6-2020-03-13' of git://git.kernel.dk/linux-block: io_uring: ensure RCU callback ordering with rcu_barrier()
2020-03-13Merge tag 'block-5.6-2020-03-13' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes that should go into this release. This contains: - Fix for a corruption issue with the s390 dasd driver (Stefan) - Fixup/improvement for the flush insertion change that we had in this series (Ming) - Fix for the partition suppor for host aware zoned devices (Shin'ichiro) - Fix incorrect blk-iocost comparison (Tejun) The diffstat looks large, but that's a) mostly dasd, and b) the flush fix from Ming adds a big comment" * tag 'block-5.6-2020-03-13' of git://git.kernel.dk/linux-block: block: Fix partition support for host aware zoned block devices blk-mq: insert flush request to the front of dispatch queue s390/dasd: fix data corruption for thin provisioned devices blk-iocost: fix incorrect vtime comparison in iocg_is_idle()
2020-03-13Merge tag 'mmc-v5.6-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Fix HW busy detection support for host controllers requiring the MMC_RSP_BUSY response flag (R1B) to be set for the command. In particular for CMD6 (eMMC), erase/trim/discard (SD/eMMC) and CMD5 (eMMC sleep). MMC host: - sdhci-omap|tegra: Fix support for HW busy detection" * tag 'mmc-v5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for eMMC sleep command mmc: sdhci-tegra: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY mmc: sdhci-omap: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for erase/trim/discard mmc: core: Allow host controllers to require R1B for CMD6