Changelog in Linux kernel 6.9.5

9p: add missing locking around taking dentry fid list [+ + +]

Author: Dominique Martinet <asmadeus@codewreck.org>
Date:   Tue May 21 21:13:36 2024 +0900

    9p: add missing locking around taking dentry fid list
    
    commit c898afdc15645efb555acb6d85b484eb40a45409 upstream.
    
    Fix a use-after-free on dentry's d_fsdata fid list when a thread
    looks up a fid through dentry while another thread unlinks it:
    
    UAF thread:
    refcount_t: addition on 0; use-after-free.
     p9_fid_get linux/./include/net/9p/client.h:262
     v9fs_fid_find+0x236/0x280 linux/fs/9p/fid.c:129
     v9fs_fid_lookup_with_uid linux/fs/9p/fid.c:181
     v9fs_fid_lookup+0xbf/0xc20 linux/fs/9p/fid.c:314
     v9fs_vfs_getattr_dotl+0xf9/0x360 linux/fs/9p/vfs_inode_dotl.c:400
     vfs_statx+0xdd/0x4d0 linux/fs/stat.c:248
    
    Freed by:
     p9_fid_destroy (inlined)
     p9_client_clunk+0xb0/0xe0 linux/net/9p/client.c:1456
     p9_fid_put linux/./include/net/9p/client.h:278
     v9fs_dentry_release+0xb5/0x140 linux/fs/9p/vfs_dentry.c:55
     v9fs_remove+0x38f/0x620 linux/fs/9p/vfs_inode.c:518
     vfs_unlink+0x29a/0x810 linux/fs/namei.c:4335
    
    The problem is that d_fsdata was not accessed under d_lock:
    d_release() is normally only called once the dentry is otherwise no
    longer accessible, but since we also call it explicitly in v9fs_remove,
    that lock is required.  Move the hlist out of the dentry under the lock,
    then unref its fids once they are no longer accessible.
    
    Fixes: 154372e67d40 ("fs/9p: fix create-unlink-getattr idiom")
    Cc: stable@vger.kernel.org
    Reported-by: Meysam Firouzi
    Reported-by: Amirmohammad Eftekhar
    Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
    Message-ID: <20240521122947.1080227-1-asmadeus@codewreck.org>
    Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
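
The "detach under lock, release afterwards" pattern described in this entry can be sketched in userspace C. This is only an illustration under stated assumptions: every toy_* name is hypothetical, a pthread mutex stands in for d_lock, and a plain int stands in for the kernel's atomic refcount_t, so the sketch is not safe for concurrent puts.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from fs/9p. */
struct toy_fid {
	int refcount;		/* the kernel uses an atomic refcount_t */
	struct toy_fid *next;
};

struct toy_dentry {
	pthread_mutex_t lock;	/* plays the role of d_lock */
	struct toy_fid *fids;	/* plays the role of the d_fsdata list */
};

static int toy_fids_freed;

static void toy_fid_put(struct toy_fid *fid)
{
	if (--fid->refcount == 0) {
		free(fid);
		toy_fids_freed++;
	}
}

/* Add a fid to the list; the list owns one reference. */
static int toy_dentry_add_fid(struct toy_dentry *d)
{
	struct toy_fid *fid = malloc(sizeof(*fid));

	if (!fid)
		return -1;
	fid->refcount = 1;
	pthread_mutex_lock(&d->lock);
	fid->next = d->fids;
	d->fids = fid;
	pthread_mutex_unlock(&d->lock);
	return 0;
}

/* Lookup side: take the extra reference while the lock is held. */
static struct toy_fid *toy_dentry_find_fid(struct toy_dentry *d)
{
	struct toy_fid *fid;

	pthread_mutex_lock(&d->lock);
	fid = d->fids;
	if (fid)
		fid->refcount++;
	pthread_mutex_unlock(&d->lock);
	return fid;
}

/* Release side: detach the whole list under the lock, then drop the
 * list's references once no lookup can reach the fids any more. */
static void toy_dentry_release(struct toy_dentry *d)
{
	struct toy_fid *list, *next;

	pthread_mutex_lock(&d->lock);
	list = d->fids;
	d->fids = NULL;
	pthread_mutex_unlock(&d->lock);

	for (; list; list = next) {
		next = list->next;
		list->next = NULL;
		toy_fid_put(list);
	}
}
```

A lookup that ran before the release keeps its fid alive through its own reference; a lookup that runs afterwards simply finds an empty list.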

ACPI: APEI: EINJ: Fix einj_dev release leak [+ + +]

Author: Dan Williams <dan.j.williams@intel.com>
Date:   Tue May 21 15:46:32 2024 -0700

    ACPI: APEI: EINJ: Fix einj_dev release leak
    
    commit 7ff6c798eca05e4a9dcb80163cb454d7787a4bc3 upstream.
    
    The platform driver conversion of EINJ mistakenly used
    platform_device_del() to unwind platform_device_register_full() at
    module exit. This leads to a small leak of one 'struct platform_device'
    instance per module load/unload cycle. Switch to
    platform_device_unregister() which performs both device_del() and final
    put_device().
    
    Fixes: 5621fafaac00 ("EINJ: Migrate to a platform driver")
    Cc: 6.9+ <stable@vger.kernel.org> # 6.9+
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
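
Why "del" alone leaks can be shown with a toy reference-counted object. The sketch below is not the driver-core API: the toy_* functions are hypothetical and only mimic the split between removing a device from the system and dropping the final reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a registered, reference-counted device. */
struct toy_device {
	int refcount;
	int registered;
};

static int toy_live_devices;	/* allocations not yet freed */

/* Allocate and register in one step; the caller owns one reference. */
static struct toy_device *toy_register(void)
{
	struct toy_device *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;
	dev->registered = 1;
	toy_live_devices++;
	return dev;
}

/* Drop a reference; the last put frees the object. */
static void toy_put(struct toy_device *dev)
{
	if (--dev->refcount == 0) {
		free(dev);
		toy_live_devices--;
	}
}

/* Remove from the system only; the caller's reference is untouched. */
static void toy_del(struct toy_device *dev)
{
	dev->registered = 0;
}

/* Full unwind of toy_register(): removal plus the final put. */
static void toy_unregister(struct toy_device *dev)
{
	toy_del(dev);
	toy_put(dev);
}
```

Calling only toy_del() at exit leaves toy_live_devices at 1 for every load/unload cycle, which mirrors the leak described above; toy_unregister() brings it back to 0.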

ACPI: resource: Do IRQ override on TongFang GXxHRXx and GMxHGxx [+ + +]

Author: Christoffer Sandberg <cs@tuxedo.de>
Date:   Mon Apr 22 10:04:36 2024 +0200

    ACPI: resource: Do IRQ override on TongFang GXxHRXx and GMxHGxx
    
    commit c81bf14f9db68311c2e75428eea070d97d603975 upstream.
    
    Listed devices need the override for the keyboard to work.
    
    Signed-off-by: Christoffer Sandberg <cs@tuxedo.de>
    Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
    Cc: All applicable <stable@vger.kernel.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

afs: Don't cross .backup mountpoint from backup volume [+ + +]

Author: Marc Dionne <marc.dionne@auristor.com>
Date:   Fri May 24 17:17:55 2024 +0100

    afs: Don't cross .backup mountpoint from backup volume
    
    commit 29be9100aca2915fab54b5693309bc42956542e5 upstream.
    
    Don't cross a mountpoint that explicitly specifies a backup volume
    (target is <vol>.backup) when starting from a backup volume.
    
    It is not uncommon to mount a volume's backup directly in the volume
    itself.  This can cause tools that are not paying attention to get
    into a loop mounting the volume onto itself as they attempt to
    traverse the tree, leading to a variety of problems.
    
    This doesn't prevent the general case of loops in a sequence of
    mountpoints, but addresses a common special case in the same way
    as other afs clients.
    
    Reported-by: Jan Henrik Sylvester <jan.henrik.sylvester@uni-hamburg.de>
    Link: http://lists.infradead.org/pipermail/linux-afs/2024-May/008454.html
    Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
    Link: http://lists.infradead.org/pipermail/linux-afs/2024-February/008074.html
    Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Link: https://lore.kernel.org/r/768760.1716567475@warthog.procyon.org.uk
    Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
    cc: linux-afs@lists.infradead.org
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: seq: Fix incorrect UMP type for system messages [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Wed May 29 10:37:59 2024 +0200

    ALSA: seq: Fix incorrect UMP type for system messages
    
    commit edb32776196afa393c074d6a2733e3a69e66b299 upstream.
    
    When converting a legacy system message to a UMP packet, the code forgot
    to modify the UMP type field and kept the default type (either type 2
    or 4).  Correct it to the right type for system messages.
    
    Fixes: e9e02819a98a ("ALSA: seq: Automatic conversion of UMP events")
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240529083800.5742-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: ump: Don't accept an invalid UMP protocol number [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Wed May 29 18:47:16 2024 +0200

    ALSA: ump: Don't accept an invalid UMP protocol number
    
    commit ac0d71ee534e67c7e53439e8e9cb45ed40731660 upstream.
    
    When a UMP Stream Configuration message is received, the driver tries
    to switch the protocol, but there was no sanity check of the protocol,
    hence it can pass an invalid value.  Add the check and bail out if a
    wrong value is passed.
    
    Fixes: a79807683781 ("ALSA: ump: Add helper to change MIDI protocol")
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240529164723.18309-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
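
A sanity check of this kind can be sketched as follows. The constants, the struct and the function name are hypothetical and chosen for illustration only; the return convention (-EINVAL on a bad value) is likewise an assumption and may differ from what the driver does.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative protocol values; not taken from the ALSA headers. */
enum {
	TOY_PROTO_MIDI1 = 0x0100,
	TOY_PROTO_MIDI2 = 0x0200,
	TOY_PROTO_MASK  = 0x0300,
};

struct toy_endpoint {
	unsigned int protocol;
};

/* Switch the endpoint protocol, rejecting anything that is not exactly
 * one of the two known protocol values. */
static int toy_switch_protocol(struct toy_endpoint *ep, unsigned int requested)
{
	unsigned int type = requested & TOY_PROTO_MASK;

	if (type != TOY_PROTO_MIDI1 && type != TOY_PROTO_MIDI2)
		return -EINVAL;	/* bail out, leave the endpoint unchanged */

	ep->protocol = type;
	return 0;
}
```

Both "no protocol bit set" and "both bits set" are rejected, so a malformed message cannot leave the endpoint in an undefined state.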

ALSA: ump: Don't clear bank selection after sending a program change [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Wed May 29 10:38:21 2024 +0200

    ALSA: ump: Don't clear bank selection after sending a program change
    
    commit fe85f6e607d75b856e7229924c71f55e005f8284 upstream.
    
    The current code clears the bank selection MSB/LSB after sending a
    program change, but this can be wrong, as many apps may not send the
    full bank selection with both MSB and LSB and instead send only one.
    It is better to keep the previous bank set.
    
    Fixes: 0b5288f5fe63 ("ALSA: ump: Add legacy raw MIDI support")
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/20240529083823.5778-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
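
The effect of keeping the bank state can be sketched with a small per-channel model. All names below are hypothetical and the structures are far simpler than the real converter; the point is only that sending a program change does not reset the remembered MSB/LSB.

```c
#include <assert.h>

/* Hypothetical per-channel state remembered between MIDI 1.0 messages. */
struct toy_channel {
	unsigned char bank_msb;
	unsigned char bank_lsb;
};

/* Hypothetical combined "program change with bank" output message. */
struct toy_program_msg {
	unsigned char bank_msb;
	unsigned char bank_lsb;
	unsigned char program;
};

static void toy_bank_select_msb(struct toy_channel *ch, unsigned char val)
{
	ch->bank_msb = val & 0x7f;
}

static void toy_bank_select_lsb(struct toy_channel *ch, unsigned char val)
{
	ch->bank_lsb = val & 0x7f;
}

/* Build the output message from the remembered bank. The channel state
 * is deliberately left untouched so a later partial bank select (MSB
 * only or LSB only) still combines with the previous value. */
static struct toy_program_msg toy_send_program(const struct toy_channel *ch,
					       unsigned char program)
{
	struct toy_program_msg msg = {
		ch->bank_msb, ch->bank_lsb, (unsigned char)(program & 0x7f)
	};

	return msg;
}
```

An application that sends MSB and LSB once, and afterwards only the LSB before each program change, keeps the MSB it selected earlier.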

arm64: dts: hi3798cv200: fix the size of GICR [+ + +]

Author: Yang Xiwen <forbidden405@outlook.com>
Date:   Mon Feb 19 23:05:26 2024 +0800

    arm64: dts: hi3798cv200: fix the size of GICR
    
    commit 428a575dc9038846ad259466d5ba109858c0a023 upstream.
    
    During boot, the Linux kernel complains:
    
    [    0.000000] GIC: GICv2 detected, but range too small and irqchip.gicv2_force_probe not set
    
    This SoC is using a regular GIC-400 and the GICR space size should be
    8KB rather than 256B.
    
    With this patch:
    
    [    0.000000] GIC: Using split EOI/Deactivate mode
    
    So this should be the correct fix.
    
    Fixes: 2f20182ed670 ("arm64: dts: hisilicon: add dts files for hi3798cv200-poplar board")
    Signed-off-by: Yang Xiwen <forbidden405@outlook.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240219-cache-v3-1-a33c57534ae9@outlook.com
    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: dts: qcom: qcs404: fix bluetooth device address [+ + +]

Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Wed May 1 09:52:01 2024 +0200

    arm64: dts: qcom: qcs404: fix bluetooth device address
    
    commit f5f390a77f18eaeb2c93211a1b7c5e66b5acd423 upstream.
    
    The 'local-bd-address' property is used to pass a unique Bluetooth
    device address from the boot firmware to the kernel and should otherwise
    be left unset so that the OS can prevent the controller from being used
    until a valid address has been provided through some other means (e.g.
    using btmgmt).
    
    Fixes: 60f77ae7d1c1 ("arm64: dts: qcom: qcs404-evb: Enable uart3 and add Bluetooth")
    Cc: stable@vger.kernel.org      # 5.10
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
    Link: https://lore.kernel.org/r/20240501075201.4732-1-johan+linaro@kernel.org
    Signed-off-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP [+ + +]

Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Wed Mar 6 10:56:50 2024 +0100

    arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
    
    commit 2b621971554a94094cf489314dc1c2b65401965c upstream.
    
    Add the missing PCIe CX performance level votes to avoid relying on
    other drivers (e.g. USB or UFS) to maintain the nominal performance
    level required for Gen3 speeds.
    
    Fixes: 813e83157001 ("arm64: dts: qcom: sc8280xp/sa8540p: add PCIe2-4 nodes")
    Cc: stable@vger.kernel.org      # 6.2
    Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
    Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Link: https://lore.kernel.org/r/20240306095651.4551-5-johan+linaro@kernel.org
    Signed-off-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: dts: ti: verdin-am62: Set memory size to 2gb [+ + +]

Author: Max Krummenacher <max.krummenacher@toradex.com>
Date:   Wed Mar 20 15:29:37 2024 +0100

    arm64: dts: ti: verdin-am62: Set memory size to 2gb
    
    commit f70a88829723c1b462ea0fec15fa75809a0d670b upstream.
    
    The maximum DDR RAM size stuffed on the Verdin AM62 is 2GB.
    Correct the memory node accordingly.
    
    Fixes: 316b80246b16 ("arm64: dts: ti: add verdin am62")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
    Reviewed-by: Francesco Dolcini <francesco.dolcini@toradex.com>
    Link: https://lore.kernel.org/r/20240320142937.2028707-1-max.oss.09@gmail.com
    Signed-off-by: Nishanth Menon <nm@ti.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: tegra: Correct Tegra132 I2C alias [+ + +]

Author: Krzysztof Kozlowski <krzk@kernel.org>
Date:   Mon Apr 1 16:08:54 2024 +0200

    arm64: tegra: Correct Tegra132 I2C alias
    
    commit 2633c58e1354d7de2c8e7be8bdb6f68a0a01bad7 upstream.
    
    There is no such device as "as3722@40", because its name is "pmic".  Use
    phandles for aliases to fix relying on full node path.  This corrects
    aliases for RTC devices and also fixes dtc W=1 warning:
    
      tegra132-norrin.dts:12.3-36: Warning (alias_paths): /aliases:rtc0: aliases property is not a valid node (/i2c@7000d000/as3722@40)
    
    Fixes: 0f279ebdf3ce ("arm64: tegra: Add NVIDIA Tegra132 Norrin support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
    Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
    Signed-off-by: Thierry Reding <treding@nvidia.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: dts: samsung: exynos4412-origen: fix keypad no-autorepeat [+ + +]

Author: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Date:   Tue Mar 12 19:31:03 2024 +0100

    ARM: dts: samsung: exynos4412-origen: fix keypad no-autorepeat
    
    commit 88208d3cd79821117fd3fb80d9bcab618467d37b upstream.
    
    Although the Samsung SoC keypad binding defined the
    linux,keypad-no-autorepeat property, the Linux driver never implemented
    it and always used linux,input-no-autorepeat.  Correct the DTS to use
    the property actually implemented.
    
    This also fixes dtbs_check errors like:
    
      exynos4412-origen.dtb: keypad@100a0000: 'linux,keypad-no-autorepeat' does not match any of the regexes: '^key-[0-9a-z]+$', 'pinctrl-[0-9]+'
    
    Cc: <stable@vger.kernel.org>
    Fixes: bd08f6277e44 ("ARM: dts: Add keypad entries to Exynos4412 based Origen")
    Link: https://lore.kernel.org/r/20240312183105.715735-2-krzysztof.kozlowski@linaro.org
    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: dts: samsung: smdk4412: fix keypad no-autorepeat [+ + +]

Author: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Date:   Tue Mar 12 19:31:04 2024 +0100

    ARM: dts: samsung: smdk4412: fix keypad no-autorepeat
    
    commit 4ac4c1d794e7ff454d191bbdab7585ed8dbf3758 upstream.
    
    Although the Samsung SoC keypad binding defined the
    linux,keypad-no-autorepeat property, the Linux driver never implemented
    it and always used linux,input-no-autorepeat.  Correct the DTS to use
    the property actually implemented.
    
    This also fixes dtbs_check errors like:
    
      exynos4412-smdk4412.dtb: keypad@100a0000: 'key-A', 'key-B', 'key-C', 'key-D', 'key-E', 'linux,keypad-no-autorepeat' do not match any of the regexes: '^key-[0-9a-z]+$', 'pinctrl-[0-9]+'
    
    Cc: <stable@vger.kernel.org>
    Fixes: c9b92dd70107 ("ARM: dts: Add keypad entries to SMDK4412")
    Link: https://lore.kernel.org/r/20240312183105.715735-3-krzysztof.kozlowski@linaro.org
    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: dts: samsung: smdkv310: fix keypad no-autorepeat [+ + +]

Author: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Date:   Tue Mar 12 19:31:02 2024 +0100

    ARM: dts: samsung: smdkv310: fix keypad no-autorepeat
    
    commit 87d8e522d6f5a004f0aa06c0def302df65aff296 upstream.
    
    Although the Samsung SoC keypad binding defined the
    linux,keypad-no-autorepeat property, the Linux driver never implemented
    it and always used linux,input-no-autorepeat.  Correct the DTS to use
    the property actually implemented.
    
    This also fixes dtbs_check errors like:
    
      exynos4210-smdkv310.dtb: keypad@100a0000: 'linux,keypad-no-autorepeat' does not match any of the regexes: '^key-[0-9a-z]+$', 'pinctrl-[0-9]+'
    
    Cc: <stable@vger.kernel.org>
    Fixes: 0561ceabd0f1 ("ARM: dts: Add intial dts file for EXYNOS4210 SoC, SMDKV310 and ORIGEN")
    Link: https://lore.kernel.org/r/20240312183105.715735-1-krzysztof.kozlowski@linaro.org
    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: SOF: ipc4-topology: Fix input format query of process modules without base extension [+ + +]

Author: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Date:   Wed May 29 15:12:01 2024 +0300

    ASoC: SOF: ipc4-topology: Fix input format query of process modules without base extension
    
    commit ffa077b2f6ad124ec3d23fbddc5e4b0ff2647af8 upstream.
    
    If a process module does not have a base config extension, then the same
    format applies to all of its inputs and process->base_config_ext is
    NULL, causing a NULL dereference when a specifically crafted topology
    and sequences are used.
    
    Fixes: 648fea128476 ("ASoC: SOF: ipc4-topology: set copier output format for process module")
    Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
    Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
    Reviewed-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
    Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
    Cc: stable@vger.kernel.org
    Link: https://msgid.link/r/20240529121201.14687-1-peter.ujfalusi@linux.intel.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ata: pata_legacy: make legacy_exit() work again [+ + +]

Author: Sergey Shtylyov <s.shtylyov@omp.ru>
Date:   Sat May 4 23:27:25 2024 +0300

    ata: pata_legacy: make legacy_exit() work again
    
    commit d4a89339f17c87c4990070e9116462d16e75894f upstream.
    
    Commit defc9cd826e4 ("pata_legacy: resychronize with upstream changes and
    resubmit") failed to update legacy_exit(), so that it now fails to do any
    cleanup -- the loop body there can never be entered.  Fix that and finally
    remove the now useless nr_legacy_host variable...
    
    Found by Linux Verification Center (linuxtesting.org) with the Svace static
    analysis tool.
    
    Fixes: defc9cd826e4 ("pata_legacy: resychronize with upstream changes and resubmit")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Reviewed-by: Niklas Cassel <cassel@kernel.org>
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bcache: fix variable length array abuse in btree_iter [+ + +]

Author: Matthew Mirvish <matthew@mm12.xyz>
Date:   Thu May 9 09:11:17 2024 +0800

    bcache: fix variable length array abuse in btree_iter
    
    commit 3a861560ccb35f2a4f0a4b8207fa7c2a35fc7f31 upstream.
    
    btree_iter is used in two ways: either allocated on the stack with a
    fixed size MAX_BSETS, or from a mempool with a dynamic size based on the
    specific cache set. Previously, the struct had a fixed-length array of
    size MAX_BSETS which was indexed out-of-bounds for the dynamically-sized
    iterators, which causes UBSAN to complain.
    
    This patch uses the same approach as in bcachefs's sort_iter and splits
    the iterator into a btree_iter with a flexible array member and a
    btree_iter_stack which embeds a btree_iter as well as a fixed-length
    data array.
    
    Cc: stable@vger.kernel.org
    Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039368
    Signed-off-by: Matthew Mirvish <matthew@mm12.xyz>
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20240509011117.2697-3-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
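
The split between a flexible-array iterator and a fixed-size stack wrapper can be sketched as below. The toy_* types are hypothetical and much smaller than the bcache ones. Embedding a struct that ends in a flexible array member inside another struct is a GNU C extension (accepted by GCC and Clang, flagged by -Wpedantic), and the sketch assumes there is no padding between the embedded iterator and the array that follows it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_BSETS 4

struct toy_iter_set {
	int start;
	int end;
};

/* Iterator whose capacity is chosen by whoever provides the storage. */
struct toy_iter {
	size_t size;			/* capacity of data[] */
	size_t used;
	struct toy_iter_set data[];	/* flexible array member */
};

/* Stack variant: the iterator followed directly by fixed-size storage
 * that backs iter.data[0..TOY_MAX_BSETS-1]. */
struct toy_iter_stack {
	struct toy_iter iter;
	struct toy_iter_set stack_data[TOY_MAX_BSETS];
};

/* Dynamically sized variant, e.g. for a pool sized per cache set. */
static struct toy_iter *toy_iter_alloc(size_t nsets)
{
	struct toy_iter *it = malloc(sizeof(*it) + nsets * sizeof(it->data[0]));

	if (it) {
		it->size = nsets;
		it->used = 0;
	}
	return it;
}

static void toy_iter_stack_init(struct toy_iter_stack *s)
{
	s->iter.size = TOY_MAX_BSETS;
	s->iter.used = 0;
}

/* Bounds are checked against the recorded capacity, not a constant. */
static int toy_iter_push(struct toy_iter *it, struct toy_iter_set set)
{
	if (it->used >= it->size)
		return -1;
	it->data[it->used++] = set;
	return 0;
}
```

Code that takes a struct toy_iter * works with both variants: pass &stack.iter for the on-stack case or the pointer returned by toy_iter_alloc() for the dynamically sized case.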

bonding: fix oops during rmmod [+ + +]

Author: Tony Battersby <tonyb@cybernetics.com>
Date:   Tue May 14 15:57:29 2024 -0400

    bonding: fix oops during rmmod
    
    commit a45835a0bb6ef7d5ddbc0714dd760de979cb6ece upstream.
    
    "rmmod bonding" causes an oops ever since commit cc317ea3d927 ("bonding:
    remove redundant NULL check in debugfs function").  Here are the relevant
    functions being called:
    
    bonding_exit()
      bond_destroy_debugfs()
        debugfs_remove_recursive(bonding_debug_root);
        bonding_debug_root = NULL; <--------- SET TO NULL HERE
      bond_netlink_fini()
        rtnl_link_unregister()
          __rtnl_link_unregister()
            unregister_netdevice_many_notify()
              bond_uninit()
                bond_debug_unregister()
                  (commit removed check for bonding_debug_root == NULL)
                  debugfs_remove()
                  simple_recursive_removal()
                    down_write() -> OOPS
    
    However, reverting the bad commit does not solve the problem completely
    because the original code contains a race that could cause the same
    oops, although it was much less likely to be triggered unintentionally:
    
    CPU1
      rmmod bonding
        bonding_exit()
          bond_destroy_debugfs()
            debugfs_remove_recursive(bonding_debug_root);
    
    CPU2
      echo -bond0 > /sys/class/net/bonding_masters
        bond_uninit()
          bond_debug_unregister()
            if (!bonding_debug_root)
    
    CPU1
            bonding_debug_root = NULL;
    
    So do NOT revert the bad commit (since the removed checks were racy
    anyway), and instead change the order of actions taken during module
    removal.  The same oops can also happen if there is an error during
    module init, so apply the same fix there.
    
    Fixes: cc317ea3d927 ("bonding: remove redundant NULL check in debugfs function")
    Cc: stable@vger.kernel.org
    Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
    Link: https://lore.kernel.org/r/641f914f-3216-4eeb-87dd-91b78aa97773@cybernetics.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
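
The ordering problem can be modelled with a few counters. This is a sketch under assumptions: the toy_* functions are hypothetical, a flag stands in for the debugfs root, and a counter records each access that would have touched the already removed root.

```c
#include <assert.h>

static int toy_debug_root_alive;	/* stands in for bonding_debug_root */
static int toy_bond_count;		/* bond devices still registered */
static int toy_stale_accesses;		/* uses of the root after removal */

static void toy_create_debugfs(void)
{
	toy_debug_root_alive = 1;
}

static void toy_destroy_debugfs(void)
{
	toy_debug_root_alive = 0;
}

/* Per-device teardown; it removes the device's debugfs entry and so
 * needs the root to still exist. */
static void toy_bond_uninit(void)
{
	if (!toy_debug_root_alive)
		toy_stale_accesses++;	/* the kernel would oops here */
	toy_bond_count--;
}

/* Unregistering the link type tears down every remaining device. */
static void toy_netlink_fini(void)
{
	while (toy_bond_count > 0)
		toy_bond_uninit();
}

/* Problematic order: the root goes away before its users. */
static void toy_exit_root_first(void)
{
	toy_destroy_debugfs();
	toy_netlink_fini();
}

/* Safer order: tear down all users, then remove the root. */
static void toy_exit_users_first(void)
{
	toy_netlink_fini();
	toy_destroy_debugfs();
}
```

With two devices present, toy_exit_root_first() records two stale accesses, while toy_exit_users_first() records none and needs no NULL check in the per-device path.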

bpf: fix multi-uprobe PID filtering logic [+ + +]

Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue May 21 09:33:57 2024 -0700

    bpf: fix multi-uprobe PID filtering logic
    
    commit 46ba0e49b64232adac35a2bc892f1710c5b0fb7f upstream.
    
    The current implementation of PID filtering logic for multi-uprobes in
    uprobe_prog_run() filters down to the exact *thread*, while the intent
    of PID filtering is to filter by *process* instead. The check in
    uprobe_prog_run() also differs from the analogous one in
    uprobe_multi_link_filter() for some reason. The latter is correct,
    checking task->mm, not the task itself.
    
    Fix the check in uprobe_prog_run() to perform the same task->mm check.
    
    While doing this, we also update the get_pid_task() call to use a
    PIDTYPE_TGID lookup, given the intent is to get a representative task of
    an entire process. This doesn't change behavior, but seems more logical.
    It now holds the thread group leader task, not any random thread task.
    
    Last but not least, given multi-uprobe support is half-broken due to
    this PID filtering logic (depending on whether PID filtering is
    important or not), we need to make it easy for user space consumers
    (including libbpf) to detect whether PID filtering logic was
    already fixed.
    
    We do it here by adding an early check on passed pid parameter. If it's
    negative (and so has no chance of being a valid PID), we return -EINVAL.
    Previous behavior would eventually return -ESRCH ("No process found"),
    given there can't be any process with negative PID. This subtle change
    won't make any practical change in behavior, but will allow applications
    to detect PID filtering fixes easily. Libbpf fixes take advantage of
    this in the next patch.
    
    Cc: stable@vger.kernel.org
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Fixes: b733eeade420 ("bpf: Add pid filter support for uprobe_multi link")
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20240521163401.3005045-2-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
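
The difference between thread-level and process-level matching, plus the early argument check, can be sketched as follows. The toy_* structures and functions are hypothetical simplifications: a shared toy_mm pointer stands in for threads of one process sharing task->mm.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical address-space object shared by all threads of a process. */
struct toy_mm {
	int id;
};

struct toy_task {
	int tid;
	struct toy_mm *mm;
};

/* Thread-level match: only the exact task passes the filter. */
static int toy_match_thread(const struct toy_task *current_task,
			    const struct toy_task *target)
{
	return current_task == target;
}

/* Process-level match: any task sharing the target's mm passes. */
static int toy_match_process(const struct toy_task *current_task,
			     const struct toy_task *target)
{
	return current_task->mm == target->mm;
}

/* Early check that lets callers probe for the fixed behavior: a
 * negative PID can never be valid, so report -EINVAL up front. */
static int toy_check_pid(int pid)
{
	return pid < 0 ? -EINVAL : 0;
}
```

A worker thread of the filtered process is rejected by toy_match_thread() but accepted by toy_match_process(), which is the behavior the entry above describes as intended.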

btrfs: fix crash on racing fsync and size-extending write into prealloc [+ + +]

Author: Omar Sandoval <osandov@fb.com>
Date:   Fri May 24 13:58:11 2024 -0700

    btrfs: fix crash on racing fsync and size-extending write into prealloc
    
    commit 9d274c19a71b3a276949933859610721a453946b upstream.
    
    We have been seeing crashes on duplicate keys in
    btrfs_set_item_key_safe():
    
      BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192)
      ------------[ cut here ]------------
      kernel BUG at fs/btrfs/ctree.c:2620!
      invalid opcode: 0000 [#1] PREEMPT SMP PTI
      CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 #6
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
      RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs]
    
    With the following stack trace:
    
      #0  btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4)
      #1  btrfs_drop_extents (fs/btrfs/file.c:411:4)
      #2  log_one_extent (fs/btrfs/tree-log.c:4732:9)
      #3  btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9)
      #4  btrfs_log_inode (fs/btrfs/tree-log.c:6626:9)
      #5  btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8)
      #6  btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8)
      #7  btrfs_sync_file (fs/btrfs/file.c:1933:8)
      #8  vfs_fsync_range (fs/sync.c:188:9)
      #9  vfs_fsync (fs/sync.c:202:9)
      #10 do_fsync (fs/sync.c:212:9)
      #11 __do_sys_fdatasync (fs/sync.c:225:9)
      #12 __se_sys_fdatasync (fs/sync.c:223:1)
      #13 __x64_sys_fdatasync (fs/sync.c:223:1)
      #14 do_syscall_x64 (arch/x86/entry/common.c:52:14)
      #15 do_syscall_64 (arch/x86/entry/common.c:83:7)
      #16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121)
    
    So we're logging a changed extent from fsync, which is splitting an
    extent in the log tree. But this split part already exists in the tree,
    triggering the BUG().
    
    This is the state of the log tree at the time of the crash, dumped with
    drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py)
    to get more details than btrfs_print_leaf() gives us:
    
      >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"])
      leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610
      leaf 33439744 flags 0x100000000000000
      fs uuid e5bd3946-400c-4223-8923-190ef1f18677
      chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da
              item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160
                      generation 7 transid 9 size 8192 nbytes 8473563889606862198
                      block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                      sequence 204 flags 0x10(PREALLOC)
                      atime 1716417703.220000000 (2024-05-22 15:41:43)
                      ctime 1716417704.983333333 (2024-05-22 15:41:44)
                      mtime 1716417704.983333333 (2024-05-22 15:41:44)
                      otime 17592186044416.000000000 (559444-03-08 01:40:16)
              item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13
                      index 195 namelen 3 name: 193
              item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37
                      location key (0 UNKNOWN.0 0) type XATTR
                      transid 7 data_len 1 name_len 6
                      name: user.a
                      data a
              item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53
                      generation 9 type 1 (regular)
                      extent data disk byte 303144960 nr 12288
                      extent data offset 0 nr 4096 ram 12288
                      extent compression 0 (none)
              item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53
                      generation 9 type 2 (prealloc)
                      prealloc data disk byte 303144960 nr 12288
                      prealloc data offset 4096 nr 8192
              item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53
                      generation 9 type 2 (prealloc)
                      prealloc data disk byte 303144960 nr 12288
                      prealloc data offset 8192 nr 4096
      ...
    
    So the real problem happened earlier: notice that items 4 (4k-12k) and 5
    (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and
    item 5 starts at i_size.
    
    Here is the state of the filesystem tree at the time of the crash:
    
      >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root
      >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0))
      >>> print_extent_buffer(nodes[0])
      leaf 30425088 level 0 items 184 generation 9 owner 5
      leaf 30425088 flags 0x100000000000000
      fs uuid e5bd3946-400c-4223-8923-190ef1f18677
      chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da
            ...
              item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160
                      generation 7 transid 7 size 4096 nbytes 12288
                      block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                      sequence 6 flags 0x10(PREALLOC)
                      atime 1716417703.220000000 (2024-05-22 15:41:43)
                      ctime 1716417703.220000000 (2024-05-22 15:41:43)
                      mtime 1716417703.220000000 (2024-05-22 15:41:43)
                      otime 1716417703.220000000 (2024-05-22 15:41:43)
              item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13
                      index 195 namelen 3 name: 193
              item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37
                      location key (0 UNKNOWN.0 0) type XATTR
                      transid 7 data_len 1 name_len 6
                      name: user.a
                      data a
              item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53
                      generation 9 type 1 (regular)
                      extent data disk byte 303144960 nr 12288
                      extent data offset 0 nr 8192 ram 12288
                      extent compression 0 (none)
              item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53
                      generation 9 type 2 (prealloc)
                      prealloc data disk byte 303144960 nr 12288
                      prealloc data offset 8192 nr 4096
    
    Item 5 in the log tree corresponds to item 183 in the filesystem tree,
    but nothing matches item 4. Furthermore, item 183 is the last item in
    the leaf.
    
    btrfs_log_prealloc_extents() is responsible for logging prealloc extents
    beyond i_size. It first truncates any previously logged prealloc extents
    that start beyond i_size. Then, it walks the filesystem tree and copies
    the prealloc extent items to the log tree.
    
    If it hits the end of a leaf, then it calls btrfs_next_leaf(), which
    unlocks the tree and does another search. However, while the filesystem
    tree is unlocked, an ordered extent completion may modify the tree. In
    particular, it may insert an extent item that overlaps with an extent
    item that was already copied to the log tree.
    
    This may manifest in several ways depending on the exact scenario,
    including an EEXIST error that is silently translated to a full sync,
    overlapping items in the log tree, or this crash. This particular crash
    is triggered by the following sequence of events:
    
    - Initially, the file has i_size=4k, a regular extent from 0-4k, and a
      prealloc extent beyond i_size from 4k-12k. The prealloc extent item is
      the last item in its B-tree leaf.
    - The file is fsync'd, which copies its inode item and both extent items
      to the log tree.
    - An xattr is set on the file, which sets the
      BTRFS_INODE_COPY_EVERYTHING flag.
    - The range 4k-8k in the file is written using direct I/O. i_size is
      extended to 8k, but the ordered extent is still in flight.
    - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this
      calls copy_inode_items_to_log(), which calls
      btrfs_log_prealloc_extents().
    - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the
      filesystem tree. Since it starts before i_size, it skips it. Since it
      is the last item in its B-tree leaf, it calls btrfs_next_leaf().
    - btrfs_next_leaf() unlocks the path.
    - The ordered extent completion runs, which converts the 4k-8k part of
      the prealloc extent to written and inserts the remaining prealloc part
      from 8k-12k.
    - btrfs_next_leaf() does a search and finds the new prealloc extent
      8k-12k.
    - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into
      the log tree. Note that it overlaps with the 4k-12k prealloc extent
      that was copied to the log tree by the first fsync.
    - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k
      extent that was written.
    - This tries to drop the range 4k-8k in the log tree, which requires
      adjusting the start of the 4k-12k prealloc extent in the log tree to
      8k.
    - btrfs_set_item_key_safe() sees that there is already an extent
      starting at 8k in the log tree and calls BUG().
    
    Fix this by detecting when we're about to insert an overlapping file
    extent item in the log tree and truncating the part that would overlap.
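
    The overlap handling can be pictured with a small model. This is a
    simplified sketch, not the kernel code: extents are half-open
    (start, end) byte ranges, insert_without_overlap() is a hypothetical
    helper, and which side gets trimmed is an assumption of the sketch.

```python
def insert_without_overlap(logged, new):
    """Insert `new` into `logged`, trimming logged ranges that overlap it.

    Ranges are half-open (start, end) byte offsets. Trimming the already
    logged range (rather than the new one) is an assumption of this sketch.
    """
    new_start, new_end = new
    result = []
    for start, end in logged:
        if start < new_start:
            # Keep the part of the logged range that lies before `new`.
            result.append((start, min(end, new_start)))
        if end > new_end:
            # Keep the part of the logged range that lies after `new`.
            result.append((max(start, new_end), end))
    result.append(new)
    return sorted(result)


K = 1024
# Log tree after the first fsync: regular 0-4k, prealloc 4k-12k.
log_tree = [(0, 4 * K), (4 * K, 12 * K)]
# The second fsync wants to copy the new 8k-12k prealloc extent.
log_tree = insert_without_overlap(log_tree, (8 * K, 12 * K))
```

    In this model the log tree ends up with 0-4k, 4k-8k and 8k-12k, so no
    two items cover the same range.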
    
    CC: stable@vger.kernel.org # 6.1+
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Omar Sandoval <osandov@fb.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: fix leak of qgroup extent records after transaction abort [+ + +]

Author: Filipe Manana <fdmanana@suse.com>
Date:   Mon Jun 3 12:49:08 2024 +0100

    btrfs: fix leak of qgroup extent records after transaction abort
    
    commit fb33eb2ef0d88e75564983ef057b44c5b7e4fded upstream.
    
    Qgroup extent records are created when delayed ref heads are created and
    then released after accounting extents at btrfs_qgroup_account_extents(),
    called during the transaction commit path.
    
    If a transaction is aborted we free the qgroup records by calling
    btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(),
    unless we don't have delayed references. We are incorrectly assuming
    that no delayed references means we don't have qgroup extent records.
    
    We can currently have no delayed references because we ran them all
    during a transaction commit and the transaction was aborted after that
    due to some error in the commit path.
    
    So fix this by ensuring we call btrfs_qgroup_destroy_extent_records() at
    btrfs_destroy_delayed_refs() even if we don't have any delayed references.
    
    Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com
    Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/
    Fixes: 81f7eb00ff5b ("btrfs: destroy qgroup extent records on transaction abort")
    CC: stable@vger.kernel.org # 6.1+
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: protect folio::private when attaching extent buffer folios [+ + +]

Author: Qu Wenruo <wqu@suse.com>
Date:   Thu Jun 6 11:01:51 2024 +0930

    btrfs: protect folio::private when attaching extent buffer folios
    
    commit f3a5367c679d31473d3fbb391675055b4792c309 upstream.
    
    [BUG]
    Since v6.8 there have been rare kernel crashes reported by various
    people. The common factor is bad page state error messages like this:
    
      BUG: Bad page state in process kswapd0  pfn:d6e840
      page: refcount:0 mapcount:0 mapping:000000007512f4f2 index:0x2796c2c7c
      pfn:0xd6e840
      aops:btree_aops ino:1
      flags: 0x17ffffe0000008(uptodate|node=0|zone=2|lastcpupid=0x3fffff)
      page_type: 0xffffffff()
      raw: 0017ffffe0000008 dead000000000100 dead000000000122 ffff88826d0be4c0
      raw: 00000002796c2c7c 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: non-NULL mapping
    
    [CAUSE]
    Commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
    allocate-then-attach method") changes the sequence when allocating a new
    extent buffer.
    
    Previously we always called grab_extent_buffer() under
    mapping->i_private_lock, to ensure that modifications of
    folio::private (which is a pointer to the extent buffer for regular
    sectorsize) are safe.
    
    This can lead to the following race:
    
    Thread A is trying to allocate an extent buffer at bytenr X, with 4
    4K pages, meanwhile thread B is trying to release the page at X + 4K
    (the second page of the extent buffer at X).
    
               Thread A                |                 Thread B
    -----------------------------------+-------------------------------------
                                       | btree_release_folio()
                                       | | This is for the page at X + 4K,
                                       | | Not page X.
                                       | |
    alloc_extent_buffer()              | |- release_extent_buffer()
    |- filemap_add_folio() for the     | |  |- atomic_dec_and_test(eb->refs)
    |  page at bytenr X (the first     | |  |
    |  page).                          | |  |
    |  Which returned -EEXIST.         | |  |
    |                                  | |  |
    |- filemap_lock_folio()            | |  |
    |  Returned the first page locked. | |  |
    |                                  | |  |
    |- grab_extent_buffer()            | |  |
    |  |- atomic_inc_not_zero()        | |  |
    |  |  Returned false               | |  |
    |  |- folio_detach_private()       | |  |- folio_detach_private() for X
    |     |- folio_test_private()      | |     |- folio_test_private()
          |  Returned true             | |     |  Returned true
          |- folio_put()               |       |- folio_put()
    
    Now there are two puts on the same folio at folio X, leading to refcount
    underflow of the folio X, and eventually causing the BUG_ON() on the
    page->mapping.
    
    The condition is not that easy to hit:
    
    - The release must be triggered for the middle page of an eb
      If the release is on the same first page of an eb, page lock would kick
      in and prevent the race.
    
    - folio_detach_private() has a very small race window
      It's only between folio_test_private() and folio_clear_private().
    
    That's exactly when mapping->i_private_lock is used to prevent such race,
    and commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
    allocate-then-attach method") screwed that up.
    
    At that time, I thought the page lock would kick in as
    filemap_release_folio() also requires the page to be locked, but forgot
    that filemap_release_folio() only locks one page, not all pages of an
    extent buffer.
    
    [FIX]
    Move all the code requiring i_private_lock into
    attach_eb_folio_to_filemap(), so that everything is done with proper
    lock protection.
    
    Furthermore, to prevent future problems, add an extra
    lockdep_assert_held() to ensure we're holding the proper lock.
    
    A reproducer that is able to hit the race (takes a few minutes with
    instrumented code inserting delays to alloc_extent_buffer()):
    
      #!/bin/sh
      drop_caches () {
              while(true); do
                      echo 3 > /proc/sys/vm/drop_caches
                      echo 1 > /proc/sys/vm/compact_memory
              done
      }
    
      run_tar () {
              while(true); do
                      for x in `seq 1 80` ; do
                              tar cf /dev/zero /mnt > /dev/null &
                      done
                      wait
              done
      }
    
      mkfs.btrfs -f -d single -m single /dev/vda
      mount -o noatime /dev/vda /mnt
      # create 200,000 files, 1K each
      ./simoop -n 200000 -E -f 1k /mnt
      drop_caches &
      (run_tar)
    
    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/linux-btrfs/CAHk-=wgt362nGfScVOOii8cgKn2LVVHeOvOA7OBwg1OwbuJQcw@mail.gmail.com/
    Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
    Link: https://lore.kernel.org/lkml/CABXGCsPktcHQOvKTbPaTwegMExije=Gpgci5NW=hqORo-s7diA@mail.gmail.com/
    Reported-by: Toralf Förster <toralf.foerster@gmx.de>
    Link: https://lore.kernel.org/linux-btrfs/e8b3311c-9a75-4903-907f-fc0f7a3fe423@gmx.de/
    Reported-by: syzbot+f80b066392366b4af85e@syzkaller.appspotmail.com
    Fixes: 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to allocate-then-attach method")
    CC: stable@vger.kernel.org # 6.8+
    CC: Chris Mason <clm@fb.com>
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: qgroup: fix initialization of auto inherit array [+ + +]

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Sat May 4 14:38:41 2024 +0300

    btrfs: qgroup: fix initialization of auto inherit array
    
    commit 0e39c9e524479b85c1b83134df0cfc6e3cb5353a upstream.
    
    The "i++" was accidentally left out so it just sets qgids[0] over and
    over.
    
    This can lead to unexpected problems, as qgids[1:] would all be 0,
    so a later find_qgroup_rb() is unable to find a qgroup, which causes
    snapshot creation to fail.
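
    The effect of the missing increment is easy to see in a small model.
    The function names below are hypothetical and the loop is only a
    sketch of the pattern, not the kernel code.

```python
def fill_qgids_buggy(parent_ids):
    """Model of the bug: the index is never advanced."""
    qgids = [0] * len(parent_ids)
    i = 0
    for qgid in parent_ids:
        qgids[i] = qgid  # always writes slot 0
    return qgids


def fill_qgids_fixed(parent_ids):
    """Model of the fix: advance the index after each store."""
    qgids = [0] * len(parent_ids)
    i = 0
    for qgid in parent_ids:
        qgids[i] = qgid
        i += 1
    return qgids
```

    With three parent ids 5, 6 and 7, the buggy loop leaves [7, 0, 0]
    while the fixed loop produces [5, 6, 7].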
    
    Fixes: 5343cd9364ea ("btrfs: qgroup: simple quota auto hierarchy for nested subvolumes")
    CC: stable@vger.kernel.org # 6.7+
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: qgroup: fix qgroup id collision across mounts [+ + +]

Author: Boris Burkov <boris@bur.io>
Date:   Thu May 9 15:34:40 2024 -0700

    btrfs: qgroup: fix qgroup id collision across mounts
    
    commit 2b8aa78cf1279ec5e418baa26bfed5df682568d8 upstream.
    
    If we delete subvolumes whose ID is the largest in the filesystem, then
    unmount and mount again, then btrfs_init_root_free_objectid on the
    tree_root will select a subvolid smaller than that one and thus allow
    reusing it.
    
    If we are also using qgroups (and particularly squotas) it is possible
    to delete the subvol without deleting the qgroup. In that case, we will
    be able to create a new subvol whose id already has a level 0 qgroup.
    This will result in re-using that qgroup which would then lead to
    incorrect accounting.
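
    The collision can be illustrated with a toy id allocator. This is a
    sketch under assumptions: next_free_id() is a hypothetical helper, the
    ids are made up, and taking leftover qgroup ids into account is shown
    only as one conceivable way to avoid the reuse, not as the content of
    this patch.

```python
def next_free_id(subvol_ids, qgroup_ids=()):
    """Smallest id above every id passed in (hypothetical helper)."""
    return max([*subvol_ids, *qgroup_ids]) + 1


# Subvolume 258 was deleted, but its level 0 qgroup was left behind.
live_subvols = {256, 257}
level0_qgroups = {256, 257, 258}

# After a remount, looking only at subvolumes hands out 258 again.
reused_id = next_free_id(live_subvols)
collides = reused_id in level0_qgroups

# Also considering the leftover qgroup ids skips past the stale one.
safe_id = next_free_id(live_subvols, level0_qgroups)
```

    Here reused_id lands on the stale qgroup, which is the situation that
    leads to incorrect accounting, while safe_id does not.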
    
    Fixes: 6ed05643ddb1 ("btrfs: create qgroup earlier in snapshot creation")
    CC: stable@vger.kernel.org # 6.7+
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Boris Burkov <boris@bur.io>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: qgroup: update rescan message levels and error codes [+ + +]

Author: David Sterba <dsterba@suse.com>
Date:   Thu May 2 22:45:58 2024 +0200

    btrfs: qgroup: update rescan message levels and error codes
    
    commit 1fa7603d569b9e738e9581937ba8725cd7d39b48 upstream.
    
    On filesystems without enabled quotas there's still a warning message in
    the logs when rescan is called. In that case it's not a problem that
    should be reported, rescan can be called unconditionally.  Change the
    error code to ENOTCONN which is used for 'quotas not enabled' elsewhere.
    
    Remove message (also a warning) when rescan is called during an ongoing
    rescan, this brings no useful information and the error code is
    sufficient.
    
    Change message levels to debug for now, they can be removed eventually.
    
    CC: stable@vger.kernel.org # 6.6+
    Reviewed-by: Boris Burkov <boris@bur.io>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: re-introduce 'norecovery' mount option [+ + +]

Author: Qu Wenruo <wqu@suse.com>
Date:   Tue May 21 19:27:31 2024 +0930

    btrfs: re-introduce 'norecovery' mount option
    
    commit 440861b1a03c72cc7be4a307e178dcaa6894479b upstream.
    
    Although 'norecovery' mount option was marked as deprecated for a long
    time and a warning message was printed during the deprecation window,
    it's still actively utilized by several projects that need a safer way
    to mount a btrfs without any writes.
    
    Furthermore this 'norecovery' mount option is supported by other major
    filesystems, which makes it less clear what's our motivation to remove
    it.
    
    Re-introduce the 'norecovery' mount option, and output a message to recommend
    'rescue=nologreplay' option.
    
    Link: https://lore.kernel.org/linux-btrfs/ZkxZT0J-z0GYvfy8@gardel-login/#t
    Link: https://github.com/systemd/systemd/pull/32892
    Link: https://bugzilla.suse.com/show_bug.cgi?id=1222429
    Reported-by: Lennart Poettering <lennart@poettering.net>
    Reported-by: Jiri Slaby <jslaby@suse.com>
    Fixes: a1912f712188 ("btrfs: remove code for inode_cache and recovery mount options")
    CC: stable@vger.kernel.org # 6.8+
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cifs: fix creating sockets when using sfu mount options [+ + +]

Author: Steve French <stfrench@microsoft.com>
Date:   Wed May 29 18:16:56 2024 -0500

    cifs: fix creating sockets when using sfu mount options
    
    commit 518549c120e671c4906f77d1802b97e9b23f673a upstream.
    
    When running fstest generic/423 with sfu mount option, it
    was being skipped due to inability to create sockets:
    
      generic/423  [not run] cifs does not support mknod/mkfifo
    
    which can also be easily reproduced with their af_unix tool:
    
      ./src/af_unix /mnt1/socket-two bind: Operation not permitted
    
    Fix sfu mount option to allow creating and reporting sockets.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: bcm: dvp: Assign ->num before accessing ->hws [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Thu Apr 25 09:55:51 2024 -0700

    clk: bcm: dvp: Assign ->num before accessing ->hws
    
    commit 9368cdf90f52a68120d039887ccff74ff33b4444 upstream.
    
    Commit f316cdff8d67 ("clk: Annotate struct clk_hw_onecell_data with
    __counted_by") annotated the hws member of 'struct clk_hw_onecell_data'
    with __counted_by, which informs the bounds sanitizer about the number
    of elements in hws, so that it can warn when hws is accessed out of
    bounds. As noted in that change, the __counted_by member must be
    initialized with the number of elements before the first array access
    happens, otherwise there will be a warning from each access prior to the
    initialization because the number of elements is zero. This occurs in
    clk_dvp_probe() due to ->num being assigned after ->hws has been
    accessed:
    
      UBSAN: array-index-out-of-bounds in drivers/clk/bcm/clk-bcm2711-dvp.c:59:2
      index 0 is out of range for type 'struct clk_hw *[] __counted_by(num)' (aka 'struct clk_hw *[]')
    
    Move the ->num initialization to before the first access of ->hws, which
    clears up the warning.
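
    Why the order of the two assignments matters can be shown with a toy
    model of a counted array. CountedArray and both probe functions are
    hypothetical; the model only mimics a bounds check against the count
    field and is not how the sanitizer is implemented.

```python
class CountedArray:
    """Toy model of a flexible array annotated with __counted_by(num)."""

    def __init__(self, capacity):
        self.num = 0                     # freshly allocated: count is zero
        self._slots = [None] * capacity  # storage itself is large enough
        self.warnings = []

    def store(self, index, value):
        # The check is made against `num`, not against the allocation size.
        if index >= self.num:
            self.warnings.append(f"index {index} is out of range")
        self._slots[index] = value


def probe_buggy():
    data = CountedArray(2)
    data.store(0, "clk0")  # num is still 0, so this is flagged
    data.store(1, "clk1")
    data.num = 2           # assigned too late
    return data


def probe_fixed():
    data = CountedArray(2)
    data.num = 2           # assigned before the first access
    data.store(0, "clk0")
    data.store(1, "clk1")
    return data
```

    The buggy order records one warning per store, the fixed order none,
    even though both end up with identical contents.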
    
    Cc: stable@vger.kernel.org
    Fixes: f316cdff8d67 ("clk: Annotate struct clk_hw_onecell_data with __counted_by")
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Link: https://lore.kernel.org/r/20240425-cbl-bcm-assign-counted-by-val-before-access-v1-1-e2db3b82d5ef@kernel.org
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Signed-off-by: Stephen Boyd <sboyd@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: bcm: rpi: Assign ->num before accessing ->hws [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Thu Apr 25 09:55:52 2024 -0700

    clk: bcm: rpi: Assign ->num before accessing ->hws
    
    commit 6dc445c1905096b2ed4db1a84570375b4e00cc0f upstream.
    
    Commit f316cdff8d67 ("clk: Annotate struct clk_hw_onecell_data with
    __counted_by") annotated the hws member of 'struct clk_hw_onecell_data'
    with __counted_by, which informs the bounds sanitizer about the number
    of elements in hws, so that it can warn when hws is accessed out of
    bounds. As noted in that change, the __counted_by member must be
    initialized with the number of elements before the first array access
    happens, otherwise there will be a warning from each access prior to the
    initialization because the number of elements is zero. This occurs in
    raspberrypi_discover_clocks() due to ->num being assigned after ->hws
    has been accessed:
    
      UBSAN: array-index-out-of-bounds in drivers/clk/bcm/clk-raspberrypi.c:374:4
      index 3 is out of range for type 'struct clk_hw *[] __counted_by(num)' (aka 'struct clk_hw *[]')
    
    Move the ->num initialization to before the first access of ->hws, which
    clears up the warning.
    
    Cc: stable@vger.kernel.org
    Fixes: f316cdff8d67 ("clk: Annotate struct clk_hw_onecell_data with __counted_by")
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Link: https://lore.kernel.org/r/20240425-cbl-bcm-assign-counted-by-val-before-access-v1-2-e2db3b82d5ef@kernel.org
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Signed-off-by: Stephen Boyd <sboyd@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: qcom: apss-ipq-pll: use stromer ops for IPQ5018 to fix boot failure [+ + +]

Author: Gabor Juhos <j4g8y7@gmail.com>
Date:   Fri Mar 15 17:16:41 2024 +0100

    clk: qcom: apss-ipq-pll: use stromer ops for IPQ5018 to fix boot failure
    
    commit 5fce38e2a1a97900989d9fedebcf5a4dacdaee30 upstream.
    
    Booting v6.8 results in a hang on various IPQ5018 based boards.
    Investigating the problem showed that the hang happens when the
    clk_alpha_pll_stromer_plus_set_rate() function tries to write
    into the PLL_MODE register of the APSS PLL.
    
    Checking the downstream code revealed that it uses [1] stromer
    specific operations for IPQ5018, whereas in the current code
    the stromer plus specific operations are used.
    
    The ops in the 'ipq_pll_stromer_plus' clock definition can't be
    changed since that is needed for IPQ5332, so add a new alpha pll
    clock declaration which uses the correct stromer ops and use this
    new clock for IPQ5018 to avoid the boot failure.
    
    Also, change pll_type in 'ipq5018_pll_data' to
    CLK_ALPHA_PLL_TYPE_STROMER to better reflect that it is a Stromer
    PLL and change the apss_ipq_pll_probe() function accordingly.
    
    1. https://git.codelinaro.org/clo/qsdk/oss/kernel/linux-ipq-5.4/-/blob/NHSS.QSDK.12.4/drivers/clk/qcom/apss-ipq5018.c#L67
    
    Cc: stable@vger.kernel.org
    Fixes: 50492f929486 ("clk: qcom: apss-ipq-pll: add support for IPQ5018")
    Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
    Tested-by: Kathiravan Thirumoorthy <quic_kathirav@quicinc.com>
    Reviewed-by: Kathiravan Thirumoorthy <quic_kathirav@quicinc.com>
    Link: https://lore.kernel.org/r/20240315-apss-ipq-pll-ipq5018-hang-v2-1-6fe30ada2009@gmail.com
    Signed-off-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: qcom: clk-alpha-pll: fix rate setting for Stromer PLLs [+ + +]

Author: Gabor Juhos <j4g8y7@gmail.com>
Date:   Thu Mar 28 08:54:31 2024 +0100

    clk: qcom: clk-alpha-pll: fix rate setting for Stromer PLLs
    
    commit 3c5b3e17b8fd1f1add5a9477306c355fab126977 upstream.
    
    The clk_alpha_pll_stromer_set_rate() function writes improper
    values into the ALPHA_VAL{,_U} registers which results in wrong
    clock rates when the alpha value is used.
    
    The broken behaviour can be seen on IPQ5018 for example, when
    dynamic scaling sets the CPU frequency to 800000 KHz. In this
    case the CPU cores are running only at 792031 KHz:
    
      # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
      800000
      # cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
      792031
    
    This happens because the function ignores the fact that the alpha
    value calculated by the alpha_pll_round_rate() function is only
    32 bits wide which must be extended to 40 bits if it is used on
    hardware which supports 40 bits wide values.
    
    Extend the clk_alpha_pll_stromer_set_rate() function to convert
    the alpha value to 40 bits before writing it into the registers
    in order to ensure that the hardware really uses the requested rate.
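
    The arithmetic can be checked with a short worked example. This is a
    sketch, not the driver code: the 24 MHz reference clock is an
    assumption (it is not stated above), the helper names are
    hypothetical, and rounding the fraction up is a choice of the sketch.

```python
XO_HZ = 24_000_000  # assumed reference clock, not stated in the text above


def round_rate_alpha32(rate_hz, parent_hz):
    """Split a rate into an integer multiplier and a 32-bit fraction."""
    l, rem = divmod(rate_hz, parent_hz)
    alpha, leftover = divmod(rem << 32, parent_hz)
    if leftover:
        alpha += 1  # round the fraction up
    return l, alpha


def rate_from_regs(l, alpha_reg, parent_hz, width):
    """Rate produced when the hardware reads alpha_reg as `width` bits."""
    return parent_hz * l + ((parent_hz * alpha_reg) >> width)


l, a32 = round_rate_alpha32(800_000_000, XO_HZ)

# 32-bit alpha written unchanged into registers that hold 40 bits.
wrong_khz = rate_from_regs(l, a32, XO_HZ, 40) // 1000

# Alpha widened from 32 to 40 bits before it is written.
right_khz = rate_from_regs(l, a32 << 8, XO_HZ, 40) // 1000
```

    Under the 24 MHz assumption, wrong_khz comes out as 792031 and
    right_khz as 800000, which matches the two cpuinfo_cur_freq readings
    quoted in this entry.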
    
    After the change the CPU frequency is correct:
    
      # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
      800000
      # cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
      800000
    
    Cc: stable@vger.kernel.org
    Fixes: e47a4f55f240 ("clk: qcom: clk-alpha-pll: Add support for Stromer PLLs")
    Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
    Link: https://lore.kernel.org/r/20240328-alpha-pll-fix-stromer-set-rate-v3-1-1b79714c78bc@gmail.com
    Signed-off-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cpufreq: amd-pstate: Fix the inconsistency in max frequency units [+ + +]

Author: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Date:   Mon May 27 10:41:28 2024 +0530

    cpufreq: amd-pstate: Fix the inconsistency in max frequency units
    
    commit e4731baaf29438508197d3a8a6d4f5a8c51663f8 upstream.
    
    The nominal frequency in cpudata is maintained in MHz whereas all other
    frequencies are in KHz. This means we have to convert nominal frequency
    value to KHz before we do any interaction with other frequency values.
    
    In amd_pstate_set_boost(), this conversion from MHz to KHz is missed,
    fix that.
    
    Tested on an AMD Zen4 EPYC server.
    
    Before:
    $ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq | uniq
    2151
    $ cat /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_min_freq | uniq
    400000
    $ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq | uniq
    2151
    409422
    
    After:
    $ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq | uniq
    2151000
    $ cat /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_min_freq | uniq
    400000
    $ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq | uniq
    2151000
    1799527
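
    The unit mismatch can be reproduced with the figures above. This is a
    minimal sketch: mhz_to_khz() is a hypothetical helper, and reading
    2151 as the nominal frequency in MHz is an assumption based on the
    before and after values of scaling_max_freq.

```python
KHZ_PER_MHZ = 1000


def mhz_to_khz(freq_mhz):
    """Convert a frequency kept in MHz to the KHz used everywhere else."""
    return freq_mhz * KHZ_PER_MHZ


nominal_mhz = 2151        # assumed nominal frequency, from the "Before" output
cpuinfo_min_khz = 400000  # cpuinfo_min_freq from the same output

# Without the conversion the "maximum" sorts below the minimum.
unconverted_is_below_min = nominal_mhz < cpuinfo_min_khz

# With the conversion it matches the "After" scaling_max_freq value.
converted_max_khz = mhz_to_khz(nominal_mhz)
```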
    
    Fixes: ec437d71db77 ("cpufreq: amd-pstate: Introduce a new AMD P-State driver to support future processors")
    Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
    Acked-by: Mario Limonciello <mario.limonciello@amd.com>
    Acked-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
    Tested-by: Peter Jung <ptr1337@cachyos.org>
    Cc: 5.17+ <stable@vger.kernel.org> # 5.17+
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: ecdsa - Fix module auto-load on add-key [+ + +]

Author: Stefan Berger <stefanb@linux.ibm.com>
Date:   Thu Mar 21 10:44:33 2024 -0400

    crypto: ecdsa - Fix module auto-load on add-key
    
    commit 48e4fd6d54f54d0ceab5a952d73e47a9454a6ccb upstream.
    
    Add module alias with the algorithm cra_name similar to what we have for
    RSA-related and other algorithms.
    
    The kernel attempts to modprobe asymmetric algorithms using the names
    "crypto-$cra_name" and "crypto-$cra_name-all." However, since these
    aliases are currently missing, the modules are not loaded. For instance,
    when using the `add_key` function, the hash algorithm is typically
    loaded automatically, but the asymmetric algorithm is not.
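
    The two names from the paragraph above can be spelled out for one
    algorithm. This is only an illustration: module_request_names() is a
    hypothetical helper, and "ecdsa-nist-p384" is used as an example
    cra_name for the P384 variant mentioned in the Fixes tag.

```python
def module_request_names(cra_name):
    """Names requested for an algorithm, in the order given in the text."""
    return [f"crypto-{cra_name}", f"crypto-{cra_name}-all"]


names = module_request_names("ecdsa-nist-p384")
```

    A module is only found this way if it carries an alias matching one
    of these names, which is what this patch adds.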
    
    Steps to test:
    
    1. Create certificate
    
      openssl req -x509 -sha256 -newkey ec \
      -pkeyopt "ec_paramgen_curve:secp384r1" -keyout key.pem -days 365 \
      -subj '/CN=test' -nodes -outform der -out nist-p384.der
    
    2. Optionally, trace module requests with: trace-cmd stream -e module &
    
    3. Trigger add_key call for the cert:
    
       # keyctl padd asymmetric "" @u < nist-p384.der
       641069229
       # lsmod | head -2
       Module                  Size  Used by
       ecdsa_generic          16384  0
    
    Fixes: c12d448ba939 ("crypto: ecdsa - Register NIST P384 and extend test suite")
    Cc: stable@vger.kernel.org
    Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
    Reviewed-by: Vitaly Chikunov <vt@altlinux.org>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: ecrdsa - Fix module auto-load on add_key [+ + +]

Author: Vitaly Chikunov <vt@altlinux.org>
Date:   Mon Mar 18 03:42:40 2024 +0300

    crypto: ecrdsa - Fix module auto-load on add_key
    
    commit eb5739a1efbc9ff216271aeea0ebe1c92e5383e5 upstream.
    
    Add module alias with the algorithm cra_name similar to what we have for
    RSA-related and other algorithms.
    
    The kernel attempts to modprobe asymmetric algorithms using the names
    "crypto-$cra_name" and "crypto-$cra_name-all." However, since these
    aliases are currently missing, the modules are not loaded. For instance,
    when using the `add_key` function, the hash algorithm is typically
    loaded automatically, but the asymmetric algorithm is not.
    
    Steps to test:
    
    1. The cert is generated using the ima-evm-utils test suite with
       `gen-keys.sh`; an example cert is provided below:
    
      $ base64 -d >test-gost2012_512-A.cer <<EOF
      MIIB/DCCAWagAwIBAgIUK8+whWevr3FFkSdU9GLDAM7ure8wDAYIKoUDBwEBAwMFADARMQ8wDQYD
      VQQDDAZDQSBLZXkwIBcNMjIwMjAxMjIwOTQxWhgPMjA4MjEyMDUyMjA5NDFaMBExDzANBgNVBAMM
      BkNBIEtleTCBoDAXBggqhQMHAQEBAjALBgkqhQMHAQIBAgEDgYQABIGALXNrTJGgeErBUOov3Cfo
      IrHF9fcj8UjzwGeKCkbCcINzVUbdPmCopeJRHDJEvQBX1CQUPtlwDv6ANjTTRoq5nCk9L5PPFP1H
      z73JIXHT0eRBDVoWy0cWDRz1mmQlCnN2HThMtEloaQI81nTlKZOcEYDtDpi5WODmjEeRNQJMdqCj
      UDBOMAwGA1UdEwQFMAMBAf8wHQYDVR0OBBYEFCwfOITMbE9VisW1i2TYeu1tAo5QMB8GA1UdIwQY
      MBaAFCwfOITMbE9VisW1i2TYeu1tAo5QMAwGCCqFAwcBAQMDBQADgYEAmBfJCMTdC0/NSjz4BBiQ
      qDIEjomO7FEHYlkX5NGulcF8FaJW2jeyyXXtbpnub1IQ8af1KFIpwoS2e93LaaofxpWlpQLlju6m
      KYLOcO4xK3Whwa2hBAz9YbpUSFjvxnkS2/jpH2MsOSXuUEeCruG/RkHHB3ACef9umG6HCNQuAPY=
      EOF
    
    2. Optionally, trace module requests with: trace-cmd stream -e module &
    
    3. Trigger add_key call for the cert:
    
      # keyctl padd asymmetric "" @u <test-gost2012_512-A.cer
      939910969
      # lsmod | head -3
      Module                  Size  Used by
      ecrdsa_generic         16384  0
      streebog_generic       28672  0
    
    Reported-by: Paul Wolneykien <manowar@altlinux.org>
    Cc: stable@vger.kernel.org
    Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
    Tested-by: Stefan Berger <stefanb@linux.ibm.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak [+ + +]

Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Wed May 8 16:39:51 2024 +0800

    crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak
    
    commit d3b17c6d9dddc2db3670bc9be628b122416a3d26 upstream.
    
    Using completion_done to determine whether the caller has gone
    away only works after a complete call.  Furthermore it's still
    possible that the caller has not yet called wait_for_completion,
    resulting in another potential UAF.
    
    Fix this by making the caller use cancel_work_sync and then freeing
    the memory safely.
    
    Fixes: 7d42e097607c ("crypto: qat - resolve race condition during AER recovery")
    Cc: <stable@vger.kernel.org> #6.8+
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: starfive - Do not free stack buffer [+ + +]

Author: Jia Jie Ho <jiajie.ho@starfivetech.com>
Date:   Mon Apr 29 14:06:39 2024 +0800

    crypto: starfive - Do not free stack buffer
    
    commit d7f01649f4eaf1878472d3d3f480ae1e50d98f6c upstream.
    
    RSA text data uses a variable-length buffer allocated on the stack.
    Calling kfree() on it causes undefined behaviour in subsequent operations.
    
    Cc: <stable@vger.kernel.org> #6.7+
    Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd: Fix shutdown (again) on some SMU v13.0.4/11 platforms [+ + +]

Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Sun May 26 07:59:08 2024 -0500

    drm/amd: Fix shutdown (again) on some SMU v13.0.4/11 platforms
    
    commit 267cace556e8a53d703119f7435ab556209e5b6a upstream.
    
    commit cd94d1b182d2 ("dm/amd/pm: Fix problems with reboot/shutdown for
    some SMU 13.0.4/13.0.11 users") attempted to fix shutdown issues
    that were reported since commit 31729e8c21ec ("drm/amd/pm: fixes a
    random hang in S4 for SMU v13.0.4/11") but caused issues for some
    people.
    
    Adjust the workaround flow to properly only apply in the S4 case:
    -> For shutdown go through SMU_MSG_PrepareMp1ForUnload
    -> For S4 go through SMU_MSG_GfxDeviceDriverReset and
       SMU_MSG_PrepareMp1ForUnload
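
    The adjusted flow amounts to a two-way decision. The sketch below is
    not the driver code: smu_messages() is a hypothetical helper, and the
    order of the two messages in the S4 case simply follows the order in
    which they are listed above.

```python
def smu_messages(entering_s4):
    """Messages sent on the way down, per the flow described above."""
    if entering_s4:
        return ["SMU_MSG_GfxDeviceDriverReset", "SMU_MSG_PrepareMp1ForUnload"]
    return ["SMU_MSG_PrepareMp1ForUnload"]


shutdown_flow = smu_messages(entering_s4=False)
s4_flow = smu_messages(entering_s4=True)
```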
    
    Reported-and-tested-by: lectrode <electrodexsnet@gmail.com>
    Closes: https://github.com/void-linux/void-packages/issues/50417
    Cc: stable@vger.kernel.org
    Fixes: cd94d1b182d2 ("dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users")
    Reviewed-by: Tim Huang <Tim.Huang@amd.com>
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu/atomfirmware: add intergrated info v2.3 table [+ + +]

Author: Li Ma <li.ma@amd.com>
Date:   Mon May 20 18:43:55 2024 +0800

    drm/amdgpu/atomfirmware: add intergrated info v2.3 table
    
    commit e64e8f7c178e5228e0b2dbb504b9dc75953a319f upstream.
    
    [Why]
    The vram width value is 0 because the integratedsysteminfo table in
    VBIOS has been updated to 2.3.
    
    [How]
    The driver needs a new integrated info v2.3 table too.
    Then the vram width value will be correct.
    
    Signed-off-by: Li Ma <li.ma@amd.com>
    Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu: add error handle to avoid out-of-bounds [+ + +]

Author: Bob Zhou <bob.zhou@amd.com>
Date:   Tue Apr 23 16:58:11 2024 +0800

    drm/amdgpu: add error handle to avoid out-of-bounds
    
    commit 8b2faf1a4f3b6c748c0da36cda865a226534d520 upstream.
    
    If sdma_v4_0_irq_id_to_seq() returns -EINVAL, processing should
    stop to avoid an out-of-bounds read, so return -EINVAL directly.
    
    Signed-off-by: Bob Zhou <bob.zhou@amd.com>
    Acked-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Le Ma <le.ma@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
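
The pattern described above, checking a lookup result before using it as an array index, can be sketched in userspace C. This is only an illustration: NUM_INSTANCES, id_to_seq() and handle_irq() are hypothetical names, not the amdgpu code.

```c
#include <errno.h>

#define NUM_INSTANCES 4   /* hypothetical number of engine instances */

/* Hypothetical lookup: maps a client id to an instance index or -EINVAL. */
int id_to_seq(unsigned int client_id)
{
	return client_id < NUM_INSTANCES ? (int)client_id : -EINVAL;
}

/* Counts an interrupt for the instance, refusing to index with an error code. */
int handle_irq(int counters[NUM_INSTANCES], unsigned int client_id)
{
	int instance = id_to_seq(client_id);

	if (instance < 0)
		return instance;   /* without this check, counters[-EINVAL] is read */

	counters[instance]++;
	return 0;
}
```

Returning the negative value unchanged keeps the original error code visible to the caller.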

drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms [+ + +]

Author: Lang Yu <Lang.Yu@amd.com>
Date:   Thu Apr 11 17:14:17 2024 +0800

    drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
    
    commit 2a705f3e49d20b59cd9e5cc3061b2d92ebe1e5f0 upstream.
    
    Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used.
    When two attachments use the same VM, the root PD would be locked twice.
    
    [   57.910418] Call Trace:
    [   57.793726]  ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu]
    [   57.793820]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu]
    [   57.793923]  ? idr_get_next_ul+0xbe/0x100
    [   57.793933]  kfd_process_device_free_bos+0x7e/0xf0 [amdgpu]
    [   57.794041]  kfd_process_wq_release+0x2ae/0x3c0 [amdgpu]
    [   57.794141]  ? process_scheduled_works+0x29c/0x580
    [   57.794147]  process_scheduled_works+0x303/0x580
    [   57.794157]  ? __pfx_worker_thread+0x10/0x10
    [   57.794160]  worker_thread+0x1a2/0x370
    [   57.794165]  ? __pfx_worker_thread+0x10/0x10
    [   57.794167]  kthread+0x11b/0x150
    [   57.794172]  ? __pfx_kthread+0x10/0x10
    [   57.794177]  ret_from_fork+0x3d/0x60
    [   57.794181]  ? __pfx_kthread+0x10/0x10
    [   57.794184]  ret_from_fork_asm+0x1b/0x30
    
    Signed-off-by: Lang Yu <Lang.Yu@amd.com>
    Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/fbdev-generic: Do not set physical framebuffer address [+ + +]

Author: Thomas Zimmermann <tzimmermann@suse.de>
Date:   Fri Apr 19 10:28:54 2024 +0200

    drm/fbdev-generic: Do not set physical framebuffer address
    
    commit 87cb4a612a89690b123e68f6602d9f6581b03597 upstream.
    
    Framebuffer memory is allocated via vzalloc() from non-contiguous
    physical pages. The physical framebuffer start address is therefore
    meaningless. Do not set it.
    
    The value is not used within the kernel and only exported to userspace
    on dedicated ARM configs. No functional change is expected.
    
    v2:
    - refer to vzalloc() in commit message (Javier)
    
    Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
    Fixes: a5b44c4adb16 ("drm/fbdev-generic: Always use shadow buffering")
    Cc: Thomas Zimmermann <tzimmermann@suse.de>
    Cc: Javier Martinez Canillas <javierm@redhat.com>
    Cc: Zack Rusin <zackr@vmware.com>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Maxime Ripard <mripard@kernel.org>
    Cc: <stable@vger.kernel.org> # v6.4+
    Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
    Reviewed-by: Zack Rusin <zack.rusin@broadcom.com>
    Reviewed-by: Sui Jingfeng <sui.jingfeng@linux.dev>
    Tested-by: Sui Jingfeng <sui.jingfeng@linux.dev>
    Acked-by: Maxime Ripard <mripard@kernel.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240419083331.7761-2-tzimmermann@suse.de
    (cherry picked from commit 73ef0aecba78aa9ebd309b10b6cd17d94e632892)
    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915/hwmon: Get rid of devm [+ + +]

Author: Ashutosh Dixit <ashutosh.dixit@intel.com>
Date:   Wed Apr 17 07:56:46 2024 -0700

    drm/i915/hwmon: Get rid of devm
    
    commit 5bc9de065b8bb9b8dd8799ecb4592d0403b54281 upstream.
    
    When both hwmon and hwmon drvdata (on which hwmon depends) are device
    managed resources, the expectation, on device unbind, is that hwmon will be
    released before drvdata. However, in i915 there are two separate code
    paths, which both release either drvdata or hwmon and either can be
    released before the other. These code paths (for device unbind) are as
    follows (see also the bug referenced below):
    
    Call Trace:
    release_nodes+0x11/0x70
    devres_release_group+0xb2/0x110
    component_unbind_all+0x8d/0xa0
    component_del+0xa5/0x140
    intel_pxp_tee_component_fini+0x29/0x40 [i915]
    intel_pxp_fini+0x33/0x80 [i915]
    i915_driver_remove+0x4c/0x120 [i915]
    i915_pci_remove+0x19/0x30 [i915]
    pci_device_remove+0x32/0xa0
    device_release_driver_internal+0x19c/0x200
    unbind_store+0x9c/0xb0
    
    and
    
    Call Trace:
    release_nodes+0x11/0x70
    devres_release_all+0x8a/0xc0
    device_unbind_cleanup+0x9/0x70
    device_release_driver_internal+0x1c1/0x200
    unbind_store+0x9c/0xb0
    
    This means that in i915, if we use devm, we cannot guarantee that hwmon
    will always be released before drvdata. As a result, we have a
    use-after-free if hwmon sysfs is accessed when drvdata has been released
    but hwmon hasn't.
    
    The only way out of this seems to be to get rid of devm_ and release/free
    everything explicitly during device unbind.
    
    v2: Change commit message and other minor code changes
    v3: Cleanup from i915_hwmon_register on error (Armin Wolf)
    v4: Eliminate potential static analyzer warning (Rodrigo)
        Eliminate fetch_and_zero (Jani)
    v5: Restore previous logic for ddat_gt->hwmon_dev error return (Andi)
    
    Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10366
    Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
    Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240417145646.793223-1-ashutosh.dixit@intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/xe/bb: assert width in xe_bb_create_job() [+ + +]

Author: Matthew Auld <matthew.auld@intel.com>
Date:   Wed Mar 20 11:27:31 2024 +0000

    drm/xe/bb: assert width in xe_bb_create_job()
    
    commit 1008368e1c7e36bdec01b3cce1e76606dc3ad46f upstream.
    
    The queue width will determine the number of batch buffers emitted into
    the ring. In the case of xe_bb_create_job() we pass exactly one batch
    address, therefore add an assert for the width to make sure we don't go
    out of bounds. While here also convert to the helper to determine if the
    queue is migration based.
    
    Signed-off-by: Matthew Auld <matthew.auld@intel.com>
    Cc: Nirmoy Das <nirmoy.das@intel.com>
    Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240320112730.219854-3-matthew.auld@intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

EDAC/amd64: Convert PCIBIOS_* return codes to errnos [+ + +]

Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Mon May 27 16:22:34 2024 +0300

    EDAC/amd64: Convert PCIBIOS_* return codes to errnos
    
    commit 3ec8ebd8a5b782d56347ae884de880af26f93996 upstream.
    
    gpu_get_node_map() uses pci_read_config_dword() that returns PCIBIOS_*
    codes. The return code is then returned all the way into the module
    init function amd64_edac_init() that returns it as is. The module init
    functions, however, should return normal errnos.
    
    Convert the PCIBIOS_* return code into a normal errno using
    pcibios_err_to_errno() before returning it from gpu_get_node_map().
    
    For consistency, also convert the other similar cases which return
    PCIBIOS_* codes, even though they do not cause any bugs at the moment.
    
    Fixes: 4251566ebc1c ("EDAC/amd64: Cache and use GPU node map")
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240527132236.13875-1-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
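
The conversion described above can be modelled in userspace roughly as follows. This is a simplified sketch, not the in-kernel helper: to_errno_sketch() is a hypothetical name, the PCIBIOS_* values are assumed to mirror the kernel's definitions, and the real pcibios_err_to_errno() handles more codes.

```c
#include <errno.h>

/* Assumed to mirror the kernel's PCIBIOS_* values; defined here for illustration. */
#define PCIBIOS_SUCCESSFUL          0x00
#define PCIBIOS_DEVICE_NOT_FOUND    0x86
#define PCIBIOS_BAD_REGISTER_NUMBER 0x87

/* Simplified stand-in for pcibios_err_to_errno(). */
int to_errno_sketch(int err)
{
	if (err <= PCIBIOS_SUCCESSFUL)
		return err;        /* zero, or already a negative errno */

	switch (err) {
	case PCIBIOS_DEVICE_NOT_FOUND:
		return -ENODEV;
	case PCIBIOS_BAD_REGISTER_NUMBER:
		return -EFAULT;
	default:
		return -ERANGE;    /* unknown positive code */
	}
}
```

The point of the sketch is that PCIBIOS_* failures are positive values, so a caller that only tests for a negative return would treat them as success unless they are converted first.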

EDAC/igen6: Convert PCIBIOS_* return codes to errnos [+ + +]

Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Mon May 27 16:22:35 2024 +0300

    EDAC/igen6: Convert PCIBIOS_* return codes to errnos
    
    commit f8367a74aebf88dc8b58a0db6a6c90b4cb8fc9d3 upstream.
    
    errcmd_enable_error_reporting() uses pci_{read,write}_config_word()
    that return PCIBIOS_* codes. The return code is then returned all the
    way into the probe function igen6_probe() that returns it as is. The
    probe functions, however, should return normal errnos.
    
    Convert the PCIBIOS_* return code into a normal errno using
    pcibios_err_to_errno() before returning it from
    errcmd_enable_error_reporting().
    
    Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240527132236.13875-2-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

erofs: avoid allocating DEFLATE streams before mounting [+ + +]

Author: Gao Xiang <xiang@kernel.org>
Date:   Mon May 20 17:01:06 2024 +0800

    erofs: avoid allocating DEFLATE streams before mounting
    
    commit 80eb4f62056d6ae709bdd0636ab96ce660f494b2 upstream.
    
    Currently, each DEFLATE stream takes one 32 KiB permanent internal
    window buffer even if there is no running instance which uses DEFLATE
    algorithm.
    
    It's unexpected and wasteful on embedded devices with limited resources
    and servers with hundreds of CPU cores if DEFLATE is enabled but unused.
    
    Fixes: ffa09b3bd024 ("erofs: DEFLATE compression support")
    Cc: <stable@vger.kernel.org> # 6.6+
    Reviewed-by: Sandeep Dhavale <dhavale@google.com>
    Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
    Link: https://lore.kernel.org/r/20240520090106.2898681-1-hsiangkao@linux.alibaba.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Fix a possible null pointer dereference in eventfs_find_events() [+ + +]

Author: Hao Ge <gehao@kylinos.cn>
Date:   Mon May 13 13:33:38 2024 +0800

    eventfs: Fix a possible null pointer dereference in eventfs_find_events()
    
    commit d4e9a968738bf66d3bb852dd5588d4c7afd6d7f4 upstream.
    
    In eventfs_find_events(), there is a potential NULL pointer
    dereference: update_events_attr() may be called and will operate on
    members of the ei struct when ei is NULL.
    
    Hence, when ei->is_freed is set, return NULL directly.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240513053338.63017-1-hao.ge@linux.dev
    
    Cc: stable@vger.kernel.org
    Fixes: 8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership")
    Signed-off-by: Hao Ge <gehao@kylinos.cn>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

eventfs: Keep the directories from having the same inode number as files [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Thu May 23 01:14:26 2024 -0400

    eventfs: Keep the directories from having the same inode number as files
    
    commit 8898e7f288c47d450a3cf1511c791a03550c0789 upstream.
    
    The directories require unique inode numbers but all the eventfs files
    have the same inode number. Prevent the directories from having the same
    inode numbers as the files as that can confuse some tooling.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.428826685@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Masahiro Yamada <masahiroy@kernel.org>
    Fixes: 834bf76add3e6 ("eventfs: Save directory inodes in the eventfs_inode structure")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: fix mb_cache_entry's e_refcnt leak in ext4_xattr_block_cache_find() [+ + +]

Author: Baokun Li <libaokun1@huawei.com>
Date:   Sat May 4 15:55:25 2024 +0800

    ext4: fix mb_cache_entry's e_refcnt leak in ext4_xattr_block_cache_find()
    
    commit 0c0b4a49d3e7f49690a6827a41faeffad5df7e21 upstream.
    
    Syzbot reports a warning as follows:
    
    ============================================
    WARNING: CPU: 0 PID: 5075 at fs/mbcache.c:419 mb_cache_destroy+0x224/0x290
    Modules linked in:
    CPU: 0 PID: 5075 Comm: syz-executor199 Not tainted 6.9.0-rc6-gb947cc5bf6d7
    RIP: 0010:mb_cache_destroy+0x224/0x290 fs/mbcache.c:419
    Call Trace:
     <TASK>
     ext4_put_super+0x6d4/0xcd0 fs/ext4/super.c:1375
     generic_shutdown_super+0x136/0x2d0 fs/super.c:641
     kill_block_super+0x44/0x90 fs/super.c:1675
     ext4_kill_sb+0x68/0xa0 fs/ext4/super.c:7327
    [...]
    ============================================
    
    This is because when finding an entry in ext4_xattr_block_cache_find(), if
    ext4_sb_bread() returns -ENOMEM, the ce's e_refcnt, which has already been
    incremented in __entry_find(), is never dropped. This eventually triggers
    the above warning in mb_cache_destroy() due to the reference count leak.
    
    So call mb_cache_entry_put() on the -ENOMEM error branch as a quick fix.
    
    Reported-by: syzbot+dd43bd0f7474512edc47@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=dd43bd0f7474512edc47
    Fixes: fb265c9cb49e ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases")
    Cc: stable@kernel.org
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20240504075526.2254349-2-libaokun@huaweicloud.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
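
The leak and its fix can be illustrated with a small userspace model. Everything here is hypothetical: struct entry stands in for struct mb_cache_entry, and find_entry() for the lookup path; only the shape of the error handling is meant to match the description above.

```c
#include <errno.h>
#include <stddef.h>

struct entry { int refcnt; };   /* stand-in for struct mb_cache_entry */

void entry_get(struct entry *e) { e->refcnt++; }
void entry_put(struct entry *e) { e->refcnt--; }

/*
 * Hypothetical lookup: takes a reference, then "reads" the block.
 * read_err simulates the result of the block read (0 or -ENOMEM).
 * On success the caller owns the reference; on failure it is dropped here.
 */
int find_entry(struct entry *e, int read_err, struct entry **out)
{
	entry_get(e);
	if (read_err) {
		entry_put(e);   /* the put that the leaking error path lacked */
		*out = NULL;
		return read_err;
	}
	*out = e;
	return 0;
}
```

With the put in place, the count returns to its starting value on the failure path, so a later teardown check on outstanding references would not fire.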

ext4: Fixes len calculation in mpage_journal_page_buffers [+ + +]

Author: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Date:   Thu Feb 29 11:40:13 2024 +0530

    ext4: Fixes len calculation in mpage_journal_page_buffers
    
    commit c2a09f3d782de952f09a3962d03b939e7fa7ffa4 upstream.
    
    Truncate operation can race with writeback, in which inode->i_size can get
    truncated and therefore size - folio_pos() can be negative. This fixes the
    len calculation. However this path doesn't get easily triggered even
    with data journaling.
    
    Cc: stable@kernel.org # v6.5
    Fixes: 80be8c5cc925 ("Fixes: ext4: Make mpage_journal_page_buffers use folio")
    Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/cff4953b5c9306aba71e944ab176a5d396b9a1b7.1709182250.git.ritesh.list@gmail.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
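
The hazard described above, a file size that drops below the folio position while writeback is running, can be shown with a short userspace sketch. The helper names are hypothetical and the clamping shown is only one way to bound the length; it is not claimed to be the upstream change.

```c
#include <stddef.h>

/* Naive computation, shown for contrast: wraps to a huge value when size < pos. */
size_t naive_len(long long size, long long pos)
{
	return (size_t)(size - pos);
}

/* Hypothetical helper: number of bytes of the folio that lie below size. */
size_t valid_len(long long size, long long pos, size_t folio_len)
{
	long long remaining = size - pos;   /* negative if truncate won the race */

	if (remaining <= 0)
		return 0;
	if ((unsigned long long)remaining > folio_len)
		return folio_len;
	return (size_t)remaining;
}
```

Because the length is unsigned, a negative difference does not show up as an error; it becomes a very large length, which is why the calculation has to account for a concurrent truncate.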

ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow [+ + +]

Author: Baokun Li <libaokun1@huawei.com>
Date:   Tue Mar 19 19:33:23 2024 +0800

    ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow
    
    commit 9a9f3a9842927e4af7ca10c19c94dad83bebd713 upstream.
    
    Now ac_groups_linear_remaining is of type __u16 and s_mb_max_linear_groups
    is of type unsigned int, so an overflow occurs when setting a value above
    65535 through the mb_max_linear_groups sysfs interface. Therefore, the
    type of ac_groups_linear_remaining is set to __u32 to avoid overflow.
    
    Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning")
    CC: stable@kernel.org
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20240319113325.3110393-8-libaokun1@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
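
The truncation described above is easy to reproduce in userspace. In this minimal sketch, store_u16() and store_u32() are hypothetical helpers that model assigning an unsigned int setting to a 16-bit and a 32-bit field respectively.

```c
#include <stdint.h>

/* Models the field type before the fix: values above 65535 are truncated. */
uint16_t store_u16(unsigned int max_linear_groups)
{
	return (uint16_t)max_linear_groups;
}

/* Models the field type after the fix: the full value is preserved. */
uint32_t store_u32(unsigned int max_linear_groups)
{
	return (uint32_t)max_linear_groups;
}
```

For example, a setting of 65536 stored in the 16-bit field becomes 0, while the 32-bit field keeps it intact.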

f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode() [+ + +]

Author: Chao Yu <chao@kernel.org>
Date:   Thu Apr 25 16:58:38 2024 +0800

    f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()
    
    commit 20faaf30e55522bba2b56d9c46689233205d7717 upstream.
    
    syzbot reports a kernel bug as below:
    
    F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4
    ==================================================================
    BUG: KASAN: slab-out-of-bounds in f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline]
    BUG: KASAN: slab-out-of-bounds in current_nat_addr fs/f2fs/node.h:213 [inline]
    BUG: KASAN: slab-out-of-bounds in f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600
    Read of size 1 at addr ffff88807a58c76c by task syz-executor280/5076
    
    CPU: 1 PID: 5076 Comm: syz-executor280 Not tainted 6.9.0-rc5-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
     print_address_description mm/kasan/report.c:377 [inline]
     print_report+0x169/0x550 mm/kasan/report.c:488
     kasan_report+0x143/0x180 mm/kasan/report.c:601
     f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline]
     current_nat_addr fs/f2fs/node.h:213 [inline]
     f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600
     f2fs_xattr_fiemap fs/f2fs/data.c:1848 [inline]
     f2fs_fiemap+0x55d/0x1ee0 fs/f2fs/data.c:1925
     ioctl_fiemap fs/ioctl.c:220 [inline]
     do_vfs_ioctl+0x1c07/0x2e50 fs/ioctl.c:838
     __do_sys_ioctl fs/ioctl.c:902 [inline]
     __se_sys_ioctl+0x81/0x170 fs/ioctl.c:890
     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
     do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    The root cause is that the sanity check on i_xattr_nid was missing in
    f2fs_iget(). As a result, in the fiemap() path, current_nat_addr()
    accesses nat_bitmap with an offset derived from an invalid i_xattr_nid,
    which triggers the KASAN report. Fix it.
    
    Reported-and-tested-by: syzbot+3694e283cf5c40df6d14@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/linux-f2fs-devel/00000000000094036c0616e72a1d@google.com
    Signed-off-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fbdev: savage: Handle err return when savagefb_check_var failed [+ + +]

Author: Cai Xinchen <caixinchen1@huawei.com>
Date:   Tue Apr 16 06:51:37 2024 +0000

    fbdev: savage: Handle err return when savagefb_check_var failed
    
    commit 6ad959b6703e2c4c5d7af03b4cfd5ff608036339 upstream.
    
    Commit 04e5eac8f3ab ("fbdev: savage: Error out if pixclock equals zero")
    checks the value of pixclock to avoid a divide-by-zero error. However,
    savagefb_probe() doesn't handle the error returned by
    savagefb_check_var(). When pixclock is 0, this still causes a
    divide-by-zero error.
    
    Fixes: 04e5eac8f3ab ("fbdev: savage: Error out if pixclock equals zero")
    Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

filemap: add helper mapping_max_folio_size() [+ + +]

Author: Xu Yang <xu.yang_2@nxp.com>
Date:   Tue May 21 19:49:38 2024 +0800

    filemap: add helper mapping_max_folio_size()
    
    commit 79c137454815ba5554caa8eeb4ad5c94e96e45ce upstream.
    
    Add mapping_max_folio_size() to get the maximum folio size for this
    pagecache mapping.
    
    Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
    Cc: stable@vger.kernel.org
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
    Link: https://lore.kernel.org/r/20240521114939.2541461-1-xu.yang_2@nxp.com
    Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

firmware: qcom_scm: disable clocks if qcom_scm_bw_enable() fails [+ + +]

Author: Gabor Juhos <j4g8y7@gmail.com>
Date:   Mon Mar 4 14:14:53 2024 +0100

    firmware: qcom_scm: disable clocks if qcom_scm_bw_enable() fails
    
    commit 0c50b7fcf2773b4853e83fc15aba1a196ba95966 upstream.
    
    Several functions call qcom_scm_bw_enable() and then return
    immediately if the call fails, leaving the clocks enabled.
    
    Change the code of these functions to disable clocks when the
    qcom_scm_bw_enable() call fails. This also fixes a possible dma
    buffer leak in the qcom_scm_pas_init_image() function.
    
    Compile tested only due to lack of hardware with interconnect
    support.
    
    Cc: stable@vger.kernel.org
    Fixes: 65b7ebda5028 ("firmware: qcom_scm: Add bw voting support to the SCM interface")
    Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
    Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
    Link: https://lore.kernel.org/r/20240304-qcom-scm-disable-clk-v1-1-b36e51577ca1@gmail.com
    Signed-off-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
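
The error-path shape described above can be sketched in userspace C. All names here (struct res, clk_on(), clk_off(), bw_on(), do_call()) are hypothetical stand-ins, not the qcom_scm functions; the sketch only shows how a failed bandwidth request unwinds the clock enable.

```c
#include <errno.h>

struct res { int clk_refs; };   /* hypothetical: counts outstanding clock enables */

int clk_on(struct res *r)   { r->clk_refs++; return 0; }
void clk_off(struct res *r) { r->clk_refs--; }

/* Hypothetical bandwidth request; 'fail' simulates an error from the call. */
int bw_on(int fail) { return fail ? -EINVAL : 0; }

int do_call(struct res *r, int bw_fails)
{
	int ret;

	ret = clk_on(r);
	if (ret)
		return ret;

	ret = bw_on(bw_fails);
	if (ret)
		goto disable_clk;   /* returning here directly would leave clocks on */

	/* ... the firmware call would happen here ... */

disable_clk:
	clk_off(r);
	return ret;
}
```

Routing the failure through the same label as the normal exit keeps enables and disables balanced on every path.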

fsverity: use register_sysctl_init() to avoid kmemleak warning [+ + +]

Author: Eric Biggers <ebiggers@google.com>
Date:   Tue Apr 30 19:53:31 2024 -0700

    fsverity: use register_sysctl_init() to avoid kmemleak warning
    
    commit ee5814dddefbaa181cb247a75676dd5103775db1 upstream.
    
    Since the fsverity sysctl registration runs as a builtin initcall, there
    is no corresponding sysctl deregistration and the resulting struct
    ctl_table_header is not used.  This can cause a kmemleak warning just
    after the system boots up.  (A pointer to the ctl_table_header is stored
    in the fsverity_sysctl_header static variable, which kmemleak should
    detect; however, the compiler can optimize out that variable.)  Avoid
    the kmemleak warning by using register_sysctl_init() which is intended
    for use by builtin initcalls and uses kmemleak_not_leak().
    
    Reported-by: Yi Zhang <yi.zhang@redhat.com>
    Closes: https://lore.kernel.org/r/CAHj4cs8DTSvR698UE040rs_pX1k-WVe7aR6N2OoXXuhXJPDC-w@mail.gmail.com
    Cc: stable@vger.kernel.org
    Reviewed-by: Joel Granados <j.granados@samsung.com>
    Link: https://lore.kernel.org/r/20240501025331.594183-1-ebiggers@kernel.org
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after() [+ + +]

Author: dicken.ding <dicken.ding@mediatek.com>
Date:   Fri May 24 17:17:39 2024 +0800

    genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
    
    commit b84a8aba806261d2f759ccedf4a2a6a80a5e55ba upstream.
    
    irq_find_at_or_after() dereferences the interrupt descriptor which is
    returned by mt_find() while neither holding sparse_irq_lock nor RCU read
    lock, which means the descriptor can be freed between mt_find() and the
    dereference:
    
        CPU0                            CPU1
        desc = mt_find()
                                        delayed_free_desc(desc)
        irq_desc_get_irq(desc)
    
    The use-after-free is reported by KASAN:
    
        Call trace:
         irq_get_next_irq+0x58/0x84
         show_stat+0x638/0x824
         seq_read_iter+0x158/0x4ec
         proc_reg_read_iter+0x94/0x12c
         vfs_read+0x1e0/0x2c8
    
        Freed by task 4471:
         slab_free_freelist_hook+0x174/0x1e0
         __kmem_cache_free+0xa4/0x1dc
         kfree+0x64/0x128
         irq_kobj_release+0x28/0x3c
         kobject_put+0xcc/0x1e0
         delayed_free_desc+0x14/0x2c
         rcu_do_batch+0x214/0x720
    
    Guard the access with a RCU read lock section.
    
    Fixes: 721255b9826b ("genirq: Use a maple tree for interrupt descriptor management")
    Signed-off-by: dicken.ding <dicken.ding@mediatek.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240524091739.31611-1-dicken.ding@mediatek.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

HID: i2c-hid: elan: fix reset suspend current leakage [+ + +]

Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Tue May 7 16:48:18 2024 +0200

    HID: i2c-hid: elan: fix reset suspend current leakage
    
    commit 0eafc58f2194dbd01d4be40f99a697681171995b upstream.
    
    The Elan eKTH5015M touch controller found on the Lenovo ThinkPad X13s
    shares the VCC33 supply with other peripherals that may remain powered
    during suspend (e.g. when enabled as wakeup sources).
    
    The reset line is also wired so that it can be left deasserted when the
    supply is off.
    
    This is important as it avoids holding the controller in reset for
    extended periods of time when it remains powered, which can lead to
    increased power consumption, and also avoids leaking current through the
    X13s reset circuitry during suspend (and after driver unbind).
    
    Use the new 'no-reset-on-power-off' devicetree property to determine
    when reset needs to be asserted on power down.
    
    Notably this also avoids wasting power on machine variants without a
    touchscreen for which the driver would otherwise exit probe with reset
    asserted.
    
    Fixes: bd3cba00dcc6 ("HID: i2c-hid: elan: Add support for Elan eKTH6915 i2c-hid touchscreens")
    Cc: <stable@vger.kernel.org>    # 6.0
    Cc: Douglas Anderson <dianders@chromium.org>
    Tested-by: Steev Klimaszewski <steev@kali.org>
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Link: https://lore.kernel.org/r/20240507144821.12275-5-johan+linaro@kernel.org
    Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hwmon: (ltc2992) Fix memory leak in ltc2992_parse_dt() [+ + +]

Author: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Date:   Thu May 23 17:47:14 2024 +0200

    hwmon: (ltc2992) Fix memory leak in ltc2992_parse_dt()
    
    commit a94ff8e50c20bde6d50864849a98b106e45d30c6 upstream.
    
    A new error path was added to the fwnode_for_each_available_child_node()
    loop in ltc2992_parse_dt(), which leads to an early return that requires
    a call to fwnode_handle_put() to avoid a memory leak in that case.
    
    Add the missing fwnode_handle_put() in the error path from a zero value
    shunt resistor.
    
    Cc: stable@vger.kernel.org
    Fixes: 10b029020487 ("hwmon: (ltc2992) Avoid division by zero")
    Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
    Link: https://lore.kernel.org/r/20240523-fwnode_for_each_available_child_node_scoped-v2-1-701f3a03f2fb@gmail.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: acpi: Unbind mux adapters before delete [+ + +]

Author: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Date:   Wed Mar 13 11:16:32 2024 +1300

    i2c: acpi: Unbind mux adapters before delete
    
    commit 3f858bbf04dbac934ac279aaee05d49eb9910051 upstream.
    
    There is an issue with ACPI overlay table removal specifically related
    to I2C multiplexers.
    
    Consider an ACPI SSDT Overlay that defines a PCA9548 I2C mux on an
    existing I2C bus. When this table is loaded we see the creation of a
    device for the overall PCA9548 chip and 8 further devices - one
    i2c_adapter each for the mux channels. These are all bound to their
    ACPI equivalents via an eventual invocation of acpi_bind_one().
    
    When we unload the SSDT overlay we run into the problem. The ACPI
    devices are deleted as normal via acpi_device_del_work_fn() and the
    acpi_device_del_list.
    
    However, the following warning and stack trace are output as the
    deletion does not go smoothly:
    ------------[ cut here ]------------
    kernfs: can not remove 'physical_node', no directory
    WARNING: CPU: 1 PID: 11 at fs/kernfs/dir.c:1674 kernfs_remove_by_name_ns+0xb9/0xc0
    Modules linked in:
    CPU: 1 PID: 11 Comm: kworker/u128:0 Not tainted 6.8.0-rc6+ #1
    Hardware name: congatec AG conga-B7E3/conga-B7E3, BIOS 5.13 05/16/2023
    Workqueue: kacpi_hotplug acpi_device_del_work_fn
    RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0
    Code: e4 00 48 89 ef e8 07 71 db ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 a7 55 e4 00 0f 0b eb a6 48 c7 c7 f0 38 0d 9d e8 97 0a d5 ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
    RSP: 0018:ffff9f864008fb28 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff8ef90a8d4940 RCX: 0000000000000000
    RDX: ffff8f000e267d10 RSI: ffff8f000e25c780 RDI: ffff8f000e25c780
    RBP: ffff8ef9186f9870 R08: 0000000000013ffb R09: 00000000ffffbfff
    R10: 00000000ffffbfff R11: ffff8f000e0a0000 R12: ffff9f864008fb50
    R13: ffff8ef90c93dd60 R14: ffff8ef9010d0958 R15: ffff8ef9186f98c8
    FS:  0000000000000000(0000) GS:ffff8f000e240000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f48f5253a08 CR3: 00000003cb82e000 CR4: 00000000003506f0
    Call Trace:
     <TASK>
     ? kernfs_remove_by_name_ns+0xb9/0xc0
     ? __warn+0x7c/0x130
     ? kernfs_remove_by_name_ns+0xb9/0xc0
     ? report_bug+0x171/0x1a0
     ? handle_bug+0x3c/0x70
     ? exc_invalid_op+0x17/0x70
     ? asm_exc_invalid_op+0x1a/0x20
     ? kernfs_remove_by_name_ns+0xb9/0xc0
     ? kernfs_remove_by_name_ns+0xb9/0xc0
     acpi_unbind_one+0x108/0x180
     device_del+0x18b/0x490
     ? srso_return_thunk+0x5/0x5f
     ? srso_return_thunk+0x5/0x5f
     device_unregister+0xd/0x30
     i2c_del_adapter.part.0+0x1bf/0x250
     i2c_mux_del_adapters+0xa1/0xe0
     i2c_device_remove+0x1e/0x80
     device_release_driver_internal+0x19a/0x200
     bus_remove_device+0xbf/0x100
     device_del+0x157/0x490
     ? __pfx_device_match_fwnode+0x10/0x10
     ? srso_return_thunk+0x5/0x5f
     device_unregister+0xd/0x30
     i2c_acpi_notify+0x10f/0x140
     notifier_call_chain+0x58/0xd0
     blocking_notifier_call_chain+0x3a/0x60
     acpi_device_del_work_fn+0x85/0x1d0
     process_one_work+0x134/0x2f0
     worker_thread+0x2f0/0x410
     ? __pfx_worker_thread+0x10/0x10
     kthread+0xe3/0x110
     ? __pfx_kthread+0x10/0x10
     ret_from_fork+0x2f/0x50
     ? __pfx_kthread+0x10/0x10
     ret_from_fork_asm+0x1b/0x30
     </TASK>
    ---[ end trace 0000000000000000 ]---
    ...
    repeated 7 more times, 1 for each channel of the mux
    ...
    
    The issue is that the binding of the ACPI devices to their peer I2C
    adapters is not correctly cleaned up. Digging deeper into the issue we
    see that the deletion order is such that the ACPI devices matching the
    mux channel i2c adapters are deleted first during the SSDT overlay
    removal. For each of the channels we see a call to i2c_acpi_notify()
    with ACPI_RECONFIG_DEVICE_REMOVE but, because these devices are not
    actually i2c_clients, nothing is done for them.
    
    Later on, after each of the mux channels has been dealt with, we come
    to delete the i2c_client representing the PCA9548 device. This is the
    call stack we see above, whereby the kernel cleans up the i2c_client
    including destruction of the mux and its channel adapters. At this
    point we do attempt to unbind from the ACPI peers but those peers no
    longer exist and so we hit the kernfs errors.
    
    The fix is to augment i2c_acpi_notify() to handle i2c_adapters. But,
    given that the life cycle of the adapters is linked to the i2c_client,
    instead of deleting the i2c_adapters during the i2c_acpi_notify(), we
    just trigger unbinding of the ACPI device from the adapter device, and
    allow the clean up of the adapter to continue in the way it always has.
    
    Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
    Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
    Fixes: 525e6fabeae2 ("i2c / ACPI: add support for ACPI reconfigure notifications")
    Cc: <stable@vger.kernel.org> # v4.8+
    Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i3c: master: svc: fix invalidate IBI type and miss call client IBI handler [+ + +]

Author: Frank Li <Frank.Li@nxp.com>
Date:   Mon May 6 12:40:09 2024 -0400

    i3c: master: svc: fix invalidate IBI type and miss call client IBI handler
    
    commit 38baed9b8600008e5d7bc8cb9ceccc1af3dd54b7 upstream.
    
    In the In-Band Interrupt (IBI) handler, the code logic is as follows:
    
    1: writel(SVC_I3C_MCTRL_REQUEST_AUTO_IBI | SVC_I3C_MCTRL_IBIRESP_AUTO,
              master->regs + SVC_I3C_MCTRL);
    
    2: ret = readl_relaxed_poll_timeout(master->regs + SVC_I3C_MSTATUS, val,
                                        SVC_I3C_MSTATUS_IBIWON(val), 0, 1000);
            ...
    3: ibitype = SVC_I3C_MSTATUS_IBITYPE(status);
       ibiaddr = SVC_I3C_MSTATUS_IBIADDR(status);
    
    SVC_I3C_MSTATUS_IBIWON may already be set before step 1. In that case
    step 2 returns immediately, before the I3C controller has sent out the
    9th SCL. Consequently, ibitype and ibiaddr are 0, which results in an
    unknown IBI type and the I3C client driver's IBI handler not being called.
    
    A typical case is that SVC_I3C_MSTATUS_IBIWON is set when an IBI occurs
    while the controller sends the start frame in svc_i3c_master_xfer().
    
    Clear SVC_I3C_MSTATUS_IBIWON before issuing SVC_I3C_MCTRL_REQUEST_AUTO_IBI
    to fix this issue.
    
    Cc: stable@vger.kernel.org
    Fixes: 5e5e3c92e748 ("i3c: master: svc: fix wrong data return when IBI happen during start frame")
    Signed-off-by: Frank Li <Frank.Li@nxp.com>
    Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
    Link: https://lore.kernel.org/r/20240506164009.21375-3-Frank.Li@nxp.com
    Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

intel_th: pci: Add Meteor Lake-S CPU support [+ + +]

Author: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Date:   Mon Apr 29 16:01:18 2024 +0300

    intel_th: pci: Add Meteor Lake-S CPU support
    
    commit a4f813c3ec9d1c32bc402becd1f011b3904dd699 upstream.
    
    Add support for the Trace Hub in Meteor Lake-S CPU.
    
    Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: stable@kernel.org
    Link: https://lore.kernel.org/r/20240429130119.1518073-15-alexander.shishkin@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

io_uring/napi: fix timeout calculation [+ + +]

Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Jun 3 13:56:53 2024 -0600

    io_uring/napi: fix timeout calculation
    
    commit 415ce0ea55c5a3afea501a773e002be9ed7149f5 upstream.
    
    Not quite sure what __io_napi_adjust_timeout() was attempting to do, it's
    adjusting both the NAPI timeout and the general overall timeout, and
    calculating a value that is never used. The overall timeout is a superset
    of the NAPI timeout, and doesn't need adjusting. The only thing we
    really need to care about is that the NAPI timeout doesn't exceed the
    overall timeout. If a user asked for a timeout of e.g. 5 usec and the NAPI
    timeout is 10 usec, then we should not spin for 10 usec.
    
    While in there, sanitize the time checking a bit. If we have a negative
    value in the passed in timeout, discard it. Round up the value as well,
    so we don't end up with a NAPI timeout for the majority of the wait,
    with only a tiny sleep value at the end.
    
    Hence the only case we need to care about is if the NAPI timeout is
    larger than the overall timeout. If it is, cap the NAPI timeout at what
    the overall timeout is.
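    
    The capping rule described above can be sketched as follows. This is a
    minimal user-space illustration, not the kernel's
    __io_napi_adjust_timeout(); the helper name napi_cap_timeout_usec() and
    its parameters are hypothetical. It discards a non-positive overall
    timeout, rounds the overall timeout up to whole microseconds, and never
    returns more than the overall timeout.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API).
 * napi_usec:  configured NAPI busy-poll budget, in microseconds
 * overall_ns: overall wait timeout requested by the caller, in nanoseconds
 * Returns the busy-poll budget to use, in microseconds.
 */
uint64_t napi_cap_timeout_usec(uint64_t napi_usec, int64_t overall_ns)
{
	uint64_t overall_usec;

	/* A negative (or zero) overall timeout is discarded. */
	if (overall_ns <= 0)
		return napi_usec;

	/* Round up so a sub-microsecond remainder is not lost. */
	overall_usec = ((uint64_t)overall_ns + 999) / 1000;

	/* Cap the NAPI budget at the overall timeout. */
	return overall_usec < napi_usec ? overall_usec : napi_usec;
}
```

    With a 10 usec NAPI budget and a 5 usec overall timeout, this sketch
    yields a 5 usec busy-poll budget.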
    
    Cc: stable@vger.kernel.org
    Fixes: 8d0c12a80cde ("io-uring: add napi busy poll support")
    Reported-by: Lewis Baker <lewissbaker@gmail.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

io_uring: check for non-NULL file pointer in io_file_can_poll() [+ + +]

Author: Jens Axboe <axboe@kernel.dk>
Date:   Sat Jun 1 12:25:35 2024 -0600

    io_uring: check for non-NULL file pointer in io_file_can_poll()
    
    commit 5fc16fa5f13b3c06fdb959ef262050bd810416a2 upstream.
    
    In earlier kernels, it was possible to trigger a NULL pointer
    dereference off the forced async preparation path, if no file had
    been assigned. The trace leading to that looks as follows:
    
    BUG: kernel NULL pointer dereference, address: 00000000000000b0
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP
    CPU: 67 PID: 1633 Comm: buf-ring-invali Not tainted 6.8.0-rc3+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 2/2/2022
    RIP: 0010:io_buffer_select+0xc3/0x210
    Code: 00 00 48 39 d1 0f 82 ae 00 00 00 48 81 4b 48 00 00 01 00 48 89 73 70 0f b7 50 0c 66 89 53 42 85 ed 0f 85 d2 00 00 00 48 8b 13 <48> 8b 92 b0 00 00 00 48 83 7a 40 00 0f 84 21 01 00 00 4c 8b 20 5b
    RSP: 0018:ffffb7bec38c7d88 EFLAGS: 00010246
    RAX: ffff97af2be61000 RBX: ffff97af234f1700 RCX: 0000000000000040
    RDX: 0000000000000000 RSI: ffff97aecfb04820 RDI: ffff97af234f1700
    RBP: 0000000000000000 R08: 0000000000200030 R09: 0000000000000020
    R10: ffffb7bec38c7dc8 R11: 000000000000c000 R12: ffffb7bec38c7db8
    R13: ffff97aecfb05800 R14: ffff97aecfb05800 R15: ffff97af2be5e000
    FS:  00007f852f74b740(0000) GS:ffff97b1eeec0000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000b0 CR3: 000000016deab005 CR4: 0000000000370ef0
    Call Trace:
     <TASK>
     ? __die+0x1f/0x60
     ? page_fault_oops+0x14d/0x420
     ? do_user_addr_fault+0x61/0x6a0
     ? exc_page_fault+0x6c/0x150
     ? asm_exc_page_fault+0x22/0x30
     ? io_buffer_select+0xc3/0x210
     __io_import_iovec+0xb5/0x120
     io_readv_prep_async+0x36/0x70
     io_queue_sqe_fallback+0x20/0x260
     io_submit_sqes+0x314/0x630
     __do_sys_io_uring_enter+0x339/0xbc0
     ? __do_sys_io_uring_register+0x11b/0xc50
     ? vm_mmap_pgoff+0xce/0x160
     do_syscall_64+0x5f/0x180
     entry_SYSCALL_64_after_hwframe+0x46/0x4e
    RIP: 0033:0x55e0a110a67e
    Code: ba cc 00 00 00 45 31 c0 44 0f b6 92 d0 00 00 00 31 d2 41 b9 08 00 00 00 41 83 e2 01 41 c1 e2 04 41 09 c2 b8 aa 01 00 00 0f 05 <c3> 90 89 30 eb a9 0f 1f 40 00 48 8b 42 20 8b 00 a8 06 75 af 85 f6
    
    because the request is marked forced ASYNC and has a bad file fd, and
    hence takes the forced async prep path.
    
    Current kernels with the request async prep cleaned up can no longer hit
    this issue, but for ease of backporting, let's add this safety check in
    here too as it really doesn't hurt. For both cases, this will inevitably
    end with a CQE posted with -EBADF.
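    
    The shape of the safety check can be illustrated with a small
    stand-alone model. The demo_file and demo_req types and the
    demo_file_can_poll() helper are hypothetical stand-ins, not the
    io_uring structures; the only point is that the file pointer is tested
    before it is dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real request and file objects. */
struct demo_file {
	bool pollable;
};

struct demo_req {
	struct demo_file *file;	/* may be NULL if the fd lookup failed */
};

/* Returns true only if a file is attached and it supports polling. */
bool demo_file_can_poll(const struct demo_req *req)
{
	if (!req->file)		/* no file assigned: nothing to poll */
		return false;
	return req->file->pollable;
}
```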
    
    Cc: stable@vger.kernel.org
    Fixes: a76c0b31eef5 ("io_uring: commit non-pollable provided mapped buffers upfront")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iomap: fault in smaller chunks for non-large folio mappings [+ + +]

Author: Xu Yang <xu.yang_2@nxp.com>
Date:   Tue May 21 19:49:39 2024 +0800

    iomap: fault in smaller chunks for non-large folio mappings
    
    commit 4e527d5841e24623181edc7fd6f6598ffa810e10 upstream.
    
    Since commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace"),
    iomap tries to copy in chunks larger than PAGE_SIZE. However, if the
    mapping doesn't support large folios, only one page of at most 4KB is
    created and 4KB of data is written to the pagecache each time. The
    next 4KB is then handled in the next iteration. This can cause a
    write performance problem.
    
    If the chunk is 2MB, a total of 512 pages need to be handled. During this
    period, fault_in_iov_iter_readable() is called to check that the iov_iter
    is readable. Since only 4KB is handled each time, the address ranges below
    are checked over and over again:
    
    start           end
    -
    buf,            buf+2MB
    buf+4KB,        buf+2MB
    buf+8KB,        buf+2MB
    ...
    buf+2044KB      buf+2MB
    
    The size being checked is clearly wrong, since only 4KB is handled each
    time. Use a correctly sized chunk so that iomap works well in the
    non-large folio case.
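    
    The cost of the repeated checks can be estimated with a simple model.
    The sketch below is not iomap code; demo_total_faulted() is a
    hypothetical helper that assumes one 4KB page is written per iteration
    and that each iteration checks min(chunk, remaining) bytes, as in the
    table above.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for this model */

/*
 * Hypothetical model: sum of the bytes checked for readability while
 * writing 'len' bytes one page at a time, when every iteration checks
 * up to 'chunk' bytes of what remains.
 */
uint64_t demo_total_faulted(size_t len, size_t chunk)
{
	uint64_t total = 0;
	size_t off;

	for (off = 0; off < len; off += DEMO_PAGE_SIZE) {
		size_t remaining = len - off;

		total += remaining < chunk ? remaining : chunk;
	}
	return total;
}
```

    Under these assumptions, a 2MB write with a 2MB chunk checks 513 MiB in
    total, while the same write with a 4KB chunk checks only 2 MiB.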
    
    With this change, the write speed will be stable. Tested on ARM64 device.
    
    Before:
    
     - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (334 MB/s)
     - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (278 MB/s)
     - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (204 MB/s)
     - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (170 MB/s)
     - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (150 MB/s)
     - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (139 MB/s)
    
    After:
    
     - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (339 MB/s)
     - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (330 MB/s)
     - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (332 MB/s)
     - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (333 MB/s)
     - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (333 MB/s)
     - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (333 MB/s)
    
    Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
    Cc: stable@vger.kernel.org
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
    Link: https://lore.kernel.org/r/20240521114939.2541461-2-xu.yang_2@nxp.com
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

irqchip/riscv-intc: Prevent memory leak when riscv_intc_init_common() fails [+ + +]

Author: Sunil V L <sunilvl@ventanamicro.com>
Date:   Mon May 27 13:41:13 2024 +0530

    irqchip/riscv-intc: Prevent memory leak when riscv_intc_init_common() fails
    
    commit 0110c4b110477bb1f19b0d02361846be7ab08300 upstream.
    
    When riscv_intc_init_common() fails, the firmware node allocated is not
    freed. Add the missing free().
    
    Fixes: 7023b9d83f03 ("irqchip/riscv-intc: Add ACPI support")
    Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Anup Patel <anup@brainfault.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240527081113.616189-1-sunilvl@ventanamicro.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kbuild: Remove support for Clang's ThinLTO caching [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Wed May 1 15:55:25 2024 -0700

    kbuild: Remove support for Clang's ThinLTO caching
    
    commit aba091547ef6159d52471f42a3ef531b7b660ed8 upstream.
    
    There is an issue in clang's ThinLTO caching (enabled for the kernel via
    '--thinlto-cache-dir') with .incbin, which the kernel occasionally uses
    to include data within the kernel, such as the .config file for
    /proc/config.gz. For example, when changing the .config and rebuilding
    vmlinux, the copy of .config in vmlinux does not match the copy of
    .config in the build folder:
    
      $ echo 'CONFIG_LTO_NONE=n
      CONFIG_LTO_CLANG_THIN=y
      CONFIG_IKCONFIG=y
      CONFIG_HEADERS_INSTALL=y' >kernel/configs/repro.config
    
      $ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 clean defconfig repro.config vmlinux
      ...
    
      $ grep CONFIG_HEADERS_INSTALL .config
      CONFIG_HEADERS_INSTALL=y
    
      $ scripts/extract-ikconfig vmlinux | grep CONFIG_HEADERS_INSTALL
      CONFIG_HEADERS_INSTALL=y
    
      $ scripts/config -d HEADERS_INSTALL
    
      $ make -kj"$(nproc)" ARCH=x86_64 LLVM=1 vmlinux
      ...
        UPD     kernel/config_data
        GZIP    kernel/config_data.gz
        CC      kernel/configs.o
      ...
        LD      vmlinux
      ...
    
      $ grep CONFIG_HEADERS_INSTALL .config
      # CONFIG_HEADERS_INSTALL is not set
    
      $ scripts/extract-ikconfig vmlinux | grep CONFIG_HEADERS_INSTALL
      CONFIG_HEADERS_INSTALL=y
    
    Without '--thinlto-cache-dir' or when using full LTO, this issue does
    not occur.
    
    Benchmarking incremental builds on a few different machines with and
    without the cache shows a 20% increase in incremental build time without
    the cache when measured by touching init/main.c and running 'make all'.
    
    ARCH=arm64 defconfig + CONFIG_LTO_CLANG_THIN=y on an arm64 host:
    
      Benchmark 1: With ThinLTO cache
        Time (mean ± σ):     56.347 s ±  0.163 s    [User: 83.768 s, System: 24.661 s]
        Range (min … max):   56.109 s … 56.594 s    10 runs
    
      Benchmark 2: Without ThinLTO cache
        Time (mean ± σ):     67.740 s ±  0.479 s    [User: 718.458 s, System: 31.797 s]
        Range (min … max):   67.059 s … 68.556 s    10 runs
    
      Summary
        With ThinLTO cache ran
          1.20 ± 0.01 times faster than Without ThinLTO cache
    
    ARCH=x86_64 defconfig + CONFIG_LTO_CLANG_THIN=y on an x86_64 host:
    
      Benchmark 1: With ThinLTO cache
        Time (mean ± σ):     85.772 s ±  0.252 s    [User: 91.505 s, System: 8.408 s]
        Range (min … max):   85.447 s … 86.244 s    10 runs
    
      Benchmark 2: Without ThinLTO cache
        Time (mean ± σ):     103.833 s ±  0.288 s    [User: 232.058 s, System: 8.569 s]
        Range (min … max):   103.286 s … 104.124 s    10 runs
    
      Summary
        With ThinLTO cache ran
          1.21 ± 0.00 times faster than Without ThinLTO cache
    
    While it is unfortunate to take this performance improvement off the
    table, correctness is more important. If/when this is fixed in LLVM, it
    can potentially be brought back in a conditional manner. Alternatively,
    a developer can just disable LTO if doing incremental compiles quickly
    is important, as a full compile cycle can still take over a minute even
    with the cache and it is unlikely that LTO will result in functional
    differences for a kernel change.
    
    Cc: stable@vger.kernel.org
    Fixes: dc5723b02e52 ("kbuild: add support for Clang LTO")
    Reported-by: Yifan Hong <elsk@google.com>
    Closes: https://github.com/ClangBuiltLinux/linux/issues/2021
    Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
    Closes: https://lore.kernel.org/r/20220327115526.cc4b0ff55fc53c97683c3e4d@kernel.org/
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kdb: Fix buffer overflow during tab-complete [+ + +]

Author: Daniel Thompson <daniel.thompson@linaro.org>
Date:   Wed Apr 24 15:03:34 2024 +0100

    kdb: Fix buffer overflow during tab-complete
    
    commit e9730744bf3af04cda23799029342aa3cddbc454 upstream.
    
    Currently, when the user attempts symbol completion with the Tab key, kdb
    will use strncpy() to insert the completed symbol into the command buffer.
    Unfortunately it passes the size of the source buffer rather than the
    destination to strncpy() with predictably horrible results. Most obviously
    if the command buffer is already full but cp, the cursor position, is in
    the middle of the buffer, then we will write past the end of the supplied
    buffer.
    
    Fix this by replacing the dubious strncpy() calls with memmove()/memcpy()
    calls plus explicit boundary checks to make sure we have enough space
    before we start moving characters around.
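    
    A bounds-checked insertion of this kind might look like the sketch
    below. It is a user-space illustration, not the kdb_read() code;
    demo_insert_at_cursor() and its parameters are hypothetical. The size
    check is made against the destination buffer before any bytes move.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: insert 'ins' at offset 'cursor' of the
 * NUL-terminated string in 'buf', whose capacity is 'bufsize' bytes
 * including the terminator. Returns false and leaves 'buf' untouched
 * if the result would not fit.
 */
bool demo_insert_at_cursor(char *buf, size_t bufsize, size_t cursor,
			   const char *ins)
{
	size_t len = strlen(buf);
	size_t ins_len = strlen(ins);

	if (len >= bufsize || cursor > len)
		return false;
	/* Need len + ins_len + 1 <= bufsize, checked without overflow. */
	if (ins_len >= bufsize - len)
		return false;

	/* Shift the tail (including the NUL) right; regions may overlap. */
	memmove(buf + cursor + ins_len, buf + cursor, len - cursor + 1);
	/* Copy the completion into the gap; no terminator is written here. */
	memcpy(buf + cursor, ins, ins_len);
	return true;
}
```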
    
    Reported-by: Justin Stitt <justinstitt@google.com>
    Closes: https://lore.kernel.org/all/CAFhGd8qESuuifuHsNjFPR-Va3P80bxrw+LqvC8deA8GziUJLpw@mail.gmail.com/
    Cc: stable@vger.kernel.org
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Reviewed-by: Justin Stitt <justinstitt@google.com>
    Tested-by: Justin Stitt <justinstitt@google.com>
    Link: https://lore.kernel.org/r/20240424-kgdb_read_refactor-v3-1-f236dbe9828d@linaro.org
    Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kdb: Fix console handling when editing and tab-completing commands [+ + +]

Author: Daniel Thompson <daniel.thompson@linaro.org>
Date:   Wed Apr 24 15:03:36 2024 +0100

    kdb: Fix console handling when editing and tab-completing commands
    
    commit db2f9c7dc29114f531df4a425d0867d01e1f1e28 upstream.
    
    Currently, if the cursor position is not at the end of the command buffer
    and the user uses the Tab-complete functions, then the console does not
    leave the cursor in the correct position.
    
    For example consider the following buffer with the cursor positioned
    at the ^:
    
    md kdb_pro 10
              ^
    
    Pressing tab should result in:
    
    md kdb_prompt_str 10
                     ^
    
    However this does not happen. Instead the cursor is placed at the end
    (after the 10) and further cursor movement redraws incorrectly. The
    same problem exists when we double-Tab but in a different part of the
    code.
    
    Fix this by sending a carriage return and then redisplaying the text to
    the left of the cursor.
    
    Cc: stable@vger.kernel.org
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Tested-by: Justin Stitt <justinstitt@google.com>
    Link: https://lore.kernel.org/r/20240424-kgdb_read_refactor-v3-3-f236dbe9828d@linaro.org
    Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kdb: Merge identical case statements in kdb_read() [+ + +]

Author: Daniel Thompson <daniel.thompson@linaro.org>
Date:   Wed Apr 24 15:03:37 2024 +0100

    kdb: Merge identical case statements in kdb_read()
    
    commit 6244917f377bf64719551b58592a02a0336a7439 upstream.
    
    The code that handles case 14 (down) and case 16 (up) has been copy and
    pasted despite being byte-for-byte identical. Combine them.
    
    Cc: stable@vger.kernel.org # Not a bug fix but it is needed for later bug fixes
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Tested-by: Justin Stitt <justinstitt@google.com>
    Link: https://lore.kernel.org/r/20240424-kgdb_read_refactor-v3-4-f236dbe9828d@linaro.org
    Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kdb: Use format-specifiers rather than memset() for padding in kdb_read() [+ + +]

Author: Daniel Thompson <daniel.thompson@linaro.org>
Date:   Wed Apr 24 15:03:38 2024 +0100

    kdb: Use format-specifiers rather than memset() for padding in kdb_read()
    
    commit c9b51ddb66b1d96e4d364c088da0f1dfb004c574 upstream.
    
    Currently when the current line should be removed from the display
    kdb_read() uses memset() to fill a temporary buffer with spaces.
    The problem is not that this could be trivially implemented using a
    format string rather than open coding it. The real problem is that
    it is possible, on systems with a long kdb_prompt_str, to write past
    the end of the tmpbuffer.
    
    Happily, as mentioned above, this can be trivially implemented using a
    format string. Make it so!
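    
    As an illustration of padding through a format string, the sketch
    below uses standard snprintf() with a '*' field width instead of
    filling a temporary buffer by hand. demo_format_blank_line() is a
    hypothetical helper and is not taken from kdb_read(); the exact
    format string used by the kernel may differ.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format "\r", then 'width' spaces, then "\r"
 * into 'out'. The padding comes from the '*' field width applied to
 * an empty string, so no scratch buffer sized by 'width' is needed.
 * Returns snprintf()'s result (the length the full output would have).
 */
int demo_format_blank_line(char *out, size_t outsz, int width)
{
	return snprintf(out, outsz, "\r%*s\r", width, "");
}
```

    Because the width is only a formatting parameter, a very long line
    cannot overrun a fixed-size padding buffer; snprintf() truncates at
    the output size instead.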
    
    Cc: stable@vger.kernel.org
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Tested-by: Justin Stitt <justinstitt@google.com>
    Link: https://lore.kernel.org/r/20240424-kgdb_read_refactor-v3-5-f236dbe9828d@linaro.org
    Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kdb: Use format-strings rather than '\0' injection in kdb_read() [+ + +]

Author: Daniel Thompson <daniel.thompson@linaro.org>
Date:   Wed Apr 24 15:03:35 2024 +0100

    kdb: Use format-strings rather than '\0' injection in kdb_read()
    
    commit 09b35989421dfd5573f0b4683c7700a7483c71f9 upstream.
    
    Currently when kdb_read() needs to reposition the cursor it uses copy and
    paste code that works by injecting an '\0' at the cursor position before
    delivering a carriage-return and reprinting the line (which stops at the
    '\0').
    
    Tidy up the code by hoisting the copy and paste code into an appropriately
    named function. Additionally let's replace the '\0' injection with a
    proper field width parameter so that the string will be abridged during
    formatting instead.
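    
    In standard C printf-style formatting, this kind of abridging is done
    with a '.*' precision on a %s conversion. The sketch below shows the
    idea in user space; demo_format_to_cursor() and its parameters are
    hypothetical and not the function added by this commit.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a carriage return, the prompt, and only
 * the first 'cursor' characters of 'line' into 'out'. The source line
 * is never modified, so no '\0' has to be injected and restored.
 */
int demo_format_to_cursor(char *out, size_t outsz, const char *prompt,
			  const char *line, int cursor)
{
	return snprintf(out, outsz, "\r%s%.*s", prompt, cursor, line);
}
```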
    
    Cc: stable@vger.kernel.org # Not a bug fix but it is needed for later bug fixes
    Tested-by: Justin Stitt <justinstitt@google.com>
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Link: https://lore.kernel.org/r/20240424-kgdb_read_refactor-v3-2-f236dbe9828d@linaro.org
    Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kmsan: do not wipe out origin when doing partial unpoisoning [+ + +]

Author: Alexander Potapenko <glider@google.com>
Date:   Tue May 28 12:48:06 2024 +0200

    kmsan: do not wipe out origin when doing partial unpoisoning
    
    commit 2ef3cec44c60ae171b287db7fc2aa341586d65ba upstream.
    
    As noticed by Brian, KMSAN should not be zeroing the origin when
    unpoisoning parts of a four-byte uninitialized value, e.g.:
    
        char a[4];
        kmsan_unpoison_memory(a, 1);
    
    This led to false negatives, as certain poisoned values could receive zero
    origins, preventing those values from being reported.
    
    To fix the problem, check that kmsan_internal_set_shadow_origin() writes
    zero origins only to slots which have zero shadow.
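    
    The rule can be modelled outside the kernel as follows. This is a
    simplified sketch, not kmsan_internal_set_shadow_origin(): the
    demo_meta layout (one shadow byte per data byte, one origin per
    aligned four-byte slot) and demo_unpoison() are assumptions made for
    illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_BYTES	8	/* size of the modelled region */
#define DEMO_SLOT	4	/* bytes covered by one origin slot */

/* Hypothetical metadata: nonzero shadow byte means "uninitialized". */
struct demo_meta {
	uint8_t shadow[DEMO_BYTES];
	uint32_t origin[DEMO_BYTES / DEMO_SLOT];
};

/* Unpoison [off, off + len) and clear origins only for clean slots. */
void demo_unpoison(struct demo_meta *m, size_t off, size_t len)
{
	size_t i, slot;

	if (len == 0 || off >= DEMO_BYTES || len > DEMO_BYTES - off)
		return;

	for (i = off; i < off + len; i++)
		m->shadow[i] = 0;

	for (slot = off / DEMO_SLOT; slot <= (off + len - 1) / DEMO_SLOT; slot++) {
		uint8_t any = 0;

		for (i = 0; i < DEMO_SLOT; i++)
			any |= m->shadow[slot * DEMO_SLOT + i];
		/* Keep the origin while any byte of the slot is poisoned. */
		if (!any)
			m->origin[slot] = 0;
	}
}
```

    In this model, unpoisoning one byte of a fully poisoned four-byte slot
    leaves the slot's origin intact, so the remaining three bytes can
    still be reported.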
    
    Link: https://lkml.kernel.org/r/20240528104807.738758-1-glider@google.com
    Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core")
    Signed-off-by: Alexander Potapenko <glider@google.com>
    Reported-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com>
      Link: https://lore.kernel.org/lkml/20240524232804.1984355-1-bjohannesmeyer@gmail.com/T/
    Reviewed-by: Marco Elver <elver@google.com>
    Tested-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: arm64: AArch32: Fix spurious trapping of conditional instructions [+ + +]

Author: Marc Zyngier <maz@kernel.org>
Date:   Fri May 24 15:19:56 2024 +0100

    KVM: arm64: AArch32: Fix spurious trapping of conditional instructions
    
    commit c92e8b9eacebb4060634ebd9395bba1b29aadc68 upstream.
    
    We recently upgraded the view of ESR_EL2 to 64bit, in keeping with
    the requirements of the architecture.
    
    However, the AArch32 emulation code was left unaudited, and the
    (already dodgy) code that triages whether a trap is spurious or not
    (because the condition code failed) broke in a subtle way:
    
    If ESR_EL2.ISS2 is ever non-zero (unlikely, but hey, this is the ARM
    architecture we're talking about), the hack that tests the top bits
    of ESR_EL2.EC will break in an interesting way.
    
    Instead, use kvm_vcpu_trap_get_class() to obtain the EC, and list
    all the possible ECs that can fail a condition code check.
    
    While we're at it, add SMC32 to the list, as it is explicitly listed
    as being allowed to trap despite failing a condition code check (as
    described in the HCR_EL2.TSC documentation).
    
    Fixes: 0b12620fddb8 ("KVM: arm64: Treat ESR_EL2 as a 64-bit register")
    Cc: stable@vger.kernel.org
    Acked-by: Oliver Upton <oliver.upton@linux.dev>
    Link: https://lore.kernel.org/r/20240524141956.1450304-4-maz@kernel.org
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode [+ + +]

Author: Marc Zyngier <maz@kernel.org>
Date:   Fri May 24 15:19:55 2024 +0100

    KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode
    
    commit dfe6d190f38fc5df5ff2614b463a5195a399c885 upstream.
    
    It appears that we don't allow a vcpu to be restored in AArch32
    System mode, as we *never* included it in the list of valid modes.
    
    Just add it to the list of allowed modes.
    
    Fixes: 0d854a60b1d7 ("arm64: KVM: enable initialization of a 32bit vcpu")
    Cc: stable@vger.kernel.org
    Acked-by: Oliver Upton <oliver.upton@linux.dev>
    Link: https://lore.kernel.org/r/20240524141956.1450304-3-maz@kernel.org
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: arm64: Fix AArch32 register narrowing on userspace write [+ + +]

Author: Marc Zyngier <maz@kernel.org>
Date:   Fri May 24 15:19:54 2024 +0100

    KVM: arm64: Fix AArch32 register narrowing on userspace write
    
    commit 947051e361d551e0590777080ffc4926190f62f2 upstream.
    
    When userspace writes to one of the core registers, we make
    sure to narrow the corresponding GPRs if PSTATE indicates
    an AArch32 context.
    
    The code tries to check whether the context is EL0 or EL1 so
    that it narrows the correct registers. But it does so by checking
    the full PSTATE instead of PSTATE.M.
    
    As a consequence, and if we are restoring an AArch32 EL0 context
    in a 64bit guest, and that PSTATE has *any* bit set outside of
    PSTATE.M, we narrow *all* registers instead of only the first 15,
    destroying the 64bit state.
    
    Obviously, this is not something the guest is likely to enjoy.
    
    Correctly masking PSTATE to only evaluate PSTATE.M fixes it.
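    
    The difference between the two checks can be shown with a simplified
    sketch. The DEMO_* constants are illustrative values assumed to mirror
    the arm64 PSR definitions, and both helpers are hypothetical; the
    sketch treats only AArch32 User mode as the EL0 case.

```c
#include <stdint.h>

/* Illustrative constants (assumptions, named differently on purpose). */
#define DEMO_PSR_MODE32_BIT	0x00000010u
#define DEMO_PSR_AA32_MODE_MASK	0x0000001fu
#define DEMO_PSR_AA32_MODE_USR	0x00000010u

/* Fixed check: look at PSTATE.M only. Returns 0 for AArch64. */
int demo_nr_regs_to_narrow(uint64_t pstate)
{
	if (!(pstate & DEMO_PSR_MODE32_BIT))
		return 0;

	switch (pstate & DEMO_PSR_AA32_MODE_MASK) {
	case DEMO_PSR_AA32_MODE_USR:
		return 15;	/* EL0: only the first 15 registers */
	default:
		return 31;	/* privileged mode: all registers */
	}
}

/* Buggy check: compares the full PSTATE value against the mode. */
int demo_nr_regs_to_narrow_buggy(uint64_t pstate)
{
	if (!(pstate & DEMO_PSR_MODE32_BIT))
		return 0;

	switch (pstate) {
	case DEMO_PSR_AA32_MODE_USR:
		return 15;
	default:
		return 31;
	}
}
```

    With any flag bit set outside PSTATE.M, the buggy variant falls into
    the default case and narrows all registers, while the masked variant
    still recognises User mode.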
    
    Fixes: 90c1f934ed71 ("KVM: arm64: Get rid of the AArch32 register mapping code")
    Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
    Acked-by: Oliver Upton <oliver.upton@linux.dev>
    Link: https://lore.kernel.org/r/20240524141956.1450304-2-maz@kernel.org
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: SVM: WARN on vNMI + NMI window iff NMIs are outright masked [+ + +]

Author: Sean Christopherson <seanjc@google.com>
Date:   Tue May 21 19:14:35 2024 -0700

    KVM: SVM: WARN on vNMI + NMI window iff NMIs are outright masked
    
    commit b4bd556467477420ee3a91fbcba73c579669edc6 upstream.
    
    When requesting an NMI window, WARN on vNMI support being enabled if and
    only if NMIs are actually masked, i.e. if the vCPU is already handling an
    NMI.  KVM's ABI for NMIs that arrive simultaneously (from KVM's point of
    view) is to inject one NMI and pend the other.  When using vNMI, KVM pends
    the second NMI simply by setting V_NMI_PENDING, and lets the CPU do the
    rest (hardware automatically sets V_NMI_BLOCKING when an NMI is injected).
    
    However, if KVM can't immediately inject an NMI, e.g. because the vCPU is
    in an STI shadow or is running with GIF=0, then KVM will request an NMI
    window and trigger the WARN (but still function correctly).
    
    Whether or not the GIF=0 case makes sense is debatable, as the intent of
    KVM's behavior is to provide functionality that is as close to real
    hardware as possible.  E.g. if two NMIs are sent in quick succession, the
    probability of both NMIs arriving in an STI shadow is infinitesimally low
    on real hardware, but significantly larger in a virtual environment, e.g.
    if the vCPU is preempted in the STI shadow.  For GIF=0, the argument isn't
    as clear cut, because the window where two NMIs can collide is much larger
    in bare metal (though still small).
    
    That said, KVM should not have divergent behavior for the GIF=0 case based
    on whether or not vNMI support is enabled.  And KVM has allowed
    simultaneous NMIs with GIF=0 for over a decade, since commit 7460fb4a3400
    ("KVM: Fix simultaneous NMIs").  I.e. KVM's GIF=0 handling shouldn't be
    modified without a *really* good reason to do so, and if KVM's behavior
    were to be modified, it should be done irrespective of vNMI support.
    
    Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
    Cc: stable@vger.kernel.org
    Cc: Santosh Shukla <Santosh.Shukla@amd.com>
    Cc: Maxim Levitsky <mlevitsk@redhat.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-ID: <20240522021435.1684366-1-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux: Linux 6.9.5 [+ + +]

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Sun Jun 16 13:51:16 2024 +0200

    Linux 6.9.5
    
    Link: https://lore.kernel.org/r/20240613113227.389465891@linuxfoundation.org
    Tested-by: Ronald Warsow <rwarsow@gmx.de>
    Tested-by: SeongJae Park <sj@kernel.org>
    Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
    Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Tested-by: Pavel Machek (CIP) <pavel@denx.de>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Tested-by: Peter Schneider <pschneider1968@googlemail.com>
    Tested-by: Shuah Khan <skhan@linuxfoundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

LoongArch: Add all CPUs enabled by fdt to NUMA node 0 [+ + +]

Author: Jiaxun Yang <jiaxun.yang@flygoat.com>
Date:   Mon Jun 3 15:45:53 2024 +0800

    LoongArch: Add all CPUs enabled by fdt to NUMA node 0
    
    commit 3de9c42d02a79a5e09bbee7a4421ddc00cfd5c6d upstream.
    
    A NUMA-enabled kernel on an FDT-based machine fails to boot because all
    CPUs are in NUMA_NO_NODE and the mm subsystem won't accept that.
    
    Fix by adding them to the default NUMA node during the FDT parsing phase
    and moving numa_add_cpu(0) to a later point.
    
    Cc: stable@vger.kernel.org
    Fixes: 88d4d957edc7 ("LoongArch: Add FDT booting support from efi system table")
    Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
    Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

LoongArch: Fix built-in DTB detection [+ + +]

Author: Jiaxun Yang <jiaxun.yang@flygoat.com>
Date:   Mon Jun 3 15:45:53 2024 +0800

    LoongArch: Fix built-in DTB detection
    
    commit b56f67a6c748bb009f313f91651c8020d2338d63 upstream.
    
    fdt_check_header(__dtb_start) always succeeds because the kernel
    provides a dummy dtb, and by coincidence __dtb_start clashes with the
    entry of this dummy dtb. As a consequence, the fdt passed from firmware
    is never used.
    
    Fix by using __dtb_start only when CONFIG_BUILTIN_DTB is enabled.
    
    Cc: stable@vger.kernel.org
    Fixes: 7b937cc243e5 ("of: Create of_root if no dtb provided by firmware")
    Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
    Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

LoongArch: Fix entry point in kernel image header [+ + +]

Author: Jiaxun Yang <jiaxun.yang@flygoat.com>
Date:   Mon Jun 3 15:45:53 2024 +0800

    LoongArch: Fix entry point in kernel image header
    
    commit beb2800074c15362cf9f6c7301120910046d6556 upstream.
    
    Currently the kernel entry in head.S is in the DMW address range, and
    firmware is instructed to jump to this address after loading the kernel
    image.
    
    However, the kernel should not make any assumption about firmware's DMW
    setting, so the entry point should be a physical address that falls into
    the direct translation region.
    
    Fix by converting the entry address to a physical address and amending
    the entry calculation logic in libstub accordingly.
    
    BTW, use ABSOLUTE() to calculate variables to make Clang/LLVM happy.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
    Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

LoongArch: Override higher address bits in JUMP_VIRT_ADDR [+ + +]

Author: Jiaxun Yang <jiaxun.yang@flygoat.com>
Date:   Mon Jun 3 15:45:53 2024 +0800

    LoongArch: Override higher address bits in JUMP_VIRT_ADDR
    
    commit 1098efd299ffe9c8af818425338c7f6c4f930a98 upstream.
    
    In JUMP_VIRT_ADDR we perform an OR operation directly on the address
    value obtained from pcaddi.
    
    This only works if we are currently running from direct 1:1 mapping
    addresses or if firmware's DMW is configured exactly the same as the
    kernel's. We should not rely on such an assumption.
    
    Fix by overriding the higher bits of the address that comes from pcaddi,
    so we can get rid of the OR operation.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
    Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
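
The difference between OR-ing a window base into the current address and overriding its higher bits can be sketched in plain C. This is only an illustration of the idea described above, not the kernel macro: WINDOW_BASE, PA_BITS and both function names are hypothetical and chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; assumptions, not taken from kernel headers. */
#define WINDOW_BASE 0x9000000000000000ULL   /* hypothetical mapped-window base */
#define PA_BITS     48                      /* hypothetical physical address width */
#define PA_MASK     ((1ULL << PA_BITS) - 1)

/* Old idea: OR the window base into the current program counter. */
static uint64_t target_with_or(uint64_t pc)
{
    return WINDOW_BASE | pc;
}

/* New idea: keep only the low PA_BITS of the PC and force the high bits. */
static uint64_t target_with_override(uint64_t pc)
{
    /* The base must not overlap the bits taken from the PC. */
    assert((WINDOW_BASE & PA_MASK) == 0);
    return WINDOW_BASE | (pc & PA_MASK);
}
```

With a PC that has no high bits set, both functions return the same target. If the PC already carries different high bits (for example because firmware used another window), the OR variant mixes both sets of high bits together, while the override variant still lands in the intended window.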

md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING [+ + +]

Author: Yu Kuai <yukuai3@huawei.com>
Date:   Fri Mar 22 16:10:05 2024 +0800

    md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING
    
    commit 151f66bb618d1fd0eeb84acb61b4a9fa5d8bb0fa upstream.
    
    Xiao reported that the lvm2 test lvconvert-raid-takeover.sh can
    occasionally hang; the root cause is exactly the same as in commit
    bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
    
    However, Dan reported another hang after that, and Junxiao investigated
    the problem and found that it is caused by plugged bios that cannot be
    issued from raid5d().
    
    The current implementation of raid5d() has a weird dependency:
    
    1) md_check_recovery() from raid5d() must hold 'reconfig_mutex' to clear
       MD_SB_CHANGE_PENDING;
    2) raid5d() handles IO in a dead loop until all IO is issued;
    3) IO from raid5d() must wait for MD_SB_CHANGE_PENDING to be cleared;
    
    This behaviour was introduced before v2.6. As a consequence, if another
    context holds 'reconfig_mutex' and md_check_recovery() can't update the
    super_block, raid5d() will spin in the dead loop at 100% of one CPU
    until 'reconfig_mutex' is released.
    
    Following the implementation in raid1 and raid10, fix this problem by
    skipping IO issue if MD_SB_CHANGE_PENDING is still set after
    md_check_recovery(); the daemon thread will be woken up when
    'reconfig_mutex' is released. This fixes the hang problem as well.
    
    Fixes: 5e2cf333b7bd ("md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d")
    Cc: stable@vger.kernel.org # v5.19+
    Reported-and-tested-by: Dan Moulding <dan@danm.net>
    Closes: https://lore.kernel.org/all/20240123005700.9302-1-dan@danm.net/
    Investigated-by: Junxiao Bi <junxiao.bi@oracle.com>
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Link: https://lore.kernel.org/r/20240322081005.1112401-1-yukuai1@huaweicloud.com
    Signed-off-by: Song Liu <song@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: lgdt3306a: Add a check against null-pointer-def [+ + +]

Author: Zheyu Ma <zheyuma97@gmail.com>
Date:   Tue Apr 5 10:50:18 2022 +0100

    media: lgdt3306a: Add a check against null-pointer-def
    
    commit c1115ddbda9c930fba0fdd062e7a8873ebaf898d upstream.
    
    The driver should check whether the client provides the platform_data.
    
    The following log reveals it:
    
    [   29.610324] BUG: KASAN: null-ptr-deref in kmemdup+0x30/0x40
    [   29.610730] Read of size 40 at addr 0000000000000000 by task bash/414
    [   29.612820] Call Trace:
    [   29.613030]  <TASK>
    [   29.613201]  dump_stack_lvl+0x56/0x6f
    [   29.613496]  ? kmemdup+0x30/0x40
    [   29.613754]  print_report.cold+0x494/0x6b7
    [   29.614082]  ? kmemdup+0x30/0x40
    [   29.614340]  kasan_report+0x8a/0x190
    [   29.614628]  ? kmemdup+0x30/0x40
    [   29.614888]  kasan_check_range+0x14d/0x1d0
    [   29.615213]  memcpy+0x20/0x60
    [   29.615454]  kmemdup+0x30/0x40
    [   29.615700]  lgdt3306a_probe+0x52/0x310
    [   29.616339]  i2c_device_probe+0x951/0xa90
    
    Link: https://lore.kernel.org/linux-media/20220405095018.3993578-1-zheyuma97@gmail.com
    Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: mc: Fix graph walk in media_pipeline_start [+ + +]

Author: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Date:   Mon Mar 18 11:50:59 2024 +0200

    media: mc: Fix graph walk in media_pipeline_start
    
    commit 8a9d420149c477e7c97fbd6453704e4612bdd3fa upstream.
    
    The graph walk tries to follow all links, even if they are not between
    pads. This causes a crash with, e.g. a MEDIA_LNK_FL_ANCILLARY_LINK link.
    
    Fix this by allowing the walk to proceed only for MEDIA_LNK_FL_DATA_LINK
    links.
    
    Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
    Cc: stable@vger.kernel.org # for 6.1 and later
    Fixes: ae219872834a ("media: mc: entity: Rewrite media_pipeline_start()")
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: mc: mark the media devnode as registered from the, start [+ + +]

Author: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Date:   Fri Feb 23 09:46:19 2024 +0100

    media: mc: mark the media devnode as registered from the, start
    
    commit 4bc60736154bc9e0e39d3b88918f5d3762ebe5e0 upstream.
    
    First the media device node was created, and if successful it was
    marked as 'registered'. This leaves a small race condition where
    an application can open the device node and get an error back
    because the 'registered' flag was not yet set.
    
    Change the order: first set the 'registered' flag, then actually
    register the media device node. If that fails, then clear the flag.
    
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    Fixes: cf4b9211b568 ("[media] media: Media device node support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: mgb4: Fix double debugfs remove [+ + +]

Author: Martin Tůma <martin.tuma@digiteqautomotive.com>
Date:   Tue May 21 18:22:54 2024 +0200

    media: mgb4: Fix double debugfs remove
    
    commit 825fc49497957310e421454fe3fb8b8d8d8e2dd2 upstream.
    
    Fixes an error where debugfs_remove_recursive() is called first on a parent
    directory and then again on a child which causes a kernel panic.
    
    Signed-off-by: Martin Tůma <martin.tuma@digiteqautomotive.com>
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Fixes: 0ab13674a9bd ("media: pci: mgb4: Added Digiteq Automotive MGB4 driver")
    Cc: <stable@vger.kernel.org>
    [hverkuil: added Fixes/Cc tags]
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: mxl5xx: Move xpt structures off stack [+ + +]

Author: Nathan Chancellor <nathan@kernel.org>
Date:   Fri Jan 12 00:40:36 2024 +0000

    media: mxl5xx: Move xpt structures off stack
    
    commit 526f4527545b2d4ce0733733929fac7b6da09ac6 upstream.
    
    When building for LoongArch with clang 18.0.0, the stack usage of
    probe() is larger than the allowed 2048 bytes:
    
      drivers/media/dvb-frontends/mxl5xx.c:1698:12: warning: stack frame size (2368) exceeds limit (2048) in 'probe' [-Wframe-larger-than]
       1698 | static int probe(struct mxl *state, struct mxl5xx_cfg *cfg)
            |            ^
      1 warning generated.
    
    This is the result of the linked LLVM commit, which changes how the
    arrays of structures in config_ts() get handled with
    CONFIG_INIT_STACK_ZERO and CONFIG_INIT_STACK_PATTERN, which causes the
    above warning in combination with inlining, as config_ts() gets inlined
    into probe().
    
    This warning can be easily fixed by moving the array of structures off
    of the stack via 'static const', which is a better location for these
    variables anyway because they are static data that is only ever read
    from, never modified, so allocating the stack space is wasteful.
    
    This drops the stack usage from 2368 bytes to 256 bytes with the same
    compiler and configuration.
    
    Link: https://lore.kernel.org/linux-media/20240111-dvb-mxl5xx-move-structs-off-stack-v1-1-ca4230e67c11@kernel.org
    Cc: stable@vger.kernel.org
    Closes: https://github.com/ClangBuiltLinux/linux/issues/1977
    Link: https://github.com/llvm/llvm-project/commit/afe8b93ffdfef5d8879e1894b9d7dda40dee2b8d
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
    Tested-by: Miguel Ojeda <ojeda@kernel.org>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: ov2740: Fix LINK_FREQ and PIXEL_RATE control value reporting [+ + +]

Author: Sakari Ailus <sakari.ailus@linux.intel.com>
Date:   Wed Mar 27 10:57:31 2024 +0200

    media: ov2740: Fix LINK_FREQ and PIXEL_RATE control value reporting
    
    commit f7aa5995910cb5e7a5419c6705f465c55973b714 upstream.
    
    The driver dug the supported link frequency up from the V4L2 fwnode
    endpoint and used it internally, but failed to report this in the
    LINK_FREQ and PIXEL_RATE controls. Fix this.
    
    Fixes: 0677a2d9b735 ("media: ov2740: Add support for 180 MHz link frequency")
    Cc: stable@vger.kernel.org # for v6.8 and later
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Reviewed-by: Bingbu Cao <bingbu.cao@intel.com>
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: v4l2-core: hold videodev_lock until dev reg, finishes [+ + +]

Author: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Date:   Fri Feb 23 09:45:36 2024 +0100

    media: v4l2-core: hold videodev_lock until dev reg, finishes
    
    commit 1ed4477f2ea4743e7c5e1f9f3722152d14e6eeb1 upstream.
    
    After the new V4L2 device node was registered, some additional
    initialization was done before the device node was marked as
    'registered'. During the time between creating the device node
    and marking it as 'registered' it was possible to open the
    device node, which would return -ENODEV since the 'registered'
    flag was not yet set.
    
    Hold the videodev_lock mutex from just before the device node
    is registered until the 'registered' flag is set. Since v4l2_open
    will take the same lock, it will wait until this registration
    process is finished. This resolves this race condition.
    
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Cc: <stable@vger.kernel.org>      # for v4.18 and up
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: v4l: async: Don't set notifier's V4L2 device if registering fails [+ + +]

Author: Sakari Ailus <sakari.ailus@linux.intel.com>
Date:   Fri Mar 8 15:07:45 2024 +0200

    media: v4l: async: Don't set notifier's V4L2 device if registering fails
    
    commit 46bc0234ad38063ce550ecf135c1a52458f0a804 upstream.
    
    The V4L2 device used to be set when the notifier was registered but this
    has been moved to the notifier initialisation. Don't touch the V4L2 device
    if registration fails.
    
    Fixes: b8ec754ae4c5 ("media: v4l: async: Set v4l2_device and subdev in async notifier init")
    Cc: <stable@vger.kernel.org> # for 6.6 and later
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: v4l: async: Fix notifier list entry init [+ + +]

Author: Alexander Stein <alexander.stein@ew.tq-group.com>
Date:   Thu Mar 7 15:24:51 2024 +0100

    media: v4l: async: Fix notifier list entry init
    
    commit 6d8acd02c4c6a8f917eefac1de2e035521ca119d upstream.
    
    struct v4l2_async_notifier has several list_head members, but only
    waiting_list and done_list are initialized. notifier_entry was left
    zeroed, i.e. an uninitialized list_head.
    If csi2_async_register() fails and returns -ENOTCONN, e.g. because the
    node for the remote endpoint is disabled, the subsequent call to
    v4l2_async_nf_unregister() results in a NULL pointer dereference.
    Add the missing list head initializer.
    
    Fixes: b8ec754ae4c5 ("media: v4l: async: Set v4l2_device and subdev in async notifier init")
    Cc: <stable@vger.kernel.org> # for 6.6 and later
    Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: v4l: async: Properly re-initialise notifier entry in unregister [+ + +]

Author: Sakari Ailus <sakari.ailus@linux.intel.com>
Date:   Fri Mar 8 15:06:13 2024 +0200

    media: v4l: async: Properly re-initialise notifier entry in unregister
    
    commit 9537a8425a7a0222999d5839a0b394b1e8834b4a upstream.
    
    The notifier_entry of a notifier is not re-initialised after unregistering
    the notifier. This leaves dangling pointers there, so use
    list_del_init() to return the notifier_entry to an empty list.
    
    Fixes: b8ec754ae4c5 ("media: v4l: async: Set v4l2_device and subdev in async notifier init")
    Cc: <stable@vger.kernel.org> # for 6.6 and later
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
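
Why re-initialising a list entry on removal matters can be sketched with a small userspace model of a circular doubly linked list. This is not the kernel's list implementation: the helper names below are hypothetical stand-ins written for this example, and the plain unlink here simply leaves stale pointers behind instead of poisoning them.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace model of a circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialised, empty entry points at itself in both directions. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Insert item right after head. */
static void list_add_entry(struct list_head *item, struct list_head *head)
{
    assert(item != NULL && head != NULL);
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/* Plain unlink: the neighbours are fixed up, but item keeps stale pointers. */
static void list_del_entry(struct list_head *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
}

/* Unlink and return item to the empty state, so later unlinks are harmless. */
static void list_del_init_entry(struct list_head *item)
{
    list_del_entry(item);
    init_list_head(item);
}
```

After list_del_init_entry() the entry is a valid empty list again, so a second removal or an emptiness check on it touches only the entry itself rather than memory that used to be its neighbours.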

mm/cma: drop incorrect alignment check in cma_init_reserved_mem [+ + +]

Author: Frank van der Linden <fvdl@google.com>
Date:   Thu Apr 4 16:25:14 2024 +0000

    mm/cma: drop incorrect alignment check in cma_init_reserved_mem
    
    commit b174f139bdc8aaaf72f5b67ad1bd512c4868a87e upstream.
    
    cma_init_reserved_mem uses IS_ALIGNED to check if the size represented by
    one bit in the cma allocation bitmask is aligned with
    CMA_MIN_ALIGNMENT_BYTES (pageblock size).
    
    However, this is too strict, as this will fail if order_per_bit >
    pageblock_order, which is a valid configuration.
    
    We could check IS_ALIGNED both ways, but since both numbers are powers of
    two, no check is needed at all.
    
    Link: https://lkml.kernel.org/r/20240404162515.527802-1-fvdl@google.com
    Fixes: de9e14eebf33 ("drivers: dma-contiguous: add initialization from device tree")
    Signed-off-by: Frank van der Linden <fvdl@google.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Marek Szyprowski <m.szyprowski@samsung.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
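
The claim that no check is needed for two powers of two can be illustrated with a short C sketch. The helpers below are hypothetical and written for this example; they are not the kernel's IS_ALIGNED macro, and the page counts used in the usage note (order 9 and order 18) are assumed values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* True if x is a multiple of the power-of-two alignment a. */
static bool is_aligned(uint64_t x, uint64_t a)
{
    assert(is_pow2(a));
    return (x & (a - 1)) == 0;
}

/* A check "both ways": either value is a multiple of the other. */
static bool aligned_either_way(uint64_t a, uint64_t b)
{
    return is_aligned(a, b) || is_aligned(b, a);
}

/* Exhaustively confirm the two-way check for all power-of-two pairs. */
static bool all_pow2_pairs_aligned(unsigned max_shift)
{
    for (unsigned i = 0; i <= max_shift; i++)
        for (unsigned j = 0; j <= max_shift; j++)
            if (!aligned_either_way(1ULL << i, 1ULL << j))
                return false;
    return true;
}
```

A one-directional check rejects a valid pair such as 512 pages against 262144 pages, whereas the two-way check holds for every pair of powers of two, which is why it carries no information and can be dropped.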

mm/hugetlb: do not call vma_add_reservation upon ENOMEM [+ + +]

Author: Oscar Salvador <osalvador@suse.de>
Date:   Tue May 28 22:53:23 2024 +0200

    mm/hugetlb: do not call vma_add_reservation upon ENOMEM
    
    commit 8daf9c702ee7f825f0de8600abff764acfedea13 upstream.
    
    syzbot reported a splat [1] on __unmap_hugepage_range().  This is because
    vma_needs_reservation() can return -ENOMEM if
    allocate_file_region_entries() fails to allocate the file_region struct
    for the reservation.
    
    Check for that and do not call vma_add_reservation() if that is the case,
    otherwise region_abort() and region_del() will see that we do not have any
    file_regions.
    
    If we detect that vma_needs_reservation() returned -ENOMEM, we clear the
    hugetlb_restore_reserve flag as if this reservation was still consumed, so
    free_huge_folio() will not increment the resv count.
    
    [1] https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/T/#ma5983bc1ab18a54910da83416b3f89f3c7ee43aa
    
    Link: https://lkml.kernel.org/r/20240528205323.20439-1-osalvador@suse.de
    Fixes: df7a6d1f6405 ("mm/hugetlb: restore the reservation if needed")
    Signed-off-by: Oscar Salvador <osalvador@suse.de>
    Reported-and-tested-by: syzbot+d3fe2dc5ffe9380b714b@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/
    Cc: Breno Leitao <leitao@debian.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm/hugetlb: pass correct order_per_bit to cma_declare_contiguous_nid [+ + +]

Author: Frank van der Linden <fvdl@google.com>
Date:   Thu Apr 4 16:25:15 2024 +0000

    mm/hugetlb: pass correct order_per_bit to cma_declare_contiguous_nid
    
    commit 55d134a7b499c77e7cfd0ee41046f3c376e791e5 upstream.
    
    The hugetlb_cma code passes 0 in the order_per_bit argument to
    cma_declare_contiguous_nid (the alignment, computed using the page order,
    is correctly passed in).
    
    This causes a bit in the cma allocation bitmap to always represent a 4k
    page, making the bitmaps potentially very large, and slower.
    
    E.g. for a 4k page size on x86, hugetlb_cma=64G would mean a bitmap size
    of (64G / 4k) / 8 == 2M.  With HUGETLB_PAGE_ORDER as order_per_bit, as
    intended, this would be (64G / 2M) / 8 == 4k.  So, that's quite a
    difference.
    
    Also, this restricted the hugetlb_cma area to ((PAGE_SIZE <<
    MAX_PAGE_ORDER) * 8) * PAGE_SIZE (e.g. 128G on x86), since
    bitmap_alloc uses normal page allocation, and is thus restricted by
    MAX_PAGE_ORDER.  Specifying anything above that would fail the CMA
    initialization.
    
    So, correctly pass in the order instead.
    
    Link: https://lkml.kernel.org/r/20240404162515.527802-2-fvdl@google.com
    Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
    Signed-off-by: Frank van der Linden <fvdl@google.com>
    Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Marek Szyprowski <m.szyprowski@samsung.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
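
The bitmap arithmetic in the message above can be reproduced with a small C sketch. The function names are hypothetical, and the values used in the usage note (a page shift of 12 for 4k pages, order 9 for a 2M huge page, and a maximum page order of 10) are assumptions matching the x86 example in the text.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed for an allocation bitmap in which one bit covers
 * (1 << order_per_bit) base pages. page_shift is log2 of the page size.
 */
static uint64_t bitmap_bytes(uint64_t area_bytes, unsigned page_shift,
                             unsigned order_per_bit)
{
    uint64_t bytes_per_bit = 1ULL << (page_shift + order_per_bit);

    assert(area_bytes % bytes_per_bit == 0);
    return (area_bytes / bytes_per_bit) / 8;
}

/*
 * Largest area describable at one bit per base page when the bitmap is
 * limited to a single allocation of (page size << max_order) bytes.
 */
static uint64_t max_area_bytes(unsigned page_shift, unsigned max_order)
{
    uint64_t max_bitmap_bytes = 1ULL << (page_shift + max_order);

    return (max_bitmap_bytes * 8) << page_shift;
}
```

For a 64G area, bitmap_bytes() with order_per_bit 0 gives 2M, and with order_per_bit 9 it gives 4k. max_area_bytes(12, 10) gives 128G, the limit mentioned for the one-bit-per-page case.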

mm/ksm: fix ksm_pages_scanned accounting [+ + +]

Author: Chengming Zhou <chengming.zhou@linux.dev>
Date:   Tue May 28 13:15:21 2024 +0800

    mm/ksm: fix ksm_pages_scanned accounting
    
    commit 730cdc2c72c6905a2eda2fccbbf67dcef1206590 upstream.
    
    Patch series "mm/ksm: fix some accounting problems", v3.
    
    We encountered some abnormal ksm_pages_scanned and ksm_zero_pages during
    some random tests.
    
    1. ksm_pages_scanned stays unchanged even though ksmd scanning makes progress.
    2. ksm_zero_pages may be -1 in some rare cases.
    
    
    This patch (of 2):
    
    During testing, I found that ksm_pages_scanned is unchanged although
    scan_get_next_rmap_item() did return a valid, non-NULL rmap_item.
    
    The reason is that scan_get_next_rmap_item() returns NULL after a full
    scan, so ksm_do_scan() just returns without accounting
    ksm_pages_scanned.
    
    Fix it by moving the ksm_pages_scanned accounting into that loop, so it
    is also accounted in a more timely manner if the loop runs for a long time.
    
    Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-0-34bb358fdc13@linux.dev
    Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-1-34bb358fdc13@linux.dev
    Fixes: b348b5fe2b5f ("mm/ksm: add pages scanned metric")
    Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: xu xin <xu.xin16@zte.com.cn>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn>
    Cc: Stefan Roesch <shr@devkernel.io>
    Cc: Yang Yang <yang.yang29@zte.com.cn>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
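
The accounting problem can be modelled with a simplified C sketch. It is not the KSM code: both functions are hypothetical, the scanner is reduced to a count of valid items followed by an end-of-scan result, and the return value stands in for what would be added to the counter.

```c
#include <assert.h>

/*
 * Accounting after the loop: an early return at the end of a full scan
 * skips the accounting, so pages already handled are never counted.
 */
static unsigned count_after_loop(unsigned budget, unsigned valid_items)
{
    unsigned handled = 0;

    assert(budget > 0);
    while (budget--) {
        if (handled == valid_items)
            return 0;           /* end of scan: accounting is skipped */
        handled++;
    }
    return handled;             /* only reached if the budget runs out */
}

/* Accounting inside the loop: every handled page is counted immediately. */
static unsigned count_in_loop(unsigned budget, unsigned valid_items)
{
    unsigned scanned = 0;

    assert(budget > 0);
    while (budget--) {
        if (scanned == valid_items)
            return scanned;     /* end of scan: the count is already up to date */
        scanned++;
    }
    return scanned;
}
```

With a budget of 10 and only 4 valid items before the scan wraps, the first variant reports 0 and the second reports 4. When the budget runs out before the scan ends, both variants agree.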

mm/ksm: fix ksm_zero_pages accounting [+ + +]

Author: Chengming Zhou <chengming.zhou@linux.dev>
Date:   Tue May 28 13:15:22 2024 +0800

    mm/ksm: fix ksm_zero_pages accounting
    
    commit c2dc78b86e0821ecf9a9d0c35dba2618279a5bb6 upstream.
    
    We normally do ksm_zero_pages++ in ksmd when a page is merged with the
    zero page, but ksm_zero_pages-- is done from the page table side, where
    access to ksm_zero_pages is not protected.
    
    So in rare cases we can read a very unusual value of ksm_zero_pages,
    such as -1, which is very confusing to users.
    
    Fix it by changing it to atomic_long_t, and do the same for
    mm->ksm_zero_pages.
    
    Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-2-34bb358fdc13@linux.dev
    Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM")
    Fixes: 6080d19f0704 ("ksm: add ksm zero pages for each process")
    Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn>
    Cc: Stefan Roesch <shr@devkernel.io>
    Cc: xu xin <xu.xin16@zte.com.cn>
    Cc: Yang Yang <yang.yang29@zte.com.cn>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm/memory-failure: fix handling of dissolved but not taken off from buddy pages [+ + +]

Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Thu May 23 15:12:17 2024 +0800

    mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
    
    commit 8cf360b9d6a840700e06864236a01a883b34bbad upstream.
    
    When I did memory failure tests recently, the panic below occurred:
    
    page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
    flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
    raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
    raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
    page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
    ------------[ cut here ]------------
    kernel BUG at include/linux/page-flags.h:1009!
    invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    RIP: 0010:__del_page_from_free_list+0x151/0x180
    RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
    RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
    RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
    RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
    R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
    R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
    FS:  00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
    Call Trace:
     <TASK>
     __rmqueue_pcplist+0x23b/0x520
     get_page_from_freelist+0x26b/0xe40
     __alloc_pages_noprof+0x113/0x1120
     __folio_alloc_noprof+0x11/0xb0
     alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
     __alloc_fresh_hugetlb_folio+0xe7/0x140
     alloc_pool_huge_folio+0x68/0x100
     set_max_huge_pages+0x13d/0x340
     hugetlb_sysctl_handler_common+0xe8/0x110
     proc_sys_call_handler+0x194/0x280
     vfs_write+0x387/0x550
     ksys_write+0x64/0xe0
     do_syscall_64+0xc2/0x1d0
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    RIP: 0033:0x7ff916114887
    RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
    RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
    RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
    R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
    R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
     </TASK>
    Modules linked in: mce_inject hwpoison_inject
    ---[ end trace 0000000000000000 ]---
    
    And before the panic, there was a warning about bad page state:
    
    BUG: Bad page state in process page-types  pfn:8cee00
    page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
    flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
    page_type: 0xffffff7f(buddy)
    raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
    raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
    page dumped because: nonzero mapcount
    Modules linked in: mce_inject hwpoison_inject
    CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
    Call Trace:
     <TASK>
     dump_stack_lvl+0x83/0xa0
     bad_page+0x63/0xf0
     free_unref_page+0x36e/0x5c0
     unpoison_memory+0x50b/0x630
     simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
     debugfs_attr_write+0x42/0x60
     full_proxy_write+0x5b/0x80
     vfs_write+0xcd/0x550
     ksys_write+0x64/0xe0
     do_syscall_64+0xc2/0x1d0
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    RIP: 0033:0x7f189a514887
    RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
    RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
    RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
    R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
     </TASK>
    
    The root cause should be the below race:
    
     memory_failure
      try_memory_failure_hugetlb
       me_huge_page
        __page_handle_poison
         dissolve_free_hugetlb_folio
         drain_all_pages -- Buddy page can be isolated e.g. for compaction.
         take_page_off_buddy -- Failed as page is not in the buddy list.
                 -- Page can be putback into buddy after compaction.
        page_ref_inc -- Leads to buddy page with refcnt = 1.
    
    Then unpoison_memory() can unpoison the page and send the buddy page back
    into buddy list again leading to the above bad page state warning.  And
    bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
    page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
    allocate this page.
    
    Fix this issue by only treating __page_handle_poison() as successful when
    it returns 1.
    
    Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
    Fixes: ceaf8fbea79a ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL [+ + +]

Author: Hailong.Liu <hailong.liu@oppo.com>
Date:   Fri May 10 18:01:31 2024 +0800

    mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
    
    commit 8e0545c83d672750632f46e3f9ad95c48c91a0fc upstream.
    
    commit a421ef303008 ("mm: allow !GFP_KERNEL allocations for kvmalloc")
    includes support for __GFP_NOFAIL, but it presents a conflict with commit
    dd544141b9eb ("vmalloc: back off when the current task is OOM-killed").  A
    possible scenario is as follows:
    
    process-a
    __vmalloc_node_range(GFP_KERNEL | __GFP_NOFAIL)
        __vmalloc_area_node()
            vm_area_alloc_pages()
                    --> oom-killer send SIGKILL to process-a
            if (fatal_signal_pending(current)) break;
    --> return NULL;
    
    To fix this, do not check fatal_signal_pending() in vm_area_alloc_pages()
    if __GFP_NOFAIL is set.
    
    This issue occurred during OPLUS KASAN TEST. Below is part of the log:
    -> oom-killer sends signal to process
    [65731.222840] [ T1308] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/apps/uid_10198,task=gs.intelligence,pid=32454,uid=10198
    
    [65731.259685] [T32454] Call trace:
    [65731.259698] [T32454]  dump_backtrace+0xf4/0x118
    [65731.259734] [T32454]  show_stack+0x18/0x24
    [65731.259756] [T32454]  dump_stack_lvl+0x60/0x7c
    [65731.259781] [T32454]  dump_stack+0x18/0x38
    [65731.259800] [T32454]  mrdump_common_die+0x250/0x39c [mrdump]
    [65731.259936] [T32454]  ipanic_die+0x20/0x34 [mrdump]
    [65731.260019] [T32454]  atomic_notifier_call_chain+0xb4/0xfc
    [65731.260047] [T32454]  notify_die+0x114/0x198
    [65731.260073] [T32454]  die+0xf4/0x5b4
    [65731.260098] [T32454]  die_kernel_fault+0x80/0x98
    [65731.260124] [T32454]  __do_kernel_fault+0x160/0x2a8
    [65731.260146] [T32454]  do_bad_area+0x68/0x148
    [65731.260174] [T32454]  do_mem_abort+0x151c/0x1b34
    [65731.260204] [T32454]  el1_abort+0x3c/0x5c
    [65731.260227] [T32454]  el1h_64_sync_handler+0x54/0x90
    [65731.260248] [T32454]  el1h_64_sync+0x68/0x6c
    
    [65731.260269] [T32454]  z_erofs_decompress_queue+0x7f0/0x2258
    --> be->decompressed_pages = kvcalloc(be->nr_pages, sizeof(struct page *), GFP_KERNEL | __GFP_NOFAIL);
            kernel panic by NULL pointer dereference.
            erofs assumes kvmalloc with __GFP_NOFAIL never returns NULL.
    [65731.260293] [T32454]  z_erofs_runqueue+0xf30/0x104c
    [65731.260314] [T32454]  z_erofs_readahead+0x4f0/0x968
    [65731.260339] [T32454]  read_pages+0x170/0xadc
    [65731.260364] [T32454]  page_cache_ra_unbounded+0x874/0xf30
    [65731.260388] [T32454]  page_cache_ra_order+0x24c/0x714
    [65731.260411] [T32454]  filemap_fault+0xbf0/0x1a74
    [65731.260437] [T32454]  __do_fault+0xd0/0x33c
    [65731.260462] [T32454]  handle_mm_fault+0xf74/0x3fe0
    [65731.260486] [T32454]  do_mem_abort+0x54c/0x1b34
    [65731.260509] [T32454]  el0_da+0x44/0x94
    [65731.260531] [T32454]  el0t_64_sync_handler+0x98/0xb4
    [65731.260553] [T32454]  el0t_64_sync+0x198/0x19c
    
    Link: https://lkml.kernel.org/r/20240510100131.1865-1-hailong.liu@oppo.com
    Fixes: 9376130c390a ("mm/vmalloc: add support for __GFP_NOFAIL")
    Signed-off-by: Hailong.Liu <hailong.liu@oppo.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Suggested-by: Barry Song <21cnbao@gmail.com>
    Reported-by: Oven <liyangouwen1@oppo.com>
    Reviewed-by: Barry Song <baohua@kernel.org>
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Cc: Chao Yu <chao@kernel.org>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Gao Xiang <xiang@kernel.org>
    Cc: Lorenzo Stoakes <lstoakes@gmail.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again [+ + +]

Author: Yuanyuan Zhong <yzhong@purestorage.com>
Date:   Thu May 23 12:35:31 2024 -0600

    mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
    
    commit 6d065f507d82307d6161ac75c025111fb8b08a46 upstream.
    
    After switching smaps_rollup to use the VMA iterator, searching for the
    next entry is part of the condition expression of the do-while loop.  So
    the current VMA needs to be addressed before the continue statement.
    
    Otherwise, with some VMAs skipped, the memory consumption that userspace
    observes in /proc/pid/smaps_rollup will be smaller than the sum of
    the corresponding fields from /proc/pid/smaps.
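    
    The control-flow pitfall can be reproduced outside the kernel. The sketch
    below is a minimal userspace illustration, not the fs/proc/task_mmu.c
    code: struct iter, iter_next() and sum_items() are hypothetical names, and
    the "re-lookup" branch only stands in for re-fetching the VMA after
    mmap_lock is taken again. Because continue in a do-while jumps to the loop
    condition, and the condition fetches the next element, the element fetched
    by hand is skipped unless it is accounted for first.

```c
#include <stddef.h>

/* Hypothetical forward iterator over an int array (stand-in for a VMA iterator). */
struct iter {
    const int *items;
    size_t len;
    size_t pos;
};

/* Return the next element, or NULL when the array is exhausted. */
static const int *iter_next(struct iter *it)
{
    return it->pos < it->len ? &it->items[it->pos++] : NULL;
}

/*
 * Sum all items. After processing the element with index relookup_at, the
 * next element is fetched by hand (simulating a re-lookup after the lock was
 * dropped and re-taken) and the loop continues.
 * account_before_continue == 0 models the skipping behaviour,
 * account_before_continue != 0 models handling the element first.
 */
static int sum_items(const int *items, size_t len, size_t relookup_at,
                     int account_before_continue)
{
    struct iter it = { items, len, 0 };
    const int *cur = iter_next(&it);
    size_t seen = 0;
    int total = 0;

    if (!cur)
        return 0;

    do {
        total += *cur;
        if (seen++ == relookup_at) {
            cur = iter_next(&it);      /* manual re-fetch */
            if (!cur)
                break;
            if (account_before_continue)
                total += *cur;         /* handle it before continuing */
            continue;                  /* jumps to the condition below */
        }
    } while ((cur = iter_next(&it)) != NULL);

    return total;
}
```

    With the items {1, 2, 4, 8} and a re-lookup after the second element, the
    skipping variant misses the third element, which is the same kind of
    undercount as described above.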
    
    Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com
    Fixes: c4c84f06285e ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
    Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
    Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: fix race between __split_huge_pmd_locked() and GUP-fast [+ + +]

Author: Ryan Roberts <ryan.roberts@arm.com>
Date:   Wed May 1 15:33:10 2024 +0100

    mm: fix race between __split_huge_pmd_locked() and GUP-fast
    
    commit 3a5a8d343e1cf96eb9971b17cbd4b832ab19b8e7 upstream.
    
    __split_huge_pmd_locked() can be called for a present THP, devmap or
    (non-present) migration entry.  It calls pmdp_invalidate() unconditionally
    on the pmdp and only determines if it is present or not based on the
    returned old pmd.  This is a problem for the migration entry case because
    pmd_mkinvalid(), called by pmdp_invalidate() must only be called for a
    present pmd.
    
    On arm64 at least, pmd_mkinvalid() will mark the pmd such that any future
    call to pmd_present() will return true.  And therefore any lockless
    pgtable walker could see the migration entry pmd in this state and start
    interpreting the fields as if it were present, leading to BadThings (TM).
    GUP-fast appears to be one such lockless pgtable walker.
    
    x86 does not suffer the above problem, but instead pmd_mkinvalid() will
    corrupt the offset field of the swap entry within the swap pte.  See link
    below for discussion of that problem.
    
    Fix all of this by only calling pmdp_invalidate() for a present pmd.  And
    for good measure let's add a warning to all implementations of
    pmdp_invalidate[_ad]().  I've manually reviewed all other
    pmdp_invalidate[_ad]() call sites and believe all others to be conformant.
    
    This is a theoretical bug found during code review.  I don't have any test
    case to trigger it in practice.
    
    Link: https://lkml.kernel.org/r/20240501143310.1381675-1-ryan.roberts@arm.com
    Link: https://lore.kernel.org/all/0dd7827a-6334-439a-8fd0-43c98e6af22b@arm.com/
    Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
    Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Andreas Larsson <andreas@gaisler.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: core: Add mmc_gpiod_set_cd_config() function [+ + +]

Author: Hans de Goede <hdegoede@redhat.com>
Date:   Wed Apr 10 21:16:34 2024 +0200

    mmc: core: Add mmc_gpiod_set_cd_config() function
    
    commit 63a7cd660246aa36af263b85c33ecc6601bf04be upstream.
    
    Some mmc host drivers may need to fix up a card-detection GPIO's config
    to e.g. enable the GPIO controller's built-in pull-up resistor on devices
    where the firmware description of the GPIO is broken (e.g. GpioInt with
    PullNone instead of PullUp in ACPI DSDT).
    
    Since this is the exception rather than the rule, adding a config
    parameter to mmc_gpiod_request_cd() seems undesirable, so instead
    add a new mmc_gpiod_set_cd_config() function. This is simply a wrapper
    to call gpiod_set_config() on the card-detect GPIO acquired through
    mmc_gpiod_request_cd().
    
    Reviewed-by: Andy Shevchenko <andy@kernel.org>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Acked-by: Adrian Hunter <adrian.hunter@intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240410191639.526324-2-hdegoede@redhat.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: davinci: Don't strip remove function when driver is builtin [+ + +]

Author: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Date:   Sun Mar 24 12:40:17 2024 +0100

    mmc: davinci: Don't strip remove function when driver is builtin
    
    commit 55c421b364482b61c4c45313a535e61ed5ae4ea3 upstream.
    
    Using __exit for the remove function results in the remove callback being
    discarded with CONFIG_MMC_DAVINCI=y. When such a device gets unbound (e.g.
    using sysfs or hotplug), the driver is just removed without the cleanup
    being performed. This results in resource leaks. Fix it by compiling in the
    remove callback unconditionally.
    
    This also fixes a W=1 modpost warning:
    
    WARNING: modpost: drivers/mmc/host/davinci_mmc: section mismatch in
    reference: davinci_mmcsd_driver+0x10 (section: .data) ->
    davinci_mmcsd_remove (section: .exit.text)
    
    Fixes: b4cff4549b7a ("DaVinci: MMC: MMC/SD controller driver for DaVinci family")
    Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240324114017.231936-2-u.kleine-koenig@pengutronix.de
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: sdhci-acpi: Add quirk to enable pull-up on the card-detect GPIO on Asus T100TA [+ + +]

Author: Hans de Goede <hdegoede@redhat.com>
Date:   Wed Apr 10 21:16:39 2024 +0200

    mmc: sdhci-acpi: Add quirk to enable pull-up on the card-detect GPIO on Asus T100TA
    
    commit 431946c0f640c93421439a6c928efb3152c035a4 upstream.
    
    The card-detect GPIO for the microSD slot on Asus T100TA / T100TAM models
    stopped working under Linux after commit 6fd03f024828 ("gpiolib: acpi:
    support bias pull disable").
    
    The GPIO in question is connected to a mechanical switch in the slot
    which shorts the pin to GND when a card is inserted.
    
    The GPIO pin correctly gets configured with a 20K pull-up by the BIOS,
    but there is a bug in the DSDT where the GpioInt for the card-detect is
    configured with a PullNone setting:
    
        GpioInt (Edge, ActiveBoth, SharedAndWake, PullNone, 0x2710,
            "\\_SB.GPO0", 0x00, ResourceConsumer, ,
            )
            {   // Pin list
            0x0026
            }
    
    Linux now actually honors the PullNone setting and disables the 20K pull-up
    configured by the BIOS.
    
    Add a new DMI_QUIRK_SD_CD_ENABLE_PULL_UP quirk which when set calls
    mmc_gpiod_set_cd_config() to re-enable the pull-up and set this for
    the Asus T100TA models to fix this.
    
    Fixes: 6fd03f024828 ("gpiolib: acpi: support bias pull disable")
    Reviewed-by: Andy Shevchenko <andy@kernel.org>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Acked-by: Adrian Hunter <adrian.hunter@intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240410191639.526324-7-hdegoede@redhat.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: sdhci-acpi: Disable write protect detection on Toshiba WT10-A [+ + +]

Author: Hans de Goede <hdegoede@redhat.com>
Date:   Wed Apr 10 21:16:38 2024 +0200

    mmc: sdhci-acpi: Disable write protect detection on Toshiba WT10-A
    
    commit ef3eab75e17191e5665f52e64e85bc29d5705a7b upstream.
    
    On the Toshiba WT10-A the microSD slot always reports the card being
    write-protected, just like on the Toshiba WT8-B.
    
    Add a DMI quirk to work around this.
    
    Reviewed-by: Andy Shevchenko <andy@kernel.org>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Acked-by: Adrian Hunter <adrian.hunter@intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240410191639.526324-6-hdegoede@redhat.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: sdhci-acpi: Fix Lenovo Yoga Tablet 2 Pro 1380 sdcard slot not working [+ + +]

Author: Hans de Goede <hdegoede@redhat.com>
Date:   Wed Apr 10 21:16:37 2024 +0200

    mmc: sdhci-acpi: Fix Lenovo Yoga Tablet 2 Pro 1380 sdcard slot not working
    
    commit f3521d7cbaefff19cc656325787ed797e5f6a955 upstream.
    
    The Lenovo Yoga Tablet 2 Pro 1380 sdcard slot has an active high cd pin
    and a broken wp pin which always reports the card being write-protected.
    
    Add a DMI quirk to address both issues.
    
    Reviewed-by: Andy Shevchenko <andy@kernel.org>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Acked-by: Adrian Hunter <adrian.hunter@intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240410191639.526324-5-hdegoede@redhat.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: sdhci-acpi: Sort DMI quirks alphabetically [+ + +]

Author: Hans de Goede <hdegoede@redhat.com>
Date:   Wed Apr 10 21:16:36 2024 +0200

    mmc: sdhci-acpi: Sort DMI quirks alphabetically
    
    commit a92a73b1d9249d155412d8ac237142fa716803ea upstream.
    
    Sort the DMI quirks alphabetically.
    
    Reviewed-by: Andy Shevchenko <andy@kernel.org>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Acked-by: Adrian Hunter <adrian.hunter@intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240410191639.526324-4-hdegoede@redhat.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: sdhci: Add support for "Tuning Error" interrupts [+ + +]

Author: Adrian Hunter <adrian.hunter@intel.com>
Date:   Wed Apr 10 21:16:35 2024 +0200

    mmc: sdhci: Add support for "Tuning Error" interrupts
    
    commit b3855668d98cf9c6aec2db999dd27d872f8ba878 upstream.
    
    Most Bay Trail devices do not enable UHS modes for the external sdcard
    slot; the Lenovo Yoga Tablet 2 830 / 1050 and Lenovo Yoga Tablet 2 Pro
    1380 (8", 10" and 13") models, however, do enable this.
    
    Using UHS cards in these tablets results in errors like this one:
    
    [  225.272001] mmc2: Unexpected interrupt 0x04000000.
    [  225.272024] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
    [  225.272034] mmc2: sdhci: Sys addr:  0x0712c400 | Version:  0x0000b502
    [  225.272044] mmc2: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000007
    [  225.272054] mmc2: sdhci: Argument:  0x00000000 | Trn mode: 0x00000023
    [  225.272064] mmc2: sdhci: Present:   0x01e20002 | Host ctl: 0x00000016
    [  225.272073] mmc2: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
    [  225.272082] mmc2: sdhci: Wake-up:   0x00000000 | Clock:    0x00000107
    [  225.272092] mmc2: sdhci: Timeout:   0x0000000e | Int stat: 0x00000001
    [  225.272101] mmc2: sdhci: Int enab:  0x03ff000b | Sig enab: 0x03ff000b
    [  225.272110] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
    [  225.272119] mmc2: sdhci: Caps:      0x076864b2 | Caps_1:   0x00000004
    [  225.272129] mmc2: sdhci: Cmd:       0x00000c1b | Max curr: 0x00000000
    [  225.272138] mmc2: sdhci: Resp[0]:   0x00000c00 | Resp[1]:  0x00000000
    [  225.272147] mmc2: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000900
    [  225.272155] mmc2: sdhci: Host ctl2: 0x0000000c
    [  225.272164] mmc2: sdhci: ADMA Err:  0x00000003 | ADMA Ptr: 0x0712c200
    [  225.272172] mmc2: sdhci: ============================================
    
    which results in IO errors leading to issues accessing the sdcard.
    
    0x04000000 is a so-called "Tuning Error", which so far the SDHCI driver
    does not support / enable. Modify the IRQ handler to process these.
    
    This fixes UHS microsd cards not working with these tablets.
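    
    As a rough illustration of the register values in the dump above, the
    sketch below checks which raised interrupt bits fall outside the enable
    mask. It is a userspace sketch, not the sdhci driver code:
    TUNING_ERROR_MASK, unexpected_bits() and has_tuning_error() are
    hypothetical names chosen for this example; only the two constants
    (0x04000000 and the "Int enab" value 0x03ff000b) come from the log.

```c
#include <stdint.h>

/* Illustrative name; the value is the unexpected interrupt from the log. */
#define TUNING_ERROR_MASK UINT32_C(0x04000000)

/* Return the interrupt bits that are raised but not covered by the enable mask. */
static uint32_t unexpected_bits(uint32_t raised, uint32_t enabled)
{
    return raised & ~enabled;
}

/* Return 1 if the raised interrupt status contains the Tuning Error bit. */
static int has_tuning_error(uint32_t raised)
{
    return (raised & TUNING_ERROR_MASK) != 0;
}
```

    0x04000000 is bit 26, which is not set in the enable mask 0x03ff000b
    from the dump, so under these assumptions the interrupt shows up as
    unexpected.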
    
    Link: https://lore.kernel.org/r/199bb4aa-c6b5-453e-be37-58bbf468800c@intel.com
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240410191639.526324-3-hdegoede@redhat.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/9p: fix uninit-value in p9_client_rpc() [+ + +]

Author: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Date:   Mon Apr 8 07:10:39 2024 -0700

    net/9p: fix uninit-value in p9_client_rpc()
    
    commit 25460d6f39024cc3b8241b14c7ccf0d6f11a736a upstream.
    
    Syzbot with the help of KMSAN reported the following error:
    
    BUG: KMSAN: uninit-value in trace_9p_client_res include/trace/events/9p.h:146 [inline]
    BUG: KMSAN: uninit-value in p9_client_rpc+0x1314/0x1340 net/9p/client.c:754
     trace_9p_client_res include/trace/events/9p.h:146 [inline]
     p9_client_rpc+0x1314/0x1340 net/9p/client.c:754
     p9_client_create+0x1551/0x1ff0 net/9p/client.c:1031
     v9fs_session_init+0x1b9/0x28e0 fs/9p/v9fs.c:410
     v9fs_mount+0xe2/0x12b0 fs/9p/vfs_super.c:122
     legacy_get_tree+0x114/0x290 fs/fs_context.c:662
     vfs_get_tree+0xa7/0x570 fs/super.c:1797
     do_new_mount+0x71f/0x15e0 fs/namespace.c:3352
     path_mount+0x742/0x1f20 fs/namespace.c:3679
     do_mount fs/namespace.c:3692 [inline]
     __do_sys_mount fs/namespace.c:3898 [inline]
     __se_sys_mount+0x725/0x810 fs/namespace.c:3875
     __x64_sys_mount+0xe4/0x150 fs/namespace.c:3875
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Uninit was created at:
     __alloc_pages+0x9d6/0xe70 mm/page_alloc.c:4598
     __alloc_pages_node include/linux/gfp.h:238 [inline]
     alloc_pages_node include/linux/gfp.h:261 [inline]
     alloc_slab_page mm/slub.c:2175 [inline]
     allocate_slab mm/slub.c:2338 [inline]
     new_slab+0x2de/0x1400 mm/slub.c:2391
     ___slab_alloc+0x1184/0x33d0 mm/slub.c:3525
     __slab_alloc mm/slub.c:3610 [inline]
     __slab_alloc_node mm/slub.c:3663 [inline]
     slab_alloc_node mm/slub.c:3835 [inline]
     kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852
     p9_tag_alloc net/9p/client.c:278 [inline]
     p9_client_prepare_req+0x20a/0x1770 net/9p/client.c:641
     p9_client_rpc+0x27e/0x1340 net/9p/client.c:688
     p9_client_create+0x1551/0x1ff0 net/9p/client.c:1031
     v9fs_session_init+0x1b9/0x28e0 fs/9p/v9fs.c:410
     v9fs_mount+0xe2/0x12b0 fs/9p/vfs_super.c:122
     legacy_get_tree+0x114/0x290 fs/fs_context.c:662
     vfs_get_tree+0xa7/0x570 fs/super.c:1797
     do_new_mount+0x71f/0x15e0 fs/namespace.c:3352
     path_mount+0x742/0x1f20 fs/namespace.c:3679
     do_mount fs/namespace.c:3692 [inline]
     __do_sys_mount fs/namespace.c:3898 [inline]
     __se_sys_mount+0x725/0x810 fs/namespace.c:3875
     __x64_sys_mount+0xe4/0x150 fs/namespace.c:3875
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    If p9_check_errors() fails early in p9_client_rpc(), req->rc.tag
    will not be properly initialized. However, trace_9p_client_res()
    ends up trying to print it out anyway before p9_client_rpc()
    finishes.
    
    Fix this issue by assigning default values to p9_fcall fields
    such as 'tag' and (just in case KMSAN unearths something new) 'id'
    during the tag allocation stage.
    
    Reported-and-tested-by: syzbot+ff14db38f56329ef68df@syzkaller.appspotmail.com
    Fixes: 348b59012e5c ("net/9p: Convert net/9p protocol dumps to tracepoints")
    Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
    Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
    Cc: stable@vger.kernel.org
    Message-ID: <20240408141039.30428-1-n.zhandarovich@fintech.ru>
    Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/ipv6: Fix route deleting failure when metric equals 0 [+ + +]

Author: xu xin <xu.xin16@zte.com.cn>
Date:   Tue May 14 20:11:02 2024 +0800

    net/ipv6: Fix route deleting failure when metric equals 0
    
    commit bb487272380d120295e955ad8acfcbb281b57642 upstream.
    
    Problem
    =========
    After commit 67f695134703 ("ipv6: Move setting default metric for routes"),
    we noticed that the logic of assigning the default value of fc_metric
    changed in the ioctl process. That is, when users use ioctl(fd, SIOCADDRT,
    rt) with a non-zero metric to add a route, they may then fail to delete
    that route by passing a metric value of 0 to the kernel via ioctl(fd,
    SIOCDELRT, rt). But iproute can succeed in deleting it.
    
    As a reference, when using the iproute tools over netlink to delete a
    route with a metric parameter equal to 0, as in the following command:
    
            ip -6 route del fe80::/64 via fe81::5054:ff:fe11:3451 dev eth0 metric 0
    
    the user can still succeed in deleting the route entry with the smallest
    metric.
    
    Root Reason
    ===========
    After commit 67f695134703 ("ipv6: Move setting default metric for routes"),
    when ioctl() passes in SIOCDELRT with a zero metric, rtmsg_to_fib6_config()
    sets a default value (1024) for cfg->fc_metric in the kernel, and
    ip6_route_del(), at line 4074 of net/ipv6/route.c, checks
    
            if (cfg->fc_metric && cfg->fc_metric != rt->fib6_metric)
                    continue;
    
    The condition is true, so the later procedure (deleting the route) is
    skipped because cfg->fc_metric != rt->fib6_metric. Before that commit,
    cfg->fc_metric was still zero there, so the condition was false and the
    deletion went ahead.
    
    Solution
    ========
    In order to keep behaviour consistent across netlink and ioctl(), we
    should allow deleting a route with a metric value of 0. So we only apply
    the default fc_metric setting when adding a route.
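    
    The effect of the check quoted above can be sketched in plain C. This is
    a simplified userspace model, not the net/ipv6/route.c code:
    DEFAULT_METRIC, cfg_metric_before_fix(), cfg_metric_after_fix() and
    delete_matches() are hypothetical names, and the model only captures how
    a zero metric from ioctl() is treated before and after the change.

```c
#include <stdint.h>

#define DEFAULT_METRIC 1024u  /* default value quoted in the commit message */

/* Before the fix: a zero metric was replaced by the default for add and delete. */
static uint32_t cfg_metric_before_fix(uint32_t user_metric, int is_add)
{
    (void)is_add;  /* the default was applied regardless of the operation */
    return user_metric ? user_metric : DEFAULT_METRIC;
}

/* After the fix: the default is applied only when adding a route. */
static uint32_t cfg_metric_after_fix(uint32_t user_metric, int is_add)
{
    if (is_add && user_metric == 0)
        return DEFAULT_METRIC;
    return user_metric;
}

/* Mirrors the quoted check: skip the route (return 0) on a non-zero mismatch. */
static int delete_matches(uint32_t cfg_metric, uint32_t route_metric)
{
    if (cfg_metric && cfg_metric != route_metric)
        return 0;
    return 1;
}
```

    For a route that was added with metric 256, a delete request with metric
    0 turns into 1024 in the "before" model and no longer matches, while in
    the "after" model it stays 0 and the route is matched.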
    
    CC: stable@vger.kernel.org # 5.4+
    Fixes: 67f695134703 ("ipv6: Move setting default metric for routes")
    Co-developed-by: Fan Yu <fan.yu9@zte.com.cn>
    Signed-off-by: Fan Yu <fan.yu9@zte.com.cn>
    Signed-off-by: xu xin <xu.xin16@zte.com.cn>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20240514201102055dD2Ba45qKbLlUMxu_DTHP@zte.com.cn
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/tcp: Don't consider TCP_CLOSE in TCP_AO_ESTABLISHED [+ + +]

Author: Dmitry Safonov <0x7f454c46@gmail.com>
Date:   Wed May 29 18:29:32 2024 +0100

    net/tcp: Don't consider TCP_CLOSE in TCP_AO_ESTABLISHED
    
    commit 33700a0c9b562700c28d31360a5f04508f459a45 upstream.
    
    TCP_CLOSE may or may not have current/rnext keys and should not be
    considered "established". The fast-path for TCP_CLOSE is
    SKB_DROP_REASON_TCP_CLOSE. This is what tcp_rcv_state_process() does
    anyways. Add an early drop path to not spend any time verifying
    segment signatures for sockets in TCP_CLOSE state.
    
    Cc: stable@vger.kernel.org # v6.7
    Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments")
    Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
    Link: https://lore.kernel.org/r/20240529-tcp_ao-sk_state-v1-1-d69b5d323c52@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS [+ + +]

Author: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date:   Thu Apr 25 16:24:29 2024 -0400

    NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
    
    commit f06d1b10cb016d5aaecdb1804fefca025387bd10 upstream.
    
    Olga showed me a case where the client was sending multiple READ_PLUS
    calls to the server in parallel, and the server replied
    NFS4ERR_OPNOTSUPP to each. The client would fall back to READ for the
    first reply, but fail to retry the other calls.
    
    I fix this by removing the test for NFS_CAP_READ_PLUS in
    nfs4_read_plus_not_supported(). This allows us to reschedule any
    READ_PLUS call that has a NFS4ERR_OPNOTSUPP return value, even after the
    capability has been cleared.
    
    Reported-by: Olga Kornievskaia <kolga@netapp.com>
    Fixes: c567552612ec ("NFS: Add READ_PLUS data segment support")
    Cc: stable@vger.kernel.org # v5.10+
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfs: fix undefined behavior in nfs_block_bits() [+ + +]

Author: Sergey Shtylyov <s.shtylyov@omp.ru>
Date:   Fri May 10 23:24:04 2024 +0300

    nfs: fix undefined behavior in nfs_block_bits()
    
    commit 3c0a2e0b0ae661457c8505fecc7be5501aa7a715 upstream.
    
    Shifting *signed int* typed constant 1 left by 31 bits causes undefined
    behavior. Specify the correct *unsigned long* type by using 1UL instead.
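    
    A minimal sketch of the pattern, loosely modelled on a "find the highest
    set bit" loop. The function names are hypothetical and this is not the
    nfs_block_bits() source; it only shows where the 1UL constant matters,
    assuming a platform where int is 32 bits wide.

```c
/* Index of the highest set bit among bits 0..31 of v (0 if none of them is set). */
static unsigned int highest_bit_below_32(unsigned long v)
{
    unsigned int n;

    /* 1UL << 31 is well defined; 1 << 31 would overflow a 32-bit signed int. */
    for (n = 31; n && !(v & (1UL << n)); n--)
        ;
    return n;
}

/* Round v down to a power of two using only unsigned shifts. */
static unsigned long round_down_pow2(unsigned long v)
{
    return 1UL << highest_bit_below_32(v);
}
```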
    
    Found by Linux Verification Center (linuxtesting.org) with the Svace static
    analysis tool.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors [+ + +]

Author: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Date:   Tue Jun 4 22:42:55 2024 +0900

    nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
    
    commit 7373a51e7998b508af7136530f3a997b286ce81c upstream.
    
    The error handling in nilfs_empty_dir() when a directory folio/page read
    fails is incorrect, as in the old ext2 implementation, and if the
    folio/page cannot be read or nilfs_check_folio() fails, it will falsely
    determine the directory as empty and corrupt the file system.
    
    In addition, since nilfs_empty_dir() does not immediately return on a
    failed folio/page read, but continues to loop, this can cause a long loop
    with I/O if i_size of the directory's inode is also corrupted, causing the
    log writer thread to wait and hang, as reported by syzbot.
    
    Fix these issues by making nilfs_empty_dir() immediately return a false
    value (0) if it fails to get a directory folio/page.
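    
    The difference between skipping an unreadable page and returning
    immediately can be sketched as follows. This is a simplified userspace
    model with hypothetical names (enum page_state, dir_is_empty()); it is
    not the nilfs2 code and does not model folios or directory entries.

```c
#include <stddef.h>

/* Hypothetical per-page outcome of reading and scanning a directory page. */
enum page_state { PAGE_READ_ERROR, PAGE_NO_ENTRIES, PAGE_HAS_ENTRIES };

/*
 * Return 1 if the directory is judged empty, 0 otherwise.
 * bail_on_error == 0 models skipping unreadable pages,
 * bail_on_error != 0 models returning 0 as soon as a page cannot be read.
 */
static int dir_is_empty(const enum page_state *pages, size_t npages,
                        int bail_on_error)
{
    size_t i;

    for (i = 0; i < npages; i++) {
        if (pages[i] == PAGE_READ_ERROR) {
            if (bail_on_error)
                return 0;  /* cannot prove emptiness, so report "not empty" */
            continue;      /* silently skip the unreadable page */
        }
        if (pages[i] == PAGE_HAS_ENTRIES)
            return 0;
    }
    return 1;
}
```

    In this model, a directory whose only populated page is unreadable is
    reported as empty by the skipping variant, and the loop still visits
    every remaining page, however many a corrupted size implies.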
    
    Link: https://lkml.kernel.org/r/20240604134255.7165-1-konishi.ryusuke@gmail.com
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Reported-by: syzbot+c8166c541d3971bf6c87@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=c8166c541d3971bf6c87
    Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
    Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nilfs2: fix potential kernel bug due to lack of writeback flag waiting [+ + +]

Author: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Date:   Thu May 30 23:15:56 2024 +0900

    nilfs2: fix potential kernel bug due to lack of writeback flag waiting
    
    commit a4ca369ca221bb7e06c725792ac107f0e48e82e7 upstream.
    
    Destructive writes to a block device on which nilfs2 is mounted can cause
    a kernel bug in the folio/page writeback start routine or writeback end
    routine (__folio_start_writeback in the log below):
    
     kernel BUG at mm/page-writeback.c:3070!
     Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
     ...
     RIP: 0010:__folio_start_writeback+0xbaa/0x10e0
     Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff
      e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f>
      0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00
     ...
     Call Trace:
      <TASK>
      nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2]
      nilfs_segctor_construct+0x181/0x6b0 [nilfs2]
      nilfs_segctor_thread+0x548/0x11c0 [nilfs2]
      kthread+0x2f0/0x390
      ret_from_fork+0x4b/0x80
      ret_from_fork_asm+0x1a/0x30
      </TASK>
    
    This is because when the log writer starts a writeback for segment summary
    blocks or a super root block that use the backing device's page cache, it
    does not wait for the ongoing folio/page writeback, resulting in an
    inconsistent writeback state.
    
    Fix this issue by waiting for ongoing writebacks when putting
    folios/pages on the backing device into writeback state.
    
    Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com
    Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

parisc: Define HAVE_ARCH_HUGETLB_UNMAPPED_AREA [+ + +]

Author: Helge Deller <deller@gmx.de>
Date:   Wed May 15 14:53:25 2024 +0200

    parisc: Define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
    
    commit d4a599910193b85f76c100e30d8551c8794f8c2a upstream.
    
    Define the HAVE_ARCH_HUGETLB_UNMAPPED_AREA macro like other platforms do in
    their page.h files to avoid this compile warning:
    arch/parisc/mm/hugetlbpage.c:25:1: warning: no previous prototype for 'hugetlb_get_unmapped_area' [-Wmissing-prototypes]
    
    Signed-off-by: Helge Deller <deller@gmx.de>
    Cc: stable@vger.kernel.org  # 6.0+
    Reported-by: John David Anglin <dave.anglin@bell.net>
    Tested-by: John David Anglin <dave.anglin@bell.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

parisc: Define sigset_t in parisc uapi header [+ + +]

Author: Helge Deller <deller@kernel.org>
Date:   Sat Apr 27 19:43:51 2024 +0200

    parisc: Define sigset_t in parisc uapi header
    
    commit 487fa28fa8b60417642ac58e8beda6e2509d18f9 upstream.
    
    The util-linux debian package fails to build on parisc, because
    sigset_t isn't defined in asm/signal.h when included from userspace.
    Move the sigset_t type from internal header to the uapi header to fix the
    build.
    
    Link: https://buildd.debian.org/status/fetch.php?pkg=util-linux&arch=hppa&ver=2.40-7&stamp=1714163443&raw=0
    Signed-off-by: Helge Deller <deller@gmx.de>
    Cc: stable@vger.kernel.org # v6.0+
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

platform/chrome: cros_ec: Handle events during suspend after resume completion [+ + +]

Author: Karthikeyan Ramasubramanian <kramasub@chromium.org>
Date:   Mon Apr 29 12:13:45 2024 -0600

    platform/chrome: cros_ec: Handle events during suspend after resume completion
    
    commit 2fbe479c0024e1c6b992184a799055e19932aa48 upstream.
    
    Commit 47ea0ddb1f56 ("platform/chrome: cros_ec_lpc: Separate host
    command and irq disable") re-ordered the resume sequence. Before that
    change, cros_ec resume sequence is:
    1) Enable IRQ
    2) Send resume event
    3) Handle events during suspend
    
    After commit 47ea0ddb1f56 ("platform/chrome: cros_ec_lpc: Separate host
    command and irq disable"), cros_ec resume sequence is:
    1) Enable IRQ
    2) Handle events during suspend
    3) Send resume event.
    
    This re-ordering leads to delayed handling of any events queued between
    items 2) and 3) with the updated sequence. Also, on certain platforms, the
    EC skips triggering an interrupt for certain events (e.g. mkbp events)
    until the resume event is received. Such events are stuck in the host
    event queue indefinitely. This change puts back the original order to
    avoid any delay in handling the pending events.
    
    Fixes: 47ea0ddb1f56 ("platform/chrome: cros_ec_lpc: Separate host command and irq disable")
    Cc: <stable@vger.kernel.org>
    Cc: Lalith Rajendran <lalithkraj@chromium.org>
    Cc: <chrome-platform@lists.linux.dev>
    Signed-off-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>
    Link: https://lore.kernel.org/r/20240429121343.v2.1.If2e0cef959f1f6df9f4d1ab53a97c54aa54208af@changeid
    Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/64/bpf: fix tail calls for PCREL addressing [+ + +]

Author: Hari Bathini <hbathini@linux.ibm.com>
Date:   Thu May 2 23:02:04 2024 +0530

    powerpc/64/bpf: fix tail calls for PCREL addressing
    
    commit 2ecfe59cd7de1f202e9af2516a61fbbf93d0bd4d upstream.
    
    With PCREL addressing, there is no kernel TOC, so it is not set up in
    the prologue when PCREL addressing is used. But the number of
    instructions to skip on a tail call was not adjusted accordingly. That
    resulted in not-so-obvious failures while using tail calls. The
    'tailcalls' selftest crashed the system with the below call trace:
    
      bpf_test_run+0xe8/0x3cc (unreliable)
      bpf_prog_test_run_skb+0x348/0x778
      __sys_bpf+0xb04/0x2b00
      sys_bpf+0x28/0x38
      system_call_exception+0x168/0x340
      system_call_vectored_common+0x15c/0x2ec
    
    Also, as bpf programs are always at module addresses and a bpf helper
    is in general at a core kernel text address, using PC relative addressing
    often fails with an "out of range of pcrel address" error. Switch to
    using kernel base for relative addressing to handle this better.
    
    Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL addresing")
    Cc: stable@vger.kernel.org # v6.4+
    Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20240502173205.142794-1-hbathini@linux.ibm.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH [+ + +]

Author: Puranjay Mohan <puranjay@kernel.org>
Date:   Mon May 13 10:02:48 2024 +0000

    powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
    
    commit b1e7cee96127468c2483cf10c2899c9b5cf79bf8 upstream.
    
    The Linux Kernel Memory Model [1][2] requires RMW operations that have a
    return value to be fully ordered.
    
    BPF atomic operations with BPF_FETCH (including BPF_XCHG and
    BPF_CMPXCHG) return a value back so they need to be JITed to fully
    ordered operations. POWERPC currently emits relaxed operations for
    these.
    
    We can show this by running the following litmus-test:
    
      PPC SB+atomic_add+fetch
    
      {
          0:r0=x;  (* dst reg assuming offset is 0 *)
          0:r1=2;  (* src reg *)
          0:r2=1;
          0:r4=y;  (* P0 writes to this, P1 reads this *)
          0:r5=z;  (* P1 writes to this, P0 reads this *)
          0:r6=0;
    
          1:r2=1;
          1:r4=y;
          1:r5=z;
      }
    
      P0                      | P1            ;
      stw         r2, 0(r4)   | stw  r2,0(r5) ;
                              |               ;
      loop:lwarx  r3, r6, r0  |               ;
      mr          r8, r3      |               ;
      add         r3, r3, r1  | sync          ;
      stwcx.      r3, r6, r0  |               ;
      bne         loop        |               ;
      mr          r1, r8      |               ;
                              |               ;
      lwa         r7, 0(r5)   | lwa  r7,0(r4) ;
    
      ~exists(0:r7=0 /\ 1:r7=0)
    
      Witnesses
      Positive: 9 Negative: 3
      Condition ~exists (0:r7=0 /\ 1:r7=0)
      Observation SB+atomic_add+fetch Sometimes 3 9
    
    This test shows that the older store in P0 is reordered with a newer
    load to a different address, even though there is an RMW operation with
    fetch between them. Adding a sync before and after the RMW fixes the issue:
    
      Witnesses
      Positive: 9 Negative: 0
      Condition ~exists (0:r7=0 /\ 1:r7=0)
      Observation SB+atomic_add+fetch Never 0 9
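
The litmus test above has the classic store-buffering shape. The following is a minimal sketch of that shape in Python, under a deliberately simplified toy model (it is not herd7 and not the Power memory model): P1 always keeps its store before its load because of the explicit sync, while P0 keeps that order only when its RMW is treated as fully ordered. The function name `sb_outcomes` and the flag `p0_fully_ordered` are made up for this illustration.

```python
from itertools import permutations


def sb_outcomes(p0_fully_ordered):
    """Return the set of reachable (P0 r7, P1 r7) pairs in a toy SB model.

    P0: store y=1; <RMW>; load z        P1: store z=1; sync; load y
    An execution is any total order of the four memory events. P1's store
    always precedes its load (it has a sync). P0's store precedes its load
    only if the RMW between them is treated as fully ordered.
    """
    p0_store, p0_load = ("P0", "store", "y"), ("P0", "load", "z")
    p1_store, p1_load = ("P1", "store", "z"), ("P1", "load", "y")
    outcomes = set()
    for order in permutations([p0_store, p0_load, p1_store, p1_load]):
        if order.index(p1_store) > order.index(p1_load):
            continue  # forbidden by the sync in P1
        if p0_fully_ordered and order.index(p0_store) > order.index(p0_load):
            continue  # forbidden once the RMW acts as a full barrier
        memory = {"y": 0, "z": 0}
        regs = {}
        for thread, kind, var in order:
            if kind == "store":
                memory[var] = 1
            else:
                regs[thread] = memory[var]
        outcomes.add((regs["P0"], regs["P1"]))
    return outcomes


if __name__ == "__main__":
    print("relaxed RMW      :", sorted(sb_outcomes(False)))
    print("fully ordered RMW:", sorted(sb_outcomes(True)))
```

In this toy model the pair (0, 0), which corresponds to `0:r7=0 /\ 1:r7=0`, is reachable only when P0's store may pass its later load, which mirrors the "Sometimes" versus "Never" observations quoted above.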
    
    [1] https://www.kernel.org/doc/Documentation/memory-barriers.txt
    [2] https://www.kernel.org/doc/Documentation/atomic_t.txt
    
    Fixes: aea7ef8a82c0 ("powerpc/bpf/32: add support for BPF_ATOMIC bitwise operations")
    Fixes: 2d9206b22743 ("powerpc/bpf/32: Add instructions for atomic_[cmp]xchg")
    Fixes: dbe6e2456fb0 ("powerpc/bpf/64: add support for atomic fetch operations")
    Fixes: 1e82dfaa7819 ("powerpc/bpf/64: Add instructions for atomic_[cmp]xchg")
    Cc: stable@vger.kernel.org # v6.0+
    Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
    Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Reviewed-by: Naveen N Rao <naveen@kernel.org>
    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20240513100248.110535-1-puranjay@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation [+ + +]

Author: Tyler Hicks (Microsoft) <code@tyhicks.com>
Date:   Tue Apr 30 19:56:46 2024 -0500

    proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation
    
    commit 0a960ba49869ebe8ff859d000351504dd6b93b68 upstream.
    
    The following commits loosened the permissions of /proc/<PID>/fdinfo/
    directory, as well as the files within it, from 0500 to 0555 while also
    introducing a PTRACE_MODE_READ check between the current task and
    <PID>'s task:
    
     - commit 7bc3fa0172a4 ("procfs: allow reading fdinfo with PTRACE_MODE_READ")
     - commit 1927e498aee1 ("procfs: prevent unprivileged processes accessing fdinfo dir")
    
    Before those changes, inode based system calls like inotify_add_watch(2)
    would fail when the current task didn't have sufficient read permissions:
    
     [...]
     lstat("/proc/1/task/1/fdinfo", {st_mode=S_IFDIR|0500, st_size=0, ...}) = 0
     inotify_add_watch(64, "/proc/1/task/1/fdinfo",
                       IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|
                       IN_ONLYDIR|IN_DONT_FOLLOW|IN_EXCL_UNLINK) = -1 EACCES (Permission denied)
     [...]
    
    This matches the documented behavior in the inotify_add_watch(2) man
    page:
    
     ERRORS
           EACCES Read access to the given file is not permitted.
    
    After those changes, inotify_add_watch(2) started succeeding despite the
    current task not having PTRACE_MODE_READ privileges on the target task:
    
     [...]
     lstat("/proc/1/task/1/fdinfo", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
     inotify_add_watch(64, "/proc/1/task/1/fdinfo",
                       IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|
                       IN_ONLYDIR|IN_DONT_FOLLOW|IN_EXCL_UNLINK) = 1757
     openat(AT_FDCWD, "/proc/1/task/1/fdinfo",
            O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 EACCES (Permission denied)
     [...]
    
    This change in behavior broke .NET prior to v7. See the github link
    below for the v7 commit that inadvertently/quietly (?) fixed .NET after
    the kernel changes mentioned above.
    
    Return to the old behavior by moving the PTRACE_MODE_READ check out of
    the file .open operation and into the inode .permission operation:
    
     [...]
     lstat("/proc/1/task/1/fdinfo", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
     inotify_add_watch(64, "/proc/1/task/1/fdinfo",
                       IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|
                       IN_ONLYDIR|IN_DONT_FOLLOW|IN_EXCL_UNLINK) = -1 EACCES (Permission denied)
     [...]
    
    Reported-by: Kevin Parsons (Microsoft) <parsonskev@gmail.com>
    Link: https://github.com/dotnet/runtime/commit/89e5469ac591b82d38510fe7de98346cce74ad4f
    Link: https://stackoverflow.com/questions/75379065/start-self-contained-net6-build-exe-as-service-on-raspbian-system-unauthorizeda
    Fixes: 7bc3fa0172a4 ("procfs: allow reading fdinfo with PTRACE_MODE_READ")
    Cc: stable@vger.kernel.org
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Christian König <christian.koenig@amd.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Kalesh Singh <kaleshsingh@google.com>
    Cc: Hardik Garg <hargar@linux.microsoft.com>
    Cc: Allen Pais <apais@linux.microsoft.com>
    Signed-off-by: Tyler Hicks (Microsoft) <code@tyhicks.com>
    Link: https://lore.kernel.org/r/20240501005646.745089-1-code@tyhicks.com
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices" [+ + +]

Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon May 20 14:41:31 2024 -0400

    Revert "drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices"
    
    commit dd2b75fd9a79bf418e088656822af06fc253dbe3 upstream.
    
    This reverts commit 28ebbb4981cb1fad12e0b1227dbecc88810b1ee8.
    
    Revert this commit as apparently the LLVM code to take advantage of
    this never landed.
    
    Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: Feifei Xu <feifei.xu@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event" [+ + +]

Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Tue Jun 4 11:00:22 2024 -0300

    Revert "perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event"
    
    commit 5b3cde198878b2f3269d5e7efbc0d514899b1fd8 upstream.
    
    This reverts commit 7d1405c71df21f6c394b8a885aa8a133f749fa22.
    
    This causes segfaults in some cases, as reported by Milian:
    
      ```
      sudo /usr/bin/perf record -z --call-graph dwarf -e cycles -e
      raw_syscalls:sys_enter ls
      ...
      [ perf record: Woken up 3 times to write data ]
      malloc(): invalid next size (unsorted)
      Aborted
      ```
    
      Backtrace with GDB + debuginfod:
    
      ```
      malloc(): invalid next size (unsorted)
    
      Thread 1 "perf" received signal SIGABRT, Aborted.
      __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6,
      no_tid=no_tid@entry=0) at pthread_kill.c:44
      Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c
      44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO
      (ret) : 0;
      (gdb) bt
      #0  __pthread_kill_implementation (threadid=<optimized out>,
      signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
      #1  0x00007ffff6ea8eb3 in __pthread_kill_internal (threadid=<optimized out>,
      signo=6) at pthread_kill.c:78
      #2  0x00007ffff6e50a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/
      raise.c:26
      #3  0x00007ffff6e384c3 in __GI_abort () at abort.c:79
      #4  0x00007ffff6e39354 in __libc_message_impl (fmt=fmt@entry=0x7ffff6fc22ea
      "%s\n") at ../sysdeps/posix/libc_fatal.c:132
      #5  0x00007ffff6eb3085 in malloc_printerr (str=str@entry=0x7ffff6fc5850
      "malloc(): invalid next size (unsorted)") at malloc.c:5772
      #6  0x00007ffff6eb657c in _int_malloc (av=av@entry=0x7ffff6ff6ac0
      <main_arena>, bytes=bytes@entry=368) at malloc.c:4081
      #7  0x00007ffff6eb877e in __libc_calloc (n=<optimized out>,
      elem_size=<optimized out>) at malloc.c:3754
      #8  0x000055555569bdb6 in perf_session.do_write_header ()
      #9  0x00005555555a373a in __cmd_record.constprop.0 ()
      #10 0x00005555555a6846 in cmd_record ()
      #11 0x000055555564db7f in run_builtin ()
      #12 0x000055555558ed77 in main ()
      ```
    
      Valgrind memcheck:
      ```
      ==45136== Invalid write of size 8
      ==45136==    at 0x2B38A5: perf_event__synthesize_id_sample (in /usr/bin/perf)
      ==45136==    by 0x157069: __cmd_record.constprop.0 (in /usr/bin/perf)
      ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
      ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
      ==45136==    by 0x142D76: main (in /usr/bin/perf)
      ==45136==  Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd
      ==45136==    at 0x4849BF3: calloc (vg_replace_malloc.c:1675)
      ==45136==    by 0x3574AB: zalloc (in /usr/bin/perf)
      ==45136==    by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf)
      ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
      ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
      ==45136==    by 0x142D76: main (in /usr/bin/perf)
      ==45136==
      ==45136== Syscall param write(buf) points to unaddressable byte(s)
      ==45136==    at 0x575953D: __libc_write (write.c:26)
      ==45136==    by 0x575953D: write (write.c:24)
      ==45136==    by 0x35761F: ion (in /usr/bin/perf)
      ==45136==    by 0x357778: writen (in /usr/bin/perf)
      ==45136==    by 0x1548F7: record__write (in /usr/bin/perf)
      ==45136==    by 0x15708A: __cmd_record.constprop.0 (in /usr/bin/perf)
      ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
      ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
      ==45136==    by 0x142D76: main (in /usr/bin/perf)
      ==45136==  Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd
      ==45136==    at 0x4849BF3: calloc (vg_replace_malloc.c:1675)
      ==45136==    by 0x3574AB: zalloc (in /usr/bin/perf)
      ==45136==    by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf)
      ==45136==    by 0x15A845: cmd_record (in /usr/bin/perf)
      ==45136==    by 0x201B7E: run_builtin (in /usr/bin/perf)
      ==45136==    by 0x142D76: main (in /usr/bin/perf)
      ==45136==
      ```
    
    Closes: https://lore.kernel.org/linux-perf-users/23879991.0LEYPuXRzz@milian-workstation/
    Reported-by: Milian Wolff <milian.wolff@kdab.com>
    Tested-by: Milian Wolff <milian.wolff@kdab.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Kan Liang <kan.liang@linux.intel.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: stable@kernel.org # 6.8+
    Link: https://lore.kernel.org/lkml/Zl9ksOlHJHnKM70p@x1
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "xsk: Document ability to redirect to any socket bound to the same umem" [+ + +]

Author: Magnus Karlsson <magnus.karlsson@intel.com>
Date:   Tue Jun 4 14:29:26 2024 +0200

    Revert "xsk: Document ability to redirect to any socket bound to the same umem"
    
    commit 03e38d315f3c5258270ad50f2ae784b6372e87c3 upstream.
    
    This reverts commit 968595a93669b6b4f6d1fcf80cf2d97956b6868f.
    
    Reported-by: Yuval El-Hanany <YuvalE@radware.com>
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/xdp-newbies/8100DBDC-0B7C-49DB-9995-6027F6E63147@radware.com
    Link: https://lore.kernel.org/bpf/20240604122927.29080-3-magnus.karlsson@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "xsk: Support redirect to any socket bound to the same umem" [+ + +]

Author: Magnus Karlsson <magnus.karlsson@intel.com>
Date:   Tue Jun 4 14:29:25 2024 +0200

    Revert "xsk: Support redirect to any socket bound to the same umem"
    
    commit 7fcf26b315bbb728036da0862de6b335da83dff2 upstream.
    
    This reverts commit 2863d665ea41282379f108e4da6c8a2366ba66db.
    
    This patch introduced a potential kernel crash when multiple napi instances
    redirect to the same AF_XDP socket. By removing the queue_index check, it is
    possible for multiple napi instances to access the Rx ring at the same time,
    which will result in a corrupted ring state which can lead to a crash when
    flushing the rings in __xsk_flush(). This can happen when the linked list of
    sockets to flush gets corrupted by concurrent accesses. A quick and small fix
    is not possible, so let us revert this for now.
    
    Reported-by: Yuval El-Hanany <YuvalE@radware.com>
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/xdp-newbies/8100DBDC-0B7C-49DB-9995-6027F6E63147@radware.com
    Link: https://lore.kernel.org/bpf/20240604122927.29080-2-magnus.karlsson@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

riscv: dts: starfive: Remove PMIC interrupt info for Visionfive 2 board [+ + +]

Author: Shengyu Qu <wiagn233@outlook.com>
Date:   Thu Mar 7 20:21:12 2024 +0800

    riscv: dts: starfive: Remove PMIC interrupt info for Visionfive 2 board
    
    commit 0f74c64f0a9f6e1e7cf17bea3d4350fa6581e0d7 upstream.
    
    Interrupt line number of the AXP15060 PMIC is not a necessary part of
    its device tree. Originally the binding required one, so the dts patch
    added an invalid interrupt that the driver ignored (0) as the interrupt
    line of the PMIC is not actually connected on this platform. This went
    unnoticed during review as it would have been a valid interrupt for a
    GPIO controller, but it is not for the PLIC. The PLIC, on this platform
    at least, silently ignores the enablement of interrupt 0. Bo Gan is
    running a modified version of OpenSBI that faults if writes are done to
    reserved fields, so their kernel runs into problems.
    
    Delete the invalid interrupt from the device tree.
    
    Cc: stable@vger.kernel.org
    Reported-by: Bo Gan <ganboing@gmail.com>
    Link: https://lore.kernel.org/all/c8b6e960-2459-130f-e4e4-7c9c2ebaa6d3@gmail.com/
    Signed-off-by: Shengyu Qu <wiagn233@outlook.com>
    Fixes: 2378341504de ("riscv: dts: starfive: Enable axp15060 pmic for cpufreq")
    [conor: rewrite the commit message to add more detail]
    Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

riscv: enable HAVE_ARCH_HUGE_VMAP for XIP kernel [+ + +]

Author: Nam Cao <namcao@linutronix.de>
Date:   Sun May 26 13:01:04 2024 +0200

    riscv: enable HAVE_ARCH_HUGE_VMAP for XIP kernel
    
    commit 7bed51617401dab2be930b13ed5aacf581f7c8ef upstream.
    
    HAVE_ARCH_HUGE_VMAP also works on XIP kernel, so remove its dependency on
    !XIP_KERNEL.
    
    This also fixes a boot problem for XIP kernel introduced by the commit in
    "Fixes:". That commit used huge page mapping for vmemmap, but huge page
    vmap was not enabled for XIP kernel.
    
    Fixes: ff172d4818ad ("riscv: Use hugepage mappings for vmemmap")
    Signed-off-by: Nam Cao <namcao@linutronix.de>
    Cc: <stable@vger.kernel.org>
    Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Link: https://lore.kernel.org/r/20240526110104.470429-1-namcao@linutronix.de
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtla/timerlat: Fix histogram report when a cpu count is 0 [+ + +]

Author: John Kacur <jkacur@redhat.com>
Date:   Fri May 10 15:03:18 2024 -0400

    rtla/timerlat: Fix histogram report when a cpu count is 0
    
    commit 01b05fc0e5f3aec443a9a8ffa0022cbca2fd3608 upstream.
    
    On short runs it is possible to get no samples on a cpu, like this:
    
      # rtla timerlat hist -u -T50
    
      Index   IRQ-001   Thr-001   Usr-001   IRQ-002   Thr-002   Usr-002
      2             1         0         0         0         0         0
      33            0         1         0         0         0         0
      36            0         0         1         0         0         0
      49            0         0         0         1         0         0
      52            0         0         0         0         1         0
      over:         0         0         0         0         0         0
      count:        1         1         1         1         1         0
      min:          2        33        36        49        52 18446744073709551615
      avg:          2        33        36        49        52         -
      max:          2        33        36        49        52         0
      rtla timerlat hit stop tracing
        IRQ handler delay:          (exit from idle)            48.21 us (91.09 %)
        IRQ latency:                                                    49.11 us
        Timerlat IRQ duration:                                   2.17 us (4.09 %)
        Blocking thread:                                         1.01 us (1.90 %)
                           swapper/2:0                           1.01 us
      ------------------------------------------------------------------------
        Thread latency:                                         52.93 us (100%)
    
      Max timerlat IRQ latency from idle: 49.11 us in cpu 2
    
    Note, the value 18446744073709551615 is the same as ~0.
    
    Fix this by reporting no results for the min, avg and max if the count
    is 0.
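
As an illustration of the reporting rule described above, here is a minimal Python sketch. It is not the rtla source: the function `summary_cells` and its parameters are hypothetical, and it only assumes what the message states, namely that the minimum starts out as ~0 (all ones in 64 bits) and that a column with no samples should print no min, avg or max.

```python
UINT64_MAX = 2**64 - 1  # ~0 as an unsigned 64-bit value: the "no sample yet" minimum


def summary_cells(count, min_value, total, max_value):
    """Build the count/min/avg/max cells for one histogram column."""
    if count == 0:
        # No samples: min still holds its sentinel and avg would divide by
        # zero, so print a placeholder for all three statistics.
        return {"count": "0", "min": "-", "avg": "-", "max": "-"}
    return {
        "count": str(count),
        "min": str(min_value),
        "avg": str(total // count),
        "max": str(max_value),
    }


if __name__ == "__main__":
    print(summary_cells(1, 52, 52, 52))          # a column with one sample
    print(summary_cells(0, UINT64_MAX, 0, 0))    # a column with no samples
```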
    
    Link: https://lkml.kernel.org/r/20240510190318.44295-1-jkacur@redhat.com
    
    Cc: stable@vger.kernel.org
    Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode")
    Suggested-by: Daniel Bristot de Oliveria <bristot@kernel.org>
    Signed-off-by: John Kacur <jkacur@redhat.com>
    Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

s390/ap: Fix crash in AP internal function modify_bitmap() [+ + +]

Author: Harald Freudenberger <freude@linux.ibm.com>
Date:   Mon May 13 14:49:13 2024 +0200

    s390/ap: Fix crash in AP internal function modify_bitmap()
    
    commit d4f9d5a99a3fd1b1c691b7a1a6f8f3f25f4116c9 upstream.
    
    A system crash like this
    
      Failing address: 200000cb7df6f000 TEID: 200000cb7df6f403
      Fault in home space mode while using kernel ASCE.
      AS:00000002d71bc007 R3:00000003fe5b8007 S:000000011a446000 P:000000015660c13d
      Oops: 0038 ilc:3 [#1] PREEMPT SMP
      Modules linked in: mlx5_ib ...
      CPU: 8 PID: 7556 Comm: bash Not tainted 6.9.0-rc7 #8
      Hardware name: IBM 3931 A01 704 (LPAR)
      Krnl PSW : 0704e00180000000 0000014b75e7b606 (ap_parse_bitmap_str+0x10e/0x1f8)
      R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
      Krnl GPRS: 0000000000000001 ffffffffffffffc0 0000000000000001 00000048f96b75d3
      000000cb00000100 ffffffffffffffff ffffffffffffffff 000000cb7df6fce0
      000000cb7df6fce0 00000000ffffffff 000000000000002b 00000048ffffffff
      000003ff9b2dbc80 200000cb7df6fcd8 0000014bffffffc0 000000cb7df6fbc8
      Krnl Code: 0000014b75e7b5fc: a7840047            brc     8,0000014b75e7b68a
      0000014b75e7b600: 18b2                lr      %r11,%r2
      #0000014b75e7b602: a7f4000a            brc     15,0000014b75e7b616
      >0000014b75e7b606: eb22d00000e6        laog    %r2,%r2,0(%r13)
      0000014b75e7b60c: a7680001            lhi     %r6,1
      0000014b75e7b610: 187b                lr      %r7,%r11
      0000014b75e7b612: 84960021            brxh    %r9,%r6,0000014b75e7b654
      0000014b75e7b616: 18e9                lr      %r14,%r9
      Call Trace:
      [<0000014b75e7b606>] ap_parse_bitmap_str+0x10e/0x1f8
      ([<0000014b75e7b5dc>] ap_parse_bitmap_str+0xe4/0x1f8)
      [<0000014b75e7b758>] apmask_store+0x68/0x140
      [<0000014b75679196>] kernfs_fop_write_iter+0x14e/0x1e8
      [<0000014b75598524>] vfs_write+0x1b4/0x448
      [<0000014b7559894c>] ksys_write+0x74/0x100
      [<0000014b7618a440>] __do_syscall+0x268/0x328
      [<0000014b761a3558>] system_call+0x70/0x98
      INFO: lockdep is turned off.
      Last Breaking-Event-Address:
      [<0000014b75e7b636>] ap_parse_bitmap_str+0x13e/0x1f8
      Kernel panic - not syncing: Fatal exception: panic_on_oops
    
    occurred when /sys/bus/ap/a[pq]mask was updated with a relative mask value
    (like +0x10-0x12,+60,-90) with one of the numeric values exceeding INT_MAX.
    
    The fix is simple: use unsigned long values for the internal variables. The
    correct checks are already in place in the function but a simple int for
    the internal variables was used with the possibility to overflow.
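
The effect of keeping a parsed number in a plain int can be sketched in Python with `ctypes`, which wraps values the way a C integer of that width does. This is only a model of the failure mode, not the kernel function: the helper names and the `value < bits` form of the range check are assumptions made for the example.

```python
import ctypes


def check_with_int(value, bits):
    """Range-check a parsed number after storing it in a C 'int'."""
    as_int = ctypes.c_int(value).value  # wraps like a 32-bit signed int
    return as_int < bits, as_int


def check_with_ulong(value, bits):
    """The same range check with the number kept in an 'unsigned long'."""
    as_ulong = ctypes.c_ulong(value).value
    return as_ulong < bits, as_ulong


if __name__ == "__main__":
    too_big = 0x80000010  # just above INT_MAX
    print(check_with_int(too_big, 256))    # check passes with a negative index
    print(check_with_ulong(too_big, 256))  # check rejects the value
```

With the int variant the wrapped value is negative, so an upper-bound check accepts it and the value could then be used as a bit index; with the unsigned variant the same check rejects it.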
    
    Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
    Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
    Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
    Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

s390/cpacf: Make use of invalid opcode produce a link error [+ + +]

Author: Harald Freudenberger <freude@linux.ibm.com>
Date:   Tue May 14 10:09:32 2024 +0200

    s390/cpacf: Make use of invalid opcode produce a link error
    
    commit 32e8bd6423fc127d2b37bdcf804fd76af3bbec79 upstream.
    
    Instead of calling BUG() at runtime, introduce and use a prototype for a
    non-existing function to produce a link error at build time when an
    unsupported opcode is used with the __cpacf_query() or
    __cpacf_check_opcode() inline functions.
    
    Suggested-by: Heiko Carstens <hca@linux.ibm.com>
    Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
    Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
    Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

s390/cpacf: Split and rework cpacf query functions [+ + +]

Author: Harald Freudenberger <freude@linux.ibm.com>
Date:   Fri May 3 11:31:42 2024 +0200

    s390/cpacf: Split and rework cpacf query functions
    
    commit 830999bd7e72f4128b9dfa37090d9fa8120ce323 upstream.
    
    Rework the cpacf query functions to use the correct RRE
    or RRF instruction formats and set register fields within
    instructions correctly.
    
    Fixes: 1afd43e0fbba ("s390/crypto: allow to query all known cpacf functions")
    Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
    Suggested-by: Heiko Carstens <hca@linux.ibm.com>
    Suggested-by: Juergen Christ <jchrist@linux.ibm.com>
    Suggested-by: Holger Dengler <dengler@linux.ibm.com>
    Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
    Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
    Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: core: Handle devices which return an unusually large VPD page count [+ + +]

Author: Martin K. Petersen <martin.petersen@oracle.com>
Date:   Mon May 20 22:30:40 2024 -0400

    scsi: core: Handle devices which return an unusually large VPD page count
    
    commit d09c05aa35909adb7d29f92f0cd79fdcd1338ef0 upstream.
    
    Peter Schneider reported that a system would no longer boot after
    updating to 6.8.4.  Peter bisected the issue and identified commit
    b5fc07a5fb56 ("scsi: core: Consult supported VPD page list prior to
    fetching page") as being the culprit.
    
    Turns out the enclosure device in Peter's system reports a byteswapped
    page length for VPD page 0. It reports "02 00" as page length instead
    of "00 02". This causes us to attempt to access 516 bytes (page length
    + header) of information despite only 2 pages being present.
    
    Limit the page search scope to the size of our VPD buffer to guard
    against devices returning a larger page count than requested.
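
The arithmetic behind the 516-byte figure, and the clamping idea, can be sketched in Python as follows. This is an illustrative model rather than the SCSI core code: the helper names are hypothetical, and it assumes only the layout implied above, a 4-byte page header whose last two bytes hold a big-endian page length.

```python
import struct

VPD_HEADER_LEN = 4  # bytes 2..3 of the header hold the big-endian page length


def vpd_result_len(page):
    """Total length the device claims: page length plus header."""
    (page_len,) = struct.unpack_from(">H", page, 2)
    return page_len + VPD_HEADER_LEN


def listed_pages(page, buf_len):
    """Return the page codes to search, clamped to the buffer we own.

    Python slicing would stop at the end of 'page' anyway; the explicit
    min() stands in for the bound a C loop has to apply itself.
    """
    end = min(vpd_result_len(page), buf_len)
    return list(page[VPD_HEADER_LEN:end])


if __name__ == "__main__":
    correct = bytes([0x00, 0x00, 0x00, 0x02, 0x00, 0x80])
    swapped = bytes([0x00, 0x00, 0x02, 0x00, 0x00, 0x80])
    print(vpd_result_len(correct), vpd_result_len(swapped))
    print(listed_pages(swapped, len(swapped)))
```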
    
    Link: https://lore.kernel.org/r/20240521023040.2703884-1-martin.petersen@oracle.com
    Fixes: b5fc07a5fb56 ("scsi: core: Consult supported VPD page list prior to fetching page")
    Cc: stable@vger.kernel.org
    Reported-by: Peter Schneider <pschneider1968@googlemail.com>
    Closes: https://lore.kernel.org/all/eec6ebbf-061b-4a7b-96dc-ea748aa4d035@googlemail.com/
    Tested-by: Peter Schneider <pschneider1968@googlemail.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests/mm: compaction_test: fix bogus test success on Aarch64 [+ + +]

Author: Dev Jain <dev.jain@arm.com>
Date:   Tue May 21 13:13:56 2024 +0530

    selftests/mm: compaction_test: fix bogus test success on Aarch64
    
    commit d4202e66a4b1fe6968f17f9f09bbc30d08f028a1 upstream.
    
    Patch series "Fixes for compaction_test", v2.
    
    The compaction_test memory selftest introduces fragmentation in memory
    and then tries to allocate as many hugepages as possible. This series
    addresses some problems.
    
    On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
    compaction_index becomes 0, which is less than 3, due to no division by
    zero exception being raised. We fix that by checking for division by
    zero.
    
    Secondly, correctly set the number of hugepages to zero before trying
    to set a large number of them.
    
    Now, consider a situation in which, at the start of the test, a non-zero
    number of hugepages have been already set (while running the entire
    selftests/mm suite, or manually by the admin). The test operates on 80%
    of memory to avoid OOM-killer invocation, and because some memory is
    already blocked by hugepages, it would increase the chance of OOM-killing.
    Also, since mem_free used in check_compaction() is the value before we
    set nr_hugepages to zero, the chance that the compaction_index will
    be small is very high if the preset nr_hugepages was high, leading to a
    bogus test success.
    
    
    This patch (of 3):
    
    Currently, if at runtime we are not able to allocate a huge page, the test
    will trivially pass on Aarch64 due to no exception being raised on
    division by zero while computing compaction_index.  Fix that by checking
    for nr_hugepages == 0.  Anyways, in general, avoid a division by zero by
    exiting the program beforehand.  While at it, fix a typo, and handle the
    case where the number of hugepages may overflow an integer.
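
A small Python model may help show why the test passed trivially. AArch64 integer division by zero yields 0 instead of trapping, so the computed index falls below the pass threshold. The function names and the exact formula below are assumptions for illustration, not the selftest source.

```python
def aarch64_style_div(numerator, denominator):
    """Model of AArch64 integer division: dividing by zero yields 0, no trap."""
    return 0 if denominator == 0 else numerator // denominator


def compaction_index(mem_free_kb, hugepage_size_kb, nr_hugepages):
    """Compute the index, refusing to continue when no huge page was allocated."""
    if nr_hugepages == 0:
        raise RuntimeError("no huge pages allocated; compaction_index is undefined")
    return mem_free_kb // (nr_hugepages * hugepage_size_kb)


if __name__ == "__main__":
    # Without the check, a zero divisor silently produces a "passing" index.
    print(aarch64_style_div(8_000_000, 0 * 2048))
    # With the check, the zero case is reported instead of passing.
    try:
        compaction_index(8_000_000, 2048, 0)
    except RuntimeError as err:
        print("test aborted:", err)
```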
    
    Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
    Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
    Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
    Signed-off-by: Dev Jain <dev.jain@arm.com>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Sri Jayaramappa <sjayaram@akamai.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages [+ + +]

Author: Dev Jain <dev.jain@arm.com>
Date:   Tue May 21 13:13:57 2024 +0530

    selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
    
    commit 9ad665ef55eaad1ead1406a58a34f615a7c18b5e upstream.
    
    Currently, the test tries to set nr_hugepages to zero, but that is not
    actually done because the file offset is not reset after read().  Fix that
    using lseek().
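
The offset behaviour can be reproduced on an ordinary temporary file, as in the hedged Python sketch below. It only demonstrates that read() advances the file offset and that a following write() lands there unless the offset is reset; how the real nr_hugepages file treats a write at a non-zero offset is not modelled. The helper names are hypothetical.

```python
import os
import tempfile


def rewrite_value(path, new_value, rewind):
    """Read the current value, then write a new one through the same fd."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.read(fd, 64)                    # reading advances the file offset
        if rewind:
            os.lseek(fd, 0, os.SEEK_SET)   # go back to the start before writing
        os.write(fd, new_value)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Return the file contents after writing with and without the rewind."""
    results = {}
    for rewind in (False, True):
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            tmp.write(b"5\n")
        try:
            results[rewind] = rewrite_value(tmp.name, b"0\n", rewind)
        finally:
            os.unlink(tmp.name)
    return results


if __name__ == "__main__":
    print(demo())
```

Without the rewind the new value is written after the old one instead of replacing it.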
    
    Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
    Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
    Signed-off-by: Dev Jain <dev.jain@arm.com>
    Cc: <stable@vger.kernel.org>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Sri Jayaramappa <sjayaram@akamai.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests/mm: fix build warnings on ppc64 [+ + +]

Author: Michael Ellerman <mpe@ellerman.id.au>
Date:   Tue May 21 13:02:19 2024 +1000

    selftests/mm: fix build warnings on ppc64
    
    commit 1901472fa880e5706f90926cd85a268d2d16bf84 upstream.
    
    Fix warnings like:
    
      In file included from uffd-unit-tests.c:8:
      uffd-unit-tests.c: In function `uffd_poison_handle_fault':
      uffd-common.h:45:33: warning: format `%llu' expects argument of type
      `long long unsigned int', but argument 3 has type `__u64' {aka `long
      unsigned int'} [-Wformat=]
    
    By switching to unsigned long long for u64 for ppc64 builds.
    
    Link: https://lkml.kernel.org/r/20240521030219.57439-1-mpe@ellerman.id.au
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Shuah Khan <skhan@linuxfoundation.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: net: lib: avoid error removing empty netns name [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Wed Jun 5 11:21:17 2024 +0200

    selftests: net: lib: avoid error removing empty netns name
    
    commit 79322174bcc780b99795cb89d237b26006a8b94b upstream.
    
    If creating the first netns with 'setup_ns()' fails, 'cleanup_ns()'
    will be called with an empty string as its first parameter.
    
    The consequence is that 'cleanup_ns()' will try to delete an invalid
    netns, and wait 20 seconds if the netns list is empty.
    
    Instead of just checking if the name is not empty, convert the string
    separated by spaces to an array. Manipulating the array is cleaner, and
    calling 'cleanup_ns()' with an empty array will be a no-op.
    
    Fixes: 25ae948b4478 ("selftests/net: add lib.sh")
    Cc: stable@vger.kernel.org
    Acked-by: Geliang Tang <geliang@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-2-b3afadd368c9@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: net: lib: support errexit with busywait [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Wed Jun 5 11:21:16 2024 +0200

    selftests: net: lib: support errexit with busywait
    
    commit 41b02ea4c0adfcc6761fbfed42c3ce6b6412d881 upstream.
    
    If errexit is enabled ('set -e'), loopy_wait -- or busywait and others
    using it -- will stop after the first failure.
    
    Note that if the returned status of loopy_wait is checked, and even if
    errexit is enabled, Bash will not stop at the first error.
    
    Fixes: 25ae948b4478 ("selftests/net: add lib.sh")
    Cc: stable@vger.kernel.org
    Acked-by: Geliang Tang <geliang@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-1-b3afadd368c9@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: fix deadlock in smb2_find_smb_tcon() [+ + +]

Author: Enzo Matsumiya <ematsumiya@suse.de>
Date:   Thu Jun 6 13:13:13 2024 -0300

    smb: client: fix deadlock in smb2_find_smb_tcon()
    
    commit 02c418774f76a0a36a6195c9dbf8971eb4130a15 upstream.
    
    Unlock cifs_tcp_ses_lock before calling cifs_put_smb_ses() to avoid the
    deadlock.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
    Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
    Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

soc: qcom: rpmh-rsc: Enhance check for VRM in-flight request [+ + +]

Author: Maulik Shah <quic_mkshah@quicinc.com>
Date:   Thu Feb 15 10:55:44 2024 +0530

    soc: qcom: rpmh-rsc: Enhance check for VRM in-flight request
    
    commit f592cc5794747b81e53b53dd6e80219ee25f0611 upstream.
    
    Each RPMh VRM accelerator resource has 3 or 4 contiguous 4-byte aligned
    addresses associated with it. These control voltage, enable state, mode,
    and in legacy targets, voltage headroom. The current in-flight request
    checking logic looks for exact address matches. Requests for different
    addresses of the same RPMh resource are thus not detected as in-flight.
    
    Add new cmd-db API cmd_db_match_resource_addr() to enhance the in-flight
    request check for VRM requests by ignoring the address offset.
    
    This ensures that only one request is allowed to be in-flight for a given
    VRM resource. This is needed to avoid scenarios where request commands are
    carried out by RPMh hardware out-of-order leading to LDO regulator
    over-current protection triggering.
    
    Fixes: 658628e7ef78 ("drivers: qcom: rpmh-rsc: add RPMH controller for QCOM SoCs")
    Cc: stable@vger.kernel.org
    Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
    Tested-by: Elliot Berman <quic_eberman@quicinc.com> # sm8650-qrd
    Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com>
    Link: https://lore.kernel.org/r/20240215-rpmh-rsc-fixes-v4-1-9cbddfcba05b@quicinc.com
    Signed-off-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
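
    The offset-agnostic in-flight check described above can be sketched
    as follows. This is a simplified model, not the cmd-db code: it
    assumes that the 3 or 4 registers of one VRM resource share every
    address bit above the low 4, and all helper names and addresses are
    hypothetical.

```python
# Assumption for illustration: one VRM resource occupies a 16-byte aligned
# block, so clearing the low 4 bits yields the resource base address.
OFFSET_MASK = 0xF


def exact_match(addr_a: int, addr_b: int) -> bool:
    """Old-style check: only identical addresses count as the same request."""
    return addr_a == addr_b


def same_vrm_resource(addr_a: int, addr_b: int) -> bool:
    """Offset-agnostic check: compare resource bases, ignore the register offset."""
    return (addr_a & ~OFFSET_MASK) == (addr_b & ~OFFSET_MASK)


def is_in_flight(pending, addr, match) -> bool:
    """Return True when any pending request matches addr under the given rule."""
    return any(match(p, addr) for p in pending)


# Made-up addresses: two registers of one resource, plus a second resource.
VOLTAGE_REG = 0x41000
ENABLE_REG = 0x41004
OTHER_RESOURCE = 0x41010
```

    With the exact comparison, a pending voltage request does not block
    an enable request for the same regulator; with the offset-agnostic
    comparison it does, while requests for other resources are unaffected.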

sparc64: Fix number of online CPUs [+ + +]

Author: Sam Ravnborg <sam@ravnborg.org>
Date:   Sat Mar 30 10:57:45 2024 +0100

    sparc64: Fix number of online CPUs
    
    commit 98937707fea8375e8acea0aaa0b68a956dd52719 upstream.
    
    Nick Bowler reported:
        When using newer kernels on my Ultra 60 with dual 450MHz UltraSPARC-II
        CPUs, I noticed that only CPU 0 comes up, while older kernels (including
        4.7) are working fine with both CPUs.
    
          I bisected the failure to this commit:
    
          9b2f753ec23710aa32c0d837d2499db92fe9115b is the first bad commit
          commit 9b2f753ec23710aa32c0d837d2499db92fe9115b
          Author: Atish Patra <atish.patra@oracle.com>
          Date:   Thu Sep 15 14:54:40 2016 -0600
    
          sparc64: Fix cpu_possible_mask if nr_cpus is set
    
        This is a small change that reverts very easily on top of 5.18: there is
        just one trivial conflict.  Once reverted, both CPUs work again.
    
        Maybe this is related to the fact that the CPUs on this system are
        numbered CPU0 and CPU2 (there is no CPU1)?
    
    The current code that adjusts cpu_possible based on nr_cpu_ids does not
    take into account that CPU IDs may not be contiguous.
    Move the check to the function that sets up the cpu_possible mask
    so there is no need to adjust it later.
    
    Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
    Fixes: 9b2f753ec237 ("sparc64: Fix cpu_possible_mask if nr_cpus is set")
    Reported-by: Nick Bowler <nbowler@draconx.ca>
    Tested-by: Nick Bowler <nbowler@draconx.ca>
    Link: https://lore.kernel.org/sparclinux/20201009161924.c8f031c079dd852941307870@gmx.de/
    Link: https://lore.kernel.org/all/CADyTPEwt=ZNams+1bpMB1F9w_vUdPsGCt92DBQxxq_VtaLoTdw@mail.gmail.com/
    Cc: stable@vger.kernel.org # v4.8+
    Cc: Andreas Larsson <andreas@gaisler.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Atish Patra <atish.patra@oracle.com>
    Cc: Bob Picco <bob.picco@oracle.com>
    Cc: Vijay Kumar <vijay.ac.kumar@oracle.com>
    Reviewed-by: Andreas Larsson <andreas@gaisler.com>
    Acked-by: Arnd Bergmann <arnd@arndb.de>
    Link: https://lore.kernel.org/r/20240330-sparc64-warnings-v1-9-37201023ee2f@ravnborg.org
    Signed-off-by: Andreas Larsson <andreas@gaisler.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
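
    Why non-contiguous CPU IDs matter can be illustrated with a small
    model. It is an assumption about how such a failure can arise, not
    the sparc64 code: the "by count" variant marks the first N IDs as
    possible, while the "at probe" variant decides per probed CPU using
    its real ID. Both function names are hypothetical.

```python
def possible_by_count(present_ids, nr_cpu_ids):
    """Simplified late adjustment: mark IDs 0..N-1 as possible, where N is
    the number of CPUs found. This silently assumes contiguous IDs."""
    n = min(len(present_ids), nr_cpu_ids)
    return set(range(n))


def possible_at_probe(present_ids, nr_cpu_ids):
    """Simplified probe-time check: accept each probed CPU under its real ID
    until nr_cpu_ids CPUs have been accepted."""
    possible = set()
    for cpu in present_ids:
        if len(possible) < nr_cpu_ids:
            possible.add(cpu)
    return possible


# The reported machine numbers its two CPUs 0 and 2 (there is no CPU1).
ULTRA60_IDS = [0, 2]
```

    In this model the count-based variant yields {0, 1} for the reported
    machine, so CPU 2 can never come online, whereas the probe-time
    variant yields {0, 2}.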

sparc: move struct termio to asm/termios.h [+ + +]

Author: Mike Gilbert <floppym@gentoo.org>
Date:   Wed Mar 6 12:11:47 2024 -0500

    sparc: move struct termio to asm/termios.h
    
    commit c32d18e7942d7589b62e301eb426b32623366565 upstream.
    
    Every other arch declares struct termio in asm/termios.h, so make sparc
    match them.
    
    Resolves a build failure in the PPP software package, which includes
    both bits/ioctl-types.h via sys/ioctl.h (glibc) and asm/termbits.h.
    
    Closes: https://bugs.gentoo.org/918992
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>
    Cc: stable@vger.kernel.org
    Reviewed-by: Andreas Larsson <andreas@gaisler.com>
    Tested-by: Andreas Larsson <andreas@gaisler.com>
    Link: https://lore.kernel.org/r/20240306171149.3843481-1-floppym@gentoo.org
    Signed-off-by: Andreas Larsson <andreas@gaisler.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

thermal/drivers/qcom/lmh: Check for SCM availability at probe [+ + +]

Author: Konrad Dybcio <konrad.dybcio@linaro.org>
Date:   Sat Mar 9 14:15:03 2024 +0100

    thermal/drivers/qcom/lmh: Check for SCM availability at probe
    
    commit d9d3490c48df572edefc0b64655259eefdcbb9be upstream.
    
    Up until now, the necessary scm availability check has not been
    performed, leading to possible null pointer dereferences (which did
    happen for me on RB1).
    
    Fix that.
    
    Fixes: 53bca371cdf7 ("thermal/drivers/qcom: Add support for LMh driver")
    Cc: <stable@vger.kernel.org>
    Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
    Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Link: https://lore.kernel.org/r/20240308-topic-rb1_lmh-v2-2-bac3914b0fe3@linaro.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tpm_tis: Do *not* flush uninitialized work [+ + +]

Author: Jan Beulich <jbeulich@suse.com>
Date:   Wed May 29 15:23:25 2024 +0300

    tpm_tis: Do *not* flush uninitialized work
    
    commit 0ea00e249ca992adee54dc71a526ee70ef109e40 upstream.
    
    tpm_tis_core_init() may fail before tpm_tis_probe_irq_single() is
    called, in which case tpm_tis_remove() unconditionally calling
    flush_work() is triggering a warning for .func still being NULL.
    
    Cc: stable@vger.kernel.org # v6.5+
    Fixes: 481c2d14627d ("tpm,tpm_tis: Disable interrupts after 1000 unhandled IRQs")
    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracefs: Clear EVENT_INODE flag in tracefs_drop_inode() [+ + +]

Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Thu May 23 01:14:29 2024 -0400

    tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()
    
    commit 0bcfd9aa4dafa03b88d68bf66b694df2a3e76cf3 upstream.
    
    When the inode is being dropped from the dentry, the TRACEFS_EVENT_INODE
    flag needs to be cleared to prevent a remount from calling
    eventfs_remount() on the tracefs_inode private data. There's a race
    window between when the inode is dropped (and the dentry freed) and when
    the inode is actually freed. If a remount happens between the two, the
    eventfs_inode could be accessed after it is freed (only the dentry keeps
    a ref count on it).
    
    Currently the TRACEFS_EVENT_INODE flag is cleared from the dentry iput()
    function. But this is incorrect, as it is possible that the inode has
    another reference to it. The flag should only be cleared when the inode is
    really being dropped and has no more references. That happens in the
    drop_inode callback of the inode, as that gets called when the last
    reference of the inode is released.
    
    Remove the tracefs_d_iput() function and move its logic to the more
    appropriate tracefs_drop_inode() callback function.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.908205106@goodmis.org
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Masahiro Yamada <masahiroy@kernel.org>
    Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

vxlan: Fix regression when dropping packets due to invalid src addresses [+ + +]

Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Mon Jun 3 10:59:26 2024 +0200

    vxlan: Fix regression when dropping packets due to invalid src addresses
    
    commit 1cd4bc987abb2823836cbb8f887026011ccddc8a upstream.
    
    Commit f58f45c1e5b9 ("vxlan: drop packets from invalid src-address")
    has recently been added to vxlan mainly in the context of source
    address snooping/learning so that when it is enabled, an entry in the
    FDB is not being created for an invalid address for the corresponding
    tunnel endpoint.
    
    Before commit f58f45c1e5b9, vxlan behaved like geneve in that it
    passed through whichever MACs were set in the L2 header. It turns out
    that this change in behavior breaks setups. For example, Cilium with
    netkit in L3 mode for Pods, combined with tunnel mode, was passing
    before the change in f58f45c1e5b9 for both vxlan and geneve. After
    that change it only passes for geneve: with vxlan, packets are dropped
    because vxlan_set_mac() returns false when the source and destination
    MACs are zero, which is totally fine for E/W traffic via a tunnel.
    
    Fix it by only opting into the is_valid_ether_addr() check in
    vxlan_set_mac() when source address snooping/learning is
    actually enabled in vxlan. This is done by moving the check into
    vxlan_snoop(). With this change, the Cilium connectivity test suite
    passes again for both tunnel flavors.
    
    Fixes: f58f45c1e5b9 ("vxlan: drop packets from invalid src-address")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Cc: David Bauer <mail@david-bauer.net>
    Cc: Ido Schimmel <idosch@nvidia.com>
    Cc: Nikolay Aleksandrov <razor@blackwall.org>
    Cc: Martin KaFai Lau <martin.lau@kernel.org>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: David Bauer <mail@david-bauer.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
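
    The behavioral difference can be sketched with a simplified model.
    The validity rule follows the kernel's is_valid_ether_addr() (neither
    all-zero nor multicast); the two accept_* helpers are hypothetical
    and ignore every other check performed on the receive path.

```python
def is_valid_ether_addr(mac: bytes) -> bool:
    """Valid means: not the all-zero address and not a multicast address."""
    is_zero = mac == bytes(6)
    is_multicast = bool(mac[0] & 0x01)  # group bit of the first octet
    return not is_zero and not is_multicast


def accept_before_fix(src_mac: bytes, learning: bool) -> bool:
    """Regressed behavior: the source MAC is validated for every packet."""
    return is_valid_ether_addr(src_mac)


def accept_after_fix(src_mac: bytes, learning: bool) -> bool:
    """Fixed behavior: validate only when source address learning is enabled."""
    if learning:
        return is_valid_ether_addr(src_mac)
    return True


ZERO_MAC = bytes(6)
UNICAST_MAC = bytes.fromhex("020000000001")
```

    A zero source MAC is dropped in both learning and non-learning mode
    before the fix, but only in learning mode afterwards, which matches
    the L3 tunnel case described in the message.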

watchdog: rti_wdt: Set min_hw_heartbeat_ms to accommodate a safety margin [+ + +]

Author: Judith Mendez <jm@ti.com>
Date:   Wed Apr 17 15:57:00 2024 -0500

    watchdog: rti_wdt: Set min_hw_heartbeat_ms to accommodate a safety margin
    
    commit cae58516534e110f4a8558d48aa4435e15519121 upstream.
    
    On AM62x, the watchdog is pet before the valid window is open. Fix
    min_hw_heartbeat and accommodate a 2% + static offset safety margin.
    The static offset accounts for max hardware error.
    
    Remove the hack in the driver which shifts the open window boundary,
    since it is no longer necessary due to the fix mentioned above.
    
    cc: stable@vger.kernel.org
    Fixes: 5527483f8f7c ("watchdog: rti-wdt: attach to running watchdog during probe")
    Signed-off-by: Judith Mendez <jm@ti.com>
    Reviewed-by: Guenter Roeck <linux@roeck-us.net>
    Link: https://lore.kernel.org/r/20240417205700.3947408-1-jm@ti.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
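
    The margin arithmetic can be illustrated as follows. This is one
    reading of "2% + static offset", not the driver's code: the 2% is
    taken relative to the full timeout and added to the closed part of
    the window. The function name and the example values (60 s timeout,
    50% open window, 250 ms offset) are assumptions.

```python
def min_hw_heartbeat_ms(timeout_ms: int, open_window_pct: int,
                        static_offset_ms: int) -> int:
    """Earliest time after a pet at which the next pet is considered safe."""
    # Time during which the window is still closed.
    closed_ms = timeout_ms * (100 - open_window_pct) // 100
    # 2% safety margin, taken relative to the full timeout (assumption).
    margin_ms = timeout_ms * 2 // 100
    # Static offset covering the maximum hardware error.
    return closed_ms + margin_ms + static_offset_ms


example_ms = min_hw_heartbeat_ms(60_000, 50, 250)
```

    For the example values this gives 30000 + 1200 + 250 = 31450 ms, so
    the watchdog core would not pet the hardware earlier than that after
    the previous pet.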

wifi: ath10k: fix QCOM_RPROC_COMMON dependency [+ + +]

Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Fri May 17 10:00:28 2024 +0300

    wifi: ath10k: fix QCOM_RPROC_COMMON dependency
    
    commit 21ae74e1bf18331ae5e279bd96304b3630828009 upstream.
    
    If ath10k_snoc is built-in, while Qualcomm remoteprocs are built as
    modules, compilation fails with:
    
    /usr/bin/aarch64-linux-gnu-ld: drivers/net/wireless/ath/ath10k/snoc.o: in function `ath10k_modem_init':
    drivers/net/wireless/ath/ath10k/snoc.c:1534: undefined reference to `qcom_register_ssr_notifier'
    /usr/bin/aarch64-linux-gnu-ld: drivers/net/wireless/ath/ath10k/snoc.o: in function `ath10k_modem_deinit':
    drivers/net/wireless/ath/ath10k/snoc.c:1551: undefined reference to `qcom_unregister_ssr_notifier'
    
    Add corresponding dependency to ATH10K_SNOC Kconfig entry so that it's
    built as module if QCOM_RPROC_COMMON is built as module too.
    
    Fixes: 747ff7d3d742 ("ath10k: Don't always treat modem stop events as crashes")
    Cc: stable@vger.kernel.org
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
    Link: https://msgid.link/20240511-ath10k-snoc-dep-v1-1-9666e3af5c27@linaro.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: rtl8xxxu: enable MFP support with security flag of RX descriptor [+ + +]

Author: Martin Kaistra <martin.kaistra@linutronix.de>
Date:   Thu Apr 18 09:18:13 2024 +0200

    wifi: rtl8xxxu: enable MFP support with security flag of RX descriptor
    
    commit cbfbb4ddbc8503478e0a138f9a31f61686cc5f11 upstream.
    
    In order to connect to networks which require 802.11w, add the
    MFP_CAPABLE flag and let mac80211 do the actual crypto in software.
    
    When a robust management frame is received, rx_dec->swdec is not set,
    even though the HW did not decrypt it. Extend the check and don't set
    RX_FLAG_DECRYPTED for these frames in order to use SW decryption.
    
    Use the security flag in the RX descriptor for this purpose, like it is
    done in the rtw88 driver.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Martin Kaistra <martin.kaistra@linutronix.de>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://msgid.link/20240418071813.1883174-3-martin.kaistra@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: rtl8xxxu: Fix the TX power of RTL8192CU, RTL8723AU [+ + +]

Author: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Date:   Mon Apr 15 23:59:05 2024 +0300

    wifi: rtl8xxxu: Fix the TX power of RTL8192CU, RTL8723AU
    
    commit 08b5d052d17a89bb8706b2888277d0b682dc1610 upstream.
    
    Don't subtract 1 from the power index. This was added in commit
    2fc0b8e5a17d ("rtl8xxxu: Add TX power base values for gen1 parts")
    for unknown reasons. The vendor drivers don't do this.
    
    Also correct the calculations of values written to
    REG_OFDM0_X{C,D}_TX_IQ_IMBALANCE. According to the vendor driver,
    these are used for TX power training.
    
    With these changes rtl8xxxu sets the TX power of RTL8192CU the same
    as the vendor driver.
    
    None of this appears to have any effect on my RTL8192CU device.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
    Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://msgid.link/6ae5945b-644e-45e4-a78f-4c7d9c987910@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: rtlwifi: rtl8192de: Fix 5 GHz TX power [+ + +]

Author: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Date:   Thu Apr 25 21:09:21 2024 +0300

    wifi: rtlwifi: rtl8192de: Fix 5 GHz TX power
    
    commit de4d4be4fa64ed7b4aa1c613061015bd8fa98b24 upstream.
    
    Different channels have different TX power settings. rtl8192de is using
    the TX power setting from the wrong channel in the 5 GHz band because
    _rtl92c_phy_get_rightchnlplace expects an array which includes all the
    channel numbers, but it's using an array which includes only the 5 GHz
    channel numbers.
    
    Use the array channel_all (defined in rtl8192de/phy.c) instead of
    the incorrect channel5g (defined in core.c).
    
    Tested only with rtl8192du, which will use the same TX power code.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://msgid.link/c7653517-cf88-4f57-b79a-8edb0a8b32f0@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
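
    The effect of passing the wrong array can be shown with a small
    model. The lookup below is a hedged reconstruction: it assumes the
    helper starts scanning at index 14 because it expects the first 14
    entries to be the 2.4 GHz channels. Both tables are abbreviated and
    hypothetical, not the driver's arrays.

```python
# Abbreviated, hypothetical tables for illustration only.
CHANNEL_5G_ONLY = list(range(36, 66, 2))            # 36, 38, ..., 64
CHANNEL_ALL = list(range(1, 15)) + CHANNEL_5G_ONLY  # 1..14 followed by 5 GHz


def right_chnl_place(chnl: int, table) -> int:
    """Return the 1-based place of chnl, assuming table lists all channels."""
    place = chnl  # 2.4 GHz channels map to themselves
    if chnl > 14:
        # Skip the 14 entries assumed to hold the 2.4 GHz channels.
        for i in range(14, len(table)):
            if table[i] == chnl:
                place = i + 1
                break
    return place
```

    With the full table, channel 36 lands at place 15. With the 5 GHz
    only table, channel 36 is never found and channel 64 lands at place
    15 instead, so a TX power setting would be read for the wrong
    channel.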

wifi: rtlwifi: rtl8192de: Fix endianness issue in RX path [+ + +]

Author: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Date:   Thu Apr 25 21:13:12 2024 +0300

    wifi: rtlwifi: rtl8192de: Fix endianness issue in RX path
    
    commit 2f228d364da95ab58f63a3fedc00d5b2b7db16ab upstream.
    
    Structs rx_desc_92d and rx_fwinfo_92d will not work for big endian
    systems.
    
    Delete rx_desc_92d because it's big and barely used, and instead use
    the get_rx_desc_rxmcs and get_rx_desc_rxht functions, which work on big
    endian systems too.
    
    Fix rx_fwinfo_92d by duplicating four of its members in the correct
    order.
    
    Tested only with RTL8192DU, which will use the same code.
    Tested only on a little endian system.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://msgid.link/698463da-5ef1-40c7-b744-fa51ad847caf@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: rtlwifi: rtl8192de: Fix low speed with WPA3-SAE [+ + +]

Author: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Date:   Thu Apr 25 21:12:38 2024 +0300

    wifi: rtlwifi: rtl8192de: Fix low speed with WPA3-SAE
    
    commit a7c0f48410f546772ac94a0f7b7291a15c4fc173 upstream.
    
    Some (all?) management frames are incorrectly reported to mac80211 as
    decrypted when actually the hardware did not decrypt them. This results
    in speeds 3-5 times lower than expected, 20-30 Mbps instead of 100
    Mbps.
    
    Fix this by checking the encryption type field of the RX descriptor.
    rtw88 does the same thing.
    
    This fix was tested only with rtl8192du, which will use the same code.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://msgid.link/4d600435-f0ea-46b0-bdb4-e60f173da8dd@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: rtw89: correct aSIFSTime for 6GHz band [+ + +]

Author: Ping-Ke Shih <pkshih@realtek.com>
Date:   Tue Apr 30 10:05:15 2024 +0800

    wifi: rtw89: correct aSIFSTime for 6GHz band
    
    commit f506e3ee547669cd96842e03c8a772aa7df721fa upstream.
    
    aSIFSTime is 10us for 2GHz band and 16us for 5GHz and 6GHz bands.
    The original code did not consider the 6GHz band and used the wrong
    value for it, so correct it accordingly.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://msgid.link/20240430020515.8399-1-pkshih@realtek.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
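
    The band-to-aSIFSTime mapping stated above, as a minimal sketch. The
    "before" variant is an assumption about how a 6GHz band could end up
    with the 2GHz value; the band strings and function names are
    hypothetical.

```python
def asifs_time_us_before(band: str) -> int:
    """Assumed old logic: only the 5GHz band is special-cased."""
    return 16 if band == "5GHz" else 10


def asifs_time_us(band: str) -> int:
    """Corrected logic: 10us for 2GHz, 16us for 5GHz and 6GHz."""
    return 10 if band == "2GHz" else 16
```

    Under this assumption the old logic returns 10us for 6GHz, while the
    corrected one returns 16us.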

wifi: rtw89: pci: correct TX resource checking for PCI DMA channel of firmware command [+ + +]

Author: Ping-Ke Shih <pkshih@realtek.com>
Date:   Wed Apr 10 09:13:16 2024 +0800

    wifi: rtw89: pci: correct TX resource checking for PCI DMA channel of firmware command
    
    commit c6330b129786e267b14129335a08fa7c331c308d upstream.
    
    The DMA channel of firmware command doesn't use TX WD (WiFi descriptor), so
    the number of TX WD need not be considered as a factor of TX resource.
    Otherwise, during pause state (a transient state to switch to/from low
    power mode) firmware commands could be dropped and the driver suddenly
    throws warnings:
    
       rtw89_8852ce 0000:04:00.0: no tx fwcmd resource
       rtw89_8852ce 0000:04:00.0: failed to send h2c
    
    The case we met is that the driver sends a firmware command carrying
    RSSI strength in the RX path, which can run concurrently with switching
    to low power mode. Missing this firmware command doesn't affect user
    experience, because the RSSI strength will be updated again after a
    while.
    
    The DMA descriptors of normal packets have three layers:
    
      +-------+
      | TX BD | (*n elements)
      +-------+
          |
          |   +-------+
          +-> | TX WD | (*m elements)
              +-------+
                  |
                  |   +--------+
                  +-> |   SKB  |
                      +--------+
    
    And, firmware command queue (TXCH 12) is a special queue that has only
    two layers:
    
      +-------+
      | TX BD | (*n elements)
      +-------+
          |
          |   +------------------+
          +-> | firmware command |
              +------------------+
    
    Fixes: 4a29213cd775 ("wifi: rtw89: pci: correct TX resource checking in low power mode")
    Cc: stable@vger.kernel.org
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://msgid.link/20240410011316.9906-1-pkshih@realtek.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
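
    The two queue layouts above imply different resource accounting,
    which can be sketched as follows. This is a simplified model with a
    hypothetical function name; only the channel number (TXCH 12) comes
    from the message.

```python
FWCMD_TXCH = 12  # firmware command queue, per the description above


def tx_resource(txch: int, free_bd: int, free_wd: int) -> int:
    """Number of frames that can still be queued on a TX channel."""
    if txch == FWCMD_TXCH:
        # Two-layer queue: TX BD points straight at the firmware command,
        # so free TX WD entries are irrelevant.
        return free_bd
    # Three-layer queue: each frame needs both a TX BD and a TX WD.
    return min(free_bd, free_wd)
```

    With no free TX WD, a normal channel reports zero resources, but the
    firmware command channel can still send as long as TX BD entries are
    free.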

x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue May 28 22:21:31 2024 +0200

    x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater
    
    commit 34bf6bae3286a58762711cfbce2cf74ecd42e1b5 upstream.
    
    The new AMD/HYGON topology parser evaluates the SMT information in CPUID leaf
    0x8000001e unconditionally while the original code restricted it to CPUs with
    family 0x17 and greater.
    
    This breaks family 0x15 CPUs which advertise that leaf and have a non-zero
    value in the SMT section. The machine boots, but the scheduler complains loudly
    about the mismatch of the core IDs:
    
      WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:6482 sched_cpu_starting+0x183/0x250
      WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2408 build_sched_domains+0x76b/0x12b0
    
    Add the condition back to cure it.
    
      [ bp: Make it actually build because grandpa is not concerned with
        trivial stuff. :-P ]
    
    Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser")
    Closes: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/56
    Reported-by: Tim Teichmann <teichmanntim@outlook.de>
    Reported-by: Christian Heusel <christian@heusel.eu>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Tested-by: Tim Teichmann <teichmanntim@outlook.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/7skhx6mwe4hxiul64v6azhlxnokheorksqsdbp7qw6g2jduf6c@7b5pvomauugk
    Signed-off-by: Christian Heusel <christian@heusel.eu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
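
    The restored condition can be sketched as a family gate in front of
    the SMT evaluation. This is a simplified model with a hypothetical
    function name; it assumes the leaf field encodes threads per core
    minus one, and it returns None when the leaf is not used for SMT.

```python
def smt_threads_from_leaf(family: int, threads_field: int):
    """Threads per core derived from CPUID leaf 0x8000001e, or None."""
    if family < 0x17:
        # Older families may advertise the leaf with a non-zero value in
        # this field, but it must not be interpreted as SMT information.
        return None
    # Assumption: the field holds "threads per core - 1".
    return threads_field + 1
```

    A family 0x15 CPU with a non-zero field is then ignored for SMT
    purposes, while family 0x17 and later are evaluated as before.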