diff options
author | Alex Williamson <alex.williamson@redhat.com> | 2023-04-13 13:40:42 -0600 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2023-05-11 23:03:29 +0900 |
commit | 08e9653bb9e88f4df1219857765b69a0874935b6 (patch) | |
tree | cc0a9aa007636baf295dd0534eb5362cfbbb704d /drivers | |
parent | 402299cca89273b62384b5f9645ea49cd5fc4a57 (diff) | |
download | linux-08e9653bb9e88f4df1219857765b69a0874935b6.tar.gz |
PCI/PM: Extend D3hot delay for NVIDIA HDA controllers
[ Upstream commit a5a6dd2624698b6e3045c3a1450874d8c790d5d9 ] Assignment of NVIDIA Ampere-based GPUs have seen a regression since the below referenced commit, where the reduced D3hot transition delay appears to introduce a small window where a D3hot->D0 transition followed by a bus reset can wedge the device. The entire device is subsequently unavailable, returning -1 on config space read and is unrecoverable without a host reset. This has been observed with RTX A2000 and A5000 GPU and audio functions assigned to a Windows VM, where shutdown of the VM places the devices in D3hot prior to vfio-pci performing a bus reset when userspace releases the devices. The issue has roughly a 2-3% chance of occurring per shutdown. Restoring the HDA controller d3hot_delay to the effective value before the below commit has been shown to resolve the issue. NVIDIA confirms this change should be safe for all of their HDA controllers. Fixes: 3e347969a577 ("PCI/PM: Reduce D3hot delay with usleep_range()") Link: https://lore.kernel.org/r/20230413194042.605768-1-alex.williamson@redhat.com Reported-by: Zhiyi Guo <zhguo@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Tarun Gupta <targupta@nvidia.com> Cc: Abhishek Sahu <abhsahu@nvidia.com> Cc: Tarun Gupta <targupta@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Diffstat (limited to 'drivers')
-rw-r--r-- | drivers/pci/quirks.c | 13 |
1 files changed, 13 insertions, 0 deletions
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 494fa46f5767..8d32a3834688 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -1940,6 +1940,19 @@ static void quirk_radeon_pm(struct pci_dev *dev) DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6741, quirk_radeon_pm); /* + * NVIDIA Ampere-based HDA controllers can wedge the whole device if a bus + * reset is performed too soon after transition to D0, extend d3hot_delay + * to previous effective default for all NVIDIA HDA controllers. + */ +static void quirk_nvidia_hda_pm(struct pci_dev *dev) +{ + quirk_d3hot_delay(dev, 20); +} +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, + PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, + quirk_nvidia_hda_pm); + +/* * Ryzen5/7 XHCI controllers fail upon resume from runtime suspend or s2idle. * https://bugzilla.kernel.org/show_bug.cgi?id=205587 * |