As we just did for context cache flushing, clean up the logic around
whether we need to flush the iotlb or just the write-buffer, depending
on caching mode.
Fix the same bug in qi_flush_iotlb() that qi_flush_context() had -- it
isn't supposed to be returning an error; it's supposed to be returning a
flag which triggers a write-buffer flush.
Remove some superfluous conditional write-buffer flushes which could
never have happened because they weren't for non-present-to-present
mapping changes anyway.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
It really doesn't make a lot of sense to have some of the logic to
handle caching vs. non-caching mode duplicated in qi_flush_context() and
__iommu_flush_context(), while the return value indicates whether the
caller should take other action which depends on the same thing.
Especially since qi_flush_context() thought it was returning something
entirely different anyway.
This patch makes qi_flush_context() and __iommu_flush_context() both
return void, removes the 'non_present_entry_flush' argument and makes
the only call site which _set_ that argument to 1 do the right thing.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The domain->id is a sequence number associated with the KVM guest
and should not be used for the context flush. This patch replaces
the domain->id with a proper id value for both bare metal and KVM.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Acked-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This updated patch should fix the compiling errors and remove the extern
iommu_pass_through from drivers/pci/intel-iommu.c file.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The patch adds kernel parameter intel_iommu=pt to set up pass through
mode in context mapping entry. This disables DMAR in linux kernel; but
KVM still runs on VT-d and interrupt remapping still works.
In this mode, kernel uses swiotlb for DMA API functions but other VT-d
functionalities are enabled for KVM. KVM always uses multi level
translation page table in VT-d. By default, pass though mode is disabled
in kernel.
This is useful when people don't want to enable VT-d DMAR in kernel but
still want to use KVM and interrupt remapping for reasons like DMAR
performance concern or debug purpose.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Weidong Han <weidong@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This issue was pointed out by Linus.
In dma_pte_clear_range() in intel-iommu.c
start = PAGE_ALIGN(start);
end &= PAGE_MASK;
npages = (end - start) / VTD_PAGE_SIZE;
In partial page case, start could be bigger than end and npages will be
negative.
Currently the issue doesn't show up as a real bug in because start and
end have been aligned to page boundary already by all callers. So the
issue has been hidden. But it is dangerous programming practice.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
It's possible for a device in the drhd->devices[] array to be NULL if
it wasn't found at boot time, which means we have to check for that
case.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We were comparing {bus,devfn} and assuming that a match meant it was the
same device. It doesn't -- the same {bus,devfn} can exist in
multiple PCI domains. Include domain number in device identification
(and call it 'segment' in most places, because there's already a lot of
references to 'domain' which means something else, and this code is
infected with ACPI thinking already).
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When the DMAR table identifies that a PCI-PCI bridge belongs to a given
IOMMU, that means that the bridge and all devices behind it should be
associated with the IOMMU. Not just the bridge itself.
This fixes the device_to_iommu() function accordingly.
(It's broken if you have the same PCI bus numbers in multiple domains,
but this function was always broken in that way; I'll be dealing with
that later).
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
interrupt remapping must be enabled before enabling x2apic, but
interrupt remapping doesn't depend on x2apic, it can be used
separately. Enable interrupt remapping in init_dmars even x2apic
is not supported.
[dwmw2: Update Kconfig accordingly, fix build with INTR_REMAP && !X2APIC]
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch implements the suspend and resume feature for Intel IOMMU
DMAR. It hooks to kernel suspend and resume interface. When suspend happens, it
saves necessary hardware registers. When resume happens, it restores the
registers and restarts IOMMU by enabling translation, setting up root entry, and
re-enabling queued invalidation.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* git://git.infradead.org/iommu-2.6:
intel-iommu: Fix address wrap on 32-bit kernel.
intel-iommu: Enable DMAR on 32-bit kernel.
intel-iommu: fix PCI device detach from virtual machine
intel-iommu: VT-d page table to support snooping control bit
iommu: Add domain_has_cap iommu_ops
intel-iommu: Snooping control support
Fixed trivial conflicts in arch/x86/Kconfig and drivers/pci/intel-iommu.c
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
PCI: fix HT MSI mapping fix
PCI: don't enable too much HT MSI mapping
x86/PCI: make pci=lastbus=255 work when acpi is on
PCI: save and restore PCIe 2.0 registers
PCI: update fakephp for bus_id removal
PCI: fix kernel oops on bridge removal
PCI: fix conflict between SR-IOV and config space sizing
powerpc/PCI: include pci.h in powerpc MSI implementation
PCI Hotplug: schedule fakephp for feature removal
PCI Hotplug: rename legacy_fakephp to fakephp
PCI Hotplug: restore fakephp interface with complete reimplementation
PCI: Introduce /sys/bus/pci/devices/.../rescan
PCI: Introduce /sys/bus/pci/devices/.../remove
PCI: Introduce /sys/bus/pci/rescan
PCI: Introduce pci_rescan_bus()
PCI: do not enable bridges more than once
PCI: do not initialize bridges more than once
PCI: always scan child buses
PCI: pci_scan_slot() returns newly found devices
PCI: don't scan existing devices
...
Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
* 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
dma-debug: make memory range checks more consistent
dma-debug: warn of unmapping an invalid dma address
dma-debug: fix dma_debug_add_bus() definition for !CONFIG_DMA_API_DEBUG
dma-debug/x86: register pci bus for dma-debug leak detection
dma-debug: add a check dma memory leaks
dma-debug: add checks for kernel text and rodata
dma-debug: print stacktrace of mapping path on unmap error
dma-debug: Documentation update
dma-debug: x86 architecture bindings
dma-debug: add function to dump dma mappings
dma-debug: add checks for sync_single_sg_*
dma-debug: add checks for sync_single_range_*
dma-debug: add checks for sync_single_*
dma-debug: add checking for [alloc|free]_coherent
dma-debug: add add checking for map/unmap_sg
dma-debug: add checking for map/unmap_page/single
dma-debug: add core checking functions
dma-debug: add debugfs interface
dma-debug: add kernel command line parameters
dma-debug: add initialization code
...
Fix trivial conflicts due to whitespace changes in arch/x86/kernel/pci-nommu.c
The problem is in dma_pte_clear_range and dma_pte_free_pagetable. When
intel_unmap_single and intel_unmap_sg call them, the end address may be
zero if the 'start_addr + size' rounds up. So no PTE gets cleared. The
uncleared PTE fires the BUG_ON when it's used again to create new mappings.
After I modified dma_pte_clear_range a bit, the BUG_ON is gone.
Tested both 32 and 32 PAE modes on Intel X58 and Q35 platforms.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
If we fix a few highmem-related thinkos and a couple of printk format
warnings, the Intel IOMMU driver works fine in a 32-bit kernel.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When assign a device behind conventional PCI bridge or PCIe to
PCI/PCI-x bridge to a domain, it must assign its bridge and may
also need to assign secondary interface to the same domain.
Dependent assignment is already there, but dependent
deassignment is missed when detach device from virtual machine.
This results in conventional PCI device assignment failure after
it has been assigned once. This patch addes dependent
deassignment, and fixes the issue.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The user can request to enable snooping control through VT-d page table.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This iommu_op can tell if domain have a specific capability, like snooping
control for Intel IOMMU, which can be used by other components of kernel to
adjust the behaviour.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Snooping control enabled IOMMU to guarantee DMA cache coherency and thus reduce
software effort (VMM) in maintaining effective memory type.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
According to kerneljanitors todo list all printk calls (beginning
a new line) should have an according KERN_* constant.
Those are the missing pieces here for the pci subsystem.
Signed-off-by: Frank Seidel <frank@f-seidel.de>
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Tested-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Impact: cleanup/sanitization
Start from a sane state while enabling dma and interrupt-remapping, by
clearing the previous recorded faults and disabling previously
enabled queued invalidation and interrupt-remapping.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: interface augmentation (not yet used)
Enable fault handling flow for intr-remapping aswell. Fault handling
code now shared by both dma-remapping and intr-remapping.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: code movement
Move page fault handling code to dmar.c
This will be shared both by DMA-remapping and Intr-remapping code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This is the cause of the DMA faults and disk corruption that people have
been seeing. Some chipsets neglect to report the RWBF "capability" --
the flag which says that we need to flush the chipset write-buffer when
changing the DMA page tables, to ensure that the change is visible to
the IOMMU.
Override that bit on the affected chipsets, and everything is happy
again.
Thanks to Chris and Bhavesh and others for helping to debug.
Should resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=479996http://bugzilla.kernel.org/show_bug.cgi?id=12578
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Tested-and-acked-by: Chris Wright <chrisw@sous-sol.org>
Reviewed-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Due to recurring issues with DMAR support on certain platforms.
There's a number of filesystem corruption incidents reported:
https://bugzilla.redhat.com/show_bug.cgi?id=479996http://bugzilla.kernel.org/show_bug.cgi?id=12578
Provide a Kconfig option to change whether it is enabled by
default.
If disabled, it can still be reenabled by passing intel_iommu=on to the
kernel. Keep the .config option off by default.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The dma ops unification enables X86 and IA64 to share intel_dma_ops so
we can make dma mapping functions static. This also remove unused
intel_map_single().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
dma_mapping_error is used to see if dma_map_single and dma_map_page
succeed. IA64 VT-d dma_mapping_error always says that dma_map_single
is successful even though it could fail. Note that X86 VT-d works
properly in this regard.
This patch fixes IA64 VT-d dma_mapping_error by adding VT-d's own
dma_mapping_error() that works for both X86_64 and IA64. VT-d uses
zero as an error dma address so VT-d's dma_mapping_error returns 1 if
a passed dma address is zero (as x86's VT-d dma_mapping_error does
now).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With some broken BIOSs when VT-d is enabled, the data structures are
filled incorrectly. This can cause a NULL pointer dereference in very
early boot.
Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
Acked-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This converts X86 and IA64 to use include/linux/dma-mapping.h.
It's a bit large but pretty boring. The major change for X86 is
converting 'int dir' to 'enum dma_data_direction dir' in DMA mapping
operations. The major changes for IA64 is using map_page and
unmap_page instead of map_single and unmap_single.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch converts dma_map_single and dma_unmap_single to use
map_page and unmap_page respectively and removes unnecessary
map_single and unmap_single in struct dma_mapping_ops.
This leaves intel-iommu's dma_map_single and dma_unmap_single since
IA64 uses them. They will be removed after the unification.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a preparation of struct dma_mapping_ops unification. We use
map_page and unmap_page instead of map_single and unmap_single.
This uses a temporary workaround, ifdef X86_64 to avoid IA64
build. The workaround will be removed after the unification. Well,
changing x86's struct dma_mapping_ops could break IA64. It's just
wrong. It's one of problems that this patchset fixes.
We will remove map_single and unmap_single hooks in the last patch in
this patchset.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When domain is related to multiple iommus, need to check if the minimum agaw is sufficient for the mapped memory
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
vm_domid won't be set in context, find available domain id for a device from its iommu.
For a virtual machine domain, a default agaw will be set, and skip top levels of page tables for iommu which has less agaw than default.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
virtual machine domain is different from native DMA-API domain, implement separate allocation and free functions for virtual machine domain.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Because virtual machine domain may have multiple devices from different iommus, it cannot use __iommu_flush_cache.
In some common low level functions, use domain_flush_cache instead of __iommu_flush_cache. On the other hand, in some functions, iommu can is specified or domain cannot be got, still use __iommu_flush_cache
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add iommu reference count in domain, and add a lock to protect iommu setting including iommu_bmp, iommu_count and iommu_coherency.
virtual machine domain may have multiple devices from different iommus, so it needs to do more things when add/remove domain device info. Thus implement separate these functions for virtual machine domain.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>