As we provide a list of all objects that will be accessed from the
batchbuffer, we can build a lut of the handles associated with those
objects for this invocation and use that to avoid the overhead of
looking up those objects again for every relocation.
The cost of building and searching a small hash table is much less than
that of acquiring a spinlock, searching a radix tree and manipulating an
atomic refcnt per relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Don't post a downclocking task if the device is still active when the
idle timer fires. A pathological process could queue up several seconds
worth of processing and then go to sleep, during which time the idle
timer would kick in and downclock the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the tail advances beyond the autoreport HEAD value, then we need to
fallback to an uncached read of the HEAD register in order to ascertain
the correct amount of remaining space in the ringbuffer.
Reported-by: Fang, Xun <xunx.fang@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32259
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The DisplayPort standard (1.1a) states that:
The I2C-over-AUX Reply field is valid only when Native AUX CH Reply
field is AUX_ACK (00). When Native AUX CH Reply field is not 00, then,
I2C-over-AUX Reply field must be 00 and be ignored.
This fixes broken EDID reading when using an active DisplayPort to
duallink DVI converter. If the AUX CH replier chooses to defer the
transaction, a short read occurs and erroneous data is returned as
the i2c reply due to a lack of length checking and failure to check
for AUX ACK.
As a result, broken EDIDs can look like:
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
00: bc bc bc ff bc bc bc ff bc bc bc ac bc bc bc 45 ???.???.???????E
10: bc bc bc 10 bc bc bc 34 bc bc bc ee bc bc bc 4c ???????4???????L
20: bc bc bc 50 bc bc bc 00 bc bc bc 40 bc bc bc 00 ???P???.???@???.
30: bc bc bc 01 bc bc bc 01 bc bc bc a0 bc bc bc 40 ???????????????@
40: bc bc bc 00 bc bc bc 00 bc bc bc 00 bc bc bc 55 ???.???.???.???U
50: bc bc bc 35 bc bc bc 31 bc bc bc 20 bc bc bc fc ???5???1??? ????
60: bc bc bc 4c bc bc bc 34 bc bc bc 46 bc bc bc 00 ???L???4???F???.
70: bc bc bc 38 bc bc bc 11 bc bc bc 20 bc bc bc 20 ???8??????? ???
80: bc bc bc ff bc bc bc ff bc bc bc ff bc bc bc ff ???.???.???.???.
...
which can lead to:
[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder
[drm:drm_edid_block_valid] *ERROR* Raw EDID:
<3>30 30 30 30 30 30 30 32 38 32 30 32 63 63 31 61 000000028202cc1a
<3>28 00 02 8c 00 00 00 00 18 00 00 00 00 00 00 00 (...............
<3>20 4c 61 73 74 20 62 65 61 63 6f 6e 3a 20 33 32 Last beacon: 32
<3>32 30 6d 73 20 61 67 6f 46 00 05 8c 00 00 00 00 20ms agoF.......
<3>36 00 00 00 00 00 00 00 00 0c 57 69 2d 46 69 20 6.........Wi-Fi
<3>52 6f 75 74 65 72 01 08 82 84 8b 96 24 30 48 6c Router......$0Hl
<3>03 01 01 06 02 00 00 2a 01 00 2f 01 00 32 04 0c .......*../..2..
<3>12 18 60 dd 09 00 10 18 02 00 00 01 00 00 18 00 ..`.............
Signed-off-by: David Flynn <davidf@rd.bbc.co.uk>
[ickle: fix up some surrounding checkpatch warnings]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
As we may try to power down the link at various times, it is not
necessarily still coupled with an encoder and so we must be careful not
to depend upon an operation that is only valid when the link is still
attached to a pipe.
Fixes regression in 5bddd17.
Reported-and-tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org [after applying 5bddd17]
In order for bos to retire eventually, a request must be sent down the
ring. This is expected, for example, by occlusion queries for which mesa
will wait upon (whilst running glean) before issuing more batches and so
the normal activity upon the ring is suspended and we need to emit a
request to clear the idle ring.
Reported-by: Jinjin, Wang <jinjin.wang@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30380
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I'm still seeing tiling corruption of PutImage and CopyArea (I think)
under mutter on pnv, so obviously the pipelining logic is deeply flawed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Userspace should not have been declaring that it needed fenced GPU
access with gen4+ as those GPUs have no fenced commands, but to be on
the safe side it is easier to ignore userspace in case they did.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The ability to save the hardware context upon powering down the render
clock through PWRCTXA is only available on a couple of gen4 chipsets.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The workaround is hideous and we are using the STORE_DWORD on all other
generations on all other rings, so use for the gen5 render ring as
well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Linus Torvalds pointed out that our code was unbalanced when powering on
the panel with respect to the power off sequence in that we were failing
to restore the panel-fitter. The consequence of this would be that
across a simple DPMS off/on for a non-native mode, without an intervening
modeset, the panel fitter would remain disabled and the output would shift
on the panel.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
There's not much we can do here but hope for the best. However the first
failure happens quite frequently and if often remedied by the second
attempt to reset HEAD. So only print the error if that attempt also
fails.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=19802
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Otherwise we can't really fix the abi-braindeadness of forcing
libva to manually wait for rendering when switching rings. Which
in turn makes implementing hw semaphores a pointless exercise
(at least for ironlake).
[Also added the relaxed fencing param to explain the jump in
numbering - relaxed fencing is in -next.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The bulk of the change is to convert the growing list of rings into an
array so that the relationship between the rings and the semaphore sync
registers can be easily computed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously we enabled this for gen4, only to have to revert it due to it
causing a large number of spurious wakeups. Try again hoping that the
hardware has become more sane in the mean time...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Magic numbers from the specs. This is supposed to allow the PLL some
variance to improve jitter performance and VCO headroom across
manufacturing and environmental variations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
... it's because setting the Pixel Multiply bits only takes effect once
the PLL is enabled and stable.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes the modesetting on the secondary panel of the Libretto W100 and
presumably many more Ironlake laptops with SDVO LVDS displays.
Reported-and-tested-by: Matthew Willoughby <mattfredwill@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Use the hardware DDA to calculate the ratio with as much accuracy as is
possible.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
If we leave the registers in a conflicting state then when we attempt
to teardown the active mode, we will not disable the pipes and planes
in the correct order -- leaving a plane reading from a disabled pipe and
possibly leading to undefined behaviour.
Reported-and-tested-by: Andy Whitcroft <apw@canonical.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32078
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
This workaround only applies to Ironlake.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
The pipe is always set to 8BPC, but here we were leaving whatever
previous bits were set by the BIOS in place.
Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
As the tracepoint is now decoupled from when the actual register is
assigned and was never complemented by detailing when the object lost
its fence, it has outlived its limited usefulness. Profiling the actual
stalls is a far more profitable venture anyway.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the userspace mappings are torn down on every GPU write, we prefer to
track when the buffer is activated (via a fresh i915_gem_fault). This
makes the LRU conceptually simpler. With coherent mappings, the
remaining use-case for set_domain_ioctl is GPU synchronisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
With this change, every batchbuffer can use all available fences (save
pinned and scanout, of course) without ever stalling the gpu!
In theory. Currently the actual pipelined update of the register is
disabled due to some stability issues. However, just the deferred update
is a significant win.
Based on a series of patches by Daniel Vetter.
The premise is that before every access to a buffer through the GTT we
have to declare whether we need a register or not. If the access is by
the GPU, a pipelined update to the register is made via the ringbuffer,
and we track the last seqno of the batches that access it. If by the
CPU we wait for the last GPU access and update the register (either
to clear or to set it for the current buffer).
One advantage of being able to pipeline changes is that we can defer the
actual updating of the fence register until we first need to access the
object through the GTT, i.e. we can eliminate the stall on set_tiling.
This is important as the userspace bo cache does not track the tiling
status of active buffers which generate frequent stalls on gen3 when
enabling tiling for an already bound buffer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This makes the various rings more consistent by removing the anomalous
handing of the rendering ring execbuffer dispatch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 869184a675.
This is required for the Sony Vaio Jesse was working on at the time, but
breaks most other eDP machines - machines that were working in earlier
kernels.
Reported-and-tested-by: Dave Airlie <airlied@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31188
Tested-by: Zhao Jian <jian.j.zhao@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
... otherwise the panel-fitter may be left enabled with random settings
and cause unintended filtering (i.e. blurring of native modes on external
panels).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31942
Reported-and-tested-by: Ben Kohler <bkohler@gmail.com>
Tested-by: Ciprian Docan <docan@eden.rutgers.edu>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
... so that upon first use after resume we will reacquire the fence reg.
Reported-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the error occurred on the current object, it means that its state was
not changed and so it should be excluded from the unwind.
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We don't track gpu flush request in any special way. So even with
obj->write_domain == 0, a gpu flush might be outstanding but no
yet executed. Even worse, the latest request might use the object
only for reading. So and unconditional call to object_wait_rendering
is needed for !pipelined.
Hence revert that patch fully and untangle the flushing from the
synchronization again.
Reported-by: Keith Packard <keithp@keithp.com>
Tested-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Besides the minimal improvement in reducing the execbuffer overhead, the
real benefit is clarifying a few routines.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A number of dragons have been seen lurking within the execbuffer code.
The first step is then to isolate them from the rest and begin to
scrutinise them in depth. Suggested by Daniel Vetter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Simply remove our accounting of objects inside the aperture, keeping
only track of what is in the aperture and its current usage. This
removes the over-complication of BUGs that were attempting to keep the
accounting correct and also removes the overhead of the accounting on
the hot-paths.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
With KMS, we can simply relinquish the fence when we idle the GPU and
reassign it upon first use.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Commit d09c23de intended to add a 30ms delay to give the ADD time to
detect any TVs connected. However, it used the sdvo->is_tv flag to do so
which is dependent upon the previous detection result and not whether the
output supports TVs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Avoid evicting buffers that will be used later in the batch in order to
make room for the initial buffers by pinning all bound buffers in a
single pass before binding (and evicting for) fresh buffer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On a few devices, like the Mac Mini, the CRT DDC pins are shared between
the analog connector and the digital connector. In this scenario, rely
on the EDID to determine if a digital panel is connected to the digital
connector.
Reported-and-tested-by: Tino Keitel <tino.keitel@tikei.de>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This used to check the precondition that all fences were to be located
in a mappable area, redundant now as those two parameters are combined
into one.
After pinning, we assert that the buffer is bound into the desired
region.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The pipe control object is allocated by the device for the sole use of the
render ringbuffer. Move this detail from the general code to the render
ring buffer initialisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Having seen the effects of erroneous fencing on the batchbuffer, a
useful sanity check is to record the fence registers at the time of an
error.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Combining map_and_fenceable revealed a bug in
i915_gem_object_gtt_size() in that it always computed the appropriate
fence size for the object regardless of tiling state which caused us to
over-allocate linear buffers when binding to the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This still uses the agp functions to actually reinstate the mappings
(with a gross hack to make agp cooperate), but it wires everything
up correctly for the switchover.
The call to agp_rebind_memory can be dropped because all non-kms drivers
do all their rebinding on EnterVT.
v2: Be more paranoid and flush the chipset cache after restoring gtt
mappings.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is required to restore gtt mappings on resume when agp is gone.
The right way to do this would be to make sturct drm_mm_node embeddable
and use the allocation list maintained by the drm memory manager. But
that's a bigger project. Getting rid of the per bo agp_mem will save
more memory than this wastes, anyway.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Currently if we hit a pagefault when applying a user relocation for the
execbuffer, we bail and return EFAULT to the application. Instead, we
need to unwind, drop the dev->struct_mutex, copy all the relocation
entries to a vmalloc array (to avoid any potential circular deadlocks
when resolving the pagefault), retake the mutex and then apply the
relocations. Afterwards, we need to again drop the lock and copy the
vmalloc array back to userspace.
v2: Incorporate feedback from Daniel Vetter.
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The GATT is a write-only set of registers, reading from them in the
manner of i915_gtt_to_phys() is supposed to be undefined. However a
simple solution exists as we allocate linear memory from the stolen
area, we can simply add the block offset to the base register. As a
side-effect we recover all the unused stolen GTT entries and so enlarge
our aperture.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After a GPU reset, the backlight controller registers may be also reset
to 0. In that case we should restore those to the original values
programmed by the BIOS. Note that we still lack the code to handle the
case where the BIOS failed to program those registers at all...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we conflated intel_sdvo->is_hdmi with both having HDMI support on the
ADD along with having HDMI support on the monitor, we would attempt to
use HDMI encodings even if the interface did not support those commands.
Reported-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Reviewed-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
We were reading our 64-bit value in I915_READ64 and returning 32 bits
of it. The restoration of fence regs at resume then had a zero end
value, and the fence had no effect.
Version 2: Split register access functions into per-size versions
Sharing code between different sizes seemed reasonable when we only
needed a single copy, but as 64-bit access requires its own version,
it makes sense to just split them out for each size.
Reported-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
[ickle: use a macro to create the various read/write routines]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This has proven sufficient to recover from a hang of the GPU using the
gem_bad_blit test while at the KMS console then starting X. When
attempting the same during an X session, the timer doesn't appear to
trigger.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It isn't used for the hangcheck, which does its work right from the
timer trigger, but hangcheck can lead to error state recording, which
is run off of the workqueue.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When trying to diagnose mysterious errors on resume, capture the
display register contents as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The pinned buffers are useful for diagnosing errors in setting up state
for the chipset, which may not necessarily be 'active' at the time of
the error, e.g. the cursor buffer object.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Under KMS, restoring the cursor is handled upon modeswitch in order to
avoid enabling an undefined set of registers. At the moment, the cursor
is restored before the aperture and modes are fully setup causing some
invalid access during resume, such as:
PGTBL_ER: 0x00040000
Invalid GTT entry during Cursor Fetch
Fix this by only performing cursor register save/restore under UMS where
it is done in the correct sequence.
Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Commit 2549d6c2 removed the vmalloc used for temporary storage of the
relocation lists used during execbuffer. However, our use of vmalloc was
being protected by an integer overflow check which we do want to
preserve!
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Frame buffer compression is broken on Ironlake due to buggy hardware.
Currently it is disabled through chicken bits, but it still consumes
over 1W more than if we simply never attempt to enable the FBC code
paths.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Both IBX and CPT have an automatic hotplug detection mode which appears to work reliably enough
that we can dispense with the manual force hotplug trigger stuff. This means that
hotplug detection is as simple as reading the current hotplug register values.
The first time the hotplug detection is activated, the code synchronously waits for a hotplug
sequence in case the hardware hasn't bothered to do a detection cycle since being initialized.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We will use this structure in future patches to store CRT specific
information on the encoder.
Split out and tweaked from a patch by Keith Packard.
Signed-off-by: Keith Packard <keithp@kithp.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Linus Torvalds found that it was rather trivial to trigger a system
freeze:
In fact, with lockdep, I don't even need to do the sysrq-d thing: it
shows the bug as it happens. It's the X server taking the same lock
recursively.
Here's the problem:
=============================================
[ INFO: possible recursive locking detected ]
2.6.37-rc2-00012-gbdbd01a #7
---------------------------------------------
Xorg/2816 is trying to acquire lock:
(&dev->struct_mutex){+.+.+.}, at: [<ffffffff812c626c>] i915_gem_fault+0x50/0x17e
but task is already holding lock:
(&dev->struct_mutex){+.+.+.}, at: [<ffffffff812c403b>] i915_mutex_lock_interruptible+0x28/0x4a
other info that might help us debug this:
2 locks held by Xorg/2816:
#0: (&dev->struct_mutex){+.+.+.}, at: [<ffffffff812c403b>] i915_mutex_lock_interruptible+0x28/0x4a
#1: (&mm->mmap_sem){++++++}, at: [<ffffffff81022d4f>] page_fault+0x156/0x37b
This recursion was introduced by rearranging the locking to avoid the
double locking on the fast path (4f27b5d and fbd5a26d) and the
introduction of the prefault to encourage the fast paths (b5e4f2b). In
order to undo the problem, we rearrange the code to perform the access
validation upfront, attempt to prefault and then fight for control of the
mutex. the best case scenario where the mutex is uncontended the
prefaulting is not wasted.
Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we may bind an object with the correct alignment, but with an invalid
size, it may pass the current checks on whether the object may be reused
with a fence.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
g33/pineview doesn't have any alignment constrains for unfenced tiled
buffers. But older chips have. Fix this.
Problem introduced in a00b10c360.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
An old and oft reported bug, is that of the GPU hanging on a
MI_WAIT_FOR_EVENT following a mode switch. The cause is that the GPU is
waiting on a scanline counter on an inactive pipe, and so waits for a
very long time until eventually the user reboots his machine.
We can prevent this either by moving the WAIT into the kernel and
thereby incurring considerable cost on every swapbuffers, or by waiting
for the GPU to retire the last batch that accesses the framebuffer
before installing a new one. As mode switches are much rarer than swap
buffers, this looks like an easy choice.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28964
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29252
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
We only ever used the PRB0, neglecting the secondary ring buffers, and
now with the advent of multiple engines with separate ring buffers we
need to excise the anachronisms from our code (and be explicit about
which ring we mean where). This is doubly important in light of the
FORCEWAKE required to read ring buffer registers on SandyBridge.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Before reading ring register, set FORCE_WAKE bit to prevent GT core
power down to low power state, otherwise we may read stale values.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
[ickle: added a udelay which seemed to do the trick on my SNB]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We use i915_gem_object_get_fence_reg() to do LRU tracking of the fence
registers, so stop trying to be too clever when pinning the fb->obj.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fix many small bugs in I2C adapter registration:
* Properly reject unsupported GPIO pin.
* Fix improper use of I2C_NAME_SIZE (which is the size of
i2c_client.name, not i2c_adapter.name.)
* Prefix adapter names with "i915" so that the user knows what the
I2C channel is connected to.
* Fix swapped characters in the string used to name the GPIO-based
adapter.
* Add missing comma in gmbus name table.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Commit 219adae1 cached the EDID found during LVDS init, but in the
process prevented the init routine from discovering the preferred
fixed-mode for the panel. This was causing us to guess the correct mode,
which sometimes is wide of the mark.
Reported-and-tested-by: Jon Masters <jonathan@jonmasters.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If modeset init failed we attempted to unload the module, before we
finished setting it up and so triggered various oopses.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we use POSTING_READ to flush the write to the register before
proceeding, we do not care what the return value is and similar we do
not care for the read to be recorded whilst tracing register
read/writes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
These registers are written very frequently, are timing sensitive, and
not particularly relevant to any debugging, so remove the tracepoints
from these.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This will be used later to hide the frequently written registers
from debug traces in order to increase the signal-to-noise.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Add two tracepoints at I915_WRITE/READ for tracing down all the
register write and read.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
My Sandybridge only reports 0 for the ring buffer registers, causing it
to hang as soon as we exhaust the available ring. As a workaround, take
advantage of our huge ring buffers and use the auto-reporting mechanism
to update the status page with the HEAD location every 64 KiB.
Cherry-picked from 6aa56062ea.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31404
Tested-by: Zhao Jian <jian.j.zhao@intel.com>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is not known to fix any particular bugs we have, but the spec
says to do it, and the BIOS hadn't already set it up on my system.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
... and so prevent a potential circular reference:
[ INFO: possible circular locking dependency detected ]
2.6.37-rc1-uwe1+ #4
-------------------------------------------------------
Xorg/1401 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: [<c01e4ddb>] might_fault+0x4b/0xa0
but task is already holding lock:
(&dev->struct_mutex){+.+.+.}, at: [<f869c3ac>]
i915_mutex_lock_interruptible+0x3c/0x60 [i915]
which lock already depends on the new lock.
When the locking around the pwrite ioctl was simplified, I did not spot
that the phys path never took any locks and so we introduced this
potential circular reference.
Reported-by: Uwe Helm <uwe.helm@googlemail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The ring buffer registers return 0 whilst idle (for some values of idle)
on early Sandybridge hw. Persevere even when all appears hopeless...
Fortunately the head auto-reporting prevents most hangs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31370
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Instead of killing the process, just return no page found and reschedule
the process giving the GPU some time to (hopefully) recover.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
a00b10c360 "Only enforce fence limits inside the GTT" also
added a fenceable/mappable disdinction when binding/pinning buffers.
This only complicates the code with no pratical gain:
- In execbuffer this matters on for g33/pineview, as this is the only
chip that needs fences and has an unmappable gtt area. But fences
are only possible in the mappable part of the gtt, so need_fence
implies need_mappable. And need_mappable is only set independantly
with relocations which implies (for sane userspace) that the buffer
is untiled.
- The overlay code is only really used on i8xx, which doesn't have
unmappable gtt. And it doesn't support tiled buffers, currently.
- For all other buffers it's a bug to pass in a tiled bo.
In short, this disdinction doesn't have any practical gain.
I've also reverted mapping the overlay and context pages as possibly
unmappable. It's not worth being overtly clever here, all the big
gains from unmappable are for execbuf bos.
Also add a comment for a clever optimization that confused me
while reading the original patch by Chris Wilson.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In a00b10c360 "Only enforce fence limits inside the GTT"
Chris Wilson implemented an optimization to only pin framebuffers
as mappable for crtc_set_base (but not for pageflips). This breaks
the abi, eg: A double buffering mesa client might leave the last
framebuffer in unmappable space on close. A subsequent glReadPix
by a frontbuffer rendering client then goes boom. My pretty anal
mappable/unmappable consistency checking detected this, see
https://bugs.freedesktop.org/show_bug.cgi?id=31286
Chris Wilson tried to fix this in 085ce26437 by pinning
tiled framebuffers into mappable space. This
a) renders the original optimization of not forcing framebuffers
for pageflipping clients into mappable pointless because all our
scanout buffers are tiled by default.
b) doesn't solve the problem for untiled framebuffers.
So kill this. Emperically it's no gain anyway because framebuffers are
being reused by the ddx and hence there's no chance for them to get
constanly bounced between mappable and unmappable.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We should enable FDI normal training on Sandybridge/CPT system
as well.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
[ickle: removed unrelated chunks]
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes issue where i915_gfx_val was reporting values several
orders of magnitude higher than physically possible (without
leaving scorch marks on my thighs at least.)
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When merging Daniel's full-gtt patches I had a set of tweaks which I
thought I had undone. I was half right...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31286
Reported-by: jinjin.wang@intel.com
Reported-by: Alexey Fisher <bug-track@fisher-privat.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I presumed that we would be writing to the batch through the GTT having
bound it, so I converted it to use iomem. Even later as I spotted that
we didn't even move the batch to the GTT (now an issue since we default
to uncached memory on SNB) I still didn't realise that using iomem for
kmapped memory was incorrect. Fix it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Immediate merge to resolve conflicts from applying a stability fix to
both branches.
Conflicts:
drivers/gpu/drm/i915/intel_ringbuffer.c
drivers/gpu/drm/i915/intel_ringbuffer.h
On some stepping of SNB cpu, the first command to be parsed in BLT
command streamer should be MI_BATCHBUFFER_START otherwise the GPU
may hang.
(cherry picked from commit 8d19215be8)
Conflicts:
drivers/gpu/drm/i915/intel_ringbuffer.c
drivers/gpu/drm/i915/intel_ringbuffer.h
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On some stepping of SNB cpu, the first command to be parsed in BLT
command streamer should be MI_BATCHBUFFER_START otherwise the GPU
may hang.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
[ickle: rebased for -next]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Part of the issue here was that Eric slipped in a debug hack for
testing the i915 IPS code before the intel_ips.c driver had landed.
This caused the driver to always use the full range of frequencies,
which is only legal when IPS tells us we have the headroom. Once that
hack was removed, there was confusion about the driver's frequency
clamping variables: max_delay is the driver's current limit on the
highest frequency the IPS driver wants us to use, while dev_priv->fmax
is the hardware-reported limit that the IPS driver can increase up to.
Tested with IPS driver loaded or not. Note that on Ironlake systems
without the IPS driver loaded this will result in a performance
reduction, and the inital warmup of frequency limits can impact
benchmarking on systems with IPS loaded.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
[ickle: demoted a debugging printk]
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2.6.36 appears to respect the 0400 mode we assigned to the parameter
preventing it from being adjusted after loading. However, this is safe
to adjust at runtime.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31311
Reported-by: Fernando Lemos <fernandotcl@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In case of an opregion signature mismatch in intel_opregion_setup(),
iounmap the correct address.
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Take two passes to evict everything whilst searching for sufficient free
space to bind the batchbuffer. After searching for sufficient free space
using LRU eviction, evict everything that is purgeable and try again.
Only then if there is insufficient free space (or the GTT is too badly
fragmented) evict everything from the aperture and try one last time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Accessing the uninitialised obj->pages instead of the local page lead to
an OOPs.
Reported-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
My Sandybridge only reports 0 for the ring buffer registers, causing it
to hang as soon as we exhaust the available ring. As a workaround, take
advantage of our huge ring buffers and use the auto-reporting mechanism
to update the status page with the HEAD location every 64 KiB.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After switching the MMIO registers to use pci_iomap, remember to dispose
of the mapping with pci_iounmap (for symmetry).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>