The FBC register defines are a mess:
- namespace changes between DPFC_, FBC_, and some platform
specific prefix at a whim
- ilk+ reuses most g4x bits but still has some separate bit
defines elsewhere
- it's not clear from the defines that the bit defines are
shared
So let's clean it up:
- both g4x and ilk register share the same defines now
- only defines which conflict have a _PLATFORM suffix, everyone
else just gets comments to indicate which platforms do what
- namespace is consistent DPFC_ now
- SNB system agent fence registers also get a consistent namespace
- REG_BIT() & co. for everything
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-13-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Eliminate yet another if-ladder by adding .nuke() vfunc.
We also rename all *_recompress() stuff to *_nuke() since
that's the terminology the spec uses. Also "recompress"
is a bit confusing by perhaps implying that this triggers
an immediate recompression. Depending on the hardware that
may definitely not be the case, and in general we don't
specifically know when the hardware decides to compress.
So all we do is "nuke" the current compressed framebuffer
and leave it up to the hardware to recompress later if it
so chooses.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-8-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Declutter the *_fbc_activate() functions by pulling all the
control register value computations into helpers.
I left the enable bit in *_fbc_activate() in the hopes of maybe
using the helpers in the *_fbc_deactivate() paths as well instead
of the current rmw approach. That won't be possible at least
quite yet since we clobber the fbc->params before deactivating
FBC so we could end up changing some of the values live, which
given FBC's lack of/poor double buffering would likely not go
so well.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-6-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
The next patch needs to distinguish between a view's mapping and scanout
stride. Rename the current stride parameter to mapping_stride with the
script below. mapping_stride will keep the same meaning as stride had
on all platforms so far, while the meaning of it will change on ADLP.
No functional changes.
@@
identifier intel_fb_view;
identifier i915_color_plane_view;
identifier color_plane;
expression e;
type T;
@@
struct intel_fb_view {
...
struct i915_color_plane_view {
...
- T stride;
+ T mapping_stride;
...
} color_plane[e];
...
};
@@
struct i915_color_plane_view pv;
@@
pv.
- stride
+ mapping_stride
@@
struct i915_color_plane_view *pvp;
@@
pvp->
- stride
+ mapping_stride
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211026225105.2783797-6-imre.deak@intel.com
Legacy cursor APIs are handled by intel_legacy_cursor_update(), that
calls drm_atomic_helper_update_plane() when going through the
slow/atomic path to update cursor, what was the case for PSR2
selective fetch.
drm_atomic_helper_update_plane() sets
drm_atomic_state->legacy_cursor_update to true when updating the
cursor plane, to allow several cursor updates to happen within the
same frame, as userspace does that.
If drivers waited for a vblank increment at the end of every cursor
movement that would cause a visible lag in the cursor.
But this optimization do not properly work with PSR2 selective fetch
dirt area calculation, for example if within a single frame the cursor
had 3 moves the final dirt area programmed to PSR2_MAN_TRK_CTL would
be based in the second movement as old state and third movement as new
state, not updating the area where cursor was in the first state.
So here switching back to the fast path approach in
intel_legacy_cursor_update() and handling cursor movements as
frontbuffer rendering(psr_force_hw_tracking_exit()), that is not the
most optimal for power-savings but is the solution that we have until
mailbox style updates is implemented.
Also removing the cursor workaround as not it is properly undestand
the issue and is know that it will never cover all the cases.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210930001409.254817-5-jose.souza@intel.com
On FBC1 we can specify an arbitrary cfb stride. The hw will
simply throw away any compressed line that would exceed the
specified limit and keep using the uncompressed data instead.
Thus we can allow arbitrary compression limits.
The one thing we have to keep in mind though is that the cfb
stride is specified in units of 32B (gen2) or 64B (gen3+).
Fortunately X-tile is already 128B (gen2) or 512B (gen3+) wide
so as long as we limit outselves to the same 4x compression
limit that FBC2 has we are guaranteed to have a sufficiently
aligned cfb stride.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210921152517.803-5-ville.syrjala@linux.intel.com
Apply the same 512 byte FBC segment alignment to glk+ as we use
on skl+. The only real difference is that we now have a dedicated
register for the FBC override stride. Not 100% sure which
platforms really need the 512B alignment, but it's easiest
to just do it on everything.
Also the hardware no longer seems to misclaculate the CFB stride
for linear, so we can omit the use of the override stride for
linear unless the stride is misaligned.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210921152517.803-3-ville.syrjala@linux.intel.com
The code to calculate the cfb stride/size is a bit of mess.
The cfb size is getting calculated based purely on the plane
stride and plane height. That doesn't account for extra
alignment we want for the cfb stride. The gen9 override
stride OTOH is just calculated based on the plane width, and
it does try to make things more aligned but any extra alignment
added there is not considered in the cfb size calculations.
So not at all convinced this is working as intended. Additionally
the compression limit handling is split between the cfb allocation
code and g4x_dpfc_ctl_limit() (for the 16bpp case), which is just
confusing.
Let's streamline the whole thing:
- Start with the plane stride, convert that into cfb stride (cfb is
always 4 bytes per pixel). All the calculations will assume 1:1
compression limit since that will give us the max values, and we
don't yet know how much stolen memory we will be able to allocate
- Align the cfb stride to 512 bytes on modern platforms. This guarantees
the 4 line segment will be 512 byte aligned regardles of the final
compression limit we choose later. The 512 byte alignment for the
segment is required by at least some of the platforms, and just doing
it always seems like the easiest option
- Figure out if we need to use the override stride or not. For X-tiled
it's never needed since the plane stride is already 512 byte aligned,
for Y-tiled it will be needed if the plane stride is not a multiple
of 512 bytes, and for linear it's apparently always needed because the
hardware miscalculates the cfb stride as PLANE_STRIDE*512 instead of
the PLANE_STRIDE*64 that it use with linear.
- The cfb size will be calculated based on the aligned cfb stride to
guarantee we actually reserved enough stolen memory and the FBC hw
won't end up scribbling over whatever else is allocated in stolen
- The compression limit handling we just do fully in the cfb allocation
code to make things less confusing
v2: Write the min cfb segment stride calculation in a more
explicit way to make it clear what is going on
v3: Remeber to update fbc->limit when changing to 16bpp
Reviewed-by: Uma Shankar <uma.shankar@intel.com> #v2
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210923042151.19052-1-ville.syrjala@linux.intel.com
On ILK+ we current do a nuke right after activating FBC. If my
memory isn't playing tricks on me this is actially required if
FBC didn't stay disabled for a full frame. In that case the
deactivate+reactivate may not invalidate the cfb. I'd have to
double chekc to be sure though.
So let's keep the nuke, and just extend it backwards to cover
all the platforms by doing it a bit higher up.
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210702204603.596-4-ville.syrjala@linux.intel.com
Add support for DPT (display page table). DPT is a
slightly peculiar two level page table scheme used for
tiled scanout buffers (linear uses direct ggtt mapping
still). The plane surface address will point at a page
in the DPT which holds the PTEs for 512 actual pages.
Thus we require 1/512 of the ggttt address space
compared to a direct ggtt mapping.
We create a new DPT address space for each framebuffer and
track two vmas (one for the DPT, another for the ggtt).
TODO:
- Is the i915_address_space approaach sane?
- Maybe don't map the whole DPT to write the PTEs?
- Deal with remapping/rotation? Need to create a
separate DPT for each remapped/rotated plane I
guess. Or else we'd need to make the per-fb DPT
large enough to support potentially several
remapped/rotated vmas. How large should that be?
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Cc: Wilson Chris P <Chris.P.Wilson@intel.com>
Cc: Tang CQ <cq.tang@intel.com>
Cc: Auld Matthew <matthew.auld@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Reviewed-by: Wilson Chris P <Chris.P.Wilson@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210506161930.309688-5-imre.deak@intel.com
Hoist the intel_de.h include from intel_display_types.h one
level up. I need this in order to untangle the include order
so that I can add tracepoints into intel_de.h.
This little cocci script did most of the work for me:
@find@
@@
(
intel_de_read(...)
|
intel_de_read_fw(...)
|
intel_de_write(...)
|
intel_de_write_fw(...)
)
@has_include@
@@
(
#include "intel_de.h"
|
#include "display/intel_de.h"
)
@depends on find && !has_include@
@@
+ #include "intel_de.h"
#include "intel_display_types.h"
@depends on find && !has_include@
@@
+ #include "display/intel_de.h"
#include "display/intel_display_types.h"
Cc: Cooper Chiou <cooper.chiou@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210430143945.6776-1-ville.syrjala@linux.intel.com
While converting the rest of the driver to use GRAPHICS_VER() and
MEDIA_VER(), following what was done for display, some discussions went
back on what we did for display:
1) Why is the == comparison special that deserves a separate
macro instead of just getting the version and comparing directly
like is done for >, >=, <=?
2) IS_DISPLAY_RANGE() is weird in that it omits the "_VER" for
brevity. If we remove the current users of IS_DISPLAY_VER(), we
could actually repurpose it for a range check
With (1) there could be an advantage if we used gen_mask since multiple
conditionals be combined by the compiler in a single and instruction and
check the result. However a) INTEL_GEN() doesn't use the mask since it
would make the code bigger everywhere else and b) in the cases it made
sense, it also made sense to convert to the _RANGE() variant.
So here we repurpose IS_DISPLAY_VER() to work with a [ from, to ] range
like was the IS_DISPLAY_RANGE() and convert the current IS_DISPLAY_VER()
users to use == and != operators. Aside from the definition changes,
this was done by the following semantic patch:
@@ expression dev_priv, E1; @@
- !IS_DISPLAY_VER(dev_priv, E1)
+ DISPLAY_VER(dev_priv) != E1
@@ expression dev_priv, E1; @@
- IS_DISPLAY_VER(dev_priv, E1)
+ DISPLAY_VER(dev_priv) == E1
@@ expression dev_priv, from, until; @@
- IS_DISPLAY_RANGE(dev_priv, from, until)
+ IS_DISPLAY_VER(dev_priv, from, until)
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
[Jani: Minor conflict resolve while applying.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20210413051002.92589-4-lucas.demarchi@intel.com
To allow the simplification of FB/plane view computation in the
follow-up patches, unify the corresponding state in the
intel_framebuffer and intel_plane_state structs into a new intel_fb_view
struct.
This adds some overhead to intel_framebuffer as the rotated view will
have now space for 4 color planes instead of the required 2 and it'll
also contain the unused offset for each color_plane info. Imo this is an
acceptable trade-off to get a simplified way of the remap computation.
Use the new intel_fb_view struct for the FB normal view as well, so (in
the follow-up patches) we can remove the special casing for normal view
calculation wrt. the calculation of remapped/rotated views. This also
adds an overhead to the intel_framebuffer struct, as the gtt remap info
and per-color plane offset/pitch is not required for the normal view,
but imo this is an acceptable trade-off as above. The per-color plane
pitch filed will be used by a follow-up patch, so we can retrieve the
pitch for each view in the same way.
No functional changes in this patch.
v2:
- Make the patch have _no functional change_.
(fix skl_check_nv12_aux_surface() and skl_check_main_surface()).
- s/i915_color_plane_view::pitch/stride/ (Ville)
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210325214808.2071517-17-imre.deak@intel.com
Use Coccinelle to convert most of the usage of INTEL_GEN() and IS_GEN()
in the display code to use DISPLAY_VER() comparisons instead. The
following semantic patch was used:
@@ expression dev_priv, E; @@
- INTEL_GEN(dev_priv) == E
+ IS_DISPLAY_VER(dev_priv, E)
@@ expression dev_priv; @@
- INTEL_GEN(dev_priv)
+ DISPLAY_VER(dev_priv)
@@ expression dev_priv; expression E; @@
- IS_GEN(dev_priv, E)
+ IS_DISPLAY_VER(dev_priv, E)
@@
expression dev_priv;
expression from, until;
@@
- IS_GEN_RANGE(dev_priv, from, until)
+ IS_DISPLAY_RANGE(dev_priv, from, until)
There are still some display-related uses of INTEL_GEN() in intel_pm.c
(watermark code) and i915_irq.c. Those will be updated separately.
v2:
- Use new IS_DISPLAY_RANGE and IS_DISPLAY_VER helpers. (Jani)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210320044245.3920043-4-matthew.d.roper@intel.com
ILK is the only platform that we consider "gen5" and SNB is the only
platform we consider "gen6." Add an IS_SANDYBRIDGE() macro and then
replace numeric platform tests for these two generations with direct
platform tests with the following Coccinelle semantic patch:
@@ expression dev_priv; @@
- IS_GEN(dev_priv, 5)
+ IS_IRONLAKE(dev_priv)
@@ expression dev_priv; @@
- IS_GEN(dev_priv, 6)
+ IS_SANDYBRIDGE(dev_priv)
@@ expression dev_priv; @@
- IS_GEN_RANGE(dev_priv, 5, 6)
+ IS_IRONLAKE(dev_priv) || IS_SANDYBRIDGE(dev_priv)
This will simplify our upcoming patches which eliminate INTEL_GEN()
usage in the display code.
v2:
- Reverse ilk/snb order for IS_GEN_RANGE conversion. (Ville)
- Rebase + regenerate from semantic patch
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210320044245.3920043-2-matthew.d.roper@intel.com
There are some corner cases wrt underrun when we enable
FBC with PSR2 on TGL. Recommendation from hardware is to
keep this combination disabled.
Bspec: 50422 HSD: 14010260002
v2: Added psr2 enabled check from crtc_state (Anshuman)
Added Bspec link and HSD referneces (Jose)
v3: Moved the logic to disable fbc to intel_fbc_update_state_cache
and removed the crtc->config usages, as per Ville's recommendation.
v4: Introduced a variable in fbc state_cache instead of the earlier
plane.visible WA, as suggested by Jose.
v5: Dropped an extra check for fbc in intel_fbc_enable and addressed
review comments by Jose.
v6: Move WA to end of function and added Jose's RB.
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201201190406.1752-2-uma.shankar@intel.com