History log of /linux-master/drivers/gpu/drm/i915/i915_perf.c
Revision Date Author Comments
# 48ba4a6d 20-Mar-2024 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: Update IP_VER(12, 50)

With no platform using graphics/media IP_VER(12, 50), replace the
checks throughout the code with IP_VER(12, 55) so the code makes sense
by itself with no additional explanation of previous baggage.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20240320060543.4034215-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>


# cb4046d2 20-Mar-2024 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: Drop dead code for xehpsdv

PCI IDs for XEHPSDV were never added and platform always marked with
force_probe. Drop what's not used and rename some places to either be
xehp or dg2, depending on the platform/IP checks.

The registers not used anymore are also removed.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20240320060543.4034215-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>


# c44d4ef4 08-Jan-2024 Matt Roper <matthew.d.roper@intel.com>

drm/i915/xelpg: Extend some workarounds/tuning to gfx version 12.74

Some of our existing Xe_LPG workarounds and tuning are also applicable
to the version 12.74 variant. Extend the condition bounds accordingly.
Also fix the comment on Wa_14018575942 while we're at it.

v2: Extend some more workarounds (Harish)

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240108122738.14399-4-haridhar.kalvala@intel.com


# 0c68132d 18-Dec-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Update handling of MMIO triggered reports

On XEHP platforms user is not able to find MMIO triggered reports in the
OA buffer since i915 squashes the context ID fields. These context ID
fields hold the MMIO trigger markers.

Update logic to not squash the context ID fields of MMIO triggered
reports.

Fixes: 7eeaedf79989 ("drm/i915/perf: Determine context valid in OA reports")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231219000543.1087706-1-umesh.nerlige.ramappa@intel.com


# 0103b449 18-Dec-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Update handling of MMIO triggered reports

On XEHP platforms user is not able to find MMIO triggered reports in the
OA buffer since i915 squashes the context ID fields. These context ID
fields hold the MMIO trigger markers.

Update logic to not squash the context ID fields of MMIO triggered
reports.

Fixes: cba94bbcff08 ("drm/i915/perf: Determine context valid in OA reports")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231219000543.1087706-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 0c68132df6e66244acec1bb5b9e19b0751414389)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 36f27350 27-Oct-2023 Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>

i915/perf: Fix NULL deref bugs with drm_dbg() calls

When i915 perf interface is not available dereferencing it will lead to
NULL dereferences.

As returning -ENOTSUPP is pretty clear return when perf interface is not
available.

Fixes: 2fec539112e8 ("i915/perf: Replace DRM_DEBUG with driver specific drm_dbg call")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v6.0+
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231027172822.2753059-1-harshit.m.mogalapalli@oracle.com
[tursulin: added stable tag]


# ee11d2d3 18-Dec-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Update handling of MMIO triggered reports

On XEHP platforms user is not able to find MMIO triggered reports in the
OA buffer since i915 squashes the context ID fields. These context ID
fields hold the MMIO trigger markers.

Update logic to not squash the context ID fields of MMIO triggered
reports.

Fixes: cba94bbcff08 ("drm/i915/perf: Determine context valid in OA reports")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231219000543.1087706-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 0c68132df6e66244acec1bb5b9e19b0751414389)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 471aa951 27-Oct-2023 Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>

i915/perf: Fix NULL deref bugs with drm_dbg() calls

When i915 perf interface is not available dereferencing it will lead to
NULL dereferences.

As returning -ENOTSUPP is pretty clear return when perf interface is not
available.

Fixes: 2fec539112e8 ("i915/perf: Replace DRM_DEBUG with driver specific drm_dbg call")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v6.0+
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231027172822.2753059-1-harshit.m.mogalapalli@oracle.com
[tursulin: added stable tag]
(cherry picked from commit 36f27350ff745bd228ab04d7845dfbffc177a889)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 7eeaedf7 02-Aug-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Determine context valid in OA reports

When supporting OA for TGL, it was seen that the context valid bit in
the report ID was not defined, however revisiting the spec seems to have
this bit defined. The bit is used to determine if a context is valid on
a context switch and is essential to determine active and idle periods
for a context. Re-enable the context valid bit for gen12 platforms.

BSpec: 52196 (description of report_id)

v2: Include BSpec reference (Ashutosh)

Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230802202854.1224547-1-umesh.nerlige.ramappa@intel.com


# ccee9a2a 02-Oct-2023 Joel Granados <j.granados@samsung.com>

intel drm: Remove now superfluous sentinel element from ctl_table array

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)

Remove sentinel from oa_table

Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>


# 039adf39 09-Oct-2023 John Harrison <John.C.Harrison@Intel.com>

drm/i915: More use of GT specific print helpers

Update a bunch of GT related print messages in non-GT files to use the
GT specific helpers.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231009183802.673882-3-John.C.Harrison@Intel.com


# a383a021 19-Sep-2023 Ashutosh Dixit <ashutosh.dixit@intel.com>

drm/i915/perf: Remove gtt_offset from stream->oa_buffer.head/.tail

There is no reason to add gtt_offset to the cached head/tail pointers
stream->oa_buffer.head and stream->oa_buffer.tail. This causes the code to
constantly add gtt_offset and subtract gtt_offset and is error
prone.

It is much simpler to maintain stream->oa_buffer.head and
stream->oa_buffer.tail without adding gtt_offset to them and just allow for
the gtt_offset when reading/writing from/to HW registers.

v2: Minor tweak to commit message due to dropping patch in previous series

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230920040211.2351279-1-ashutosh.dixit@intel.com


# 14128d64 21-Aug-2023 Matt Roper <matthew.d.roper@intel.com>

drm/i915: Replace several IS_METEORLAKE with proper IP version checks

Many of the IS_METEORLAKE conditions throughout the driver are supposed
to be checks for Xe_LPG and/or Xe_LPM+ IP, not for the MTL platform
specifically. Update those checks to ensure that the code will still
operate properly if/when these IP versions show up on future platforms.

v2:
- Update two more conditions (one for pg_enable, one for MTL HuC
compatibility).
v3:
- Don't change GuC/HuC compatibility check, which sounds like it truly
is specific to the MTL platform. (Gustavo)
- Drop a non-lineage workaround number for the OA timestamp frequency
workaround. (Gustavo)

Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-20-matthew.d.roper@intel.com


# 81af8abe 21-Aug-2023 Matt Roper <matthew.d.roper@intel.com>

drm/i915: Eliminate IS_MTL_MEDIA_STEP

Stepping-specific media behavior shouldn't be tied to MTL as a platform,
but rather specifically to the Xe_LPM+ IP. Future non-MTL platforms may
re-use this IP and will need to follow the exact same logic and apply
the same workarounds. IS_MTL_MEDIA_STEP() is dropped in favor of
IS_MEDIA_GT_IP_STEP, which checks the media IP version associated with a
specific IP and also ensures that we're operating on the media GT, not
the primary GT.

v2:
- Switch to the IS_GT_IP_STEP macro.
v3:
- Switch back to long-form IS_MEDIA_GT_IP_STEP. (Jani)
v4:
- Build IS_MEDIA_GT_IP_STEP on top of IS_MEDIA_GT_IP_RANGE and
IS_MEDIA_STEP building blocks and name the parameters from/until
rather than begin/fixed.. (Jani)
v5:
- Tweak macro comment wording. (Gustavo)
- Add a check to catch NULL gt in IS_MEDIA_GT_IP_RANGE; this allows it
to be used safely on i915->media_gt, which may be NULL on some
platforms. (Gustavo)

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-16-matthew.d.roper@intel.com


# c9517783 16-Aug-2023 Matt Roper <matthew.d.roper@intel.com>

drm/i915/dg2: Drop Wa_16011777198

Wa_16011777198 only applies to pre-production steppings of DG2, which
we're no longer supporting. Remove the workaround and override_gucrc
handling, which is no longer needed. Since this was the final use of
IS_DG2_GRAPHICS_STEP, that macro can also be removed now.

v2:
- Include the promised removal of override_gucrc handling.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230816214824.548575-2-matthew.d.roper@intel.com


# cba94bbc 02-Aug-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Determine context valid in OA reports

When supporting OA for TGL, it was seen that the context valid bit in
the report ID was not defined, however revisiting the spec seems to have
this bit defined. The bit is used to determine if a context is valid on
a context switch and is essential to determine active and idle periods
for a context. Re-enable the context valid bit for gen12 platforms.

BSpec: 52196 (description of report_id)

v2: Include BSpec reference (Ashutosh)

Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230802202854.1224547-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 7eeaedf79989a8f131939782832e21e9218ed2a0)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# d3f23ab9 20-Jul-2023 Andrzej Hajda <andrzej.hajda@intel.com>

drm/i915: use direct alias for i915 in requests

i915_request contains direct alias to i915, there is no point to go via
rq->engine->i915.

v2: added missing rq.i915 initialization in measure_breadcrumb_dw.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720113002.1541572-1-andrzej.hajda@intel.com


# 2f42c5af 11-Jul-2023 Andrzej Hajda <andrzej.hajda@intel.com>

drm/i915/perf: add sentinel to xehp_oa_b_counters

Arrays passed to reg_in_range_table should end with empty record.

The patch solves KASAN detected bug with signature:
BUG: KASAN: global-out-of-bounds in xehp_is_valid_b_counter_addr+0x2c7/0x350 [i915]
Read of size 4 at addr ffffffffa1555d90 by task perf/1518

CPU: 4 PID: 1518 Comm: perf Tainted: G U 6.4.0-kasan_438-g3303d06107f3+ #1
Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P DDR5 SODIMM SBS RVP, BIOS MTLPFWI1.R00.3223.D80.2305311348 05/31/2023
Call Trace:
<TASK>
...
xehp_is_valid_b_counter_addr+0x2c7/0x350 [i915]

Fixes: 0fa9349dda03 ("drm/i915/perf: complete programming whitelisting for XEHPSDV")
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711153410.1224997-1-andrzej.hajda@intel.com


# 40b1588a 16-Jun-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Consider OA buffer boundary when zeroing out reports

For reports that are not powers of 2, reports at the end of the OA
buffer may get split across the buffer boundary. When zeroing out such
reports, take the split into consideration.

v2: Use OA_BUFFER_SIZE (Ashutosh)

Fixes: 09a36015d9a0 ("drm/i915/perf: Clear out entire reports after reading if not power of 2 size")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230616173402.699776-1-umesh.nerlige.ramappa@intel.com


# 785b3f66 11-Jul-2023 Andrzej Hajda <andrzej.hajda@intel.com>

drm/i915/perf: add sentinel to xehp_oa_b_counters

Arrays passed to reg_in_range_table should end with empty record.

The patch solves KASAN detected bug with signature:
BUG: KASAN: global-out-of-bounds in xehp_is_valid_b_counter_addr+0x2c7/0x350 [i915]
Read of size 4 at addr ffffffffa1555d90 by task perf/1518

CPU: 4 PID: 1518 Comm: perf Tainted: G U 6.4.0-kasan_438-g3303d06107f3+ #1
Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P DDR5 SODIMM SBS RVP, BIOS MTLPFWI1.R00.3223.D80.2305311348 05/31/2023
Call Trace:
<TASK>
...
xehp_is_valid_b_counter_addr+0x2c7/0x350 [i915]

Fixes: 0fa9349dda03 ("drm/i915/perf: complete programming whitelisting for XEHPSDV")
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711153410.1224997-1-andrzej.hajda@intel.com
(cherry picked from commit 2f42c5afb34b5696cf5fe79e744f99be9b218798)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# dde4c3d4 16-Jun-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Consider OA buffer boundary when zeroing out reports

For reports that are not powers of 2, reports at the end of the OA
buffer may get split across the buffer boundary. When zeroing out such
reports, take the split into consideration.

v2: Use OA_BUFFER_SIZE (Ashutosh)

Fixes: 09a36015d9a0 ("drm/i915/perf: Clear out entire reports after reading if not power of 2 size")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230616173402.699776-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 40b1588a750240cbe8a83117aa785d778749a77c)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 589f4924 05-Jun-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

i915/perf: Do not add ggtt offset to hw_tail

ggtt offset for hw_tail is not required for the calculations, so drop
it.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230605193923.1836048-3-umesh.nerlige.ramappa@intel.com


# 9cc31938 05-Jun-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

i915/perf: Drop the aging_tail logic in perf OA

On DG2, capturing OA reports while running heavy render workloads
sometimes results in invalid OA reports where 64-byte chunks inside
reports have stale values. Under memory pressure, high OA sampling rates
(13.3 us) and heavy render workload, occasionally, the OA HW TAIL
pointer does not progress as fast as the sampling rate. When these
glitches occur, the TAIL pointer takes approx. 200us to progress. While
this is expected behavior from the HW perspective, invalid reports are
not expected.

In oa_buffer_check_unlocked(), when we execute the if condition, we are
updating the oa_buffer.tail to the aging tail and then setting pollin
based on this tail value, however, we do not have a chance to rewind and
validate the reports prior to setting pollin. The validation happens
in a subsequent call to oa_buffer_check_unlocked(). If a read occurs
before this validation, then we end up reading reports up until this
oa_buffer.tail value which includes invalid reports. Though found on
DG2, this affects all platforms.

The aging tail logic is no longer necessary since we are explicitly
checking for landed reports.

Start by dropping the aging tail logic.

v2:
- Drop extra blank line
- Add reason to drop aging logic (Ashutosh)
- Add bug links (Ashutosh)
- rename aged_tail to read_tail
- Squash patches 3 and 1

v3: (Ashutosh)
- Remove extra spaces
- Remove gtt_offset from the pollin calculation
- s/Bug:/Link/ in commit message (checkpatch)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7484
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7757
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230605193923.1836048-2-umesh.nerlige.ramappa@intel.com


# 62fe3987 23-May-2023 Ashutosh Dixit <ashutosh.dixit@intel.com>

drm/i915/perf: Clear out entire reports after reading if not power of 2 size

Clearing out report id and timestamp as means to detect unlanded reports
only works if report size is power of 2. That is, only when report size is
a sub-multiple of the OA buffer size can we be certain that reports will
land at the same place each time in the OA buffer (after rewind). If report
size is not a power of 2, we need to zero out the entire report to be able
to detect unlanded reports reliably.

v2: Add Fixes tag (Umesh)

Fixes: 1cc064dce4ed ("drm/i915/perf: Add support for OA media units")
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230523204042.4180641-1-ashutosh.dixit@intel.com
(cherry picked from commit 09a36015d9a0940214c080f95afc605c47648bbd)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 09a36015 23-May-2023 Ashutosh Dixit <ashutosh.dixit@intel.com>

drm/i915/perf: Clear out entire reports after reading if not power of 2 size

Clearing out report id and timestamp as means to detect unlanded reports
only works if report size is power of 2. That is, only when report size is
a sub-multiple of the OA buffer size can we be certain that reports will
land at the same place each time in the OA buffer (after rewind). If report
size is not a power of 2, we need to zero out the entire report to be able
to detect unlanded reports reliably.

v2: Add Fixes tag (Umesh)

Fixes: 1cc064dce4ed ("drm/i915/perf: Add support for OA media units")
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230523204042.4180641-1-ashutosh.dixit@intel.com


# 9570b039 02-May-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915/perf: fix i915_perf_ioctl_version() kernel-doc

drivers/gpu/drm/i915/i915_perf.c:5307: warning: Function parameter or member 'i915' not described in 'i915_perf_ioctl_version'

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/b93ddb95a15d1376936349b32c7facb35c76be82.1683041799.git.jani.nikula@intel.com


# 49f6f648 28-Mar-2023 Min Li <lm0963hack@gmail.com>

drm/i915: fix race condition UAF in i915_perf_add_config_ioctl

Userspace can guess the id value and try to race oa_config object creation
with config remove, resulting in a use-after-free if we dereference the
object after unlocking the metrics_lock. For that reason, unlocking the
metrics_lock must be done after we are done dereferencing the object.

Signed-off-by: Min Li <lm0963hack@gmail.com>
Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Cc: <stable@vger.kernel.org> # v4.14+
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230328093627.5067-1-lm0963hack@gmail.com
[tursulin: Manually added stable tag.]


# 86e11e30 23-Mar-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Wa_14017512683: Disable OAM if media C6 is enabled in BIOS

OAM does not work with media C6 enabled on some steppings of MTL.
Disable OAM if we detect that media C6 was enabled in bios.

v2: (Ashutosh)
- Remove drm_notice from the driver load path
- Log a drm_err when opening an OAM stream on affected steppings

v3:
- Initialize the engine group even if mc6 is enabled (Ashutosh)
- Checkpatch fix

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-12-umesh.nerlige.ramappa@intel.com


# 94d82e95 23-Mar-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Pass i915 object to perf revision helper

In some cases, perf revision may rely on specific steppings of a
platform. To determine the platform, pass i915 object to the perf
revision helper.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-11-umesh.nerlige.ramappa@intel.com


# 1cc064dc 23-Mar-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Add support for OA media units

MTL introduces additional OA units dedicated to media use cases. Add
support for programming these OA units by passing the media engine class
and instance parameters.

UMD specific changes for GPUvis support:
https://patchwork.freedesktop.org/patch/522827/?series=114023
https://patchwork.freedesktop.org/patch/522822/?series=114023
https://patchwork.freedesktop.org/patch/522826/?series=114023
https://patchwork.freedesktop.org/patch/522828/?series=114023
https://patchwork.freedesktop.org/patch/522816/?series=114023
https://patchwork.freedesktop.org/patch/522825/?series=114023

v2: (Ashutosh)
- check for IP_VER(12, 70) instead of MTL
- remove PERF_GROUP_OAG comment in mtl_oa_base
- remove oa_buffer.group
- use engine->oa_group->type in engine_supports_oa_format
- remove fw_domains and use FORCEWAKE_ALL
- remove MPES/MPEC comment
- s/xehp/mtl/ in b counter validation function name
- remove engine_supports_oa in __oa_engine_group
- remove warn_ON from __oam_engine_group
- refactor oa_init_groups and oa_init_regs
- assign g->type correctly
- use enum oa_type definition

v3: (Ashutosh)
- Drop oa_unit_functional as engine_supports_oa is enough

v4:
- s/DRM_DEBUG/drm_dbg/

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-10-umesh.nerlige.ramappa@intel.com


# c61d04c9 23-Mar-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Add engine class instance parameters to perf

One or more engines map to a specific OA unit. All reports from these
engines are captured in the OA buffer managed by this OA unit.

Current i915 OA implementation supports only the OAG unit. OAG primarily
caters to render engine, so i915 OA uses render as the default engine
in the OA implementation. Since there are more OA units on newer
hardware that map to other engines, allow user to pass engine class and
instance to select and program specific OA units.

UMD specific changes for GPUvis support:
https://patchwork.freedesktop.org/patch/522827/?series=114023
https://patchwork.freedesktop.org/patch/522822/?series=114023
https://patchwork.freedesktop.org/patch/522826/?series=114023
https://patchwork.freedesktop.org/patch/522828/?series=114023
https://patchwork.freedesktop.org/patch/522816/?series=114023
https://patchwork.freedesktop.org/patch/522825/?series=114023

v2: (Ashutosh)
- Clarify commit message
- Add drm_dbg
- Clarify uapi description

v3: (Ashutosh)
- Remove irrelevant info from the uapi comment

v4: Ensure engine class:instance is passed together (Ashutosh)
v5: Remove unnecessary quote (Ashutosh)

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-9-umesh.nerlige.ramappa@intel.com


# 3c67ce06 23-Mar-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Handle non-power-of-2 reports

Some of the newer OA formats are not powers of 2. For those formats,
adjust the hw_tail accordingly when checking for new reports.

v2: (Ashutosh)
- Switch to OA_TAKEN for diff calculation
- Use OA_BUFFER_SIZE instead of the vma size
- Update comments

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-8-umesh.nerlige.ramappa@intel.com


# dbc9a5fb 23-Mar-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Parse 64bit report header formats correctly

Now that OA formats come in flavor of 64 bit reports, the report header
has 64 bit report-id, timestamp, context-id and gpu-ticks fields. When
filtering these reports, use the right width for these fields.

Note that upper dword of context id is reserved, so squash lower dword
only.

v2: (Ashutosh)
- Drop inline
- Update comment with dword definitions - report id and timestamp

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-7-umesh.nerlige.ramappa@intel.com


# 772a5803 23-Mar-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Fail modprobe if i915_perf_init fails on OOM

i915_perf_init can fail due to OOM. Fail driver init if i915_perf_init
fails.

v2: (Jani)
- Reorder patch in the series

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-6-umesh.nerlige.ramappa@intel.com


# 5f284e9c 23-Mar-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Group engines into respective OA groups

Now that we may have multiple OA units in a single GT as well as on
separate GTs, create an engine group that maps to a single OA unit.

v2: (Jani)
- Drop warning on ENOMEM
- Reorder patch in the series

v3: (Ashutosh)
- Remove unused members from perf structs
- Update comments
- Update engine_supports_oa check
- Just return 1 in num_perf_groups_per_gt for now
- Set engine->oa_group to NULL to begin with

v4: Use engine_supports_oa() check in oa_init_reg_state (Ashutosh)
v5: Rebase after dropping engine_supports_oa helper

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-5-umesh.nerlige.ramappa@intel.com


# 9919d119 23-Mar-2023 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Validate OA sseu config outside switch

Once OA supports media engine class:instance, the engine can only be
validated outside the switch since class and instance parameters are
separate entities. Since OA sseu config depends on engine
class:instance, validate OA sseu config outside the switch.

v2: (Ashutosh)
- Clarify commit message
- Use drm_dbg instead of DRM_DEBUG
- Reorder stack variables

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-4-umesh.nerlige.ramappa@intel.com


# 2810ac6c 23-Mar-2023 Chris Wilson <chris.p.wilson@linux.intel.com>

drm/i915/perf: Drop wakeref on GuC RC error

If we fail to adjust the GuC run-control on opening the perf stream,
make sure we unwind the wakeref just taken.

v2: Retain old goto label names (Ashutosh)
v3: Drop bitfield boolean

Fixes: 01e742746785 ("drm/i915/guc: Support OA when Wa_16011777198 is enabled")
Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-2-umesh.nerlige.ramappa@intel.com


# dc30c011 28-Mar-2023 Min Li <lm0963hack@gmail.com>

drm/i915: fix race condition UAF in i915_perf_add_config_ioctl

Userspace can guess the id value and try to race oa_config object creation
with config remove, resulting in a use-after-free if we dereference the
object after unlocking the metrics_lock. For that reason, unlocking the
metrics_lock must be done after we are done dereferencing the object.

Signed-off-by: Min Li <lm0963hack@gmail.com>
Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Cc: <stable@vger.kernel.org> # v4.14+
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230328093627.5067-1-lm0963hack@gmail.com
[tursulin: Manually added stable tag.]
(cherry picked from commit 49f6f6483b652108bcb73accd0204a464b922395)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 5c95b2d5 23-Mar-2023 Chris Wilson <chris.p.wilson@linux.intel.com>

drm/i915/perf: Drop wakeref on GuC RC error

If we fail to adjust the GuC run-control on opening the perf stream,
make sure we unwind the wakeref just taken.

v2: Retain old goto label names (Ashutosh)
v3: Drop bitfield boolean

Fixes: 01e742746785 ("drm/i915/guc: Support OA when Wa_16011777198 is enabled")
Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323225901.3743681-2-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 2810ac6c753d17ee2572ffb57fe2382a786a080a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# d0fa30be 12-Dec-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/mtl: Add OA support by enabling 32 bit OAG formats for MTL

Without an entry in oa_init_supported_formats, OA will not be functional
in MTL. Enable OA support by enabling 32 bit OAG formats for MTL.

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20228

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221212220902.1819159-5-umesh.nerlige.ramappa@intel.com


# d654ae8b 12-Dec-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/mtl: Update OA mux whitelist for MTL

0x20cc (WAIT_FOR_RC6_EXIT on other platforms) is repurposed on MTL. Use
a separate mux table to verify oa configs passed by user.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221212220902.1819159-4-umesh.nerlige.ramappa@intel.com


# a6b44302 12-Dec-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/mtl: Add Wa_14015846243 to fix OA vs CS timestamp mismatch

Similar to ACM, OA timestamp that is part of the OA report is shifted
when compared to the CS timestamp. Add MTL to the WA.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221212220902.1819159-3-umesh.nerlige.ramappa@intel.com


# a4b6e74c 12-Dec-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/mtl: Resize noa_wait BO size to save restore GPR regs

On MTL, gt->scratch was using stolen lmem. An MI_SRM to stolen lmem
caused a hang that was attributed to saving and restoring the GPR
registers used for noa_wait.

Add an additional page in noa_wait BO to save/restore GPR registers for
the noa_wait logic.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221212220902.1819159-2-umesh.nerlige.ramappa@intel.com


# 95c713d7 23-Nov-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Do not parse context image for HSW

An earlier commit introduced a mechanism to parse the context image to
find the OA context control offset. This resulted in an NPD on haswell
when gem_context was passed into i915_perf_open_ioctl params. Haswell
does not support logical ring contexts, so ensure that the context image
is parsed only for platforms with logical ring contexts and also
validate lrc_reg_state.

v2: Fix build failure
v3: Fix checkpatch error

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7432
Fixes: a5c3a3cbf029 ("drm/i915/perf: Determine gen12 oa ctx offset at runtime")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221123235342.713068-1-umesh.nerlige.ramappa@intel.com


# 8e4ee5e8 30-Nov-2022 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wrap all access to i915_vma.node.start|size

We already wrap i915_vma.node.start for use with the GGTT, as there we
can perform additional sanity checks that the node belongs to the GGTT
and fits within the 32b registers. In the next couple of patches, we
will introduce guard pages around the objects _inside_ the drm_mm_node
allocation. That is we will offset the vma->pages so that the first page
is at drm_mm_node.start + vma->guard (not 0 as is currently the case).
All users must then not use i915_vma.node.start directly, but compute
the guard offset, thus all users are converted to use a
i915_vma_offset() wrapper.

The notable exceptions are the selftests that are testing exact
behaviour of i915_vma_pin/i915_vma_insert.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221130235805.221010-3-andi.shyti@linux.intel.com


# d4d4c6fb 23-Nov-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Do not parse context image for HSW

An earlier commit introduced a mechanism to parse the context image to
find the OA context control offset. This resulted in an NPD on haswell
when gem_context was passed into i915_perf_open_ioctl params. Haswell
does not support logical ring contexts, so ensure that the context image
is parsed only for platforms with logical ring contexts and also
validate lrc_reg_state.

v2: Fix build failure
v3: Fix checkpatch error

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7432
Fixes: a5c3a3cbf029 ("drm/i915/perf: Determine gen12 oa ctx offset at runtime")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221123235342.713068-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 95c713d722017b26e301303713d638e0b95b1f68)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 2a76fc89 19-Oct-2022 Andrzej Hajda <andrzej.hajda@intel.com>

drm/i915: call i915_request_await_object from _i915_vma_move_to_active

Since almost all calls to i915_vma_move_to_active are prepended with
i915_request_await_object, let's call the latter from
_i915_vma_move_to_active by default and add flag allowing bypassing it.
Adjust all callers accordingly.
The patch should not introduce functional changes.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019215906.295296-2-andrzej.hajda@intel.com


# 801543b2 09-Nov-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: stop including i915_irq.h from i915_trace.h

Turns out many of the files that need i915_reg.h get it implicitly via
{display/intel_de.h, gt/intel_context.h} -> i915_trace.h -> i915_irq.h
-> i915_reg.h. Since i915_trace.h doesn't actually need i915_irq.h,
makes sense to drop it, but that requires adding quite a few new
includes all over the place.

Prefer including i915_reg.h where needed instead of adding another
implicit include, because eventually we'll want to split up i915_reg.h
and only include the specific registers at each place.

Also some places actually needed i915_irq.h too.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6e78a2e0ac1bffaf5af3b5ccc21dff05e6518cef.1668008071.git.jani.nikula@intel.com


# a10234fd 09-Nov-2022 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Partial abandonment of legacy DRM logging macros

Convert some usages of legacy DRM logging macros into versions which tell
us on which device have the events occurred.

v2:
* Don't have struct drm_device as local. (Jani, Ville)

v3:
* Store gt, not i915, in workaround list. (John)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221109104633.2579245-1-tvrtko.ursulin@linux.intel.com


# 07b444f5 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Enable OA for DG2

OA was disabled for DG2 as support was missing. Enable it back now.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-17-umesh.nerlige.ramappa@intel.com


# 0fa9349d 26-Oct-2022 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: complete programming whitelisting for XEHPSDV

We have an additional register to select which slices contribute to
OAG/OAG counter increments.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-16-umesh.nerlige.ramappa@intel.com


# 01e74274 26-Oct-2022 Vinay Belgaumkar <vinay.belgaumkar@intel.com>

drm/i915/guc: Support OA when Wa_16011777198 is enabled

On DG2, a w/a resets RCS/CCS before it goes into RC6. This breaks OA
since OA does not expect engine resets during its use. Fix it by
disabling RC6.

v2: (Ashutosh)
- Bring back slpc_unset_param helper
- Update commit msg
- Use with_intel_runtime_pm helper for set/unset

v3: (Ashutosh)
- Just use intel_uc_uses_guc_rc

Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-15-umesh.nerlige.ramappa@intel.com


# bc7ed4d3 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Apply Wa_18013179988

OA reports in the OA buffer contain an OA timestamp field that helps
user calculate delta between 2 OA reports. The calculation relies on the
CS timestamp frequency to convert the timestamp value to nanoseconds.
The CS timestamp frequency is a function of the CTC_SHIFT value in
RPM_CONFIG0.

In DG2, OA unit assumes that the CTC_SHIFT is 3, instead of using the
actual value from RPM_CONFIG0. At the user level, this results in an
error in calculating delta between 2 OA reports since the OA timestamp
is not shifted in the same manner as CS timestamp. Also the periodicity
of the reports is different from what the user configured because of
mismatch in the CS and OA frequencies.

The issue also affects MI_REPORT_PERF_COUNT command.

To resolve this, return actual OA timestamp frequency to the user in
i915_getparam_ioctl, so that user can calculate the right OA exponent as
well as interpret the reports correctly.

MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18893

v2:
- Use REG_FIELD_GET (Ashutosh)
- Update commit msg

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-13-umesh.nerlige.ramappa@intel.com


# ed6b25aa 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Add Wa_1508761755:dg2

Disable Clock gating in EU when gathering the events so that EU events
are not lost.

v2: Fix checkpatch issues
v3: User MCR helpers to write to MC reg
v4: Indent correctly (checkpatch)

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-12-umesh.nerlige.ramappa@intel.com


# 90981da6 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Store a pointer to oa_format in oa_buffer

DG2 introduces OA reports with 64 bit report header fields. Perf OA
would need more information about the OA format in order to process such
reports. Store all OA format info in oa_buffer instead of just the size
and format-id.

v2: Drop format_size variable (Ashutosh)

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-11-umesh.nerlige.ramappa@intel.com


# cc85345d 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Use gt-specific ggtt for OA and noa-wait buffers

User passes uabi engine class and instance to the perf OA interface. Use
gt corresponding to the engine to pin the buffers to the right ggtt.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-10-umesh.nerlige.ramappa@intel.com


# 2db609c0 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Replace gt->perf.lock with stream->lock for file ops

With multi-gt, user can access multiple OA buffers concurrently. Use
stream->lock instead of gt->perf.lock to serialize file operations.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-9-umesh.nerlige.ramappa@intel.com


# 9677a9f3 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Move gt-specific data from i915->perf to gt->perf

Make perf part of gt as the OAG buffer is specific to a gt. The refactor
eventually simplifies programming the right OA buffer and the right HW
registers when supporting multiple gts.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-8-umesh.nerlige.ramappa@intel.com


# a5a6d92f 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Simply use stream->ctx

Earlier code used exclusive_stream to check for user passed context.
Simplify this by accessing stream->ctx.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-7-umesh.nerlige.ramappa@intel.com


# cceb0849 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Enable bytes per clock reporting in OA

XEHPSDV and DG2 provide a way to configure bytes per clock vs commands
per clock reporting. Enable bytes per clock setting on enabling OA.

Bspec: 51762
Bspec: 52201

v2:
- Fix commit msg (Ashutosh)
- Fix checkpatch issues

v3:
- s/commands/bytes/ in code comment and commmit msg

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-6-umesh.nerlige.ramappa@intel.com


# a5c3a3cb 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Determine gen12 oa ctx offset at runtime

Some SKUs of same gen12 platform may have different oactxctrl
offsets. For gen12, determine oactxctrl offsets at runtime.

v2: (Lionel)
- Move MI definitions to intel_gpu_commands.h
- Ensure __find_reg_in_lri does read past context image size

v3: (Ashutosh)
- Drop unnecessary use of double underscores
- fix find_reg_in_lri
- Return error if oa context offset is U32_MAX
- Error out if oa_ctx_ctrl_offset does not find offset

v4: (Ashutosh)
- Warn on odd MI LRI_LEN
- Remove unnecessary check for valid_oactxctrl_offset
- Drop valid_oactxctrl_offset macro

v5: Drop unrelated comment

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-5-umesh.nerlige.ramappa@intel.com


# 2d9da585 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Fix noa wait predication for DG2

Predication for batch buffer commands changed in XEHPSDV.
MI_BATCH_BUFFER_START predicates based on MI_SET_PREDICATE_RESULT
register. The MI_SET_PREDICATE_RESULT register can only be modified
with MI_SET_PREDICATE command. When configured, the MI_SET_PREDICATE
command sets MI_SET_PREDICATE_RESULT based on bit 0 of
MI_PREDICATE_RESULT_2. Use this to configure predication in noa_wait.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-4-umesh.nerlige.ramappa@intel.com


# 81d5f7d9 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Add 32-bit OAG and OAR formats for DG2

Add new OA formats for DG2.

MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18893

v2:
- Update commit title (Ashutosh)
- Coding style fixes (Lionel)
- 64 bit OA formats need UMD changes in GPUvis, drop for now and send in a
separate series with UMD changes

v3:
- Update commit message to drop 64 bit related description

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #1
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-3-umesh.nerlige.ramappa@intel.com


# 682aa437 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Fix OA filtering logic for GuC mode

With GuC mode of submission, GuC is in control of defining the context
id field that is part of the OA reports. To filter reports, UMD and KMD
must know what sw context id was chosen by GuC. There is not interface
between KMD and GuC to determine this, so read the upper-dword of
EXECLIST_STATUS to filter/squash OA reports for the specific context.

v2: Explain guc id stealing w.r.t OA use case

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-2-umesh.nerlige.ramappa@intel.com


# f1d8e2bf 07-Oct-2022 Colin Ian King <colin.i.king@gmail.com>

drm/i915/perf: remove redundant variable 'taken'

The assignment to variable taken is redundant and so it can be
removed as well as the variable too.

Cleans up clang-scan build warnings:
warning: Although the value stored to 'taken' is used in the enclosing
expression, the value is never actually read from 'taken'
[deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221007195345.2749911-1-colin.i.king@gmail.com


# 6f10c4d6 29-Aug-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915/perf: replace BUG_ON() with WARN_ON()

Avoid BUG_ON(). Replace with WARN_ON() and early return.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220830093411.1511040-4-jani.nikula@intel.com


# ca437b45 07-Jul-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

i915/perf: Disable OA sseu config param for gfx12.50+

The global sseu config is applicable only to gen11 platforms where
concurrent media, render and OA use cases may cause some subslices to be
turned off and hence lose NOA configuration. Ideally we want to return
ENODEV for non-gen11 platforms, however, this has shipped with gfx12, so
disable only for gfx12.50+.

v2: gfx12 is already shipped with this, disable for gfx12.50+ (Lionel)

v3: (Matt)
- Update commit message and replace "12.5" with "12.50"
- Replace DRM_DEBUG() with driver specific drm_dbg()

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220707193002.2859653-2-umesh.nerlige.ramappa@intel.com


# 2fec5391 07-Jul-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

i915/perf: Replace DRM_DEBUG with driver specific drm_dbg call

DRM_DEBUG is not the right debug call to use in i915 OA, replace it with
driver specific drm_dbg() call (Matt).

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220707193002.2859653-1-umesh.nerlige.ramappa@intel.com


# 18fb42db 13-May-2022 Nathan Chancellor <nathan@kernel.org>

drm/i915: Fix CFI violation with show_dynamic_id()

When an attribute group is created with sysfs_create_group(), the
->sysfs_ops() callback is set to kobj_sysfs_ops, which sets the ->show()
callback to kobj_attr_show(). kobj_attr_show() uses container_of() to
get the ->show() callback from the attribute it was passed, meaning the
->show() callback needs to be the same type as the ->show() callback in
'struct kobj_attribute'.

However, show_dynamic_id() has the type of the ->show() callback in
'struct device_attribute', which causes a CFI violation when opening the
'id' sysfs node under drm/card0/metrics. This happens to work because
the layout of 'struct kobj_attribute' and 'struct device_attribute' are
the same, so the container_of() cast happens to allow the ->show()
callback to still work.

Change the type of show_dynamic_id() to match the ->show() callback in
'struct kobj_attributes' and update the type of sysfs_metric_id to
match, which resolves the CFI violation.

Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220513075136.1027007-1-tvrtko.ursulin@linux.intel.com


# 58606220 13-May-2022 Nathan Chancellor <nathan@kernel.org>

drm/i915: Fix CFI violation with show_dynamic_id()

When an attribute group is created with sysfs_create_group(), the
->sysfs_ops() callback is set to kobj_sysfs_ops, which sets the ->show()
callback to kobj_attr_show(). kobj_attr_show() uses container_of() to
get the ->show() callback from the attribute it was passed, meaning the
->show() callback needs to be the same type as the ->show() callback in
'struct kobj_attribute'.

However, show_dynamic_id() has the type of the ->show() callback in
'struct device_attribute', which causes a CFI violation when opening the
'id' sysfs node under drm/card0/metrics. This happens to work because
the layout of 'struct kobj_attribute' and 'struct device_attribute' are
the same, so the container_of() cast happens to allow the ->show()
callback to still work.

Change the type of show_dynamic_id() to match the ->show() callback in
'struct kobj_attributes' and update the type of sysfs_metric_id to
match, which resolves the CFI violation.

Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220513075136.1027007-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 18fb42db05a0b93ab5dd5eab5315e50eaa3ca620)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# dd4821ba 14-Feb-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915/lrc: move lrc_get_runtime() to intel_lrc.c

Move the static inline next to the only caller.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220214173810.2108975-1-jani.nikula@intel.com


# f2ed8ef3 14-Feb-2022 Ramalingam C <ramalingam.c@intel.com>

drm/i915/perf: Skip the i915_perf_init for dg2

i915_perf is not enabled for dg2 yet, hence skip the feature
initialization.

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220215053115.6023-1-ramalingam.c@intel.com


# 5472b3f2 10-Feb-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: split out i915_file_private.h from i915_drv.h

Limit the scope of struct drm_i915_file_private to the files that
actually need it.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e375859dc1729a1b988036e4103e5b1bd48caa00.1644507885.git.jani.nikula@intel.com


# b508d01f 10-Feb-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: split out i915_gem_internal.h from i915_drv.h

We already have the i915_gem_internal.c file.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6715d1f3232c445990630bb3aac00f279f516fee.1644507885.git.jani.nikula@intel.com


# 0d6419e9 27-Jan-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915: Move GT registers to their own header file

This is a huge, chaotic mass of registers copied over as-is without any
real cleanup. We'll come back and organize these better, align on
consistent coding style, remove dead code, etc. in separate patches
later that will be easier to review.

v2:
- Add missing include in intel_pxp_irq.c
v3:
- Correct a few indentation errors (Lucas)
- Minor conflict resolution

Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-6-matthew.d.roper@intel.com


# e71a7412 27-Jan-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915: Parameterize MI_PREDICATE registers

The various MI_PREDICATE registers have per-engine instances. Today we
only utilize the RCS0 instance of each, but that will likely change in
the future; switch to parameterized register definitions to make these
easier to work with going forward.

Of special note is MI_PREDICATE_RESULT_2; we only use it in one place in
the driver today in HSW-specific code. It turns out that the bspec
(page 94) lists two different offsets for this register on HSW; one is
in the standard location shared by all other platforms (base + 0x3bc)
and the other is an unusual location (0x2214). We're using the second,
non-standard offset in i915 today; that offset doesn't exist on any
other platforms (and it's not even 100% clear that it's correct for HSW)
so I've renamed the current non-standard definition to
HSW_MI_PREDICATE_RESULT_2; the new cross-platform parameterized macro
(which is still unused at the moment) uses the standard offset.

Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-5-matthew.d.roper@intel.com


# 7d296f36 27-Jan-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915: Parameterize R_PWR_CLK_STATE register definition

At the moment we only use R_PWR_CLK_STATE in the context of the RCS
engine, but upcoming support for compute engines will start using
instances relative to the CCS engine base offsets. Let's parameterize
the register and move it to the engine reg header.

Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-4-matthew.d.roper@intel.com


# 66a19a3a 27-Jan-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/perf: Express OA register ranges with i915_range

Let's use 'struct i915_range' to express sets of b-counter and mux
registers in the perf code. This makes the code more similar to how we
handle things like multicast register ranges, forcewake tables, shadow
tables, etc. and also lets us avoid needing symbolic register name
definitions for the various range end points. With this change, many of
the OA register definitions are no longer used in the code, so we can
drop their #define's for simplicity.

v2: Drop 'inline' from reg_in_range_table(). (Jani)

v3: Split the first range in gen12_oa_mux_regs[] so that 0xd08 isn't
whitelisted. (Umesh)

Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-3-matthew.d.roper@intel.com


# 2ef6d3bf 27-Jan-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/perf: Move OA regs to their own header

The OA unit registers are only used by the perf code; move them to their
own header file.

Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-2-matthew.d.roper@intel.com


# 3a5d604f 12-Jan-2022 Colin Ian King <colin.king@intel.com>

i915: make array flex_regs static const

Don't populate the read-only array flex_regs on the stack but
instead it static const. Also makes the object code a little smaller.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220112223435.949071-1-colin.i.king@gmail.com


# 202b1f4c 10-Jan-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/gt: Move engine registers to their own header

Let's continue breaking up and cleaning up the massive i915_reg.h file
by moving all registers that are defined in relation to an engine base
to their own header.

There are probably a bunch of other "engine registers" that we haven't
moved yet (especially those that belong to the render engine in the
0x2??? range), but this is a relatively straightforward first step.

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220111051600.3429104-8-matthew.d.roper@intel.com


# 204129a2 04-Jan-2022 Michał Winiarski <michal.winiarski@intel.com>

drm/i915: Use to_gt() helper for GGTT accesses

GGTT is currently available both through i915->ggtt and gt->ggtt, and we
eventually want to get rid of the i915->ggtt one.
Use to_gt() for all i915->ggtt accesses to help with the future
refactoring.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220104223550.56135-1-andi.shyti@linux.intel.com


# e5a1fd99 21-Jan-2022 Luis Chamberlain <mcgrof@kernel.org>

i915: simplify subdirectory registration with register_sysctl()

There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
- {
- .procname = E1,
- .maxlen = 0,
- .mode = 0555,
- .child = sysctls,
- },
- { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
- {
- .procname = E2,
- .maxlen = 0,
- .mode = 0555,
- .child = subdir,
- },
- { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) };
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Link: https://lkml.kernel.org/r/20211123202422.819032-3-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2cbc876d 14-Dec-2021 Michał Winiarski <michal.winiarski@intel.com>

drm/i915: Use to_gt() helper

Use to_gt() helper consistently throughout the codebase.
Pure mechanical s/i915->gt/to_gt(i915). No functional changes.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211214193346.21231-10-andi.shyti@linux.intel.com


# 78f613ba 28-Jul-2021 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: finish removal of CNL

With all the users removed, finish removing the CNL platform definitions.
We will leave the PCI IDs around as those are exposed to userspace.
Even if mesa doesn't support CNL anymore, let's avoid build breakages
due to changing the headers.

Also, due to drm/i915/gt still using IS_CANNONLAKE() let's just redefine
it instead of removing.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210728215946.1573015-26-lucas.demarchi@intel.com


# 5dae69a9 28-Jul-2021 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: remove GRAPHICS_VER == 10

Replace all remaining handling of GRAPHICS_VER {==,>=} 10 with
{==,>=} 11. With the removal of CNL, there is no platform with graphics
version equals 10.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210728215946.1573015-24-lucas.demarchi@intel.com


# 50a9ea08 21-Jul-2021 Stuart Summers <stuart.summers@intel.com>

drm/i915/xehp: Handle new device context ID format

Xe_HP changes the format of the context ID from past platforms.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721223043.834562-9-matthew.d.roper@intel.com


# a04ea6ae 21-Jul-2021 Jason Ekstrand <jason@jlekstrand.net>

drm/i915: Use a table for i915_init/exit (v2)

If the driver was not fully loaded, we may still have globals lying
around. If we don't tear those down in i915_exit(), we'll leak a bunch
of memory slabs. This can happen two ways: use_kms = false and if we've
run mock selftests. In either case, we have an early exit from
i915_init which happens after i915_globals_init() and we need to clean
up those globals.

The mock selftests case is especially sticky. The load isn't entirely
a no-op. We actually do quite a bit inside those selftests including
allocating a bunch of mock objects and running tests on them. Once all
those tests are complete, we exit early from i915_init(). Perviously,
i915_init() would return a non-zero error code on failure and a zero
error code on success. In the success case, we would get to i915_exit()
and check i915_pci_driver.driver.owner to detect if i915_init exited early
and do nothing. In the failure case, we would fail i915_init() but
there would be no opportunity to clean up globals.

The most annoying part is that you don't actually notice the failure as
part of the self-tests since leaking a bit of memory, while bad, doesn't
result in anything observable from userspace. Instead, the next time we
load the driver (usually for next IGT test), i915_globals_init() gets
invoked again, we go to allocate a bunch of new memory slabs, those
implicitly create debugfs entries, and debugfs warns that we're trying
to create directories and files that already exist. Since this all
happens as part of the next driver load, it shows up in the dmesg-warn
of whatever IGT test ran after the mock selftests.

While the obvious thing to do here might be to call i915_globals_exit()
after selftests, that's not actually safe. The dma-buf selftests call
i915_gem_prime_export which creates a file. We call dma_buf_put() on
the resulting dmabuf which calls fput() on the file. However, fput()
isn't immediate and gets flushed right before syscall returns. This
means that all the fput()s from the selftests don't happen until right
before the module load syscall used to fire off the selftests returns
which is after i915_init(). If we call i915_globals_exit() in
i915_init() after selftests, we end up freeing slabs out from under
objects which won't get released until fput() is flushed at the end of
the module load syscall.

The solution here is to let i915_init() return success early and detect
the early success in i915_exit() and only tear down globals and nothing
else. This way the module loads successfully, regardless of the success
or failure of the tests. Because we've not enumerated any PCI devices,
no device nodes are created and it's entirely useless from userspace.
The only thing the module does at that point is hold on to a bit of
memory until we unload it and i915_exit() is called. Importantly, this
means that everything from our selftests has the ability to properly
flush out between i915_init() and i915_exit() because there is at least
one syscall boundary in between.

In order to handle all the delicate init/exit cases, we convert the
whole thing to a table of init/exit pairs and track the init status in
the new init_progress global. This allows us to ensure that i915_exit()
always tears down exactly the things that i915_init() successfully
initialized. We also allow early-exit of i915_init() without failure by
an init function returning > 0. This is useful for nomodeset, and
selftests. For the mock selftests, we convert them to always return 1
so we get the desired behavior of the driver always succeeding to load
the driver and then properly tearing down the partially loaded driver.

v2 (Tvrtko Ursulin):
- Guard init_funcs[i].exit with GEM_BUG_ON(i >= ARRAY_SIZE(init_funcs))
v2 (Daniel Vetter):
- Update the docstring for i915.mock_selftests

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721152358.2893314-4-jason@jlekstrand.net


# 046d1660 08-Jul-2021 Jason Ekstrand <jason@jlekstrand.net>

drm/i915/gem: Return an error ptr from context_lookup

We're about to start doing lazy context creation which means contexts
get created in i915_gem_context_lookup and we may start having more
errors than -ENOENT.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-23-jason@jlekstrand.net


# 651e7d48 05-Jun-2021 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: replace IS_GEN and friends with GRAPHICS_VER

This was done by the following semantic patch:

@@ expression i915; @@
- INTEL_GEN(i915)
+ GRAPHICS_VER(i915)

@@ expression i915; expression E; @@
- INTEL_GEN(i915) >= E
+ GRAPHICS_VER(i915) >= E

@@ expression dev_priv; expression E; @@
- !IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) != E

@@ expression dev_priv; expression E; @@
- IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) == E

@@
expression dev_priv;
expression from, until;
@@
- IS_GEN_RANGE(dev_priv, from, until)
+ IS_GRAPHICS_VER(dev_priv, from, until)

@def@
expression E;
identifier id =~ "^gen$";
@@
- id = GRAPHICS_VER(E)
+ ver = GRAPHICS_VER(E)

@@
identifier def.id;
@@
- id
+ ver

It also takes care of renaming the variable we assign to GRAPHICS_VER()
so to use "ver" rather than "gen".

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210606045050.103862-2-lucas.demarchi@intel.com


# c92c36ed 21-May-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Move submission_method into intel_gt

Since we setup the submission method for the engines once, it is easy to
assign an enum and use that instead of probing into the backends.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210521183215.65451-3-matthew.brost@intel.com


# 73c1bf0f 11-May-2021 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Enable OA formats for ADL_P

Enable relevant OA formats for ADL_P.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Clinton Taylor <Clinton.A.Taylor@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210512042144.2089071-8-matthew.d.roper@intel.com


# ef4985ba 23-Mar-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Increase ww locking for perf.

We need to lock a few more objects, some temporarily,
add ww lock where needed.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-36-maarten.lankhorst@linux.intel.com


# be0bdd67 05-Mar-2021 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

i915/perf: Start hrtimer only if sampling the OA buffer

SAMPLE_OA parameter enables sampling of OA buffer and results in a call
to init the OA buffer which initializes the OA unit head/tail pointers.
The OA_EXPONENT parameter controls the periodicity of the OA reports in
the OA buffer and results in starting a hrtimer.

Before gen12, all use cases required the use of the OA buffer and i915
enforced this setting when vetting out the parameters passed. In these
platforms the hrtimer was enabled if OA_EXPONENT was passed. This worked
fine since it was implied that SAMPLE_OA is always passed.

With gen12, this changed. Users can use perf without enabling the OA
buffer as in OAR use cases. While an OAR use case should ideally not
start the hrtimer, we see that passing an OA_EXPONENT parameter will
start the hrtimer even though SAMPLE_OA is not specified. This results
in an uninitialized OA buffer, so the head/tail pointers used to track
the buffer are zero.

This itself does not fail, but if we ran a use-case that SAMPLED the OA
buffer previously, then the OA_TAIL register is still pointing to an old
value. When the timer callback runs, it ends up calculating a
wrong/large number of available reports. Since we do a spinlock_irq_save
and start processing a large number of reports, NMI watchdog fires and
causes a crash.

Start the timer only if SAMPLE_OA is specified.

v2:
- Drop SAMPLE OA check when appending samples (Ashutosh)
- Prevent read if OA buffer is not being sampled

Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210305210947.58751-1-umesh.nerlige.ramappa@intel.com


# e074ffe6 10-Feb-2021 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

i915/perf: Drop the check for report reason in OA

After fixing the OA_TAIL_PTR corruption, there are no more reports with
reason field of zero. Drop the check for report reason.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210210190106.8586-1-umesh.nerlige.ramappa@intel.com


# 5e4b7385 08-Feb-2021 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

i915/perf: Add additional OA formats for gen12

Gen12 supports additional OA formats as compared to what was added
earlier. Include the additional OA formats.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210208174029.45621-3-umesh.nerlige.ramappa@intel.com


# 0f15c5b0 08-Feb-2021 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

i915/perf: Move OA formats to single array

Variations in OA formats in the different gens has led to creation of
several sparse arrays to store the formats.

Move oa formats into a single array and format_mask to check for
platform specific oa formats.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210208174029.45621-2-umesh.nerlige.ramappa@intel.com


# 77892f4f 08-Feb-2021 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

i915/perf: Store a mask of valid OA formats for a platform

Validity of an OA format is checked by using a sparse array of formats
per gen. Instead maintain a mask of supported formats for a platform in
the perf object.

v2: Use set_bit and test_bit style for the format_mask (Chris)

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210208174029.45621-1-umesh.nerlige.ramappa@intel.com


# 6a77c6bb 05-Mar-2021 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

i915/perf: Start hrtimer only if sampling the OA buffer

SAMPLE_OA parameter enables sampling of OA buffer and results in a call
to init the OA buffer which initializes the OA unit head/tail pointers.
The OA_EXPONENT parameter controls the periodicity of the OA reports in
the OA buffer and results in starting a hrtimer.

Before gen12, all use cases required the use of the OA buffer and i915
enforced this setting when vetting out the parameters passed. In these
platforms the hrtimer was enabled if OA_EXPONENT was passed. This worked
fine since it was implied that SAMPLE_OA is always passed.

With gen12, this changed. Users can use perf without enabling the OA
buffer as in OAR use cases. While an OAR use case should ideally not
start the hrtimer, we see that passing an OA_EXPONENT parameter will
start the hrtimer even though SAMPLE_OA is not specified. This results
in an uninitialized OA buffer, so the head/tail pointers used to track
the buffer are zero.

This itself does not fail, but if we ran a use-case that SAMPLED the OA
buffer previously, then the OA_TAIL register is still pointing to an old
value. When the timer callback runs, it ends up calculating a
wrong/large number of available reports. Since we do a spinlock_irq_save
and start processing a large number of reports, NMI watchdog fires and
causes a crash.

Start the timer only if SAMPLE_OA is specified.

v2:
- Drop SAMPLE OA check when appending samples (Ashutosh)
- Prevent read if OA buffer is not being sampled

Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210305210947.58751-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit be0bdd67fda9468156c733976688f6487d0c42f7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# f170523a 22-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Consolidate the CS timestamp clocks

Pull the GT clock information [used to derive CS timestamps and PM
interval] under the GT so that is it local to the users. In doing so, we
consolidate the two references for the same information, of which the
runtime-info took note of a potential clock source override and scaling
factors.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201223122359.22562-2-chris@chris-wilson.co.uk


# a0d3fdb6 18-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Split logical ring contexts from execlist submission

Split the definition, construction and updating of the Logical Ring
Context from the execlist submission interface. The LRC is used by the
HW, irrespective of our different submission backends.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201219020343.22681-1-chris@chris-wilson.co.uk


# 45233ab2 16-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Move gen8 CS emitters into gen8_engine_cs.h

Reduce the pollution of intel_engine.h by moving gen8_emit_pipe_control
and friends to gen8_engine_cs.h

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201216135452.6063-1-chris@chris-wilson.co.uk


# 70a2b431 09-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Rename lrc.c to execlists_submission.c

We want to separate the utility functions for controlling the logical
ring context from the execlists submission mechanism (which is an
overgrown scheduler).

This is similar to Daniele's work to split up the files, but being
selfish I wanted to base it after my own changes to intel_lrc.c petered
out.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201209233618.4287-2-chris@chris-wilson.co.uk


# fa5d598b 25-Nov-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: also include Gen11 in OATAILPTR workaround

CI shows this workaround is also needed on Gen11.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 059a0beb486344 ("drm/i915/perf: workaround register corruption in OATAILPTR")
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126105155.540350-1-lionel.g.landwerlin@intel.com


# 8d989f44 04-Nov-2020 Deepak R Varma <mh12gx2825@gmail.com>

drm/i915/perf: replace idr_init() by idr_init_base()

idr_init() uses base 0 which is an invalid identifier. The new function
idr_init_base allows IDR to set the ID lookup from base 1. This avoids
all lookups that otherwise starts from 0 since 0 is always unused.

References: commit 6ce711f27500 ("idr: Make 1-based IDRs more efficient")

Signed-off-by: Deepak R Varma <mh12gx2825@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201104150339.GA68663@localhost


# dd0e2193 25-Nov-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: also include Gen11 in OATAILPTR workaround

CI shows this workaround is also needed on Gen11.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 059a0beb486344 ("drm/i915/perf: workaround register corruption in OATAILPTR")
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126105155.540350-1-lionel.g.landwerlin@intel.com
(cherry picked from commit fa5d598b8cbab0af92bac48fd60e74a893550923)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 0305613d 17-Nov-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: workaround register corruption in OATAILPTR

After having written the entire OA buffer with reports, the HW will
write again at the beginning of the OA buffer. It'll indicate it by
setting the WRAP bits in the OASTATUS register.

When a wrap happens and that at the end of the read vfunc we write the
OASTATUS register back to clear the REPORT_LOST bit, we sometimes see
that the OATAILPTR register is reset to a previous position on Gen8/9
(apparently not the case on Gen11+). This leads the next call to the
read vfunc to process reports we've already read. Because we've marked
those as read by clearing the reason & timestamp dwords, they're
discarded and a "Skipping spurious, invalid OA report" message is
emitted.

The workaround to avoid this OATAILPTR value reset seems to be to set
the wrap bits when writing back OASTATUS.

This change has no impact on userspace, it only avoids a bunch of
DRM_NOTE("Skipping spurious, invalid OA report\n") messages.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 19f81df2859eb1 ("drm/i915/perf: Add OA unit support for Gen 8+")
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117130124.829979-1-lionel.g.landwerlin@intel.com
(cherry picked from commit 059a0beb486344a577ff476acce75e69eab704be)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 059a0beb 17-Nov-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: workaround register corruption in OATAILPTR

After having written the entire OA buffer with reports, the HW will
write again at the beginning of the OA buffer. It'll indicate it by
setting the WRAP bits in the OASTATUS register.

When a wrap happens and that at the end of the read vfunc we write the
OASTATUS register back to clear the REPORT_LOST bit, we sometimes see
that the OATAILPTR register is reset to a previous position on Gen8/9
(apparently not the case on Gen11+). This leads the next call to the
read vfunc to process reports we've already read. Because we've marked
those as read by clearing the reason & timestamp dwords, they're
discarded and a "Skipping spurious, invalid OA report" message is
emitted.

The workaround to avoid this OATAILPTR value reset seems to be to set
the wrap bits when writing back OASTATUS.

This change has no impact on userspace, it only avoids a bunch of
DRM_NOTE("Skipping spurious, invalid OA report\n") messages.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 19f81df2859eb1 ("drm/i915/perf: Add OA unit support for Gen 8+")
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117130124.829979-1-lionel.g.landwerlin@intel.com


# e9d2871f 16-Nov-2020 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

drm: fix some kernel-doc markups

Some identifiers have different names between their prototypes
and the kernel-doc markup.

Others need to be fixed, as kernel-doc markups should use this format:
identifier - description

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/12d4ca26f6843618200529ce5445063734d38c04.1605521731.git.mchehab+huawei@kernel.org


# f00ecc2e 19-Aug-2020 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Convert i915_perf to ww locking as well

We have the ordering of timeline->mutex vs resv_lock wrong,
convert the i915_pin_vma and intel_context_pin as well to
future-proof this.

We may need to do future changes to do this more transaction-like,
and only get down to a single i915_gem_ww_ctx, but for now this
should work.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-18-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# aee62e02 09-Jul-2020 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Use GTT when saving/restoring engine GPR

MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which
translation to use when saving restoring the engine general purpose
registers to and from the GT scratch. Since GT scratch is mapped to
ggtt, we need to set an additional bit in the command to use GTT.

Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations")
Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200709224504.11345-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit e43ff99c8deda85234e6233e0f4af6cb09566a37)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# e43ff99c 09-Jul-2020 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Use GTT when saving/restoring engine GPR

MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM need to know which
translation to use when saving restoring the engine general purpose
registers to and from the GT scratch. Since GT scratch is mapped to
ggtt, we need to set an additional bit in the command to use GTT.

Fixes: daed3e44396d17 ("drm/i915/perf: implement active wait for noa configurations")
Suggested-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200709224504.11345-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 89d19b2b 08-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Release shortlived maps of longlived objects

Some objects we map once during their construction, and then never
access their mappings again, even if they are kept around for the
duration of the driver. Keeping those pages mapped, often vmapped, is
therefore wasteful and we should release the maps as soon as we no
longer need them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-3-chris@chris-wilson.co.uk


# 0b6613c6 07-Jul-2020 Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>

drm/i915/sseu: Move sseu_info under gt_info

SSEUs are a GT capability, so track them under gt_info.

Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-8-daniele.ceraolospurio@intel.com


# c1e8d7c6 08-Jun-2020 Michel Lespinasse <walken@google.com>

mmap locking API: convert mmap_sem comments

Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 802a5820 02-Mar-2020 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Extract i915_cs_timestamp_{ns_to_ticks,tick_to_ns}()

Pull the code to do the CS timestamp ns<->ticks conversion into
helpers and use them all over.

The check in i915_perf_noa_delay_set() seems a bit dubious,
so we switch it to do what I assume it wanted to do all along
(ie. make sure the resulting delay in CS timestamp ticks
doesn't exceed 32bits)?

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200302143943.32676-5-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 56f1b31f 02-Mar-2020 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Store CS timestamp frequency in Hz

kHz isn't accurate enough for storing the CS timestamp
frequency on some of the platforms. Store the value
in Hz instead.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200302143943.32676-2-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>


# 2e270158 02-Mar-2020 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Nuke pointless div by 64bit

Bunch of places use a 64bit divisor needlessly. Switch
to 32bit divisor.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200302143943.32676-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 1bc6a601 28-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Track inflight CCID

The presumption is that by using a circular counter that is twice as
large as the maximum ELSP submission, we would never reuse the same CCID
for two inflight contexts.

However, if we continually preempt an active context such that it always
remains inflight, it can be resubmitted with an arbitrary number of
paired contexts. As each of its paired contexts will use a new CCID,
eventually it will wrap and submit two ELSP with the same CCID.

Rather than use a simple circular counter, switch over to a small bitmap
of inflight ids so we can avoid reusing one that is still potentially
active.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339c4 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-2-chris@chris-wilson.co.uk
(cherry picked from commit 5c4a53e3b1cbc38d0906e382f1037290658759bb)
(cherry picked from commit 134711240307d5586ae8e828d2699db70a8b74f2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 53b2622e 28-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Avoid reusing the same logical CCID

The bspec is confusing on the nature of the upper 32bits of the LRC
descriptor. Once upon a time, it said that it uses the upper 32b to
decide if it should perform a lite-restore, and so we must ensure that
each unique context submitted to HW is given a unique CCID [for the
duration of it being on the HW]. Currently, this is achieved by using
a small circular tag, and assigning every context submitted to HW a
new id. However, this tag is being cleared on repinning an inflight
context such that we end up re-using the 0 tag for multiple contexts.

To avoid accidentally clearing the CCID in the upper 32bits of the LRC
descriptor, split the descriptor into two dwords so we can update the
GGTT address separately from the CCID.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339c4 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk
(cherry picked from commit 2632f174a2e1a5fd40a70404fa8ccfd0b1f79ebd)
(cherry picked from commit a4b70fcc587860f4b972f68217d8ebebe295ec15)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 598caf1a 23-Apr-2020 Al Viro <viro@zeniv.linux.org.uk>

i915: alloc_oa_regs(): get rid of pointless access_ok()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 5c4a53e3 28-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Track inflight CCID

The presumption is that by using a circular counter that is twice as
large as the maximum ELSP submission, we would never reuse the same CCID
for two inflight contexts.

However, if we continually preempt an active context such that it always
remains inflight, it can be resubmitted with an arbitrary number of
paired contexts. As each of its paired contexts will use a new CCID,
eventually it will wrap and submit two ELSP with the same CCID.

Rather than use a simple circular counter, switch over to a small bitmap
of inflight ids so we can avoid reusing one that is still potentially
active.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339c4 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-2-chris@chris-wilson.co.uk


# 2632f174 28-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Avoid reusing the same logical CCID

The bspec is confusing on the nature of the upper 32bits of the LRC
descriptor. Once upon a time, it said that it uses the upper 32b to
decide if it should perform a lite-restore, and so we must ensure that
each unique context submitted to HW is given a unique CCID [for the
duration of it being on the HW]. Currently, this is achieved by using
a small circular tag, and assigning every context submitted to HW a
new id. However, this tag is being cleared on repinning an inflight
context such that we end up re-using the 0 tag for multiple contexts.

To avoid accidentally clearing the CCID in the upper 32bits of the LRC
descriptor, split the descriptor into two dwords so we can update the
GGTT address separately from the CCID.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339c4 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk


# b4892e44 23-Apr-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Make define for lrc state offset

More often than not, we need a byte offset into lrc
register state from the start of the hw state. Make it so.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200423182355.21837-3-mika.kuoppala@linux.intel.com


# 4e3d3456 02-Apr-2020 Alexey Budankov <alexey.budankov@linux.intel.com>

drm/i915/perf: Open access for CAP_PERFMON privileged process

Open access to i915_perf monitoring for CAP_PERFMON privileged process.
Providing the access under CAP_PERFMON capability singly, without the
rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
credentials and makes operation more secure.

CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
principle of least privilege: A security design principle that states
that a process or program be granted only those privileges (e.g.,
capabilities) necessary to accomplish its legitimate function, and only
for the time that such privileges are actually required)

For backward compatibility reasons access to i915_events subsystem remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure i915_events monitoring is discouraged with respect to CAP_PERFMON
capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Igor Lubashev <ilubashe@akamai.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-doc@vger.kernel.org
Cc: linux-man@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: selinux@vger.kernel.org
Link: http://lore.kernel.org/lkml/e3e3292f-f765-ea98-e59c-fbe2db93fd34@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# bcad588d 08-Apr-2020 Ashutosh Dixit <ashutosh.dixit@intel.com>

drm/i915/perf: Do not clear pollin for small user read buffers

It is wrong to block the user thread in the next poll when OA data is
already available which could not fit in the user buffer provided in
the previous read. In several cases the exact user buffer size is not
known. Blocking user space in poll can lead to data loss when the
buffer size used is smaller than the available data.

This change fixes this issue and allows user space to read all OA data
even when using a buffer size smaller than the available data using
multiple non-blocking reads rather than staying blocked in poll till
the next timer interrupt.

v2: Fix ret value for blocking reads (Umesh)
v3: Mistake during patch send (Ashutosh)
v4: Remove -EAGAIN from comment (Umesh)
v5: Improve condition for clearing pollin and return (Lionel)
v6: Improve blocking read loop and other cleanups (Lionel)
v7: Added Cc stable

Testcase: igt/perf/polling-small-buf
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200403010120.3067-1-ashutosh.dixit@intel.com
(cherry-picked from commit 6352219c39c04ed3f9a8d1cf93f87c21753a213e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 442dbc5c 06-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make exclusive awaits on i915_active optional

Later use will require asynchronous waits on the active timelines, but
will not utilize an async wait on the exclusive channel. Make the await
on the exclusive fence explicit in the selection flags.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200406155840.1728-1-chris@chris-wilson.co.uk


# 6352219c 02-Apr-2020 Ashutosh Dixit <ashutosh.dixit@intel.com>

drm/i915/perf: Do not clear pollin for small user read buffers

It is wrong to block the user thread in the next poll when OA data is
already available which could not fit in the user buffer provided in
the previous read. In several cases the exact user buffer size is not
known. Blocking user space in poll can lead to data loss when the
buffer size used is smaller than the available data.

This change fixes this issue and allows user space to read all OA data
even when using a buffer size smaller than the available data using
multiple non-blocking reads rather than staying blocked in poll till
the next timer interrupt.

v2: Fix ret value for blocking reads (Umesh)
v3: Mistake during patch send (Ashutosh)
v4: Remove -EAGAIN from comment (Umesh)
v5: Improve condition for clearing pollin and return (Lionel)
v6: Improve blocking read loop and other cleanups (Lionel)
v7: Added Cc stable

Testcase: igt/perf/polling-small-buf
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200403010120.3067-1-ashutosh.dixit@intel.com


# d16e137e 29-Mar-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: don't read head/tail pointers outside critical section

Reading or writing those fields should only happen under
stream->oa_buffer.ptr_lock.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d1df41eb72ef ("drm/i915/perf: rework aging tail workaround")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200330091411.37357-1-lionel.g.landwerlin@intel.com


# d7d50f80 27-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Schedule oa_config after modifying the contexts

We wish that the scheduler emit the context modification commands prior
to enabling the oa_config, for which we must explicitly inform it of the
ordering constraints. This is especially important as we now wait for
the final oa_config setup to be completed and as this wait may be on a
distinct context to the state modifications, we need that command packet
to be always last in the queue.

We borrow the i915_active for its ability to track multiple timelines
and the last dma_fence on each; a flexible dma_resv. Keeping track of
each dma_fence is important for us so that we can efficiently schedule
the requests and reprioritise as required.

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200327112212.16046-3-chris@chris-wilson.co.uk


# 4ef10fe0 24-Mar-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: add new open param to configure polling of OA buffer

This new parameter let's the application choose how often the OA
buffer should be checked on the CPU side for data availability. Longer
polling period tend to reduce CPU overhead if the application does not
care about somewhat real time data collection.

v2: Allow disabling polling completely with 0 value (Lionel)
v3: Version the new parameter (Joonas)
v4: Rebase (Umesh)
v5: Make poll delay value of 0 invalid (Umesh)
v6:
- Describe poll_oa_period (Ashutosh)
- Fix comment for new poll parameter (Lionel)
- Drop open_flags in read_properties_unlocked (Lionel)
- Rename uapi parameter (Ashutosh)
v7: Reword the comment in uapi (Ashutosh)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200324185457.14635-4-umesh.nerlige.ramappa@intel.com


# c51dbc6e 24-Mar-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: move pollin setup to non hw specific code

This isn't really gen specific stuff, so just move it to the common
code.

v2: Rebase (Umesh)
v3: Remove comment, pollin is a per stream state already (Ashutosh)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200324185457.14635-3-umesh.nerlige.ramappa@intel.com


# d1df41eb 24-Mar-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: rework aging tail workaround

We're about to introduce an options to open the perf stream, giving
the user ability to configure how often it wants the kernel to poll
the OA registers for available data.

Right now the workaround against the OA tail pointer race condition
requires at least twice the internal kernel polling timer to make any
data available.

This changes introduce checks on the OA data written into the circular
buffer to make as much data as possible available on the first
iteration of the polling timer.

v2: Use OA_TAKEN macro without the gtt_offset (Lionel)
v3: (Umesh)
- Rebase
- Change report to report32 from below review
https://patchwork.freedesktop.org/patch/330704/?series=66697&rev=1
v4: (Ashutosh, Lionel)
- Fix checkpatch errors
- Fix aging_timestamp initialization
- Check for only one valid landed report
- Fix check for unlanded report
v5: (Ashutosh)
- Fix bug in accurately determining landed report.
- Optimize the check for landed reports by going as far as the
previously determined aged tail.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200324185457.14635-2-umesh.nerlige.ramappa@intel.com


# c06aa1b4 09-Mar-2020 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Invalidate OA TLB on when closing perf stream

On running several back to back perf capture sessions involving closing
and opening the perf stream, invalid OA reports are seen in the
beginning of the OA buffer in some sessions. Fix this by invalidating OA
TLB when the perf stream is closed or disabled on gen12.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309211057.38575-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit a639b0c15065df930467695b76ef38d5edaed049)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# a639b0c1 09-Mar-2020 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Invalidate OA TLB on when closing perf stream

On running several back to back perf capture sessions involving closing
and opening the perf stream, invalid OA reports are seen in the
beginning of the OA buffer in some sessions. Fix this by invalidating OA
TLB when the perf stream is closed or disabled on gen12.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309211057.38575-1-umesh.nerlige.ramappa@intel.com


# 11ecbddd 17-Mar-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: introduce global sseu pinning

On Gen11 powergating half the execution units is a functional
requirement when using the VME samplers. Not fullfilling this
requirement can lead to hangs.

This unfortunately plays fairly poorly with the NOA requirements. NOA
requires a stable power configuration to maintain its configuration.

As a result using OA (and NOA feeding into it) so far has required us
to use a power configuration that can work for all contexts. The only
power configuration fullfilling this is powergating half the execution
units.

This makes performance analysis for 3D workloads somewhat pointless.

Failing to find a solution that would work for everybody, this change
introduces a new i915-perf stream open parameter that punts the
decision off to userspace. If this parameter is omitted, the existing
Gen11 behavior remains (half EU array powergating).

This change takes the initiative to move all perf related sseu
configuration into i915_perf.c

v2: Make parameter priviliged if different from default

v3: Fix context modifying its sseu config while i915-perf is enabled

v4: Always consider global sseu a privileged operation (Tvrtko)
Override req_sseu point in intel_sseu_make_rpcs() (Tvrtko)
Remove unrelated changes (Tvrtko)

v5: Some typos (Tvrtko)
Process sseu param in read_properties_unlocked() (Tvrtko)

v6: Actually commit the bits from v5...
Fixup some checkpath warnings

v7: Only compare engine uabi field (Chris)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200317132222.2638719-3-lionel.g.landwerlin@intel.com


# 371aba6e 17-Mar-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: remove redundant power configuration register override

The caller of i915_oa_init_reg_state() already sets this.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200317132222.2638719-2-lionel.g.landwerlin@intel.com


# 9aba9c18 17-Mar-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: remove generated code

A little bit of history :

Back when i915-perf was introduced (4.13), there was no way to
dynamically add new OA configurations to i915. Only the generated
configs baked in at build time were allowed.

It quickly became obvious that we would need to allow applications
to upload their own configurations, for instance to be able to test
new ones, and so by the next stable version (4.14) we added uAPIs
to allow uploading new configurations.

When adding that capability, we took the opportunity to remove most
HW configurations except the TestOa one which is a configuration
IGT would rely on to verify that the HW is outputting correct
values. At the time it made sense to have that confiuration in at
the same time a given HW platform added to the i915-perf driver.

Now that IGT has become the reference point for HW configurations (see
commit 53f8f541ca ("lib: Add i915_perf library"), previously this was
located in the GPUTop repository), the need for having those
configurations in i915-perf is gone.

On the Mesa side, we haven't relied on this test configuration for a
while. The MDAPI library always required 4.14 feature level and always
loaded its configuration into i915.

I'm sure nobody will miss this generated stuff in i915 :)

v2: Fix selftests by creating an empty config

v3: Fix unlocking on allocation error (Dan Carpenter)

v4: Fixup checkpatch warnings

v5: Fix incorrect unlock in error path (Umesh)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200317132222.2638719-1-lionel.g.landwerlin@intel.com


# 08f56f8f 02-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Reintroduce wait on OA configuration completion

We still need to wait for the initial OA configuration to happen
before we enable OA report writes to the OA buffer.

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1356
Testcase: igt/perf/stream-open-close
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200302085812.4172450-7-chris@chris-wilson.co.uk
(cherry picked from commit 4b4e973d5eb89244b67d3223b60f752d0479f253)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 4b4e973d 02-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Reintroduce wait on OA configuration completion

We still need to wait for the initial OA configuration to happen
before we enable OA report writes to the OA buffer.

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1356
Testcase: igt/perf/stream-open-close
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200302085812.4172450-7-chris@chris-wilson.co.uk


# d236e2ac 27-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Manually acquire engine-wakeref around use of kernel_context

The engine->kernel_context is a special case for request emission. Since
it is used as the barrier within the engine's wakeref, we must acquire the
wakeref before submitting a request to the kernel_context.

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-3-chris@chris-wilson.co.uk


# a5af081d 27-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Mark up the racy use of perf->exclusive_stream

Inside the general i915_oa_init_reg_state() we avoid using the
perf->mutex. However, we rely on perf->exclusive_stream being valid to
access at that point, and for that we have to control the race with
disabling perf. This relies on the disabling being a heavy barrier that
inspects all active contexts, after marking the perf->exclusive_stream
as not available. This should ensure that there are no more concurrent
accesses to the perf->exclusive_stream as we destroy it.

Mark up the races around the perf->exclusive_stream so that they stand
out much more. (And hopefully we will be running kcsan to start
validating that the only races we have are carefully controlled.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-2-chris@chris-wilson.co.uk


# 0bf85735 18-Feb-2020 Wambui Karuga <wambui.karugax@gmail.com>

drm/i915/perf: conversion to struct drm_device based logging macros.

Manual conversion of instances of printk based drm logging macros to the
struct drm_device based logging macros in i915/i915_perf.c.
Also involves extraction of the struct drm_i915_private device from
various intel types for use in the macros.

Instances of the DRM_DEBUG printk macro were not converted due to the
lack of an analogous struct drm_device based logging macro.

v2: remove instances of DRM_DEBUG that were converted.

References: https://lists.freedesktop.org/archives/dri-devel/2020-January/253381.html
Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200218173936.19664-1-wambui.karugax@gmail.com


# 00376ccf 30-Jan-2020 Wambui Karuga <wambui.karugax@gmail.com>

drm/i915: conversion to drm_device logging macros when drm_i915_private is present.

Converts various instances of the printk drm logging macros to the
struct drm_device based logging macros in the drm/i915 folder using the
following coccinelle script that transforms based on the existence of
the struct drm_i915_private device pointer:
@@
identifier fn, T;
@@

fn(...) {
...
struct drm_i915_private *T = ...;
<+...
(
-DRM_INFO(
+drm_info(&T->drm,
...)
|
-DRM_ERROR(
+drm_err(&T->drm,
...)
|
-DRM_WARN(
+drm_warn(&T->drm,
...)
|
-DRM_DEBUG_KMS(
+drm_dbg_kms(&T->drm,
...)
|
-DRM_DEBUG_DRIVER(
+drm_dbg(&T->drm,
...)
|
-DRM_DEBUG_ATOMIC(
+drm_dbg_atomic(&T->drm,
...)
)
...+>
}

@@
identifier fn, T;
@@

fn(...,struct drm_i915_private *T,...) {
<+...
(
-DRM_INFO(
+drm_info(&T->drm,
...)
|
-DRM_ERROR(
+drm_err(&T->drm,
...)
|
-DRM_WARN(
+drm_warn(&T->drm,
...)
|
-DRM_DEBUG_DRIVER(
+drm_dbg(&T->drm,
...)
|
-DRM_DEBUG_KMS(
+drm_dbg_kms(&T->drm,
...)
|
-DRM_DEBUG_ATOMIC(
+drm_dbg_atomic(&T->drm,
...)
)
...+>
}

Checkpatch warnings were fixed manually.

Instances of the DRM_DEBUG macro were not converted due to lack of a
consensus of an analogous struct drm_device based macro.

References: https://lists.freedesktop.org/archives/dri-devel/2020-January/253381.html
Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200131093416.28431-2-wambui.karugax@gmail.com


# 6f280b13 23-Jan-2020 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Fix OA context id overlap with idle context id

Engine context pinned in perf OA was set to same context id as
the idle context. Set the context id to an unused value.

Clear the sw context id field in lrc descriptor before ORing with
ce->tag (Chris)

Closes: https://gitlab.freedesktop.org/drm/intel/issues/756
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124013701.40609-1-umesh.nerlige.ramappa@intel.com


# a9f236d1 14-Jan-2020 Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>

drm/i915: Make WARN* drm specific where uncore or stream ptr is available

drm specific WARN* calls include device information in the
backtrace, so we know what device the warnings originate from.

Covert all the calls of WARN* with device specific drm_WARN*
variants in functions where intel_uncore/i915_perf_stream struct
pointer is readily available.

The conversion was done automatically with below coccinelle semantic
patch. checkpatch errors/warnings are fixed manually.

@@
identifier func, T;
@@
func(...) {
...
struct intel_uncore *T = ...;
<...
(
-WARN(
+drm_WARN(&T->i915->drm,
...)
|
-WARN_ON(
+drm_WARN_ON(&T->i915->drm,
...)
|
-WARN_ONCE(
+drm_WARN_ONCE(&T->i915->drm,
...)
|
-WARN_ON_ONCE(
+drm_WARN_ON_ONCE(&T->i915->drm,
...)
)
...>

}

@@
identifier func, T;
@@
func(struct intel_uncore *T,...) {
<...
(
-WARN(
+drm_WARN(&T->i915->drm,
...)
|
-WARN_ON(
+drm_WARN_ON(&T->i915->drm,
...)
|
-WARN_ONCE(
+drm_WARN_ONCE(&T->i915->drm,
...)
|
-WARN_ON_ONCE(
+drm_WARN_ON_ONCE(&T->i915->drm,
...)
)
...>

}

@@
identifier func, T;
@@
func(struct i915_perf_stream *T,...) {
+struct drm_i915_private *i915 = T->perf->i915;
<+...
(
-WARN(
+drm_WARN(&i915->drm,
...)
|
-WARN_ON(
+drm_WARN_ON(&i915->drm,
...)
|
-WARN_ONCE(
+drm_WARN_ONCE(&i915->drm,
...)
|
-WARN_ON_ONCE(
+drm_WARN_ON_ONCE(&i915->drm,
...)
)
...+>

}

command: ls drivers/gpu/drm/i915/*.c | xargs spatch --sp-file <script> \
--linux-spacing --in-place

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115034455.17658-11-pankaj.laxminarayan.bharadiya@intel.com


# feed5c7b 09-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pin the context as we work on it

Since we now allow the intel_context_unpin() to run unserialised, we
risk our operations under the intel_context_lock_pinned() being run as
the context is unpinned (and thus invalidating our state). We can
atomically acquire the pin, testing to see if it is pinned in the
process, thus ensuring that the state remains consistent during the
course of the whole operation.

Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200109085142.871563-1-chris@chris-wilson.co.uk


# e6ba7648 21-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove i915->kernel_context

Allocate only an internal intel_context for the kernel_context, forgoing
a global GEM context for internal use as we only require a separate
address space (for our own protection).

Now having weaned GT from requiring ce->gem_context, we can stop
referencing it entirely. This also means we no longer have to create random
and unnecessary GEM contexts for internal use.

GEM contexts are now entirely for tracking GEM clients, and intel_context
the execution environment on the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk


# 9f3ccd40 20-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop GEM context as a direct link from i915_request

Keep the intel_context as being the primary state for i915_request, with
the GEM context a backpointer from the low level state for the rarer
cases we need client information. Our goal is to remove such references
to clients from the backend, and leave the HW submission agnostic to
client interfaces and self-contained.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk


# 3dc716fd 13-Dec-2019 Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>

drm/i915/perf: Register sysctl path globally

We do not require to register the sysctl paths per instance,
so making registration global.

v2: make sysctl path register and unregister function driver
specific (Tvrtko and Lucas).

Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-1-venkata.s.dhanalakota@intel.com


# 177e876a 06-Dec-2019 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Configure OAR for specific context

Gen12 supports saving/restoring render counters per context. Apply OAR
configuration only for the context that is passed in to perf.

v2:
- Fix OACTXCONTROL value to only stop/resume counters.
- Remove gen12_update_reg_state_unlocked as power state is already
applied by the caller.

v3: (Lionel)
- Move register initialization into the array
- Assume a valid oa_config in enable_metric_set

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191206194339.31356-2-umesh.nerlige.ramappa@intel.com
(cherry picked from commit ccdeed497042676e13fc1625e2a341880eff5da5)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 2a264a0f 06-Dec-2019 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Allow non-privileged access when OA buffer is not sampled

SAMPLE_OA_REPORT enables sampling of OA reports from the OA buffer.
Since reports from OA buffer had system wide visibility, collecting
samples from the OA buffer was a privileged operation on previous
platforms. Prior to TGL, it was also necessary to sample the OA buffer
to normalize reports from MI REPORT PERF COUNT.

TGL has a dedicated OAR unit to sample perf reports for a specific
render context. This removes the necessity to sample OA buffer.

- If not sampling the OA buffer, allow non-privileged access. An earlier
patch allows the non-privilege access:
https://patchwork.freedesktop.org/patch/337716/?series=68582&rev=1
- Clear up the path for non-privileged access in this patch

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191206194339.31356-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 322d56aa3145a28445907ecc638a2c3aa3295c6b)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# ccdeed49 06-Dec-2019 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Configure OAR for specific context

Gen12 supports saving/restoring render counters per context. Apply OAR
configuration only for the context that is passed in to perf.

v2:
- Fix OACTXCONTROL value to only stop/resume counters.
- Remove gen12_update_reg_state_unlocked as power state is already
applied by the caller.

v3: (Lionel)
- Move register initialization into the array
- Assume a valid oa_config in enable_metric_set

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191206194339.31356-2-umesh.nerlige.ramappa@intel.com


# 322d56aa 06-Dec-2019 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Allow non-privileged access when OA buffer is not sampled

SAMPLE_OA_REPORT enables sampling of OA reports from the OA buffer.
Since reports from OA buffer had system wide visibility, collecting
samples from the OA buffer was a privileged operation on previous
platforms. Prior to TGL, it was also necessary to sample the OA buffer
to normalize reports from MI REPORT PERF COUNT.

TGL has a dedicated OAR unit to sample perf reports for a specific
render context. This removes the necessity to sample OA buffer.

- If not sampling the OA buffer, allow non-privileged access. An earlier
patch allows the non-privilege access:
https://patchwork.freedesktop.org/patch/337716/?series=68582&rev=1
- Clear up the path for non-privileged access in this patch

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191206194339.31356-1-umesh.nerlige.ramappa@intel.com


# c415ef2a 03-Dec-2019 Mao Wenan <maowenan@huawei.com>

drm/i915/perf: drop pointless static qualifier in i915_perf_add_config_ioctl()

There is no need to have the 'T *v' variable static
since new value always be assigned before use it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191204010154.152396-1-maowenan@huawei.com


# de5825be 25-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Serialise with engine-pm around requests on the kernel_context

As the engine->kernel_context is used within the engine-pm barrier, we
have to be careful when emitting requests outside of the barrier, as the
strict timeline locking rules do not apply. Instead, we must ensure the
engine_park() cannot be entered as we build the request, which is
simplest by taking an explicit engine-pm wakeref around the request
construction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-1-chris@chris-wilson.co.uk


# 7e89d508 13-Nov-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: don't forget noa wait after oa config

I'm observing incoherence metric values, changing from run to run.

It appears the patches introducing noa wait & reconfiguration from
command stream switched places in the series multiple times during the
review. This lead to the dependency of one onto the order to go
missing...

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191113154639.27144-1-lionel.g.landwerlin@intel.com
(cherry picked from commit 93937659dc644f708def8fa58cb63c5c9f499f26)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# dd590f68 14-Nov-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: Add preemption check while waiting for OA

While we're waiting for the OA configuration to apply, let's give a
chance to other contexts that might need to run other workloads.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191114140224.21818-1-lionel.g.landwerlin@intel.com


# 93937659 13-Nov-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: don't forget noa wait after oa config

I'm observing incoherence metric values, changing from run to run.

It appears the patches introducing noa wait & reconfiguration from
command stream switched places in the series multiple times during the
review. This lead to the dependency of one onto the order to go
missing...

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 15d0ace1f876 ("drm/i915/perf: execute OA configuration from command stream")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191113154639.27144-1-lionel.g.landwerlin@intel.com


# 2b3c7f0d 11-Nov-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: always consider holding preemption a privileged op

The ordering of the checks in the existing code can lead to holding
preemption not being considered as privileged op.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9cd20ef7803c ("drm/i915/perf: allow holding preemption on filtered ctx")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191111095308.2550-1-lionel.g.landwerlin@intel.com
(cherry picked from commit 0b0120d4c7b013eba59b33254febc0a6e4049e13)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 0b0120d4 11-Nov-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: always consider holding preemption a privileged op

The ordering of the checks in the existing code can lead to holding
preemption not being considered as privileged op.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9cd20ef7803c ("drm/i915/perf: allow holding preemption on filtered ctx")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191111095308.2550-1-lionel.g.landwerlin@intel.com


# 9278bbb6 01-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Reverse a ternary to make sparse happy

Avoid

drivers/gpu/drm/i915/i915_perf.c:2442:85: warning: dubious: x | !y

simply by inverting the predicate and reversing the ternary.

v2: Move the long lines into their own function so there is no confusion
on operator precedence.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101192116.12647-1-chris@chris-wilson.co.uk


# 00a7f0d7 25-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/tgl: Add perf support on TGL

The design of the OA unit has been split into several units. We now
have a global unit (OAG) and a render specific unit (OAR). This leads
to some changes on how we program things. Some details :

OAR:
- has its own set of counter registers, they are per-context
saved/restored
- counters are not written to the circular OA buffer
- a snapshot of the counters can be acquired with
MI_RECORD_PERF_COUNT, or a single counter can be read with
MI_STORE_REGISTER_MEM.

OAG:
- has global counters that increment across context switches
- counters are written into the circular OA buffer (if requested)

v2: Fix checkpatch warnings on code style (Lucas)
v3: (Umesh)
- Update register from which tail, status and head are read
- Update logic to sample context reports
- Update whitelist mux and b counter regs
v4: Fix a bug when updating context image for new contexts (Umesh)
v5: Squash patch enabling save/restore of counters into context image

We want this so we can preempt performance queries and keep the
system responsive even when long running queries are ongoing. We
avoid doing it for all contexts.

- use LRI to modify context control (Chris)
- use MASKED_FIELD to program just the masked bits (Chris)
- disable save/restore of counters on cleanup (Chris)
v6: Do not use implicit parameters (Chris)

BSpec: 28727, 30021

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Chris Wilson <chris.p.wilson@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191025193746.47155-2-umesh.nerlige.ramappa@intel.com


# fc215230 25-Oct-2019 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Add helper macros for comparing with whitelisted registers

Add helper macros for range and equality comparisons and use them to
check with whitelisted registers in oa configurations.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191025193746.47155-1-umesh.nerlige.ramappa@intel.com


# 19c17b76 28-Oct-2019 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915/execlists: Use vfunc to check engine submission mode

While processing CSB there is no need to look at GuC submission
settings, just check if engine is configured for execlists mode.

While today GuC submission is disabled it's settings are still
based on modparam values that might not correctly reflect actual
submission status in case of any fallback. Until that is fully
fixed, use alternate method to confirm that engine really runs in
execlists mode by comparing set_default_submission vfunc.

v2: add other immediate use of new helper

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191028164520.31772-1-michal.wajdeczko@intel.com


# 2871ea85 24-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Split intel_ring_submission

Split the legacy submission backend from the common CS ring buffer
handling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk


# 0587152b 22-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop assertion that ce->pin_mutex guards state updates

The actual conditions are that we know the GPU is not accessing the
context, and we hold a pin on the context image to allow CPU access. We
used a fake lock on ce->pin_mutex so that we could try and use lockdep
to assert that access is serialised, but the various different
hardirq/softirq contexts where we need to *fake* holding the pin_mutex
are causing more trouble.

Still it would be nice if we did have a way to reassure ourselves that
the direct update to the context image is serialised with GPU execution.
In the meantime, stop lockdep complaining about false irq inversions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111923
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191022122845.25038-1-chris@chris-wilson.co.uk


# 8814c6d0 19-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix oa config reconfiguration

The current logic just reapplies the same configuration already stored
into stream->oa_config instead of the newly selected one.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7831e9a965ea ("drm/i915/perf: Allow dynamic reconfiguration of the OA stream")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191019214647.27866-1-lionel.g.landwerlin@intel.com


# 9cd20ef7 14-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: allow holding preemption on filtered ctx

We would like to make use of perf in Vulkan. The Vulkan API is much
lower level than OpenGL, with applications directly exposed to the
concept of command buffers (pretty much equivalent to our batch
buffers). In Vulkan, queries are always limited in scope to a command
buffer. In OpenGL, the lack of command buffer concept meant that
queries' duration could span multiple command buffers.

With that restriction gone in Vulkan, we would like to simplify
measuring performance just by measuring the deltas between the counter
snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the
more complex scheme we currently have in the GL driver, using 2
MI_RECORD_PERF_COUNT commands and doing some post processing on the
stream of OA reports, coming from the global OA buffer, to remove any
unrelated deltas in between the 2 MI_RECORD_PERF_COUNT.

Disabling preemption only apply to a single context with which want to
query performance counters for and is considered a privileged
operation, by default protected by CAP_SYS_ADMIN. It is possible to
enable it for a normal user by disabling the paranoid stream setting.

v2: Store preemption setting in intel_context (Chris)

v3: Use priorities to avoid preemption rather than the HW mechanism

v4: Just modify the port priority reporting function

v5: Add nopreempt flag on gem context and always flag requests
appropriately, regarless of OA reconfiguration.

Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-4-chris@chris-wilson.co.uk


# 7831e9a9 14-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Allow dynamic reconfiguration of the OA stream

Introduce a new perf_ioctl command to change the OA configuration of the
active stream. This allows the OA stream to be reconfigured between
batch buffers, giving greater flexibility in sampling. We inject a
request into the OA context to reconfigure the stream asynchronously on
the GPU in between and ordered with execbuffer calls.

Original patch for dynamic reconfiguration by Lionel Landwerlin.

Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-3-chris@chris-wilson.co.uk


# 4f6ccc74 14-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915: add support for perf configuration queries

Listing configurations at the moment is supported only through sysfs.
This might cause issues for applications wanting to list
configurations from a container where sysfs isn't available.

This change adds a way to query the number of configurations and their
content through the i915 query uAPI.

v2: Fix sparse warnings (Lionel)
Add support to query configuration using uuid (Lionel)

v3: Fix some inconsistency in uapi header (Lionel)
Fix unlocking when not locked issue (Lionel)
Add debug messages (Lionel)

v4: Fix missing unlock (Dan)

v5: Drop lock when copying config content to userspace (Chris)

v6: Drop lock when copying config list to userspace (Chris)
Fix deadlock when calling i915_perf_get_oa_config() under
perf.metrics_lock (Lionel)
Add i915_oa_config_get() (Chris)

Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-2-chris@chris-wilson.co.uk


# b8d49f28 14-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: introduce a versioning of the i915-perf uapi

Reporting this version will help application figure out what level of
the support the running kernel provides.

v2: Add i915_perf_ioctl_version() (Chris)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-1-chris@chris-wilson.co.uk


# c2fba936 13-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Avoid polluting the i915_oa_config with error pointers

Use a local variable to track the allocation errors to avoid polluting
the struct and keep the free simple.

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191013095211.2922-1-chris@chris-wilson.co.uk


# 5f5c382e 12-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Prefer using the pinned_ctx for emitting delays on config

When we are watching a particular context, we want the OA config to be
applied inline with that context such that it takes effect before the
next submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191012091056.28686-1-chris@chris-wilson.co.uk


# 15d0ace1 12-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: execute OA configuration from command stream

We haven't run into issues with programming the global OA/NOA
registers configuration from CPU so far, but HW engineers actually
recommend doing this from the command streamer. On TGL in particular
one of the clock domain in which some of that programming goes might
not be powered when we poke things from the CPU.

Since we have a command buffer prepared for the execbuffer side of
things, we can reuse that approach here too.

This also allows us to significantly reduce the amount of time we hold
the main lock.

v2: Drop the global lock as much as possible

v3: Take global lock to pin global

v4: Create i915 request in emit_oa_config() to avoid deadlocks (Lionel)

v5: Move locking to the stream (Lionel)

v6: Move active reconfiguration request into i915_perf_stream (Lionel)

v7: Pin VMA outside request creation (Chris)
Lock VMA before move to active (Chris)

v8: Fix double free on stream->initial_oa_config_bo (Lionel)
Don't allow interruption when waiting on active config request
(Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-3-chris@chris-wilson.co.uk


# daed3e44 12-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: implement active wait for noa configurations

NOA configuration take some amount of time to apply. That amount of
time depends on the size of the GT. There is no documented time for
this. For example, past experimentations with powergating
configuration changes seem to indicate a 60~70us delay. We go with
500us as default for now which should be over the required amount of
time (according to HW architects).

v2: Don't forget to save/restore registers used for the wait (Chris)

v3: Name used CS_GPR registers (Chris)
Fix compile issue due to rebase (Lionel)

v4: Fix save/restore helpers (Umesh)

v5: Move noa_wait from drm_i915_private to i915_perf_stream (Lionel)

v6: Add missing struct declarations in i915_perf.h

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-2-chris@chris-wilson.co.uk


# 6a45008a 12-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: allow for CS OA configs to be created lazily

Here we introduce a mechanism by which the execbuf part of the i915
driver will be able to request that a batch buffer containing the
programming for a particular OA config be created.

We'll execute these OA configuration buffers right before executing a
set of userspace commands so that a particular user batchbuffer be
executed with a given OA configuration.

This mechanism essentially allows the userspace driver to go through
several OA configuration without having to open/close the i915/perf
stream.

v2: No need for locking on object OA config object creation (Chris)
Flush cpu mapping of OA config (Chris)

v3: Properly deal with the perf_metric lock (Chris/Lionel)

v4: Fix oa config unref/put when not found (Lionel)

v5: Allocate BOs for configurations on the stream instead of globally
(Lionel)

v6: Fix 64bit division (Chris)

v7: Store allocated config BOs into the stream (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-1-chris@chris-wilson.co.uk


# a5efcde6 11-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Replace global wakeref tracking with engine-pm

As we now have a specific engine to use OA on, exchange the top-level
runtime-pm wakeref with the engine-pm. This still results in the same
top-level runtime-pm, but with more nuances to keep the engine and its
gt awake.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191011190325.10979-1-chris@chris-wilson.co.uk


# 52111c46 10-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Store shortcut to intel_uncore

Now that we have the engine stored in i915_perf, we have a means of
accessing intel_gt should we require it. However, we are currently only
using the intel_gt to find the right intel_uncore, so replace our
i915_perf.gt pointer with the more useful i915_perf.uncore.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191010150520.26488-2-chris@chris-wilson.co.uk


# 9a61363a 10-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: store the associated engine of a stream

We'll use this information later to verify that a client trying to
reconfigure the stream does so on the right engine. For now, we want to
pull the knowledge of which engine we use into a central property.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191010150520.26488-1-chris@chris-wilson.co.uk


# 23b9e41a 08-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: drop list of streams

At some point in time there was the idea that we could have multiple
stream from the same piece of HW but that never materialized and given
the hard time we already have making everything work with the
submission side, there is no real point having this list of 1 element
around.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191008140111.5437-1-chris@chris-wilson.co.uk


# a4c969d1 07-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Set the exclusive stream under perf->lock

The BKL struct_mutex is no more, the only serialisation we required for
setting the exclusive stream is already managed by ce->pin_mutex in
gen8_configure_all_contexts(). As such, we can manipulate
i915_perf.exclusive_stream underneath our own (already held) perf->lock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191007140812.10963-2-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20191007210942.18145-2-chris@chris-wilson.co.uk


# 8f8b1171 07-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Wean ourselves off dev_priv

Use the local uncore accessors for the GT rather than using the [not-so]
magic global dev_priv mmio routines. In the process, we also teach the
perf stream to use backpointers to the i915_perf rather than digging it
out of dev_priv.

v2: Rebase onto i915_perf_types.h

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20191007140812.10963-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20191007210942.18145-1-chris@chris-wilson.co.uk


# a4e7ccda 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move context management under GEM

Keep track of the GEM contexts underneath i915->gem.contexts and assign
them their own lock for the purposes of list management.

v2: Focus on lock tracking; ctx->vm is protected by ctx->mutex
v3: Correct split with removal of logical HW ID

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-15-chris@chris-wilson.co.uk


# 2935ed53 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove logical HW ID

With the introduction of ctx->engines[] we allow multiple logical
contexts to be used on the same engine (e.g. with virtual engines).
According to bspec, aach logical context requires a unique tag in order
for context-switching to occur correctly between them. [Simple
experiments show that it is not so easy to trick the HW into performing
a lite-restore with matching logical IDs, though my memory from early
Broadwell experiments do suggest that it should be generating
lite-restores.]

We only need to keep a unique tag for the active lifetime of the
context, and for as long as we need to identify that context. The HW
uses the tag to determine if it should use a lite-restore (why not the
LRCA?) and passes the tag back for various status identifies. The only
status we need to track is for OA, so when using perf, we assign the
specific context a unique tag.

v2: Calculate required number of tags to fill ELSP.

Fixes: 976b55f0e1db ("drm/i915: Allow a context to define its set of engines")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111895
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-14-chris@chris-wilson.co.uk


# 2850748e 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull i915_vma_pin under the vm->mutex

Replace the struct_mutex requirement for pinning the i915_vma with the
local vm->mutex instead. Note that the vm->mutex is tainted by the
shrinker (we require unbinding from inside fs-reclaim) and so we cannot
allocate while holding that mutex. Instead we have to preallocate
workers to do allocate and apply the PTE updates after we have we
reserved their slot in the drm_mm (using fences to order the PTE writes
with the GPU work and with later unbind).

In adding the asynchronous vma binding, one subtle requirement is to
avoid coupling the binding fence into the backing object->resv. That is
the asynchronous binding only applies to the vma timeline itself and not
to the pages as that is a more global timeline (the binding of one vma
does not need to be ordered with another vma, nor does the implicit GEM
fencing depend on a vma, only on writes to the backing store). Keeping
the vma binding distinct from the backing store timelines is verified by
a number of async gem_exec_fence and gem_exec_schedule tests. The way we
do this is quite simple, we keep the fence for the vma binding separate
and only wait on it as required, and never add it to the obj->resv
itself.

Another consequence in reducing the locking around the vma is the
destruction of the vma is no longer globally serialised by struct_mutex.
A natural solution would be to add a kref to i915_vma, but that requires
decoupling the reference cycles, possibly by introducing a new
i915_mm_pages object that is own by both obj->mm and vma->pages.
However, we have not taken that route due to the overshadowing lmem/ttm
discussions, and instead play a series of complicated games with
trylocks to (hopefully) ensure that only one destruction path is called!

v2: Add some commentary, and some helpers to reduce patch churn.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk


# 7dc56af5 24-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/selftests: Verify the LRC register layout between init and HW

Before we submit the first context to HW, we need to construct a valid
image of the register state. This layout is defined by the HW and should
match the layout generated by HW when it saves the context image.
Asserting that this should be equivalent should help avoid any undefined
behaviour and verify that we haven't missed anything important!

Of course, having insisted that the initial register state within the
LRC should match that returned by HW, we need to ensure that it does.

v2: Drop the RELATIVE_MMIO flag from gen11, we ignore it for
constructing the lrc image.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190924145950.3011-1-chris@chris-wilson.co.uk


# dffa8feb 30-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Assert locking for i915_init_oa_perf_state()

We use the context->pin_mutex to serialise updates to the OA config and
the registers values written into each new context. Document this
relationship and assert we do hold the context->pin_mutex as used by
gen8_configure_all_contexts() to serialise updates to the OA config
itself.

v2: Add a white-lie for when we call intel_gt_resume() from init.
v3: Lie while we have the context pinned inside atomic reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190830181929.18663-1-chris@chris-wilson.co.uk


# 45e9c829 23-Aug-2019 Michel Thierry <michel.thierry@intel.com>

drm/i915/tgl/perf: use the same oa ctx_id format as icl

Compared to Icelake, Tigerlake's MAX_CONTEXT_HW_ID is smaller by one, but
since we just use the upper 32 bits of the lrc_desc, it's guaranteed OA
will use the correct one.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823082055.5992-19-lucas.demarchi@intel.com


# db94e9f1 08-Aug-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: extract i915_perf.h from i915_drv.h

It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.

Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.

No functional changes.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/d7826e365695f691a3ac69a69ff6f2bbdb62700d.1565271681.git.jani.nikula@intel.com


# a37f08a8 06-Aug-2019 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Refactor oa object to better manage resources

The oa object manages the oa buffer and must be allocated when the user
intends to read performance counter snapshots. This can be achieved by
making the oa object part of the stream object which is allocated when a
stream is opened by the user.

Attributes in the oa object that are gen-specific are moved to the perf
object so that they can be initialized on driver load.

The split provides a better separation of the objects used in perf
implementation of i915 driver so that resources are allocated and
initialized only when needed.

v2: Fix checkpatch warnings
v3: Addressed Lionel's review comment
v4: Rebase
v5: Fix rebase/merge issue with ratelimit_state_init

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190806233002.984-1-umesh.nerlige.ramappa@intel.com


# 750e76b4 06-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Move the [class][inst] lookup for engines onto the GT

To maintain a fast lookup from a GT centric irq handler, we want the
engine lookup tables on the intel_gt. To avoid having multiple copies of
the same multi-dimension lookup table, move the generic user engine
lookup into an rbtree (for fast and flexible indexing).

v2: Split uabi_instance cf uabi_class
v3: Set uabi_class/uabi_instance after collating all engines to provide a
stable uabi across parallel unordered construction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-2-chris@chris-wilson.co.uk


# cb0c43f3 30-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid ce->gem_context->i915

My plan for the future is to have kernel contexts not to have a GEM
context backpointer (as they will not belong to any GEM context). In a
few places, we use ce->gem_context to simply obtain the i915 backpointer,
for which we can use ce->engine->i915 instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730163441.16477-1-chris@chris-wilson.co.uk


# 8f48de49 10-Jul-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: add missing delay for OA muxes configuration

This was dropped from the original patch series, we weren't sure
whether it was needed at the time. More recent tests show it's
definitely needed to have acurate performance data.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 19f81df2859eb1 ("drm/i915/perf: Add OA unit support for Gen 8+")
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: combine duplicate code and comments]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190710105524.23017-1-chris@chris-wilson.co.uk
(cherry picked from commit 14bfcd3e0daeb0f757a02aac85fd03e0933ab37e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 06c12ae3 09-Jul-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: ensure we keep a reference on the driver

The i915 perf stream has its own file descriptor and is tied to
reference of the driver. We haven't taken care of keep the driver
alive.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: eec688e1420da5 ("drm/i915: Add i915 perf infrastructure")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190709123351.5645-2-lionel.g.landwerlin@intel.com
(cherry picked from commit a5af1df716c123a09341351008fc497bea137b77)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 95eef14c 10-Jun-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix ICL perf register offsets

We got the wrong offsets (could they have changed?). New values were
computed off an error state by looking up the register offset in the
context image as written by the HW.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1de401c08fa805 ("drm/i915/perf: enable perf support on ICL")
Cc: <stable@vger.kernel.org> # v4.18+
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190610081914.25428-1-lionel.g.landwerlin@intel.com
(cherry picked from commit 8dcfdfb4501012a8d36d2157dc73925715f2befb)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 5cca5038 26-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Initialise err to 0 before looping over ce->engines

Smatch warning that the loop may be empty causing us to check err before
it had been set. Ensure that it is initialised to 0, just in case.

v2: Refactor the inner loop for better scooping and clarity

Fixes: a9877da2d629 ("drm/i915/oa: Reconfigure contexts on the fly")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190726131458.8310-1-chris@chris-wilson.co.uk


# eec4844f 18-Jul-2019 Matteo Croce <mcroce@redhat.com>

proc/sysctl: add shared variables for range check

In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range. This
function uses the extra1 and extra2 members from struct ctl_table as
minimum and maximum allowed value.

On sysctl handler declaration, in every source file there are some
readonly variables containing just an integer which address is assigned
to the extra1 and extra2 members, so the sysctl range is enforced.

The special values 0, 1 and INT_MAX are very often used as range
boundary, leading duplication of variables like zero=0, one=1,
int_max=INT_MAX in different source files:

$ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
248

Add a const int array containing the most commonly used values, some
macros to refer more easily to the correct array member, and use them
instead of creating a local one for every object file.

This is the bloat-o-meter output comparing the old and new binary
compiled with the default Fedora config:

# scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
Data old new delta
sysctl_vals - 12 +12
__kstrtab_sysctl_vals - 12 +12
max 14 10 -4
int_max 16 - -16
one 68 - -68
zero 128 28 -100
Total: Before=20583249, After=20583085, chg -0.00%

[mcroce@redhat.com: tipc: remove two unused variables]
Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
[akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
[arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
[akpm@linux-foundation.org: fix fs/eventpoll.c]
Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a9877da2 16-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/oa: Reconfigure contexts on the fly

Avoid a global idle barrier by reconfiguring each context by rewriting
them with MI_STORE_DWORD from the kernel context.

v2: We only need to determine the desired register values once, they are
the same for all contexts.
v3: Don't remove the kernel context from the list of known GEM contexts;
the world is not ready for that yet.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190716213443.9874-1-chris@chris-wilson.co.uk


# 14bfcd3e 10-Jul-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: add missing delay for OA muxes configuration

This was dropped from the original patch series, we weren't sure
whether it was needed at the time. More recent tests show it's
definitely needed to have acurate performance data.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 19f81df2859eb1 ("drm/i915/perf: Add OA unit support for Gen 8+")
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: combine duplicate code and comments]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190710105524.23017-1-chris@chris-wilson.co.uk


# a5af1df7 09-Jul-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: ensure we keep a reference on the driver

The i915 perf stream has its own file descriptor and is tied to
reference of the driver. We haven't taken care of keep the driver
alive.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: eec688e1420da5 ("drm/i915: Add i915 perf infrastructure")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190709123351.5645-2-lionel.g.landwerlin@intel.com


# 5ed7a0cf 25-Jun-2019 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915: Move OA files to separate folder

OA files look to be auto-generated so we can keep them all in
dedicated subdirectory.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626123826.39760-1-michal.wajdeczko@intel.com


# 8dcfdfb4 10-Jun-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix ICL perf register offsets

We got the wrong offsets (could they have changed?). New values were
computed off an error state by looking up the register offset in the
context image as written by the HW.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1de401c08fa805 ("drm/i915/perf: enable perf support on ICL")
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190610081914.25428-1-lionel.g.landwerlin@intel.com


# d858d569 13-Jun-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915: update rpm_get/put to use the rpm structure

The functions where internally already only using the structure, so we
need to just flip the interface.

v2: rebase

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190613232156.34940-7-daniele.ceraolospurio@intel.com


# c5cc0bf8 01-Jun-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix whitelist on Gen10+

Gen10 added an additional NOA_WRITE register (high bits) and we forgot
to whitelist it for userspace.

Fixes: 95690a02fb5d96 ("drm/i915/perf: enable perf support on CNL")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190601225845.12600-1-lionel.g.landwerlin@intel.com
(cherry picked from commit bf210f6c9e6fd8dc0d154ad18f741f20e64a3fce)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# bf210f6c 01-Jun-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix whitelist on Gen10+

Gen10 added an additional NOA_WRITE register (high bits) and we forgot
to whitelist it for userspace.

Fixes: 95690a02fb5d96 ("drm/i915/perf: enable perf support on CNL")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190601225845.12600-1-lionel.g.landwerlin@intel.com


# 10be98a7 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move more GEM objects under gem/

Continuing the theme of separating out the GEM clutter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk


# 8475355f 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move shmem object setup to its own file

Split the plain old shmem object into its own file to start decluttering
i915_gem.c

v2: Lose the confusing, hysterical raisins, suffix of _gtt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-4-chris@chris-wilson.co.uk


# 5e2a0419 26-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Switch back to an array of logical per-engine HW contexts

We switched to a tree of per-engine HW context to accommodate the
introduction of virtual engines. However, we plan to also support
multiple instances of the same engine within the GEM context, defeating
our use of the engine as a key to looking up the HW context. Just
allocate a logical per-engine instance and always use an index into the
ctx->engines[]. Later on, this ctx->engines[] may be replaced by a user
specified map.

v2: Add for_each_gem_engine() helper to iterator within the engines lock
v3: intel_context_create_request() helper
v4: s/unsigned long/unsigned int/ 4 billion engines is quite enough.
v5: Push iterator locking to caller

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-7-chris@chris-wilson.co.uk


# fa9f6681 26-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Export intel_context_instance()

We want to pass in a intel_context into intel_context_pin() and that
requires us to first be able to lookup the intel_context!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-2-chris@chris-wilson.co.uk


# 2ccdf6a1 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pass intel_context to i915_request_create()

Start acquiring the logical intel_context and using that as our primary
means for request allocation. This is the initial step to allow us to
avoid requiring struct_mutex for request allocation along the
perma-pinned kernel context, but it also provides a foundation for
breaking up the complex request allocation to handle different scenarios
inside execbuf.

For the purpose of emitting a request from inside retirement (see the
next patch for engine power management), we also need to lift control
over the timeline mutex to the caller.

v2: Note that the request carries the active reference upon construction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-4-chris@chris-wilson.co.uk


# 112ed2d3 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GraphicsTechnology files under gt/

Start partitioning off the code that talks to the hardware (GT) from the
uapi layers and move the device facing code under gt/

One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to
subdivide that header and body further (and split out the submission
code from the ringbuffer and logical context handling). This patch aims
to be simple motion so git can fixup inflight patches with little mess.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk


# 09407579 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store the default sseu setup on the engine

As we push for better compartmentalisation, it is more convenient to
copy the default sseu configuration from the engine into the derived
logical context, than it is to dig it out from i915->runtime_info.

v2: Use intel_sseu_from_device_info() to describe the converter

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424095134.30249-1-chris@chris-wilson.co.uk


# 97a04e0d 25-Mar-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915: switch intel_wait_for_register to uncore

The intel_uncore structure is the owner of register access, so
subclass the function to it.

While at it, use a local uncore var and switch to the new read/write
functions where it makes sense.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-9-daniele.ceraolospurio@intel.com


# a679f58d 21-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush pages on acquisition

When we return pages to the system, we ensure that they are marked as
being in the CPU domain since any external access is uncontrolled and we
must assume the worst. This means that we need to always flush the pages
on acquisition if we need to use them on the GPU, and from the beginning
have used set-domain. Set-domain is overkill for the purpose as it is a
general synchronisation barrier, but our intent is to only flush the
pages being swapped in. If we move that flush into the pages acquisition
phase, we know then that when we have obj->mm.pages, they are coherent
with the GPU and need only maintain that status without resorting to
heavy handed use of set-domain.

The principle knock-on effect for userspace is through mmap-gtt
pagefaulting. Our uAPI has always implied that the GTT mmap was async
(especially as when any pagefault occurs is unpredicatable to userspace)
and so userspace had to apply explicit domain control itself
(set-domain). However, swapping is transparent to the kernel, and so on
first fault we need to acquire the pages and make them coherent for
access through the GTT. Our use of set-domain here leaks into the uABI
that the first pagefault was synchronous. This is unintentional and
baring a few igt should be unoticed, nevertheless we bump the uABI
version for mmap-gtt to reflect the change in behaviour.

Another implication of the change is that gem_create() is presumed to
create an object that is coherent with the CPU and is in the CPU write
domain, so a set-domain(CPU) following a gem_create() would be a minor
operation that merely checked whether we could allocate all pages for
the object. On applying this change, a set-domain(CPU) causes a clflush
as we acquire the pages. This will have a small impact on mesa as we move
the clflush here on !llc from execbuf time to create, but that should
have minimal performance impact as the same clflush exists but is now
done early and because of the clflush issue, userspace recycles bo and
so should resist allocating fresh objects.

Internally, the presumption that objects are created in the CPU
write-domain and remain so through writes to obj->mm.mapping is more
prevalent than I expected; but easy enough to catch and apply a manual
flush.

For the future, we should push the page flush from the central
set_pages() into the callers so that we can more finely control when it
is applied, but for now doing it one location is easier to validate, at
the cost of sometimes flushing when there is no need.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk


# 3ceea6a1 19-Mar-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915: use intel_uncore for all forcewake get/put

Now that the internal code all works on intel_uncore, flip the
external-facing interface.

v2: fix GVT.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190319183543.13679-4-daniele.ceraolospurio@intel.com


# 2dd24a9c 08-Mar-2019 Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915/gen11+: First assume next platforms will inherit stuff

This exactly same approach was already used from gen9
to gen10 and from gen10 to gen11. Let's also use it
for gen11+.

Let's first assume that we inherit a similar platform
and than we apply the differences on top.

Different from the previous attempts this will be
done this time with coccinelle. We obviously need to
exclude some case that is really exclusive for gen11
like PCH, Firmware, and few others. Luckly this was
easy to filter by selecting the files we are touching
with coccinelle as exposed below:

spatch -sp_file gen11\+.cocci --in-place i915_perf.c \
intel_bios.c intel_cdclk.c intel_ddi.c \
intel_device_info.c intel_display.c intel_dpll_mgr.c \
intel_dsi_vbt.c intel_hdmi.c intel_mocs.c intel_color.c

@noticelake@ expression e; @@
-!IS_ICELAKE(e)
+INTEL_GEN(e) < 11
@notgen11@ expression e; @@
-!IS_GEN(e, 11)
+INTEL_GEN(e) < 11
@icelake@ expression e; @@
-IS_ICELAKE(e)
+INTEL_GEN(e) >= 11
@gen11@ expression e; @@
-IS_GEN(e, 11)
+INTEL_GEN(e) >= 11

No functional change.

v2: Remove intel_lrc.c per Tvrtko request since those were w/a
for ICL hw issuea and media related configuration.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308214300.25057-1-rodrigo.vivi@intel.com


# c4d52feb 08-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move over to intel_context_lookup()

In preparation for an ever growing number of engines and so ever
increasing static array of HW contexts within the GEM context, move the
array over to an rbtree, allocated upon first use.

Unfortunately, this imposes an rbtree lookup at a few frequent callsites,
but we should be able to mitigate those by moving over to using the HW
context as our primary type and so only incur the lookup on the boundary
with the user GEM context and engines.

v2: Check for no HW context in guc_stage_desc_init

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-4-chris@chris-wilson.co.uk


# b146e5ef 06-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pass around the intel_context

Instead of passing the gem_context and engine to find the instance of
the intel_context to use, pass around the intel_context instead. This is
useful for the next few patches, where the intel_context is no longer a
direct lookup.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190306084704.15755-1-chris@chris-wilson.co.uk


# 8a68d464 05-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store the BIT(engine->id) as the engine's mask

In the next patch, we are introducing a broad virtual engine to encompass
multiple physical engines, losing the 1:1 nature of BIT(engine->id). To
reflect the broader set of engines implied by the virtual instance, lets
store the full bitmask.

v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/)
v3: Tvrtko voted for moah churn so teach everyone to not mention ring
and use $class$instance throughout.
v4: Comment upon the disparity in bspec for using VCS1,VCS2 in gen8 and
VCS[0-4] in later gen. We opt to keep the code consistent and use
0-index naming throughout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-1-chris@chris-wilson.co.uk


# 993298af 01-Mar-2019 Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915: Yet another if/else sort of newer to older platforms.

No functional change. Just a reorg to match the preferred
behavior.

When rebasing internal branch on top of latest sort I noticed
few more cases that needs to get reordered.

Let's do in a bundle this time and hoping there's no other
missing places.

v2: Check for HSW/BDW ULT before generic IS_HASWELL or
IS_BROADWELL or it doesn't work as pointed by Ville.
But also ULT came afterwards anyway.
v3: Accepting suggestions from Lucas:
Sort CNL/CFL, KBL/SKL, and use <= 8 removing chv and bdw.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301172703.12139-1-rodrigo.vivi@intel.com


# ec431eae 05-Feb-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: lock powergating configuration to default when active

If some of the contexts submitting workloads to the GPU have been
configured to shutdown slices/subslices, we might loose the NOA
configurations written in the NOA muxes.

One possible solution to this problem is to reprogram the NOA muxes
when we switch to a new context. We initially tried this in the
workaround batchbuffer but some concerns where raised about the cost
of reprogramming at every context switch. This solution is also not
without consequences from the userspace point of view. Reprogramming
of the muxes can only happen once the powergating configuration has
changed (which happens after context switch). This means for a window
of time during the recording, counters recorded by the OA unit might
be invalid. This requires userspace dealing with OA reports to discard
the invalid values.

Minimizing the reprogramming could be implemented by tracking of the
last programmed configuration somewhere in GGTT and use MI_PREDICATE
to discard some of the programming commands, but the command streamer
would still have to parse all the MI_LRI instructions in the
workaround batchbuffer.

Another solution, which this change implements, is to simply disregard
the user requested configuration for the period of time when i915/perf
is active.

On most platforms there are no issues with this apart from a performance
penality for some media workloads that benefit from running on a partially
powergated GPU. We already prevent RC6 from affecting the programming so
it doesn't sound completely unreasonable to hold on powergating for the
same reason.

On Icelake however there would a functional problem if the slices not-
containing the VME block were left enabled with a running media workload
which explicitly disabled them. To avoid a GPU hang in this case, on
Icelake we lock the enablement to only slices which contain VME blocks.
Downside is that it means degraded GPU performance when OA is active but
there is no known alternative solution for this.

v2: Leave RPCS programming in intel_lrc.c (Lionel)

v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
More to_intel_context() (Tvrtko)
s/dev_priv/i915/ (Tvrtko)

Tvrtko Ursulin:

v4:
* Rebase for make_rpcs changes.

v5:
* Apply OA restriction from make_rpcs directly.

v6:
* Rebase for context image setup changes.

v7:
* Move stream assignment before metric enable.

v8-9:
* Rebase.

v10:
* Squashed with ICL support patch.

Bspec: 21140
Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v9
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-2-tvrtko.ursulin@linux.intel.com


# 739f3abd 16-Jan-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: small isolated c99 types to kernel types switch

Mixed C99 and kernel types use is getting ugly. Prefer kernel types.

sed -i 's/\buint\(8\|16\|32\|64\)_t\b/u\1/g'

Minor checkpatch fixes sprinkled on top of the changed lines.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/14ed72e7f04c9340a057855c5950b54811f8a477.1547629303.git.jani.nikula@intel.com


# 6619c007 14-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Track the rpm wakeref

Keep track of our wakeref used to keep the device awake so we can catch
any leak.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-7-chris@chris-wilson.co.uk


# 16e4dd03 14-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Markup paired operations on wakerefs

The majority of runtime-pm operations are bounded and scoped within a
function; these are easy to verify that the wakeref are handled
correctly. We can employ the compiler to help us, and reduce the number
of wakerefs tracked when debugging, by passing around cookies provided
by the various rpm_get functions to their rpm_put counterpart. This
makes the pairing explicit, and given the required wakeref cookie the
compiler can verify that we pass an initialised value to the rpm_put
(quite handy for double checking error paths).

For regular builds, the compiler should be able to eliminate the unused
local variables and the program growth should be minimal. Fwiw, it came
out as a net improvement as gcc was able to refactor rpm_get and
rpm_get_if_in_use together,

v2: Just s/rpm_put/rpm_put_unchecked/ everywhere, leaving the manual
mark up for smaller more targeted patches.
v3: Mention the cookie in Returns

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-2-chris@chris-wilson.co.uk


# 96d4f267 03-Jan-2019 Linus Torvalds <torvalds@linux-foundation.org>

Remove 'type' argument from access_ok() function

Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

- csky still had the old "verify_area()" name as an alias.

- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)

- microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0258404f 31-Dec-2018 Jani Nikula <jani.nikula@intel.com>

drm/i915: start moving runtime device info to a separate struct

First move the low hanging fruit, the fields that are only initialized
runtime. Use RUNTIME_INFO() exclusively to access the fields.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c24fe7a4b0492a888690c46814c0ff21ce2f12b1.1546267488.git.jani.nikula@intel.com


# f3ce44a0 12-Dec-2018 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: merge gen checks to use range

Instead of using IS_GEN() for consecutive gen checks, let's pass the
range to IS_GEN_RANGE(). By code inspection these were the ranges deemed
necessary for spatch:

@@
expression e;
@@
(
- IS_GEN(e, 3) || IS_GEN(e, 2)
+ IS_GEN_RANGE(e, 2, 3)
|
- IS_GEN(e, 3) || IS_GEN(e, 4)
+ IS_GEN_RANGE(e, 3, 4)
|
- IS_GEN(e, 5) || IS_GEN(e, 6)
+ IS_GEN_RANGE(e, 5, 6)
|
- IS_GEN(e, 6) || IS_GEN(e, 7)
+ IS_GEN_RANGE(e, 6, 7)
|
- IS_GEN(e, 7) || IS_GEN(e, 8)
+ IS_GEN_RANGE(e, 7, 8)
|
- IS_GEN(e, 8) || IS_GEN(e, 9)
+ IS_GEN_RANGE(e, 8, 9)
|
- IS_GEN(e, 10) || IS_GEN(e, 9)
+ IS_GEN_RANGE(e, 9, 10)
|
- IS_GEN(e, 9) || IS_GEN(e, 10)
+ IS_GEN_RANGE(e, 9, 10)
)

After conversion, checking we don't have any missing IS_GEN_RANGE() ||
IS_GEN() was also done.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-3-lucas.demarchi@intel.com


# cf819eff 12-Dec-2018 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: replace IS_GEN<N> with IS_GEN(..., N)

Define IS_GEN() similarly to our IS_GEN_RANGE(). but use gen instead of
gen_mask to do the comparison. Now callers can pass then gen as a parameter,
so we don't require one macro for each gen.

The following spatch was used to convert the users of these macros:

@@
expression e;
@@
(
- IS_GEN2(e)
+ IS_GEN(e, 2)
|
- IS_GEN3(e)
+ IS_GEN(e, 3)
|
- IS_GEN4(e)
+ IS_GEN(e, 4)
|
- IS_GEN5(e)
+ IS_GEN(e, 5)
|
- IS_GEN6(e)
+ IS_GEN(e, 6)
|
- IS_GEN7(e)
+ IS_GEN(e, 7)
|
- IS_GEN8(e)
+ IS_GEN(e, 8)
|
- IS_GEN9(e)
+ IS_GEN(e, 9)
|
- IS_GEN10(e)
+ IS_GEN(e, 10)
|
- IS_GEN11(e)
+ IS_GEN(e, 11)
)

v2: use IS_GEN rather than GT_GEN and compare to info.gen rather than
using the bitmask

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-2-lucas.demarchi@intel.com


# 00690008 12-Dec-2018 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: Rename IS_GEN to IS_GEN_RANGE

RANGE makes it longer, but clearer. We are also going to add a macro to
check an individual gen, so add the _RANGE prefix here.

Diff generated with:

sed 's/IS_GEN(/IS_GEN_RANGE(/g' drivers/gpu/drm/i915/{*/,}*.{c,h} -i

v2: use IS_GEN rather than GT_GEN

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-1-lucas.demarchi@intel.com


# 6b671c27 16-Nov-2018 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Revert "drm/i915/perf: Fix warning in documentation"

Userspace portion is still missing.

This reverts commit 9fa6e2f7609fdbb7d6f86be86371a5719bec0376.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181116135510.13807-2-joonas.lahtinen@linux.intel.com


# fe841686 16-Nov-2018 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Revert "drm/i915/perf: add a parameter to control the size of OA buffer"

Userspace portion is still missing.

This reverts commit cd956bfcd0f58d20485ac0a785415f7d9327a95f.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181116135510.13807-1-joonas.lahtinen@linux.intel.com


# 9fa6e2f7 24-Oct-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: Fix warning in documentation

Forgot to add the description of this option in a previous commit.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: cd956bfcd0f58d ("drm/i915/perf: add a parameter to control the size of OA buffer")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181024105158.4732-1-lionel.g.landwerlin@intel.com


# cd956bfc 23-Oct-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: add a parameter to control the size of OA buffer

The way our hardware is designed doesn't seem to let us use the
MI_RECORD_PERF_COUNT command without setting up a circular buffer.

In the case where the user didn't request OA reports to be available
through the i915 perf stream, we can set the OA buffer to the minimum
size to avoid consuming memory which won't be used by the driver.

v2: Simplify oa buffer size exponent selection (Chris)
Reuse vma size field (Lionel)

v3: Restrict size opening parameter to values supported by HW (Chris)

v4: Drop out of date comment (Matt)
Add debug message when buffer size is rejected (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181023100707.31738-5-lionel.g.landwerlin@intel.com


# 5728de2f 23-Oct-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: pass stream to vfuncs when possible

We want to use some of the properties of the perf stream to program
the hardware in a later commit.

v2: Pass only perf stream as argument (Matthew)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181023100707.31738-4-lionel.g.landwerlin@intel.com


# 784b1a84 23-Oct-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: remove redundant oa buffer initialization

We initialize the OA buffer everytime we enable the OA unit (first call in
gen[78]_oa_enable), so we don't need to initialize when preparing the metric
set.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181023100707.31738-3-lionel.g.landwerlin@intel.com


# 666424ab 14-Sep-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Use coherent writes into the context image

That we use a WB mapping for updating the RING_TAIL register inside the
context image even on !llc machines has been a source of consternation
for every reader. It appears to work on bsw+, but it may just have been
that we have been incredibly bad at detecting the errors.

v2: With extra enthusiasm.
v3: Drop force of map type for pinned default_state as by the time we
pin it, the map type is always WB and doesn't conflict with the earlier
use by ce->state.
v4: Transfer engine->default_state from MAP_WC to MAP_WB on creation so
we do not need the MAP_FORCE littered around the backends

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180914123504.2062-3-chris@chris-wilson.co.uk


# 722f3de3 12-Sep-2018 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

i915/oa: Simplify updating contexts

We can remove the update-via-batch-buffer code path, which is basically an
effective duplicate of update-via-context-image path, if we notice that
after we have idled the GPU, we can update the context image even of the
kernel context directly. (Update-via-batch-buffer path existed only to
solve the problem of how to update the kernel context image.)

Only additional thing needed is to activate the edited configuration by
sending one empty request down the pipe. This accomplishes context restore
of the updated kernel context and so the OA configuration gets written out
to it's control registers.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180912152930.28237-1-tvrtko.ursulin@linux.intel.com


# 35ab4fd2 13-Aug-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: reuse intel_lrc ctx regs macro

Abstract the context image access a bit.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180813080218.28994-3-tvrtko.ursulin@linux.intel.com


# 1c71bc56 13-Aug-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: simplify configure all context function

We don't need any special treatment on error so just return as soon as
possible.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180813080218.28994-2-tvrtko.ursulin@linux.intel.com


# 6a2f59e4 21-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull unpin map into vma release

A reasonably common operation is to pin the map of the vma alongside the
vma itself for the lifetime of the vma, and so release both pins at the
same time as destroying the vma. It is common enough to pull into the
release function, making that central function more attractive to a
couple of other callsites.

The continual ulterior motive is to sweep over errors on module load
aborting...

Testcase: igt/drv_module_reload/basic-reload-inject
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180721125037.20127-1-chris@chris-wilson.co.uk


# ec625fb9 09-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Provide a timeout to i915_gem_wait_for_idle()

Usually we have no idea about the upper bound we need to wait to catch
up with userspace when idling the device, but in a few situations we
know the system was idle beforehand and can provide a short timeout in
order to very quickly catch a failure, long before hangcheck kicks in.

In the following patches, we will use the timeout to curtain two overly
long waits, where we know we can expect the GPU to complete within a
reasonable time or declare it broken.

In particular, with a broken GPU we expect it to fail during the initial
GPU setup where do a couple of context switches to record the defaults.
This is a task that takes a few milliseconds even on the slowest of
devices, but we may have to wait 60s for hangcheck to give in and
declare the machine inoperable. In this a case where any gpu hang is
unacceptable, both from a timeliness and practical standpoint.

The other improvement is that in selftests, we do not need to arm an
independent timer to inject a wedge, as we can just limit the timeout on
the wait directly.

v2: Include the timeout parameter in the trace.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk


# 6ebb6d8e 13-Jun-2018 Jani Nikula <jani.nikula@intel.com>

drm/i915/perf: make oa format tables const

No reason not to be const.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180613114929.14541-1-jani.nikula@intel.com


# 2b9a8203 04-Jun-2018 Michel Thierry <michel.thierry@intel.com>

drm/i915/perf: fix gen11 engine class shift

Use the correct engine class shift value while storing the ctx hw id.
Fixes the copy+paste error from commit 61d5676b5561 ("drm/i915/perf: fix
ctx_id read with GuC & ICL").

Apologies for not spotting this in the original review, the
specific_ctx_id_mask is correct, only the specific_ctx_id had this
problem.

v2: Just use the upper 32 bits of lrc_desc (Chris)
v3: If we use the lrc_desc, we must apply the ctx_id_mask too (Lionel)

Fixes: 61d5676b5561 ("drm/i915/perf: fix ctx_id read with GuC & ICL")
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180604233250.609-2-michel.thierry@intel.com


# 9904b156 04-Jun-2018 Michel Thierry <michel.thierry@intel.com>

drm/i915/perf: use the lrc_desc to get the ctx hw id in gen8-10

The upper 32 bits of the lrc_desc (bits 52-32 to be precise) are the
context hw id in GEN8-10, so use them and have one less thing to
maintain in the unlikely case we change the descriptor sw fields.

v2: If we use the lrc_desc, we must apply the ctx_id_mask too (Lionel)

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180604233250.609-1-michel.thierry@intel.com


# 61d5676b 01-Jun-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix ctx_id read with GuC & ICL

One thing we didn't really understand about the OA report is that the
ContextID field (dword 2) is copy of the context descriptor (dword 1).

On Gen8->10 and without using GuC we didn't notice the issue because
we only checked the 21bits of the ContextID field in the OA reports
which matches exactly the hw_id stored into the context descriptor.

When using GuC submission we have an issue of a non matching hw_id
because GuC uses bit 20 of the hw_id to signal proxy submission. This
change introduces a mask to compare only the relevant bits.

On ICL the context descriptor format has changed and we failed to
address this. On top of using a mask we also need to shift the bits
properly.

v2: Reuse lrc_desc rather than recomputing part of it (Chris/Michel)

v3: Always pin the context we're filtering with (Chris)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1de401c08fa805 ("drm/i915/perf: enable perf support on ICL")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104252
BSpec: 1237
Testcase: igt/perf/gen8-unprivileged-single-ctx-counters
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180602112946.30803-3-lionel.g.landwerlin@intel.com
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org


# 1fc44d9b 17-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store a pointer to intel_context in i915_request

To ease the frequent and ugly pointer dance of
&request->gem_context->engine[request->engine->id] during request
submission, store that pointer as request->hw_context. One major
advantage that we will exploit later is that this decouples the logical
context state from the engine itself.

v2: Set mock_context->ops so we don't crash and burn in selftests.
Cleanups from Tvrtko.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk


# e896d29a 11-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/oa: Check that OA is disabled before unpinning

Before we unpin the buffer used for OA reports and return it to the
system, we need to be sure that the HW has finished writing into it.
For lack of a better idea, poll OACONTROL to check it is switched off.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106379
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180511135207.12880-1-chris@chris-wilson.co.uk


# a89d1f92 02-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split i915_gem_timeline into individual timelines

We need to move to a more flexible timeline that doesn't assume one
fence context per engine, and so allow for a single timeline to be used
across a combination of engines. This means that preallocating a fence
context per engine is now a hindrance, and so we want to introduce the
singular timeline. From the code perspective, this has the notable
advantage of clearing up a lot of mirky semantics and some clumsy
pointer chasing.

By splitting the timeline up into a single entity rather than an array
of per-engine timelines, we can realise the goal of the previous patch
of tracking the timeline alongside the ring.

v2: Tweak wait_for_idle to stop the compiling thinking that ret may be
uninitialised.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk


# ab82a063 30-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wrap engine->context_pin() and engine->context_unpin()

Make life easier in upcoming patches by moving the context_pin and
context_unpin vfuncs into inline helpers.

v2: Fixup mock_engine to mark the context as pinned on use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-2-chris@chris-wilson.co.uk


# 9bd9be66 26-Mar-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: add more debug message on perf open & configs

This will make it easier to spot issues related to config
creation/usage.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180326090831.22686-9-lionel.g.landwerlin@intel.com


# b82ed43d 26-Mar-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915: rename PPGTT/GGTT fields OA registers

We had a generic field name used across 2 registers but it feels like
it's clearer we make it obvious what register this field belongs to.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180326090831.22686-7-lionel.g.landwerlin@intel.com


# 53744104 26-Mar-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: remove empty line

This was added by mistake in commit 28964cf25ee67 ("drm/i915/perf:
disable NOA logic when not used").

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180326090831.22686-6-lionel.g.landwerlin@intel.com


# 11051303 26-Mar-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: simplify OA unit enabling on gen7

In commit d79651522e89c ("drm/i915: Enable i915 perf stream for
Haswell OA unit") the enable/disable vfunc hadn't appear yet and the
same function would deal with enabling/disabling the OA unit.

This was split later on for gen8 but the gen7 retained some code that
isn't actually useful anymore.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180326090831.22686-4-lionel.g.landwerlin@intel.com


# b6dd47b9 26-Mar-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: check the value of PROP_SAMPLE_OA uapi parameter

We've been a bit loose about this opening parameter. We should only
add the flag for writing OA reports when the value of this parameter
is != 0.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180326090831.22686-3-lionel.g.landwerlin@intel.com


# 1de401c0 26-Mar-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: enable perf support on ICL

No significant changes from either context offsets, nor report
formats, nor register whitelist.

v2: Also drop slice/unslice clock ratio changes (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180326133949.12469-3-lionel.g.landwerlin@intel.com


# f616f283 01-Mar-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix perf stream opening lock

We're seeing on CI that some contexts don't have the programmed OA
period timer that directs the OA unit on how often to write reports.

The issue is that we're not holding the drm lock from when we edit the
context images down to when we set the exclusive_stream variable. This
leaves a window for the deferred context allocation to call
i915_oa_init_reg_state() that will not program the expected OA timer
value, because we haven't set the exclusive_stream yet.

v2: Drop need_lock from gen8_configure_all_contexts() (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 701f8231a2f ("drm/i915/perf: prune OA configs")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102254
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103715
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103755
Link: https://patchwork.freedesktop.org/patch/msgid/20180301110613.1737-1-lionel.g.landwerlin@intel.com
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.14+
(cherry picked from commit 41d3fdcd15d5ecf29cc73e8b79c2327ebb54b960)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 41d3fdcd 01-Mar-2018 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix perf stream opening lock

We're seeing on CI that some contexts don't have the programmed OA
period timer that directs the OA unit on how often to write reports.

The issue is that we're not holding the drm lock from when we edit the
context images down to when we set the exclusive_stream variable. This
leaves a window for the deferred context allocation to call
i915_oa_init_reg_state() that will not program the expected OA timer
value, because we haven't set the exclusive_stream yet.

v2: Drop need_lock from gen8_configure_all_contexts() (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 701f8231a2f ("drm/i915/perf: prune OA configs")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102254
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103715
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103755
Link: https://patchwork.freedesktop.org/patch/msgid/20180301110613.1737-1-lionel.g.landwerlin@intel.com
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.14+


# e61e0f51 21-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename drm_i915_gem_request to i915_request

We want to de-emphasize the link between the request (dependency,
execution and fence tracking) from GEM and so rename the struct from
drm_i915_gem_request to i915_request. That is we may implement the GEM
user interface on top of requests, but they are an abstraction for
tracking execution rather than an implementation detail of GEM. (Since
they are not tied to HW, we keep the i915 prefix as opposed to intel.)

In short, the spatch:
@@

@@
- struct drm_i915_gem_request
+ struct i915_request

A corollary to contracting the type name, we also harmonise on using
'rq' shorthand for local variables where space if of the essence and
repetition makes 'request' unwieldy. For globals and struct members,
'request' is still much preferred for its clarity.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# a9a08845 11-Feb-2018 Linus Torvalds <torvalds@linux-foundation.org>

vfs: do bulk POLL* -> EPOLL* replacement

This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# afc9a42b 03-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

the rest of drivers/*: annotate ->poll() instances

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# fb5c551a 20-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove i915.enable_execlists module parameter

Execlists and legacy ringbuffer submission are no longer feature
comparable (execlists now offer greater functionality that should
overcome their performance hit) and obsoletes the unsafe module
parameter, i.e. comparing the two modes of execution is no longer
useful, so remove the debug tool.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #i915_perf.c
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-1-chris@chris-wilson.co.uk


# 9f9b2792 27-Oct-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: reuse timestamp frequency from device info

Now that we have this stored in the device info, we can drop it from perf
part of the driver.

Note that this requires to init perf after we've computed the frequency,
hence why we move i915_perf_init() from i915_driver_init_early() to after
intel_device_info_runtime_init().

v2: Use div_u64 (Chris)

v3: Drop u64 divs by switching to kHz (Chris/Ville)
Move i915_perf_fini to i915_driver_cleanup_hw (Matthew)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171113181902.12411-2-lionel.g.landwerlin@intel.com


# 3fef5cda 20-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Automatic i915_switch_context for legacy

During request construction, after pinning the context we know whether
or not we have to emit a context switch. So move this common operation
from every caller into i915_gem_request_alloc() itself.

v2: Always submit the request if we emitted some commands during request
construction, as typically it also involves changes in global state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120102002.22254-2-chris@chris-wilson.co.uk


# 7c52a221 13-Nov-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: replace .reg accesses with i915_mmio_reg_offset

This replaces accesses to the reg field of the i915_reg_t structure
with the i915_mmio_reg_offset() inline function.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ewelina Musial <ewelina.musial@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171113233455.12085-2-lionel.g.landwerlin@intel.com


# 95690a02 10-Nov-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: enable perf support on CNL

This adds new registers to the whitelist to configs emitted from userspace.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110190845.32574-6-lionel.g.landwerlin@intel.com


# ba6b7c1a 10-Nov-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: refactor perf setup

Gen8/9 aren't very different and we can merge some of this code.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110190845.32574-4-lionel.g.landwerlin@intel.com


# 4407eaa9 10-Nov-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: add support for Coffeelake GT3

We can enable GT3 as well as GT2.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110190845.32574-3-lionel.g.landwerlin@intel.com


# a54b19f1 10-Nov-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: complete whitelisting for OA programming on HSW

We were missing some registers and also can name one for which we only had
the offset.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110190845.32574-2-lionel.g.landwerlin@intel.com


# 7277f755 24-Oct-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix perf enable/disable ioctls with 32bits userspace

The compat callback was missing and triggered failures in 32bits
userspace when enabling/disable the perf stream. We don't require any
particular processing here as these ioctls don't take any argument.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: eec688e1420 ("drm/i915: Add i915 perf infrastructure")
Cc: linux-stable <stable@vger.kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171024152728.4873-1-lionel.g.landwerlin@intel.com
(cherry picked from commit 191f896085cf3b5d85920d58a759da4eea141721)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 191f8960 24-Oct-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix perf enable/disable ioctls with 32bits userspace

The compat callback was missing and triggered failures in 32bits
userspace when enabling/disable the perf stream. We don't require any
particular processing here as these ioctls don't take any argument.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: eec688e1420 ("drm/i915: Add i915 perf infrastructure")
Cc: linux-stable <stable@vger.kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171024152728.4873-1-lionel.g.landwerlin@intel.com


# 4f044a88 19-Sep-2017 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915: Rename global i915 to i915_modparams

Our global struct with params is named exactly the same way
as new preferred name for the drm_i915_private function parameter.
To avoid such name reuse lets use different name for the global.

v5: pure rename
v6: fix

Credits-to: Coccinelle

@@
identifier n;
@@
(
- i915.n
+ i915_modparams.n
)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com


# 22ea4f35 17-Sep-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: add support for Coffeelake GT2

Add the test configuration & timestamp frequency for Coffeelake GT2.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170918112124.29541-3-lionel.g.landwerlin@intel.com


# 342a2c84 17-Sep-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: disable clk ratio reports on gen9

We're doing this on all Gen9 based platforms, let's just check the gen
rather than listing every single platforms.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170918112124.29541-2-lionel.g.landwerlin@intel.com


# 28b6cb08 10-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Drop redundant check for perf.initialised on reset

As we cannot have an exclusive stream set if the perf has not been
initialized, we only need to check for that exclusive stream.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170810175743.25401-3-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


# 84a095e4 10-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Drop lockdep assert for i915_oa_init_reg_state()

This is called from execlist context init which we need to be unlocked.
Commit f89823c21224 ("drm/i915/perf: Implement
I915_PERF_ADD/REMOVE_CONFIG interface") added a lockdep assert to this
path for unclear reasons, remove it again!

Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170810175743.25401-2-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


# 40f75ea4 10-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Initialise dynamic sysfs group before creation

Another case where we need to call sysfs_attr_init() to setup the
internal lockdep class prior to use:

[ 9.325229] BUG: key ffff880168bc7bb0 not in .data!
[ 9.325240] DEBUG_LOCKS_WARN_ON(1)
[ 9.325250] ------------[ cut here ]------------
[ 9.325280] WARNING: CPU: 1 PID: 275 at kernel/locking/lockdep.c:3156 lockdep_init_map+0x1b2/0x1c0
[ 9.325301] Modules linked in: intel_powerclamp(+) coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915(+) snd_hda_intel snd_hda_codec snd_hwdep r8169 mii snd_hda_core snd_pcm prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[ 9.325375] CPU: 1 PID: 275 Comm: modprobe Not tainted 4.13.0-rc4-CI-Trybot_1040+ #1
[ 9.325395] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0045.B51.1704281422 04/28/2017
[ 9.325422] task: ffff8801721a4ec0 task.stack: ffffc900001dc000
[ 9.325440] RIP: 0010:lockdep_init_map+0x1b2/0x1c0
[ 9.325456] RSP: 0018:ffffc900001dfa10 EFLAGS: 00010282
[ 9.325473] RAX: 0000000000000016 RBX: ffff880168d54b80 RCX: 0000000000000000
[ 9.325488] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810f0800
[ 9.325505] RBP: ffffc900001dfa30 R08: 0000000000000001 R09: 0000000000000000
[ 9.325521] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880168bc7bb0
[ 9.325537] R13: 0000000000000000 R14: ffff880168bc7b98 R15: ffffffff81a263a0
[ 9.325554] FS: 00007fb60c3fd700(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
[ 9.325574] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.325588] CR2: 0000006582777d80 CR3: 000000016d818000 CR4: 00000000003406e0
[ 9.325604] Call Trace:
[ 9.325618] __kernfs_create_file+0x76/0xe0
[ 9.325632] sysfs_add_file_mode_ns+0x8a/0x1a0
[ 9.325646] internal_create_group+0xea/0x2c0
[ 9.325660] sysfs_create_group+0x13/0x20
[ 9.325737] i915_perf_register+0xde/0x220 [i915]
[ 9.325800] i915_driver_load+0xa77/0x16c0 [i915]
[ 9.325863] i915_pci_probe+0x37/0x90 [i915]
[ 9.325880] pci_device_probe+0xa8/0x130
[ 9.325894] driver_probe_device+0x29c/0x450
[ 9.325908] __driver_attach+0xe3/0xf0
[ 9.325922] ? driver_probe_device+0x450/0x450
[ 9.325935] bus_for_each_dev+0x62/0xa0
[ 9.325948] driver_attach+0x1e/0x20
[ 9.325960] bus_add_driver+0x173/0x270
[ 9.325974] driver_register+0x60/0xe0
[ 9.325986] __pci_register_driver+0x60/0x70
[ 9.326044] i915_init+0x6f/0x78 [i915]
[ 9.326066] ? 0xffffffffa024e000
[ 9.326079] do_one_initcall+0x43/0x170
[ 9.326094] ? rcu_read_lock_sched_held+0x7a/0x90
[ 9.326109] ? kmem_cache_alloc_trace+0x261/0x2d0
[ 9.326124] do_init_module+0x5f/0x206
[ 9.326137] load_module+0x2561/0x2da0
[ 9.326150] ? show_coresize+0x30/0x30
[ 9.326165] ? kernel_read_file+0x105/0x190
[ 9.326180] SyS_finit_module+0xc1/0x100
[ 9.326192] ? SyS_finit_module+0xc1/0x100
[ 9.326210] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 9.326223] RIP: 0033:0x7fb60bf359f9
[ 9.326234] RSP: 002b:00007fff92b47c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 9.326255] RAX: ffffffffffffffda RBX: ffffffff814898a3 RCX: 00007fb60bf359f9
[ 9.326271] RDX: 0000000000000000 RSI: 00000028a9ceef8b RDI: 0000000000000000
[ 9.326287] RBP: ffffc900001dff88 R08: 0000000000000000 R09: 0000000000000000
[ 9.326303] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000040000
[ 9.326319] R13: 00000028aaef2a70 R14: 0000000000000000 R15: 00000028aaeee5d0
[ 9.326339] ? __this_cpu_preempt_check+0x13/0x20
[ 9.326353] Code: f1 39 00 85 c0 0f 84 38 ff ff ff 83 3d 9f 44 ce 01 00 0f 85 2b ff ff ff 48 c7 c6 b2 a2 c7 81 48 c7 c7 53 40 c5 81 e8 3f 82 01 00 <0f> ff e9 11 ff ff ff 0f 1f 80 00 00 00 00 55 31 c9 31 d2 31 f6

Fixes: 701f8231a2fe ("drm/i915/perf: prune OA configs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170810175743.25401-1-chris@chris-wilson.co.uk
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>


# b5fa57dd 03-Aug-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix flex eu registers programming

We were reserving fewer dwords in the ring than necessary. Indeed
we're always writing all registers once, so discard the actual number
of registers given by the user and just program the whitelisted ones
once.

Fixes: 19f81df2859e ("drm/i915/perf: Add OA unit support for Gen 8+")
Reported-by: Matthew Auld <matthew.william.auld@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v4.12+
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-6-lionel.g.landwerlin@intel.com
(cherry picked from commit 01d928e9a1644eb2e28f684905f888e700c7b9dc)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 28152a23 03-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Initialise the dynamic sysfs attr

Use sysfs_attr_init() to dynamically initialise the
oa_config->sysfs_metric_id.attr as it has the important side-effect of
setting the lockdep key.

[ 4.971513] [drm] Initialized i915 1.6.0 20170731 for 0000:00:02.0 on minor 0
[ 4.973489] BUG: key ffff88026f6e7bb0 not in .data!
[ 4.973506] DEBUG_LOCKS_WARN_ON(1)
[ 4.973518] ------------[ cut here ]------------
[ 4.973547] WARNING: CPU: 1 PID: 258 at kernel/locking/lockdep.c:3156 lockdep_init_map+0x1b2/0x1c0
[ 4.973567] Modules linked in: i915(+) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mei_me mii mei lpc_ich prime_numbers i2c_hid pinctrl_broxton pinctrl_intel
[ 4.973645] CPU: 1 PID: 258 Comm: systemd-udevd Not tainted 4.13.0-rc3-CI-CI_DRM_2915+ #1
[ 4.973664] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4205-ITX, BIOS P1.10 09/29/2016
[ 4.973686] task: ffff8802704c2740 task.stack: ffffc90000224000
[ 4.973700] RIP: 0010:lockdep_init_map+0x1b2/0x1c0
[ 4.973712] RSP: 0018:ffffc90000227a10 EFLAGS: 00010282
[ 4.973726] RAX: 0000000000000016 RBX: ffff880262aac010 RCX: 0000000000000000
[ 4.973741] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810ed1ab
[ 4.973757] RBP: ffffc90000227a30 R08: 0000000000000001 R09: 0000000000000000
[ 4.973774] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88026f6e7bb0
[ 4.973789] R13: 0000000000000000 R14: ffff88026f6e7b98 R15: ffffffff81a24da0
[ 4.973805] FS: 00007f588d7f58c0(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000
[ 4.973823] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.973837] CR2: 00000082482e32a0 CR3: 0000000270531000 CR4: 00000000003406e0
[ 4.973852] Call Trace:
[ 4.973864] __kernfs_create_file+0x71/0xe0
[ 4.973876] sysfs_add_file_mode_ns+0x85/0x1a0
[ 4.973890] internal_create_group+0xe5/0x2b0
[ 4.973903] sysfs_create_group+0xe/0x10
[ 4.973985] i915_perf_register+0xd9/0x220 [i915]
[ 4.974044] i915_driver_load+0xa72/0x16b0 [i915]
[ 4.974124] i915_pci_probe+0x32/0x90 [i915]

Annoyingly detected by CI, but not reported due to it occurring during boot
and disabling lockdep for later runs.

Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Andrzej Datczuk <andrzej.datczuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803223700.10329-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>


# f89823c2 03-Aug-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface

The motivation behind this new interface is expose at runtime the
creation of new OA configs which can be used as part of the i915 perf
open interface. This will enable the kernel to learn new configs which
may be experimental, or otherwise not part of the core set currently
available through the i915 perf interface.

v2: Drop DRM_ERROR for userspace errors (Matthew)
Add padding to userspace structure (Matthew)
s/guid/uuid/ (Matthew)

v3: Use u32 instead of int to iterate through registers (Matthew)

v4: Lock access to dynamic config list (Lionel)

v5: by Matthew:
Fix uninitialized error values
Fix incorrect unwiding when opening perf stream
Use kmalloc_array() to store register
Use uuid_is_valid() to valid config uuids
Declare ioctls as write only
Check padding members are set to 0
by Lionel:
Return ENOENT rather than EINVAL when trying to remove non
existing config

v6: by Chris:
Use ref counts for OA configs
Store UUID in drm_i915_perf_oa_config rather then using pointer
Shuffle fields of drm_i915_perf_oa_config to avoid padding

v7: by Chris
Rename uapi pointers fields to end with '_ptr'

v8: by Andrzej, Marek, Sebastian
Update register whitelisting
by Lionel
Add more register names for documentation
Allow configuration programming in non-paranoid mode
Add support for value filter for a couple of registers already
programmed in other part of the kernel

v9: Documentation fix (Lionel)
Allow writing WAIT_FOR_RC6_EXIT only on Gen8+ (Andrzej)

v10: Perform read access_ok() on register pointers (Lionel)

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Andrzej Datczuk <andrzej.datczuk@intel.com>
Reviewed-by: Andrzej Datczuk <andrzej.datczuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-2-lionel.g.landwerlin@intel.com


# 28964cf2 03-Aug-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: disable NOA logic when not used

We already do it on Haswell and the documentation says it saves power.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-5-lionel.g.landwerlin@intel.com


# 3802c5cb 03-Aug-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: leave GDT_CHICKEN_BITS programming in configs

There will be a need for userspaces configurations to set this
register. We can apply the same model inside the kernel for test
configs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-4-lionel.g.landwerlin@intel.com


# 701f8231 03-Aug-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: prune OA configs

In the following commit we'll introduce loadable userspace
configs. This change reworks how configurations are handled in the
perf driver and retains only the test configurations in kernel space.

We now store the test config in dev_priv and resolve the id only once
when opening the perf stream. The OA config is then handled through a
pointer to the structure holding the configuration details.

v2: Rework how test configs are handled (Lionel)

v3: Use u32 to hold number of register (Matthew)

v4: Removed unused dev_priv->perf.oa.current_config variable (Matthew)

v5: Lock device when accessing exclusive_stream (Lionel)

v6: Ensure OACTXCONTROL is always reprogrammed (Lionel)

v7: Switch a couple of index variable from int to u32 (Matthew)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-3-lionel.g.landwerlin@intel.com


# 01d928e9 03-Aug-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: fix flex eu registers programming

We were reserving fewer dwords in the ring than necessary. Indeed
we're always writing all registers once, so discard the actual number
of registers given by the user and just program the whitelisted ones
once.

Fixes: 19f81df2859e ("drm/i915/perf: Add OA unit support for Gen 8+")
Reported-by: Matthew Auld <matthew.william.auld@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v4.12+
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-6-lionel.g.landwerlin@intel.com


# 635f56c3 14-Jul-2017 Imre Deak <imre.deak@intel.com>

drm/i915: Fix error checking/locking in perf/lookup_context()

1acfc104cdf8 missed to convert this one caller to be lockless. The side
effect of that was that the error check in lookup_context() became
incorrect. Convert now this caller too.

Fixes: 1acfc104cdf ("drm/i915: Enable rcu-only context lookups")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170714151242.517-1-imre.deak@intel.com


# 04941829 27-Jun-2017 sagar.a.kamble@intel.com <sagar.a.kamble@intel.com>

drm/i915: Hold RPM wakelock while initializing OA buffer

OA buffer initialization involves access to HW registers to set
the OA base, head and tail. Ensure device is awake while setting
these. With this, all oa.ops are covered under RPM and forcewake
wakelock.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1498585181-23048-1-git-send-email-sagar.a.kamble@intel.com
Fixes: d79651522e89c ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Cc: <stable@vger.kernel.org> # v4.11+
(cherry picked from commit 987f8c444aa2c33d98e7030d0c5f0a5325cc84ea)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 987f8c44 27-Jun-2017 sagar.a.kamble@intel.com <sagar.a.kamble@intel.com>

drm/i915: Hold RPM wakelock while initializing OA buffer

OA buffer initialization involves access to HW registers to set
the OA base, head and tail. Ensure device is awake while setting
these. With this, all oa.ops are covered under RPM and forcewake
wakelock.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1498585181-23048-1-git-send-email-sagar.a.kamble@intel.com
Fixes: d79651522e89c ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Cc: <stable@vger.kernel.org> # v4.11+


# 5f09a9c8 19-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow contexts to be unreferenced locklessly

If we move the actual cleanup of the context to a worker, we can allow
the final free to be called from any context and avoid undue latency in
the caller.

v2: Negotiate handling the delayed contexts free by flushing the
workqueue before calling i915_gem_context_fini() and performing the final
free of the kernel context directly
v3: Flush deferred frees before new context allocations

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-2-chris@chris-wilson.co.uk


# 829a0af2 19-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Group all the global context information together

Create a substruct to hold all the global context state under
drm_i915_private.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk


# 28c7ef9e 12-Jun-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: add GLK support

Add OA support for Geminilake (pretty much identical to Broxton), and
also add the associated OA configurations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170613112309.4088-2-lionel.g.landwerlin@intel.com


# 6c5c1d89 12-Jun-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: add KBL support

Add OA support for Kabylake (pretty much identical to Skylake), and
also add the associated OA configurations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>


# 1bef3409 12-Jun-2017 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: remove perf.hook_lock

In earlier iterations of the i915-perf driver we had a number of
callbacks/hooks from other parts of the i915 driver to e.g. notify us
when a legacy context was pinned and these could run asynchronously with
respect to the stream file operations and might also run in atomic
context.

dev_priv->perf.hook_lock had been for serialising access to state needed
within these callbacks, but as the code has evolved some of the hooks
have gone away or are implemented to avoid needing to lock any state.

The remaining use of this lock was actually redundant considering how
the gen7 oacontrol state used to be updated as part of a context pin
hook.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>


# 155e941f 12-Jun-2017 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: per-gen timebase for checking sample freq

An oa_exponent_to_ns() utility and per-gen timebase constants where
recently removed when updating the tail pointer race condition WA, and
this restores those so we can update the _PROP_OA_EXPONENT validation
done in read_properties_unlocked() to not assume we have a 12.5MHz
timebase as we did for Haswell.

Accordingly the oa_sample_rate_hard_limit value that's referenced by
proc_dointvec_minmax defining the absolute limit for the OA sampling
frequency is now initialized to (timestamp_frequency / 2) instead of the
6.25MHz constant for Haswell.

v2:
Specify frequency of 19.2MHz for BXT (Ville)
Initialize oa_sample_rate_hard_limit per-gen too (Lionel)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>


# 19f81df2 12-Jun-2017 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: Add OA unit support for Gen 8+

Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.

v5: Drain submitted requests when enabling metric set to ensure no
lite-restore erases the context image we just updated (Lionel)

v6: In addition to drain, switch to kernel context & update all
context in place (Chris)

v7: Add missing mutex_unlock() if switching to kernel context fails
(Matthew)

v8: Simplify OA period/flex-eu-counters programming by using the
batchbuffer instead of modifying ctx-image (Lionel)

v9: Back to updating the context image (due to erroneous testing,
batchbuffer programming the OA unit doesn't actually work)
(Lionel)
Pin context before updating context image (Chris)
Drop MMIO programming now that we switch to a kernel context with
right values in initial context image (Chris)

v10: Just pin_map the contexts we want to modify or let the
configuration happen on first use (Chris)

v11: Update kernel context OA config through the batchbuffer rather
than on the fly ctx-image update (Lionel)

v12: Rework OA context registers update again by swithing away from
user contexts and reconfiguring the kernel context through the
batchbuffer and updating all the other contexts' context image.
Also take care to lock slice/subslice configuration when OA is
on. (Lionel)

v13: Request rpcs updates on all engine when updating the OA config
(Lionel)

v14: Drop any kind of rpcs management now that we monitor sseu
configuration changes in a later patch (Lionel)
Remove usleep after programming the NOA configs on Gen8+, this
doesn't seem to be needed (Lionel)

v15: Respect coding style for block comments (Chris)

v16: Add missing i915_add_request() in case we fail to emit OA
configuration (Matthew)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>


# 3f488d99 12-Jun-2017 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: rework mux configurations queries

Gen8+ might have mux configurations per slices/subslices. Depending on
whether slices/subslices have been fused off, only part of the
configuration needs to be applied. This change reworks the mux
configurations query mechanism to allow more than one set of registers
to be programmed.

v2: s/n_mux_regs/n_mux_configs/ (Matthew)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>


# 712122ea 11-May-2017 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: rate limit spurious oa report notice

This change is pre-emptively aiming to avoid a potential cause of kernel
logging noise in case some condition were to result in us seeing invalid
OA reports.

The workaround for the OA unit's tail pointer race condition is what
avoids the primary known cause of invalid reports being seen and with
that in place we aren't expecting to see this notice but it can't be
entirely ruled out.

Just in case some condition does lead to the notice then it's likely
that it will be triggered repeatedly while attempting to append a
sequence of reports and depending on the configured OA sampling
frequency that might be a large number of repeat notices.

v2: (Chris) avoid inconsistent warning on throttle with
printk_ratelimit()
v3: (Matt) init and summarise with stream init/close not driver init/fini

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-9-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 4117ebc7 11-May-2017 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: better pipeline aged/aging tail updates

This updates the tail pointer race workaround handling to updating the
'aged' pointer before looking to start aging a new one. There's the
possibility that there is already new data available and so we can
immediately start aging a new pointer without having to first wait for a
later hrtimer callback (and then another to age).

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-8-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 52c57c26 11-May-2017 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: improve invalid OA format debug message

A minor improvement to debugging output

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-7-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 0dd860cf 11-May-2017 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: improve tail race workaround

There's a HW race condition between OA unit tail pointer register
updates and writes to memory whereby the tail pointer can sometimes get
ahead of what's been written out to the OA buffer so far (in terms of
what's visible to the CPU).

Although this can be observed explicitly while copying reports to
userspace by checking for a zeroed report-id field in tail reports, we
want to account for this earlier, as part of the _oa_buffer_check to
avoid lots of redundant read() attempts.

Previously the driver used to define an effective tail pointer that
lagged the real pointer by a 'tail margin' measured in bytes derived
from OA_TAIL_MARGIN_NSEC and the configured sampling frequency.
Unfortunately this was flawed considering that the OA unit may also
automatically generate non-periodic reports (such as on context switch)
or the OA unit may be enabled without any periodic sampling.

This improves how we define a tail pointer for reading that lags the
real tail pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which
gives enough time for the corresponding reports to become visible to the
CPU.

The driver now maintains two tail pointers:
1) An 'aging' tail with an associated timestamp that is tracked until we
can trust the corresponding data is visible to the CPU; at which point
it is considered 'aged'.
2) An 'aged' tail that can be used for read()ing.

The two separate pointers let us decouple read()s from tail pointer aging.

The tail pointers are checked and updated at a limited rate within a
hrtimer callback (the same callback that is used for delivering POLLIN
events) and since we're now measuring the wall clock time elapsed since
a given tail pointer was read the mechanism no longer cares about
the OA unit's periodic sampling frequency.

The natural place to handle the tail pointer updates was in
gen7_oa_buffer_is_empty() which is called as part of blocking reads and
the hrtimer callback used for polling, and so this was renamed to
oa_buffer_check() considering the added side effect while checking
whether the buffer contains data.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-6-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 3bb335c1 11-May-2017 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: no head/tail ref in gen7_oa_read

This avoids redundantly passing an (inout) head and tail pointer to
gen7_append_oa_reports() from gen7_oa_read which doesn't need to
reference either itself.

Moving the head/tail reads and writes into gen7_append_oa_reports should
have no functional effect except to avoid some redundant head pointer
writes in cases where nothing was copied to userspace.

This is a stepping stone towards updating how the head and tail pointer
state is managed to improve the workaround for the OA unit's tail
pointer race. It reduces the number of places we need to read/write the
head and tail pointers.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-5-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# f279020a 11-May-2017 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: avoid read back of head register

There's no need for the driver to keep reading back the head pointer
from hardware since the hardware doesn't update it automatically. This
way we can treat any invalid head pointer value as a software/driver
bug instead of spurious hardware behaviour.

This change is also a small stepping stone towards re-working how
the head and tail state is managed as part of an improved workaround
for the tail register race condition.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-4-lionel.g.landwerlin@intel.com


# 26ebd9c7 11-May-2017 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: avoid poll, read, EAGAIN busy loops

If the function for checking whether there is OA buffer data available
(during a poll or blocking read) has false positives then we want to
avoid a situation where the subsequent read() returns EAGAIN (after
a more accurate check) followed by a poll() immediately reporting
the same false positive POLLIN event and effectively maintaining a
busy loop until there really is data.

This makes sure that we clear the .pollin event status whenever we
return EAGAIN to userspace which will throttle subsequent POLLIN events
and repeated attempts to read to the 5ms intervals of the hrtimer
callback we have.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-3-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# e81b3a55 11-May-2017 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: fix gen7_append_oa_reports comment

If I'm going to complain about a back-to-front convention then the least
I can do is not muddle the comment up too.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-2-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 266a240b 04-May-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use engine->context_pin() to report the intel_ring

Since unifying ringbuffer/execlist submission to use
engine->pin_context, we ensure that the intel_ring is available before
we start constructing the request. We can therefore move the assignment
of the request->ring to the central i915_gem_request_alloc() and not
require it in every engine->request_alloc() callback. Another small step
towards simplification (of the core, but at a cost of handling error
pointers in less important callers of engine->pin_context).

v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code
generation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk


# aa62acfd 27-Mar-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915/perf: remove user triggerable warn

Don't throw a warning if we are given an invalid property id. While
here let's also bring back Robert' original idea of catching unhandled
enumeration values at compile time.

Fixes: eec688e1420d ("drm/i915: Add i915 perf infrastructure")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327203236.18276-1-matthew.auld@intel.com
(cherry picked from commit 0a309f9e3dfaa4f5db0bf1b0cab54571744b491a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 4e5f713f 27-Mar-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915/perf: destroy stream on sample_flags mismatch

If we were to ever encounter a sample_flags mismatch we need to ensure
we destroy the stream when we bail.

Fixes: d79651522e89 ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327203459.18398-1-matthew.auld@intel.com
(cherry picked from commit 22f880ca8246c6c80c4f48731c6a7d5d15042f56)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 0a309f9e 27-Mar-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915/perf: remove user triggerable warn

Don't throw a warning if we are given an invalid property id. While
here let's also bring back Robert' original idea of catching unhandled
enumeration values at compile time.

Fixes: eec688e1420d ("drm/i915: Add i915 perf infrastructure")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327203236.18276-1-matthew.auld@intel.com


# 22f880ca 27-Mar-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915/perf: destroy stream on sample_flags mismatch

If we were to ever encounter a sample_flags mismatch we need to ensure
we destroy the stream when we bail.

Fixes: d79651522e89 ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327203459.18398-1-matthew.auld@intel.com


# 67520415 02-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: s/assert_spin_locked/lockdep_assert_held/

assert_spin_locked() becomes an unconditionally compiled BUG_ON(),
adding debug code right into the heart of critical routines like
interrupt handlers.

text data bss dec hex
1296480 19944 2272 1318696 141f28 before (lockdep disabled)
1295984 19944 2272 1318200 141d38 after

1336261 21139 3208 1360608 14c2e0 before (lockdep enabled)
1339920 21139 3208 1364267 14d12b after

Small saving for release; hopefully more instructive in debug.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302132801.599-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 69df05e1 18-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Simplify releasing context reference

A few users only take the struct_mutex in order to release a reference
to a context. We can expose a kref_put_mutex() wrapper in order to
simplify these users, and optimise taking of the mutex to the final
unref.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-4-chris@chris-wilson.co.uk


# e8a9c58f 18-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unify active context tracking between legacy/execlists/guc

The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.

We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.

We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficultly.

And finally an ulterior motive for unifying context handling was to
prepare for mock requests.

v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk


# 16d98b31 07-Dec-2016 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: More documentation hooked to i915.rst

This adds a 'Perf' section to i915.rst with the following sub sections:
- Overview
- Comparison with Core Perf
- i915 Driver Entry Points
- i915 Perf Stream
- i915 Perf Observation Architecture Stream
- All i915 Perf Internals

v2:
section headers in i915.rst (Daniel Vetter)
missing symbol docs + other fixups (Matthew Auld)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207214033.3581-1-robert@sixbynine.org


# 7708550c 01-Dec-2016 Robert Bragg <robert@sixbynine.org>

drm/i915/perf: use DRM_DEBUG for userspace issues

Avoid using DRM_ERROR for conditions userspace can trigger with a bad
config when opening a stream or from not reading data in a timely
fashion (whereby the OA buffer fills up). These conditions are tested
by i-g-t which treats error messages as failures if using the test
runner. This wasn't an issue while the i915-perf igt tests were being
run in isolation.

One message relating to seeing a spurious zeroed report was changed to
use DRM_NOTE instead of DRM_ERROR. Ideally this warning shouldn't be
seen, but it's not a serious problem if it is. Considering that the
tail margin mechanism is only a heuristic it's possible we might see
this from time to time.

Signed-off-by: Robert Bragg <robert@sixbynine.org:
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161201172152.10893-1-robert@sixbynine.org


# 12d79d78 01-Dec-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make GEM object create and create from data take dev_priv

Makes all GEM object constructors consistent.

v2: Fix compilation in GVT code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)


# 24603935 23-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Wrap 64bit divides in do_div()

Just a couple of naked 64bit divides causing link errors on 32bit
builds, with:

ERROR: "__udivdi3" [drivers/gpu/drm/i915/i915.ko] undefined!

v2: do_div() is only u64/u32, we need a u32/u64!
v3: div_u64() == u64/u32, div64_u64() == u64/u64

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: d79651522e89 ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Robert Bragg <robert@sixbynine.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161123150714.24449-1-chris@chris-wilson.co.uk
Reviewed-by: Robert Bragg <robert@sixbynine.org>


# 7abbd8d6 07-Nov-2016 Robert Bragg <robert@sixbynine.org>

drm/i915: Add a kerneldoc summary for i915_perf.c

In particular this tries to capture for posterity some of the early
challenges we had with using the core perf infrastructure in case we
ever want to revisit adapting perf for device metrics.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-12-robert@sixbynine.org


# 00319ba0 07-Nov-2016 Robert Bragg <robert@sixbynine.org>

drm/i915: add dev.i915.oa_max_sample_rate sysctl

The maximum OA sampling frequency is now configurable via a
dev.i915.oa_max_sample_rate sysctl parameter.

Following the precedent set by perf's similar
kernel.perf_event_max_sample_rate the default maximum rate is 100000Hz

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-10-robert@sixbynine.org


# ccdf6341 07-Nov-2016 Robert Bragg <robert@sixbynine.org>

drm/i915: Add dev.i915.perf_stream_paranoid sysctl option

Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-9-robert@sixbynine.org


# 442b8c06 07-Nov-2016 Robert Bragg <robert@sixbynine.org>

drm/i915: advertise available metrics via sysfs

Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to lookup corresponding counter meta data and
normalization equations.

The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:

https://github.com/rib/gputop
> gputop-data/guids.xml
> scripts/update-guids.py
> gputop-data/oa-*.xml
> scripts/i915-perf-kernelgen.py

$ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-8-robert@sixbynine.org


# d7965152 07-Nov-2016 Robert Bragg <robert@sixbynine.org>

drm/i915: Enable i915 perf stream for Haswell OA unit

Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.

v2:
Make sure to initialize ->specific_ctx_id when opening, without
relying on _pin_notify hook, in case ctx already pinned.
v3:
Revert back to pinning ctx upfront when opening stream, removing
need to hook in to pinning and to update OACONTROL on the fly.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-7-robert@sixbynine.org


# eec688e1 07-Nov-2016 Robert Bragg <robert@sixbynine.org>

drm/i915: Add i915 perf infrastructure

Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for
data.

A stream is opened something like:

uint64_t properties[] = {
/* Single context sampling */
DRM_I915_PERF_PROP_CTX_HANDLE, ctx_handle,

/* Include OA reports in samples */
DRM_I915_PERF_PROP_SAMPLE_OA, true,

/* OA unit configuration */
DRM_I915_PERF_PROP_OA_METRICS_SET, metrics_set_id,
DRM_I915_PERF_PROP_OA_FORMAT, report_format,
DRM_I915_PERF_PROP_OA_EXPONENT, period_exponent,
};
struct drm_i915_perf_open_param parm = {
.flags = I915_PERF_FLAG_FD_CLOEXEC |
I915_PERF_FLAG_FD_NONBLOCK |
I915_PERF_FLAG_DISABLED,
.properties_ptr = (uint64_t)properties,
.num_properties = sizeof(properties) / 16,
};
int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
determine what's included in every sample.

No specific streams are supported yet so any attempt to open a stream
will return an error.

v2:
use i915_gem_context_get() - Chris Wilson
v3:
update read() interface to avoid passing state struct - Chris Wilson
fix some rebase fallout, with i915-perf init/deinit
v4:
s/DRM_IORW/DRM_IOW/ - Emil Velikov

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-2-robert@sixbynine.org