The hardware performance monitor (HPM) includes up to 29 64-bit event counters, mhpmcounter3 – mhpmcounter31. The event selector registers, mhpmevent3 – mhpmevent31, are read/write registers that control which event causes the corresponding counter to increment.
Each event selector register is divided into a class field and an event mask that selects individual events. Within the selected class, one or more events can be counted. Only events that are defined for the selected class can be set.
The latency class requires using an event counter implemented as a latency counter. For this class, it is not recommended to count more than one event with each counter.
Event 0 is defined to mean “no event”.
| Bits | Name | Description | Reset Value |
|---|---|---|---|
| 24:5 | Event | Event mask. Defined for each event class. | 0 |
| 4:1 | Class | 0 = Retired instructions; 1 = Branches and traps; 2 = Instruction and data cache; 3 = Pipeline stalls; 4 = Miscellaneous; 5 = Latency | 0 |
| 0 | No Event | Set to represent no event | 0 |
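
As an illustration of the field layout above, the following sketch composes an event selector value from a class number and a set of event bits. The names `hpm_class`, `hpm_event_bit`, and `hpm_event_selector` are hypothetical helpers introduced here, not part of any vendor API. The sketch assumes that bit 0 is left clear whenever an event is selected, and it does not show the CSR write itself.

```c
#include <assert.h>
#include <stdint.h>

/* Event classes, numbered as in the Class field (bits 4:1). */
enum hpm_class {
    HPM_CLASS_RETIRED     = 0, /* Retired instructions */
    HPM_CLASS_BRANCH_TRAP = 1, /* Branches and traps */
    HPM_CLASS_CACHE       = 2, /* Instruction and data cache */
    HPM_CLASS_STALL       = 3, /* Pipeline stalls */
    HPM_CLASS_MISC        = 4, /* Miscellaneous */
    HPM_CLASS_LATENCY     = 5  /* Latency */
};

/* Hypothetical helper: mask with a single event bit set.
 * The event bit numbers in the class tables are register bit
 * positions, so they must lie in the Event field (bits 24:5). */
static inline uint32_t hpm_event_bit(unsigned bit)
{
    assert(bit >= 5 && bit <= 24);
    return UINT32_C(1) << bit;
}

/* Hypothetical helper: combine a class and an event mask into a
 * selector value. Assumption: bit 0 (No Event) stays clear. */
static inline uint32_t hpm_event_selector(enum hpm_class cls,
                                          uint32_t event_mask)
{
    const uint32_t event_field = UINT32_C(0x01FFFFE0); /* bits 24:5 */
    assert((unsigned)cls <= 5);
    assert((event_mask & ~event_field) == 0);
    return event_mask | ((uint32_t)cls << 1);
}
```

For example, `hpm_event_selector(HPM_CLASS_RETIRED, hpm_event_bit(5) | hpm_event_bit(6))` evaluates to `0x60`, which would select retired integer loads and stores for a single counter.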

Class 0 events (retired instructions):

| Event Bit | Description |
|---|---|
| 5 | Integer load instruction retired |
| 6 | Integer store instruction retired |
| 7 | Atomic instruction retired |
| 8 | System instruction retired |
| 9 | Integer arithmetic instruction retired |
| 10 | Integer multiply instruction retired |
| 11 | Integer divide/remainder instruction retired |
| 12 | Custom instruction retired |
| 13 | Bit manipulation instruction retired |
| 14 | Compressed instructions retired |
| 15 | JAL instruction retired |
| 16 | JALR instruction retired |
| 17 | Floating point load instruction retired |
| 18 | Floating point store instruction retired |
| 19 | Floating point add/sub instruction retired |
| 20 | Floating point multiply instruction retired |
| 21 | Floating point divide instruction retired |
| 22 | Floating point fused instruction retired |
| 23 | Floating point other instruction retired |
| 24 | Cache invalidate or flush retired |

Class 1 events (branches and traps):

| Event Bit | Description |
|---|---|
| 5 | Taken conditional branch |
| 6 | Not taken conditional branch |
| 7 | Exception taken |
| 8 | Interrupt occurred |
| 9 | Branch target cache hit |
| 10 | Branch target mispredict |

Class 2 events (instruction and data cache):

| Event Bit | Description |
|---|---|
| 5 | Data request from instruction cache |
| 6 | Hit in instruction cache |
| 7 | Read data requested from data cache |
| 8 | Read data hit in data cache |
| 9 | Write data request from data cache |
| 10 | Write data hit in data cache |

Class 3 events (pipeline stalls):

| Event Bit | Description |
|---|---|
| 6 | Pipeline stalled due to operand fetch stage (OF) |
| 7 | Pipeline stalled due to execute stage (EX) |
| 8 | Pipeline stalled due to memory stage: |

Class 4 events (miscellaneous):

| Event Bit | Description |
|---|---|
| 5 | Divide/remainder by zero operation |
| 6 | Floating-point subnormal result |

Class 5 events (latency):

| Event Bit | Description |
|---|---|
| 5 | Interrupt: total sum; Interrupt: max (31:16) and min (15:0) |
| 7 | Data cache memory read: total sum; Data cache memory read: max (31:16) and min (15:0) |
| 9 | Data cache memory write: total sum; Data cache memory write: max (31:16) and min (15:0) |
| 11 | Instruction cache memory read: total sum; Instruction cache memory read: max (31:16) and min (15:0) |
| 13 | Peripheral AXI data read: total sum; Peripheral AXI data read: max (31:16) and min (15:0) |
| 15 | Peripheral AXI data write: total sum; Peripheral AXI data write: max (31:16) and min (15:0) |
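
The latency table places the maximum latency in bits 31:16 and the minimum in bits 15:0 of the min/max counter. A minimal sketch of unpacking such a value is shown below. The type `hpm_latency_minmax` and the function `hpm_unpack_minmax` are hypothetical names used only for illustration, and the sketch assumes the raw counter value has already been read from the min/max counter of a latency pair.

```c
#include <stdint.h>

/* Hypothetical container for the two 16-bit latency fields. */
struct hpm_latency_minmax {
    uint16_t min; /* bits 15:0 of the min/max counter */
    uint16_t max; /* bits 31:16 of the min/max counter */
};

/* Split a raw min/max latency counter value into its fields.
 * Bits above 31 are ignored in this sketch. */
static inline struct hpm_latency_minmax hpm_unpack_minmax(uint64_t raw)
{
    struct hpm_latency_minmax r;
    r.max = (uint16_t)((raw >> 16) & 0xFFFFu);
    r.min = (uint16_t)(raw & 0xFFFFu);
    return r;
}
```

The total sum is held in the first counter of the pair and needs no unpacking.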
The total number of event counters and event selector registers is C_DEBUG_EVENT_COUNTERS + 2 * C_DEBUG_LATENCY_COUNTERS. The lower-numbered registers are implemented as event counters, whereas the higher-numbered registers are implemented as pairs of latency counters, each pair consisting of a latency sum counter and a min/max latency counter. The event for a pair of latency counters is selected in the event selector register that corresponds to the first counter in the pair.
An example with C_DEBUG_EVENT_COUNTERS = 5 and C_DEBUG_LATENCY_COUNTERS = 2 illustrates how the event counters are allocated.
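| Event Counter | Kind | Description |
|---|---|---|
| mhpmcounter3 | Event Counter | Used to count events from Class 0 – 4 |
| mhpmcounter4 | Event Counter | Used to count events from Class 0 – 4 |
| mhpmcounter5 | Event Counter | Used to count events from Class 0 – 4 |
| mhpmcounter6 | Event Counter | Used to count events from Class 0 – 4 |
| mhpmcounter7 | Event Counter | Used to count events from Class 0 – 4 |
| mhpmcounter8 | Latency Counter: Total sum | Used to count a latency event from Class 5. Use mhpmevent8 to set the event for both counters. |
| mhpmcounter9 | Latency Counter: Min/Max | Used to count a latency event from Class 5. Use mhpmevent8 to set the event for both counters. |
| mhpmcounter10 | Latency Counter: Total sum | Used to count a latency event from Class 5. Use mhpmevent10 to set the event for both counters. |
| mhpmcounter11 | Latency Counter: Min/Max | Used to count a latency event from Class 5. Use mhpmevent10 to set the event for both counters. |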
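
The allocation rule behind this table can be sketched as a small lookup. The names `hpm_kind`, `hpm_counter_kind`, and `hpm_selector_index` are hypothetical, and the parameters `n_event` and `n_latency` stand in for C_DEBUG_EVENT_COUNTERS and C_DEBUG_LATENCY_COUNTERS. The sketch assumes that counters are allocated contiguously starting at mhpmcounter3, as in the example.

```c
#include <stdint.h>

enum hpm_kind {
    HPM_NOT_IMPLEMENTED, /* outside the configured mhpmcounter range */
    HPM_EVENT_COUNTER,   /* counts events from Class 0 - 4 */
    HPM_LATENCY_SUM,     /* first counter of a latency pair */
    HPM_LATENCY_MINMAX   /* second counter of a latency pair */
};

/* Kind of mhpmcounter<n> for a given configuration. */
static inline enum hpm_kind hpm_counter_kind(unsigned n, unsigned n_event,
                                             unsigned n_latency)
{
    if (n < 3)
        return HPM_NOT_IMPLEMENTED;
    unsigned idx = n - 3;                /* position among HPM counters */
    if (idx < n_event)
        return HPM_EVENT_COUNTER;
    idx -= n_event;                      /* position among latency counters */
    if (idx < 2 * n_latency)
        return (idx % 2 == 0) ? HPM_LATENCY_SUM : HPM_LATENCY_MINMAX;
    return HPM_NOT_IMPLEMENTED;
}

/* Index of the mhpmevent register that selects the event for
 * mhpmcounter<n>. A min/max counter shares the selector of the
 * first counter in its pair. */
static inline unsigned hpm_selector_index(unsigned n, unsigned n_event,
                                          unsigned n_latency)
{
    if (hpm_counter_kind(n, n_event, n_latency) == HPM_LATENCY_MINMAX)
        return n - 1;
    return n;
}
```

With `n_event = 5` and `n_latency = 2`, the lookup reproduces the table: mhpmcounter3 to mhpmcounter7 are event counters, mhpmcounter8 and mhpmcounter10 hold latency sums, and mhpmcounter9 and mhpmcounter11 hold min/max values selected through mhpmevent8 and mhpmevent10 respectively.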