The hardware performance monitor (HPM) includes up to 29 64-bit event counters, mhpmcounter3 – mhpmcounter31. The event selector registers, mhpmevent3 – mhpmevent31, are read/write registers that control which event causes the corresponding counter to increment. If more than one event is enabled in the event register and two or more of the enabled events occur simultaneously, the counter only increments by one.
Each event selector register is divided into a class field and a mask of individual events. Within each class, one or more of the events can be counted. Only events that are defined for the selected class can be set.
The latency class requires a counter implemented as a latency counter to measure the latency sum and max/min. To count the number of latency events, use an event counter instead. For this class, it is not recommended to count more than one event with each counter. The class of a latency counter is fixed to 5.
Event 0 is defined to mean “no event.”
If the class is set to an undefined value, or if an individual event bit that is not defined for the class is set, the counter does not increment.
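
The increment rules above can be modelled in a few lines of C. This is a minimal sketch of the documented behavior, not the hardware implementation; the function name `hpm_increment` and its three mask parameters are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* Hypothetical model of one counter update (illustration only).
 * enabled:  event bits set in the event selector register
 * defined:  event bits the selected class defines (0 if the class is undefined)
 * occurred: event bits for the events that happened at the same time */
static uint64_t hpm_increment(uint32_t enabled, uint32_t defined, uint32_t occurred)
{
    /* Undefined class, or an event bit the class does not define:
     * the counter does not increment. */
    if (defined == 0 || (enabled & ~defined) != 0) {
        return 0;
    }
    /* Several enabled events occurring simultaneously still count once. */
    return (enabled & occurred) != 0 ? 1 : 0;
}
```

With event bits 5 and 6 enabled (mask 0x60) in a class that defines both, the model returns 1 whether one or both events occur, and 0 when neither does.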
| Bits | Name | Description | Reset Value |
|---|---|---|---|
| 24:5 | Event | Event mask. Defined for each event class. | 0 |
| 4:1 | Class | 0 = Retired instructions; 1 = Branches and traps; 2 = Instruction and data cache; 3 = Pipeline stalls; 4 = Miscellaneous; 5 = Latency | 0 |
| 0 | No Event | Set to represent no event | 0 |
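
The field layout in the table can be turned into a small helper that builds an event selector value. This is a sketch under stated assumptions: the names `hpm_class`, `HPM_EVENT_BIT`, and `hpm_event_select` are hypothetical, and the actual write to an mhpmevent register (a CSR write) is not shown because it depends on the toolchain.

```c
#include <stdint.h>

/* Class encodings from the Class field description (bits 4:1). */
enum hpm_class {
    HPM_CLASS_RETIRED     = 0,
    HPM_CLASS_BRANCH_TRAP = 1,
    HPM_CLASS_CACHE       = 2,
    HPM_CLASS_STALL       = 3,
    HPM_CLASS_MISC        = 4,
    HPM_CLASS_LATENCY     = 5
};

/* Event bit numbers in the per-class tables are register bit positions. */
#define HPM_EVENT_BIT(n)   (UINT32_C(1) << (n))
#define HPM_EVENT_FIELD    UINT32_C(0x01FFFFE0)  /* bits 24:5 */
#define HPM_CLASS_SHIFT    1u
#define HPM_CLASS_FIELD    UINT32_C(0xF)         /* 4 bits wide */

/* Hypothetical helper: combine a class and an event mask into one value.
 * Bits outside the Event field are dropped; bit 0 (No Event) stays clear. */
static uint32_t hpm_event_select(enum hpm_class cls, uint32_t event_bits)
{
    uint32_t class_field = ((uint32_t)cls & HPM_CLASS_FIELD) << HPM_CLASS_SHIFT;
    uint32_t event_field = event_bits & HPM_EVENT_FIELD;
    return class_field | event_field;
}
```

For example, `hpm_event_select(HPM_CLASS_RETIRED, HPM_EVENT_BIT(5) | HPM_EVENT_BIT(6))` yields 0x60, and `hpm_event_select(HPM_CLASS_LATENCY, HPM_EVENT_BIT(7))` yields 0x8A.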

Events for class 0 (retired instructions):

| Event Bit | Description |
|---|---|
| 5 | Integer load instruction retired |
| 6 | Integer store instruction retired |
| 7 | Atomic instruction retired |
| 8 | System instruction retired, including ECALL and EBREAK |
| 9 | Integer arithmetic instruction, including C.NOP retired |
| 10 | Integer multiply instruction retired |
| 11 | Integer divide/remainder instruction retired |
| 12 | Custom instruction retired |
| 13 | Bit manipulation instruction retired |
| 14 | Compressed instructions retired |
| 15 | JAL or C.J instruction retired |
| 16 | JALR or C.JR instruction retired |
| 17 | Floating point load instruction retired |
| 18 | Floating point store instruction retired |
| 19 | Floating point add/sub instruction retired |
| 20 | Floating point multiply instruction retired |
| 21 | Floating point divide instruction retired |
| 22 | Floating point fused instruction retired |
| 23 | Floating point other instruction retired |
| 24 | Cache invalidate or flush retired |

Events for class 1 (branches and traps):

| Event Bit | Description |
|---|---|
| 5 | Taken conditional branch |
| 6 | Not taken conditional branch |
| 7 | Exception taken |
| 8 | Interrupt occurred |
| 9 | Branch target cache hit |
| 10 | Branch target mispredict |

Events for class 2 (instruction and data cache):

| Event Bit | Description |
|---|---|
| 5 | Data request from instruction cache |
| 6 | Hit in instruction cache |
| 7 | Read data requested from data cache |
| 8 | Read data hit in data cache |
| 9 | Write data request from data cache |
| 10 | Write data hit in data cache |
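
As an illustration of how two of these events can be combined, the sketch below derives a data cache read hit ratio from two counter readings. It assumes that one event counter was programmed with class 2, event bit 7 (read requests), that another was programmed with class 2, event bit 8 (read hits), and that both were sampled over the same interval. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Selector values built from the register layout: class in bits 4:1,
 * event bits at their table positions. Class 2 = instruction and data cache. */
#define HPM_SEL_DCACHE_READ_REQ ((UINT32_C(2) << 1) | (UINT32_C(1) << 7))
#define HPM_SEL_DCACHE_READ_HIT ((UINT32_C(2) << 1) | (UINT32_C(1) << 8))

/* Hypothetical helper: fraction of requests that hit, or 0.0 when no
 * requests were observed (avoids a division by zero). */
static double hpm_hit_ratio(uint64_t hits, uint64_t requests)
{
    if (requests == 0) {
        return 0.0;
    }
    return (double)hits / (double)requests;
}
```

Using two separate counters keeps requests and hits apart. Enabling both bits in a single selector would merge them into one count.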

Events for class 3 (pipeline stalls):

| Event Bit | Description |
|---|---|
| 6 | Pipeline stalled due to operand fetch stage (OF) |
| 7 | Pipeline stalled due to execute stage (EX) |
| 8 | Pipeline stalled due to memory stage |

Events for class 4 (miscellaneous):

| Event Bit | Description |
|---|---|
| 5 | Divide/remainder by zero operation |
| 6 | Floating-point subnormal result |

Events for class 5 (latency):

| Event Bit | Description |
|---|---|
| 5 | Interrupt: total sum; max (31:16) and min (15:0) |
| 7 | Data cache memory read: total sum; max (31:16) and min (15:0) |
| 9 | Data cache memory write: total sum; max (31:16) and min (15:0) |
| 11 | Instruction cache memory read: total sum; max (31:16) and min (15:0) |
| 13 | Peripheral AXI data read: total sum; max (31:16) and min (15:0) |
| 15 | Peripheral AXI data write: total sum; max (31:16) and min (15:0) |
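
The max and min bit ranges listed for the latency events can be unpacked as shown below. This is a minimal sketch: the struct and function names are hypothetical, and the mean calculation assumes that the number of latency events was obtained from a separate event counter. The unit of the latency values is whatever the hardware counts and is not assumed here.

```c
#include <stdint.h>

/* Hypothetical container for the two fields of a min/max latency counter. */
struct hpm_latency_range {
    uint16_t min;  /* bits 15:0 */
    uint16_t max;  /* bits 31:16 */
};

/* Split a min/max latency counter value into its two 16-bit fields. */
static struct hpm_latency_range hpm_latency_minmax(uint64_t minmax_counter)
{
    struct hpm_latency_range r;
    r.max = (uint16_t)((minmax_counter >> 16) & 0xFFFFu);
    r.min = (uint16_t)(minmax_counter & 0xFFFFu);
    return r;
}

/* Integer mean latency from the sum counter and an event count taken
 * from an event counter; returns 0 when no events were counted. */
static uint64_t hpm_latency_mean(uint64_t sum, uint64_t events)
{
    return events != 0 ? sum / events : 0;
}
```

For a min/max counter value of 0x00120003, the helper reports a maximum of 0x12 and a minimum of 3.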

The total number of event counters and event selector registers is C_DEBUG_EVENT_COUNTERS + 2 * C_DEBUG_LATENCY_COUNTERS. The lower-numbered registers are implemented as event counters, whereas the higher-numbered registers are implemented as pairs of latency counters, each pair consisting of the latency sum and the min/max latency. The event for a pair of latency counters is selected in the event selector register that corresponds to the first counter in the pair.
The following example, with C_DEBUG_EVENT_COUNTERS = 5 and C_DEBUG_LATENCY_COUNTERS = 2, illustrates how the event counters are allocated.
| Event Counter | Kind | Description |
|---|---|---|
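| mhpmcounter3 | Event counter | Used to count events from Class 0 – 5. |
| mhpmcounter4 | Event counter | Used to count events from Class 0 – 5. |
| mhpmcounter5 | Event counter | Used to count events from Class 0 – 5. |
| mhpmcounter6 | Event counter | Used to count events from Class 0 – 5. |
| mhpmcounter7 | Event counter | Used to count events from Class 0 – 5. |
| mhpmcounter8 | Latency counter: total sum | Used to count a latency event from Class 5. Use mhpmevent8 to set events for both counters. |
| mhpmcounter9 | Latency counter: min/max | Used to count a latency event from Class 5. Use mhpmevent8 to set events for both counters. |
| mhpmcounter10 | Latency counter: total sum | Used to count a latency event from Class 5. Use mhpmevent10 to set events for both counters. |
| mhpmcounter11 | Latency counter: min/max | Used to count a latency event from Class 5. Use mhpmevent10 to set events for both counters. |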
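
The allocation rule can also be expressed as a lookup from counter index to counter kind. The sketch below is an illustration of the rule as described, with a hypothetical `hpm_counter_kind` function and `hpm_kind` enumeration. It takes the two parameter values as arguments rather than reading them from the hardware.

```c
#include <stdint.h>

enum hpm_kind {
    HPM_NOT_IMPLEMENTED,
    HPM_EVENT_COUNTER,
    HPM_LATENCY_SUM,
    HPM_LATENCY_MINMAX
};

/* Hypothetical helper: classify mhpmcounter<index> given the values of
 * C_DEBUG_EVENT_COUNTERS and C_DEBUG_LATENCY_COUNTERS. */
static enum hpm_kind hpm_counter_kind(unsigned index,
                                      unsigned event_counters,
                                      unsigned latency_counters)
{
    const unsigned first = 3;   /* counters start at mhpmcounter3 */
    const unsigned last  = 31;  /* and end at mhpmcounter31 */

    if (index < first || index > last) {
        return HPM_NOT_IMPLEMENTED;
    }
    unsigned offset = index - first;
    if (offset < event_counters) {
        return HPM_EVENT_COUNTER;       /* lower registers: event counters */
    }
    offset -= event_counters;
    if (offset >= 2 * latency_counters) {
        return HPM_NOT_IMPLEMENTED;     /* beyond the configured counters */
    }
    /* Latency counters come in pairs: sum first, then min/max. */
    return (offset % 2 == 0) ? HPM_LATENCY_SUM : HPM_LATENCY_MINMAX;
}
```

With 5 event counters and 2 latency counters, indices 3 to 7 map to event counters, 8 and 10 to latency sums, 9 and 11 to min/max counters, and 12 is not implemented, which matches the table.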