With extended debugging, MicroBlaze provides performance monitoring counters to count various events and to measure latency during program execution. The number of event counters and latency counters can be configured with C_DEBUG_EVENT_COUNTERS and C_DEBUG_LATENCY_COUNTERS respectively, and the counter width can be set to 32, 48, or 64 bits with C_DEBUG_COUNTER_WIDTH. With the default configuration, the counter width is set to 32 bits and there are five event counters and one latency counter.
An event counter simply counts the number of times a certain event has occurred, whereas a latency counter provides the following information:
- Number of times the event has occurred (N)
- The sum of each event latency (ΣL), measured by counting clock cycles from when the event starts until it finishes, used to calculate the mean latency
- The sum of each event latency squared (ΣL²), used to calculate the latency standard deviation
- The minimum (shortest) measured latency for all events (Lmin)
- The maximum (longest) measured latency for all events (Lmax)
The mean latency (μ) is calculated by the following formula:

$$\mu = \frac{\Sigma L}{N}$$

The standard deviation (σ) of the latency is calculated by the following formula:

$$\sigma = \sqrt{\frac{\Sigma L^2}{N} - \mu^2}$$
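As an illustration, the mean and standard deviation can be derived from the three sampled latency counter items N, ΣL and ΣL². The sketch below assumes the population form of the standard deviation, σ = sqrt(ΣL²/N − μ²); the function name `latency_stats` is hypothetical and not part of any MicroBlaze or MDM software interface.

```python
import math

def latency_stats(n, sum_l, sum_l2):
    """Return (mean, std_dev) from latency counter items N, sum(L) and sum(L^2).

    Assumes the population standard deviation: sqrt(sum_l2 / n - mean**2).
    """
    if n == 0:
        raise ValueError("no events recorded, statistics are undefined")
    mean = sum_l / n
    variance = sum_l2 / n - mean * mean
    # Clamp tiny negative values that can result from floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))

# Illustrative values: three events with latencies of 10, 12 and 14 cycles,
# so N = 3, sum(L) = 36 and sum(L^2) = 100 + 144 + 196 = 440.
mean, std = latency_stats(3, 36, 440)
```

With wide counters (48 or 64 bits) the sums can exceed the exact range of a double-precision float, so integer arithmetic may be preferable for the intermediate terms.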
Counting can be started or stopped using the Performance Counter Command Register or by cross trigger events (see Table 1).
Counters are configured, read, and written sequentially through the performance counter registers. After every access, the selected counter item is incremented, so that the next access refers to the next item.
All counters are sampled simultaneously for reading using the Performance Counter Command Register. This can be done while counting, or after counting has been stopped.
When an event counter reaches its maximum value, the overflow status bit is set, and the external interrupt signal Dbg_Intr is set to one. The interrupt signal is reset to zero by clearing the counters using the Performance Counter Command Register.
By using one of the event counters to count the number of clock cycles, and initializing this counter to overflow after a predetermined sampling interval, the external interrupt can be used to periodically sample the performance counters.
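The initial value for such a sampling counter can be estimated as in the sketch below. It assumes that overflow is flagged when the counter reaches its all-ones maximum value, as the description above suggests; if the hardware instead flags overflow on wrap-around, the result is off by one. The function name `overflow_preload` is hypothetical.

```python
def overflow_preload(interval_cycles, counter_width=32):
    """Initial value so the counter reaches its maximum after interval_cycles counts.

    Assumption: overflow is signalled when the counter equals 2**width - 1.
    """
    if counter_width not in (32, 48, 64):
        raise ValueError("counter width must be 32, 48 or 64 bits")
    max_value = (1 << counter_width) - 1
    if not 0 < interval_cycles <= max_value:
        raise ValueError("interval does not fit in the counter")
    return max_value - interval_cycles

# Illustrative use: request an interrupt roughly every 1000 clock cycles
# with the default 32-bit counter width.
preload = overflow_preload(1000)
```

The resulting value would be written through the Performance Counter Data Write Register to the counter that is configured for event 30 (number of clock cycles).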
The available events are described in Table 1, listed in numerical order.
A typical procedure to follow when initializing and using the performance monitoring counters is delineated in the steps below.
- Initialize the events to be monitored:
  - Use the Performance Counter Command Register to reset the selected counter to the first counter, by setting the Reset bit.
  - Write the desired event numbers for all counters in order, using the Performance Counter Control Register. With the default configuration this means writing the register five times for the event counters and then once for the latency counter.
- Clear all counters and start monitoring using the Performance Counter Command Register, by setting the Clear and Start bits.
- Run the program or function to be monitored.
- Sample counters and stop monitoring using the Performance Counter Command Register, by setting the Sample and Stop bits.
- Read the results from all counters:
  - Use the Performance Counter Command Register to reset the selected counter to the first counter, by setting the Reset bit.
  - Read the status for all counters in order, using the Performance Counter Status Register. With the default configuration this means reading the register five times for the event counters and then once for the latency counter. Ensure that the result is valid by checking that the overflow and full bits are not set.
  - Use the Performance Counter Command Register to reset the selected counter to the first counter, by setting the Reset bit.
  - Read the counter items for all counters in order, using the Performance Counter Data Read Register. With the default configuration this means reading the register five times for the event counters and then four times for the latency counter as described in Performance Counter Data Read Register.
- Calculate the final results, depending on the measured events (En denotes the sampled count for event n in Table 1), for example:
  - Use the formulas above to determine the mean latency and standard deviation for any measured latency.
  - The clock cycles per instruction (CPI) can be calculated by E30 / E0.
  - The instruction and data cache hit rates can be calculated by E11 / E10 and E47 / E46.
  - The instruction cache miss latency is determined by (E60(ΣL) - E60(N)) / (E10 - E11), and equivalent formulas can be used to determine the data cache read and write miss latencies.
  - The ratio of floating-point instructions in a program is E29 / E0.
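The example calculations in the last step can be sketched as follows. The function `derived_metrics`, its dictionary layout and the sample values are hypothetical and chosen only for illustration; each metric is computed only when the events it needs were actually monitored, since the default configuration provides just five event counters.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None

def derived_metrics(e, icache_latency=None):
    """Compute example metrics from sampled counts.

    e maps event number -> sampled count (only monitored events are present).
    icache_latency is an optional (N, sum_L) pair for latency event 60.
    """
    metrics = {}
    if 0 in e and 30 in e:
        metrics["cpi"] = ratio(e[30], e[0])                  # E30 / E0
    if 10 in e and 11 in e:
        metrics["icache_hit_rate"] = ratio(e[11], e[10])     # E11 / E10
    if 46 in e and 47 in e:
        metrics["dcache_hit_rate"] = ratio(e[47], e[46])     # E47 / E46
    if 0 in e and 29 in e:
        metrics["fp_ratio"] = ratio(e[29], e[0])             # E29 / E0
    if icache_latency is not None and 10 in e and 11 in e:
        n, sum_l = icache_latency
        # (E60(sum_L) - E60(N)) / (E10 - E11)
        metrics["icache_miss_latency"] = ratio(sum_l - n, e[10] - e[11])
    return metrics

# Made-up sample counts, not measured data.
sample = {0: 1000, 30: 1500, 10: 400, 11: 380, 29: 50}
result = derived_metrics(sample, icache_latency=(400, 800))
```

With these sample counts the CPI is 1.5, the instruction cache hit rate is 0.95, and the instruction cache miss latency is 20 clock cycles.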
Event | Description | Event | Description |
---|---|---|---|
Event Counter Events | | | |
0 | Any valid instruction executed | 29 | Floating-point (fadd, ..., fsqrt) |
1 | Load word (lw, lwi, lwx) executed | 30 | Number of clock cycles |
2 | Load halfword (lhu, lhui) executed | 31 | Immediate (imm) executed |
3 | Load byte (lbu, lbui) executed | 32 | Pattern compare (pcmpbf, pcmpeq, pcmpne) |
4 | Store word (sw, swi, swx) executed | 33 | Sign extend instructions (sext8, sext16) executed |
5 | Store halfword (sh, shi) executed | 34 | Instruction cache invalidate (wic) executed |
6 | Store byte (sb, sbi) executed | 35 | Data cache invalidate or flush (wdc) executed |
7 | Unconditional branch (br, bri, brk, brki) executed | 36 | Machine status instructions (msrset, msrclr) |
8 | Taken conditional branch (beq, ..., bnei) executed | 37 | Unconditional branch with delay slot executed |
9 | Not taken conditional branch (beq, ..., bnei) executed | 38 | Taken conditional branch with delay slot executed |
10 | Data request from instruction cache | 39 | Not taken conditional branch with delay slot |
11 | Hit in instruction cache | 40 | Delay slot with no operation instruction executed |
12 | Read data requested from data cache | 41 | Load instruction (lbu, ..., lwx) executed |
13 | Read data hit in data cache | 42 | Store instruction (sb, ..., swx) executed |
14 | Write data request to data cache | 43 | MMU data access request |
15 | Write data hit in data cache | 44 | Conditional branch (beq, ..., bnei) executed |
16 | Load (lbu, ..., lwx) with r1 as operand executed | 45 | Branch (br, bri, brk, brki, beq, ..., bnei) executed |
17 | Store (sb, ..., swx) with r1 as operand executed | 46 | Read or write data request from/to data cache |
18 | Logical operation (and, andn, or, xor) executed | 47 | Read or write data cache hit |
19 | Arithmetic operation (add, idiv, mul, rsub) executed | 48 | MMU exception taken |
20 | Multiply operation (mul, mulh, mulhu, mulhsu, muli) | 49 | MMU instruction side exception taken |
21 | Barrel shifter operation (bsrl, bsra, bsll) executed | 50 | MMU data side exception taken |
22 | Shift operation (sra, src, srl) executed | 51 | Pipeline stalled |
23 | Exception taken | 52 | Branch target cache hit for a branch or return |
24 | Interrupt occurred | 53 | MMU instruction side access request |
25 | Pipeline stalled due to operand fetch stage (OF) | 54 | MMU instruction TLB (ITLB) hit |
26 | Pipeline stalled due to execute stage (EX) | 55 | MMU data TLB (DTLB) hit |
27 | Pipeline stalled due to memory stage (MEM) | 56 | MMU unified TLB (UTLB) hit |
28 | Integer divide (idiv, idivu) executed | | |
Latency and Event Counter Events | | | |
57 | Interrupt latency from input to interrupt vector | 61 | MMU address lookup latency |
58 | Data cache latency for memory read | 62 | Peripheral AXI interface data read latency |
59 | Data cache latency for memory write | 63 | Peripheral AXI interface data write latency |
60 | Instruction cache latency for memory read | | |
The debug registers used to configure and control performance monitoring, and to read or write the event and latency counters, are listed in the following table. All of these registers except the Performance Counter Command Register are accessed repeatedly to read or write information, first for all of the event counters and then for all of the latency counters.
The DBG_CTRL value indicates the value to use in the MDM Debug Register Access Control Register to access the register, used with MDM software access to debug registers.
Register Name | Size (bits) | MDM Command | DBG_CTRL Value | R/W | Description |
---|---|---|---|---|---|
Performance Counter Control | 8 | 0101 0001 | 4A207 | W | Select event for each configured counter, according to the previous table |
Performance Counter Command | 5 | 0101 0010 | 4A404 | W | Command to clear counters, start or stop counting, or sample counters |
Performance Counter Status | 2 | 0101 0011 | 4A601 | R | Read the sampled status for each configured performance counter |
Performance Counter Data Read | 32 | 0101 0110 | 4AC1F | R | Read the sampled values for each configured performance counter |
Performance Counter Data Write | 32 | 0101 0111 | 4AE1F | W | Write initial values for each configured performance counter |
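The register access sequence of the typical procedure described earlier can be sketched as follows. Everything about the access layer here is hypothetical: `write_reg` and `read_reg` stand for whatever mechanism is used to reach the debug registers, the register names are plain labels, and the bit masks for the 5-bit Performance Counter Command Register are assumed positions that must be checked against the register description. The sketch also assumes 32-bit counters, where a latency counter is read as four items.

```python
# Assumed bit masks for the Performance Counter Command Register (illustration only).
CMD_CLEAR, CMD_START, CMD_STOP, CMD_SAMPLE, CMD_RESET = 0x10, 0x08, 0x04, 0x02, 0x01

def configure_and_start(write_reg, event_numbers):
    """Select an event for every counter in order, then clear and start counting."""
    write_reg("command", CMD_RESET)            # select the first counter
    for event in event_numbers:                # event counters first, then latency counters
        write_reg("control", event)
    write_reg("command", CMD_CLEAR | CMD_START)

def stop_and_read(write_reg, read_reg, n_event=5, n_latency=1):
    """Sample and stop, then read status and data for all counters in order."""
    write_reg("command", CMD_SAMPLE | CMD_STOP)
    write_reg("command", CMD_RESET)
    status = [read_reg("status") for _ in range(n_event + n_latency)]
    write_reg("command", CMD_RESET)
    events = [read_reg("data_read") for _ in range(n_event)]
    # Each latency counter is read as four consecutive items (32-bit width assumed).
    latency = [[read_reg("data_read") for _ in range(4)] for _ in range(n_latency)]
    return status, events, latency

# Stand-in access functions that only record the accesses, for demonstration.
log = []
def fake_write(reg, value):
    log.append(("W", reg, value))
def fake_read(reg):
    log.append(("R", reg, None))
    return 0

configure_and_start(fake_write, [0, 30, 10, 11, 29, 60])
status, events, latency = stop_and_read(fake_write, fake_read)
```

A caller would check the overflow and full bits in each returned status value before trusting the corresponding counter data.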