Performance Monitoring - 2024.2 English

MicroBlaze Processor Reference Guide (UG984)

Document ID: UG984
Release Date: 2024-11-27
Version: 2024.2 English

With extended debugging, MicroBlaze provides performance monitoring counters to count various events and to measure latency during program execution. The number of event counters and latency counters can be configured with C_DEBUG_EVENT_COUNTERS and C_DEBUG_LATENCY_COUNTERS respectively, and the counter width can be set to 32, 48, or 64 bits with C_DEBUG_COUNTER_WIDTH. With the default configuration, the counter width is set to 32 bits and there are five event counters and one latency counter.

An event counter simply counts the number of times a certain event has occurred, whereas a latency counter provides the following information:

  • Number of times the event has occurred (N)
  • The sum of all event latencies (ΣL), where each latency is measured by counting clock cycles from when the event starts until it finishes, used to calculate the mean latency
  • The sum of all squared event latencies (ΣL²), used to calculate the latency standard deviation
  • The minimum (shortest) measured latency for all events (Lmin)
  • The maximum (longest) measured latency for all events (Lmax)

The mean latency (μ) is calculated by the following formula:

μ = ΣL / N

Figure 1. Mean Latency

The standard deviation (σ) of the latency is calculated by the following formula:

Figure 2. Standard Deviation of Latency
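
As a hedged illustration, both statistics can be computed on the host from the sampled items of one latency counter. The sketch below assumes the population form of the standard deviation, σ = sqrt(ΣL²/N - μ²), which is a common way to derive it from running sums; the original figure may use a different but equivalent notation. The function name `latency_stats` and its parameters are hypothetical and not part of any MicroBlaze tooling.

```python
import math

def latency_stats(n, sum_l, sum_l2):
    """Return (mean, std_dev) from N, ΣL and ΣL² of one latency counter.

    Assumption: population standard deviation, sqrt(ΣL²/N - μ²).
    """
    if n == 0:
        raise ValueError("no events were measured")
    mean = sum_l / n
    # Clamp at zero to absorb tiny negative values caused by rounding.
    variance = max(sum_l2 / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)

# Made-up example: three events with latencies of 4, 6 and 8 clock cycles.
mean, std = latency_stats(3, 4 + 6 + 8, 4**2 + 6**2 + 8**2)
print(f"mean latency = {mean:.2f} cycles, standard deviation = {std:.2f} cycles")
```

Working from the three sums keeps the calculation independent of how many events occurred, which is why only N, ΣL and ΣL² need to be read back.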

Counting can be started or stopped using the Performance Counter Command Register or by cross trigger events (see Table 1).

Counters are configured, read, and written sequentially through the performance counter registers. After every access, the selected counter item is incremented.

All counters are sampled simultaneously for reading using the Performance Counter Command Register. This can be done while counting, or after counting has been stopped.

When an event counter reaches its maximum value, the overflow status bit is set, and the external interrupt signal Dbg_Intr is set to one. The interrupt signal is reset to zero by clearing the counters using the Performance Counter Command Register.

By using one of the event counters to count the number of clock cycles, and initializing this counter to overflow after a predetermined sampling interval, the external interrupt can be used to periodically sample the performance counters.
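
The initial value for such a clock cycle counter can be derived from the sampling interval and the configured counter width. The sketch below is a host-side calculation only. It assumes that overflow is flagged when the counter reaches the all-ones value 2^width - 1, which should be checked against the register description; the function name `overflow_start_value` is hypothetical. The resulting value would be written with the Performance Counter Data Write Register.

```python
def overflow_start_value(interval_cycles, counter_width=32):
    """Initial value so the counter reaches its maximum after
    `interval_cycles` increments.

    Assumption: overflow is flagged when the counter reaches the
    all-ones value 2**counter_width - 1.
    """
    if counter_width not in (32, 48, 64):
        raise ValueError("counter width must be 32, 48 or 64 bits")
    max_value = (1 << counter_width) - 1
    if not 0 < interval_cycles <= max_value:
        raise ValueError("interval does not fit in the counter")
    return max_value - interval_cycles

# Made-up example: sample every 100,000 clock cycles with 32-bit counters.
start = overflow_start_value(100_000)
print(f"initial counter value: {start:#x}")
```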

The available events are described in Table 1, listed in numerical order.

A typical procedure for initializing and using the performance monitoring counters is as follows:

  1. Initialize the events to be monitored:
    • Use the Performance Counter Command Register to reset the selected counter to the first counter, by setting the Reset bit.
    • Write the desired event numbers for all counters in order, using the Performance Counter Control Register. With the default configuration this means writing the register five times for the event counters and then once for the latency counter.
  2. Clear all counters and start monitoring using the Performance Counter Command Register, by setting the Clear and Start bits.
  3. Run the program or function to be monitored.
  4. Sample counters and stop monitoring using the Performance Counter Command Register, by setting the Sample and Stop bits.
  5. Read the results from all counters:
    • Use the Performance Counter Command Register to reset the selected counter to the first counter, by setting the Reset bit.
    • Read the status for all counters in order, using the Performance Counter Status Register. With the default configuration this means reading the register five times for the event counters and then once for the latency counter. Ensure that the result is valid by checking that the overflow and full bits are not set.
    • Use the Performance Counter Command Register to reset the selected counter to the first counter, by setting the Reset bit.
    • Read the counter items for all counters in order, using the Performance Counter Data Read Register. With the default configuration this means reading the register five times for the event counters and then four times for the latency counter as described in Performance Counter Data Read Register.
  6. Calculate the final results, depending on the measured events, for example:
    • Use the formulas above to determine the mean latency and standard deviation for any measured latency.
    • The clock cycles per instruction (CPI) can be calculated by E30 / E0, where En denotes the value of the counter for event n.
    • The instruction and data cache hit rates can be calculated by E11 / E10 and E47 / E46, respectively.
    • The instruction cache miss latency is determined by (E60(ΣL) - E60(N)) / (E10 - E11), and equivalent formulas can be used to determine the data cache read and write miss latencies.
    • The ratio of floating-point instructions in a program is E29 / E0.
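
The example calculations in step 6 can be sketched as a small host-side helper. This is an illustration only: the function names `ratio` and `derived_metrics`, the dictionary layout, and all sample values are hypothetical. The formulas themselves are the ones listed above.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None

def derived_metrics(events, icache_latency):
    """Compute the example metrics from sampled counter values.

    events: dict mapping event number -> event counter value.
    icache_latency: dict with "n" and "sum_l" items for event 60.
    """
    misses = events[10] - events[11]
    return {
        "cpi": ratio(events[30], events[0]),
        "icache_hit_rate": ratio(events[11], events[10]),
        "dcache_hit_rate": ratio(events[47], events[46]),
        "icache_miss_latency": ratio(
            icache_latency["sum_l"] - icache_latency["n"], misses),
        "fp_ratio": ratio(events[29], events[0]),
    }

# Made-up sample values, for illustration only.
events = {0: 1000, 30: 1500, 10: 400, 11: 380, 46: 200, 47: 190, 29: 50}
metrics = derived_metrics(events, {"n": 400, "sum_l": 800})
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Returning None for an undefined ratio avoids a division by zero when, for example, no cache misses occurred during the measurement.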
Table 1. MicroBlaze Performance Monitoring Events
Event | Description | Event | Description
Event Counter Events
0 | Any valid instruction executed | 29 | Floating-point (fadd, ..., fsqrt)
1 | Load word (lw, lwi, lwx) executed | 30 | Number of clock cycles
2 | Load halfword (lhu, lhui) executed | 31 | Immediate (imm) executed
3 | Load byte (lbu, lbui) executed | 32 | Pattern compare (pcmpbf, pcmpeq, pcmpne)
4 | Store word (sw, swi, swx) executed | 33 | Sign extend instructions (sext8, sext16) executed
5 | Store halfword (sh, shi) executed | 34 | Instruction cache invalidate (wic) executed
6 | Store byte (sb, sbi) executed | 35 | Data cache invalidate or flush (wdc) executed
7 | Unconditional branch (br, bri, brk, brki) executed | 36 | Machine status instructions (msrset, msrclr)
8 | Taken conditional branch (beq, ..., bnei) executed | 37 | Unconditional branch with delay slot executed
9 | Not taken conditional branch (beq, ..., bnei) executed | 38 | Taken conditional branch with delay slot executed
10 | Data request from instruction cache | 39 | Not taken conditional branch with delay slot
11 | Hit in instruction cache | 40 | Delay slot with no operation instruction executed
12 | Read data requested from data cache | 41 | Load instruction (lbu, ..., lwx) executed
13 | Read data hit in data cache | 42 | Store instruction (sb, ..., swx) executed
14 | Write data request to data cache | 43 | MMU data access request
15 | Write data hit in data cache | 44 | Conditional branch (beq, ..., bnei) executed
16 | Load (lbu, ..., lwx) with r1 as operand executed | 45 | Branch (br, bri, brk, brki, beq, ..., bnei) executed
17 | Store (sb, ..., swx) with r1 as operand executed | 46 | Read or write data request from/to data cache
18 | Logical operation (and, andn, or, xor) executed | 47 | Read or write data cache hit
19 | Arithmetic operation (add, idiv, mul, rsub) executed | 48 | MMU exception taken
20 | Multiply operation (mul, mulh, mulhu, mulhsu, muli) | 49 | MMU instruction side exception taken
21 | Barrel shifter operation (bsrl, bsra, bsll) executed | 50 | MMU data side exception taken
22 | Shift operation (sra, src, srl) executed | 51 | Pipeline stalled
23 | Exception taken | 52 | Branch target cache hit for a branch or return
24 | Interrupt occurred | 53 | MMU instruction side access request
25 | Pipeline stalled due to operand fetch stage (OF) | 54 | MMU instruction TLB (ITLB) hit
26 | Pipeline stalled due to execute stage (EX) | 55 | MMU data TLB (DTLB) hit
27 | Pipeline stalled due to memory stage (MEM) | 56 | MMU unified TLB (UTLB) hit
28 | Integer divide (idiv, idivu) executed | |
Latency and Event Counter Events
57 | Interrupt latency from input to interrupt vector | 61 | MMU address lookup latency
58 | Data cache latency for memory read | 62 | Peripheral AXI interface data read latency
59 | Data cache latency for memory write | 63 | Peripheral AXI interface data write latency
60 | Instruction cache latency for memory read | |

The debug registers used to configure and control performance monitoring, and to read or write the event and latency counters, are listed in the following table. All of these registers except the Performance Counter Command Register are accessed repeatedly to read or write information, first for all of the event counters and then for all of the latency counters.
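
Because the values arrive in this fixed order, host software has to split the sequence of reads back into per-counter groups. The sketch below shows one way to do that for the Performance Counter Data Read Register. It assumes 32-bit counters, so that every counter item corresponds to exactly one read, and it uses the default of five event counters and one latency counter with four items. The function name `split_counter_reads` and its parameters are hypothetical, and the latency items are returned raw, without interpreting their contents.

```python
def split_counter_reads(values, event_counters=5, latency_counters=1,
                        items_per_latency_counter=4):
    """Split sequential Data Read values into per-counter groups.

    Assumption: 32-bit counters, so every counter item is one read.
    Returns (event_values, latency_items), where latency_items holds
    one list of raw items per latency counter.
    """
    expected = event_counters + latency_counters * items_per_latency_counter
    if len(values) != expected:
        raise ValueError(f"expected {expected} reads, got {len(values)}")
    event_values = list(values[:event_counters])
    rest = values[event_counters:]
    latency_items = [
        list(rest[i:i + items_per_latency_counter])
        for i in range(0, len(rest), items_per_latency_counter)
    ]
    return event_values, latency_items

# Made-up example: nine sequential reads with the default configuration.
reads = [1000, 1500, 400, 380, 50, 400, 800, 2000, 0]
event_values, latency_items = split_counter_reads(reads)
print("event counters:", event_values)
print("latency counter items:", latency_items)
```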

The DBG_CTRL value is the value to write to the MDM Debug Register Access Control Register in order to access the register. It applies when MDM software access to debug registers is used.

Table 2. MicroBlaze Performance Monitoring Debug Registers
Register Name | Size (bits) | MDM Command | DBG_CTRL Value | R/W | Description
Performance Counter Control | 8 | 0101 0001 | 4A207 | W | Select event for each configured counter, according to the previous table
Performance Counter Command | 5 | 0101 0010 | 4A404 | W | Command to clear counters, start or stop counting, or sample counters
Performance Counter Status | 2 | 0101 0011 | 4A601 | R | Read the sampled status for each configured performance counter
Performance Counter Data Read | 32 | 0101 0110 | 4AC1F | R | Read the sampled values for each configured performance counter
Performance Counter Data Write | 32 | 0101 0111 | 4AE1F | W | Write initial values for each configured performance counter
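
The DBG_CTRL values in the table follow a visible pattern: read as hexadecimal numbers, the low nine bits equal the register size minus one, and the next eight bits equal the MDM command. The sketch below checks this pattern against all five rows. The field positions are inferred from the table values only, not taken from the MDM documentation, so treat them as an assumption; the function name `decode_dbg_ctrl` is hypothetical.

```python
def decode_dbg_ctrl(value):
    """Split a DBG_CTRL value into (upper_bits, mdm_command, size_bits).

    Assumption: field positions are inferred from the table values only.
    """
    size_bits = (value & 0x1FF) + 1     # low 9 bits hold size - 1
    mdm_command = (value >> 9) & 0xFF   # next 8 bits hold the MDM command
    upper_bits = value >> 17            # remaining bits, identical in all rows
    return upper_bits, mdm_command, size_bits

# (DBG_CTRL value, MDM command, size in bits) copied from the table.
TABLE_2 = {
    "Performance Counter Control": (0x4A207, 0b0101_0001, 8),
    "Performance Counter Command": (0x4A404, 0b0101_0010, 5),
    "Performance Counter Status": (0x4A601, 0b0101_0011, 2),
    "Performance Counter Data Read": (0x4AC1F, 0b0101_0110, 32),
    "Performance Counter Data Write": (0x4AE1F, 0b0101_0111, 32),
}

for name, (dbg_ctrl, command, size) in TABLE_2.items():
    upper, decoded_command, decoded_size = decode_dbg_ctrl(dbg_ctrl)
    print(f"{name}: command={decoded_command:08b} "
          f"size={decoded_size} upper={upper:#x}")
```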