The function generators (LUTs) in SLICEMs can be implemented as a synchronous RAM resource called a distributed RAM element. Multiple LUTs in a SLICEM can be combined in various ways to store larger amounts of data. RAM elements within a SLICEM can be configured as any of the following:
- Single-Port 32 x 1-bit RAM
- Dual-Port 32 x 1-bit RAM
- Quad-Port 32 x 2-bit RAM
- Simple Dual-Port 32 x 6-bit RAM
- Single-Port 64 x 1-bit RAM
- Dual-Port 64 x 1-bit RAM
- Quad-Port 64 x 1-bit RAM
- Simple Dual-Port 64 x 3-bit RAM
- Single-Port 128 x 1-bit RAM
- Dual-Port 128 x 1-bit RAM
- Single-Port 256 x 1-bit RAM
Distributed RAM modules are synchronous-write resources: writes are clocked, while reads are asynchronous. A synchronous read can be implemented with a flip-flop in the same slice. Using this flip-flop improves distributed RAM performance by reducing the delay to the clock-to-out value of the flip-flop, at the cost of one additional clock cycle of latency. The distributed RAM elements share the same clock input. For a write operation, the Write Enable (WE) input, driven by either the CE or WE pin of a SLICEM, must be set High.
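The write and read timing described above can be illustrated with a small behavioral sketch in Python. This is not a vendor simulation model: the class name `DistributedRam`, its methods, and the choice to have the output flip-flop capture the value present before the write takes effect are all assumptions made for illustration.

```python
class DistributedRam:
    """Behavioral sketch of a single-port x1 distributed RAM (hypothetical model)."""

    def __init__(self, depth=64, registered_read=False):
        self.mem = [0] * depth
        self.registered_read = registered_read
        self.q = 0  # optional output flip-flop in the same slice

    def read(self, addr):
        """Current output: combinational LUT output, or the flip-flop contents."""
        return self.q if self.registered_read else self.mem[addr]

    def clock(self, addr, data_in=0, we=False):
        """Model one rising clock edge."""
        lut_out = self.mem[addr]      # asynchronous read value just before the edge
        if we:                        # WE must be High for the write to occur
            self.mem[addr] = data_in & 1
        if self.registered_read:
            self.q = lut_out          # assumption: flip-flop samples the pre-edge value
```

With `registered_read=False`, data written on one edge is visible at the output immediately afterward. With `registered_read=True`, the same data appears only after a second edge, which is the additional clock of latency mentioned above.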
The following table shows the number of LUTs (four per slice) occupied by each distributed RAM configuration. See the Vivado Design Suite 7 Series FPGA and Zynq 7000 SoC Libraries Guide (UG953) for details of the available distributed RAM primitives.
| RAM | Description | Primitive | Number of LUTs |
|---|---|---|---|
| 32 x 1S | Single port | RAM32X1S | 1 |
| 32 x 1D | Dual port | RAM32X1D | 2 |
| 32 x 2Q | Quad port | RAM32M | 4 |
| 32 x 6SDP | Simple dual port | RAM32M | 4 |
| 64 x 1S | Single port | RAM64X1S | 1 |
| 64 x 1D | Dual port | RAM64X1D | 2 |
| 64 x 1Q | Quad port | RAM64M | 4 |
| 64 x 3SDP | Simple dual port | RAM64M | 4 |
| 128 x 1S | Single port | RAM128X1S | 2 |
| 128 x 1D | Dual port | RAM128X1D | 4 |
| 256 x 1S | Single port | RAM256X1S | 4 |
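As a quick way to work with the table, the LUT counts can be placed in a lookup and used to estimate how many primitives fit in one SLICEM. The sketch below is illustrative only: the function names are hypothetical, and the result is a best case that assumes the packed primitives share the same clock, write enable, and write address inputs.

```python
LUTS_PER_SLICEM = 4  # four LUTs per slice, per the table caption

# LUT counts copied from the table above.
LUTS_PER_PRIMITIVE = {
    "RAM32X1S": 1, "RAM32X1D": 2, "RAM32M": 4,
    "RAM64X1S": 1, "RAM64X1D": 2, "RAM64M": 4,
    "RAM128X1S": 2, "RAM128X1D": 4, "RAM256X1S": 4,
}


def primitives_per_slicem(primitive):
    """Best-case number of instances of one primitive that fit in a SLICEM."""
    return LUTS_PER_SLICEM // LUTS_PER_PRIMITIVE[primitive]


def slicems_needed(primitive, count):
    """Lower bound on SLICEMs for `count` instances (ceiling division)."""
    per_slice = primitives_per_slicem(primitive)
    return -(-count // per_slice)
```

For example, `primitives_per_slicem("RAM64X1S")` evaluates to 4 and `slicems_needed("RAM64X1S", 8)` evaluates to 2, so a 64 x 8-bit single-port memory needs at least two SLICEMs under these assumptions.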
Distributed RAM configurations include:
- Single port
  - Common address port for synchronous writes and asynchronous reads
    - Read and write addresses share the same address bus
- Dual port
  - One port for synchronous writes and asynchronous reads
    - One function generator is connected with the shared read and write port address
  - One port for asynchronous reads
    - Second function generator has the A inputs connected to a second read-only port address, and the WA inputs are shared with the first read/write port address
- Simple dual port
  - One port for synchronous writes (no data out/read port from the write port)
  - One port for asynchronous reads
- Quad port
  - One port for synchronous writes and asynchronous reads
  - Three ports for asynchronous reads
As shown in Figure 1, the common write port W6:W1 (WA[6:1] in the following figures) is always physically driven by the inputs to the D LUT using D[6:1]. The read ports are independent for each of the four LUTs. Therefore, the D LUT is always effectively single port, even if DPRAM64 is selected as the LUT configuration. The other three LUTs are always effectively dual port, although SPRAM32 can be selected when the read and write addresses are connected together.
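The shared write address can be made concrete with a behavioral sketch of a 64 x 1-bit quad-port arrangement. The class `QuadPort64x1` and its method names are hypothetical, and the model assumes the same data bit is written into all four LUTs so that each read port sees the same memory contents.

```python
class QuadPort64x1:
    """Sketch of four LUTs (A, B, C, D) sharing the write address from the D LUT."""

    def __init__(self):
        self.luts = {name: [0] * 64 for name in "ABCD"}

    def clock(self, d_addr, data, we):
        """Rising edge: every LUT is written at the address on the D LUT inputs."""
        if we:
            for contents in self.luts.values():
                contents[d_addr] = data & 1

    def read(self, d_addr, a_addr, b_addr, c_addr):
        """Asynchronous reads: D uses the shared read/write address,
        while A, B, and C each use their own read-only address."""
        return (self.luts["D"][d_addr], self.luts["A"][a_addr],
                self.luts["B"][b_addr], self.luts["C"][c_addr])
```

In this model the D LUT can only be read at the address that is also the write address, which is why it behaves as a single-port memory, while the A, B, and C LUTs act as the three additional read ports.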
Figure 1 through Figure 9 illustrate various example distributed RAM configurations occupying one SLICEM.
When using x2 configurations (such as the 32 x 2 quad-port RAM in Figure 1), A6 and WA6 are driven High by the software to keep O5 and O6 independent.
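The reason for tying A6 High can be seen in a simplified read-side model of a dual-output LUT. This sketch assumes the usual dual-output arrangement in which O5 always reads the lower 32 entries and O6 reads all 64 entries with A6 as the top address bit. The function name and argument layout are hypothetical, and the write path is not modeled.

```python
def lut6_outputs(contents, a6, a5_1):
    """Return (o5, o6) for a 64-entry LUT.

    contents: list of 64 bits
    a6:       value on the A6 input (0 or 1)
    a5_1:     5-bit address on A5:A1 (0..31)
    """
    o5 = contents[a5_1]              # lower 32 entries only
    o6 = contents[(a6 << 5) | a5_1]  # A6 selects lower or upper 32 entries
    return o5, o6
```

With `a6 = 1`, O6 reads the upper half and O5 reads the lower half, so the two outputs act as separate 32-entry memories. With `a6 = 0`, both outputs would return the same entry.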
If four single-port 64 x 1-bit modules are each built as shown in Figure 3, the four RAM64X1S primitives can occupy a SLICEM, as long as they share the same clock, write enable, and shared read and write port address inputs. This configuration equates to a 64 x 4-bit single-port distributed RAM.
If two dual-port 64 x 1-bit modules are each built as shown in Figure 4, the two RAM64X1D primitives can occupy a SLICEM, as long as they share the same clock, write enable, and shared read and write port address inputs. This configuration equates to a 64 x 2-bit dual-port distributed RAM.
Distributed RAM configurations with a depth greater than 64 require the wide-function multiplexers (F7AMUX, F7BMUX, and F8MUX), as shown in Figure 7 through Figure 9.
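The role of the wide-function multiplexers in depth expansion can be sketched as follows. The mapping of LUTs to address ranges, the function names, and the write decode are assumptions made for illustration; the figures referenced above define the actual connections.

```python
def mux2(sel, in0, in1):
    """Two-input multiplexer, standing in for an F7 or F8 mux."""
    return in1 if sel else in0


def read_128x1(lut_lo, lut_hi, addr):
    """128 x 1 read: address bit 6 selects between two 64-entry LUTs."""
    low = addr & 0x3F
    return mux2((addr >> 6) & 1, lut_lo[low], lut_hi[low])


def read_256x1(luts, addr):
    """256 x 1 read: two F7-style muxes feed one F8-style mux."""
    low = addr & 0x3F
    a6 = (addr >> 6) & 1
    f7a = mux2(a6, luts[0][low], luts[1][low])
    f7b = mux2(a6, luts[2][low], luts[3][low])
    return mux2((addr >> 7) & 1, f7a, f7b)


def write_256x1(luts, addr, bit):
    """Assumed write decode: the upper two address bits pick the LUT to write."""
    luts[addr >> 6][addr & 0x3F] = bit & 1
```

All four LUTs receive the same lower six address bits; only the multiplexer select lines, driven by the upper address bits, determine which LUT output reaches the final output.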
If two single-port 128 x 1-bit modules are each built as shown in Figure 7, the two RAM128X1S primitives can occupy a SLICEM, as long as they share the same clock, write enable, and shared read and write port address inputs. This configuration equates to a 128 x 2-bit single-port distributed RAM.
Distributed RAM configurations larger than the provided examples require more than one SLICEM. There are no direct connections between slices, whether within a CLB or between CLBs, for forming larger distributed RAM configurations.