AMD Versal™ adaptive SoCs combine programmable logic (PL), processing system (PS), and AI Engines with leading-edge memory and interfacing technologies to deliver powerful heterogeneous acceleration for any application. The hardware and software are targeted for programming and optimization by data scientists and software and hardware developers. A host of tools, software, libraries, IP, middleware, and frameworks enable Versal adaptive SoCs to support all industry-standard design flows.
The Versal portfolio is the first platform to combine software programmability and domain-specific hardware acceleration with the adaptability necessary to meet today's rapid pace of innovation. The portfolio is uniquely architected to deliver scalability and AI inference capabilities for a host of applications across different markets, from cloud to networking to wireless communications to edge computing and endpoints.
The Versal architecture has a wealth of connectivity and communication capability and a programmable network on chip (NoC) to enable seamless memory-mapped access to the full height and width of the device. AI Engines are SIMD VLIW vector processors for adaptive inference and advanced signal processing compute. The PL combines configurable logic blocks, memory, and DSP Engines architected for high-compute density. The PS includes application and real-time processors from Arm® for intensive compute tasks.
AI Engines
The Versal AI Core Series, AI Edge Series, and AI Edge Series Gen 2 deliver breakthrough AI inference acceleration with AI Engines. These series are designed for a breadth of applications, including cloud for dynamic workloads and network for massive bandwidth, all while delivering advanced safety and security features. AI and data scientists, as well as software and hardware developers, benefit from high compute density to accelerate the performance of any application. Given the AI Engine's advanced signal processing compute capability, it is well-suited for highly optimized wireless applications such as radio, 5G, backhaul, and other high-performance DSP applications.
AI Engines are an array of very-long instruction word (VLIW) processors with single instruction multiple data (SIMD) vector units that are highly optimized for compute-intensive applications, specifically digital signal processing (DSP), 5G wireless applications, and artificial intelligence (AI) technology such as machine learning (ML).
AI Engines are hardened blocks that provide multiple levels of parallelism, including instruction-level and data-level parallelism. Instruction-level parallelism allows a scalar operation, up to two moves, two vector reads (loads), one vector write (store), and one vector instruction to be executed per clock cycle: in total, a 7-way VLIW instruction. Data-level parallelism is achieved via vector-level operations, where multiple sets of data are operated on per clock cycle. Each AI Engine contains both a vector and a scalar processor, dedicated program memory, and local data memory, and can access adjacent local memory in any of three neighboring directions. It also has access to DMA engines and AXI4 interconnect switches to communicate via streams with other AI Engines, the programmable logic (PL), or the DMA. Refer to the Versal Adaptive SoC AI Engine Architecture Manual (AM009) for specific details on the AI Engine array and interfaces.
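The data-level parallelism described above can be pictured as one vector instruction operating on many lanes of data at once. The following plain C++ sketch only models that idea; on a real AI Engine the vector unit performs all lanes in hardware in a single cycle, and the lane count of 8 here is illustrative, not a statement about any particular device configuration.

```cpp
#include <array>
#include <cstddef>

// Conceptual model of data-level parallelism: one "vector instruction"
// operates on LANES data elements at once. This is a teaching sketch,
// not AI Engine code; LANES = 8 is an illustrative choice.
constexpr std::size_t LANES = 8;
using Vec = std::array<int, LANES>;

// Multiply-accumulate across all lanes, as a single vector MAC would.
Vec vmac(const Vec& acc, const Vec& a, const Vec& b) {
    Vec out{};
    for (std::size_t i = 0; i < LANES; ++i)  // all lanes in "one cycle"
        out[i] = acc[i] + a[i] * b[i];
    return out;
}
```

In the actual hardware, this per-lane loop is what the vector unit collapses into a single clock cycle, which is why keeping data in vector form is central to AI Engine performance.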
The AI Engine-ML (AIE-ML) block is capable of delivering 2x compute throughput compared to its predecessor AI Engine blocks. The AIE-ML block, primarily targeted for machine learning inference applications, delivers one of the industry's best performance per Watt for a wide range of inference applications. Refer to the Versal Adaptive SoC AIE-ML Architecture Manual (AM020) for specific details on the AIE-ML features and architecture.
As an application developer, you can use either a white box or a black box flow to run an ML inference application on AIE-ML variants. With the white box flow, you integrate custom kernels and dataflow graphs in the AIE-ML variants programming environment. The black box flow uses performance-optimized Neural Processing Unit (NPU) IP from AMD to accelerate ML workloads on the AIE-ML variants.
AMD Vitis™ AI is used as a front-end tool that parses the network graph, performs optimization and quantization of the graph, and generates compiled code that can run on the AIE-ML variant hardware. The AIE-ML variants core tile architecture supports a variety of fixed- and floating-point precision datatypes. The architecture allows for pipelined vector processing and incorporates high-density, high-speed on-chip memory that can effectively store on-chip tensors. Additionally, it features versatile data movers that are adept at handling multi-dimensional tensors in memory. With the proper selection of overlay processor architecture and spatial and temporal distribution of the input/output tensors in on/off-chip memory, it is possible to achieve high computational efficiency with the AIE-ML variants processing cores.
AI Engine Kernels
AI Engine kernels are implemented in C/C++. Specialized APIs that target the VLIW vector processor are used to write the kernel. The AI Engine kernel code is compiled using the AI Engine compiler that is included in the AMD Vitis™ core development kit. The AI Engine compiler compiles the kernels to produce an ELF file that runs on the AI Engine processors.
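To give a feel for the shape of such a kernel, here is a standard C++ model of the computation a very simple kernel might perform: consume a block of samples, apply a gain, and produce a block of results. A real kernel would be written with the specialized AIE vector APIs and compiled with the AI Engine compiler; the names here (`scale_kernel`, `N`) are illustrative, not from any AMD header.

```cpp
#include <cstdint>
#include <cstddef>

// Plain C++ model of a simple AI Engine kernel body. In a real kernel,
// the loop below would be expressed with AIE vector types so the VLIW
// vector unit processes several samples per cycle. N is an illustrative
// per-invocation sample count.
constexpr std::size_t N = 16;

void scale_kernel(const int32_t* in, int32_t* out, int32_t gain) {
    for (std::size_t i = 0; i < N; ++i)
        out[i] = in[i] * gain;
}
```

The function-with-pointer-arguments shape mirrors how kernels present their input and output data blocks; the compiler and graph specification take care of delivering those blocks.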
AI Engine Graphs
An AI Engine program requires a C++ data flow graph specification. This graph specification can be compiled and executed using the AI Engine compiler. An adaptive data flow (ADF) graph application consists of nodes and edges where nodes represent compute kernel functions, and edges represent data connections. The application compiles kernels to run on the AI Engines.
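The nodes-and-edges structure can be sketched in ordinary C++: kernels become functions, and each edge passes one kernel's output block to the next kernel's input. This is only a conceptual model of the dataflow idea; the real specification uses the ADF C++ API and is compiled by the AI Engine compiler, and all names here are illustrative.

```cpp
#include <vector>
#include <functional>

// Conceptual dataflow-graph model: nodes are kernel functions, edges
// carry blocks of samples between them. Not the ADF API; a sketch of
// the nodes-and-edges idea only.
using Block  = std::vector<int>;
using Kernel = std::function<Block(const Block&)>;

// Run a linear chain of kernels for one iteration: each edge forwards
// the producing node's output block to the consuming node.
Block run_graph(const std::vector<Kernel>& nodes, Block input) {
    for (const Kernel& k : nodes)
        input = k(input);  // edge: output of one node feeds the next
    return input;
}
```

A real ADF graph is more general (branching connections, windows and streams, runtime parameters), but the essential point is the same: computation is expressed as kernels joined by explicit data connections.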
Refer to the AI Engine Kernel and Graph Programming Guide (UG1079) for more information about how to develop, debug, and optimize AI Engine kernels and graphs. It also includes information on specialized graph constructs and ways to control the AI Engine graph.
Compiling and Simulating the Program
Compiling an AI Engine Graph Application describes AI Engine compiler compilation types. It includes the options and input files that can be passed in as well as the expected output. You can compile the graph and kernels independently, or as part of a larger system. You can also set up the design to capture and profile event trace data at runtime.
Simulating an AI Engine Graph Application describes the AI Engine simulator in detail, as well as the x86 simulator for functional simulation. The AI Engine simulator simulates the graph application as a standalone entity, or as part of the hardware emulation of a larger system design.
Using the Vitis Unified IDE in the Vitis Reference Guide (UG1702) describes migrating, building, running, and debugging the AI Engine component. Vitis tools use a bottom-up design flow. You can develop components and then integrate them into a top-level system application. Refer to Vitis Reference Guide (UG1702) for a description of building the full system project incorporating the different components described above.
Integrating the AI Engine application into a Versal Design using Vitis
You can integrate AI Engine kernels and graphs into a Versal adaptive SoC system design. This system design includes AI Engine kernels, HLS PL kernels, RTL kernels, and the host application. The Vitis compiler builds this larger Versal system. Refer to the Vitis Reference Guide (UG1702) for a description of building the full system project incorporating the different components described above.
Profiling and Debugging Designs with AI Engine
Performance Analysis of AI Engine Graph Application during Simulation describes how to extract performance data using event tracing when running the hardware emulation build or the hardware build. You can use this data to further optimize the AI Engine kernels and graphs.
Performance Analysis of AI Engine Graph Application on Hardware describes how to profile and extract performance data using event tracing when running the design in hardware.
The Vitis Reference Guide (UG1702) provides information for debugging the AI Engine component from the Vitis Unified IDE.
Debugging System Projects in the Embedded Design Development Using Vitis (UG1701) shows you how to run and use the debug environment, either from the command line or from the Vitis Unified IDE. Evaluating system performance and debugging the application are key steps to achieving your objectives for the system.
Controlling the AI Engine Graph
Programming the PS Host Application describes the process of creating a host application to control the graph and PL kernels of the system. When your design is deployed in hardware, you can install drivers that support initializing and controlling the graph execution from a host application running on the PS, or you can load and run the AI Engine graph at device boot time.
The AI Engine compiler generates application-specific AI Engine control code as part of compiling the AI Engine design graph and kernel code. The AI Engine control code can perform the following operations:
- Control the initial loading of the AI Engine kernels.
- Run the graph for several iterations.
- Update the runtime parameters (RTP) associated with the graph.
- Exit and reset the AI Engines.

Note: A graph can have multiple kernels, input ports, and output ports.
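The sequence that generated control code drives, loading the kernels, running iterations, updating runtime parameters, and finally exiting and resetting, can be sketched with a small mock controller. This is a self-contained illustration of the order of operations only; the real control code is generated by the AI Engine compiler, and the class and member names here are invented for the sketch.

```cpp
#include <cstdint>

// Mock of the control sequence performed by generated AI Engine control
// code. All names and state here are illustrative, not a real AMD API.
struct GraphController {
    bool loaded = false;
    std::int64_t iterations_run = 0;
    int rtp_value = 0;

    void init()              { loaded = true; }        // load the kernels
    void run(std::int64_t n) { iterations_run += n; }  // run n iterations
    void update(int v)       { rtp_value = v; }        // set a runtime parameter
    void end()               { loaded = false; }       // exit and reset
};
```

A host application typically calls these operations in exactly this order: initialize once, then interleave run and update calls as the workload requires, and end when done.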
The graph connectivity, which is equivalent to the nets in a data flow graph, is located as follows:
- Between kernels
- Between a kernel and the input ports
- Between a kernel and the output ports

You can configure each of these as a connection.
A graph runs for an iteration when it consumes data samples equal to the window or stream of data expected by the graph kernels. It produces data samples equal to the window or stream of data expected at the output of all the kernels in the graph.
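Because one iteration consumes exactly one window of samples per input and produces one window per output, the number of complete iterations for a given amount of data follows directly. The helper below is an illustrative calculation under that definition; the name and values are not from the Vitis tools.

```cpp
#include <cstddef>

// One graph iteration consumes one window of samples per input and
// produces one window per output, so a stream of total_samples yields
// total_samples / window_size complete iterations. Illustrative helper;
// a zero window size is treated as "no iterations" to avoid dividing
// by zero.
constexpr std::size_t complete_iterations(std::size_t total_samples,
                                          std::size_t window_size) {
    return window_size == 0 ? 0 : total_samples / window_size;
}
```

For example, a kernel expecting 128-sample windows completes 8 iterations over 1024 input samples.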
The Vitis core development kit provides platforms for building, simulating, debugging, and deploying your AI Engine designs. These platforms target a specific hardware board (for example, the VCK190 or VEK280 board) and enable development of a design including AI Engine and PL kernels with a host application. The host application targets the Linux OS running on the Arm processor in the PS. You can verify designs on these platforms using the hardware emulation flow and run them on the target hardware board.
AI Engine Methodology
Mapper/Router Methodology describes the mapper and router methodology used to handle failures in the AI Engine compiler during the mapper and/or router phase.
AI Engine Hardware Profile and Debug Methodology in the Embedded Design Development Using Vitis (UG1701) describes the five-stage profile and debug methodology that you can use when running a Versal adaptive SoC design with AI Engine graphs in hardware.
Vitis Unified Integrated Design Environment
The Vitis Unified IDE provides system project design and debug for heterogeneous computing systems, embedded system design, and data center acceleration. Elements of the system project include AI Engine and High-Level Synthesis (HLS) component creation, platform creation, and embedded software design.
The classic Vitis IDE is obsolete. You can migrate AI Engine graph applications from the classic IDE to the Vitis Unified IDE. See Migrating Vitis Classic IDE Graph Applications to Vitis Unified IDE in the Vitis Reference Guide (UG1702) for more information.
The Vitis Unified IDE uses the v++ common command-line syntax to create AI Engine components and to compile and run the elements of the design. See Compiling using v++ (Unified Compiler) for a review of the v++ command-line flows for developing AI Engine components.