xfblasStatus_t xfblasGemm(xfblasOperation_t transa, xfblasOperation_t transb, int m, int n, int k, int alpha, void* A, int lda, void* B, int ldb, int beta, void* C, int ldc, unsigned int kernelIndex = 0, unsigned int deviceIndex = 0)
This function performs the matrix-matrix multiplication C = alpha*op(A)op(B) + beta*C. For detailed usage, see the L3 examples.
Parameters:
| transa | Operation op(A) applied to matrix A: non-transpose, transpose, or conjugate transpose. |
| transb | Operation op(B) applied to matrix B: non-transpose, transpose, or conjugate transpose. |
| m | Number of rows in matrix A and matrix C. |
| n | Number of columns in matrix B and matrix C. |
| k | Number of columns in matrix A and number of rows in matrix B. |
| alpha | Scalar that multiplies the product op(A)op(B). |
| A | Pointer to matrix A in the host memory. |
| lda | Leading dimension of matrix A. |
| B | Pointer to matrix B in the host memory. |
| ldb | Leading dimension of matrix B. |
| beta | Scalar that multiplies the input matrix C before it is added to the product. |
| C | Pointer to matrix C in the host memory. |
| ldc | Leading dimension of matrix C. |
| kernelIndex | Index of the kernel that is being used; default is 0. |
| deviceIndex | Index of the device that is being used; default is 0. |
Return:
| xfblasStatus_t | 0 if the operation completed successfully. |
| xfblasStatus_t | 1 if the library was not initialized. |
| xfblasStatus_t | 3 if not all the matrices have FPGA device memory allocated. |
| xfblasStatus_t | 4 if the engine is not currently supported. |
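The return values listed above can be turned into readable diagnostics with a small lookup. The sketch below is only illustrative: `describeGemmStatus` is a hypothetical helper, not a library function, and it assumes the returned status can be converted to the plain integer values shown in the table.

```cpp
#include <string>

// Hypothetical helper, not part of the library: maps the numeric status
// values documented for xfblasGemm to human-readable messages.
// Assumption: the status value is convertible to int.
inline std::string describeGemmStatus(int status) {
    switch (status) {
        case 0:  return "operation completed successfully";
        case 1:  return "library was not initialized";
        case 3:  return "not all matrices have FPGA device memory allocated";
        case 4:  return "engine is not currently supported";
        default: return "status code not listed for xfblasGemm";
    }
}
```

Values other than 0, 1, 3, and 4 fall through to the default branch because this entry does not document them for this function.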