The following is a list of known limitations.
- For best results, run the BF16 Phi group of models with ZENDNN_MATMUL_ALGO=BF16:2.
- The zentorch library requires the g++ compiler as a dependency. Ensure that the installed g++ version matches the system's gcc version.
- MatMul fused with the Gelu_erf activation shows minor precision differences with ALGO 1 and ALGO 3. Hence, these cases are internally executed with ALGO 2 and ALGO 4, respectively.
- ALGO 3 has a higher memory footprint during execution with AMP precision for the following models: Starcoder2-15b, Starcoder2-7b, and Qwen-QwQ-32B.
- If you encounter an error while installing the sentencepiece package with Python 3.13, install it from conda-forge instead:
conda install -c conda-forge sentencepiece
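The ZENDNN_MATMUL_ALGO recommendation for the BF16 Phi models is applied through the environment. A minimal sketch, assuming the variable is read from the process environment of the Python process that runs the model:

```shell
#!/usr/bin/env bash
# Select the recommended BF16 MatMul algorithm for the BF16 Phi models.
export ZENDNN_MATMUL_ALGO=BF16:2

# Confirm that child processes (such as the Python interpreter running the model) inherit it.
printenv ZENDNN_MATMUL_ALGO
```

Launch the inference script from the same shell, or scope the setting to a single run with `ZENDNN_MATMUL_ALGO=BF16:2 python your_script.py` (the script name is a placeholder).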
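To check the g++ requirement before installing zentorch, compare the versions reported by both compilers. This is a sketch assuming GCC 7 or later (for `-dumpfullversion`); the helper name `compiler_version` is hypothetical and not part of zentorch:

```shell
#!/usr/bin/env bash
# Sketch: warn if g++ is missing or its version differs from gcc's.

compiler_version() {
  # Print the full x.y.z version of the given compiler, or nothing if it is not installed.
  command -v "$1" >/dev/null 2>&1 && "$1" -dumpfullversion 2>/dev/null
}

gcc_ver=$(compiler_version gcc)
gxx_ver=$(compiler_version g++)

if [ -z "$gxx_ver" ]; then
  echo "g++ not found: install it before using zentorch." >&2
elif [ "$gcc_ver" != "$gxx_ver" ]; then
  echo "Version mismatch: gcc ${gcc_ver:-<missing>} vs g++ $gxx_ver" >&2
else
  echo "gcc and g++ are both at version $gxx_ver"
fi
```

The script only reports a mismatch. Aligning the versions is typically done through the distribution's package manager.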