For optimal inference performance with zentorch, run inference inside a
torch.no_grad() context.
CNN
For torchvision CNN models, set dynamic=False when calling
torch.compile, as follows:
model = torch.compile(model, backend='zentorch', dynamic=False)
with torch.no_grad():
    output = model(input)
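The pattern above can be tried end to end with the following minimal sketch. It is not taken from the zentorch documentation: the small nn.Sequential network is a hypothetical stand-in for a torchvision CNN, pick_backend is an illustrative helper, and the fallback to PyTorch's built-in "eager" backend is an assumption made only so the sketch still runs on a machine without zentorch installed. The sketch also assumes that importing the zentorch package is what registers the "zentorch" backend with torch.compile.

```python
import importlib.util

import torch
import torch.nn as nn


def pick_backend() -> str:
    """Return "zentorch" if the plugin is importable, else the built-in "eager" backend."""
    if importlib.util.find_spec("zentorch") is not None:
        # Assumption: importing the package registers the "zentorch" backend.
        import zentorch  # noqa: F401
        return "zentorch"
    # Assumption: fall back so the sketch runs without zentorch (no zentorch speedup).
    return "eager"


# Hypothetical stand-in for a torchvision CNN; any nn.Module is compiled the same way.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).eval()

# dynamic=False requests static-shape compilation, as recommended for CNNs above.
compiled = torch.compile(model, backend=pick_backend(), dynamic=False)

example_input = torch.randn(1, 3, 32, 32)
with torch.no_grad():  # no autograd bookkeeping during inference
    output = compiled(example_input)

print(output.shape)
```

The first call triggers compilation, so any timing comparison should measure later calls rather than the first one.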
NLP & RecSys
Optimize Hugging Face NLP models as follows:
model = torch.compile(model, backend='zentorch')
with torch.no_grad():
    output = model(input)
Hugging Face Generative LLM Models
The zentorch.llm.optimize API has been deprecated. You can run
generative models using torch.compile(model, backend="zentorch"), but
for optimal performance we recommend using vLLM. See vLLM-zentorch
Plugin for more details.