High-quality video and audio generation with first- and last-frame conditioning. Pre-distilled LTX model for fast inference. [code]