Hybrid Framework Selection for Generative AI Models
DOI: https://doi.org/10.63282/3050-9416.IJAIBDCMS-V7I2P104

Keywords: AI, Linux Python Ecosystem, GEMM, Machine Learning, Deep Learning, LLM

Abstract
The proliferation of generative AI models has led to multiple open-source frameworks, each optimized for different hardware architectures and deployment scenarios. This paper presents a systematic approach to framework selection based on empirical evaluation of accuracy, performance, and ease of use. We analyze PyTorch, TensorFlow, ONNX Runtime, and llama.cpp, examining their optimization strategies and performance characteristics. Our findings suggest that a hybrid approach, selecting frameworks based on specific workload requirements, yields optimal results for production AI workloads.
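The hybrid approach described above can be sketched as a simple dispatch heuristic that maps workload characteristics to a framework. The workload fields (`model_family`, `deploy_target`, `needs_training`, `priority`) and the routing rules below are illustrative assumptions for exposition, not the paper's actual decision procedure:

```python
# Illustrative sketch of hybrid framework selection.
# The workload keys and routing rules are hypothetical assumptions,
# not the evaluation criteria used in the paper.

def select_framework(workload: dict) -> str:
    """Return a framework name for a given workload description."""
    # Quantized LLM inference on commodity CPUs is the niche llama.cpp targets.
    if workload.get("model_family") == "llm" and workload.get("deploy_target") == "edge_cpu":
        return "llama.cpp"
    # Training workloads need a full training stack such as PyTorch or TensorFlow.
    if workload.get("needs_training"):
        return "PyTorch" if workload.get("priority") == "flexibility" else "TensorFlow"
    # Otherwise, default to a portable inference runtime.
    return "ONNX Runtime"

print(select_framework({"model_family": "llm", "deploy_target": "edge_cpu"}))
```

In production, such a dispatcher would typically be driven by benchmark data (e.g., MLPerf-style latency and throughput measurements) rather than hand-written rules.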