Multi-Modal Deep Learning for Unified Search-Recommendation Systems in Hybrid Content Platforms

Authors

  • Suchir Agarwal, Product Manager, Meta Platforms

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V4I3P104

Keywords:

Multi-modal deep learning, hybrid content platforms, search and recommendation, multi-task learning, user personalization

Abstract

Hybrid content platforms increasingly rely on combining search and recommendation systems to deliver a better experience to users who consume many types of media. Traditional methods built solely for search or solely for recommendation are not equipped to handle multi-modal data such as text, images, and audio, and they do not generalize well across the varied intents users bring to a platform. This work introduces a unified approach in which multi-modal deep learning connects the search and recommendation tasks through shared, aligned representations. Using pre-trained encoders (such as BERT, ResNet, and Wav2Vec), the model combines features through both early and late fusion and learns all features in a single shared space by means of attention blocks. A multi-task learning framework is employed so that search relevance and recommendation accuracy improve together. The system supports online learning, logs user feedback, and continuously monitors its models, adapting to large and changing content environments. Compared with similar approaches on the Amazon Electronics and Yelp Challenge datasets, our approach outperforms the others by a substantial margin. The model is especially strong in sparse-data settings and on semantically rich queries. To scale to large user bases, the architecture is built from modular components suited to cloud deployment and A/B testing environments. The work highlights the role of unified deep learning in changing how content is delivered in a personalized and relevant manner across different platforms.
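The fusion and multi-task ideas summarized in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes each modality has already been encoded (for example by BERT, ResNet, or Wav2Vec) and projected to a common dimension, uses plain Python lists in place of tensors, and the function names, the pairwise hinge loss for search, the binary cross-entropy loss for recommendation, and the trade-off weight `alpha` are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def early_fusion(modalities: Dict[str, Vector]) -> Vector:
    # Early fusion: concatenate per-modality features (in a fixed name order)
    # into one input vector before any scoring model sees them.
    fused: Vector = []
    for name in sorted(modalities):
        fused.extend(modalities[name])
    return fused


def attention_fusion(modalities: Dict[str, Vector], query: Vector) -> Vector:
    # Attention pooling into the shared space: each modality is weighted by the
    # softmax of its dot-product similarity to the query, then summed.
    # Assumes every modality vector already has the same length as the query.
    names = sorted(modalities)
    weights = softmax([dot(modalities[n], query) for n in names])
    return [
        sum(w * modalities[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]


def late_fusion(modalities: Dict[str, Vector], query: Vector) -> float:
    # Late fusion: score each modality independently, then average the scores.
    scores = [dot(v, query) for v in modalities.values()]
    return sum(scores) / len(scores)


def multitask_loss(pos_score: float, neg_score: float, click_logit: float,
                   clicked: int, alpha: float = 0.5) -> float:
    # Search task (hypothetical choice): pairwise hinge loss with margin 1 that
    # pushes a relevant item's score above an irrelevant item's score.
    search_loss = max(0.0, 1.0 - (pos_score - neg_score))
    # Recommendation task (hypothetical choice): binary cross-entropy on the
    # predicted click probability.
    p = 1.0 / (1.0 + math.exp(-click_logit))
    eps = 1e-12  # guards against log(0)
    rec_loss = -(clicked * math.log(p + eps) + (1 - clicked) * math.log(1 - p + eps))
    # Joint objective: alpha trades off search relevance against recommendation accuracy.
    return alpha * search_loss + (1 - alpha) * rec_loss
```

For example, `attention_fusion({"text": [1.0, 0.0], "image": [0.0, 1.0]}, [1.0, 0.0])` gives the text vector the larger weight because it aligns more closely with the query. In a trainable system these operations would run on tensors inside a neural network, with learned projections and a tuned or learned `alpha`.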

Published

2023-07-30

Section

Articles

How to Cite

Agarwal S. Multi-Modal Deep Learning for Unified Search-Recommendation Systems in Hybrid Content Platforms. IJAIBDCMS [Internet]. 2023 Jul. 30 [cited 2025 Sep. 11];4(3):30-9. Available from: https://ijaibdcms.org/index.php/ijaibdcms/article/view/154