Intelligent Data Summarization Techniques for Efficient Big Data Exploration Using AI

Ajinkya Potdar

doi:10.63282/3050-9416.IJAIBDCMS-V5I1P109

Authors

Ajinkya Potdar, Senior Technical Program Manager, Dallas, USA.

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V5I1P109

Keywords:

Artificial Intelligence, Big Data, Data Summarization, Machine Learning, Natural Language Processing, Topic Modeling, Reinforcement Learning

Abstract

As data volumes continue to explode in the Big Data era, summarizing huge amounts of information in a timely manner has become essential for rapid and meaningful data exploration. Because of the velocity, volume, and variety of data, traditional summarization approaches fail to handle real-time data from heterogeneous sources. Artificial Intelligence (AI) has become a means of automating summarization through machine learning, natural language processing, and deep learning. This paper presents a broad review and analysis of intelligent data summarization techniques that enable the exploration of big data. It examines AI-centric techniques including extractive and abstractive summarization, clustering-based summarization, neural summarization, and reinforcement learning-based dynamic data reduction. We also propose an AI-enhanced architecture for efficient big data summarization that uses approaches such as BERT-based summarizers, topic modeling, and visual summarization. The proposed methods are evaluated on benchmark big data datasets in terms of time complexity, relevance, and accuracy. Finally, the paper outlines current challenges and future directions for bringing intelligent summarization to big data ecosystems.

