Predicting Very High-Cost Claimants Using Symmetry ETG/PEG Feature Engineering Combined with Advanced Machine Learning
DOI:
https://doi.org/10.63282/3050-9416.ICAIDSCT26-145
Keywords:
ETG, Procedure Episode Groups PEG, High-Cost Claimants, Machine Learning, Healthcare Cost
Abstract
Symmetry ETG/PEG is one of the most significant tools for grouping healthcare claims data into meaningful, patient-centric episodes of care. It is also used in feature engineering to examine provider effectiveness, measure care quality, and evaluate treatment protocols. The main aim of this systematic review is to identify the key approaches to predictive modeling for assessing high-cost claimants in the healthcare industry.
In terms of research methodology, an interpretivist research philosophy was adopted in preference to positivism, realism, and pragmatism. An inductive research approach was selected over a deductive one because it suits a non-statistical investigation. A descriptive research design, with grounded theory as the research strategy, was chosen to conduct the study in an appropriate and ethical manner. Secondary data collection was used to extract information from available sources and databases.
The findings show that healthcare cost concentration refers to the phenomenon in which a small share of the population accounts for a disproportionately large share of overall healthcare spending. The growth of the healthcare analytics market indicates a strong need for episode-based analytics in the healthcare sector. The results, drawn from secondary data, highlight the significance of Symmetry ETG/PEG as key episode-based analytics models for predictive healthcare analytics. Organizations should use the CatBoost Regressor, SAP Analytics Cloud, and LIME techniques. Future research could use primary data collection.
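Cost concentration, as described above, can be quantified as the share of total spending attributable to the costliest fraction of members. The Python sketch below illustrates one way to compute it; the function name top_share and the cost figures are synthetic assumptions for illustration, not data from this review.

```python
def top_share(costs, top_fraction):
    """Share of total spending from the costliest `top_fraction` of members."""
    if not costs:
        raise ValueError("costs must not be empty")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(costs, reverse=True)
    # Always include at least one member in the top group.
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Synthetic annual costs for 20 hypothetical members:
# one very high-cost claimant and nineteen low-cost members.
COSTS = [100000.0] + [1000.0] * 19

share = top_share(COSTS, 0.05)  # top 5% of 20 members = 1 member
```

With this synthetic data, the single costliest member accounts for roughly 84% of total spending (100,000 out of 119,000), which is the kind of skew that motivates flagging very high-cost claimants.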
References
1. Acharya, N., Kar, P., Ally, M., & Soar, J. (2024). Predicting co-occurring mental health and substance use disorders in women: an automated machine learning approach. Applied Sciences, 14(4), 1630. https://doi.org/10.3390/app14041630
2. Bolarinwa, D., Egemba, M., & Ogundipe, M. (2025). Developing a predictive analytics model for cost-effective healthcare delivery: A conceptual framework for enhancing patient outcomes and reducing operational costs. International Journal of Advanced Multidisciplinary Research and Studies, 5(2), 227-238. https://doi.org/10.62225/2583049X.2025.5.2.3832
3. Cheong, H. I., Lyons, A., Houghton, R., & Majumdar, A. (2023). Secondary qualitative research methodology using online data within the context of social sciences. International Journal of Qualitative Methods, 22, 16094069231180160. https://doi.org/10.1177/16094069231180160
4. CMS. (2026). NHE Fact Sheet. [Online]. Retrieved Through: https://www.cms.gov/data-research/statistics-trends-and-reports/national-health-expenditure-data/nhe-fact-sheet#:~:text=NHE%20by%20Age%20Group%20and,by%20age%20in%20downloads%20below. [Retrieved on: 11th March 2026]
5. Dubey, U. K. B., & Kothari, D. P. (2022). Research methodology: Techniques and trends. Chapman and Hall/CRC. https://doi.org/10.1201/9781315167138
6. Elton, D., & Zhang, M. (2023). Neck pain service utilization and costs: association with timing of non-pharmaceutical services for individuals initially contacting a primary care provider. A retrospective cohort study. medRxiv, 2023-01. https://doi.org/10.1101/2023.01.10.23284193
7. GVR. (2025). Healthcare Analytics Market. [Online]. Retrieved Through: https://www.grandviewresearch.com/industry-analysis/healthcare-analytics-market#:~:text=The%20global%20healthcare%20analytics%20market,of%20patient%20retention%20and%20engagement. [Retrieved on: 10th March 2026]
8. Hamid, M., Hajjej, F., Alluhaidan, A. S., & bin Mannie, N. W. (2025). Fine tuned CatBoost machine learning approach for early detection of cardiovascular disease through predictive modeling. Scientific Reports, 15(1), 31199. https://doi.org/10.1038/s41598-025-13790-x
9. Kumar, R. (2025). Design of a Secure SAP-Enabled Cloud Lakehouse for AI-Driven Financial Risk and Healthcare Analytics. International Journal of Research Publications in Engineering, Technology and Management (IJRPETM), 8(5), 12803-12810. https://doi.org/10.15662/wn7pnz25
10. Mitchell, E. M. (2016). Concentration of Health Expenditures in the U.S. Civilian Noninstitutionalized Population, 2014. [Online]. Retrieved Through: https://meps.ahrq.gov/data_files/publications/st497/stat497.shtml. [Retrieved on: 11th March 2026]
11. Ozdemir, S. (2022). Feature engineering bookcamp. Simon and Schuster. https://books.google.co.in/books?hl=en&lr=&id=xQGEEAAAQBAJ&oi=fnd&pg=PA1&dq=+Episode+Treatment+Groups++feature+engineering+for+predictive+modeling&ots=_jkMjypnEl&sig=i7bPM2wYm3KvR6VX-DGwcCcWgDU&redir_esc=y#v=onepage&q&f=false
12. Sheha, M. A., Mabrouk, M. S., & Sharawy, A. A. (2022). Feature engineering: Toward identification of symptom clusters of mental disorders. IEEE Access, 10, 134136-134156. https://doi.org/10.1109/ACCESS.2022.3232075
13. Smierzchała, Ł., Kozłowski, N., & Unold, O. (2023). Anticipatory classifier system with episode-based experience replay. IEEE Access, 11, 41190-41204. https://doi.org/10.1109/ACCESS.2023.3269879
14. Zaleski, A. L., Guan, X., Thomas Craig, K. J., Junk, C., McGill, A. T., Gordon, H., ... & Caya, K. (2025). An episode-based cost analysis of virtual-first versus in-person-first care to treat common acute conditions among members of a large national payor. BMC Health Services Research, 25(1), 994. https://doi.org/10.1186/s12913-025-13154-1