
Exploring Federated Learning

A look at privacy-respecting solutions to sparse data challenges in the insurance industry

Panyi Dong, Runhuan Feng, Zhiyu (Frank) Quan and Tianyang Wang
October 2024


We recently took part in the podcast “Federated Learning for Insurance Companies,” part of the Society of Actuaries (SOA) Research Institute’s Insights series. In this episode, we explored federated learning, a distributed machine learning framework that allows multiple devices or insurance organizations to train a shared model collaboratively while keeping their raw data private.¹ Depending on how the data is split across participants, one could use a horizontal, vertical or hybrid federated learning approach. This technology unlocks isolated datasets for collective insight without actual data exchange. In our view, this presents a viable solution to maintaining privacy, given the right scenario.
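To make the horizontal and vertical distinction concrete, here is a minimal sketch in Python. It assumes the common reading of these terms: in horizontal partitioning, participants hold the same features for different policyholders; in vertical partitioning, they hold different features for the same policyholders. The record layouts, party names and the partition_type helper are hypothetical and exist only for illustration.

```python
# Hypothetical toy records: policy ID -> feature values. All names are illustrative.

# Horizontal case: two insurers record the same features for different policyholders.
insurer_a = {"P001": {"age": 34, "claims": 0}, "P002": {"age": 51, "claims": 2}}
insurer_b = {"P101": {"age": 45, "claims": 1}, "P102": {"age": 29, "claims": 0}}

# Vertical case: an insurer and a (hypothetical) telematics provider record
# different features for the same policyholders.
insurer_c = {"P201": {"age": 40, "claims": 1}, "P202": {"age": 63, "claims": 0}}
telematics = {"P201": {"annual_km": 12000}, "P202": {"annual_km": 4000}}


def feature_names(party):
    """Collect every feature name that appears in a party's records."""
    names = set()
    for record in party.values():
        names.update(record)
    return names


def partition_type(left, right):
    """Classify two parties' data by which IDs and which features they share."""
    shared_ids = set(left) & set(right)
    shared_features = feature_names(left) & feature_names(right)
    if shared_features and not shared_ids:
        return "horizontal"
    if shared_ids and not shared_features:
        return "vertical"
    if shared_ids and shared_features:
        return "hybrid"  # overlap in both policyholders and features
    return "disjoint"    # nothing in common to learn from jointly


print(partition_type(insurer_a, insurer_b))   # prints "horizontal"
print(partition_type(insurer_c, telematics))  # prints "vertical"
```

Real deployments rarely fall this cleanly into one category, and matching policyholders across organizations is itself a privacy-sensitive step that this sketch leaves out.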

The prevalence of sparse data in the insurance industry, along with stringent privacy requirements, spurred our exploration into federated learning. With regulators and consumers emphasizing data privacy, federated learning emerged as a solution that respects these concerns.²

This article examines federated learning’s potential role in enhancing insurance operations. Federated learning addresses challenges like data scarcity and privacy concerns,³ opening new avenues for collaboration and innovation. We’re just scratching the surface, but we believe the future looks promising for federated learning in insurance.

A Look at Utilization

Federated learning has been used in various sectors, including health care and financial services. In health care, it enables hospitals and research institutions to collaborate on predictive models for diseases without sharing patient data.

Consider the conventional alternative: Multiple hospitals could share anonymized, structured data to build a collective model for cancer diagnosis. While this approach offers some protection for patients’ data and allows access to a broader dataset, it still carries risks, such as potential reidentification of patients and limitations in the richness of the data shared. This underscores the need for a more secure and effective solution, such as federated learning, which enables hospitals to contribute to a collective model without sharing any raw data. By keeping patient data localized and only exchanging model updates, federated learning could enhance privacy and leverage diverse data sources, leading to more robust and accurate models without compromising patient confidentiality.⁴ In financial services, federated learning could enhance fraud detection by allowing financial institutions to learn from transaction patterns across different banks without sharing sensitive customer data.
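The idea of keeping data local and exchanging only model updates can be sketched with federated averaging, the approach described in the McMahan et al. paper cited above.¹ The Python sketch below is a simplified illustration, not a production implementation: it assumes a one-variable linear model, synthetic data, and three participants of different sizes, and the function names (local_update, federated_average, make_client) are hypothetical.

```python
import random


def local_update(weights, data, lr=0.3, epochs=5):
    """Run a few gradient-descent passes for y ~ w0 + w1 * x on one participant's
    private data. Only the resulting weights leave the participant."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        grad0 = sum((w0 + w1 * x - y) for x, y in data) / n
        grad1 = sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0 -= lr * grad0
        w1 -= lr * grad1
    return (w0, w1)


def federated_average(client_weights, client_sizes):
    """Server step: average the participants' weights, weighted by sample count."""
    total = sum(client_sizes)
    return tuple(
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(2)
    )


def make_client(n, rng):
    """Synthetic private dataset (an assumption): y = 2 + 3x plus small noise."""
    xs = [rng.random() for _ in range(n)]
    return [(x, 2.0 + 3.0 * x + rng.gauss(0, 0.1)) for x in xs]


rng = random.Random(0)
clients = [make_client(n, rng) for n in (20, 50, 200)]  # unequal participants

global_weights = (0.0, 0.0)
for _ in range(300):  # communication rounds
    # Each participant trains locally, starting from the current global model.
    updates = [local_update(global_weights, data) for data in clients]
    # The server sees only the weights, never the (x, y) records.
    global_weights = federated_average(updates, [len(d) for d in clients])

print(global_weights)  # should land near the true coefficients (2, 3)
```

Weighting by sample count is one design choice among several. It lets larger participants influence the shared model more, which is relevant to the fairness questions discussed later in this article.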

In our SOA report, we conducted a case study on loss modeling, demonstrating how federated learning can leverage data from multiple entities to improve prediction accuracy. Enhanced loss modeling is one of the core needs of the insurance industry: We’ve found traditional models often struggle with accuracy due to limited data, especially in scenarios involving rare events or new markets. Federated learning could offer a way to collaboratively improve these models without the need to directly share sensitive data.

These examples underscore federated learning’s versatility and potential in addressing complex challenges across industries by enabling collaborative learning while upholding privacy standards.

Challenges and Ethical Considerations

The insurance industry’s adoption of federated learning faces challenges. Implementing federated learning requires robust and secure infrastructure, investment in technology and expertise, and a way to handle heterogeneous data.⁵

The technical complexity and infrastructure requirements could represent significant investments for insurance companies. Ensuring secure communication between different entities’ systems and the integrity of aggregated model updates is crucial, in our view.
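One known technique for protecting individual model updates in transit is secure aggregation with pairwise masks: Each pair of participants agrees on a random value that one adds and the other subtracts, so the masks cancel in the sum and the aggregator learns only the total. The Python sketch below illustrates just the cancellation idea. It assumes integer-encoded updates and generates the masks in one place for simplicity, whereas a real protocol would derive them through key agreement between participants and use a cryptographically secure random source. All names and values are hypothetical, and the sketch does not by itself verify the integrity of the updates.

```python
import random

MODULUS = 2**31 - 1  # all arithmetic is done modulo this value (illustrative choice)


def pairwise_masks(num_clients, seed=0):
    """Create one shared random mask per pair (i, j) with i < j.
    Generated centrally here for illustration only; random.Random is
    not a cryptographically secure generator."""
    rng = random.Random(seed)
    return {
        (i, j): rng.randrange(MODULUS)
        for i in range(num_clients)
        for j in range(i + 1, num_clients)
    }


def mask_update(client, update, masks, num_clients):
    """Hide one participant's update: add the mask shared with each
    higher-numbered peer, subtract the mask shared with each lower-numbered peer."""
    masked = update
    for other in range(num_clients):
        if other == client:
            continue
        pair = (min(client, other), max(client, other))
        if client < other:
            masked += masks[pair]
        else:
            masked -= masks[pair]
    return masked % MODULUS


updates = [120, 45, 300]  # hypothetical integer-encoded model updates
masks = pairwise_masks(len(updates))
masked = [mask_update(i, u, masks, len(updates)) for i, u in enumerate(updates)]

# The aggregator sums the masked values; every mask appears once with each sign.
aggregate = sum(masked) % MODULUS
print(aggregate)  # prints 465, the sum of the raw updates
```

The sketch also assumes every participant reports in. Handling participants that drop out mid-round is a large part of what makes practical protocols more complex.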

Another challenge is the heterogeneity of data. Insurance companies often hold data in diverse formats and of varying quality and type. Harmonizing this data to ensure the federated model learns effectively is nontrivial. This heterogeneity can also affect the model’s fairness, potentially leading to unfair outcomes if one participant’s data is vastly different or of lower quality.

It’s worth mentioning that these challenges might also open a spectrum of opportunities. Organizations such as universities could serve as neutral communicators and safe harbors in this ecosystem. They could help translate this technology from academic theory to practical application. They could potentially act as independent evaluators of federated learning models, helping ensure they adhere to ethical standards and are free from biases. Ultimately, universities could help educate and train the workforce required to tackle the technical complexities of federated learning, in theory facilitating its adoption across the insurance industry.

Regulators could create a conducive environment for federated learning by helping to establish guidelines and standards for implementation. They could clarify how privacy laws and ethical considerations apply, helping to maintain public trust. Additionally, regulators could act as mediators, ensuring transparency and accountability in the federated learning process, and possibly provide the necessary legal framework to address issues of consent and data ownership.

Preparing for Federated Learning

We believe preparing for federated learning in the insurance industry involves education, investment in secure IT systems, and the development of protocols for data sharing and collaboration via machine learning. The industry could also advocate for and participate in developing regulatory frameworks that support federated learning, ensuring its responsible use and maximizing benefits while minimizing risks.

It is crucial to work with lawmakers and regulatory bodies so that federated learning is used responsibly, its benefits are maximized and its potential risks are minimized. This also involves staying ahead of legal and ethical issues that might arise as federated learning is implemented.

In essence, preparing for federated learning in insurance involves a commitment to education, investment in technology, fostering collaboration and engaging in regulatory advocacy. By taking these steps, professionals and the industry could not only prepare for but also shape the future of federated learning in insurance.

Conclusion: Why Federated Learning?

In summary, federated learning offers a transformative approach to collaborative data analysis by allowing organizations to build robust models without sharing raw data, addressing privacy and security concerns more effectively than traditional methods like anonymization or centralized data collection. In contexts such as fraud detection, federated learning enables institutions to share insights without exposing sensitive financial information, resulting in more accurate models across diverse datasets.

Unlike alternatives such as homomorphic encryption or secure multiparty computation, which can be computationally intensive, federated learning strikes a balance between privacy, scalability and model accuracy. Its potential to reshape data collaboration while preserving confidentiality makes it a promising solution for industries where data sensitivity is paramount, such as health care and finance. As the demand for secure, data-driven decision-making grows, federated learning could stand out as a practical and scalable solution warranting further exploration and adoption.

If the implementation challenges are met, federated learning could provide enhanced privacy, greater data utilization and collaborative model building. By committing to education, investing in technology, fostering collaboration and engaging in regulatory advocacy, we believe professionals and the industry could shape the future of federated learning in insurance.

Panyi Dong is a Ph.D. student of actuarial and risk management sciences at the University of Illinois Urbana-Champaign.

Runhuan Feng, Ph.D., FSA, CERA, is chair professor at Tsinghua University and Consultant for External Organizations.

Zhiyu (Frank) Quan holds a Ph.D. in actuarial science with a data science background and is an assistant professor at the University of Illinois Urbana-Champaign.

Tianyang Wang, ASA, CFA, FRM, is a professor of finance in the finance and real estate department at Colorado State University. He is also a contributing editor for The Actuary.

Statements of fact and opinions expressed herein are those of the individual authors and are not necessarily those of the Society of Actuaries or the respective authors’ employers.

References:

1. McMahan, B., E. Moore, D. Ramage, S. Hampson, and B. Agüera y Arcas. 2017. Communication-Efficient Learning of Deep Networks From Decentralized Data. In Singh, A. and Zhu, J., editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 1273–1282.
2. InCountry Staff. Data Protection in the Insurance Industry. InCountry, April 23, 2024.
3. Li, L., Y. Fan, M. Tse, and K. Y. Lin. 2020. A Review of Applications in Federated Learning. Computers & Industrial Engineering, 149:106854.
4. Karargyris, A., R. Umeton, M. J. Sheller, A. Aristizabal, J. George, A. Wuest, S. Pati, H. Kassem, M. Zenk, U. Baid, et al. 2023. Federated Benchmarking of Medical Artificial Intelligence with MedPerf. Nature Machine Intelligence, 5, no. 7:799–810.
5. Supra note 3.