Actuaries Take on AI

How well can ChatGPT and GitHub Copilot handle real actuarial tasks?

Kyle Nobbe

Class is in session! I quizzed ChatGPT and GitHub Copilot on typical actuarial tasks to test their current strengths and limitations. The results reveal how actuaries can gain efficiencies by implementing artificial intelligence (AI) tools in their technical, day-to-day work.

There is no doubt that AI has arrived in the insurance industry.1 Yet, in my experience, some actuaries still carry doubts about how accurate or effective AI tools are for their day-to-day work. In my estimation, AI is like the calculators and laptops we use each day—it is simply a tool to help us do our jobs. How well or quickly can AI help us? The best way to find out is to simply put it to the test.

In this article, I will do just that: Run typical actuarial tasks through ChatGPT, the most popular AI chatbot built on a large language model (LLM), and scrutinize its performance. For comparison’s sake, I will pose the same tasks to GitHub Copilot, the AI pair programmer from Microsoft-owned GitHub that runs inside Visual Studio Code.

My exam revealed the ups and downs of AI in early 2024. It also emphasized the critical role that insurance professionals will play in the implementation of these tools. I don’t believe AI will be a job killer. I believe it will be a job evolver, and the actuaries who learn, test and adopt it will improve the efficiency and quality of their technical deliverables.

Figure 1 shows a summary of the results. Read on for more details about each task and how the AI bots performed.

Figure 1: LLMs Take on Actuarial Tasks

Task 1: Convert Scala code snippets from a legacy analytics platform to SQL code for the new platform.

ChatGPT
Grade: B+
Comments: Impressive response despite one dropped command. Code must be transported from ChatGPT UI to a coding platform.

GitHub Copilot
Grade: A
Comments: Executed translation perfectly within the coding platform.

Task 2: Compute life expectancy on a dataset in R.

ChatGPT
Grade: B
Comments: After an initial stumble, ChatGPT quickly rebounded with some notable enhancements.

GitHub Copilot
Grade: C
Comments: Initial answer was wrong. Pushed out the correct answer only after a tedious number of additional inputs and fixes.

Task 3: Help a user understand complex VBA code from an Excel spreadsheet.

ChatGPT
Grade: A
Comments: Described the macro perfectly and even offered some insights on how to improve runtime, plus it translated its output into Japanese.

GitHub Copilot
Grade: F
Comments: This feature is not available in GitHub Copilot.

Source: RGA, 2023

Task 1: Using ChatGPT and GitHub Copilot to Convert Code

First, I took a recent real-life project to see if AI tools could save time. A company team had decommissioned a legacy analytics platform. The platform previously used Scala data cleaning scripts for experience study analytics. I needed to translate more than 1,000 lines of Scala code into SQL for the new analytics platform.

I turned that first task over to ChatGPT and asked it to provide the SQL equivalent of the Scala code. ChatGPT’s response was impressive. It provided the SQL code and even included some helpful explanations about minor adjustments the user might need to make. ChatGPT did drop an important “create table” command. Otherwise, the output provided what was needed.

To take advantage of ChatGPT’s generative strengths, I dropped in the next chunk of Scala code in the process and asked ChatGPT to convert this snippet by building off the prior step. Again, the chatbot performed well. To my surprise, it referenced the code snippet from the first prompt and built a nested subquery that consolidated the two pieces of code into one. Sadly, the “create table” command was left out again.

I wrapped up the exercise by asking ChatGPT to rebuild the last query as a common table expression (CTE). The bot demonstrated an understanding of CTEs, but its CTE script was poorly written and would have made the code less efficient. Once again, it dropped the “create table” command.
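The project's queries are not reproduced in this article, so the following is only a minimal sketch of the two query shapes discussed above: a second cleaning step wrapped around the first as a nested subquery, and the same logic rebuilt as a CTE. It uses Python's built-in sqlite3 module as a stand-in for the new analytics platform, and the table names, columns and values are hypothetical. Both versions keep the "create table" wrapper that ChatGPT left out.

```python
import sqlite3

# Hypothetical experience-study data; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policies (policy_id INTEGER, issue_age INTEGER, face_amount REAL);
    INSERT INTO policies VALUES (1, 35, 100000), (2, 52, 250000), (3, 17, 50000);
""")

# Version 1: the second step wraps the first step as a nested subquery.
nested_sql = """
    CREATE TABLE cleaned_nested AS
    SELECT policy_id, issue_age, face_amount / 1000.0 AS face_thousands
    FROM (
        SELECT policy_id, issue_age, face_amount
        FROM policies
        WHERE issue_age >= 18
    ) AS adults
"""

# Version 2: the same logic, with the first step named up front as a CTE.
cte_sql = """
    CREATE TABLE cleaned_cte AS
    WITH adults AS (
        SELECT policy_id, issue_age, face_amount
        FROM policies
        WHERE issue_age >= 18
    )
    SELECT policy_id, issue_age, face_amount / 1000.0 AS face_thousands
    FROM adults
"""

conn.execute(nested_sql)
conn.execute(cte_sql)

nested_rows = conn.execute(
    "SELECT * FROM cleaned_nested ORDER BY policy_id").fetchall()
cte_rows = conn.execute(
    "SELECT * FROM cleaned_cte ORDER BY policy_id").fetchall()

# Both shapes should produce the same cleaned table.
print(nested_rows == cte_rows)
```

Comparing the two result sets is a quick way to check that a rewrite from subquery to CTE did not change the output, whichever tool produced the rewrite.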

The other test subject, GitHub Copilot, is built for exactly this type of code-intensive problem. It handled both the code translation and the CTE perfectly. Additionally, because the translation happened within the coding platform, the tool was ideal for improving the project’s efficiency.

Final grades on task 1: ChatGPT: B+; GitHub Copilot: A

Task 2: ChatGPT and GitHub Copilot Attempt to Compute Life Expectancy on a Dataset in R

Next, I asked the AI tools to tackle a basic actuarial calculation—converting mortality rates into predicted life expectancies. This is a function our team uses to evaluate the mortality curves our models generate and test them for reasonableness.

Here was the prompt for ChatGPT: “Create a data frame in R with given parameters. Then create a function that calculates the life expectancy from that data frame.” ChatGPT generated code that produced a data frame, but one of its formulas contained too many values. The code could not run. Furthermore, its calculation of life expectancy was erroneous.

I then provided ChatGPT with supplemental information that clearly spelled out the definition of life expectancy: “Given the estimated life expectancy is the sum of the cumulative probability of survival at each age plus a half a year, update the function accordingly.” That extra bit of information helped. ChatGPT correctly calculated the life expectancy. In doing so, it included some unnecessary code, which I flagged and removed. The bot’s response also used a function in R called “rev,” which was new to me.
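The definition in that prompt can be sketched in a few lines. This is a Python illustration, not the R code ChatGPT produced, and the function name and mortality rates are made up for the example.

```python
def life_expectancy(mortality_rates):
    """Sum of cumulative survival probabilities at each age, plus half a year."""
    cumulative_survival = 1.0
    total = 0.0
    for q in mortality_rates:
        # Probability of surviving through this year and every year before it.
        cumulative_survival *= 1.0 - q
        total += cumulative_survival
    return total + 0.5


# Illustrative rates only; the final rate of 1.0 closes the table.
rates = [0.1, 0.2, 0.5, 1.0]
print(round(life_expectancy(rates), 2))
```

With these rates, the cumulative survival probabilities are 0.9, 0.72, 0.36 and 0, so the function returns 1.98 + 0.5 = 2.48 years.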

Finally, I asked ChatGPT to update the function to calculate life expectancies for each unique ID within the data frame. After some minor edits, the bot’s code output worked well. Again, its answer provided the opportunity to learn a new function in R, called “by,” which substantially improved the runtime of the calculations.
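The per-ID step follows a group-then-apply pattern: split the rows by ID, then run the same life expectancy function on each group. The sketch below shows that pattern in Python rather than R, using a plain dictionary in place of R's "by". The row layout, IDs and rates are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate
from operator import mul


def life_expectancy(mortality_rates):
    """Sum of cumulative survival probabilities, plus half a year."""
    survival = accumulate((1.0 - q for q in mortality_rates), mul)
    return sum(survival) + 0.5


def life_expectancy_by_id(rows):
    """Group (id, age, rate) rows by id, then apply life_expectancy per group."""
    rates_by_id = defaultdict(list)
    # Sort by id and age so each group's rates are in age order.
    for record_id, _age, q in sorted(rows, key=lambda r: (r[0], r[1])):
        rates_by_id[record_id].append(q)
    return {rid: life_expectancy(rates) for rid, rates in rates_by_id.items()}


# Hypothetical long-format data: (id, age, mortality rate).
rows = [
    ("A", 0, 0.1), ("A", 1, 0.2), ("A", 2, 1.0),
    ("B", 0, 0.5), ("B", 1, 1.0),
]
results = life_expectancy_by_id(rows)
print({rid: round(value, 2) for rid, value in results.items()})
```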

GitHub Copilot also stumbled on its first attempt at this project. Its output code was incorrect and, surprisingly, incorrect in a different way than ChatGPT’s first attempt. After looking closely at Copilot’s output, I figured out that the bot was assuming mortality rates are constant over time and across all ages. If that assumption were true, then Copilot’s output would have been correct.
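Copilot's formula is not shown here, so the closed form below is only an assumed illustration of what a constant-rate shortcut can look like. When the mortality rate q never changes, the sum of cumulative survival probabilities is a geometric series equal to (1 - q) / q. The sketch compares that shortcut with the full calculation, first on flat rates and then on hypothetical rates that rise with age.

```python
from itertools import accumulate
from operator import mul


def life_expectancy(mortality_rates):
    """Sum of cumulative survival probabilities, plus half a year."""
    survival = accumulate((1.0 - q for q in mortality_rates), mul)
    return sum(survival) + 0.5


def constant_rate_shortcut(q):
    """Closed form of the same sum when q is identical at every age."""
    return (1.0 - q) / q + 0.5


# Flat rates over a long horizon, so truncating the series is negligible.
flat_rates = [0.05] * 2000
# Hypothetical rates that start at 0.05 and rise 10% per year of age, capped at 1.
rising_rates = [min(1.0, 0.05 * 1.1 ** age) for age in range(120)]

flat_full = life_expectancy(flat_rates)
rising_full = life_expectancy(rising_rates)
shortcut = constant_rate_shortcut(0.05)

print(round(flat_full, 4), round(shortcut, 4))  # the two agree on flat rates
print(rising_full < shortcut)  # the shortcut overstates once rates rise with age
```

The shortcut is correct only under its own assumption, which is why an error like this is easy to miss without looking closely at the output.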

I updated Copilot’s code to accurately calculate life expectancy. But the next output calculated the probability of survival for a given year instead of cumulative survival. Once again, I had to jump in to correct the code. In the time it took to correct Copilot’s approach, I could have coded this project myself.
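The difference between the two quantities is easy to show with made-up rates. Annual survival is the chance of living through one given year, while cumulative survival is the running product of those chances from the starting age. Summing the wrong one overstates life expectancy, as this short Python sketch suggests.

```python
from itertools import accumulate
from operator import mul

# Illustrative mortality rates by year; not taken from the project data.
rates = [0.1, 0.2, 0.5, 1.0]

# Chance of surviving each individual year, taken on its own.
annual_survival = [1.0 - q for q in rates]
# Chance of surviving from the start through the end of each year.
cumulative_survival = list(accumulate(annual_survival, mul))

from_annual = sum(annual_survival) + 0.5          # the mistaken version
from_cumulative = sum(cumulative_survival) + 0.5  # the intended version

print(round(from_annual, 2), round(from_cumulative, 2))
```

With these rates, the mistaken version gives 2.7 years and the intended version gives 2.48 years.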

Final grades on task 2: ChatGPT: B; GitHub Copilot: C

Task 3: ChatGPT and GitHub Copilot Explain the Purpose of a Complex Excel Macro

The final quiz was born from another common event in the life of an actuary: You have inherited a complex spreadsheet packed with code and want to quickly understand the purpose of a particular formula or macro.

In this example, the overall goal of the spreadsheet was to add results from new medical studies, estimate elevated levels of mortality and transform those results into possible “flat extras”—premium increases based on new mortality rates related to certain medical conditions.

I dropped the spreadsheet’s primary macro (about 900 lines of code) into ChatGPT and asked the bot to explain in 500 words or fewer what the code was doing. Here is an excerpt from the bot’s response, which shows how clearly and accurately it responded to the prompt: “This VBA code is a macro for Microsoft Excel that performs a Goal Seek analysis on a spreadsheet that includes multiple studies. Goal Seek is an Excel tool that enables users to find a solution to a problem by adjusting one or more input values.”

ChatGPT then continued to explain the steps in the macro, even picking up some subtleties of the calculation, such as “the extra mortality is assumed to be constant for all ages.” ChatGPT also recognized that the code repeated the Goal Seek logic for different studies, a redundancy I could eliminate to make the macro run more efficiently.

I followed up with a question to ChatGPT: “Any suggestions for improving the macro’s runtime?” It responded with some basic tricks that were already implemented elsewhere in the code, but they were still useful for new actuaries to understand. The bot also suggested reducing some of the precision of the Goal Seek. It even weighed the pros and cons of such a change, noting that reducing the precision would speed up the macro but possibly result in inaccuracies. I was astounded that ChatGPT could provide a narrative that described the macro perfectly. With few or no edits, I would feel comfortable sharing its explanation with a colleague.
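The precision trade-off can be seen in any iterative solver. The sketch below is not the macro's VBA code, and it does not reproduce Excel's Goal Seek algorithm, which is not described here. It swaps in a simple bisection search, with a hypothetical function and target, to show that a looser tolerance stops after fewer iterations but leaves a larger error.

```python
def goal_seek(func, target, low, high, tolerance, max_iterations=200):
    """Bisection search for x in [low, high] with func(x) close to target.

    Assumes func is increasing on the interval and the target is bracketed.
    Returns the solution found and the number of iterations used.
    """
    for iteration in range(1, max_iterations + 1):
        mid = (low + high) / 2.0
        error = func(mid) - target
        if abs(error) <= tolerance:
            return mid, iteration
        if error > 0:
            high = mid  # overshoot: search the lower half
        else:
            low = mid   # undershoot: search the upper half
    raise RuntimeError("no solution within tolerance")


def cube(x):
    """Stand-in for whatever calculation the spreadsheet would perform."""
    return x ** 3


loose_x, loose_iterations = goal_seek(cube, 10.0, 0.0, 5.0, tolerance=1e-2)
tight_x, tight_iterations = goal_seek(cube, 10.0, 0.0, 5.0, tolerance=1e-10)

print(loose_iterations < tight_iterations)
print(abs(cube(loose_x) - 10.0) > abs(cube(tight_x) - 10.0))
```

In a macro that repeats the search for many studies, the saved iterations add up, which is the speed-versus-accuracy choice ChatGPT described.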

Finally, with the idea that I might need to share this macro with international colleagues, I asked ChatGPT to translate the chat into Japanese and immediately received a Japanese translation.

GitHub Copilot unfortunately received a failing grade because it was unable to complete this task. Since GitHub Copilot’s purpose is to code and nothing else, this may not seem like a fair fight. But the reality is that ChatGPT has many of the same coding capabilities plus the ability to provide narratives around and about the data. (This test verified only that ChatGPT performed the translation, not that the translation was accurate.) Sorry, GitHub, it’s every bot for itself right now.

Final grades on task 3: ChatGPT: A; GitHub Copilot: F

Conclusion

The most important takeaway from the tests is the reassuring lack of perfection. Neither ChatGPT nor GitHub Copilot produced perfect results. Both tools needed human intervention to correct, refine or adjust results. But overall, the bots understood queries and made surprisingly robust attempts to answer them.

ChatGPT scored higher overall, and I was impressed with its ability to build on prior queries and “think” about what the user wanted. GitHub Copilot’s code suggestions seemed to work line by line instead of holistically.

In my opinion, the only way to determine if LLMs like ChatGPT and GitHub Copilot will enhance day-to-day actuarial tasks is to experiment with them. I believe that with time, AI tools will provide substantial improvements in coding efficiency and technical deliverables for professionals.

Class is dismissed.

Kyle Nobbe, FSA, is vice president and advanced analytics actuary within the Global Data Analytics department of Reinsurance Group of America, Inc. (RGA). He is based in Chesterfield, Missouri.

Statements of fact and opinions expressed herein are those of the individual authors and are not necessarily those of the Society of Actuaries or the respective authors’ employers.

Copyright © 2024 by the Society of Actuaries, Chicago, Illinois.