Meeting Software, Hardware Challenges

Driving computing efficiency in actuarial valuation and analysis

Chad Schuster, Jim Brackett, Corey Grigg

At the heart of modern actuarial work is the need to process vast amounts of data rapidly and accurately while sustaining complex, computationally intensive models. Whether by scaling up a single high-performance computing system or orchestrating a distributed network of machines, the strategic decisions behind hardware selection play a pivotal role in meeting these operational demands. Equally important is software design: selecting a programming language and development framework that not only maximizes performance but also ensures maintainability and adaptability in a market where computational paradigms are continually evolving.

When it comes to implementing software solutions, actuaries face challenging decisions around how best to coordinate with IT, software and cloud vendors, and internal development teams. To achieve performance targets and comply with infrastructure constraints, these decisions require a working knowledge of technological capabilities, an appreciation of how and where trade-offs apply, and an understanding of operating costs. In this article, we highlight key software- and hardware-level trade-offs to help actuaries with that decision-making process.

Software development considerations

Choosing the right programming language is fundamental to the performance, maintainability, and long-term viability of any solution and defines the mechanisms available to interact with hardware. For classical computing, languages like C, C++, Rust, C#, and Java are compiled ahead of time to native code or low-level virtual machine bytecode and provide high performance out of the box. Languages like Python, Ruby, and Julia are either interpreted (thereby sacrificing performance) or rely on just-in-time (JIT) compilation, and can achieve competitive performance after a one-time compilation cost at the start of each run.

In general, languages like C, C++, Rust, C#, and Java require more programming knowledge to utilize, while Python, Ruby, and Julia exhibit a lower barrier to entry and offer more convenience. An example of this is C versus Python. C has more syntactic complexity, requires manual memory management, and enforces explicit typing of all variables and functions, and the resulting compiled binaries are not portable across architectures. Python is easier to read, handles memory management automatically, infers types at runtime, and is highly portable.

The benefits of languages like C/C++ include compilers that can optimize the compiled machine code, the ability to exercise control over operations like memory management to best fit the particular use case, and the potential to access specialized hardware (such as graphics processing units, or GPUs).

These capabilities afford more opportunity to tune performance and achieve higher throughput. However, they can also limit the pool of developers qualified to write and maintain code, extend the time needed to implement features, and increase the probability of elusive bugs.

The capacity to exploit GPU hardware is also different between the languages. Use of Compute Unified Device Architecture (CUDA) or OpenCL from C/C++, for example, enables explicit access to GPU hardware, but requires extensive programming expertise. Conversely, languages like Python (with open-source JIT compiler Numba) and Julia require only modest programmer effort to leverage a GPU, but do not present the same capacity for fine-tuning.
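As an illustration of that lower-effort path, the sketch below uses Numba to compile a simple elementwise kernel for a GPU directly from Python. It assumes a CUDA-capable GPU and the numba package are available; the discount function and its inputs are illustrative only.

import numpy as np
from numba import cuda


@cuda.jit
def discount(cashflows, rates, out):
    # Apply a one-period discount to each cash flow, one GPU thread per element.
    i = cuda.grid(1)              # global thread index
    if i < cashflows.size:        # guard against surplus threads in the last block
        out[i] = cashflows[i] / (1.0 + rates[i])


n = 1_000_000
cashflows = np.random.rand(n)
rates = np.random.uniform(0.01, 0.05, n)
out = np.empty_like(cashflows)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
discount[blocks, threads_per_block](cashflows, rates, out)  # Numba moves the arrays to and from the GPU

Achieving the same result in C++ with CUDA would involve explicit memory allocation and transfer calls, which is precisely the fine-tuning capacity, and the burden, described above.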

Regardless of the language, code refactoring and optimization can increase performance dramatically, with actuaries adding efficiencies that an engineer alone may not realize. For example, in the case of a Black-Scholes pricing algorithm computing European option values, a programmer not familiar with financial mathematics may compute call and put prices independently, which can be computationally expensive. Alternatively, an actuarial approach may use the put-call parity identity or other mathematical shortcuts, potentially eliminating close to half of the computation time. When such a function is evaluated millions or billions of times over the course of an actuarial projection, such efficiencies can make a significant difference in runtime and cost.
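A minimal sketch of that shortcut, assuming the Black-Scholes setting with no dividends and using scipy for the normal distribution, is shown below; the parameter values are illustrative only.

from math import exp, log, sqrt
from scipy.stats import norm


def bs_call(S, K, T, r, sigma):
    # Black-Scholes price of a European call option.
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)


def bs_put_via_parity(S, K, T, r, sigma, call=None):
    # Reuse the call price via put-call parity: P = C - S + K * exp(-r * T).
    if call is None:
        call = bs_call(S, K, T, r, sigma)
    return call - S + K * exp(-r * T)


call = bs_call(100.0, 105.0, 1.0, 0.03, 0.2)
put = bs_put_via_parity(100.0, 105.0, 1.0, 0.03, 0.2, call=call)  # no second set of normal CDF evaluations

Because the put reuses the call price, the expensive cumulative normal evaluations are performed only once per valuation point.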

The purpose of a quantum computer programming language is the same as in the classical case: a quantum programming language is simply a programming language designed to write programs for quantum computers. Evaluating quantum programming languages is markedly different from evaluating classical ones because the building blocks of the two technologies differ. In the classical case, the basic unit of information is the bit; it uses simple electrical properties to represent data, operations, and results as ones and zeros. A qubit is the quantum analog of a bit; its state is described by a combination (superposition) of two basis states, realized in specific physical properties of an underlying quantum system. Operations on one or more qubits manipulate the system to put it into a specific state. Factors to consider for a quantum programming language that are not applicable in the classical case include how algorithms are evaluated and executed and the capacity of the language to monitor fundamental properties of the quantum system (coherence, entanglement, qubit counts).

As in classical computing, there are different levels of abstraction between the programmer and the hardware in quantum programming. At the lowest level, quantum instruction sets turn algorithms into physical instructions for quantum processors and are specific to the quantum hardware; examples include OpenQASM and Quil. Quantum programming languages are the next category and are analogous to classical programming languages such as C, Python, and C#. These languages vary in their level of abstraction, much like the classical languages discussed above; examples include Q#, Silq, Quipper, and QCL. It should be noted that Python includes libraries for quantum computing, and many quantum frameworks are built with Python in mind. Additionally, there are development tools that allow a user to design, run, and/or simulate quantum programs. Examples include Qiskit, Cirq, Ocean, Forest, ProjectQ, Strawberry Fields, PennyLane, the Intel Quantum Software Development Kit, and the Azure Quantum Development Kit.
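As a small example of such a development tool, the sketch below uses Qiskit to build and simulate a two-qubit circuit that prepares an entangled (Bell) state. It assumes the qiskit and qiskit-aer packages are installed; the exact API varies somewhat between Qiskit releases.

from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

# Build a two-qubit circuit that prepares an entangled (Bell) state.
qc = QuantumCircuit(2, 2)
qc.h(0)          # put qubit 0 into superposition
qc.cx(0, 1)      # entangle qubit 1 with qubit 0
qc.measure([0, 1], [0, 1])

# Simulate the circuit on classical hardware rather than a physical device.
sim = AerSimulator()
result = sim.run(transpile(qc, sim), shots=1000).result()
print(result.get_counts())   # roughly an even split between '00' and '11'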

What hardware factors should you consider?

The capacity of a classical computer to process computational workload is dependent on several hardware factors, including microprocessor architecture, clock speed, and internal cache; memory capacity, access latency, and types of memory available; persistent storage capacity and throughput associated with attached and networked devices; and the degree to which aggressive clock speeds can be sustained given the associated cooling requirements. These factors—chipset, memory, storage I/O, and heat dissipation—apply to central processing unit (CPU)-based systems and to GPUs that are at times exploited by such systems to offload and accelerate workload.

In actuarial work, hardware selection matters most for tasks involving large-scale data processing (e.g., cleaning, preprocessing), intensive calculations (e.g., bootstrap resampling), or both (e.g., nested stochastic simulations, machine learning). Each benefits from hardware tailored to the workload. For applications with a large amount of data but simple calculations, the key component is a large amount of random access memory (RAM) to reduce reliance on slower disk-based operations.

If all the data can fit within memory, this can greatly speed up a data-heavy workload, and it can be beneficial to have substantially more memory as a buffer to support certain kinds of data operations, like sorting. The speed of the memory and its topological arrangement can matter as well. For example, even if a machine has a large amount of memory, the physical proximity of that memory to a given CPU can affect performance: some computers are designed with clusters of co-located memory and CPUs, an arrangement known as nonuniform memory access (NUMA), in which access to memory placed within another cluster takes longer than access to nearby memory.

Storage devices are critical in data-heavy workloads, especially when in-process memory is insufficient and temporary storage is needed. Of course, storage devices have capacity limits, and running out of space can cause software processes to fail—sometimes dramatically. Beyond capacity, storage can also bottleneck processing due to limited bandwidth (the rate at which data moves between memory and storage, typically in bits per second), high latency (from physical distance or connection type to the motherboard), and data fragmentation (whether data is stored in contiguous blocks or scattered across the device).

Finally, initial input data and outgoing data must travel to and from the computer doing the processing, in some cases from data storage locations thousands of miles away. Therefore, networking devices and bandwidth to perform the initial and final transfers can affect the end-to-end processing time.

In workloads where heavy calculations are the primary demand and data volume is less of a concern, the CPU becomes the linchpin of system performance. Central to the CPU’s efficiency is its clock speed, a measure that defines how quickly each instruction is executed. Higher clock speeds can lead to faster computations, but they must be balanced with power and heat considerations. The cache, including its size and hierarchy, serves as a critical intermediary, ensuring that frequently accessed data is readily available. This minimizes delays from fetching data from slower main memory, which is crucial in computation-heavy scenarios.

Beyond raw speed, the microarchitecture of the CPU is a pivotal factor in shaping its overall performance. Modern microarchitectures are designed to optimize parallelism, execute complex instructions more effectively, and reduce latencies faced during multithreaded operations. However, challenges like cache thrashing—where excessive data swapping between cache and main memory degrades performance—can undermine these benefits. Additionally, sophisticated power and heat management systems, including dynamic boosting features, which elevate a CPU’s clock speed at the expense of higher power consumption and heat generation, are essential to maintain optimal operational conditions. These systems ensure the CPU can run at higher speeds safely and efficiently.

When faced with both large data and lots of calculations, a balanced system designed for high-performance computing is most useful, in our estimation. Such a machine has ample RAM, powerful CPUs, fast storage, and enough bandwidth to support data exchange between the various components. Furthermore, the scalability of the system becomes a concern when faced with increasing data and computational needs. Such a system should support the option to add hardware directly to a machine or to scale out by networking many computers together to form a supercomputer or distributed computing grid.

In this case, networking hardware can become a constraint if parts of the process running on different machines must frequently communicate with one another or with a central control process, so specialized hardware that supports low-latency, high-bandwidth networking, such as InfiniBand, is key. For example, we have seen a 65%–70% improvement in runtime and cost when moving a model for a registered index-linked annuity from commodity hardware to fourth-generation high-performance computing systems in Microsoft Azure.
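To make the distributed-grid approach concrete, the sketch below farms per-policy valuation work out to a pool of worker machines using the open-source Dask library. The scheduler address and the value_policy function are hypothetical placeholders, and a production setup would batch model points rather than submitting them one at a time.

from dask.distributed import Client


def value_policy(policy_id):
    # Hypothetical placeholder for a per-policy valuation; in practice this
    # would run the projection model for one policy or model point.
    return policy_id * 1.0


# Connect to a running Dask scheduler that coordinates the worker machines.
client = Client("tcp://scheduler.example.internal:8786")

policy_ids = list(range(10_000))
futures = client.map(value_policy, policy_ids)   # scatter work across the grid
results = client.gather(futures)                 # collect results centrally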

For calculation tasks that can be massively parallelized, GPUs can also prove useful. A GPU is designed for processing multiple similar calculations simultaneously, and to some extent can be thought of as a specialized computer with its own RAM, thousands of processors, onboard cooling and power supply.

A GPU relies on a CPU to orchestrate its calculations and to transfer data between the system memory and GPU memory. Thus, a GPU is subject to the constraints of both CPUs (clock speed, memory access patterns, heat/power management) and storage devices (bandwidth, capacity, latency and input/output access patterns). Some of these constraints can be mitigated through specialized hardware techniques; for example, direct memory access (DMA) or remote direct memory access (RDMA) can be exploited with CUDA to reduce transfer latency by bypassing CPU participation. Nevertheless, the dependency on a host device can present significant operational and software design challenges.

Accordingly, some modeling tasks, such as simulation of simple equity-linked products, can benefit from GPUs if they involve parallelizable calculations driven by simple arithmetic. For others with features that lead to interdependent calculations (with-profits products or other asset-liability linkages), large memory requirements (complex policy state tracking for long-term care, or management of stateful random number generators), or branching logic (trigger-based policy features, transition-based behavioral models), GPUs may not deliver meaningful performance improvements or may require modifications to actuarial model logic.
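The sketch below illustrates the kind of workload in the first category: simulating terminal index values for a simple equity-linked payoff, where every scenario performs identical arithmetic and can run in lockstep on a GPU. It assumes a CUDA-capable GPU and the cupy package; the product parameters are illustrative only.

import cupy as cp

n_scenarios = 1_000_000
s0, mu, sigma, T = 100.0, 0.05, 0.2, 10.0

# One independent normal draw per scenario; every scenario follows identical
# arithmetic, so the GPU can evaluate them all in parallel.
z = cp.random.standard_normal(n_scenarios)
s_T = s0 * cp.exp((mu - 0.5 * sigma**2) * T + sigma * (T ** 0.5) * z)

# A simple guaranteed-minimum-style payoff: max(guarantee - index, 0).
guarantee = 120.0
payoff = cp.maximum(guarantee - s_T, 0.0)
print(float(payoff.mean()))   # transfers only a single scalar back to the CPU

A path-dependent or trigger-based version of the same product would introduce the branching and state tracking that erode this advantage.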

Quantum computing affords an entirely novel approach to solving computationally complex problems when compared to classical CPU- and GPU-based systems. Quantum solutions can be simulated on classical systems, but such simulations may be intractable on classical hardware. Instead, a physical quantum computer is required to realize the benefits of quantum computing. Factors influencing the capacity and reliability of physical quantum computers include qubit count, refrigeration techniques to achieve near-absolute-zero temperatures, and mitigations to protect against decoherence.

The potential speedup that a quantum computer may provide can be significant. For example, factoring large integers into primes is effectively intractable on a classical computer, which is why it underpins modern-day Rivest-Shamir-Adleman (RSA) encryption. Even a group of hundreds of classical supercomputers would take millions or billions of years to factor a sufficiently large number into primes. In 1994, Peter Shor developed a quantum algorithm (Shor's Algorithm) designed to find the prime factors of an integer that could theoretically render modern-day encryption ineffective. For now, RSA encryption is not threatened, since the algorithm requires far more qubits than current hardware provides, along with improvements in error correction to handle quantum noise.

While cloud-based classical hardware options are “menu-driven,” and quantum computer hardware is not yet available for practical applications, there are opportunities to apply finer-grained decisions on hardware components for on-premises solutions. Tuning hardware to accommodate the nature of the workload may be helpful in reducing time and cost, so effective collaboration between IT and developers to best exploit hardware can be beneficial.

How do you optimize hardware and software?

For classical computing solutions, obtaining maximum performance often involves leveraging software capabilities to parallelize loops, manage multiple threads efficiently, and exploit features at the level of specific native machine instructions (e.g., SSE, AVX, or GPU instructions). Major chip manufacturers such as AMD, Intel, ARM, and Nvidia have supported or directly developed compilers that enable software to be tuned to their processor architectures.1 These optimizations may be handled automatically by the compiler or, if the programming language permits, by the programmer explicitly accessing hardware-specific instructions.
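For example, the sketch below uses Numba to parallelize a discounting loop across CPU cores and to let the compiler apply instruction-level optimizations such as vectorization. It assumes the numba package is installed; the function and data are illustrative only.

import numpy as np
from numba import njit, prange


@njit(parallel=True, fastmath=True)
def present_value(cashflows, discount_factors):
    # Sum discounted cash flows, with iterations distributed across CPU cores.
    total = 0.0
    for i in prange(cashflows.size):   # prange marks the loop as parallel
        total += cashflows[i] * discount_factors[i]
    return total


cashflows = np.random.rand(10_000_000)
discount_factors = np.exp(-0.03 * np.arange(10_000_000) / 12.0)
pv = present_value(cashflows, discount_factors)   # the first call pays the one-time JIT compilation cost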

There are trade-offs between targeted hardware optimizations in software development and maintaining flexibility to run in varied computing environments. For instance, specialized actuarial software developed for GPU may not be easily ported or perform efficiently on CPU. More extreme examples involve collaboration between hardware and software developers to provide purpose-built products for console video games, high-frequency trading, and artificial intelligence and machine learning models.2

Some of these innovations utilize application-specific integrated circuits (ASICs) hardwired for certain calculations, or field programmable gate arrays (FPGA), a type of hardware whose circuits can be configured by end-users to perform specialized logic. However, an investment in hardware-specific software optimizations can have a limited shelf life and may lose utility or degrade performance in future hardware generations, and thus may be difficult to justify without a short payback period.

A more flexible solution could involve modifying or configuring firmware, the low-level embedded software that controls the hardware itself. Although many devices do not expose firmware directly to end users, motherboards typically have an interface (UEFI, formerly BIOS) allowing access to firmware settings. Personal computer enthusiasts may tweak these settings to enhance performance in video games, but the same principles apply to non-gaming workloads, and hardware manufacturers provide guidelines for configuring firmware for specific use cases.3

Some relevant settings for actuarial workloads include simultaneous multithreading or hyperthreading (sharing physical CPU cores among multiple programs), memory interleaving (splitting dynamically allocated memory across multiple physical memory banks) and power management in support of core boosting (increasing processor speed with more power).

Cloud hardware vendors apply similar tweaks in their ecosystems and offer configurations tailored to various workloads. For instance, Microsoft Azure provides “compute-optimized,” “memory-optimized,” and “high-performance computing” virtual machine flavors, among others. However, collaboration between cloud vendors and developers of actuarial software can further refine firmware configuration for specific use cases. On-premises hardware owners have greater flexibility to modify these settings but may lack the expertise to determine ideal configurations.

For quantum solutions, the computing paradigm is so dramatically different from the classical one that even the conceptual framing of an algorithm must be reconceived before considering which quantum programming language is suitable. A quantum algorithm is a set of instructions for a quantum computer to solve a problem, analogous to a classical algorithm. In the case of a quantum algorithm, properties defined in quantum mechanics, such as superposition and entanglement, are utilized to perform the computation.

Naturally, the problems where one would most expect an advantage from a quantum computer are those that involve simulating a quantum system, but many other intractable problems can potentially be mapped to a quantum computer through a quantum algorithm. Creating algorithms that map a classical problem to a quantum computer is central to determining whether quantum computing can be applied to an intractable problem.

There are a variety of computationally intense problems that insurance companies tackle regularly. These problems range from pricing and risk estimation to portfolio optimization. Quantum algorithms for stochastic modeling, optimization, and machine learning may offer meaningful improvements in efficiency in these areas. For example, quantum Monte Carlo algorithms have the potential for a quadratic speedup relative to classical methods, as they require many fewer steps to reach the same level of convergence.4 This means that, instead of running 10,000 scenarios, one could in principle achieve the same level of convergence with roughly 100. There are challenges to realizing the promise of quantum algorithms in the insurance space, primarily driven by the steps required to implement a quantum algorithm in practice (e.g., how classical data is embedded in a quantum system).5 In addition to algorithm-specific challenges, current noisy intermediate-scale quantum (NISQ) hardware is limited to a relatively small number of qubits with low fidelity, making it difficult to solve problems of a significant size.
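In rough terms, the scaling behind that 10,000-to-100 comparison can be written as follows (a sketch in big-O notation, where \varepsilon is the target estimation error):

\[
N_{\text{classical}} = O\!\left(\frac{1}{\varepsilon^{2}}\right), \qquad
N_{\text{quantum}} = O\!\left(\frac{1}{\varepsilon}\right) \approx \sqrt{N_{\text{classical}}},
\]

so a workload needing on the order of 10,000 classical scenarios corresponds to on the order of \sqrt{10{,}000} = 100 quantum queries at a comparable level of accuracy.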

What’s next?


Actuaries today operate at an exciting crossroads where advanced data analytics, cutting-edge computational hardware and innovative software solutions converge. This article has explored a diverse array of considerations—from classical hardware factors such as processor architecture, memory allocation and storage throughput, to the emerging promise of quantum computing—demonstrating that no single solution fits all scenarios. Rather, a careful, deliberate balance between technology choices and actuarial requirements is, we believe, essential.

Moreover, the interplay between IT teams, software vendors, and actuarial professionals is more critical than ever. Close collaboration and informed decision-making can unlock efficiencies by ensuring that every component of the technology stack is tuned for a project’s specific characteristics. Companies with dedicated staff focused on computing efficiency or that work with software-as-a-service (SaaS) providers that take a holistic approach to computing efficiency could be well-positioned to achieve this.

Chad Schuster, FRM, is Principal (Life and Financial Services Practice) at Milliman Financial Risk Management, LLC, and is based in Chicago. Chad provides risk management, modeling, and technology consulting services to clients in the United States, Canada, Europe, and Asia.
Jim Brackett is Principal Consultant (Technology Lead) at Milliman Financial Risk Management, LLC, and is based in Marlborough, Massachusetts. Since 2003, he has led Milliman’s platform development, focusing on high-performance cloud and hybrid computing.
Corey Grigg, FSA, MAAA, CERA, is Principal (Technical Lead, Professional Services) at Milliman Financial Risk Management, LLC, and is based in Chicago. Corey has been building, tweaking, and breaking personal computers since age 11. At Milliman, he spearheads the design, implementation, and maintenance of sophisticated automated actuarial data management, valuation, and reporting processes.

Statements of fact and opinions expressed herein are those of the individual authors and are not necessarily those of the Society of Actuaries or the respective authors’ employers.

Copyright © 2025 by the Society of Actuaries, Chicago, Illinois.