Developing a concept map for predictive analytics
August/September 2016
To successfully use predictive analytics on projects, an actuary or other practitioner will need to decide and/or be able to _______________.
How would you complete the sentence? How would your peers complete the sentence? Only you can answer the first question. This article provides the answer to the second—but you will need to wait until the end. Before that, I will provide some background and explain the process that was used to come up with the answer.
It should be obvious that the Society of Actuaries (SOA) is into predictive analytics in a big way with a section, newsletters, webcasts, meeting sessions and seminars. When the SOA Board of Directors (Board) formed the Learning Strategy Task Force, it was clear from prior SOA research and environmental scanning that predictive analytics would be featured prominently in its recommendations. That is how it turned out: One of the 11 recommendations approved by the Board was to create curricula for professional development; the first to be developed was for predictive analytics.
The various professional development tasks from the Learning Strategy Task Force recommendations were assigned to a Professional Development (PD) Task Force, chaired by Terry Long, FSA, MAAA. The PD Task Force in turn asked the Predictive Analytics Advisory Group (PAAG), chaired by me, to oversee a concept mapping exercise to fill in the blank. Once that exercise was complete, it would be possible to build a curriculum that gives actuaries and others the opportunity to become that person.
Creating the Concept Map
Concept mapping¹ combines qualitative brainstorming to generate ideas with (what else?) a series of analytic steps to make sense of those ideas. The process began by inviting hundreds of people to provide as many responses as they wanted to the “focus prompt,” the statement that opens this article. The result was 488 statements.
Not surprisingly, many of the statements were similar. It was a fairly easy process to pare them down to 87 unique statements. Here are some randomly selected statements from those 87:
- Use holdout data to validate a model.
- Apply the principal components method.
- Identify and explain surprise findings.
- Clean and prepare the data for its intended use.
The members of the PAAG and additional predictive analytics thought leaders (30 in total) then went through a sorting exercise. Using an electronic tool, the objective was to place the 87 statements into virtual piles. Statements in a given pile were to be similar to each other and different from those in other piles.
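A minimal sketch of how sorting results like these can be tallied, assuming each sorter's piles are recorded as sets of statement indices (the data below are hypothetical, not the actual sort data). For every pair of statements, the function counts how many sorters placed the pair in the same pile; that count matrix is the raw similarity information the later steps work from.

```python
from itertools import combinations


def co_occurrence(sorts: list[list[set[int]]], n_statements: int) -> list[list[int]]:
    """Count how many sorters put each pair of statements in the same pile.

    sorts has one entry per sorter; each entry is that sorter's list of piles,
    and each pile is a set of statement indices 0 .. n_statements - 1.
    Assumes every sorter places every statement in exactly one pile.
    """
    counts = [[0] * n_statements for _ in range(n_statements)]
    for piles in sorts:
        for pile in piles:
            for i, j in combinations(sorted(pile), 2):
                counts[i][j] += 1
                counts[j][i] += 1
    # A statement is always "in the same pile" as itself.
    for i in range(n_statements):
        counts[i][i] = len(sorts)
    return counts


# Hypothetical sorts of four statements by three sorters.
sorts = [
    [{0, 3}, {1}, {2}],
    [{0, 3}, {1, 2}],
    [{0}, {1}, {2, 3}],
]
matrix = co_occurrence(sorts, 4)
print(matrix[0][3], matrix[1][2], matrix[0][1])  # 2 1 0
```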
Multidimensional scaling was then used to plot the 87 statements as points in two dimensions. The basic idea is that the more sorters who put a given pair of statements in the same pile, the closer those statements appear on the plot.
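One way to turn co-sort counts into a two-dimensional map is sketched below. It uses classical (Torgerson) MDS because it fits in a few lines of standard-library Python; the concept mapping software may well use a different variant (for example, nonmetric MDS), so treat this as an illustration of the idea rather than a reproduction of the analysis. The dissimilarity for a pair is assumed to be 1 minus the share of sorters who grouped it, and the leading eigenvectors are found by power iteration.

```python
import math


def classical_mds(counts: list[list[float]], n_sorters: int,
                  dims: int = 2, iters: int = 1000) -> list[list[float]]:
    """Embed statements in `dims` dimensions with classical (Torgerson) MDS.

    counts[i][j] = number of sorters who put statements i and j in the same pile.
    """
    n = len(counts)
    # Squared dissimilarity: 0 if every sorter grouped the pair, 1 if none did.
    d2 = [[(1 - counts[i][j] / n_sorters) ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # Double-centring: B = -1/2 * J D^2 J (D^2 is symmetric, so column means = row means).
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Gershgorin bound: adding `shift` to the diagonal makes every eigenvalue
    # non-negative, so power iteration finds the largest (not largest-magnitude) one.
    shift = max(sum(abs(x) for x in r) for r in b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary deterministic start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical: 4 sorters always grouped {0, 1} and {2, 3}, never across.
counts = [[4, 4, 0, 0], [4, 4, 0, 0], [0, 0, 4, 4], [0, 0, 4, 4]]
points = classical_mds(counts, n_sorters=4)
print(round(math.dist(points[0], points[1]), 6), round(math.dist(points[0], points[2]), 6))
# 0.0 1.0
```

Statements that were always sorted together land on the same spot, and pairs never sorted together end up one unit apart, matching their assumed dissimilarity.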
Then the points (statements) were gathered into clusters using a cluster analysis algorithm. We chose 17 as the maximum number of clusters that would be meaningful. Our next task was to look at the software output, which indicated how the 17 clusters might be combined further. Two PAAG members then looked at the clusters and statements, and decided that a six-cluster solution made sense.
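Looking at "how the 17 clusters might be combined further" suggests a hierarchical (agglomerative) method, in which a single merge tree yields every solution from one cluster per statement down to one cluster overall. The sketch below assumes Ward's criterion (merge the pair of clusters whose union least increases the within-cluster sum of squares) applied to two-dimensional map coordinates. The article does not name the algorithm, so both the criterion and the toy coordinates are illustrative assumptions.

```python
def ward_levels(points: list[tuple[float, float]]) -> list[list[list[int]]]:
    """Agglomerative clustering of 2-D points with Ward's criterion.

    Returns the clustering at every level, from one cluster per point down to a
    single cluster, so any k-cluster solution can be read off the same tree.
    """
    clusters = [[i] for i in range(len(points))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        cents = [(sum(points[i][0] for i in c) / len(c),
                  sum(points[i][1] for i in c) / len(c)) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                sq = (cents[a][0] - cents[b][0]) ** 2 + (cents[a][1] - cents[b][1]) ** 2
                cost = na * nb / (na + nb) * sq  # increase in within-cluster sum of squares
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
        levels.append([list(c) for c in clusters])
    return levels


def cut(levels: list[list[list[int]]], k: int) -> list[list[int]]:
    """Return the k-cluster solution from the merge tree."""
    return next(level for level in levels if len(level) == k)


# Hypothetical map coordinates for six statements.
pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
levels = ward_levels(pts)
print(sorted(cut(levels, 3)))  # [[0, 1], [2, 3], [4, 5]]
print(sorted(cut(levels, 2)))  # [[0, 1, 2, 3], [4, 5]]
```

Reading a larger and a smaller solution from one tree mirrors the two-step choice of 17 and then six clusters; deciding which cut is meaningful remains a human judgment.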
To support the development of the clusters, 106 respondents rated the relative importance of each of the 87 statements on a scale from 1 (unimportant compared to the other statements) to 5 (extremely important compared to the other statements). The full PAAG then examined the results. They confirmed that the clusters made sense and named them, each name representing the set of statements in that cluster. They also tweaked some of the statements, reducing the total to 84.
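The article reports one average importance score per cluster but does not say exactly how the ratings were aggregated. A minimal sketch, assuming each statement's ratings are averaged first and the statement averages are then averaged within each cluster, so that every statement counts equally regardless of how many ratings it received. The ratings and cluster names below are hypothetical.

```python
from statistics import mean


def cluster_importance(ratings: dict[int, list[int]],
                       clusters: dict[str, list[int]]) -> dict[str, float]:
    """Average 1-5 importance rating per cluster.

    ratings maps a statement index to the ratings it received;
    clusters maps a cluster name to its statement indices.
    """
    statement_avg = {s: mean(r) for s, r in ratings.items()}
    return {name: round(mean(statement_avg[s] for s in members), 2)
            for name, members in clusters.items()}


# Hypothetical ratings for four statements in two clusters.
ratings = {0: [5, 4, 4], 1: [3, 3], 2: [4, 2], 3: [5, 5, 5, 3]}
clusters = {"A": [0, 1], "B": [2, 3]}
print(cluster_importance(ratings, clusters))  # {'A': 3.67, 'B': 3.75}
```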
The result is six clusters. For each, the average relative importance score is given in parentheses after its name, along with a brief description. The scores didn’t vary much, which suggests that all six clusters have value. It is not surprising that communication ranked highest: it doesn’t matter how brilliant the analysis is if the analyst cannot convince anyone else of its value. The lower rating for methodologies and tools may reflect the respondents’ roles, since many of them are in more senior positions and are no longer in the trenches. The order in which the clusters are presented roughly follows the steps in the modeling process. Taken together, the clusters will provide a framework for the predictive analytics curriculum.
As you read through the description of each cluster, you might want to consider where you would assign each of the four sample statements provided earlier in this article.
Developing a Learning Strategy
As a member of the Predictive Analytics Advisory Group (PAAG), I had the honor of joining experts in the predictive analytics field to determine the foundation of the Society of Actuaries’ predictive analytics curriculum. As I prepared to meet with the PAAG, I found that the concept mapping framework aligned very closely with an article I was reading on performing k-means clustering in Python. It is very fitting that we developed a learning strategy for predictive analytics by actually using predictive analytics…
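Since the sidebar mentions k-means clustering in Python, here is a minimal standard-library sketch of Lloyd's k-means algorithm for two-dimensional points, such as statements positioned on a concept map. The article does not say that k-means was the algorithm used in the concept mapping software, and the farthest-first seeding and toy coordinates are assumptions chosen to keep the example deterministic.

```python
import math


def kmeans(points: list[tuple[float, float]], k: int, iters: int = 100):
    """Lloyd's k-means on 2-D points; returns (labels, centers)."""
    # Farthest-first seeding: start at the first point, then repeatedly add the
    # point farthest from all centers chosen so far.
    centers = [tuple(points[0])]
    while len(centers) < k:
        centers.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical map coordinates for six statements.
pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
labels, centers = kmeans(pts, 3)
print(labels)  # [0, 0, 2, 2, 1, 1]
```

Unlike a hierarchical method, k-means needs the number of clusters up front, so comparing, say, a 17-cluster and a six-cluster solution means running it twice.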
Project Management and Planning (3.47)
For those familiar with the Fundamentals of Actuarial Practice course or the Actuarial Control Cycle, this is the “define the problem” stage. Here it is important to understand the goals of the analytics exercise, with particular attention being paid to those who will use the results. With the goals in mind, a project plan can be devised (and thus project planning skills are on the list) that will include data needs, personnel needs (which implies understanding how to build and manage a team), technology needs and other resources, and an understanding of regulatory constraints.
Data Engineering and Management (3.44)
Termed “data wrangling” by one of our respondents, this skill requires the ability to take wild and uncooperative data and teach it to obey your software commands. More seriously, analysts need to determine the appropriate data sources, acquire the data, and then ensure they are clean and fit for the uses identified in the project plan. There may be storage and retrieval issues, organizational challenges (particularly if the data are high-dimensional with many linkages), missing data problems and privacy constraints.
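A small illustration of the cleaning step, assuming the data arrive as CSV text. The sketch reports missing values per column, drops exact duplicate rows, and drops rows that lack a required field. The column names and records are hypothetical.

```python
import csv
import io


def clean_records(raw_csv: str, required: list[str]):
    """Return (cleaned rows, missing-value count per column).

    Exact duplicate rows are dropped, as are rows missing any required field.
    Missing counts are taken before cleaning so the problems can be documented.
    """
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    columns = rows[0].keys() if rows else []
    missing = {c: sum(1 for r in rows if not (r[c] or "").strip()) for c in columns}
    seen, cleaned = set(), []
    for r in rows:
        key = tuple(r.values())
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if all((r[c] or "").strip() for c in required):
            cleaned.append(r)
    return cleaned, missing


# Hypothetical extract with a duplicate row and two missing values.
raw = "policy_id,age,claim_amount\nP1,45,1200\nP2,,800\nP1,45,1200\nP3,38,\n"
cleaned, missing = clean_records(raw, required=["policy_id", "age"])
print([r["policy_id"] for r in cleaned], missing)
# ['P1', 'P3'] {'policy_id': 0, 'age': 1, 'claim_amount': 1}
```

Dropping incomplete records is only one option; imputing missing values is another, and the choice should follow from the uses identified in the project plan.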
Model Design Principles and Processes (3.50)
This and the next cluster represent the “design the solution” phase of the control cycle. This cluster represents the mechanics of building a model. It begins with choosing a model and/or technique that is appropriate for the problem. Selection techniques, both between and within models, must be employed to, among other things, select variables, employ transformations or interactions, and select error distributions. Tools such as hypothesis tests and residual and q-q plots will aid in making those decisions. Throughout, modeling principles such as parsimony should be applied.
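Parsimony can be made concrete with an information criterion. The sketch below uses Akaike's information criterion (AIC), one common choice that the article does not specifically name, to compare an intercept-only model with a one-predictor linear model fitted by ordinary least squares. Under normal errors, AIC reduces (up to a constant shared by both models) to n·ln(RSS/n) + 2p, where p is the number of fitted mean parameters. The data are made up.

```python
import math


def aic_gaussian(rss: float, n: int, n_params: int) -> float:
    """AIC for a least-squares fit with normal errors, up to an additive constant.

    Assumes rss > 0 (a perfect fit would make the log undefined).
    """
    return n * math.log(rss / n) + 2 * n_params


def compare_models(x: list[float], y: list[float]) -> dict[str, float]:
    n = len(y)
    y_bar = sum(y) / n
    # Model 1: intercept only.
    rss0 = sum((yi - y_bar) ** 2 for yi in y)
    # Model 2: intercept plus slope, fitted by ordinary least squares.
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss1 = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return {"intercept_only": aic_gaussian(rss0, n, 1),
            "with_x": aic_gaussian(rss1, n, 2)}


x = [1, 2, 3, 4, 5, 6]
scores_a = compare_models(x, [2.1, 3.9, 6.2, 7.8, 10.1, 12.0])  # strong trend in x
scores_b = compare_models(x, [5, 3, 5, 3, 5, 3])               # x adds little
# Lower AIC wins: the extra parameter must earn its keep.
print(min(scores_a, key=scores_a.get), min(scores_b, key=scores_b.get))
# with_x intercept_only
```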
Modeling Methodologies and Tools (3.05)
The analyst’s toolkit needs to contain a sufficient variety of models so that an appropriate one can be employed. The list compiled for this exercise included (in no particular order): time series, generalized linear and generalized additive models, hierarchical models, principal components models and machine learning methods in general. Along with that, there should be knowledge of modeling platforms and tools, such as SAS, R, Python, SQL and VBA.
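Principal components analysis finds the directions along which a set of correlated variables varies most. For two variables, the eigendecomposition of the 2×2 covariance matrix has a closed form, which the sketch below uses; with more variables one would normally call a linear algebra library instead. The data are invented: the second variable is exactly twice the first, so the first component should point along (1, 2)/√5 and explain all of the variance.

```python
import math


def first_principal_component(xs: list[float], ys: list[float]):
    """First principal component of two variables via the 2x2 covariance matrix.

    Returns the unit direction of greatest variance and the share of total
    variance it explains.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)                    # var(x)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)                    # var(y)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)  # cov(x, y)
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]].
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = (a + c) / 2 + half_gap, (a + c) / 2 - half_gap
    # Angle of the eigenvector belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2 * b, a - c)
    return (math.cos(theta), math.sin(theta)), lam1 / (lam1 + lam2)


direction, share = first_principal_component([1, 2, 3, 4], [2, 4, 6, 8])
print([round(v, 4) for v in direction], round(share, 4))  # [0.4472, 0.8944] 1.0
```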
Model Validation and Performance (3.54)
The final step of the control cycle, “monitor the results,” is reflected here. This is slightly different from the diagnostics mentioned under the modeling process: here the best model has been developed, and it is time to confirm that a valid choice has been made. Stress and scenario testing, graphical assessments such as gain and lift curves, and holdout data can all be employed. If this has not been done already, model limitations should be documented and uncertainty in estimates and predictions quantified. Finally, plans for maintaining and updating the model should be made.
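Two of the validation tools named above can be sketched briefly: setting aside a holdout sample before fitting, and summarizing how well a model's scores rank that holdout sample with a cumulative gains curve and a lift figure. This assumes a binary outcome (for example, whether a claim occurred) and that the model has already scored the holdout records. The 30% holdout share, the seed and the ten scored records are all hypothetical.

```python
import random


def holdout_split(records: list, holdout_share: float = 0.3, seed: int = 2016):
    """Shuffle reproducibly and return (training, holdout)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_hold = round(len(shuffled) * holdout_share)
    return shuffled[n_hold:], shuffled[:n_hold]


def cumulative_gains(scores: list[float], outcomes: list[int]) -> list[tuple[float, float]]:
    """(share of records reviewed, share of positives captured), best scores first.

    Assumes at least one positive outcome.
    """
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    total = sum(outcomes)
    captured, curve = 0, []
    for i, (_, outcome) in enumerate(ranked, start=1):
        captured += outcome
        curve.append((i / len(ranked), captured / total))
    return curve


def lift(scores: list[float], outcomes: list[int], top_share: float) -> float:
    """Positive rate among the top-scored share, divided by the overall positive rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    n_top = max(1, round(len(ranked) * top_share))
    top_rate = sum(o for _, o in ranked[:n_top]) / n_top
    return top_rate / (sum(outcomes) / len(outcomes))


train, holdout = holdout_split(list(range(100)))
print(len(train), len(holdout))  # 70 30

# Hypothetical holdout scores and observed outcomes (1 = event occurred).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(cumulative_gains(scores, outcomes)[3])  # (0.4, 1.0)
print(round(lift(scores, outcomes, 0.2), 2))  # 3.33
```

Here the top 40% of scored records capture every event, and the top 20% have an event rate about 3.3 times the overall rate; a model with no ranking ability would give a lift near 1.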
While there likely will be communication throughout the process, both within the team and to other stakeholders, this cluster is mostly about sharing the results of the analysis. Paramount is crafting the message to the intended audience. Oral and written communications need to provide the right amount of detail and make appropriate use of graphs and tables. The intended uses and limitations of the analysis need to be communicated and, of course, applicable standards of practice must be followed.
Now that you know what your peers believe to be the core concepts in predictive analytics, look for future professional development opportunities from the SOA to help you enhance your own mastery of them. Visit SOA.org/PDcalendar.aspx.
1. Concept mapping analysis and results conducted using The Concept System Global MAX software: Concept Systems Inc. Copyright 2004–2016; all rights reserved.