# Using a Kalman Filter to Predict Ticket Prices

Welcome back! In case you missed part one of this series, we’re opening up the hood on Deal Score, one of SeatGeek’s most popular features.

In part one we gave a brief overview of why we sort ticket listings by Deal Score rather than by price. We gave you our two main assumptions:

- Seat quality, within a given venue, has a consistent ordering.
- The relationship of seat price to seat quality follows a similar pattern across all events at a given venue.

In the last post we discussed how we take advantage of the first assumption. Today, we’ll explain the second assumption and how it leads to the accurate valuations of thousands of tickets on a daily basis.

This second assumption means that all we need to do in order to predict prices is find a function, unique to each event, that maps our seat score vector to a vector of expected prices. Since we’re working with limited data here and surely do not have enough data to induce the structure of this curve for each event, we make a further simplifying assumption: for each venue, the curve will look similar for every event at that venue, whether that curve is a straight line, polynomial, or otherwise.

This does not mean that we can assume that a premium seat will carry the same relative premium over a nosebleed for each game. For a game that will not sell out, for example, the value of a nosebleed should be negative–this ticket will not sell, and given the additional expenses involved in attending an event (gas, parking, food, etc.), someone would have to be *paid* in order to sit in that seat. In such a situation, box seats are infinitely more valuable than bleachers. For a premium game, on the other hand, such as a playoff game, opening day, or any game at Fenway Park, all seats will sell out, as there is significant value just getting through the gates to see the event. Perhaps the box seat is only worth two or three times as much as the nosebleed in these cases.

At this point, we break out a terrific tool for processing small amounts of noisy data, the Kalman filter. Heavily used in the guidance and control of spacecraft and aircraft as well as with time-series data in economic and financial spheres, the Kalman filter is an algorithm that uses state estimates of model parameters combined with estimates of their variance to make predictions about the output of a linear dynamic system. I’ll spare you the obligatory rocket science joke and jump straight into a tutorial on how to use a Kalman filter to make predictions in the face of noisy and limited ticket data.

Since every observed price is going to be the output of a noisy system at a point in time, we are most interested in the likely state of the system as of the last observation, making a recursive estimator such as the Kalman filter an excellent choice.

## Step one: model the system

In SeatGeek’s case, our assumption is that the underlying structure of price dynamics is similar across events, and we can therefore generate a curve mapping seat quality to expected prices using the same parameters on the curve each time. As an example, here is a plot with seat scores for Knicks games at Madison Square Garden on the x-axis and historical average sale price for those seats on the y-axis.

I’ve added a best-fit line, which shows that sale prices tend to grow exponentially with seat quality at this particular venue. As such, we will model our price predictions as log-linear with respect to seat quality. We’re about to do a lot of math here, so feel free to skip ahead.

The Kalman filter maintains the state of the filter at step k with two variables:

- \( \mathbf{\hat{x}_{k}} \), the parameters of the model given observations up to and including step k
- \( \mathbf{P}_{k} \), the covariance matrix of parameter errors, a measure of the confidence the model has in its parameters

In our simple case, \( \mathbf{\hat{x}} \) represents the intercept \( \hat{x_1} \) and slope \( \hat{x_2} \) of our line. \( \mathbf{P} \) represents the covariance of parameter errors. This covariance matrix will be used down the line to determine which parameters must change when we make a new observation. It will also determine the magnitude of the adjustment.

A general Kalman filter uses a state-transition matrix \( \mathbf{F} \) in order to advance from one observation to predicting the value of the next.

\[ \mathbf{x}_{k|k-1} = \mathbf{Fx}_{k-1|k-1} + \mathbf{w}_k \]

where \( \mathbf{x}_{k|k-1} \) is our best estimate of \( \mathbf{x}_k \) given observations up to and including \( k-1 \). \( \mathbf{w}_{k} \) is white noise of the same dimension as the model, drawn from a multivariate normal distribution having covariance \( \mathbf{Q} \), representing the process error. Many applications of the filter model physical systems that take velocity and acceleration into account (and behind the scenes, so does SeatGeek). In these cases, \( \mathbf{F} \) can include time-varying parameters, but in our simple example, we set \[ \mathbf{F} = \begin{bmatrix} 1 & 0 \\ 0 & 1\end{bmatrix} \]

Assuming that in between observations the underlying model dynamics do not change according to any known physical system, we use the identity matrix.^{1}

## Step two: model the output

Sharp eyes may have noticed that the preceding equation does not use our lovely seat scores quite yet. The reason is our observations do not come in the form of linear models, but rather in observed fair values for seats, i.e. when users express an intent to buy. We have to model our output as \( \mathbf{z}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{v}_k \), where \( \mathbf{H}_k \) is a 1×2 matrix \( \begin{bmatrix} 1 & \theta_k \end{bmatrix} \), theta being the rating of the seat in question. Keeping with Kalman filter assumptions we model our residuals, the difference between our observations and predictions, as Gaussian white noise.^{2} In the single-output case, the observation noise can be thought of as the square of our standard estimation error, or how far we allow our predictions to be off before the model updates itself. This variance, \( \mathbf{R} \), will be used later on when we update the model.

## Step three: predict

Now that we have a model of our system, we can start making predictions. Using historical data, we can generate \( \mathbf{x}_0 \), our default parameters, and start predicting prices. Our prediction, of course, is that our observations will lie on the line defined by \( \mathbf{x}_0 \), shown in the image above. In this case, \( \mathbf{x}_0 = \begin{bmatrix} 7.2356 \ , \ .1428 \end{bmatrix} \).

We add those parameters to our listing feed, determine seat quality from the data provided to us by the market in question, and predict a price for each listing. We compare the predicted price to the listed price, assign a Deal Score to each listing, and sort your search results accordingly. We live to fight another day.

## Step four: observe

Market dynamics, however, are not so kind as to stay constant, and our models, alas, are unable to perfectly predict every price from the outset. The Kalman filter is thus useful for responding to changing tides. Since observations of changing tides can be few and far between and must inform our predictions on all other tickets, it behooves us to have a degree of certainty about our model, which we represent by \( \mathbf{P} \), the 2×2 covariance matrix of our state estimate errors. A good design decision is to start off \( \mathbf{P}_0 \) with large numbers on the diagonals and zeroes elsewhere, assuming low certainty of model parameters and independence.^{2} The filtering process should be able to give you excellent color on their true relationship.

Many of the signals discussed in part one can be interpreted, directly or indirectly, as a fair ticket price, and when we observe a new price, \( \mathbf{z}_k \), we model the residual as \( y_k = \mathbf{z}_k – \mathbf{H}_k \mathbf{\hat{x}}_{k|k-1} \) where \( \mathbf{H}_k \mathbf{\hat{x}}_{k|k-1} \) represents our predicted price for that seat. In the Kalman filter, the residual variance (variance of \( y_k \)) is modeled as \( \mathbf{S}_k = \mathbf{H}_k \mathbf{P}_{k|k-1} \mathbf{H}_{k}^{\mathbf{T}} + \mathbf{R} \). In the general case, these are covariance matrix. Since our model outputs only one value, a predicted price, \( \mathbf{S} \) and \( \mathbf{R} \) are variances. \( \mathbf{R} \) we have seen before, this is the general model of our error variance. \( \mathbf{S}_k \) is the variance of this particular observation, which varies depending on the seat score of this particular observation (see dotted lines in the slideshow below). Variance is higher on expensive tickets than it is on cheaper tickets. We will use the residual and its variance in the next step to order to update our parameters.

## Step five: update

We now come to the key element of the Kalman filter, the gain. The gain takes our *a priori* estimate covariance \( \mathbf{P}_{k|k-1} \), our observation model \( \mathbf{H}_k \), and our residual variance \( \mathbf{S} \) in order to decide how much we should change our model parameters before the next prediction. Our optimal gain is \( \mathbf{K}_k = \mathbf{P}_{k|k-1} \mathbf{H}_{k}^{\mathbf{T}} \mathbf{S}_{k}^{-1} \). Kalman gain is a bit of a tricky nut to crack. If you think of the new parameters \( \mathbf{x}_{k+1} \) as a weighted average of the old estimate \( \mathbf{x}_k \) and the new observation, \( \mathbf{K} \) provides the optimal weight for the observed residual in the new average.

If you’ve been following along, you can see that the larger our *a priori* uncertainty, the higher this gain factor gets. Now that we have a gain factor, we can start making some updates.

For the model parameters, this is easy. We simply scale the Kalman gain by the measurement residual, yielding us a new estimate. You can see here that if we guessed the price exactly, the slope and intercept do not change (a good sanity check) and that if we were fairly sure about the estimate beforehand, it requires a major miss before we update it substantially:

\[ \mathbf{\hat{x}}_{k|k} = \mathbf{\hat{x}}_{k|k-1} + \mathbf{K}_{k} y_k \]

Similarly, we also update our error covariance matrix,

\[ \mathbf{P}_{k|k} = ( I - \mathbf{K}_k \mathbf{H}_k ) \mathbf{P}_{k|k-1} \]

The error covariance update is a bit of a headache as well; the easiest way to think about it is to remember that \( \mathbf{P} \) represents the covariance of our parameter estimation errors \( \mathbf{x}_k – \mathbf{\hat{x}_{k|k-1} } \) and to play around with what makes the changes in \( \mathbf{P} \) large or small. For example, if the observation noise, \( \mathbf{S}_k \) is very large, our gain will be small and our certainty will remain mostly unchanged.

Now that we understand how the filter works, let’s rejoin our original programming and see it in action!

## Putting it all together

The slideshow below takes you on a visual tour through several steps of the dynamic linear model. In all slides, the dark red line represents our estimate of the mapping from seat quality to expected price and the and the dashed lines represent our 95% confidence interval for the price.

[portfolio_slideshow exclude="26210,26197"]

Before concluding, I’d like to note that a major motivation behind this series was the lack of real-world Kalman filter examples out here on the internet, which is disappointing given its usefulness as an estimator, especially for low-dimensional time-variant systems with small data. Here are some of the better articles I’ve found:

- Giovanni Petris’ monograph on dynamic linear models in R
- his excellent description of dlm, his R package for dynamic linear modeling
- a wonderful cheat sheet on the dimensionality of higher order Kalman filters
- an introduction to Kalman filters for predicting basketball game scores, written by the incomparable Dean Oliver

I gladly welcome thoughts on our usage of the filter or critiques of my explanation from those who have a better handle on things. Leave a comment or find me on twitter @steve_rit.

**Notes:**

^{1} If we wanted to add time-variance to our parameters, we could use something like: \[ \mathbf{F} = \begin{bmatrix} 1 & 0 & \Delta t \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

^{2} To cut down on the amount of notation, I’ve removed some symbols representing noise that aren’t directly used in the predict-update process.