Chapter 7 Dynamic Decision Model

7.1 Motivations

The model is dynamic if there is an endogenous state variable, a state variable that is affected by an action of a player in the past.
There are many cases where the decision makers have to take into account the dynamic effects of their actions.
Payoff linkages:
- Storable good: Next week’s demand for a detergent depends on how many bottles consumers purchase and consume this week. The latter depends on this week’s price and the price schedule of the detergent. The stock of the detergent consumers hold is an endogenous state variable.
- Learning by doing: The productivity of a firm next year can be higher if the firm produced more this year because the firm can learn from the experience. The cumulative production level is an endogenous state variable.
Information linkages:
- Uncertainty about the quality: If a consumer is uncertain about the quality of a product but can learn from experiencing the product, next season’s demand for the product depends on how many consumers purchase and experience the product this season. The latter depends on this season’s price and the price schedule of the product.
Strategic linkages:
- Tacit collusion: If a firm deviates from the collusive price, a price war starts. The history of prices is then the endogenous state variable.

7.2 Single-Agent Model

7.2.1 Setting

This model originates from Rust (1987), while the setting and notation follow Pesendorfer & Schmidt-Dengler (2008).
We start from a simple set-up:
Single agent.
Infinite-horizon discrete time.
- Time is \(t = 1, 2, \cdots, \infty\).
Finitely many choices.
- There are \(K + 1\) actions \(A = \{0, 1, \cdots, K\}\).
Finite state space.
- There are \(L\) states \(S = \{1, \cdots, L\}\).
Markovian state transition.

7.2.2 Timing of the Model

At period \(t\):
State \(s_t \in S\) is publicly observed.
Choice-specific profitability shocks \(\epsilon_t \in \mathbb{R}^{K + 1}\) are realized according to \(F(\cdot|s_t)\) and privately observed.
Choice \(a_t \in A\) is made.
The state evolves according to a transition probability: \[\begin{equation} g(a, s, s') := \mathbb{P}\{s_{t + 1} = s'|s_t = s, a_t = a\}. \end{equation}\]
Thus, the transition law only depends on today’s state and action, but not on the past history.

Let \(G\) stack the transition probabilities, with one row for each state-action pair and one column for each next-period state: \[\begin{equation} G := \begin{pmatrix} g(0, 1, 1) & \cdots & g(0, 1, L)\\ \vdots & & \vdots \\ g(K, 1, 1) & \cdots & g(K, 1, L)\\ & \vdots & \\ g(0, L, 1) & \cdots & g(0, L, L)\\ \vdots & & \vdots \\ g(K, L, 1) & \cdots & g(K, L, L)\\ \end{pmatrix}. \end{equation}\]
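The stacking order of \(G\) (actions vary fastest within each state) is easy to get wrong in code. The following is a minimal sketch, assuming the transition probabilities are stored in a NumPy array `g[a, s, s']` with zero-based indices; the function name `stack_transition` and the numbers in the example are hypothetical.

```python
import numpy as np

def stack_transition(g):
    """Stack g[a, s, s'] of shape (K+1, L, L) into G of shape ((K+1)L, L).

    Rows are ordered as in the text: all actions for state 1, then all
    actions for state 2, and so on (zero-based here).
    """
    n_actions, n_states, _ = g.shape
    # Put the state axis first so that actions vary fastest across rows.
    return g.transpose(1, 0, 2).reshape(n_states * n_actions, n_states)

# Hypothetical example with K + 1 = 2 actions and L = 3 states.
g = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],  # a = 0
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # a = 1
])
G = stack_transition(g)
```

Row `s * (K + 1) + a` of `G` then holds the distribution of the next state given state `s` and action `a`.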

7.2.3 Period Payoff

When the state is \(s_t\), the action is \(a_t\), and the profitability shocks are \(\epsilon_t\), the period payoff is: \[\begin{equation} \pi(a_t, s_t) + \sum_{k = 0}^K \epsilon_{tk} 1\{a_t = k\}. \end{equation}\]
\(\pi(a_t, s_t)\) is the mean choice-specific period payoff.
\(\epsilon_t\) is assumed to be i.i.d. across periods and is drawn from \(F\).
Let: \[ \epsilon_{t a_t} := \sum_{k = 0}^K \epsilon_{tk} 1\{a_t = k\}, \] be the choice-specific profitability shock.
Let \(\Pi\) summarize the choice-specific period payoffs at each state: \[\begin{equation} \Pi = \begin{pmatrix} \pi(0, 1)\\ \vdots \\ \pi(K, 1)\\ \vdots \\ \pi(0, L)\\ \vdots \\ \pi(K, L)\\ \end{pmatrix}. \end{equation}\]
The agent’s total payoff is the discounted sum of current and future period payoffs with discount factor \(\beta < 1\).
\(\Pi\) is one of the parameters of interest.

7.2.4 Markovian Framework

The strategy is in general a mapping from the entire history to the action set.
We restrict the set of possible strategies to Markovian strategies \(a(\epsilon_t, s_t)\) that depend only on the latest realization of the states \(\epsilon_t, s_t\), i.e., the behavior does not depend on the past states, conditional on today’s states.
The Markovian strategy was introduced by Maskin & Tirole (1988a), although their model is slightly different from the current model. They considered an oligopoly model in which only one firm can move at a time and its action depends only on the rival’s latest move. The current single-agent model can be regarded as a version of Maskin & Tirole (1988a)’s model in which the rival is replaced with nature.

7.2.5 Belief

When a player makes a decision, s/he must hold some belief about the future values of \(\epsilon_t, s_t\), and \(a_t\).
We usually assume rational expectations: the player knows the equilibrium distribution of these future variables and uses it as his/her belief.
The distribution of \(\epsilon_t\) and \(s_t\) is believed to follow \(F\) and \(G\).
Let \(\sigma(a|s)\) be the player’s belief about the probability of taking \(a\) when the realized state is \(s\), which may or may not coincide with the equilibrium probability.
Let \(\sigma\) stack them up as: \[\begin{equation} \sigma = \begin{pmatrix} \sigma(0|1)\\ \vdots\\ \sigma(K|1)\\ \vdots\\ \sigma(0|L)\\ \vdots\\ \sigma(K|L) \end{pmatrix}. \end{equation}\]

7.2.6 Decision Problem

The agent chooses strategy \(a(\cdot, \cdot)\) such that: \[\begin{equation} \begin{split} \max_{a(\cdot, \cdot)} & \pi[a(\epsilon_0, s_0), s_0] + \epsilon_{0 a(\epsilon_0, s_0)}\\ &+ \mathbb{E}\Bigg\{ \sum_{t = 1}^\infty \beta^t \Bigg[\pi(a(\epsilon_t, s_t), s_t) + \epsilon_{t a(\epsilon_t, s_t)}\Bigg]\Bigg|s_0, a(\epsilon_0, s_0)\Bigg\} \end{split} \end{equation}\]
The expectation is taken with respect to his/her belief.

7.2.7 Value Function and Ex-ante Value Function

When the belief about the future behavior is \(\sigma\), the value function associated with the belief is defined as: \[\begin{equation} \begin{split} &V(\sigma, s_0, \epsilon_0)\\ &= \sum_{a \in A} \sigma(a|s_0) \Bigg\{\pi(a, s_0) + \epsilon_{0a} + \mathbb{E}\Bigg[ \sum_{t = 1}^\infty \beta^t \sum_{a' \in A}\sigma(a'|s_t)\Bigg(\pi(a', s_t) + \epsilon_{ta'}\Bigg)\Bigg|s_0, a\Bigg] \Bigg\}. \end{split} \end{equation}\]
It has the following recursive structure: \[\begin{equation} \begin{split} &V(\sigma, s_0, \epsilon_0)\\ & = \sum_{a \in A} \sigma(a|s_0) \Bigg\{\pi(a, s_0) + \epsilon_{0a} + \beta \mathbb{E}\Bigg[V(\sigma, s_1, \epsilon_1)\Bigg|s_0, a\Bigg]\Bigg\}\\ & = \sum_{a \in A} \sigma(a|s_0) \Bigg\{\pi(a, s_0) + \epsilon_{0a} + \beta \sum_{s_1 \in S} \mathbb{E}\Big[V(\sigma, s_1, \epsilon_1)\Big|s_1\Big]g(a, s_0, s_1)\Bigg\}. \end{split} \end{equation}\]
\(V(\sigma, s, \epsilon)\) is the value function after the profitability shock \(\epsilon\) is realized.
On the other hand, we define the ex-ante value function under belief \(\sigma\) as: \[\begin{equation} V(\sigma, s) = \mathbb{E}\{V(\sigma, s, \epsilon)|s\}. \end{equation}\]

7.2.8 Choice-specific Value Function

When the current state and profitability shocks are \(s\) and \(\epsilon\) and the belief about the future behavior is \(\sigma\), we define the choice-specific value function for an agent in a period as follows: \[\begin{equation} \begin{split} V(\sigma, a, s, \epsilon) &= \pi(a , s) + \epsilon_a + \beta \sum_{s' \in S} V(\sigma, s') g(a, s, s')\\ &= \underbrace{\pi(a , s) + \beta \sum_{s' \in S} V(\sigma, s') g(a, s, s')}_{v(\sigma, a, s)} + \epsilon_a. \end{split} \end{equation}\]
We call \(v(\sigma, a, s)\) the choice-specific mean value function with belief \(\sigma\).
With a slight abuse of notation, \(V(\sigma, s, \epsilon)\), \(V(\sigma, s)\), \(V(\sigma, a, s, \epsilon)\), and \(v(\sigma, a, s)\) denote four different functions, distinguished by their arguments.

7.2.9 Optimality Condition

When the state and profitability shocks are \(s\) and \(\epsilon\), \(a\) is the optimal choice if and only if: \[\begin{equation} v(\sigma, a, s) + \epsilon_{a} \ge v(\sigma, a', s) + \epsilon_{a'}, \forall a' \in A. \end{equation}\]
This condition looks similar to the optimality condition in the static discrete choice model.
The only difference from the static discrete choice model is that the mean indirect utility is the sum of the choice-specific mean profit and the discounted continuation value.
Thus, as long as the mean choice-specific value function for given parameters can be computed, the following simulation and estimation procedure will be similar to the static discrete choice model.

7.2.10 Optimal Conditional Choice Probability

From the previous optimality condition, we can define the optimal conditional choice probability with belief \(\sigma\) as: \[\begin{equation} \begin{split} p(a|s) &:= \mathbb{P}\{v(\sigma, a, s) + \epsilon_{a} \ge v(\sigma, a', s) + \epsilon_{a'}, \forall a' \in A\}\\ &= \int \prod_{a' \neq a} 1\{v(\sigma, a, s) + \epsilon_{a} \ge v(\sigma, a', s) + \epsilon_{a'}\} dF\\ &:= \Psi(\sigma, a, s). \end{split} \end{equation}\]
\(\Psi(\sigma, a, s)\) maps the tuple of action, state and belief to the optimal conditional choice probability of the action given the state and the belief.
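For a general shock distribution \(F\), the integral defining \(\Psi(\sigma, a, s)\) rarely has a closed form, but it can be approximated by simulation. The following is a minimal sketch, assuming the choice-specific mean values \(v(\sigma, a, s)\) are already available as an array `v[s, a]`; the function name `simulate_ccp`, the standard normal shocks, and the numbers are illustrative assumptions, not part of the model above.

```python
import numpy as np

def simulate_ccp(v, draw_shocks, n_draws=200_000, seed=0):
    """Approximate Psi(sigma, a, s) for all (s, a) by frequency simulation.

    v           : array (L, K+1) of mean values v(sigma, a, s), indexed [s, a]
    draw_shocks : function (rng, shape) -> array of shocks with that shape
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = v.shape
    p = np.empty((n_states, n_actions))
    for s in range(n_states):
        eps = draw_shocks(rng, (n_draws, n_actions))
        # Optimal action for each draw: argmax of v(sigma, a, s) + eps_a.
        best = np.argmax(v[s] + eps, axis=1)
        p[s] = np.bincount(best, minlength=n_actions) / n_draws
    return p

# Hypothetical mean values for L = 2 states and K + 1 = 2 actions.
v = np.array([[0.0, 0.0], [0.0, 1.0]])
p_normal = simulate_ccp(v, lambda rng, shape: rng.standard_normal(shape))
```

Each row of `p_normal` sums to one by construction, and an action with a higher mean value is chosen more often.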

7.2.11 Optimality Condition

Let \(p\) and \(\Psi\) be: \[\begin{equation} p = \begin{pmatrix} p(0|1)\\ \vdots\\ p(K|1)\\ \vdots\\ p(0|L)\\ \vdots\\ p(K|L) \end{pmatrix}, \end{equation}\]

\[\begin{equation} \Psi(\sigma) = \begin{pmatrix} \Psi(\sigma, 0, 1)\\ \vdots\\ \Psi(\sigma, K, 1)\\ \vdots\\ \Psi(\sigma, 0, L)\\ \vdots\\ \Psi(\sigma, K, L) \end{pmatrix}. \end{equation}\]
- The optimality condition with respect to the conditional choice probabilities given the belief is written as: \[\begin{equation} p = \Psi(\sigma). \end{equation}\]
- The rational expectation hypothesis requires that the belief about the future behavior coincides with the optimal conditional choice probability, i.e.: \[\begin{equation} p = \Psi(p). \end{equation}\]
- The optimal conditional choice probability under the rational expectation hypothesis is characterized as a fixed point of mapping \(\Psi\).

7.2.12 Mapping from a Conditional Choice Probability to an Ex-ante Value Function

Inserting \(\sigma = p\), we obtain a mapping from an optimal conditional choice probability to an ex-ante value function such as: \[\begin{equation} \begin{split} V(s) &= \mathbb{E}\{V(s, \epsilon)|s\}\\ &= \mathbb{E}\left[ \sum_{t = 0}^\infty \beta^t \sum_{a \in A}p(a|s_t)\left[\pi(a, s_t) + \epsilon_{ta}\right]\Bigg|s_0 = s\right]\\ &:= \varphi(p, s). \end{split} \end{equation}\]

7.2.13 Mapping from an Ex-ante Value Function to an Optimal Conditional Choice Probability

On the other hand, we can derive a mapping from an ex-ante value function to an optimal conditional choice probability such as: \[\begin{equation} \begin{split} p(a|s) = \mathbb{P}\Bigg\{&\pi(a , s) + \beta \sum_{s' \in S} V(s') g(a, s, s') + \epsilon_a \ge\\ &\pi(a' , s) + \beta \sum_{s' \in S} V(s') g(a', s, s') + \epsilon_{a'}, \forall a' \in A \Bigg\}\\ &:= \Lambda(V, a, s). \end{split} \end{equation}\]

7.2.14 The Optimality Conditions

Composing these two mappings, we can write down the optimality condition as the fixed-point for ex-ante value functions: \[\begin{equation} V = \varphi(p) = \varphi[\Lambda(V)] := \Phi(V). \end{equation}\]
Or as the fixed-point for the optimal conditional choice probabilities: \[ p = \Lambda(V) = \Lambda[\varphi(p)] := \Psi(p). \]

7.2.15 Fixed-point Algorithm

If \(\epsilon\) is drawn from an i.i.d. type-I extreme value distribution, we can derive the mapping from the ex-ante value function to the conditional choice probability in the closed form: \[\begin{equation} \begin{split} \Lambda(V, a, s) &= \mathbb{P}\{\pi(a , s) + \beta \sum_{s' \in S} V(s') g(a, s, s') + \epsilon_{a} \ge\\ & \pi(a' , s) + \beta \sum_{s' \in S} V(s') g(a', s, s') + \epsilon_{a'}, \forall a' \in A\}\\ &=\frac{\exp[\pi(a , s) + \beta \sum_{s' \in S} V(s') g(a, s, s')]}{\sum_{a' \in A} \exp[\pi(a' , s) + \beta \sum_{s' \in S} V(s') g(a', s, s')]}. \end{split} \end{equation}\]
Moreover, we can also derive the mapping from the ex-ante value function to the ex-ante value function as follows: \[\begin{equation} \begin{split} \Phi(V) &= \mathbb{E}\{\max_{a \in A} \pi(a , s) + \beta \sum_{s' \in S} V(s') g(a, s, s') + \epsilon_{a}\} \\ &=\log \Bigg\{\sum_{a \in A} \exp[\pi(a , s) + \beta \sum_{s' \in S} V(s') g(a, s, s')] \Bigg\} + \gamma, \end{split} \end{equation}\] where \(\gamma\) is Euler’s constant.
\(\Phi\) is shown to be a contraction mapping as long as \(\beta < 1\).
Thus, we can solve the model by starting from an arbitrary ex-ante value function \(V\) and by iterating the mapping \(\Phi\) until the change in the ex-ante value functions is below a certain threshold.
Rust (1987) gives the result under more general distributional assumptions for \(\epsilon\).
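The fixed-point iteration on \(\Phi\) under the type-I extreme value assumption can be sketched as follows. This is a minimal illustration, not a reference implementation: the array layout (`pi[s, a]`, `g[a, s, s']`, zero-based indices), the function names, the tolerance, and the example numbers are all assumptions made for the sketch.

```python
import numpy as np

def solve_value(pi, g, beta, tol=1e-10, max_iter=10_000):
    """Iterate V <- Phi(V) until the sup-norm change falls below tol.

    pi : (L, K+1) mean period payoffs pi(a, s), indexed [s, a]
    g  : (K+1, L, L) transition probabilities g(a, s, s'), indexed [a, s, s']
    """
    V = np.zeros(pi.shape[0])
    for _ in range(max_iter):
        # v[s, a] = pi(a, s) + beta * sum_{s'} V(s') g(a, s, s')
        v = pi + beta * np.einsum("ast,t->sa", g, V)
        # Log-sum-exp with the row maximum subtracted for numerical stability.
        m = v.max(axis=1)
        V_new = m + np.log(np.exp(v - m[:, None]).sum(axis=1)) + np.euler_gamma
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("value function iteration did not converge")

def ccp_from_value(V, pi, g, beta):
    """Lambda(V): logit choice probabilities, returned with shape (L, K+1)."""
    v = pi + beta * np.einsum("ast,t->sa", g, V)
    e = np.exp(v - v.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical example with L = 3 states and K + 1 = 2 actions.
beta = 0.9
pi = np.array([[0.0, -1.0], [-0.5, -1.0], [-2.0, -1.0]])
g = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],  # a = 0
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # a = 1
])
V = solve_value(pi, g, beta)
p = ccp_from_value(V, pi, g, beta)
```

Because \(\Phi\) is a contraction with modulus \(\beta\), the number of iterations grows as \(\beta\) approaches one, so the loop is capped and raises an error instead of silently returning an unconverged value.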

7.3 Identification

7.3.1 Unidentification Result

The identification of the single-agent dynamic decision model is studied in Magnac & Thesmar (2002).
The model primitives are \((\Pi, F, \beta, G)\).
The transition probability \(G\) is directly identified from the data because \(a, s, s'\) are observed by the econometrician.
In the same manner, we can directly identify the optimal conditional choice probability \(p\) because \(a\) and \(s\) are observed by the econometrician.
It is known that the model is in general not identified.
The discount factor \(\beta\) is hard to identify.
- It determines the weights between the current and future profits.
- Suppose that a firm makes a large investment.
- This may be because the firm overweights the future (high \(\beta\)) or because the investment cost is low (\(\pi\) is such that the investment cost is low).
- We cannot distinguish between these two possibilities.
- To identify it, you need instruments that change the future return to the investment but do not affect today’s payoff.

7.3.2 Identification when \(\beta\) and \(F\) are Known

We often fix \(\beta\) and assume the distribution \(F\) and only consider the identification of \(\Pi\).
Note that the optimal conditional choice probability is directly identified from the data because \(s\) and \(a\) are observed.
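Then the optimality condition under the rational expectation hypothesis gives the following system of equations: \[ p = \Psi(p). \] Because the choice probabilities sum to one at each state, only \(KL\) of these \((K + 1)L\) equations are independent.
On the other hand, the dimension of parameter \(\Pi\) is in general \((K + 1)L\) (the mean profit at each state and action), so \(\Pi\) cannot be identified without at least \(L\) further restrictions.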
One possible restriction is to assume that \(\pi(0, s)\) are known for any \(s\). For example, assume that \(a = 0\) means that the firm is inactive and so \(\pi(0, s) = 0\).

7.3.3 Crucial Assumptions for the Argument

The following assumptions are crucial for the above argument.

Conditional i.i.d. Unobservable: The profit shocks that are unobservable to the econometrician are i.i.d. conditional on the observable state.
Conditional Independence of Future Observable State: \[\begin{equation} \mathbb{P}\{s_{t + 1}|s_t, a_t, \epsilon_t\} = \mathbb{P}\{s_{t + 1}|s_t, a_t\}. \end{equation}\]

If the first assumption is violated, the choice probability cannot be written as a function only of the observable state of the period. If \(\epsilon_t\) is serially correlated, to integrate over \(\epsilon_{t + 1}\) we have to condition on \(\epsilon_t\). Then, we may not be able to identify the optimal conditional choice probability from the data.
If the second assumption is violated, for the same reason, we may not be able to identify the state transition law.
Kasahara & Shimotsu (2009) prove identification in a case where the first assumption is violated: there is a player-specific fixed effect that has a finite-mixture structure.

7.4 Estimation by Nested Fixed-point Algorithm

7.4.1 Nested Fixed-Point Algorithm

A straightforward way of estimating the single-agent dynamic model is to solve for the optimal conditional choice probability by a fixed-point algorithm at each candidate parameter value and to evaluate the likelihood function using that optimal conditional choice probability.
Because a fixed-point algorithm is nested in the parameter search, Rust (1987) named it the nested fixed-point algorithm.

7.4.2 Solving for the Ex-ante Value Function

Let \(\theta_1\) be the parameters that determine \(\Pi\), \(\theta_2\) be the parameters that determine \(G\), and \(\theta = (\theta_1', \theta_2')'\).
The ex-ante value function is a fixed point of a contraction mapping \(\Phi^\theta\) such that: \[\begin{equation} \begin{split} \Phi^\theta(V) &=\log \Bigg\{\sum_{a \in A} \exp[\pi^{\theta_1}(a , s) + \beta \sum_{s' \in S} V(s') g^{\theta_2}(a, s, s')] \Bigg\} + \gamma. \end{split} \end{equation}\]
Let \(V^{\theta (0)}\) be an arbitrary initial function and define \(V^{\theta (r + 1)}\) by: \[ V^{\theta (r + 1)} = \Phi^{\theta}(V^{\theta (r)}). \]
We iterate this until \(|V^{\theta (r + 1)} - V^{\theta (r)}|\) is below a certain threshold.
Let \(V^{\theta (\ast)}\) be the solution to the fixed-point algorithm.
Then, we can derive the optimal choice probability by: \[\begin{equation} \begin{split} p^{\theta (\ast)} = \Lambda\left[V^{\theta (\ast)}\right]. \end{split} \end{equation}\]
These are the ex-ante value function and optimal conditional choice probabilities under parameters \(\theta\).

7.4.3 Estimation by Nested Fixed-Point Algorithm

The previous algorithm allows us to derive the optimal choice probability given parameters.
Then it is straightforward to evaluate the likelihood function given observations \(\{a_t, s_t\}_{t = 1}^T\) by: \[\begin{equation} \begin{split} &L(\theta; \{a_t, s_t\}_{t = 1}^T) =\prod_{t = 1}^T \Bigg[\prod_{a \in A} p^{\theta (\ast)}(a|s_t)^{1\{a_t = a\}}\Bigg] g^{\theta_2} (s_{t + 1}|s_t, a_t), \end{split} \end{equation}\] and so the log likelihood function is: \[\begin{equation} \begin{split} &l(\theta; \{a_t, s_t\}_{t = 1}^T)\\ &=\sum_{t = 1}^T \sum_{a \in A} 1\{a_t = a\} \log [p^{\theta (\ast)}(a|s_t)] + \sum_{t = 1}^T \log [g^{\theta_2} (s_{t + 1}|s_t, a_t)], \end{split} \end{equation}\] where \(g^{\theta_2}(s'|s, a)\) denotes the transition probability \(g(a, s, s')\) under \(\theta_2\).
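Once the choice probabilities and transition probabilities under \(\theta\) are available, evaluating the log likelihood is a matter of indexing. The sketch below assumes a single observed sequence with zero-based integer codes, arrays `p[s, a]` and `g[a, s, s']`, and that \(s_{T + 1}\) is not observed, so the transition term uses only \(t = 1, \cdots, T - 1\); the function name and the numbers are hypothetical.

```python
import numpy as np

def log_likelihood(a, s, p, g):
    """Log likelihood of one observed sequence of actions and states.

    a, s : integer arrays of length T (zero-based action and state codes)
    p    : (L, K+1) conditional choice probabilities, indexed [s, a]
    g    : (K+1, L, L) transition probabilities, indexed [a, s, s']
    """
    a = np.asarray(a)
    s = np.asarray(s)
    # Choice part: log p(a_t | s_t) summed over all periods.
    ll_choice = np.log(p[s, a]).sum()
    # Transition part: log g(s_{t+1} | s_t, a_t) over observed transitions only.
    ll_transition = np.log(g[a[:-1], s[:-1], s[1:]]).sum()
    return ll_choice + ll_transition

# Hypothetical example with L = 2 states and K + 1 = 2 actions.
p = np.array([[0.7, 0.3], [0.4, 0.6]])
g = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # a = 0
    [[0.5, 0.5], [0.5, 0.5]],  # a = 1
])
ll = log_likelihood(a=[0, 1, 1], s=[0, 1, 0], p=p, g=g)
```

In a nested fixed-point routine, `p` would be recomputed from the solved value function at every trial value of \(\theta\) before this function is called.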

7.4.4 Full and Partial Likelihood

We can find \(\theta\) that maximizes the full log likelihood \(l(\theta; \{a_t, s_t\}_{t = 1}^T)\) to estimate the model.
However, convergence takes longer as the number of parameters grows.
Instead, the parameters that govern the state transition can first be estimated by finding \(\theta_2\) that maximizes the partial likelihood: \[\begin{equation} \hat{\theta}_2 = \text{argmax}_{\theta_2} \sum_{t = 1}^T \log g^{\theta_2}(s_{t + 1}|s_t, a_t). \end{equation}\]
Then we can estimate \(\theta_1\) by finding \(\theta_1\) that maximizes the partial likelihood: \[\begin{equation} \hat{\theta}_1 = \text{argmax}_{\theta_1} \sum_{t = 1}^T \sum_{a \in A} 1\{a_t = a\} \log [p^{(\theta_1, \hat{\theta}_2) (\ast)}(a|s_t)]. \end{equation}\]
This causes some efficiency loss but speeds up the computation, because we can estimate \(\theta_2\) without solving the fixed-point.

7.5 Estimation by Conditional Choice Probability (CCP) Approach

7.5.1 CCP Approach

The conditional choice probability (CCP) approach suggested by Hotz & Miller (1993) significantly reduces the computation time at the cost of some efficiency.
This approach can be applied to many other settings.
The idea is:
- We can identify the optimal conditional choice probability \(p^\theta\) directly from the data. This is a reduced-form parameter of the model, in contrast to \(\theta\), which collects the structural parameters.
- The optimality condition \(p^\theta = \Psi^\theta(p^\theta)\) can be regarded as a moment condition.
In the nested fixed-point algorithm, we find \(p^\theta\) that solves the optimality condition given \(\theta\) to compute the likelihood.
In the CCP approach, we find \(\theta\) that solves the optimality condition given \(p^\theta\) that is identified directly from the data.

7.5.2 First Step: Estimating CCP

The first step of the CCP approach is to estimate the conditional choice probability and transition probability.
If everything is discrete, these are simply the empirical frequencies: \[\begin{equation} \begin{split} &\hat{p}(a|s) = \frac{\sum_{i = 1}^N \sum_{t = 1}^T 1\{a_{it} = a, s_{it} = s\}}{\sum_{i = 1}^N \sum_{t = 1}^T 1\{s_{it} = s\}},\\ &\hat{g}(s'|s, a) = \frac{\sum_{i = 1}^N \sum_{t = 1}^{T - 1} 1\{s_{i, t + 1} = s', s_{it} = s, a_{it} = a\}}{\sum_{i = 1}^N \sum_{t = 1}^{T - 1} 1\{s_{it} = s, a_{it} = a\}}. \end{split} \end{equation}\]
We can of course use a parametric model.
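For example, we may estimate the conditional choice probability with a multinomial logit model with choice-specific coefficients \(\hat{\alpha}_a\) and \(\hat{\delta}_a\) (a term common to all choices would cancel from the ratio): \[\begin{equation} \begin{split} &\hat{p}(a|s) = \frac{\exp[\hat{\alpha}_a + \hat{\delta}_a s]}{\sum_{a' \in A} \exp[\hat{\alpha}_{a'} + \hat{\delta}_{a'} s]}. \end{split} \end{equation}\]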
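The frequency estimators above amount to counting and normalizing. The following is a minimal sketch, assuming a balanced panel stored as integer arrays `a[i, t]` and `s[i, t]` with zero-based codes; the function name and the toy data are hypothetical. Cells that are never visited in the data produce `NaN`, which makes the lack of information explicit instead of hiding it.

```python
import numpy as np

def frequency_estimates(a, s, n_actions, n_states):
    """Empirical CCPs p_hat[s, a] and transitions g_hat[a, s, s'].

    a, s : integer arrays of shape (N, T) with zero-based codes
    """
    a = np.asarray(a)
    s = np.asarray(s)
    ccp_counts = np.zeros((n_states, n_actions))
    np.add.at(ccp_counts, (s, a), 1)
    trans_counts = np.zeros((n_actions, n_states, n_states))
    # Only periods t = 1, ..., T - 1 have an observed next state.
    np.add.at(trans_counts, (a[:, :-1], s[:, :-1], s[:, 1:]), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        p_hat = ccp_counts / ccp_counts.sum(axis=1, keepdims=True)
        g_hat = trans_counts / trans_counts.sum(axis=2, keepdims=True)
    return p_hat, g_hat

# Hypothetical toy panel with N = 2 players and T = 3 periods.
a_obs = np.array([[0, 1, 0], [1, 1, 0]])
s_obs = np.array([[0, 1, 0], [0, 0, 1]])
p_hat, g_hat = frequency_estimates(a_obs, s_obs, n_actions=2, n_states=2)
```

`np.add.at` is used instead of fancy-indexed `+=` because it accumulates correctly when the same cell appears more than once.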

7.5.3 First Step: Estimating CCP

What is the estimated CCP \(\hat{p}\)?
It is an estimate of the optimal conditional choice probability in the particular equilibrium that generated the data, under the true parameters.
If the parameters change, the equilibrium changes, and so does the conditional choice probability.
The reduced-form parameter \(\hat{p}\) embodies the information about behaviors under the actual equilibrium but does not tell anything about behaviors under hypothetical equilibria.
Therefore, \(\hat{p}\) alone is not sufficient to make counterfactual predictions.

7.5.4 Second Step: Estimating Structural Parameters

Among the structural parameters \(\theta\), the parameters in the transition probability, \(\theta_2\), are already estimated from the data in the first step.
How do we identify \(\theta_1\), parameters in the profit function \(\pi\)?
If we fix \(\theta_1\), in theory, we can compute: \[\begin{equation} \begin{split} \hat{V}^{(\theta_1, \hat{\theta}_2)}(s) = \varphi^{(\theta_1, \hat{\theta}_2)}(\hat{p}, s) = \mathbb{E}\Bigg[ \sum_{t = 0}^\infty \beta^t \sum_{a \in A}\hat{p}(a|s_t)\Bigg(\pi^{\theta_1}(a, s_t) + \epsilon_{ta}\Bigg)\Bigg|s\Bigg],\\ \end{split} \end{equation}\] although the expectation may not have a closed form solution.

7.5.5 Second Step: Estimating Structural Parameters

In addition, if we fix \(\theta_1\), in theory, we can compute: \[\begin{equation} \begin{split} &\Lambda^{(\theta_1, \hat{\theta}_2)}(\hat{V}^{(\theta_1, \hat{\theta}_2)}, a, s)\\ &:= \mathbb{P}\Bigg\{\pi^{\theta_1}(a , s) + \beta \sum_{s' \in S} \hat{V}^{(\theta_1, \hat{\theta}_2)}(s') g^{\hat{\theta}_2}(a, s, s') + \epsilon_a\\ &\ge \pi^{\theta_1}(a' , s) + \beta \sum_{s' \in S} \hat{V}^{(\theta_1, \hat{\theta}_2)}(s') g^{\hat{\theta}_2}(a', s, s') + \epsilon_{a'}, \forall a' \in A \Bigg\} \end{split} \end{equation}\]
Combining these two mappings, we can compute: \[\begin{equation} \Psi^{(\theta_1, \hat{\theta}_2)}(\hat{p}) = \Lambda^{(\theta_1, \hat{\theta}_2)}[\varphi^{(\theta_1, \hat{\theta}_2)}(\hat{p})]. \end{equation}\]
Then, we can find \(\theta_1\) that minimizes the distance between \(\hat{p}\) and \(\Psi^{(\theta_1, \hat{\theta}_2)}(\hat{p})\) to find \(\theta_1\) that is consistent with the observed conditional choice probabilities.

7.5.6 Type-I Extreme Value Distribution

\(\Lambda\) and \(\varphi\) do not have closed form expressions in general.
An exception is the case where the profitability shocks \(\epsilon_{ta}\) are drawn from an i.i.d. type-I extreme value distribution.
First, we know that \(\Lambda\) can be written as: \[\begin{equation} \begin{split} &\Lambda^{(\theta_1, \hat{\theta}_2)}(\hat{V}^{(\theta_1, \hat{\theta}_2)}, a, s)\\ &:= \mathbb{P}\Bigg\{\pi^{\theta_1}(a , s) + \beta \sum_{s' \in S} \hat{V}^{(\theta_1, \hat{\theta}_2)}(s') g^{\hat{\theta}_2}(a, s, s') + \epsilon_a \ge\\ &\pi^{\theta_1}(a' , s) + \beta \sum_{s' \in S} \hat{V}^{(\theta_1, \hat{\theta}_2)}(s') g^{\hat{\theta}_2}(a', s, s') + \epsilon_{a'}, \forall a' \in A \Bigg\}\\ &=\frac{\exp\Big[\pi^{\theta_1}(a , s) + \beta \sum_{s' \in S}\hat{V}^{(\theta_1, \hat{\theta}_2)}(s') g^{\hat{\theta}_2}(a, s, s')\Big]}{\sum_{a' = 0}^K \exp\Big[\pi^{\theta_1}(a' , s) + \beta \sum_{s' \in S} \hat{V}^{(\theta_1, \hat{\theta}_2)}(s') g^{\hat{\theta}_2}(a', s, s') \Big]}. \end{split} \end{equation}\]
Second, we can show that \(\varphi\) has a closed form solution: \[\begin{equation} \begin{split} &\varphi^{(\theta_1, \hat{\theta}_2)}(\hat{p}, s)\\ &:= \mathbb{E}\{V(s, \epsilon)|s\}\\ &= \mathbb{E}\Bigg[ \sum_{t = 0}^\infty \beta^t \sum_{a \in A}\hat{p}(a|s_t)\Bigg(\pi^{\theta_1}(a, s_t) + \epsilon_{ta}\Bigg)\Bigg|s\Bigg]\\ &=\mathbb{E}\Bigg[\sum_{a \in A}\hat{p}(a|s)\Bigg(\pi^{\theta_1}(a, s) + \epsilon_{a} + \beta \sum_{s' \in S} \mathbb{E}\{\hat{V}^{(\theta_1, \hat{\theta}_2)}(s', \epsilon')|s'\} g^{\hat{\theta}_2}(a, s, s') \Bigg)\Bigg|s\Bigg]\\ &=\mathbb{E}\Bigg[\sum_{a \in A}\hat{p}(a|s)\Bigg(\pi^{\theta_1}(a, s) + \epsilon_{a} + \beta \sum_{s' \in S} \varphi^{(\theta_1, \hat{\theta}_2)}(\hat{p}, s') g^{\hat{\theta}_2}(a, s, s') \Bigg)\Bigg|s\Bigg]\\ &=\sum_{a \in A}\hat{p}(a|s)\Bigg(\pi^{\theta_1}(a, s) + \mathbb{E}[\epsilon_{a}|s, a] + \beta \sum_{s' \in S} \varphi^{(\theta_1, \hat{\theta}_2)}(\hat{p}, s') g^{\hat{\theta}_2}(a, s, s') \Bigg). \end{split} \end{equation}\]
We need a closed form expression for \(\mathbb{E}[\epsilon_{a}|s, a]\): the expected value of the choice-\(a\)-specific profitability shock conditional on the state being \(s\) and \(a\) being optimal (which differs from the unconditional mean of \(\epsilon_a\)).

7.5.7 Type-I Extreme Value Distribution

If \(\epsilon_a\) is drawn from an i.i.d. type-I extreme value distribution, it can be shown that: \[\begin{equation} \begin{split} \mathbb{E}[\epsilon_{a}|s, a] &= \hat{p}(a|s)^{-1} \int \epsilon_a 1\Bigg\{\pi^{\theta_1}(a , s) + \beta \sum_{s' \in S} \hat{V}^{(\theta_1, \hat{\theta}_2)}(s') g^{\hat{\theta}_2}(a, s, s') + \epsilon_a \ge\\ &\pi^{\theta_1}(a' , s) + \beta \sum_{s' \in S} \hat{V}^{(\theta_1, \hat{\theta}_2)}(s') g^{\hat{\theta}_2}(a', s, s') + \epsilon_{a'}, \forall a' \in A \Bigg\}dF(\epsilon)\\ &= \gamma - \ln \hat{p}(a|s), \end{split} \end{equation}\] where \(\gamma\) is Euler’s constant: \[\begin{equation} \gamma := \lim_{n \to \infty} \Bigg(\sum_{k = 1}^n \frac{1}{k} - \ln(n) \Bigg) \approx 0.57721. \end{equation}\]
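The identity \(\mathbb{E}[\epsilon_a|s, a] = \gamma - \ln p(a|s)\) can be checked by simulation at a single state. The sketch below is only a numerical sanity check under assumed values: the vector `v` of choice-specific mean values, the number of draws, and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
v = np.array([0.0, 0.5, -1.0])             # hypothetical mean values at one state
eps = rng.gumbel(size=(500_000, v.size))   # standard type-I extreme value draws
choice = np.argmax(v + eps, axis=1)        # optimal action for each draw

# Logit choice probabilities implied by v.
p = np.exp(v) / np.exp(v).sum()

# Average shock of action a over the draws in which a is optimal.
simulated = np.array([eps[choice == a, a].mean() for a in range(v.size)])
analytic = np.euler_gamma - np.log(p)
```

The selected shock has a higher mean for rarely chosen actions: an action with a low mean value is chosen only when its shock is unusually large.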

7.5.8 Type-I Extreme Value Distribution

Inserting this into the previous expression, we get: \[\begin{equation} \begin{split} \varphi^{(\theta_1, \hat{\theta}_2)}(\hat{p}, s) = \sum_{a \in A}\hat{p}(a|s)\Bigg(\pi^{\theta_1}(a, s) + \gamma - \ln \hat{p}(a|s) + \beta \sum_{s' \in S} \varphi^{(\theta_1, \hat{\theta}_2)}(\hat{p}, s') g^{\hat{\theta}_2}(a, s, s') \Bigg). \end{split} \end{equation}\]

7.5.9 Type-I Extreme Value Distribution

Write the continuation value in a matrix form: \[\begin{equation} \begin{split} & \sum_{s' \in S} \varphi^{(\theta_1, \hat{\theta}_2)}(p, s') g^{\hat{\theta}_2}(a, s, s')\\ & = [g^{\hat{\theta}_2}(a, s, 1), \cdots, g^{\hat{\theta}_2}(a, s, L)] \underbrace{\begin{bmatrix} \varphi^{(\theta_1, \hat{\theta}_2)}(p, 1)\\ \vdots\\ \varphi^{(\theta_1, \hat{\theta}_2)}(p, L) \end{bmatrix}}_{:= \varphi^{(\theta_1, \hat{\theta}_2)}(p)}. \end{split} \end{equation}\]

7.5.10 Type-I Extreme Value Distribution

Write the ex-ante value function in a matrix form:

\[\begin{equation} \begin{split} &\varphi^{(\theta_1, \hat{\theta}_2)}(p, s)\\ &=\underbrace{[p(0|s), \cdots, p(K|s)]}_{:= p(s)'} \\ &\times\begin{bmatrix} \underbrace{\begin{bmatrix} \pi^{\theta_1}(0, s)\\ \vdots\\ \pi^{\theta_1}(K, s) \end{bmatrix}}_{:= \pi^{\theta_1}(s)} + \gamma - \underbrace{\begin{bmatrix} \ln p(0|s)\\ \vdots\\ \ln p(K|s) \end{bmatrix}}_{:= \ln p(s)} +\beta \underbrace{\begin{bmatrix} g^{\hat{\theta}_2}(0, s, 1), \cdots, g^{\hat{\theta}_2}(0, s, L)\\ \vdots\\ g^{\hat{\theta}_2}(K, s, 1), \cdots, g^{\hat{\theta}_2}(K, s, L) \end{bmatrix}}_{:= G^{\hat{\theta}_2}(s)} \varphi^{(\theta_1, \hat{\theta}_2)}(p) \end{bmatrix}\\ &=p(s)'[\pi^{\theta_1}(s) + \gamma - \ln p(s)] + \beta p(s)' G^{\hat{\theta}_2}(s) \varphi^{(\theta_1, \hat{\theta}_2)}(p) \end{split} \end{equation}\]

Stacking up for \(s\), we get: \[\begin{equation} \begin{split} &\varphi^{(\theta_1, \hat{\theta}_2)}(p) = \begin{bmatrix} p(1)'[\pi^{\theta_1}(1) + \gamma - \ln p(1)]\\ \vdots\\ p(L)'[\pi^{\theta_1}(L) + \gamma - \ln p(L)] \end{bmatrix} +\beta \begin{bmatrix} p(1)' G^{\hat{\theta}_2}(1)\\ \vdots\\ p(L)' G^{\hat{\theta}_2}(L) \end{bmatrix} \varphi^{(\theta_1, \hat{\theta}_2)}(p)\\ &\Leftrightarrow\\ &\varphi^{(\theta_1, \hat{\theta}_2)}(p) = \begin{bmatrix} I - \beta \begin{bmatrix} p(1)' G^{\hat{\theta}_2}(1)\\ \vdots\\ p(L)' G^{\hat{\theta}_2}(L) \end{bmatrix} \end{bmatrix}^{-1} \begin{bmatrix} p(1)'[\pi^{\theta_1}(1) + \gamma - \ln p(1)]\\ \vdots\\ p(L)'[\pi^{\theta_1}(L) + \gamma - \ln p(L)] \end{bmatrix}. \end{split} \end{equation}\]
Note that you can get this expression even if the profitability shocks are not type-I extreme value, although you need numerical integration for \(\mathbb{E}\{\epsilon_a|s, a\}\) instead of the analytical solution \(\gamma - \ln p(a|s)\).
Letting: \[ \Sigma(p) = \begin{pmatrix} p(1)' & & \\ & \ddots & \\ & & p(L)' \end{pmatrix} \] and: \[ E(p) = \gamma - \ln p, \] we obtain the matrix representation: \[ \varphi^{(\theta_1, \hat{\theta}_2)}(p) = [I - \beta \Sigma(p) G]^{-1}\Sigma(p)[\Pi + E(p)]. \]
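The matrix representation translates into a single linear solve. The following is a minimal sketch under the type-I extreme value assumption, with arrays `p[s, a]`, `pi[s, a]`, and `g[a, s, s']` (zero-based indices); the function names `value_from_ccp` and `psi` and the example numbers are hypothetical. It avoids building \(\Sigma(p)\) explicitly, since \(\Sigma(p) G\) and \(\Sigma(p)[\Pi + E(p)]\) are just probability-weighted sums over actions.

```python
import numpy as np

def value_from_ccp(p, pi, g, beta):
    """phi(p) = [I - beta Sigma(p) G]^{-1} Sigma(p) [Pi + E(p)]."""
    n_states = p.shape[0]
    # Sigma(p) G: row s is sum_a p(a|s) g(a, s, .), an L x L matrix.
    weighted_g = np.einsum("sa,ast->st", p, g)
    # Sigma(p) [Pi + E(p)]: sum_a p(a|s) [pi(a, s) + gamma - ln p(a|s)].
    flow = (p * (pi + np.euler_gamma - np.log(p))).sum(axis=1)
    return np.linalg.solve(np.eye(n_states) - beta * weighted_g, flow)

def psi(p, pi, g, beta):
    """Psi(p) = Lambda[phi(p)] with logit choice probabilities."""
    V = value_from_ccp(p, pi, g, beta)
    v = pi + beta * np.einsum("ast,t->sa", g, V)
    e = np.exp(v - v.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical example with L = 3 states and K + 1 = 2 actions.
beta = 0.9
pi = np.array([[0.0, -1.0], [-0.5, -1.0], [-2.0, -1.0]])
g = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],  # a = 0
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # a = 1
])
p = np.full((3, 2), 0.5)
for _ in range(200):                 # iterate p <- Psi(p) toward a fixed point
    p = psi(p, pi, g, beta)
```

In the second step of the CCP approach, `psi` would be evaluated once at the first-step estimate for each trial value of \(\theta_1\), with no inner fixed-point loop; the loop here only illustrates that the optimal choice probability is a fixed point of \(\Psi\).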

7.5.11 General Distribution

If the profitability shock \(\epsilon_a\) is not an i.i.d. type-I extreme value random variable, you may need to compute \(\mathbb{E}\{\epsilon_a|s, a\}\) and \(\Lambda^{(\theta_1, \hat{\theta}_2)}(V)\) numerically.
This may or may not be feasible.

7.6 Unobserved Heterogeneity

7.6.1 Dynamic Decision Model with a Finite Mixture

Decision makers such as firms, workers, and consumers can differ in unobserved ways.
Finite mixture models are a restrictive yet sufficiently flexible framework for modeling unobserved heterogeneity.
Kasahara & Shimotsu (2009) consider a dynamic decision model with a finite mixture and provide a sufficient condition for identification.

7.6.2 Setting

Each period, each player makes a choice \(a_t\) from a discrete and finite set \(A\) conditioning on \((x_t, x_{t - 1}, a_{t - 1}) \in X \times X \times A\) (being allowed to depend on \(x_{t - 1}\) and \(a_{t - 1}\)).
\(x_t\) denotes observable individual characteristics that can change over time.
Each player belongs to one of \(M\) types.
For example, the payoff parameters may differ across types.
The probability of belonging to type \(m\) is denoted by \(\pi^m\) such that \(\sum_{m = 1}^M \pi^m = 1\).
Type \(m\)’s conditional choice probability: \(P_t^m(a_t|x_t, x_{t - 1}, a_{t - 1})\).
Type \(m\)’s initial probability of \((x_1, a_1)\): \(p^{*m}(x_1, a_1)\).
Type \(m\)’s transition probability of \(x_t\): \(f_t^m(x_t|\{x_{\tau}, a_{\tau}\}_{\tau = 1}^{t - 1})\) (being allowed to depend on the entire history).

7.6.3 Observation

We have a panel data set with time dimension \(T\).
Each player’s observation \(w_i = \{a_{it}, x_{it}\}_{t = 1}^T\) is drawn randomly from an \(M\)-term mixture distribution such as: \[ \begin{split} P(\{a_t, x_t\}_{t = 1}^T) &= \sum_{m = 1}^M \pi^m p^{*m}(x_1, a_1) \prod_{t = 2}^T f_t^m(x_t|\{x_\tau, a_\tau\}_{\tau = 1}^{t - 1}) P_t^m(a_t| x_t, \{x_\tau, a_\tau\}_{\tau = 1}^{t - 1})\\ &= \sum_{m = 1}^M \pi^m p^{*m}(x_1, a_1) \prod_{t = 2}^T f_t^m(x_t|\{x_\tau, a_\tau\}_{\tau = 1}^{t - 1}) P_t^m(a_t| x_t, x_{t - 1}, a_{t - 1}), \end{split} \] where the second equality uses the Markovian assumption on the conditional choice probability.

7.6.4 Further Assumptions

We start from a model with the following simplifying assumptions, which are often imposed in applied work.

The choice probability of \(a_t\) does not depend on time: \(P_t^m(a_t|x_t, x_{t - 1}, a_{t - 1}) = P^m(a_t|x_t, x_{t - 1}, a_{t - 1})\) for all \(t\).
The choice probability of \(a_t\) does not depend on \(x_{t - 1}, a_{t - 1}\): \(P^m(a_t|x_t, x_{t - 1}, a_{t - 1}) = P^m(a_t|x_t)\).
\(f_t^{m}(x_t|\{x_\tau, a_\tau\}_{\tau = 1}^{t - 1}) > 0\) for all \((x_t, \{x_\tau, a_\tau\}_{\tau = 1}^{t - 1})\) and all \(m\).
The transition function is common across types: \(f_t^m(x_t|\{x_\tau, a_\tau\}_{\tau = 1}^{t - 1}) = f_t(x_t|\{x_\tau, a_\tau\}_{\tau = 1}^{t - 1})\) for all \(m\).
The transition function does not depend on time: \(f_t(x_t|\{x_\tau, a_\tau\}_{\tau = 1}^{t - 1}) = f(x_t|x_{t - 1}, a_{t - 1})\) for all \(t\).

Then the probability of an observation becomes: \[ P(\{a_t, x_t\}_{t = 1}^T) = \sum_{m = 1}^M \pi^m p^{*m}(x_1, a_1) \prod_{t = 2}^T f(x_t|x_{t - 1}, a_{t - 1}) P^m(a_t| x_t). \]

7.6.5 Lower-dimensional Submodels

Because \(f(x_t|x_{t - 1}, a_{t - 1})\) is non-parametrically identified from data, we are only concerned with identification of type probabilities and conditional choice probabilities.
Transform the previous equation as: \[ \begin{split} \widetilde{P}(\{a_t, x_t\}_{t = 1}^T) &:= \frac{P(\{a_t, x_t\}_{t = 1}^T)}{\prod_{t = 2}^T f(x_t|x_{t - 1}, a_{t - 1})}\\ &= \sum_{m = 1}^M \pi^m p^{*m}(x_1, a_1) \prod_{t = 2}^T P^m(a_t| x_t). \end{split} \]
Let \(\mathcal{I} := \{i_1, \cdots, i_l\} \subset \{1, \cdots, T\}\) be a subset of time indices.
We define a lower-dimensional submodel given \(\mathcal{I}\) as: \[ \widetilde{P}(\{a_{i_s}, x_{i_s}\}_{i_s \in \mathcal{I}}) = \sum_{m = 1}^M \pi^m p^{*m}(x_1, a_1) \prod_{s = 2}^l P^m(a_{i_s}| x_{i_s}), \] if \(1 \in \mathcal{I}\) (ordering the indices so that \(i_1 = 1\)) and: \[ \widetilde{P}(\{a_{i_s}, x_{i_s}\}_{i_s \in \mathcal{I}}) = \sum_{m = 1}^M\pi^m \prod_{s = 1}^l P^m(a_{i_s}| x_{i_s}), \] if \(1 \not\in \mathcal{I}\).
For each different value of \((x_1, \cdots, x_T)\), the above equations imply different restrictions on the type probabilities and conditional choice probabilities.
The number of variations in \((x_1, \cdots, x_T)\) is of order \(|X|^T\), whereas the number of parameters \(\{\pi^m, p^{*m}(x, a), P^m(a|x)\}_{m = 1}^M\) is of order \(|X|\).

7.6.6 Notations

For notational simplicity, consider a case with \(A = \{0, 1\}\).
Define, for \(\xi \in X\): \[ \lambda_\xi^{*m} := p^{*m}[(a_1, x_1) = (1, \xi)], \] \[ \lambda_\xi^m := P^m(a = 1|x = \xi). \]

7.6.7 Notations for Parameters to be Identified

Let \(\xi_j, j = 1, \cdots, M - 1\) be elements of \(X\) and \(k\) be an element of \(X\).
Define a matrix of type-specific distribution functions and type probabilities as: \[ L := \begin{pmatrix} 1 & \lambda_{\xi_1}^1 & \cdots & \lambda_{\xi_{M - 1}}^1\\ \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_{\xi_1}^M & \cdots & \lambda_{\xi_{M - 1}}^M\\ \end{pmatrix}, \] \[ D_k^* := \begin{pmatrix} \lambda_k^{*1} & & \\ & \ddots & \\ & & \lambda_k^{*M} \end{pmatrix}, \] and \[ V := \begin{pmatrix} \pi^1 & & \\ & \ddots & \\ & & \pi^M \end{pmatrix}. \]

7.6.8 Notations for Observables

Define for every \((x_1, x_2, x_3)\): \[ F_{x_1, x_2, x_3}^* := \widetilde{P}(\{1, x_t\}_{t = 1}^3) = \sum_{m = 1}^M \pi^m \lambda_{x_1}^{*m} \lambda_{x_2}^m \lambda_{x_3}^m. \]
Define for every \((x_2, x_3)\): \[ F_{x_2, x_3} := \widetilde{P}(\{1, x_t\}_{t = 2}^3) = \sum_{m = 1}^M \pi^m \lambda_{x_2}^m \lambda_{x_3}^m. \]
In the same way, define \(F_{x_1, x_2}^*\) and \(F_{x_1, x_3}^*\).
Define for every \(x_1\): \[ F_{x_1}^* := \widetilde{P}(\{1, x_1\}) = \sum_{m = 1}^M \pi^m \lambda_{x_1}^{*m}. \]
In the same way, define \(F_{x_2}\) and \(F_{x_3}\).
In the notations above, \(F^*\) involves \((a_1, x_1)\) and \(F\) does not.
Collect the probabilities that do not involve \((a_1, x_1)\) into the matrix: \[ P := \begin{pmatrix} 1 & F_{\xi_1} & \cdots & F_{\xi_{M - 1}}\\ F_{\xi_1} & F_{\xi_1, \xi_1} & \cdots & F_{\xi_1, \xi_{M - 1}}\\ \vdots & \vdots & \ddots & \vdots\\ F_{\xi_{M - 1}} & F_{\xi_{M - 1}, \xi_1} & \cdots & F_{\xi_{M - 1}, \xi_{M - 1}} \end{pmatrix}. \]
Collect the probabilities that involve \((a_1, x_1) = (1, k)\) into the matrix: \[ P_k^* := \begin{pmatrix} F^*_k & F^*_{k, \xi_1} & \cdots & F^*_{k, \xi_{M - 1}}\\ F^*_{k, \xi_1} & F^*_{k, \xi_1, \xi_1} & \cdots & F^*_{k, \xi_1, \xi_{M - 1}}\\ \vdots & \vdots & \ddots & \vdots\\ F^*_{k, \xi_{M - 1}} & F^*_{k, \xi_{M - 1}, \xi_1} & \cdots & F^*_{k, \xi_{M - 1}, \xi_{M - 1}} \end{pmatrix}. \]
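As an illustration of the notation, consider the smallest case \(M = 2\) with a single point \(\xi_1 = \xi\). This worked case is not in the original exposition; it only instantiates the definitions above, writing the matrix of probabilities that involve the first period as \(P_k^*\) with \((1, 1)\)-th element \(F_k^*\): \[ L = \begin{pmatrix} 1 & \lambda_\xi^1\\ 1 & \lambda_\xi^2 \end{pmatrix}, \quad P = \begin{pmatrix} 1 & F_\xi\\ F_\xi & F_{\xi, \xi} \end{pmatrix}, \quad P_k^* = \begin{pmatrix} F_k^* & F_{k, \xi}^*\\ F_{k, \xi}^* & F_{k, \xi, \xi}^* \end{pmatrix}. \]
In this case, using \(\pi^1 + \pi^2 = 1\): \[ \det P = F_{\xi, \xi} - F_\xi^2 = \pi^1 \pi^2 (\lambda_\xi^1 - \lambda_\xi^2)^2, \] so \(P\) is non-singular exactly when both types have positive probability and their choice probabilities at \(\xi\) differ.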

7.6.9 Sufficient Condition for Identification

Identification theorem:
Suppose that assumptions in 7.6.4 hold.
\(T \ge 3\).
There exist some \(\{\xi_1, \cdots, \xi_{M - 1}\} \in X^{M - 1}\) such that \(L\) is non-singular.
There exists \(k \in X\) such that \(\lambda_k^{*m} > 0\) for all \(m = 1, \cdots, M\) and \(\lambda_k^{*m} \neq \lambda_k^{*n}\) for any \(m \neq n\).
Then, \(\{\pi^m, \{\lambda_\xi^{*m}, \lambda_\xi^m\}_{\xi \in X}\}_{m = 1}^M\) is uniquely identified from \(\{\widetilde{P}(\{a_t, x_t\}_{t = 1}^3)\}\).
Because the assumptions of the above theorem refer to model parameters, the authors also derive sufficient conditions based on observables.
Corollary:
Suppose that assumptions in 7.6.4 hold.
\(T \ge 3\).
There exist some \(\{\xi_1, \cdots, \xi_{M - 1}\} \in X^{M - 1}\) and \(k \in X\) such that \(P\) is of full rank and that all the eigenvalues of \(P^{-1}P_k^*\) take distinct values.
Then, \(\{\pi^m, \{\lambda_\xi^{*m}, \lambda_\xi^m\}_{\xi \in X}\}_{m = 1}^M\) is uniquely identified from \(\{\widetilde{P}(\{a_t, x_t\}_{t = 1}^3)\}\).

7.6.10 Remarks on the Theorem

The condition says that \(L\) is non-singular.
- This implies that all columns in \(L\) must be linearly independent.
- In other words, the changes in covariate \(x\) must induce sufficient heterogeneous variations in the conditional choice probabilities across types.
The condition says \(\lambda_k^{*m} > 0\) for all \(m\).
- If \(\lambda_k^{*m} = 0\) for some type \(m\) at every \(k\), that type is never observed choosing \(a_1 = 1\) in the data.
The condition says \(\lambda_k^{*m} \neq \lambda_k^{*n}\) for any \(m \neq n\).
- This condition is satisfied if initial distribution is different across types.
- If this condition fails, identification becomes more demanding.
- In that case, \(T = 3\) is not enough and \(T \ge 4\) becomes necessary.
The identification only requires one set of \(M - 1\) points \(\xi_1, \cdots, \xi_{M - 1}\) that satisfy the condition.
- Information from other points provides overidentifying restrictions.

7.6.11 Factorization Equations

Parameters \(L, V, D_k^*\) and data \(P, P_k^*\) are related through the following factorization equations (check manually): \[ P = L' V L, \] \[ P_k^* = L'D_k^*VL. \]
Note that the \((1, 1)\)-th element of \(P = L'VL\) is \(\sum_{m = 1}^M \pi^m = 1\) and gives no information.
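The manual check can also be done numerically. The sketch below builds \(P\) and \(P_k^*\) element by element from the mixture sums and compares them with the matrix products; the parameter values and the function name are arbitrary choices made for illustration.

```python
import numpy as np

def build_moment_matrices(pi, lam, lam_star):
    """Assemble P and P_k^* element by element from the mixture sums.

    pi       : (M,) type probabilities
    lam      : (M, M-1), lam[m, j] = lambda^m at the point xi_{j+1}
    lam_star : (M,), lam_star[m] = lambda^{*m}_k
    """
    M = len(pi)
    # Column 0 stands for "no period conditioned on", i.e. a factor of one.
    cols = [np.ones(M)] + [lam[:, j] for j in range(M - 1)]
    P = np.empty((M, M))
    Pk = np.empty((M, M))
    for i in range(M):
        for j in range(M):
            P[i, j] = np.sum(pi * cols[i] * cols[j])               # F_{xi_i, xi_j}
            Pk[i, j] = np.sum(pi * lam_star * cols[i] * cols[j])   # F^*_{k, xi_i, xi_j}
    return P, Pk

# Arbitrary illustrative parameter values with M = 3 types.
pi = np.array([0.5, 0.3, 0.2])
lam = np.array([[0.2, 0.7], [0.5, 0.4], [0.8, 0.9]])
lam_star = np.array([0.1, 0.2, 0.3])
P, Pk = build_moment_matrices(pi, lam, lam_star)

L = np.column_stack([np.ones(3), lam])
V = np.diag(pi)
D = np.diag(lam_star)
print(np.allclose(P, L.T @ V @ L), np.allclose(Pk, L.T @ D @ V @ L))
```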

7.6.12 Sketch of the Proof

Suppose that \(P\) is invertible or, equivalently (given \(\pi^m > 0\) for all \(m\)), that \(L\) is invertible.
Then, we have: \[ P^{-1} = L^{-1}V^{-1} L^{'-1}. \]
Therefore, we have: \[ \begin{split} P^{-1} P_k^* &= L^{-1}V^{-1} L^{'-1} L'D_k^*VL\\ &= L^{-1} V^{-1} D_k^* V L\\ &= L^{-1} D_k^* L, \end{split} \] where the third equality is because \(V\) and \(D_k^*\) are diagonal matrices.
This equation means that \(D_k^*\) is identified as the matrix of eigenvalues of \(P^{-1} P_k^*\).
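Moreover, the columns of \(L^{-1}\) are identified as the eigenvectors of \(P^{-1} P_k^*\); the scale of each eigenvector is pinned down by the normalization that the first column of \(L\) consists of ones.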
Finally, \(V = L^{'-1} P L^{-1}\) is identified because the right-hand side is now known.

7.6.13 Constructive Estimation

According to the above identification argument, we can consider the following constructive estimation procedure.

Estimate \(P\) and \(P_k^*\) non-parametrically.
By applying an eigenvalue decomposition algorithm to \(P^{-1} P_k^*\), obtain estimates of \(D_k^*\) and \(L\).
Then recover \(V\) from \(V = L^{'-1} P L^{-1}\).
Once the type-specific conditional choice probabilities are estimated, we can apply the standard CCP-based estimation method to each type.
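The eigendecomposition step can be sketched as follows, assuming that \(P\) and \(P_k^*\) have already been estimated and that the eigenvalues are real and distinct. The function name is hypothetical, and the types are labelled in increasing order of \(\lambda_k^{*m}\), since the labelling of types is arbitrary.

```python
import numpy as np

def recover_mixture(P, Pk):
    """Recover (lambda_k^{*m}, L, pi) from P = L'VL and P_k^* = L' D_k^* V L."""
    A = np.linalg.solve(P, Pk)            # P^{-1} P_k^* = L^{-1} D_k^* L
    eigval, E = np.linalg.eig(A)          # columns of E are eigenvectors of A
    eigval, E = np.real(eigval), np.real(E)
    order = np.argsort(eigval)            # fix the labelling of types
    eigval, E = eigval[order], E[:, order]
    # Columns of E are columns of L^{-1} up to scale, so rows of E^{-1}
    # are rows of L up to scale; the first column of L is all ones.
    E_inv = np.linalg.inv(E)
    L = E_inv / E_inv[:, [0]]
    L_inv = np.linalg.inv(L)
    V = L_inv.T @ P @ L_inv               # V = L'^{-1} P L^{-1}
    return eigval, L, np.diag(V)
```

With sampling error in the estimated matrices, the eigenvalues need not be real or lie in \([0, 1]\), so this is a sketch of the identification argument and not a recommended estimator.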

7.6.14 Estimation by an EM Algorithm

Arcidiacono & Miller (2011) suggest using an EM algorithm to estimate a dynamic decision model with unobserved heterogeneity.
Return to our original notation, and suppose that state \(s_t\) is partitioned into \(s_t := (x_t, w_t)\), where \(x_t\) is observed but \(w_t\) is unobserved to an econometrician.
Suppose that the transition probability is such that: \[ \begin{split} \mathbb{P}\{s_{t + 1}|s_t, a_t\} &= \mathbb{P}\{x_{t + 1}|x_t, w_t, a_t\} \mathbb{P}\{w_{t + 1}|w_t\} \\ &:= g(x_{t + 1}|x_t, w_t, a_t) h(w_{t + 1}|w_t). \end{split} \]
The initial distribution of \(w_1\) is \(h(w_1|x_1)\).
We assume that the model is identified and only consider estimation.
For example, in the previous finite-mixture model, we considered a case with \(h(w_{t + 1} = w'|w_{t} = w) = 1\{w' = w\}\) and provided a sufficient condition for identification.
Let \(\theta_1\) be the parameters in \(\pi\) and \(\theta_2\) be the parameters in \(g\).
Let \(\theta = (\theta_1, \theta_2)\).

7.6.15 Idea

Let \(q_{it}(w)\) be the probability that firm \(i\) is in unobserved state \(w\) in time \(t\).
The idea of the EM algorithm is:

Expectation step: Given a parameter \(\theta^{(r)}\), a conditional choice probability \(p^{(r)}(a|x, w)\), an initial distribution of the unobserved state variable \(h^{(r)}(w_1|x_1)\), and a transition probability of the unobserved state variable \(h^{(r)}(w'|w)\), update the probability that firm \(i\) is in unobserved state \(w\) in time \(t\) to \(q_{it}^{(r + 1)}(w)\), update the initial distribution of the unobserved state variable to \(h^{(r + 1)}(w|x_1)\), update the transition probability of the unobserved state variable to \(h^{(r + 1)}(w'|w)\), and update the conditional choice probability to \(p^{(r + 1)}(a|x, w)\).
Maximization step: Given the probability \(q_{it}^{(r + 1)}(w)\) that firm \(i\) is in unobserved state \(w\) in period \(t\), the initial distribution of the unobserved state variable \(h^{(r + 1)}(w|x_1)\), the transition probability of the unobserved state variable \(h^{(r + 1)}(w'|w)\), and the conditional choice probability \(p^{(r + 1)}(a|x, w)\), update the parameters to \(\theta^{(r + 1)}\).

Continue these two steps until convergence.
By doing so, we avoid integrating out the unobserved state variables to evaluate the likelihood function.
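The alternation can be illustrated on a deliberately stripped-down case, which is an assumption of this sketch and not the full algorithm: types are permanent (\(h(w'|w) = 1\{w' = w\}\)), there is no structural parameter \(\theta\), and the choices in every period, including the first, are treated as draws from \(P^m(a|x)\). The function name and array layouts are hypothetical.

```python
import numpy as np

def em_permanent_types(a, x, n_x, n_a, M, n_iter=200, seed=0):
    """EM for a mixture of M permanent types with choice probabilities P^m(a|x).

    a, x : (N, T) integer arrays of observed actions and states (0-based codes).
    Returns type probabilities, choice probabilities P[m, x, a], and the
    log-likelihood evaluated at the start of each iteration.
    """
    rng = np.random.default_rng(seed)
    N, T = a.shape
    pi = np.full(M, 1.0 / M)
    P = rng.dirichlet(np.ones(n_a), size=(M, n_x))      # shape (M, n_x, n_a)
    loglik = []
    for _ in range(n_iter):
        # Posterior probability that unit i is of type m, via log-sum-exp.
        joint = np.log(pi)[:, None] + np.log(P[:, x, a]).sum(axis=2)   # (M, N)
        mx = joint.max(axis=0)
        ll_i = mx + np.log(np.exp(joint - mx).sum(axis=0))
        q = np.exp(joint - ll_i).T                                      # (N, M)
        loglik.append(ll_i.sum())
        # Update type probabilities and posterior-weighted choice frequencies.
        pi = q.mean(axis=0)
        counts = np.full((M, n_x, n_a), 1e-10)          # tiny floor avoids log(0)
        for m in range(M):
            np.add.at(counts[m], (x, a), q[:, m][:, None] * np.ones(T))
        P = counts / counts.sum(axis=2, keepdims=True)
    return pi, P, np.array(loglik)
```

Each iteration only needs posterior weights and weighted frequencies; the sum over unobserved types appears inside the posterior computation and never inside an optimizer.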

7.6.16 Optimal Conditional Choice Probability

We can still define the optimal conditional choice probability given future choice probabilities as \(\varphi^{\theta, h}(p)\): the agent's problem is unchanged, and the only new assumption is that part of the state is not observed by the econometrician.
We have merely split the parameters in \(G\) into \(\theta_2\) and \(h\).

7.6.17 Likelihood Functions

The following likelihood functions can be calculated if we know \(\theta, h\), and \(p\).
The likelihood of observing \(\{a_{it}, x_{i, t + 1}\}\) conditional on \(x_{it}\) and \(w_{it}\) is: \[ L(a_{it}, x_{i, t + 1}|x_{it}, w_{it}; \theta, h, p) := \varphi^{\theta, h}(p)(a_{it}|x_{it}, w_{it}) g^{\theta_2}(x_{i, t + 1}|x_{it}, w_{it}, a_{it}). \]
The likelihood of observing \(\{a_{it}, x_{it}\}_{t = 1}^T\) conditional on \(x_{i1}\) is: \[ \begin{split} L(\{a_{it}, x_{it}\}_{t = 1}^T|x_{i1}, \theta, h, p) &:= \sum_{w_{i1} = 1}^W \cdots \sum_{w_{iT} = 1}^W h(w_{i1}|x_{i1}) L(a_{i1}, x_{i, 2}|x_{i1}, w_{i1}; \theta, h, p)\\ &\times \prod_{t = 2}^T h(w_{it}| w_{i, t - 1}) L(a_{it}, x_{i, t + 1}|x_{it}, w_{it}; \theta, h, p). \end{split} \]
The likelihood of having \(w_{it}\) in period \(t\) and having \(\{a_{it}, x_{it}\}_{t = 1}^T\) conditional on \(x_{i1}\) sums over the unobserved states in all periods other than \(t\): \[ \begin{split} L(w_{it}, \{a_{it}, x_{it}\}_{t = 1}^T|x_{i1}, \theta, h, p) &:= \sum_{w_{i1} = 1}^W \cdots \sum_{w_{i, t - 1} = 1}^W \sum_{w_{i, t + 1} = 1}^W \cdots \sum_{w_{iT} = 1}^W h(w_{i1}|x_{i1}) L(a_{i1}, x_{i, 2}|x_{i1}, w_{i1}; \theta, h, p)\\ &\times \prod_{\tau = 2}^T h(w_{i\tau}| w_{i, \tau - 1}) L(a_{i\tau}, x_{i, \tau + 1}|x_{i\tau}, w_{i\tau}; \theta, h, p). \end{split} \]
The likelihood of having \(w_{it}\) in period \(t\) conditional on \(\{a_{it}, x_{it}\}_{t = 1}^T\) is: \[ L(w_{it}|\{a_{it}, x_{it}\}_{t = 1}^T, \theta, h, p) := \frac{L(w_{it}, \{a_{it}, x_{it}\}_{t = 1}^T|x_{i1}, \theta, h, p)}{L(\{a_{it}, x_{it}\}_{t = 1}^T|x_{i1}, \theta, h, p)}. \]
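Summing over all \(W^T\) paths of the unobserved state is unnecessary in practice. A minimal sketch, assuming the standard forward-backward recursion for a hidden Markov chain with initial distribution \(h(w_1|x_1)\), transition \(h(w_t|w_{t - 1})\), and per-period likelihoods already evaluated for one unit, computes the same quantities in \(O(TW^2)\) operations. The function name and array layouts are assumptions for illustration.

```python
import numpy as np

def forward_backward(h0, H, ell):
    """Likelihood and posterior state probabilities for one unit.

    h0  : (W,)   initial distribution h(w_1 | x_1) at the observed x_1
    H   : (W, W) transition matrix, H[w, w_next] = h(w_next | w)
    ell : (T, W) per-period likelihoods, ell[t, w] = L(a_t, x_{t+1} | x_t, w)
    Returns the likelihood of the whole path and q with q[t, w] = Pr(w_t = w | data).
    """
    T, W = ell.shape
    alpha = np.empty((T, W))   # alpha[t, w]: joint likelihood of data up to t and w_t = w
    beta = np.ones((T, W))     # beta[t, w]: likelihood of data after t given w_t = w
    alpha[0] = h0 * ell[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ H) * ell[t]
    for t in range(T - 2, -1, -1):
        beta[t] = H @ (ell[t + 1] * beta[t + 1])
    lik = alpha[-1].sum()
    q = alpha * beta / lik
    return lik, q
```

The sketch does not rescale `alpha` and `beta`, so for long panels the products may underflow; a scaled or log-domain version would be needed there.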

7.6.18 Expectation Step

We have a parameter \(\theta^{(r)}\), a conditional choice probability \(p^{(r)}(a|x, w)\), an initial distribution of the unobserved state variable \(h^{(r)}(w_1|x_1)\), and a transition probability of the unobserved state variable \(h^{(r)}(w'|w)\).

Update the probability that firm \(i\) is in unobserved state \(w\) in period \(t\) to \(q_{it}^{(r + 1)}(w)\): \[ q_{it}^{(r + 1)}(w) := L(w_{it} = w|\{a_{it}, x_{it}\}_{t = 1}^T, \theta^{(r)}, h^{(r)}, p^{(r)}). \]
Update the initial distribution of the unobserved state variable to \(h^{(r + 1)}(w|x_1)\): \[ h^{(r + 1)}(w|x_1) := \frac{\sum_{i = 1}^N 1\{x_{i1} = x_1\} q_{i1}^{(r + 1)}(w)}{\sum_{i = 1}^N 1\{x_{i1} = x_1\}}. \]
Update the transition probability of the unobserved state variable to \(h^{(r + 1)}(w'|w)\): \[ h^{(r + 1)}(w'|w) := \frac{\sum_{i = 1}^N \sum_{t = 2}^T q_{i, t - 1}^{(r + 1)}(w) q_{it}^{(r + 1)}(w')}{\sum_{i = 1}^N \sum_{t = 2}^T q_{i, t - 1}^{(r + 1)}(w)}. \]
Update the conditional choice probability to \(p^{(r + 1)}(a|x, w)\): \[ p^{(r + 1)}(a|x, w) := \frac{\sum_{i = 1}^N \sum_{t = 1}^T q_{it}^{(r + 1)}(w) 1\{x_{it} = x\}1\{a_{it} = a\}}{\sum_{i = 1}^N \sum_{t = 1}^T q_{it}^{(r + 1)}(w) 1\{x_{it} = x\}}. \]
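The three frequency updates above translate directly into array operations. The following sketch takes the posterior probabilities as given and follows the displayed formulas term by term; the function name, the array layouts, and the convention of returning zeros for cells with no observations are assumptions for illustration.

```python
import numpy as np

def update_from_posteriors(q, x, a, n_x, n_a):
    """Update h(w|x_1), h(w'|w), and p(a|x,w) from posterior probabilities.

    q    : (N, T, W) with q[i, t, w] the posterior that unit i is in state w at t
    x, a : (N, T) integer arrays of observed states and actions (0-based codes)
    Returns h_init[x1, w], h_trans[w, w_next], and p[x, w, a].
    """
    N, T, W = q.shape
    # Initial distribution: average first-period posterior among units with x_{i1} = x1.
    h_init = np.zeros((n_x, W))
    for x1 in range(n_x):
        mask = x[:, 0] == x1
        if mask.any():
            h_init[x1] = q[mask, 0, :].mean(axis=0)
    # Transition: sum_i sum_{t >= 2} q_{i,t-1}(w) q_{it}(w') over sum_i sum_{t >= 2} q_{i,t-1}(w).
    num = np.einsum('itw,itv->wv', q[:, :-1, :], q[:, 1:, :])
    den = q[:, :-1, :].sum(axis=(0, 1))
    h_trans = num / den[:, None]
    # Choice probabilities: posterior-weighted frequencies of a given (x, w).
    p = np.zeros((n_x, W, n_a))
    np.add.at(p, (x[..., None], np.arange(W), a[..., None]), q)
    tot = p.sum(axis=2, keepdims=True)
    p = np.divide(p, tot, out=np.zeros_like(p), where=tot > 0)
    return h_init, h_trans, p
```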

7.6.19 Maximization Step

We have the probability \(q_{it}^{(r + 1)}(w)\) that firm \(i\) is in unobserved state \(w\) in period \(t\), the initial distribution of the unobserved state variable \(h^{(r + 1)}(w|x_1)\), the transition probability of the unobserved state variable \(h^{(r + 1)}(w'|w)\), and the conditional choice probability \(p^{(r + 1)}(a|x, w)\).
Update parameters to \(\theta^{(r + 1)}\): \[ \theta^{(r + 1)} := \text{argmax}_{\theta} \sum_{i = 1}^N \sum_{t = 1}^T \sum_{w = 1}^W q_{it}^{(r + 1)}(w) \ln L(a_{it}, x_{i, t + 1}|x_{it}, w_{it} = w; \theta, h^{(r + 1)}, p^{(r + 1)}). \]
The maximization step can use either a nested fixed-point algorithm or the CCP approach.

References

Arcidiacono, P., & Miller, R. A. (2011). Conditional Choice Probability Estimation of Dynamic Discrete Choice Models with Unobserved Heterogeneity. Econometrica, 79(6), 1823–1867.

Hotz, V. J., & Miller, R. A. (1993). Conditional Choice Probabilities and the Estimation of Dynamic Models. The Review of Economic Studies, 60(3), 497–529.

Kasahara, H., & Shimotsu, K. (2009). Nonparametric Identification of Finite Mixture Models of Dynamic Discrete Choices. Econometrica, 77(1), 135–175.

Magnac, T., & Thesmar, D. (2002). Identifying Dynamic Discrete Decision Processes. Econometrica, 70(2), 801–816.

Maskin, E., & Tirole, J. (1988a). A Theory of Dynamic Oligopoly, I: Overview and Quantity Competition with Large Fixed Costs. Econometrica, 56(3), 549–569.

Pesendorfer, M., & Schmidt-Dengler, P. (2008). Asymptotic Least Squares Estimators for Dynamic Games. Review of Economic Studies, 75(3), 901–928.

Rust, J. (1987). Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher. Econometrica, 55(5), 999–1033.