Chapter 4 Demand Function Estimation
4.1 Motivations
Given a demand function and the assumption of utility maximization, we can recover the preferences of the decision maker.
Thus, estimating the demand function is necessary for evaluating consumer welfare.
In IO, estimating the price elasticity of demand is especially important, because it determines the market power of a monopolist and the size of the deadweight loss.
In macroeconomics, estimating demand is important for determining the price level, because the price level is the minimum expenditure a consumer needs to achieve a given level of utility.
In marketing, estimating demand is necessary to design optimal pricing, advertising, and other marketing interventions.
In principle, the theory can be applied to any decision problem, not only to consumer choice.

 How do hypothetical mergers in the ready-to-eat cereal industry affect market prices, markups, and consumer surplus?
 To answer this, the authors estimate the demand for ready-to-eat cereals and the cost function for each product. Then, the authors conduct counterfactual simulations of mergers to quantify the effects.

 To what extent do firms go abroad to access technology available in other locations?
 To study this issue, the authors estimate the firms’ locational choice when going abroad.

 In the Yellow Pages, how do consumers value the advertisements in a directory, and how do advertisers value consumer usage?
 To study this, the author simultaneously estimates the consumer demand for usage of a directory, the advertiser demand for advertising, and a publisher’s first-order condition.

 Are online and print newspapers substitutes or complements?
 To study this, the author estimates a demand function in which online and print newspapers can be either substitutes or complements.
Bayer et al. (2007):
 What are people’s preferences for schools and neighborhoods? How are these preferences capitalized into housing prices?
 To answer this, the authors estimate the discrete choice of residents over locations. To deal with the endogeneity arising from the correlation between neighborhood attributes and the unobserved attributes of the location, the authors exploit the discontinuity at school attendance zone boundaries.
Archak, Ghose, & Ipeirotis (2011):
 How does the information embedded in product reviews affect consumer choice?
 To study this, the authors estimate a discrete choice model of consumers in which the text information from product reviews is included among the product attributes.

 Wal-Mart maintains a high store density. How large are the economies of density and the sales cannibalization?
 To study this, the author first estimates the demand function across neighboring Wal-Mart stores to capture the sales cannibalization, and then estimates the cost structure from their entry and exit behavior.

 Urban and rural areas differ in available products. How does the price difference change if the heterogeneity in the product availability is incorporated?
 To answer this, the authors estimate the demand function at each location, and then construct a spatial price index based on the products available at each location.
4.2 Analyzing Consumer Behaviors
Alternative set.
Utility function.
 Add a system of choice sets.
 Add utility maximization.
\(\rightarrow\) Demand function.
In the case of producer behavior, we sometimes directly observe the output of the most primitive function, the production function.
In the case of consumer behavior, we never directly observe the output of the most primitive function, the utility function.
At most, we can identify demand functions.
Revealed preference theory:
 Samuelson (1938), Houthakker (1950), Richter (1966), Afriat (1967), Varian (1982).
 If the demand function is derived by maximizing a preference, the demand function must satisfy certain restrictions.
 If this assumption is true, we can recover part of the preference from the demand function.
4.3 Continuous Choice
 The alternative set \(\mathcal{X}\) is a subset of \(\mathbb{R}^J\).
 The utility function \(u\) represents a rational preference and is monotone and continuous on \(\mathcal{X}\).
 The choice sets are given by a system of linear budget sets: \[ \mathcal{B}(p, w) = \{q \in \mathcal{X}: p \cdot q \le w\}. \]
 If the choice sets are nonlinear, the following duality approach needs to be modified.
4.3.1 Duality between Utility and Expenditure Functions
 Only in special cases can we derive a closed-form solution to a utility maximization problem.
 We can use the first-order conditions as moment conditions for identification. \[\begin{equation} \frac{\partial u(q)}{\partial q_i} = \lambda p_i, i = 1, \cdots, J. \end{equation}\]
 Deriving a demand function from the identified utility function in general requires numerical simulation, which can be burdensome.
 Just as with the duality between production and cost functions, we have a duality theorem for utility and expenditure functions.
 There is a one-to-one mapping between a class of utility functions and a class of expenditure functions.
 Therefore, it is okay to start from an expenditure function.
 It is rare that we can recover the utility function associated with an expenditure function in closed form. But this is not often required for analysis.
 Moreover, we can easily derive other important functions from the expenditure functions.
 Let \(p\) be the price vector and \(u\) be the target utility level.
 Let \(u(q)\) be a utility function.
 An expenditure function associated with the utility function is defined by: \[\begin{equation} e(u, p) = \min_{q} p \cdot q \text{ s.t. } u(q) \ge u. \end{equation}\]
 Let \(x\) be the total expenditure such that: \[\begin{equation} x = e(u, p). \end{equation}\]
 We can start the analysis by specifying this function instead of the utility function.
4.3.2 Deriving Other Functions
 It is easy to derive other functions from an expenditure function.
 Indirect utility function: invert the expenditure function with respect to \(u\) to get: \[\begin{equation} u = e^{-1}(p, x) \equiv v(p, x). \end{equation}\]
 Hicksian demand function: apply Shephard’s lemma: \[\begin{equation} q_i = \frac{\partial e(u, p)}{\partial p_i} \equiv h_i(u, p). \end{equation}\]
 Marshallian demand function: insert the indirect utility function into the Hicksian demand function: \[\begin{equation} q_i = h_i(v(p, x), p) \equiv d_i(p, x). \end{equation}\]
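The chain from an expenditure function to the Marshallian demand can be traced numerically. The sketch below assumes a hypothetical Cobb-Douglas expenditure function \(e(u, p) = u p_1^{a} p_2^{1 - a}\) with \(a = 0.3\), which is not part of the text, and replaces the analytical derivative in Shephard’s lemma with a finite difference.

```python
# Hypothetical example: Cobb-Douglas expenditure function e(u, p) = u * p1^a * p2^(1-a).
A = 0.3  # assumed share parameter


def expenditure(u, p):
    return u * p[0] ** A * p[1] ** (1 - A)


def indirect_utility(p, x):
    # Invert x = e(u, p) with respect to u.
    return x / (p[0] ** A * p[1] ** (1 - A))


def hicksian(u, p, i, h=1e-6):
    # Shephard's lemma, with the derivative approximated by a central difference.
    up = list(p)
    dn = list(p)
    up[i] += h
    dn[i] -= h
    return (expenditure(u, up) - expenditure(u, dn)) / (2 * h)


def marshallian(p, x, i):
    # Insert the indirect utility v(p, x) into the Hicksian demand.
    return hicksian(indirect_utility(p, x), p, i)


prices, budget = [2.0, 5.0], 100.0
demand = [marshallian(prices, budget, i) for i in range(2)]
print(demand)
```

For this functional form the Marshallian demand is known to be \(a x / p_1\) and \((1 - a) x / p_2\), so the printed values should be close to those numbers and exhaust the budget.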
4.3.3 Starting from an Indirect Utility Function
 It is almost equivalent to start from an indirect utility function.
 An indirect utility function associated with the utility function is defined by: \[\begin{equation} v(p, x) \equiv \max_{q} u(q) \text{ s.t. } p'q \le x. \end{equation}\]
 We can derive the Marshallian demand function by Roy’s identity: \[\begin{equation} q_i = -\frac{ \partial v(p, x)/\partial p_i}{\partial v(p, x)/\partial x} \equiv d_i(p, x). \end{equation}\]
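Roy’s identity can be checked numerically in the same spirit. The sketch below assumes a hypothetical Cobb-Douglas indirect utility function \(v(p, x) = x / (p_1^{a} p_2^{1 - a})\) with \(a = 0.3\), and approximates both partial derivatives by central differences.

```python
# Hypothetical example: Cobb-Douglas indirect utility v(p, x) = x / (p1^a * p2^(1-a)).
A = 0.3  # assumed share parameter


def indirect_utility(p, x):
    return x / (p[0] ** A * p[1] ** (1 - A))


def roy_demand(p, x, i, h=1e-6):
    # Roy's identity: q_i = -(dv/dp_i) / (dv/dx), derivatives by central differences.
    up = list(p)
    dn = list(p)
    up[i] += h
    dn[i] -= h
    dv_dp = (indirect_utility(up, x) - indirect_utility(dn, x)) / (2 * h)
    dv_dx = (indirect_utility(p, x + h) - indirect_utility(p, x - h)) / (2 * h)
    return -dv_dp / dv_dx


roy = [roy_demand([2.0, 5.0], 100.0, i) for i in range(2)]
print(roy)
```

The leading minus sign matters: utility falls when a price rises, so without it the implied demand would be negative.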
4.3.5 Almost Ideal Demand System (AIDS)
 Based on Deaton & Muellbauer (1980); see that paper for further reference.
 Consider an expenditure function that satisfies the following useful conditions:
 It allows aggregation (this motivation is less important in recent days).
 It gives an arbitrary first-order approximation to any demand system.
 It can satisfy the restrictions of utility maximization.
 It can be used to test the restrictions of utility maximization.
4.3.6 PIGLOG Class
 PIGLOG (price-independent generalized logarithmic) class (Muellbauer, 1976). \[\begin{equation} \ln e(u, p) = (1 - u) \ln a(p) + u\ln b(p), \end{equation}\] where \(a(p)\) and \(b(p)\) are arbitrary linear homogeneous concave functions.
 Consider households that differ in total income.
 The PIGLOG form ensures that the aggregate demand can be written in the same form where the total income is replaced with the sum of household total income.
 The derivatives should be given free parameters for the model to be an arbitrary firstorder approximation to any demand system.
 In AIDS, we specify \(a(p)\) and \(b(p)\) as: \[\begin{equation} \begin{split} \ln a(p) &\equiv a_0 + \sum_{k} \alpha_k \ln p_k + \frac{1}{2}\sum_{k} \sum_{j} \gamma_{kj}^* \ln p_k \ln p_j\\ \ln b(p) &\equiv \ln a(p) + \beta_0 \prod_{k} p_k^{\beta_k}. \end{split} \end{equation}\]
4.3.9 Specify the Detail II
 It can satisfy the restrictions of utility maximization.
 It can be used to test the restrictions of utility maximization.
 \(\sum_{j} x_j = 1\): \[\begin{equation} \sum_{j} \alpha_j = 1, \sum_{j} \gamma_{jk} = 0, \sum_{j} \beta_j = 0. \end{equation}\]
 \(e(u, p)\) is linear homogeneous in \(p\): \[\begin{equation} \sum_{j} \gamma_{ij} = 0. \end{equation}\]
 Symmetry: \[\begin{equation} \gamma_{ij} = \gamma_{ji}. \end{equation}\]
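The restrictions above can be checked numerically. The sketch below is not taken from the text: it uses the budget-share equation from Deaton & Muellbauer (1980), \(w_i = \alpha_i + \sum_j \gamma_{ij} \ln p_j + \beta_i \ln(x / P)\) with \(\ln P = \ln a(p)\), which this section does not display, and all parameter values are hypothetical and chosen to satisfy adding-up, homogeneity, and symmetry.

```python
import math

# Hypothetical parameters satisfying adding-up, homogeneity, and symmetry.
ALPHA0 = 0.0
ALPHA = [0.5, 0.3, 0.2]          # sums to one
BETA = [0.10, -0.04, -0.06]      # sums to zero
GAMMA = [[0.10, -0.06, -0.04],   # symmetric, rows and columns sum to zero
         [-0.06, 0.05, 0.01],
         [-0.04, 0.01, 0.03]]


def log_price_index(p):
    # ln P = alpha_0 + sum_k alpha_k ln p_k + (1/2) sum_k sum_j gamma_kj ln p_k ln p_j
    lp = [math.log(pk) for pk in p]
    linear = sum(a * l for a, l in zip(ALPHA, lp))
    quad = 0.5 * sum(GAMMA[k][j] * lp[k] * lp[j] for k in range(3) for j in range(3))
    return ALPHA0 + linear + quad


def aids_shares(p, x):
    # Budget shares w_i = alpha_i + sum_j gamma_ij ln p_j + beta_i ln(x / P).
    lp = [math.log(pk) for pk in p]
    real = math.log(x) - log_price_index(p)
    return [ALPHA[i] + sum(GAMMA[i][j] * lp[j] for j in range(3)) + BETA[i] * real
            for i in range(3)]


shares = aids_shares([1.2, 0.8, 2.0], 50.0)
scaled = aids_shares([2.4, 1.6, 4.0], 100.0)  # all prices and expenditure doubled
print(sum(shares), shares, scaled)
```

Under these restrictions the shares sum to one, and doubling all prices together with total expenditure leaves them unchanged.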
4.3.10 Estimation
 We can estimate parameters based on the share equations.
 If we use aggregate data, the aggregate error term is correlated with the price vector.
 Therefore, we need at least as many instrumental variables as the dimension of the price vector.
 With valid instrumental variables, we can estimate the model with GMM.
 If we use household-level data, the household-specific errors controlling for aggregate errors will not be correlated with the price vector if the price is determined in a competitive market.
4.3.11 From Product Space Approach to Characteristics Space Approach
 The framework up to here is called product space approach because the utility has been defined over a product space.
 When there are \(J\) goods, there are \(J^2\) parameters for prices.
 One way to resolve this issue is to introduce a priori knowledge about the preference.
 For example, we can introduce a priori segmentation with separability.
 It is hard to evaluate the effect of introducing a new product.
 Again, we have to a priori decide which segment/product is similar to the new product.
 This leads us to the characteristics space approach (Lancaster, 1966; Muth, 1966):
 Consumption is an activity in which goods are inputs and in which the output is a collection of characteristics.
 Utility ranks collections of characteristics, and ranks collections of goods only indirectly through the characteristics that they possess.
 There are \(k = 1, \cdots, K\) activities.
 The activity \(y\) requires consuming \(x = A y\) products.
 The activity \(y\) generates \(z = B y\) characteristics.
 The budget constraint is \(p \cdot x \le 1\).
 The utility is defined over the characteristics \(u(z)\).
 The consumer’s problem is: \[ \max_y u(z) \] s.t. \[ p \cdot x \le 1, x = Ay, z = By, x, y, z \ge 0. \]
 Then, only the dimension of characteristics matters and the value of new products can be evaluated by the contribution to the production of characteristics.
 Early applications include Rosen (1974), Muellbauer (1974), and Gorman (1980).
 A nonparametric analysis based on revealed preference is Blow, Browning, & Crawford (2008).
4.3.12 From Continuous Choice Approach to Discrete Choice Approach
 The aggregate demand is a collection of choices across consumers and within consumers over time.
 It makes sense to model individual choices and then aggregate rather than directly modeling the aggregate demand.
 The resulting aggregate demand will satisfy restrictions that are consistent with the underlying consumer choice model.
 If there is an interaction across choices, the aggregation is not trivial.
 This is especially true when aggregating choices within consumers.
 For now, assume that each choice is independent.
4.4 Discrete Choice
4.4.1 Discrete Choice Approach
 Let \(u(q, z_i)\) be the utility of a consumer over \(J + 1\) dimensional consumption bundle \(q\) characterized by consumer characteristics \(z_i\).
 The consumer solves: \[\begin{equation} V(p, y_i, z_i) = \max_{q}u(q, z_i), \text{ s.t. } p'q \le y_i. \end{equation}\]
 Alternative \(0\) is an outside good.
 Normalize \(p_0 = 1\).
 We call alternatives \(j = 1, \cdots, J\) inside goods.
 The choice space is restricted to: \[\begin{equation} \begin{split} Q = \{q:& q_0 \in [0, M], q_j \in \{0, 1\}, j = 1, \cdots, J,\\ & q_j q_k = 0, \forall j \neq k, j, k > 0, M < \infty\}. \end{split} \end{equation}\]
4.4.2 Discrete Choice Approach
 The budget constraint reduces to: \[\begin{equation} \begin{cases} q_0 + p_j q_j = y &\text{ if } q_j = 1, j > 0\\ q_0 = y &\text{ otherwise}. \end{cases} \end{equation}\]
 Hence, \[\begin{equation} q_0 = y - \sum_{j = 1}^J p_j q_j. \end{equation}\]
4.4.3 Discrete Choice Approach
 The utility maximization problem can be written as: \[\begin{equation} V(p, y_i, z_i) = \max_{j = 0, 1, \cdots, J} v_j(p_j, y_i, z_i), \end{equation}\] where \[\begin{equation} \begin{split} &v_j(p_j, y_i, z_i)\\ & = \begin{cases} u(y_i - p_j, 0, \cdots, \underbrace{1}_{q_j}, \cdots, 0, z_i) &\text{ if }j > 0,\\ u(y_i, 0, \cdots, 0, z_i) &\text{ if }j = 0, \end{cases} \end{split} \end{equation}\] is called the choice-specific indirect utility.
4.4.4 Characteristics Space Approach
Preference is defined over the characteristics of alternatives, \(x_j\):
Car: vehicle, engine power, model year, car maker, etc.
PC: CPU power, number of cores, memory, HDD volume, etc.
The choice-specific indirect utility is a function of the characteristics of the alternative: \[\begin{equation} \begin{split} v_j(p_j, y_i, z_i) &=u(y_i - p_j, 0, \cdots, \underbrace{1}_{q_j}, \cdots, 0, z_i)\\ &= u^*(y_i - p_j, x_j, z_i)\\ &\equiv v(p_j, x_j, y_i, z_i). \end{split} \end{equation}\]
4.4.5 Weak Separability and Income Effect
 We usually focus on a particular product category such as cars, PCs, cereals, detergents, and so on.
 Assume that the preference is separable between the category in question (inside goods) and other categories (outside goods).
 \(u(q) = u[q_I, v(q_O)]\):
 \(q_I\): the consumption vector of inside goods.
 \(q_O\): the consumption vector of outside goods.
 Increasing in \(v_O = v(q_O)\).
 \(p = (p_I, p_O)\).
 \(p_I\): the price vector of inside goods.
 \(p_O\): the price vector of outside goods.
 When \(y_O\) is left for the outside goods, the conditional demand for the outside goods \(q_O(y_O, p_O)\) exists.
 Inserting this into the utility function gives: \[\begin{equation} u\{q_I, v[q_O(y_O, p_O)]\} \equiv \tilde{u}(q_I, y_O; p_O). \end{equation}\]
4.4.6 Weak Separability and Income Effect
 Thus, how the preference for the outside good is modeled determines how individual income affects the choice. Two examples are: \[\begin{equation} \begin{split} &u(y_i - p_j, x_j, z_i) = \tilde{u}(x_j, z_i) + \alpha(y_i - p_j).\\ &u(y_i - p_j, x_j, z_i) = \tilde{u}(x_j, z_i) + \alpha \ln (y_i - p_j). \end{split} \end{equation}\]
 In the first example, the income level does not affect the choice because the term \(\alpha y_i\) is common and constant across choices (there is no income effect).
 We often do not observe income of a consumer, \(y_i\).
 Recall that the price of a product enters because we are working with the indirect utility function.
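The difference between the two specifications can be seen in a small numerical sketch. The utilities from characteristics, the prices, and the values of \(\alpha\) below are hypothetical; index 0 stands for the outside option with a price of zero.

```python
import math

# Hypothetical values; index 0 is the outside option.
U_TILDE = [0.0, 1.0, 2.0]  # utility from characteristics
PRICE = [0.0, 1.0, 3.0]


def choice_linear(y, alpha=0.6):
    # Quasi-linear case: alpha * y is common to all alternatives.
    v = [u + alpha * (y - p) for u, p in zip(U_TILDE, PRICE)]
    return max(range(len(v)), key=lambda j: v[j])


def choice_log(y, alpha=2.0):
    # Log case: the utility cost of a given price falls with income.
    v = [u + alpha * math.log(y - p) for u, p in zip(U_TILDE, PRICE)]
    return max(range(len(v)), key=lambda j: v[j])


for income in (4.0, 40.0):
    print(income, choice_linear(income), choice_log(income))
```

In the linear case the chosen alternative is the same at both income levels, while in the log case the richer consumer switches to the more expensive alternative.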
4.4.7 Utility Function Normalization
 The location of utility function is often normalized by setting: \[\begin{equation} u(y^*, 0, \cdots, 0, z^*) = 0, \end{equation}\] for certain choice of \((y^*, z^*)\).
4.4.8 Aggregation of the Individual Demand
 Let \(q(p, x, y_i, z_i) = \{q_j(p, x, y_i, z_i)\}_{j = 0, \cdots, J}\) be the demand function of consumer \(i\), that is: \[\begin{equation} q_j(p, x, y_i, z_i) = 1 \Leftrightarrow j = \text{argmax}_{k = 0, 1, \cdots, J} v(p_k, x_k, y_i, z_i). \end{equation}\]
 Let \(f(y, z)\) be the joint distribution of the income and other consumer characteristics.
 The aggregate demand for good \(j\) is: \[\begin{equation} \sigma_j(p, x) \equiv N \int q_j(p, x, y, z) f(y, z) dy dz, \end{equation}\] where \(N\) is the population.
4.4.9 Horizontal Product Differentiation
 Horizontal product differentiation: consumers do not agree on the ranking of the choices.
 There are two convenience stores \(j = 1, 2\) on a street \([0, 1]\).
 Let \(z_i\) be the location of consumer \(i\) and \(x_j\) be the location of the choice on a street \([0, 1]\) with \(x_1 < x_2\).
 A consumer has a preference such that: \[\begin{equation} v_{ij} \equiv v(p_j, x_j, y_i, z_i) \equiv s - t |z_i - x_j| - p_j. \end{equation}\]
4.4.10 Horizontal Product Differentiation
 Suppose that the prices are low enough that all consumers on the street are willing to buy from one of the two stores.
 Consumer \(i\) buys from store \(1\) if and only if (the last step assumes that the indifferent consumer is located between the two stores): \[\begin{equation} \begin{split} &v(p_1, x_1, y_i, z_i) \ge v(p_2, x_2, y_i, z_i)\\ &\Leftrightarrow s - t |z_i - x_1| - p_1 \ge s - t |z_i - x_2| - p_2\\ &\Leftrightarrow z_i \le \frac{p_2 - p_1}{2 t} + \frac{x_1 + x_2}{2} \equiv \overline{z}_1(p_1, p_2). \end{split} \end{equation}\]
 Let \(f(z_i)\) be \(U[0, 1]\). Then, the aggregate demand for store 1 is: \[\begin{equation} \begin{split} \sigma_1(p, x) = N \int_{0}^{\overline{z}_1(p_1, p_2)} d z_i = N\overline{z}_1(p_1, p_2). \end{split} \end{equation}\]
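As a check on the cutoff rule, the following sketch compares the closed-form demand with a brute-force count over a fine grid of consumer locations. The store locations, prices, transport cost \(t\), and gross value \(s\) are hypothetical, and the closed form is only used here in a case where the cutoff lies between the two stores.

```python
def cutoff(p1, p2, x1, x2, t):
    # Location of the consumer who is indifferent between the two stores.
    return (p2 - p1) / (2 * t) + (x1 + x2) / 2


def demand_store1(p1, p2, x1, x2, t, n):
    # Uniform consumers on [0, 1]: demand is N times the (clipped) cutoff.
    z_bar = min(max(cutoff(p1, p2, x1, x2, t), 0.0), 1.0)
    return n * z_bar


def grid_share_store1(p1, p2, x1, x2, t, s=5.0, grid=100_000):
    # Share of grid consumers who weakly prefer store 1, computed from utilities.
    buyers = 0
    for k in range(grid):
        z = (k + 0.5) / grid
        v1 = s - t * abs(z - x1) - p1
        v2 = s - t * abs(z - x2) - p2
        if v1 >= v2:
            buyers += 1
    return buyers / grid


closed_form = demand_store1(1.0, 1.1, 0.25, 0.75, 1.0, 1000)
grid_share = grid_share_store1(1.0, 1.1, 0.25, 0.75, 1.0)
print(closed_form, grid_share)
```

With a price gap of 0.1 in favor of store 1, the indifferent consumer moves from the midpoint 0.5 to 0.55, so store 1 serves 55 percent of the street.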
4.4.11 Vertical Product Differentiation
 Vertical product differentiation: Consumers agree on the ranking of the choices. Consumers can have different willingness to pay.
 Timothy F. Bresnahan (1987) analyzed automobile demand with this framework.
 There are \(J\) goods and consumer \(i\) has a utility such as: \[\begin{equation} v_{ij} \equiv v(p_j, x_j, y_i, z_i) = z_i x_j - p_j, \end{equation}\] where \(x_j\) is the quality of product \(j\) and \(z_i\) is the consumer’s willingness to pay for quality, with \(x_j < x_{j + 1}\).
 The consumer’s problem is: \[\begin{equation} \max\{0, z_i x_1 - p_1, \cdots, z_i x_J - p_J \}. \end{equation}\]
4.4.12 Vertical Product Differentiation
 Consumer \(i\) prefers good \(j + 1\) to good \(j\) if and only if: \[\begin{equation} \begin{split} &v(p_{j + 1}, x_{j + 1}, y_i, z_i) \ge v(p_j, x_j, y_i, z_i)\\ &\Leftrightarrow z_i x_{j + 1} - p_{j + 1} \ge z_i x_j - p_j\\ &\Leftrightarrow z_i \ge \frac{p_{j + 1} - p_j}{x_{j + 1} - x_j} \equiv \Delta_j. \end{split} \end{equation}\]
 So consumer \(i\) purchases good \(j\) if and only if \(z_i \in [\Delta_{j - 1}, \Delta_j)\) and buys nothing if: \[\begin{equation} z_i \le \Delta_0 \equiv \min\{p_1/x_1, \cdots, p_J/x_J\}. \end{equation}\]
 Letting \(F(z)\) be the distribution function of \(z\), the aggregate demand for good \(j\) is: \[\begin{equation} \sigma_j(p, x, z) = N[F(\Delta_{j}) - F(\Delta_{j - 1})]. \end{equation}\]
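The threshold formula can be compared with a direct evaluation of each consumer’s problem. The sketch below assumes hypothetical qualities and prices for which the cutoffs \(\Delta_j\) are increasing in \(j\), a uniform distribution of \(z\) on \([0, 1]\), and a top cutoff \(\Delta_J = \infty\) for the highest-quality good.

```python
import math


def thresholds(p, x):
    # Cutoffs Delta_0, ..., Delta_{J-1}, followed by infinity for the top good.
    cuts = [min(pj / xj for pj, xj in zip(p, x))]
    for j in range(len(p) - 1):
        cuts.append((p[j + 1] - p[j]) / (x[j + 1] - x[j]))
    cuts.append(math.inf)
    return cuts


def uniform_cdf(z):
    # Distribution function of U[0, 1] (an assumption of this sketch).
    return min(max(z, 0.0), 1.0)


def demand_shares(p, x):
    cuts = thresholds(p, x)
    return [uniform_cdf(cuts[j + 1]) - uniform_cdf(cuts[j]) for j in range(len(p))]


def brute_force_shares(p, x, grid=100_000):
    # Solve max{0, z*x_1 - p_1, ..., z*x_J - p_J} for each z on a grid.
    counts = [0] * len(p)
    for k in range(grid):
        z = (k + 0.5) / grid
        utils = [z * xj - pj for xj, pj in zip(x, p)]
        best = max(range(len(p)), key=lambda j: utils[j])
        if utils[best] > 0:
            counts[best] += 1
    return [c / grid for c in counts]


quality, price = [1.0, 2.0, 3.0], [0.2, 0.6, 1.2]
formula = demand_shares(price, quality)
direct = brute_force_shares(price, quality)
print(formula, direct)
```

Multiplying the shares by \(N\) gives the aggregate demand. If the cutoffs were not increasing, some good would be dominated and the interval formula would need to be adjusted.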
4.4.13 Econometric Models
 So far, there has been no econometrics.
 Next, we define what is observable and unobservable, and what is known and unknown.
 Then, we consider how to identify and estimate the model.
4.4.14 Multinomial Logit Model: Preference Shock
 This originates from McFadden (1974).
 See Train (2009) for reference.
 Suppose that there is some unobservable component in consumer characteristics.
 In reality, consumers’ choices vary somewhat randomly.
 Let’s capture such a preference shock by considering the following model: \[\begin{equation} v(p_j, x_j, y_i, z_i) + \epsilon_{ij}, \end{equation}\] with some random vector: \[\begin{equation} \epsilon_i \equiv (\epsilon_{i0}, \cdots, \epsilon_{iJ})' \sim G. \end{equation}\]
 At this point, \(G\) can be any distribution and the shocks can be dependent across \(j\) within \(i\).
 \(p, x, y_i, z_i\) are observed but \(\epsilon_{ij}\) are unobserved.
 When the realization of the preference shock is given, the consumer choice is: \[ q_j(p, x, y_i, z_i, \epsilon_{i}) \equiv 1\{j = \text{argmax}_{k = 0, \cdots, J} v(p_k, x_k, y_i, z_i) + \epsilon_{ik}\} \] for \(j = 0, \cdots, J\).
 The choice probability as observed by econometrician is: \[ \sigma_j(p, x, y_i, z_i) \equiv \int q_j(p, x, y_i, z_i, \epsilon_{i}) dG(\epsilon_i). \]
4.4.15 Multinomial Logit Model: Distributional Assumption
Now assume the following:
\(\epsilon_{ij}\) are independent across \(j\): \(G(\epsilon_i) = \prod_{j = 0, \cdots, J} G_j(\epsilon_{ij})\).
\(\epsilon_{ij}\) are identical across \(j\): \(G_j(\epsilon_{ij}) = \overline{G}(\epsilon_{ij})\).
\(\overline{G}\) is a type-I extreme value distribution.
\(\rightarrow\) The density \(g(\epsilon_{ij}) = \exp[-\exp(-\epsilon_{ij}) - \epsilon_{ij}]\).
This is called the (homoskedastic) multinomial logit model.
Fixing the scale of \(\epsilon_{ij}\) is a scale normalization; under the standard type-I extreme value distribution above, the variance is \(\pi^2/6\).
By dropping some of the assumptions, we can have heteroskedastic multinomial logit model, generalized extreme value model, and so on.
Another popular distribution assumption is to assume a multivariate normal distribution of \(\epsilon_i\). This case is called the multinomial probit model.
4.4.16 Multinomial Logit Model: Choice Probability
 The choice probability of good \(j\) for consumer \(i\) is: \[\begin{equation} \begin{split} \sigma_j(p, x, y_i, z_i) & \equiv \mathbb{P}\{j = \text{argmax}_{k = 0, 1, \cdots, J} v(p_k, x_k, y_i, z_i) + \epsilon_{ik} \}\\ &=\mathbb{P}\{v(p_j, x_j, y_i, z_i) - v(p_k, x_k, y_i, z_i) \ge \epsilon_{ik} - \epsilon_{ij}, \forall k \neq j\}\\ & = \text{...after some algebra: leave as an exercise...}\\ &= \frac{\exp[v(p_j, x_j, y_i, z_i) ]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y_i, z_i)] }. \end{split} \end{equation}\]
 For example, suppose that: \[\begin{equation} v(p_k, x_k, y_i, z_i) = \beta_i'x_k + \alpha_i (y_i - p_k), \end{equation}\] with \[\begin{equation} \begin{pmatrix} \beta_i \\ \alpha_i \end{pmatrix} = \begin{pmatrix} \beta_0 \\ \alpha_0 \end{pmatrix} + \begin{pmatrix} \Gamma\\ \pi' \end{pmatrix} z_i. \end{equation}\]
 Then, because \(\exp(\alpha_i y_i)\) cancels between the numerator and the denominator, we have: \[\begin{equation} \begin{split} \sigma_{j}(p, x, y_i, z_i) &= \frac{\exp[\beta_i'x_j + \alpha_i (y_i - p_j) ]}{\sum_{k = 0}^J \exp[\beta_i'x_k + \alpha_i (y_i - p_k) ]}\\ &= \frac{\exp[\beta_i'x_j - \alpha_i p_j]}{\sum_{k = 0}^J \exp[\beta_i'x_k - \alpha_i p_k]}. \end{split} \end{equation}\]
 If we normalize the characteristics and price of the outside option so that \(\beta_i'x_0 - \alpha_i p_0 = 0\) holds, it becomes: \[ \sigma_{j}(p, x, y_i, z_i) = \frac{\exp[\beta_i'x_j - \alpha_i p_j]}{1 + \sum_{k = 1}^J \exp[\beta_i'x_k - \alpha_i p_k]}. \]
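The closed-form choice probability is straightforward to compute. The sketch below uses a scalar characteristic and hypothetical coefficients, and checks that adding \(\alpha_i y_i\) to every alternative leaves the probabilities unchanged, as in the derivation above.

```python
import math


def logit_probs(v):
    """Multinomial logit probabilities for a list of mean utilities v."""
    m = max(v)  # subtract the maximum before exponentiating for numerical stability
    e = [math.exp(vk - m) for vk in v]
    total = sum(e)
    return [ek / total for ek in e]


BETA, ALPHA = 1.0, 0.5   # hypothetical taste and price coefficients
x = [0.0, 1.0, 2.0]      # index 0 is the outside option, normalized to zero
p = [0.0, 1.0, 3.0]
income = 10.0

with_income = logit_probs([BETA * xk + ALPHA * (income - pk) for xk, pk in zip(x, p)])
without_income = logit_probs([BETA * xk - ALPHA * pk for xk, pk in zip(x, p)])
print(with_income, without_income)
```

Subtracting the maximum utility inside `logit_probs` uses the same cancellation argument: a constant common to all alternatives does not affect the probabilities, but it prevents overflow in the exponential.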
4.4.17 Multinomial Logit Model: Inclusive Value
 The expected utility for consumer \(i\) before the preference shocks are drawn under multinomial logit model is given by: \[\begin{equation} \begin{split} &\mathbb{E}\{\max_{j = 0, \cdots, J} v(p_j, x_j, y_i, z_i) + \epsilon_{ij}\} \\ &= \text{ ...after some algebra: leave as an exercise...}\\ &= \ln \Bigg\{\sum_{j = 0}^J \exp[v(p_j, x_j, y_i, z_i)] \Bigg\} + constant. \end{split} \end{equation}\]
 This is sometimes called the inclusive value of the choice set.
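The inclusive value formula can be checked by simulation. The sketch below assumes that the constant is Euler’s constant, which is the mean of the standard type-I extreme value distribution, and draws the shocks as minus the log of a unit exponential variate. The mean utilities are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # mean of the standard type-I extreme value distribution


def inclusive_value(v):
    # Log-sum-exp of the mean utilities, computed stably.
    m = max(v)
    return m + math.log(sum(math.exp(vk - m) for vk in v))


def simulated_expected_max(v, draws, rng):
    total = 0.0
    for _ in range(draws):
        best = -math.inf
        for vk in v:
            e = max(rng.expovariate(1.0), 1e-300)  # Exp(1) draw, guarded against zero
            best = max(best, vk - math.log(e))     # -log(Exp(1)) is type-I extreme value
        total += best
    return total / draws


v = [0.0, 0.5, 1.0]
closed_form = inclusive_value(v) + EULER_GAMMA
simulated = simulated_expected_max(v, 200_000, random.Random(0))
print(closed_form, simulated)
```

The two numbers should agree up to simulation error, which shrinks at rate \(1/\sqrt{R}\) in the number of draws.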
4.4.18 Maximum Likelihood Estimation of Multinomial Logit Model
 Suppose we observe a sequence of income \(y_i\), consumer characteristics \(z_i\), choice \(q_{i}\), product characteristics \(x_j\) and price \(p_j\).
 \(q_i = (q_{i0}, \cdots, q_{iJ})'\) and \(q_{ij} = 1\) if \(j\) is chosen and \(0\) otherwise.
 The parameter of interest is the mean indirect utility function \(v\).
 Then the log likelihood of \(\{q_i\}_{i = 1}^N\) conditional on \(\{y_i, z_i\}_{i = 1}^N\) and \(\{x_j,p_j\}_{j = 1}^J\) is: \[\begin{equation} \begin{split} l(v; q, y, z, p, x) &= \sum_{i = 1}^N \ln \mathbb{P}\{q_i = q(p, x, y_i, z_i, \epsilon_i) \mid p, x, y_i, z_i\}\\ & = \sum_{i = 1}^N \ln \Bigg\{ \prod_{j = 0}^{J} \sigma_{j}(p, x, y_i, z_i)^{q_{ij}} \Bigg\}\\ &= \sum_{i = 1}^N \sum_{j = 0}^J q_{ij} \ln \sigma_{j}(p, x, y_i, z_i). \end{split} \end{equation}\]
 We can estimate the parameters by finding parameters that maximize the log likelihood.
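A minimal sketch of the maximization follows. It assumes a scalar characteristic, homogeneous coefficients \((\beta, \alpha)\), and a single market, so that the data reduce to choice counts per alternative. The counts are set to their expected values under hypothetical true parameters, so the maximizer should reproduce them. Plain gradient ascent is used only for transparency; it is not the method of the text.

```python
import math


def logit_probs(v):
    m = max(v)
    e = [math.exp(vk - m) for vk in v]
    total = sum(e)
    return [ek / total for ek in e]


def mean_utilities(theta, x, p):
    beta, alpha = theta
    # Index 0 is the outside option with mean utility normalized to zero.
    return [0.0] + [beta * xj - alpha * pj for xj, pj in zip(x, p)]


def log_likelihood(theta, counts, x, p):
    # sum_i sum_j q_ij ln sigma_j, with q_ij summed over consumers into counts.
    probs = logit_probs(mean_utilities(theta, x, p))
    return sum(n * math.log(s) for n, s in zip(counts, probs))


def fit_mle(counts, x, p, step=0.3, iters=20_000):
    total = sum(counts)
    shares = [n / total for n in counts]
    beta, alpha = 0.0, 0.0
    for _ in range(iters):
        probs = logit_probs(mean_utilities((beta, alpha), x, p))
        # Gradient of the average log likelihood: sum_j (s_j - sigma_j) dv_j/dtheta.
        g_beta = sum((s - q) * xj for s, q, xj in zip(shares[1:], probs[1:], x))
        g_alpha = sum((s - q) * (-pj) for s, q, pj in zip(shares[1:], probs[1:], p))
        beta += step * g_beta
        alpha += step * g_alpha
    return beta, alpha


x, p = [1.0, 2.0], [1.0, 3.0]
true_theta = (1.0, 0.5)  # hypothetical true parameters
counts = [1000 * s for s in logit_probs(mean_utilities(true_theta, x, p))]
beta_hat, alpha_hat = fit_mle(counts, x, p)
print(beta_hat, alpha_hat, log_likelihood((beta_hat, alpha_hat), counts, x, p))
```

Because the logit log likelihood is concave in linear-in-parameters specifications, a small fixed step converges here; in practice a Newton-type optimizer would be faster.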
4.4.19 Nonlinear Least Square Estimation of Multinomial Logit Model
The multinomial logit model can be estimated by nonlinear least squares as well.
Suppose that the share of product \(j\) among consumers with characteristics \(z\) and income \(y\) was: \[\begin{equation} \sigma_j(p, x, y, z). \end{equation}\]
Note that: \[\begin{equation} \begin{split} \ln \sigma_{j}(p, x, y, z) &= \ln \Bigg\{ \frac{\exp[v(p_j, x_j, y, z) ]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y, z)] } \Bigg\}\\ &= v(p_j, x_j, y, z)  \ln\Bigg\{ \sum_{k = 0}^J \exp[v(p_k, x_k, y, z)] \Bigg\}. \end{split} \end{equation}\]
Moreover, because of the location normalization of the utility function, \[\begin{equation} \sigma_{0}(p, x, y, z) = \frac{1}{\sum_{k = 0}^J \exp[v(p_k, x_k, y, z)] }. \end{equation}\]
Hence, \[\begin{equation} \ln \sigma_{j}(p, x, y, z) - \ln \sigma_{0}(p, x, y, z) = v(p_j, x_j, y, z). \end{equation}\]
The left-hand side variables are observed in the data.
Let \(s_j(y, z)\) be the share of product \(j\) among consumers with characteristics \(z\) and income \(y\) in the data.
This can be calculated from consumer-level data.
More importantly, this approach can be used even when only total sales data for each demographic group are available.
Then, we can estimate the parameter by NLLS such that: \[\begin{equation} \min \sum_{(y, z)} \sum_{j = 1}^J \{\ln[s_{j}(y, z)/s_{0}(y, z)] - v(p_j, x_j, y, z)\}^2. \end{equation}\]
If \(v\) is linear in parameters, this is ordinary least squares: \[\begin{equation} v(p_j, x_j, y, z) = \beta_z' x_j - \alpha_z p_j. \end{equation}\]
\[\begin{equation} \ln[s_{j}(y, z)/s_{0}(y, z)] = \beta_z' x_j - \alpha_z p_j. \end{equation}\]
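The linear case can be sketched as a regression of log share ratios on characteristics and prices. The example below assumes a single demographic group, a scalar characteristic, and shares generated without error from hypothetical true parameters, so that least squares recovers them exactly.

```python
import math


def logit_probs(v):
    m = max(v)
    e = [math.exp(vk - m) for vk in v]
    total = sum(e)
    return [ek / total for ek in e]


def ols_two_regressors(y, r1, r2):
    # Solve the 2x2 normal equations for y = b1 * r1 + b2 * r2 (no intercept).
    a11 = sum(a * a for a in r1)
    a12 = sum(a * b for a, b in zip(r1, r2))
    a22 = sum(b * b for b in r2)
    c1 = sum(a * yy for a, yy in zip(r1, y))
    c2 = sum(b * yy for b, yy in zip(r2, y))
    det = a11 * a22 - a12 * a12
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


x, p = [1.0, 2.0, 3.0], [1.0, 3.0, 2.0]
true_beta, true_alpha = 1.0, 0.5  # hypothetical true parameters
shares = logit_probs([0.0] + [true_beta * xj - true_alpha * pj for xj, pj in zip(x, p)])

# Left-hand side: ln(s_j / s_0) for the inside goods j = 1, ..., J.
lhs = [math.log(s / shares[0]) for s in shares[1:]]
beta_hat, alpha_hat = ols_two_regressors(lhs, x, [-pj for pj in p])
print(beta_hat, alpha_hat)
```

The price enters the regression as \(-p_j\), so the second coefficient is \(\alpha\) itself. With real data the shares contain sampling error and the fit is no longer exact.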
4.4.20 IIA Problem
 The multinomial logit model is intuitive and easy to implement.
 However, the model has several problems.
 The most important one is the independence of irrelevant alternatives (IIA) problem.
 Notice that: \[\begin{equation} \frac{\sigma_j(p, x, y, z)}{\sigma_{k}(p, x, y, z)} = \frac{\exp[v(p_j, x_j, y, z)]}{\exp[v(p_k, x_k, y, z)]}. \end{equation}\]
 The ratio of choice probabilities between two alternatives depends only on the mean indirect utilities of these two alternatives and is independent of irrelevant alternatives (IIA).
 Why is this a problem?
4.4.21 Blue Bus and Red Bus Problem
 Suppose that you can go to a town by bus or by train.
 Half of commuters use a bus and the other half use a train.
 The existing bus was blue. Now, the county introduced a red bus, which is identical to the existing blue bus.
 No one cares about the color of the bus. So the mean indirect utilities of the blue bus and the red bus are equal.
 What is the new share across blue bus, red bus, and train?
 IIA \(\to\) share of blue bus = share of train.
 Buses are identical \(\to\) share of blue bus = share of red bus.
 Therefore, each share has to be 1/3.
 But shouldn’t the train keep half of the share and the two buses split the other half?
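The prediction of the logit model in this example can be reproduced in a few lines. The sketch assumes that all alternatives have the same mean indirect utility, set to zero, which matches the equal initial shares of bus and train.

```python
import math


def logit_probs(v):
    m = max(v)
    e = [math.exp(vk - m) for vk in v]
    total = sum(e)
    return [ek / total for ek in e]


before = logit_probs([0.0, 0.0])       # blue bus, train
after = logit_probs([0.0, 0.0, 0.0])   # blue bus, red bus, train
print(before, after)
```

The train’s share falls from 1/2 to 1/3 even though the new alternative is a perfect copy of the existing bus, which is the counterintuitive implication discussed above.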
4.4.22 Restrictive Price Elasticity
 The IIA property restricts price elasticities in an unfavorable manner.
 This is a serious problem because the main purpose of estimating demand functions is to identify the price elasticity.
 Let \(v(p_j, x_j, y, z) = \beta_z'x_j - \alpha_z p_j\). Then, the elasticity of the choice probability of good \(j\) with respect to the price of good \(k\), \(e_{jk} \equiv \partial \ln \sigma_j / \partial \ln p_k\), is: \[\begin{equation} e_{jk} = \begin{cases} -\alpha_z p_{j} (1 - \sigma_j(p, x, y, z)) &\text{ if } k = j\\ \alpha_z p_{k} \sigma_k(p, x, y, z) &\text{ if } k \neq j. \end{cases} \end{equation}\]
 The price elasticity is completely determined by the existing choice probabilities of the relevant alternatives.
 Suppose that there are Coca-Cola, Pepsi, and coffee.
 The shares were 1/2, 1/6, 1/3, respectively.
 Suppose that the price of Coca-Cola increased.
 We expect that Coca-Cola consumers would mostly switch to Pepsi, because Pepsi is more similar to Coca-Cola than coffee is.
 However, according to the previous result, twice as many consumers substitute to coffee as to Pepsi.
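The substitution pattern in this example can be verified numerically. The sketch below assumes three alternatives without an outside option, equal prices of one, a hypothetical price coefficient \(\alpha = 1\), and intercepts chosen to reproduce the shares 1/2, 1/6, and 1/3.

```python
import math


def logit_probs(v):
    m = max(v)
    e = [math.exp(vk - m) for vk in v]
    total = sum(e)
    return [ek / total for ek in e]


ALPHA = 1.0  # assumed price coefficient
prices = [1.0, 1.0, 1.0]  # Coca-Cola, Pepsi, coffee
# Intercepts chosen so that the baseline shares are 1/2, 1/6, and 1/3.
delta = [math.log(s) + ALPHA * pk for s, pk in zip([1 / 2, 1 / 6, 1 / 3], prices)]


def shares(p):
    return logit_probs([d - ALPHA * pk for d, pk in zip(delta, p)])


base = shares(prices)
bumped = shares([1.1, 1.0, 1.0])  # Coca-Cola becomes more expensive
gain_pepsi = bumped[1] - base[1]
gain_coffee = bumped[2] - base[2]


def cross_elasticity(j, k, h=1e-6):
    # d ln sigma_j / d ln p_k by a central difference.
    up = list(prices)
    dn = list(prices)
    up[k] += h
    dn[k] -= h
    return (math.log(shares(up)[j]) - math.log(shares(dn)[j])) / (2 * h) * prices[k]


print(gain_coffee / gain_pepsi, cross_elasticity(1, 0), cross_elasticity(2, 0))
```

Both cross elasticities with respect to the Coca-Cola price equal \(\alpha p_k \sigma_k\), so the gains of Pepsi and coffee are proportional to their initial shares regardless of how similar they are to Coca-Cola.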
4.4.23 Monotonic Inclusive Value
 Suppose that there is a good whose mean indirect utility is \(v\).
 The inclusive value for this choice set is \(\ln[1 + \exp(v)]\).
 Suppose that we put \(J\) identical goods on the shelf and the consumer can choose any of them.
 The inclusive value is \(\ln[1 + J \exp(v)]\).
 We just added the same goods. But the expected utility of consumer increases monotonically in the number of alternatives.
4.4.24 The Source of the Problem
 The source of the problem is that there is no correlation in the preference shock across products.
 When the preference shock to Coca-Cola is high, the preference shock to Pepsi should be high, while the preference shock to coffee should be relatively independent.
 Because the expected value of the maximum of the preference shocks increases with the number of alternatives, the inclusive value is increasing in the number of alternatives.
 However, the preference shocks should be the same for the same good. Then, the expected value of the maximum of the preference shocks should not increase even if we add the same products to the shelf.
4.4.25 Correlation in Preference Shocks
 Therefore, the preference shocks of two alternatives should be more correlated when the alternatives are closer in the characteristics space.
 So we have to allow the covariance matrix of the preference shock to be free parameters.
 If we allow flexible covariance matrix, the curse of dimensionality in the number of alternatives comes back: The dimensionality of the covariance matrix is \(J^2\).
 Another way is to remove \(\epsilon_{ij}\): this is called a pure characteristics model (Berry & Pakes, 2007).
 But the pure characteristics model is computationally not straightforward.
 We explore the way of introducing mild correlation across similar products in the preference shocks.
4.4.26 Observed and Unobserved Consumer Heterogeneity
 Consider beverage demand and let \(x_j = \text{carbonated}_j\) and \(z_i = \text{teenager}_i\).
 Suppose that the mean indirect utility is: \[\begin{equation} v(p_j, x_j, y_i, z_i) = \beta_i (\text{carbonated})_j - \alpha_i p_j, \end{equation}\]
\[\begin{equation} \beta_i = 0.1 + 0.2 \cdot (\text{teenager})_i. \end{equation}\]
The mean utility of a carbonated drink for a teenager is 0.3 but only 0.1 for others.
If Coca-Cola were not available, teenagers would substitute more to Pepsi than non-teenagers would.
IIA holds at the market-segment level but not at the market level.
How can we avoid IIA at the market-segment level? Introduce unobserved consumer heterogeneity.
Suppose that the coefficient is: \[\begin{equation} \beta_i = 0.1 + 0.2 \cdot (\text{teenager})_i + \nu_i. \end{equation}\]
Consumers with high \(\nu_i\) value carbonated drinks more than those with low \(\nu_i\).
If Coca-Cola were not available, consumers with high \(\nu_i\) would substitute more to Pepsi than those with low \(\nu_i\).
IIA holds at the market-segment-\(\nu\) level but not at the market-segment level.
In the above example, “\(0.2 \cdot (\text{teenager})_i\)” captures the consumer heterogeneity by observed characteristics and “\(\nu_i\)” by unobserved characteristics.
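The distinction between the segment level and the market level can be illustrated with the coefficients of the example. The sketch assumes equal prices, so that the price terms cancel, and a hypothetical market made of teenagers and non-teenagers in equal proportions.

```python
import math


def logit_probs(v):
    m = max(v)
    e = [math.exp(vk - m) for vk in v]
    total = sum(e)
    return [ek / total for ek in e]


def segment_shares(beta, carbonated):
    # Equal prices are assumed, so only beta * carbonated_j matters.
    return logit_probs([beta * c for c in carbonated])


BETAS = {"teen": 0.3, "other": 0.1}     # coefficients from the example
WEIGHTS = {"teen": 0.5, "other": 0.5}   # assumed population weights


def market_shares(carbonated):
    seg = {g: segment_shares(b, carbonated) for g, b in BETAS.items()}
    return [sum(WEIGHTS[g] * seg[g][j] for g in BETAS) for j in range(len(carbonated))]


with_coke = [1, 1, 0]   # Coca-Cola, Pepsi, coffee
without_coke = [1, 0]   # Pepsi, coffee

teen_before = segment_shares(0.3, with_coke)
teen_after = segment_shares(0.3, without_coke)
segment_ratio_before = teen_before[1] / teen_before[2]
segment_ratio_after = teen_after[0] / teen_after[1]

mkt_before = market_shares(with_coke)
mkt_after = market_shares(without_coke)
market_ratio_before = mkt_before[1] / mkt_before[2]
market_ratio_after = mkt_after[0] / mkt_after[1]
print(segment_ratio_before, segment_ratio_after, market_ratio_before, market_ratio_after)
```

Within the teenager segment the Pepsi-to-coffee ratio is unchanged when Coca-Cola is removed, while at the market level the ratio rises. The change is small here only because the taste gap between the two segments is small.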
4.4.27 Mixed Logit Model
 Suppose that the mean indirect utility is: \[\begin{equation} v(p_j, x_j, y_i, z_i, \beta_i, \alpha_i) = \beta_i' x_j - \alpha_i p_j, \end{equation}\] with \[\begin{equation} (\beta_i, \alpha_i) \sim f(\beta_i, \alpha_i \mid y_i, z_i). \end{equation}\]
 If \(\epsilon_{ij}\) is drawn i.i.d. from the type-I extreme value distribution, the choice probability of good \(j\) by consumer \(i\) conditional on \(p, x, y_i, z_i\) is: \[\begin{equation} \sigma_{j}(p, x, y_i, z_i) = \int_{\beta_i, \alpha_i} \frac{\exp[v(p_j, x_j, y_i, z_i, \beta_i, \alpha_i)]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y_i, z_i, \beta_i, \alpha_i)]} f(\beta_i, \alpha_i \mid y_i, z_i) d\beta_i d\alpha_i. \end{equation}\]
 This is called the mixed logit model.
 If the distribution of \(\epsilon_{ij}\) is different, it is no longer mixed logit.
 Conditional on \((\beta_i, \alpha_i)\), the choice probability takes the same form as in the multinomial logit model.
 \(\beta_i, \alpha_i\) are integrated out because the econometrician does not observe them.
4.4.28 Mixed Logit Model : Parametric Assumptions
 It is often assumed that: \[\begin{equation} v(p_j, x_j, y_i, z_i, \beta_i, \alpha_i) = \beta_i' x_j - \alpha_i p_j. \end{equation}\]
 McFadden & Train (2000) showed that any discrete choice model that is consistent with random utility maximization can be approximated arbitrarily closely by this class of mixed logit models.
 The distribution of \(\beta_i\) and \(\alpha_i\) is often assumed to be: \[\begin{equation} \begin{split} &\beta_i = \beta_0 + \Gamma z_i + \Sigma \nu_i,\\ &\alpha_i = \alpha_0 + \pi' z_i + \omega \upsilon_i, \end{split} \end{equation}\] where \(\nu_i\) and \(\upsilon_i\) are i.i.d. standard normal random vectors.
4.4.29 Mixed Logit Model: IIA
 There is no IIA at the market-segment level: \[\begin{equation} \frac{\sigma_{j}(p, x, y_i, z_i)}{\sigma_{l}(p, x, y_i, z_i)} = \frac{\int_{\beta_i, \alpha_i} \frac{\exp[v(p_j, x_j, y_i, z_i, \beta_i, \alpha_i)]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y_i, z_i, \beta_i, \alpha_i)]} f(\beta_i, \alpha_i \mid y_i, z_i) d\beta_i d\alpha_i}{\int_{\beta_i, \alpha_i} \frac{\exp[v(p_l, x_l, y_i, z_i, \beta_i, \alpha_i)]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y_i, z_i, \beta_i, \alpha_i)]} f(\beta_i, \alpha_i \mid y_i, z_i) d\beta_i d\alpha_i}. \end{equation}\]
 The share ratio depends on the price and characteristics of all the other products.
4.4.30 Mixed Logit Model: Price Elasticities
 Let: \[\begin{equation} v(p_j, x_j, y_i, z_i, \beta_i, \alpha_i) = \beta_i' x_j - \alpha_i p_j. \end{equation}\]
 The price elasticities of the choice probabilities conditional on \(p, x, y_i, z_i\) are: \[\begin{equation} e_{jk} = \begin{cases} -\frac{p_j}{\sigma_j} \int \alpha_i \sigma_{ij}(1 - \sigma_{ij})f(\beta_i, \alpha_i \mid y_i, z_i) d\beta_i d\alpha_i &\text{ if } j = k\\ \frac{p_k}{\sigma_j} \int \alpha_i \sigma_{ij} \sigma_{ik} f(\beta_i, \alpha_i \mid y_i, z_i) d\beta_i d\alpha_i &\text{ otherwise}, \end{cases} \end{equation}\] where \[\begin{equation} \sigma_{ij} = \frac{\exp(\beta_i'x_j - \alpha_i p_j)}{\sum_{k = 0}^J \exp(\beta_i'x_k - \alpha_i p_k)}. \end{equation}\]
 The price elasticity depends on the density of unobserved consumer types.
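 The elasticity formula can be approximated by replacing the integral with an average over simulated consumer types. The following is a minimal Python sketch under stated assumptions: the function name `elasticities` and the type draws `beta`, `alpha` are hypothetical, the outside good has utility normalized to zero, and consumer characteristics \(y_i, z_i\) are suppressed.

```python
import numpy as np

def elasticities(x, p, beta, alpha):
    """Simulated price elasticities e[j, k] of sigma_j with respect to p_k.

    x: (J, K) characteristics, p: (J,) prices of the inside goods.
    beta: (R, K), alpha: (R,) draws of consumer types (assumed given).
    """
    v = beta @ x.T - alpha[:, None] * p[None, :]          # (R, J) utilities
    ev = np.exp(v)
    sig_i = ev / (1.0 + ev.sum(axis=1, keepdims=True))    # sigma_ij for each draw
    sig = sig_i.mean(axis=0)                              # sigma_j
    # Cross derivatives: average of alpha_i * sigma_ij * sigma_ik.
    dsig = (alpha[:, None, None] * sig_i[:, :, None] * sig_i[:, None, :]).mean(axis=0)
    # Own derivatives: -alpha_i * sigma_ij * (1 - sigma_ij), so subtract alpha_i * sigma_ij.
    dsig -= np.diag((alpha[:, None] * sig_i).mean(axis=0))
    return dsig * p[None, :] / sig[:, None]

x = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 1.0]])
p = np.array([1.0, 2.0, 1.5])
# One consumer type: the model collapses to the multinomial logit.
e_logit = elasticities(x, p, np.array([[0.5, 1.0]]), np.array([1.0]))
# Two consumer types that differ in price sensitivity.
e_mixed = elasticities(x, p, np.array([[0.5, 1.0], [0.5, 1.0]]), np.array([0.5, 3.0]))
```

 In the one-type case every off-diagonal entry within a column of `e_logit` is identical, which is the IIA pattern. With two types the cross elasticities in `e_mixed` differ across rows.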
4.4.31 Simulated Maximum Likelihood Estimation of the Mixed Logit Model
 The choice probability of the mixed logit model is an integral of the multinomial logit choice probability.
 This integral has no closed form in general.
 We can use simulation to evaluate the choice probability:
 Draw \(R\) values of \(\beta\) and \(\alpha\), \(\{\beta^r, \alpha^r \}_{r = 1}^R\).
 Compute the multinomial choice probabilities associated with \((\beta^r, \alpha^r)\) for each \(r = 1, \cdots, R\).
 Approximate the choice probability with the mean of the simulated multinomial choice share: \[\begin{equation} \sigma_{j}(p, x, y_i, z_i) \approx \hat{\sigma}_{j}(p, x, y_i, z_i) \equiv \frac{1}{R} \sum_{r = 1}^R \frac{\exp[v(p_j, x_j, y_i, z_i, \beta^r, \alpha^r)]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y_i, z_i, \beta^r, \alpha^r)]}. \end{equation}\]
 This is a form of numerical integration called Monte Carlo integration.
 Another approach is to use quadrature. See Judd (1998) for reference.
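 The three simulation steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name `simulated_choice_prob` is hypothetical, consumer characteristics are dropped (\(\Gamma = 0\), \(\pi = 0\)), \(v = \beta' x_j - \alpha p_j\), and the outside good has utility normalized to zero.

```python
import numpy as np

def simulated_choice_prob(x, p, beta0, alpha0, Sigma, omega, R=1000, seed=0):
    """Monte Carlo approximation of mixed-logit choice probabilities.

    x: (J, K) characteristics and p: (J,) prices of the inside goods.
    Returns a (J + 1,) vector with the outside good in position 0.
    """
    rng = np.random.default_rng(seed)
    K = x.shape[1]
    # Step 1: draw R values of (beta, alpha).
    nu = rng.standard_normal((R, K))
    ups = rng.standard_normal(R)
    beta = beta0 + nu @ Sigma.T                      # beta^r = beta0 + Sigma nu^r
    alpha = alpha0 + omega * ups                     # alpha^r = alpha0 + omega ups^r
    # Step 2: multinomial logit probabilities for each draw.
    v = beta @ x.T - alpha[:, None] * p[None, :]     # (R, J)
    v = np.column_stack([np.zeros(R), v])            # prepend the outside good
    v -= v.max(axis=1, keepdims=True)                # guard against overflow
    prob_r = np.exp(v)
    prob_r /= prob_r.sum(axis=1, keepdims=True)
    # Step 3: average over draws.
    return prob_r.mean(axis=0)

x = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 1.0]])
p = np.array([1.0, 2.0, 1.5])
sigma_hat = simulated_choice_prob(x, p, beta0=np.array([0.5, 1.0]), alpha0=1.0,
                                  Sigma=np.diag([0.0, 0.5]), omega=0.2)
```

 Setting `Sigma` and `omega` to zero removes the heterogeneity, so the output coincides with the multinomial logit probabilities whatever the draws are.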
4.4.32 Simulated Maximum Likelihood Estimation of the Mixed Logit Model
There are \(t = 1, \cdots, T\) markets and there are \(i = 1, \cdots, N\) consumers in each market.
Let \(\mathcal{J}_t\) be the set of products that are available in market \(t\).
Suppose that we observe income \(y_{it}\), characteristics \(z_{it}\), and choice \(q_{it}\) for each consumer in a market.
Suppose that we observe product characteristics \(x_{jt}\) and price \(p_{jt}\) of each product in each market.
The simulated conditional log likelihood is: \[\begin{equation} \begin{split} &\sum_{i = 1}^N \sum_{t = 1}^T \ln \mathbb{P}\{q_{it} = q(p_t, x_t, y_{it}, z_{it}) \mid p_t, x_t, y_{it}, z_{it}\} \\ &\approx \sum_{i = 1}^N \sum_{t = 1}^T \ln \Bigg\{ \prod_{j \in \mathcal{J}_t \cup \{0\}} \hat{\sigma}_{j}(p_t, x_t, y_{it}, z_{it})^{q_{itj}} \Bigg\}, \end{split} \end{equation}\] where \(q_{itj}\) equals 1 if consumer \(i\) in market \(t\) chooses product \(j\) and 0 otherwise.
We find parameters that maximize the simulated conditional log likelihood.
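 A minimal Python sketch of the simulated log likelihood follows. It is an illustration under assumptions that are not in the text: the names `logit_probs` and `simulated_loglik` are hypothetical, \(\Sigma\) is diagonal, consumer characteristics are suppressed so that all consumers in a market share the same simulated probability, and the outside good has utility zero. The standard normal draws are passed in and held fixed across parameter values, so the objective is a deterministic function of the parameters.

```python
import numpy as np

def logit_probs(v):
    """Row-wise logit probabilities with an outside good of utility zero."""
    v = np.column_stack([np.zeros(v.shape[0]), v])
    v = v - v.max(axis=1, keepdims=True)
    ev = np.exp(v)
    return ev / ev.sum(axis=1, keepdims=True)

def simulated_loglik(theta, markets, nu, ups):
    """theta = (beta0, alpha0, diag(Sigma), omega).

    markets: list of (x_t, p_t, q_t), where q_t holds chosen indices in {0, ..., J_t}.
    nu: (R, K) and ups: (R,) are standard normal draws fixed across theta.
    """
    K = nu.shape[1]
    beta0, alpha0 = theta[:K], theta[K]
    sd_beta, omega = theta[K + 1:2 * K + 1], theta[2 * K + 1]
    beta = beta0 + nu * sd_beta                # diagonal Sigma (assumption)
    alpha = alpha0 + omega * ups
    ll = 0.0
    for x_t, p_t, q_t in markets:
        v = beta @ x_t.T - alpha[:, None] * p_t[None, :]
        sigma_hat = logit_probs(v).mean(axis=0)    # (J_t + 1,) simulated probabilities
        ll += np.log(sigma_hat[q_t]).sum()         # log probability of each observed choice
    return ll

rng = np.random.default_rng(0)
nu, ups = rng.standard_normal((200, 2)), rng.standard_normal(200)
x_t = np.array([[1.0, 0.5], [1.0, 1.5]])
p_t = np.array([1.0, 2.0])
q_t = np.array([0, 1, 2, 1, 1])                    # choices of five consumers
theta = np.array([0.5, 1.0, 1.0, 0.0, 0.3, 0.1])
ll = simulated_loglik(theta, [(x_t, p_t, q_t)], nu, ups)
```

 The negative of this function can be handed to a numerical optimizer to obtain the simulated maximum likelihood estimate.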
4.4.33 Simulated Nonlinear Least Squares Estimation of the Mixed Logit Model
 Suppose that we only know the sales or share at the market-segment level.
 That is, we only observe the share of product \(j\) in market \(t\) among consumers of characteristics \(z\) and income \(y\), \(s_{jt}(y, z)\).
 Then we can estimate the parameter by: \[\begin{equation} \min \sum_{t = 1}^T \sum_{j \in \mathcal{J}_t \cup \{0\}} \sum_{(y, z) \in \mathcal{Y} \times \mathcal{Z}} \{s_{jt}(y, z) - \hat{\sigma}_{j}(p_t, x_t, y, z)\}^2. \end{equation}\]
4.4.34 Nested Logit Model: A Special Case of Mixed Logit Model
 Let \(w_{j1}, \cdots, w_{jG}\) be the indicators of product category, i.e., \(w_{jg}\) takes value 1 if good \(j\) belongs to category \(g\) and 0 otherwise.
 e.g., car category = {Sports, Luxury, Large, Midsize, Small}.
 We have: \[\begin{equation} v(p, x_j, y_i, z_i) = \beta'x_j - \alpha_i p_j + \sum_{g = 1}^G \zeta_{ig} w_{jg} + \epsilon_{ij}. \end{equation}\]
 If \(\zeta_{ig}\) takes a high value, the consumer attaches a higher value to the category.
 When a product in category \(g\) is not available, consumers with high \(\zeta_{ig}\) will substitute more to the other products in the same category than consumers with low \(\zeta_{ig}\).
4.4.35 Nested Logit Model: Distributional Assumption
 Let \[\begin{equation} \varepsilon_{ij} \equiv \sum_{g = 1}^G \zeta_{ig} w_{jg} + \epsilon_{ij}. \end{equation}\]
 Under certain distributional assumptions on \(\zeta_{ig}\) and \(\epsilon_{ij}\), the term \(\varepsilon_{ij}\) has a cumulative distribution (Cardell, 1997): \[\begin{equation} F(\varepsilon_i) = \exp\Bigg\{ -\sum_{g = 1}^G \Bigg(\sum_{j \in \text{ category } g} \exp[-\varepsilon_{ij}/\lambda_g] \Bigg)^{\lambda_g} \Bigg\}. \end{equation}\]
4.4.36 Nested Logit Model: Choice Probability
 Under this distributional assumption, the choice probability is: \[\begin{equation} \sigma_{j}(p, x, y_i, z_i) = \frac{\exp[v(p, x_j, y_i, z_i)/\lambda_g] \Bigg(\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]\Bigg)^{\lambda_g - 1}}{\sum_{g = 1}^G \Bigg(\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]\Bigg)^{\lambda_g}}, \end{equation}\] if good \(j\) belongs to category \(g\).
 A higher \(\lambda_g \in (0, 1]\) implies lower correlation within category \(g\).
 \(\lambda_g = 1\) for all \(g\) coincides with the multinomial logit model.
4.4.37 Nested Logit Model: Decomposition of the Choice Probability
 The choice probability can be decomposed into two parts: \[\begin{equation} \sigma_{j}(p, x, y_i, z_i) = \frac{\exp[v(p, x_j, y_i, z_i)/\lambda_g]}{\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]} \frac{\Bigg(\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]\Bigg)^{\lambda_g}}{\sum_{g = 1}^G \Bigg(\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]\Bigg)^{\lambda_g}}. \end{equation}\]
 Letting: \[ I_{g}(p, x, y_i, z_i) \equiv \log \sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g], \] we have: \[\begin{equation} \sigma_{j}(p, x, y_i, z_i) = \frac{\exp[v(p, x_j, y_i, z_i)/\lambda_g]}{\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]} \frac{\exp[\lambda_g I_{g}(p, x, y_i, z_i)]}{\sum_{g = 1}^G \exp[\lambda_g I_{g}(p, x, y_i, z_i)]}. \end{equation}\]
 The first term can be interpreted as the probability of choosing product \(j\) conditional on choosing category \(g\) and the second term as the probability of choosing category \(g\).
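 The decomposition can be checked numerically. The sketch below is a minimal Python illustration under assumptions: the function name `nested_logit_probs` is hypothetical, \(v\) denotes the deterministic part of utility, categories are coded as integers \(0, \cdots, G - 1\), and any outside good would have to be assigned to a category of its own.

```python
import numpy as np

def nested_logit_probs(v, nest, lam):
    """Nested logit choice probabilities through the two-part decomposition.

    v: (J,) deterministic utilities, nest: (J,) integer category labels,
    lam: (G,) nesting parameters in (0, 1].
    Returns (choice probability, P(j | g), P(g)).
    """
    G = len(lam)
    scaled = np.exp(v / lam[nest])                               # exp(v_j / lambda_g)
    nest_sum = np.array([scaled[nest == g].sum() for g in range(G)])
    within = scaled / nest_sum[nest]                             # P(j | g)
    incl = np.log(nest_sum)                                      # inclusive value I_g
    cat = np.exp(lam * incl) / np.exp(lam * incl).sum()          # P(g)
    return within * cat[nest], within, cat

v = np.array([1.0, 0.5, 0.2, -0.3])
nest = np.array([0, 0, 1, 1])
prob, within, cat = nested_logit_probs(v, nest, np.array([0.5, 1.0]))
# With lambda_g = 1 for all g the probabilities reduce to the multinomial logit.
prob_logit, _, _ = nested_logit_probs(v, nest, np.array([1.0, 1.0]))
```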
4.4.38 Discrete Choice Model with Unobserved Fixed Effects
 We have assumed that good \(j\) is characterized by a vector of observed characteristics \(x_j\).
 Can the econometrician observe all the relevant characteristics of the products in the choice set? Maybe not. For example, the econometrician may not observe brand values that are created by advertisement and recognized by consumers.
 Such unobserved product characteristics are likely to be correlated with the price.
 This can cause endogeneity problems.
 In the following, we consider the situation where only market-segment level share data is available.
 Because we can construct the market-share level data from individual choice level data, all the arguments should go through with the individual choice level data.
4.4.39 Unobserved Fixed Effects in Multinomial Logit Model
 To fix ideas, let’s revisit the multinomial logit model.
 For now, we do not consider either observed or unobserved consumer heterogeneity.
 Including observed heterogeneity is straightforward.
 We discuss how to include unobserved heterogeneity in the subsequent sections.
 Suppose that the indirect utility function of good \(j\) for consumer \(i\) in market \(t\) is: \[\begin{equation} \beta' x_{jt} - \alpha p_{jt} + \xi_{jt} + \epsilon_{ijt}, \end{equation}\]
 where \(\epsilon_{ijt}\) is i.i.d. type-I extreme value.
 \(\xi_{jt}\) is the unobserved product-market-specific fixed effect of product \(j\) in market \(t\), which can be correlated with \(p_{jt}\).
 We maintain the assumption that \(x_{jt}\) is uncorrelated with \(\xi_{jt}\).
 The choice probability of good \(j\) for this consumer and hence the choice share in this market is:
\[\begin{equation} \sigma_j(p_t, x_t, \xi_t) = \frac{\exp(\beta' x_{jt} - \alpha p_{jt} + \xi_{jt})}{1 + \sum_{k = 1}^J\exp(\beta' x_{kt} - \alpha p_{kt} + \xi_{kt})}. \end{equation}\]
 How do we deal with the endogeneity between \(p_{jt}\) and \(\xi_{jt}\)?
4.4.40 Instrumental Variables and Inversion
 Suppose that we have a vector of instrumental variables \(w_{jt}\) such that: \[\begin{equation} \mathbb{E}\{\xi_{jt} \mid w_{jt}\} = 0. \end{equation}\]
 In a linear model, we invert the model for the unobserved fixed effects: \[\begin{equation} \xi_{jt} = y_{jt} - \beta'x_{jt}. \end{equation}\]
 Notice that the unobserved fixed effect is written as a function of parameters and data.
 Then we exploit the moment condition by: \[\begin{equation} \begin{split} &\mathbb{E}\{\xi_{jt} \mid w_{jt}\} = 0,\\ &\Rightarrow \mathbb{E}\{ \xi_{jt} w_{jt}\} = 0,\\ &\Leftrightarrow \mathbb{E}\{(y_{jt} - \beta'x_{jt}) w_{jt} \} = 0. \end{split} \end{equation}\]
 We can estimate \(\beta\) by finding the value that makes the sample analogue of the above expectation zero.
4.4.41 Inversion in Multinomial Logit Model
 Can we invert the multinomial logit model for \(\xi_{jt}\)?
 We have: \[\begin{equation} \begin{split} &\ln [\sigma_{j}(p_t, x_t, \xi_t) / \sigma_{0}(p_t, x_t, \xi_t)] = \beta' x_{jt} - \alpha p_{jt} + \xi_{jt}\\ &\Leftrightarrow \xi_{jt} = \ln [\sigma_j(p_t, x_t, \xi_t) / \sigma_0(p_t, x_t, \xi_t)] - [\beta' x_{jt} - \alpha p_{jt}]. \end{split} \end{equation}\]
 Therefore, the moment condition can be written as: \[\begin{equation} \begin{split} &\mathbb{E}\{\xi_{jt} \mid w_{jt}\} = 0,\\ &\Rightarrow \mathbb{E}\{\xi_{jt} w_{jt}\} = 0,\\ &\Leftrightarrow \mathbb{E}\{(\ln [\sigma_{j}(p_t, x_t, \xi_t) / \sigma_{0}(p_t, x_t, \xi_t)] - [\beta' x_{jt} - \alpha p_{jt}]) w_{jt} \} = 0. \end{split} \end{equation}\]
 We can evaluate the sample analogue of the expectation by replacing the theoretical choice probability \(\sigma\) with the observed share \(s\).
 In the end, it is no different from the linear model where the dependent variable is \(\ln (s_{jt}/s_{0t})\).
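 The inversion turns estimation into a linear IV regression of \(\ln (s_{jt}/s_{0t})\) on \(x_{jt}\) and \(-p_{jt}\). The following Python sketch illustrates this with two-stage least squares on simulated data. The function names, the data-generating process and the cost shifter `c` are hypothetical and chosen only for illustration.

```python
import numpy as np

def logit_shares(delta):
    """Inside-good shares in one market given mean utilities delta (J,)."""
    e = np.exp(delta)
    return e / (1.0 + e.sum())

def iv_logit(s, x, p, w):
    """2SLS on ln(s_j / s_0) = beta'x_j - alpha p_j + xi_j.

    s: list of (J,) share vectors by market; x, p, w are stacked across markets.
    w: (n, L) instruments, containing x and the excluded instruments.
    """
    y = np.concatenate([np.log(s_t) - np.log(1.0 - s_t.sum()) for s_t in s])
    X = np.column_stack([x, -p])                      # last coefficient is alpha
    Pw_X = w @ np.linalg.solve(w.T @ w, w.T @ X)      # first-stage fitted values
    theta = np.linalg.solve(Pw_X.T @ X, Pw_X.T @ y)
    return theta, y - X @ theta                       # estimates and implied xi

rng = np.random.default_rng(0)
T, J = 50, 3
beta, alpha = np.array([1.0, 0.5]), 2.0
x = np.column_stack([np.ones(T * J), rng.uniform(size=T * J)])
c = rng.uniform(size=T * J)                           # hypothetical cost shifter
xi = 0.3 * rng.standard_normal(T * J)
p = 0.5 + c + 0.5 * xi                                # price is correlated with xi
delta = x @ beta - alpha * p + xi
s = [logit_shares(d) for d in delta.reshape(T, J)]
theta_hat, xi_hat = iv_logit(s, x, p, np.column_stack([x, c]))
```

 Here the model is just identified, so the sample moment \(\sum_{j,t} w_{jt} \xi_{jt}(\hat{\theta})\) is set exactly to zero.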
4.4.42 Market-invariant Product-specific Fixed Effects
 Furthermore, if you can assume \(\xi_{jt} = \xi_j\), then \[\begin{equation} \ln [\sigma_j(p_t, x_t, \xi_t) / \sigma_0(p_t, x_t, \xi_t)] = \beta' x_{jt} - \alpha p_{jt} + \xi_{j}. \end{equation}\]
 This is nothing but a linear regression on \(x_{jt}\) and \(p_{jt}\) with product-specific unobserved fixed effects.
 This can be estimated by a within estimator.
 This specification is a good starting point: it is better to start with the simplest specification and use its estimates as the initial guess for the subsequent specifications.
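 A within estimator for this specification can be sketched in a few lines of Python. The function name `within_estimator` and the simulated data are hypothetical. The sketch assumes that all regressors vary across markets within a product, since market-invariant characteristics are absorbed by the fixed effect.

```python
import numpy as np

def within_estimator(y, X, product):
    """OLS after subtracting product-level means from y and X.

    y: (n,) dependent variable ln(s_jt / s_0t), X: (n, K) regressors,
    product: (n,) integer product ids.
    """
    y_w, X_w = y.astype(float).copy(), X.astype(float).copy()
    for j in np.unique(product):
        m = product == j
        y_w[m] -= y_w[m].mean()              # remove the product mean of y
        X_w[m] -= X_w[m].mean(axis=0)        # remove the product means of X
    coef, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return coef

rng = np.random.default_rng(0)
J, T = 4, 10
product = np.repeat(np.arange(J), T)
# Columns are (x_jt, -p_jt), so the coefficients are (beta, alpha).
X = np.column_stack([rng.uniform(size=J * T), -rng.uniform(1.0, 2.0, size=J * T)])
xi_j = np.array([0.3, -0.2, 0.5, 0.0])
y = X @ np.array([0.5, 2.0]) + xi_j[product]     # noise-free for illustration
coef = within_estimator(y, X, product)
```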
4.4.43 Unobserved Consumer Heterogeneity and Unobserved Fixed Effects in Mixed-logit Model
 So far we abstracted away from the unobserved consumer heterogeneity.
 Next, suppose that the indirect utility function of good \(j\) for consumer \(i\) in market \(t\) is: \[\begin{equation} \beta_{it}' x_{jt} - \alpha_{it} p_{jt} + \xi_{jt} + \epsilon_{ijt}, \end{equation}\] where \(\epsilon_{ijt}\) is i.i.d. type-I extreme value.
 The coefficients are drawn according to: \[\begin{equation} \begin{split} &\beta_{it} = \beta_0 + \Sigma \nu_{it},\\ &\alpha_{it} = \alpha_0 + \Omega \upsilon_{it}, \end{split} \end{equation}\]
 where \(\nu_{it}\) and \(\upsilon_{it}\) are i.i.d. standard normal random vectors.
 Then the indirect utility of good \(j\) for consumer \(i\) in market \(t\) is written as: \[\begin{equation} \underbrace{\beta_0' x_{jt} - \alpha_0 p_{jt} + \xi_{jt}}_{\text{(conditional) mean}} + \underbrace{\nu_{it}' \Sigma x_{jt} - \upsilon_{it}' \Omega p_{jt}}_{\text{deviation from the mean}} \end{equation}\]
 We refer to \(\beta_0, \alpha_0\) as linear parameters and \(\Sigma, \Omega\) as nonlinear parameters, for a reason explained in a subsequent section.
 Let \(\theta_1\) be the linear parameters and \(\theta_2\) the nonlinear parameters and let \(\theta = (\theta_1', \theta_2')'\).
4.4.44 Unobserved Fixed Effects in Mixed-logit Model
 The choice share of good \(j\) in market \(t\) is: \[\begin{equation} \begin{split} &\sigma_{j}(p_t, x_t, \xi_t; \theta)\\ &= \int \frac{\exp[\beta_0' x_{jt} - \alpha_0 p_{jt} + \xi_{jt} + \nu_{it}' \Sigma x_{jt} - \upsilon_{it}' \Omega p_{jt}]}{1 + \sum_{k \in \mathcal{J}_t} \exp[\beta_0' x_{kt} - \alpha_0 p_{kt} + \xi_{kt} + \nu_{it}' \Sigma x_{kt} - \upsilon_{it}' \Omega p_{kt}]} f(\nu, \upsilon) d \nu d \upsilon. \end{split} \end{equation}\]
 How can we represent \(\xi_{jt}\) as a function of parameters of interest to exploit the moment condition?
4.4.45 Representing \(\xi_{jt}\) as a Function of Parameters of Interest
 Let \(s_{jt}\) be the share of product \(j\) in market \(t\).
 The following system of equations implicitly determines \(\xi_{jt}\) as a function of parameters of interest: \[\begin{equation} s_{jt} = \sigma_j(p_t, x_t, \xi_t; \theta). \end{equation}\]
 Let \(\xi_{jt}(\theta)\) be the solution to the system of equations above given parameter \(\theta\).
 If it exists, it is the unobserved fixed effect as a function of parameters and data.
 Does this solution exist?
 Is it unique?
 Is there an efficient method to find the solution?
4.4.46 Summarizing the Conditional Mean Term
 Now, let \(\delta_{jt}\) be the conditional mean term in the indirect utility: \[\begin{equation} \delta_{jt} \equiv \beta_0' x_{jt} - \alpha_0 p_{jt} + \xi_{jt}. \end{equation}\]
 I call it the average utility of the product in the market.
 Then, the choice share of product \(j\) in market \(t\) is written as: \[\begin{equation} \begin{split} &\sigma_{jt}(\delta_t, \theta_2) \\ &\equiv \int \frac{\exp\Bigg(\delta_{jt} + \nu' \Sigma x_{jt} - \upsilon' \Omega p_{jt}\Bigg)}{1 + \sum_{k \in \mathcal{J}_t} \exp\Bigg(\delta_{kt} + \nu' \Sigma x_{kt} - \upsilon' \Omega p_{kt}\Bigg)} f(\nu, \upsilon) d\nu d\upsilon, \end{split} \end{equation}\] for \(j = 1, \cdots, J, t = 1, \cdots, T\).
4.4.47 Contraction Mapping for \(\delta_t\)
 Now, fix \(\theta_2\) and define an operator \(T_t\) such that: \[\begin{equation} T_t(\delta_t) = \delta_t + \ln \underbrace{s_{t}}_{\text{data}} - \ln \underbrace{\sigma_{t}(\delta_t, \theta_2)}_{\text{model}}, \end{equation}\] where \(\delta_t = (\delta_{1t}, \cdots, \delta_{Jt})'\), \(s_t = (s_{1t}, \cdots, s_{Jt})'\) and \(\sigma_t = (\sigma_{1t}, \cdots, \sigma_{Jt})'\).
 Let \(\delta_t^{(0)} = (\delta_{1t}^{(0)}, \cdots, \delta_{Jt}^{(0)})'\) be an arbitrary starting vector of average utility of products in a market.
 Using the operator above, we update \(\delta_{t}^{(r)}\) by: \[\begin{equation} \delta_{t}^{(r + 1)} = T_t(\delta_{t}^{(r)}) = \delta_t^{(r)} + \ln s_{t} - \ln \sigma_{t}(\delta_t^{(r)}, \theta_2), \end{equation}\] for \(r = 0, 1, \cdots\).
 Berry, Levinsohn, & Pakes (1995) proved that \(T_t\) as specified above is a contraction mapping with modulus less than one.
 This means that:
 \(T_t\) has a unique fixed point;
 For arbitrary \(\delta_t^{(0)}\), \(\lim_{r \to \infty} T_t^r(\delta_t^{(0)})\) is the unique fixed point.
 The fixed point of \(T_t\) is \(\delta_t^*\) such that \(\delta_t^* = T_t(\delta_t^*)\), i.e., \[\begin{equation} \begin{split} &\delta_t^* = \delta_t^* + \ln s_{t} - \ln \sigma_{t}(\delta_t^*, \theta_2),\\ &\Leftrightarrow s_{t} = \sigma_{t}(\delta_t^*, \theta_2). \end{split} \end{equation}\]
 So, the fixed point \(\delta_t^*\) is the conditional mean indirect utility that solves the equality given nonlinear parameter \(\theta_2\).
 Moreover, the solution is unique.
 Moreover, it can be found by iterating the operator.
 Let \(\delta_t(\theta_2)\) be the solution to this equation, i.e., the limit of this operation.
 The above result is useful because it ensures the inversion and provides the algorithm to find the solution.
 The invertibility itself holds under more general settings (Berry, Gandhi, & Haile, 2013).
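 The fixed-point iteration can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `sim_shares` and `solve_delta` are hypothetical, the integral over \((\nu, \upsilon)\) is replaced by an average over fixed simulation draws, and the array `mu` holds the simulated deviations from the mean \(\nu' \Sigma x_{jt} - \upsilon' \Omega p_{jt}\) for given \(\theta_2\).

```python
import numpy as np

def sim_shares(delta, mu):
    """Simulated inside-good shares.

    delta: (J,) average utilities, mu: (R, J) deviations from the mean by draw.
    """
    e = np.exp(delta[None, :] + mu)
    return (e / (1.0 + e.sum(axis=1, keepdims=True))).mean(axis=0)

def solve_delta(s, mu, tol=1e-12, max_iter=10_000):
    """Iterate delta <- delta + ln s - ln sigma(delta) until the update is below tol."""
    delta = np.log(s) - np.log(1.0 - s.sum())     # logit inversion as the starting value
    for _ in range(max_iter):
        delta_new = delta + np.log(s) - np.log(sim_shares(delta, mu))
        if np.max(np.abs(delta_new - delta)) < tol:
            return delta_new
        delta = delta_new
    raise RuntimeError("contraction mapping did not converge")

rng = np.random.default_rng(0)
x_t = np.array([0.5, 1.0, 1.5])                   # one characteristic per product
p_t = np.array([1.0, 1.5, 2.0])
nu, ups = rng.standard_normal(200), rng.standard_normal(200)
mu = 0.8 * nu[:, None] * x_t - 0.3 * ups[:, None] * p_t     # (R, J)
delta_true = np.array([0.2, -0.1, 0.4])
s_t = sim_shares(delta_true, mu)                  # shares generated by the model
delta_hat = solve_delta(s_t, mu)                  # recovers delta_true
```

 Starting from the logit inversion rather than an arbitrary vector is a design choice made here for speed. The contraction property implies that any starting value leads to the same limit.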
4.4.48 Solving for \(\xi_{jt}(\theta)\)
 We defined the average utility as: \[\begin{equation} \delta_{jt} = \beta_0' x_{jt} - \alpha_0 p_{jt} + \xi_{jt}. \end{equation}\]
 Hence, if we set: \[\begin{equation} \xi_{jt}(\theta) \equiv \delta_{jt}(\theta_2) - \Bigg[\beta_0' x_{jt} - \alpha_0 p_{jt} \Bigg], \end{equation}\] the \(\xi_{jt}(\theta)\) solves the equality: \[\begin{equation} s_{jt} = \sigma_{j}(p_t, x_t, \xi_t; \theta). \end{equation}\]
4.4.49 Solving for \(\xi_{jt}(\theta)\): Summary
 In summary, \(\xi_{jt}\) that solves the equality exists and is unique, and can be computed by:
 Fix \(\theta = \{\theta_1, \theta_2\}\).
 Fix arbitrary starting value \(\delta_t^{(0)}\) for \(t = 1, \cdots, T\).
 Let \(\delta_t(\theta_2)\) be the limit of \(T_t^r(\delta_t^{(0)})\) for \(r = 0, 1, \cdots\) for each \(t = 1, \cdots, T\).
 Stop the iteration if \(\|\delta_t^{(r + 1)} - \delta_t^{(r)}\|\) is below a threshold.
 Let \(\xi_{jt}(\theta)\) be such that: \[\begin{equation} \xi_{jt}(\theta) = \delta_{jt}(\theta_2) - [\beta_0' x_{jt} - \alpha_0 p_{jt}]. \end{equation}\]
 Then we can evaluate the moment at \(\theta\) by: \[\begin{equation} \mathbb{E}\{\xi_{jt}(\theta) \mid w_{jt}\} = 0. \end{equation}\]
 We run this algorithm every time we evaluate the moment condition at a parameter value.
4.4.50 GMM Objective Function
 Find \(\theta\) that solves: \[\begin{equation} \min_{\theta} \xi(\theta)' W \Phi^{-1} W' \xi(\theta), \end{equation}\] where \(\Phi\) is a weight matrix, \[\begin{equation} \xi(\theta) = \begin{pmatrix} \xi_{11}(\theta)\\ \vdots\\ \xi_{J_1 1}(\theta)\\ \vdots\\ \xi_{1T}(\theta) \\ \vdots\\ \xi_{J_T T}(\theta) \end{pmatrix}, W = \begin{pmatrix} w_{11}' \\ \vdots \\ w_{J_11}' \\ \vdots \\ w_{1T}' \\ \vdots \\ w_{J_TT}' \\ \end{pmatrix}. \end{equation}\]
 There are \(J \to \infty\) and \(T \to \infty\) asymptotics. Either is fine for consistently estimating the parameters.
 \(w_{jt} = (x_{jt}', w_{jt}^*)'\) where \(w_{jt}^*\) is an excluded instrument that is relevant to \(p_{jt}\).
4.4.51 Estimating Linear Parameters
 The first-order condition for \(\theta_1\) gives: \[\begin{equation} \theta_1 = (X_1'W \Phi^{-1} W'X_1)^{-1} X_1' W \Phi^{-1} W' \delta(\theta_2), \end{equation}\] where \[\begin{equation} X_1 = \begin{pmatrix} x_{11}' & -p_{11}\\ \vdots & \vdots \\ x_{J_1 1}' & -p_{J_1 1}\\ \vdots & \vdots \\ x_{1T}' & -p_{1T}\\ \vdots & \vdots \\ x_{J_T T}' & -p_{J_T T} \end{pmatrix}, \delta(\theta_2) = \begin{pmatrix} \delta_1(\theta_2)\\ \vdots\\ \delta_T(\theta_2) \end{pmatrix}. \end{equation}\]
 If \(\theta_2\) is given, the optimal \(\theta_1\) is computed by the above formula.
 \(\rightarrow\) We only have to search over \(\theta_2\).
 This is the reason why we called \(\theta_1\) linear parameters and \(\theta_2\) nonlinear parameters.
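 The formula for \(\theta_1\) follows in one step. Assume a symmetric weight matrix \(\Phi\) and use the stacked relation \(\xi(\theta) = \delta(\theta_2) - X_1 \theta_1\), which is implied by the definition of the average utility. Differentiating the GMM objective function with respect to \(\theta_1\) while holding \(\theta_2\) fixed gives: \[\begin{equation} \begin{split} &-2 X_1' W \Phi^{-1} W' [\delta(\theta_2) - X_1 \theta_1] = 0\\ &\Leftrightarrow X_1' W \Phi^{-1} W' X_1 \theta_1 = X_1' W \Phi^{-1} W' \delta(\theta_2), \end{split} \end{equation}\] which yields the expression for \(\theta_1\) provided that \(X_1' W \Phi^{-1} W' X_1\) is invertible. In other words, given \(\theta_2\), \(\theta_1\) is the linear IV-GMM estimator with \(\delta(\theta_2)\) as the dependent variable.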
4.4.52 BLP Algorithm
 Find \(\theta_2\) that minimizes the GMM objective function.
 To do so:
 Pick \(\theta_2\).
 Compute \(\delta(\theta_2)\) by the fixed-point algorithm.
 Compute associated \(\theta_1\) by the formula: \[\begin{equation} \theta_1 = (X_1'W \Phi^{-1} W'X_1)^{-1} X_1' W \Phi^{-1} W' \delta(\theta_2), \end{equation}\]
 Compute \(\xi(\theta)\) from the above \(\delta(\theta_2)\) and \(\theta_1\).
 Evaluate the GMM objective function with the \(\xi(\theta)\).
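 One evaluation of the objective function along these steps can be sketched in Python as follows. It is an illustration under assumptions that are not in the text: all function names are hypothetical, \(\Sigma\) is diagonal, the integral is simulated with fixed draws, and the example data set has \(\xi_{jt} = 0\) so that the result can be checked against the true parameters.

```python
import numpy as np

def sim_shares(delta, mu):
    e = np.exp(delta[None, :] + mu)
    return (e / (1.0 + e.sum(axis=1, keepdims=True))).mean(axis=0)

def solve_delta(s, mu, tol=1e-12, max_iter=10_000):
    delta = np.log(s) - np.log(1.0 - s.sum())
    for _ in range(max_iter):
        new = delta + np.log(s) - np.log(sim_shares(delta, mu))
        if np.max(np.abs(new - delta)) < tol:
            return new
        delta = new
    raise RuntimeError("contraction mapping did not converge")

def gmm_objective(theta2, s, x, p, W, Phi_inv, nu, ups):
    """theta2 = (diag(Sigma), omega); s, x, p are lists over markets;
    W stacks the instruments; nu (R, K) and ups (R,) are fixed draws."""
    K = nu.shape[1]
    sd, omega = theta2[:K], theta2[K]
    # Compute delta(theta2) market by market with the fixed-point algorithm.
    delta = np.concatenate([
        solve_delta(s_t, (nu * sd) @ x_t.T - omega * ups[:, None] * p_t[None, :])
        for s_t, x_t, p_t in zip(s, x, p)])
    # Compute the associated theta1 in closed form.
    X1 = np.column_stack([np.vstack(x), -np.concatenate(p)])
    A = X1.T @ W @ Phi_inv @ W.T
    theta1 = np.linalg.solve(A @ X1, A @ delta)
    # Compute xi(theta) and evaluate the objective.
    g = W.T @ (delta - X1 @ theta1)
    return g @ Phi_inv @ g, theta1

rng = np.random.default_rng(0)
T, J, R = 20, 3, 100
nu, ups = rng.standard_normal((R, 2)), rng.standard_normal(R)
x = [np.column_stack([np.ones(J), rng.uniform(size=J)]) for _ in range(T)]
c = [rng.uniform(size=J) for _ in range(T)]       # hypothetical cost shifter
p = [0.5 + c_t for c_t in c]
theta1_true, theta2_true = np.array([1.0, 0.5, 2.0]), np.array([0.0, 0.8, 0.3])
s = []
for x_t, p_t in zip(x, p):
    delta_t = x_t @ theta1_true[:2] - theta1_true[2] * p_t   # xi = 0 for the check
    mu_t = (nu * theta2_true[:2]) @ x_t.T - theta2_true[2] * ups[:, None] * p_t[None, :]
    s.append(sim_shares(delta_t, mu_t))
W = np.column_stack([np.vstack(x), np.concatenate(c), np.concatenate(c) ** 2])
Phi_inv = np.linalg.inv(W.T @ W)
obj, theta1_hat = gmm_objective(theta2_true, s, x, p, W, Phi_inv, nu, ups)
```

 In practice the first return value would be passed to a numerical optimizer that searches over \(\theta_2\) only. In this noise-free example the objective is numerically zero at the true \(\theta_2\) and `theta1_hat` reproduces the true linear parameters.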
4.4.53 Mathematical Program with Equilibrium Constraints (MPEC)
 In the BLP algorithm, for each parameter \(\theta\), we find \(\xi(\theta)\) that solves: \[\begin{equation} s = \sigma(p, x, \xi; \theta) \end{equation}\] by the fixed-point algorithm and then evaluate the GMM objective function.
 This inner loop takes time if the stopping criterion is tight.
 If the stopping criterion is loose, the loop may stop earlier but the error may be unacceptably large.
 Dubé, Fox, & Su (2012) suggest minimizing the GMM objective function with the above equation as a constraint, treating \(\xi\) as an additional choice variable: \[\begin{equation} \min_{\theta, \xi} \xi' W \Phi^{-1} W' \xi \text{ s.t. } s = \sigma(p, x, \xi; \theta). \end{equation}\]
 To enjoy the benefit of this approach, we have to analytically derive the gradient and Hessian of the objective function and the constraints, which are anyway needed if we estimate the standard error with the plug-in method.
 If the problem is of small scale, the BLP algorithm will be fast enough and easier to implement.
 If the problem is of large scale, the MPEC approach may be preferable.
4.4.54 Instrumental Variables
 The remaining problem is how to choose the excluded instrumental variable \(w_{jt}^*\) for each product/market.
 Cost shifters:
 Traditional instruments.
 Hausman-type IV (Hausman, Leonard, & Zona, 1994):
 Assume that demand shocks are independent across markets, whereas the cost shocks are correlated.
 The latter will be true if the product is produced by the same manufacturer.
 Then, the prices of the same product in the other markets, \(p_{j, -t}\), will be valid instruments for the price of the product in a given market, \(p_{jt}\).
 BLP-type IV (Berry, Levinsohn, & Pakes, 1995):
 In oligopoly, the price of a good in a market depends on the market structure, i.e., what kind of products are available in the market.
 For example, if there are similar products in the market, the price will tend to be lower.
 Then, the product characteristics of other products in the market will be valid instruments for the price of a good in a given market, \(p_{jt}\).
 If there are multiproduct firms, whether the other good is owned by the same company will also affect the price.
 Specifically, Berry, Levinsohn, & Pakes (1995) use:
\[\begin{equation} \sum_{k \neq j \in \mathcal{J}_t \cap \mathcal{F}_{f}} x_{kt}, \end{equation}\]
\[\begin{equation} \sum_{k \neq j \in \mathcal{J}_t \setminus \mathcal{F}_{f}} x_{kt}, \end{equation}\]
 where \(f\) is the firm that owns product \(j\) and \(\mathcal{F}_{f}\) is the set of products firm \(f\) owns.
 Differentiation IV (Gandhi & Houde, 2015):
 Let \(d_{jkt} = d(x_{jt}, x_{kt})\) be some distance between product characteristics.
 They showed that under certain conditions the optimal BLP-type IV is a function of \(d_{jt} \equiv \{d_{jkt}\}_{k \neq j \in \mathcal{J}_t}\).
 They suggest using the moments of \(d_{jt}\) as the excluded instrumental variables.
 Weak instruments problem of BLP-type IV:
 Armstrong (2016) argued that estimates based on BLP-type IV may be inconsistent when \(J \to \infty\) asymptotics is considered, because then the market approaches the competitive market and the correlation between the markup and the product characteristics of the rivals disappears.
 Specifically, the estimator is inconsistent if all of the following conditions are met:
 \(J \to \infty\) but \(T\) is fixed;
 The demand/cost functions are such that the correlation between markups and characteristics of other products decreases quickly enough as \(J \to \infty\).
 There are no cost instruments or other sources of identification.
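 The BLP-type and differentiation instruments described above can be constructed mechanically from product characteristics and ownership. The Python sketch below illustrates this for a single market. The function names are hypothetical, and the sum of squared characteristic differences is only one possible moment of \(d_{jt}\), chosen here for illustration.

```python
import numpy as np

def blp_instruments(x, firm):
    """Sums of characteristics of other products in one market.

    x: (J, K) characteristics, firm: (J,) firm ids.
    Returns (own_firm_sum, rival_sum), each (J, K).
    """
    same = firm[:, None] == firm[None, :]             # (J, J) same-firm indicator
    own = same & ~np.eye(len(firm), dtype=bool)       # same firm, excluding product j itself
    return own.astype(float) @ x, (~same).astype(float) @ x

def differentiation_instruments(x):
    """Sum over other products of squared characteristic differences (illustrative choice)."""
    d = x[:, None, :] - x[None, :, :]                 # d[j, k] = x_j - x_k
    return (d ** 2).sum(axis=1)                       # the k = j term contributes zero

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
firm = np.array([0, 0, 1])                            # products 0 and 1 share an owner
own_sum, rival_sum = blp_instruments(x, firm)
diff_iv = differentiation_instruments(x)
```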