# Chapter 4 Demand Function Estimation

## 4.1 Motivations

From the demand function and the utility maximization assumption, we can recover the preference of the decision maker.

Thus, estimating the demand function is necessary for **evaluating consumer welfare**.

In IO, estimating the **price elasticity of demand** is particularly important, because it determines the **market power** of a monopolist and the size of the deadweight loss.

In macroeconomics, estimating demand is important for determining the **price level**, because the price level is the minimum expenditure for a consumer to achieve a certain level of utility.

In marketing, estimating demand is necessary to design optimal pricing, advertising, and other marketing interventions.

In principle, the theory can be applied to any decision problem, not only to consumer choice.

Nevo (2000):

- How do the hypothetical mergers in the ready-to-eat cereal industry affect the market price, markup, and consumer surplus?
- To do so, the author estimates the demand for ready-to-eat cereals and the cost functions for each product. Then, the author conducts counterfactual simulations of mergers to quantify the effects.

Chung & Alcácer (2002):

- To what extent do firms go abroad to access technology available in other locations?
- To study this issue, the authors estimate the firms’ locational choice when going abroad.

Rysman (2004):

- In Yellow pages, how do consumers evaluate the advertisement on it, and how do advertisers value consumer usage?
- To study this, the author simultaneously estimates the consumer demand for usage of a directory, advertiser demand for advertising, and a publisher’s first-order condition.

Gentzkow (2004):

- Are online and print newspapers substitutes or complements?
- To study this, the author estimates a demand function in which online and print newspapers can be either substitutes or complements.

Bayer et al. (2007):

- What are people's preferences for schools and neighborhoods? How are they capitalized into housing prices?
- To do so, the authors estimate the discrete choice of residents over locations. To deal with the endogeneity between the neighborhood and the unobserved attributes of the location, the authors use the discontinuity at the school attendance zone.

Archak, Ghose, & Ipeirotis (2011):

- How does the information embedded in product reviews affect consumer choice?
- To study this, the authors estimate a discrete choice model of consumers in which the text information from the product reviews is included among the product attributes.

Holmes (2011):

- Wal-Mart maintains a high store density. How large are the economies of density and the sales cannibalization?
- To study this, the author first estimates the demand function across neighboring Wal-Mart stores to capture the sales cannibalization, and then estimates the cost structure from their entry and exit behaviors.

Handbury & Weinstein (2014):

- Urban and rural areas differ in available products. How does the price difference change if the heterogeneity in the product availability is incorporated?
- To do so, the authors estimate the demand function at each location, and then construct the spatial price index based on the available products at each location.

## 4.2 Analyzing Consumer Behaviors

- **Alternative set**.
- **Utility function**.
- Add a system of **choice sets**.
- Add utility maximization.
- \(\rightarrow\) **Demand function**.

In the case of producer behavior, there was a chance to directly observe the output of the most primitive function, the production function.

In the case of consumer behavior, we never directly observe the output of the most primitive function, the utility function.

We can at most identify demand functions.

**Revealed preference theory**:

- Samuelson (1938), Houthakker (1950), Richter (1966), Afriat (1967), Varian (1982).
- If the demand function is derived from a preference by maximizing the preference, the demand function should satisfy some restrictions.
- If the assumption is true, we can recover part of the preference from the demand function.

## 4.3 Continuous Choice

- The alternative set \(\mathcal{X}\) is a subset of \(\mathbb{R}^J\).
- The utility function \(u\) is rational, monotone, and continuous on \(\mathcal{X}\).
- The choice sets are given by a system of **linear budget sets**: \[ \mathcal{B}(p, w) = \{q \in \mathcal{X}: p \cdot q \le w\}. \]
- If choice sets are non-linear, the following duality approach needs to be modified.

### 4.3.1 Duality between Utility and Expenditure Functions

- It is rather a special case that we can derive a closed form solution to a utility maximization problem.
- We can use the first-order conditions as moment conditions for identification. \[\begin{equation} \frac{\partial u(q)}{\partial q_i} = \lambda p_i, i = 1, \cdots, J. \end{equation}\]
- Deriving a demand function from the identified utility function generally requires numerical computation, which can be cumbersome.
- As with the duality between production and cost functions, we have the same duality theorem for utility and expenditure functions.
- There is a one-to-one mapping between a class of utility functions and a class of expenditure functions.
- Therefore, it is okay to start from an expenditure function.
- It is rare that we can recover the utility function associated with an expenditure function in a closed form. But this is rarely required for the analysis.
- Moreover, we can easily derive other important functions from the expenditure functions.
- Let \(p\) be the price vector and \(u\) be the target utility level.
- Let \(u(q)\) be a utility function.
- An expenditure function associated with the utility function is defined by: \[\begin{equation} e(u, p) = \min_{q} p \cdot q, \text{ s.t. } u(q) \ge u. \end{equation}\]
- Let \(x\) be the total expenditure such that: \[\begin{equation} x = e(u, p). \end{equation}\]
- We can start the analysis by specifying this function instead of the utility function.

### 4.3.2 Deriving Other Functions

- It is easy to derive other functions from an expenditure function.
- **Indirect utility function**: invert the expenditure function with respect to \(u\) to get: \[\begin{equation} u = e^{-1}(p, x) \equiv v(p, x). \end{equation}\]
- **Hicksian demand function**: apply Shephard’s lemma: \[\begin{equation} q_i = \frac{\partial e(u, p)}{\partial p_i} \equiv h_i(u, p). \end{equation}\]
- **Marshallian demand function**: insert the indirect utility function into the Hicksian demand function: \[\begin{equation} q_i = h_i(v(p, x), p) \equiv d_i(p, x). \end{equation}\]

### 4.3.3 Starting from an Indirect Utility Function

- Starting from an indirect utility function is almost equivalent.
- The indirect utility function associated with the utility function is defined by: \[\begin{equation} v(p, x) \equiv \max_{q} u(q), \text{ s.t. } p'q \le x. \end{equation}\]
- We can derive the Marshallian demand function by Roy’s identity: \[\begin{equation} q_i = \frac{- \partial v(p, x)/\partial p_i}{\partial v(p, x)/\partial x} \equiv d_i(p, x). \end{equation}\]
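
As a numerical illustration of Roy's identity, the following minimal sketch assumes a Cobb-Douglas utility \(u(q) = \prod_i q_i^{a_i}\) with \(\sum_i a_i = 1\) (an example chosen here, not taken from the text). In this case the indirect utility is \(v(p, x) = x \prod_i (a_i/p_i)^{a_i}\) and the Marshallian demand is \(d_i(p, x) = a_i x / p_i\), so the identity can be checked with finite differences. The function names and parameter values are hypothetical.

```python
import numpy as np

def indirect_utility(p, x, a):
    """Cobb-Douglas indirect utility v(p, x) = x * prod((a_i / p_i)^a_i), sum(a) = 1."""
    return x * np.prod((a / p) ** a)

def roy_demand(p, x, a, h=1e-6):
    """Marshallian demand from Roy's identity, using central finite differences."""
    dv_dx = (indirect_utility(p, x + h, a) - indirect_utility(p, x - h, a)) / (2 * h)
    demand = np.empty_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = h
        dv_dpi = (indirect_utility(p + step, x, a)
                  - indirect_utility(p - step, x, a)) / (2 * h)
        demand[i] = -dv_dpi / dv_dx  # Roy's identity
    return demand

a = np.array([0.2, 0.3, 0.5])   # hypothetical expenditure shares
p = np.array([1.0, 2.0, 4.0])   # hypothetical prices
x = 10.0                        # total expenditure

d_roy = roy_demand(p, x, a)
d_closed = a * x / p            # known closed-form Cobb-Douglas demand
```

The two vectors `d_roy` and `d_closed` should agree up to the finite-difference error, and `p @ d_roy` should be close to `x`.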

### 4.3.5 Almost Ideal Demand System (AIDS)

- Based on Deaton & Muellbauer (1980).
- See Angus Deaton & John Muellbauer (1980) for further reference.
- Consider an expenditure function that satisfies the following useful conditions:
  - It allows aggregation (this motivation is less important in recent days).
  - It gives an arbitrary first-order approximation to any demand system.
  - It can satisfy the restrictions of utility maximization.
  - It can be used to test the restrictions of utility maximization.

### 4.3.6 PIGLOG Class

- PIGLOG (price-independent generalized logarithmic) class (Muellbauer, 1976). \[\begin{equation} \ln e(u, p) = (1 - u) \ln a(p) + u\ln b(p), \end{equation}\] where \(a(p)\) and \(b(p)\) are arbitrary linear homogeneous concave functions.
- Consider households that differ in total income.
- The PIGLOG form ensures that the aggregate demand can be written in the same form where the total income is replaced with the sum of household total income.
- The derivatives should be given free parameters for the model to be an arbitrary first-order approximation to any demand system.
- In AIDS, we specify \(a(p)\) and \(b(p)\) as: \[\begin{equation} \begin{split} \ln a(p) &\equiv a_0 + \sum_{k} \alpha_k \ln p_k + \frac{1}{2}\sum_{k} \sum_{j} \gamma_{kj}^* \ln p_k \ln p_j\\ \ln b(p) &\equiv \ln a(p) + \beta_0 \prod_{k} p_k^{\beta_k}. \end{split} \end{equation}\]
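
The share equations used for estimation follow from this specification. The following is a sketch of the standard derivation in Deaton & Muellbauer (1980); the budget share \(w_i \equiv p_i q_i / x\) and the symmetrized coefficient \(\gamma_{ij} \equiv (\gamma_{ij}^* + \gamma_{ji}^*)/2\) are notation introduced here. Substituting \(\ln a(p)\) and \(\ln b(p)\) into the PIGLOG form gives:

\[\begin{equation} \ln e(u, p) = a_0 + \sum_{k} \alpha_k \ln p_k + \frac{1}{2}\sum_{k} \sum_{j} \gamma_{kj}^* \ln p_k \ln p_j + u \beta_0 \prod_{k} p_k^{\beta_k}. \end{equation}\]

Shephard's lemma in logarithmic form, \(\partial \ln e(u, p)/\partial \ln p_i = p_i h_i(u, p)/e(u, p) = w_i\), then gives:

\[\begin{equation} w_i = \alpha_i + \sum_{j} \gamma_{ij} \ln p_j + \beta_i u \beta_0 \prod_{k} p_k^{\beta_k}. \end{equation}\]

Because \(x = e(u, p)\) implies \(u \beta_0 \prod_{k} p_k^{\beta_k} = \ln x - \ln a(p)\), the utility level can be substituted out:

\[\begin{equation} w_i = \alpha_i + \sum_{j} \gamma_{ij} \ln p_j + \beta_i \ln (x / P), \quad \ln P \equiv \ln a(p). \end{equation}\]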

### 4.3.9 Specify the Detail II

- It can satisfy the restrictions of utility maximization.
- It can be used to test the restrictions of utility maximization.
- \(\sum_{j} x_j = 1\): \[\begin{equation} \sum_{j} \alpha_j = 1, \sum_{j} \gamma_{jk} = 0, \sum_{j} \beta_j = 0. \end{equation}\]
- \(e(u, p)\) is linear homogeneous in \(p\): \[\begin{equation} \sum_{j} \gamma_{ij} = 0. \end{equation}\]
- Symmetry: \[\begin{equation} \gamma_{ij} = \gamma_{ji}. \end{equation}\]
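
The restrictions above can be checked numerically. The following minimal sketch assumes the standard AIDS share equation \(w_i = \alpha_i + \sum_j \gamma_{ij} \ln p_j + \beta_i \ln(x/P)\) with \(\ln P = \ln a(p)\) (Deaton & Muellbauer, 1980); the function name and all parameter values are hypothetical. It verifies that, under the restrictions, the shares add up to one and do not change when prices and expenditure are scaled by the same factor.

```python
import numpy as np

def aids_shares(p, x, a0, alpha, gamma, beta):
    """AIDS budget shares w_i = alpha_i + sum_j gamma_ij ln p_j + beta_i ln(x / P)."""
    lp = np.log(p)
    ln_P = a0 + alpha @ lp + 0.5 * lp @ gamma @ lp  # translog price index ln a(p)
    return alpha + gamma @ lp + beta * (np.log(x) - ln_P)

# Hypothetical parameters chosen to satisfy the restrictions.
alpha = np.array([0.5, 0.3, 0.2])                # sums to one
beta = np.array([0.05, -0.02, -0.03])            # sums to zero
gamma = np.array([[-0.10, 0.06, 0.04],
                  [0.06, -0.08, 0.02],
                  [0.04, 0.02, -0.06]])          # symmetric, rows sum to zero

p = np.array([1.0, 2.0, 1.5])
x = 10.0
w = aids_shares(p, x, 0.0, alpha, gamma, beta)
w_scaled = aids_shares(3.0 * p, 3.0 * x, 0.0, alpha, gamma, beta)
```

Here `w.sum()` should equal one (adding up) and `w_scaled` should coincide with `w` (homogeneity).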

### 4.3.10 Estimation

- We can estimate parameters based on the share equations.
- If we use aggregate data, the aggregate error term is correlated with the price vector.
- Therefore, we need at least as many instrumental variables as the dimension of the price vector.
- With valid instrumental variables, we can estimate the model with GMM.
- If we use household-level data, the household-specific errors controlling for aggregate errors will not be correlated with the price vector if the price is determined in a competitive market.

### 4.3.11 From Product Space Approach to Characteristics Space Approach

- The framework up to here is called the **product space approach** because the utility has been defined over a product space.
- When there are \(J\) goods, there are \(J^2\) parameters for prices.
- One way to resolve this issue is to introduce a priori knowledge about the preference.
- For example, we can introduce a priori segmentation with separability.
- It is hard to evaluate the effect of introducing a new product.
- Again, we have to decide a priori which segment/product is similar to the new product.
- This leads us to the **characteristics space approach** (Lancaster, 1966; Muth, 1966):
- Consumption is an activity in which goods are inputs and in which the output is a collection of characteristics.
- Utility ranks collections of characteristics, and ranks collections of goods only indirectly through the characteristics that they possess.
- There are \(k = 1, \cdots, K\) activities.
- The activity level \(y\) requires consuming \(x = A y\) products.
- The activity \(y\) generates \(z = B y\) characteristics.
- The budget constraint is \(p \cdot x \le 1\).
- The utility is defined over the characteristics \(u(z)\).
- The consumer’s problem is: \[ \max_y u(z) \] s.t. \[ p \cdot x \le 1, x = Ay, z = By, x, y, z \ge 0. \]

- Then, only the dimension of characteristics matters and the value of new products can be evaluated by the contribution to the production of characteristics.
- Early applications include Rosen (1974), Muellbauer (1974), Gorman (1980).
- A nonparametric analysis based on revealed preference is Blow, Browning, & Crawford (2008).

### 4.3.12 From Continuous Choice Approach to Discrete Choice Approach

- The aggregate demand is a collection of choices across consumers and within consumers over time.
- It makes sense to model individual choices and then aggregate rather than directly modeling the aggregate demand.
- The resulting aggregate demand will satisfy restrictions that are consistent with the underlying consumer choice model.
- If there is an interaction across choices, the aggregation is not trivial.
- This is especially true when aggregating choices within consumers.
- For now, assume that each choice is independent.

## 4.4 Discrete Choice

### 4.4.1 Discrete Choice Approach

- Let \(u(q, z_i)\) be the utility of a consumer over \(J + 1\) dimensional consumption bundle \(q\) characterized by consumer characteristics \(z_i\).
- The consumer solves: \[\begin{equation} V(p, y_i, z_i) = \max_{q}u(q, z_i), \text{ s.t. } p'q \le y_i. \end{equation}\]
- Alternative \(0\) is an **outside good**.
- Normalize \(p_0 = 1\).
- We call alternatives \(j = 1, \cdots, J\) **inside goods**.
- The choice space is restricted to: \[\begin{equation} \begin{split} Q = \{q:& q_0 \in [0, M], q_j \in \{0, 1\}, j = 1, \cdots, J,\\ & q_j q_k = 0, \forall j \neq k, j, k > 0, M < \infty\}. \end{split} \end{equation}\]

### 4.4.2 Discrete Choice Approach

- The budget constraint reduces to: \[\begin{equation} \begin{cases} q_0 + p_j q_j = y &\text{ if } q_j = 1, j > 0\\ q_0 = y &\text{ otherwise}. \end{cases} \end{equation}\]
- Hence, \[\begin{equation} q_0 = y - \sum_{j = 1}^J p_j q_j. \end{equation}\]

### 4.4.3 Discrete Choice Approach

- The utility maximization problem can be written as:
\[\begin{equation}
V(p, y_i, z_i) = \max_{j = 0, 1, \cdots, J} v_j(p_j, y_i, z_i),
\end{equation}\]
where
\[\begin{equation}
\begin{split}
&v_j(p_j, y_i, z_i)\\
& =
\begin{cases}
u(y_i - p_j, 0, \cdots, \underbrace{1}_{q_j}, \cdots, 0, z_i) &\text{ if }j > 0,\\
u(y_i, 0, \cdots, 0, z_i) &\text{ if }j = 0,
\end{cases}
\end{split}
\end{equation}\]
is called the
**choice-specific indirect utility**.

### 4.4.4 Characteristics Space Approach

Preference is defined over the characteristics of alternatives, \(x_j\):

Car: vehicle, engine power, model-year, car maker, etc.

PC: CPU power, number of cores, memory, HDD volume, etc.

The choice-specific indirect utility is a function of the characteristics of the alternative: \[\begin{equation} \begin{split} v_j(p_j, y_i, z_i) &=u(y_i - p_j, 0, \cdots, \underbrace{1}_{q_j}, \cdots, 0, z_i)\\ &= u^*(y_i - p_j, x_j, z_i)\\ &\equiv v(p_j, x_j, y_i, z_i). \end{split} \end{equation}\]

### 4.4.5 Weak Separability and Income Effect

- We usually focus on a particular product category such as cars, PCs, cereals, detergents, and so on.
- Assume that the preference is separable between the category in question (**inside goods**) and other categories (**outside goods**).
- \(u(q) = u[q_I, v(q_O)]\):
  - \(q_I\): the consumption vector of inside goods.
  - \(q_O\): the consumption vector of outside goods.
  - Increasing in \(v_O = v(q_O)\).
- \(p = (p_I, p_O)\).
  - \(p_I\): the price vector of inside goods.
  - \(p_O\): the price vector of outside goods.

- When \(y_O\) is left for the outside goods, the conditional demand for the outside goods \(q_O(y_O, p_O)\) exists.
- Inserting this into the utility function gives: \[\begin{equation} u\{q_I, v[q_O(y_O, p_O)]\} \equiv \tilde{u}(q_I, y_O; p_O). \end{equation}\]

### 4.4.6 Weak Separability and Income Effect

- Thus, how the preference for the outside good is modeled determines how individual income affects the choice. For example: \[\begin{equation} \begin{split} &u(y_i - p_j, x_j, z_i) = \tilde{u}(x_j, z_i) + \alpha(y_i - p_j),\\ &u(y_i - p_j, x_j, z_i) = \tilde{u}(x_j, z_i) + \alpha \ln (y_i - p_j). \end{split} \end{equation}\]
- In the first example, the income level does not affect the choice because the term \(\alpha y_i\) is common and constant across choices (there is no income effect). In the second example, income does not cancel out of the comparison across choices, so there is an income effect.
- We often do not observe the income of a consumer, \(y_i\).
- Remember that the price of a product enters because we consider the **indirect** utility function here.

### 4.4.7 Utility Function Normalization

- The **location** of the utility function is often normalized by setting: \[\begin{equation} u(y^*, 0, \cdots, 0, z^*) = 0, \end{equation}\] for a certain choice of \((y^*, z^*)\).

### 4.4.8 Aggregation of the Individual Demand

- Let \(q(p, x, y_i, z_i) = \{q_j(p, x, y_i, z_i)\}_{j = 0, \cdots, J}\) be the demand function of consumer \(i\), that is: \[\begin{equation} q_j(p, x, y_i, z_i) = 1 \Leftrightarrow j = \text{argmax}_{k = 0, 1, \cdots, J} v(p_k, x_k, y_i, z_i). \end{equation}\]
- Let \(f(y, z)\) be the joint density of the income and other consumer characteristics.
- The aggregate demand for good \(j\) is: \[\begin{equation} \sigma_j(p, x) \equiv N \int q_j(p, x, y, z) f(y, z) dy dz, \end{equation}\] where \(N\) is the population.

### 4.4.9 Horizontal Product Differentiation

**Horizontal product differentiation**: consumers do not agree on the ranking of the choices.

- There are two convenience stores \(j = 1, 2\) on a street \([0, 1]\).
- Let \(z_i\) be the location of consumer \(i\) and \(x_j\) be the location of the choice on a street \([0, 1]\) with \(x_1 < x_2\).
- A consumer has a preference such that: \[\begin{equation} v_{ij} \equiv v(p_j, x_j, y_i, z_i) \equiv s - t |z_i - x_j| - p_j. \end{equation}\]

### 4.4.10 Horizontal Product Differentiation

- Suppose that the prices are low enough that all consumers on the street are willing to buy from one of the two stores.
- A consumer \(i\) located between the two stores, \(x_1 \le z_i \le x_2\), buys from store \(1\) if and only if: \[\begin{equation} \begin{split} &v(p_1, x_1, y_i, z_i) \ge v(p_2, x_2, y_i, z_i)\\ &\Leftrightarrow s - t |z_i - x_1| - p_1 \ge s - t |z_i - x_2|- p_2\\ &\Leftrightarrow z_i \le \frac{p_2 - p_1}{2 t} + \frac{x_1 + x_2}{2} \equiv \overline{z}_1(p_1, p_2). \end{split} \end{equation}\]
- Let \(f(z_i)\) be \(U[0, 1]\). Then, if the cutoff \(\overline{z}_1(p_1, p_2)\) lies between the two stores, the aggregate demand for store 1 is: \[\begin{equation} \begin{split} \sigma_1(p, x) = N \int_{0}^{\overline{z}_1(p_1, p_2)} d z_i = N\overline{z}_1(p_1, p_2). \end{split} \end{equation}\]
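
The cutoff rule can be cross-checked against a direct comparison of utilities. The sketch below is a minimal illustration with hypothetical parameter values; the function name `hotelling_demand` and the handling of the corner cases (when the price gap exceeds the transport cost between the stores, so one store serves everyone) are additions made here.

```python
import numpy as np

def hotelling_demand(p1, p2, x1, x2, t, N=1.0):
    """Demand of stores 1 and 2 when consumers are U[0, 1] and everyone buys."""
    gap = p2 - p1
    if gap >= t * (x2 - x1):        # store 1 is weakly preferred by everyone
        z_bar = 1.0
    elif gap <= -t * (x2 - x1):     # store 2 is weakly preferred by everyone
        z_bar = 0.0
    else:                           # indifferent consumer lies between the stores
        z_bar = gap / (2 * t) + (x1 + x2) / 2
    return N * z_bar, N * (1 - z_bar)

# Hypothetical values for the illustration.
p1, p2, x1, x2, t, s = 1.0, 1.2, 0.25, 0.75, 1.0, 5.0
d1, d2 = hotelling_demand(p1, p2, x1, x2, t)

# Brute force: compare utilities on a fine grid of consumer locations.
z = np.linspace(0.0, 1.0, 100_001)
v1 = s - t * np.abs(z - x1) - p1
v2 = s - t * np.abs(z - x2) - p2
share1_grid = np.mean(v1 >= v2)
```

With these values the indifferent consumer is at 0.6, so store 1 serves 60 percent of the street, and the grid-based share agrees up to the grid resolution.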

### 4.4.11 Vertical Product Differentiation

**Vertical product differentiation**: consumers agree on the ranking of the choices, but they can differ in their willingness to pay.

- Bresnahan (1987) analyzed automobile demand with this framework.
- There are \(J\) goods and consumer \(i\) has a utility such as: \[\begin{equation} v_{ij} \equiv v(p_j, x_j, y_i, z_i) = z_i x_j - p_j, \end{equation}\] where \(x_j\) is the quality of product \(j\) with \(x_j < x_{j + 1}\), and \(z_i\) is the consumer’s willingness to pay for quality.
- Consumers’ problem is: \[\begin{equation} \max\{0, z_i x_1 - p_1, \cdots, z_i x_J - p_J \}. \end{equation}\]

### 4.4.12 Vertical Product Differentiation

- Consumer \(i\) prefers good \(j + 1\) to good \(j\) if and only if: \[\begin{equation} \begin{split} &v(p_{j + 1}, x_{j + 1}, y_i, z_i) \ge v(p_j, x_j, y_i, z_i)\\ &\Leftrightarrow z_i x_{j + 1} - p_{j + 1} \ge z_i x_j - p_j\\ &\Leftrightarrow z_i \ge \frac{p_{j + 1} - p_j}{x_{j + 1} - x_j} \equiv \Delta_j. \end{split} \end{equation}\]
- So consumer \(i\) purchases good \(j\) if and only if \(z_i \in [\Delta_{j - 1}, \Delta_j)\), provided that the cutoffs are increasing in \(j\) (with \(\Delta_J \equiv \infty\) for the highest-quality good), and buys nothing if: \[\begin{equation} z_i \le \Delta_0 \equiv \min\{p_1/x_1, \cdots, p_J/x_J\}. \end{equation}\]
- Letting \(F(z)\) be the distribution function of \(z\), the aggregate demand for good \(j\) is: \[\begin{equation} \sigma_j(p, x) = N[F(\Delta_{j}) - F(\Delta_{j - 1})]. \end{equation}\]
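
The cutoff formula can be sketched in a few lines. The example below assumes \(z_i \sim U[0, 1]\), \(N = 1\), and hypothetical qualities and prices for which the cutoffs are increasing; the function names are introduced here for illustration. The shares from the cutoffs are compared with a brute-force maximization over a grid of \(z_i\).

```python
import numpy as np

def vertical_shares(p, x, cdf):
    """Shares of goods 1..J from the cutoffs; assumes x is increasing in j."""
    p = np.asarray(p, dtype=float)
    x = np.asarray(x, dtype=float)
    delta0 = np.min(p / x)                       # Delta_0
    inner = np.diff(p) / np.diff(x)              # Delta_1, ..., Delta_{J-1}
    cutoffs = np.concatenate(([delta0], inner, [np.inf]))  # Delta_J = infinity
    if np.any(np.diff(cutoffs) < 0):
        raise ValueError("cutoffs are not increasing; some good has no demand")
    return np.diff(cdf(cutoffs))                 # F(Delta_j) - F(Delta_{j-1})

def uniform_cdf(d):
    """Distribution function of U[0, 1]."""
    return np.clip(d, 0.0, 1.0)

x = np.array([1.0, 2.0, 3.0])    # hypothetical qualities
p = np.array([0.2, 0.5, 1.0])    # hypothetical prices
shares = vertical_shares(p, x, uniform_cdf)

# Brute force: each consumer picks the best of {nothing, good 1, ..., good J}.
z = np.linspace(0.0, 1.0, 100_001)
utility = np.column_stack([np.zeros_like(z), z[:, None] * x - p])
choice = utility.argmax(axis=1)
shares_grid = np.array([(choice == j).mean() for j in range(1, len(x) + 1)])
```

With these values the cutoffs are 0.2, 0.3, and 0.5, so the shares of goods 1 to 3 are 0.1, 0.2, and 0.5, and 20 percent of consumers buy nothing.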

### 4.4.13 Econometric Models

- So far there was no econometrics.
- Next we define what are observable and unobservable, and what are known and unknown.
- Then consider how to identify and estimate the model.

### 4.4.14 Multinomial Logit Model: Preference Shock

- This originates from McFadden (1974).
- See Train (2009) for reference.
- Suppose that there is some unobservable component in consumer characteristics.
- In reality, consumers’ choices change somewhat randomly.
- Let’s capture such a **preference shock** by considering the following model: \[\begin{equation} v(p_j, x_j, y_i, z_i) + \epsilon_{ij}, \end{equation}\] with some random vector: \[\begin{equation} \epsilon_i \equiv (\epsilon_{i0}, \cdots, \epsilon_{iJ})' \sim G. \end{equation}\]
- At this point, \(G\) can be any distribution and the shocks can be dependent across \(j\) within \(i\).
- \(p, x, y_i, z_i\) are **observed** but \(\epsilon_{ij}\) are **unobserved**.
- When the realization of the preference shock is given, the consumer choice is: \[ q_j(p, x, y_i, z_i, \epsilon_{i}) \equiv 1\{j = \text{argmax}_{k = 0, \cdots, J} v(p_k, x_k, y_i, z_i) + \epsilon_{ik}\} \] for \(j = 0, \cdots, J\).
- The **choice probability** as observed by the econometrician is: \[ \sigma_j(p, x, y_i, z_i) \equiv \int q_j(p, x, y_i, z_i, \epsilon_{i}) dG(\epsilon_i). \]

### 4.4.15 Multinomial Logit Model: Distributional Assumption

Now assume the following:

- \(\epsilon_{ij}\) are independent across \(j\): \(G(\epsilon_i) = \prod_{j = 0, \cdots, J} G_j(\epsilon_{ij})\).
- \(\epsilon_{ij}\) are identically distributed across \(j\): \(G_j(\epsilon_{ij}) = \overline{G}(\epsilon_{ij})\).
- \(\overline{G}\) is the type-I extreme value distribution.
- \(\rightarrow\) The density is \(g(\epsilon_{ij}) = \exp[-\exp(-\epsilon_{ij}) - \epsilon_{ij}]\).

This is called the (homoskedastic) **multinomial logit model**.

Fixing the scale of \(\epsilon_{ij}\) is a **scale** normalization; with the density above the scale parameter is 1, so the variance is \(\pi^2/6\) rather than 1.

By dropping some of the assumptions, we can have the heteroskedastic multinomial logit model, the generalized extreme value model, and so on.

Another popular distributional assumption is a multivariate normal distribution for \(\epsilon_i\). This case is called the **multinomial probit model**.

### 4.4.16 Multinomial Logit Model: Choice Probability

- The **choice probability** of consumer \(i\) of good \(j\) is: \[\begin{equation} \begin{split} \sigma_j(p, x, y_i, z_i) & \equiv \mathbb{P}\{j = \text{argmax}_{k = 0, 1, \cdots, J} v(p_k, x_k, y_i, z_i) + \epsilon_{ik} \}\\ &=\mathbb{P}\{v(p_j, x_j, y_i, z_i) - v(p_k, x_k, y_i, z_i) \ge \epsilon_{ik} - \epsilon_{ij}, \forall k \neq j\}\\ & = \text{...after some algebra: leave as an exercise...}\\ &= \frac{\exp[v(p_j, x_j, y_i, z_i) ]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y_i, z_i)] }. \end{split} \end{equation}\]
- For example, if: \[\begin{equation} v(p_k, x_k, y_i, z_i) = \beta_i'x_k + \alpha_i (y_i - p_k), \end{equation}\] \[\begin{equation} \begin{pmatrix} \beta_i \\ \alpha_i \end{pmatrix} = \begin{pmatrix} \beta_0 \\ \alpha_0 \end{pmatrix} + \begin{pmatrix} \Gamma\\ \pi' \end{pmatrix} z_i. \end{equation}\]
- Then, because \(\alpha_i y_i\) is common to all alternatives and cancels out, we have: \[\begin{equation} \begin{split} \sigma_{j}(p, x, y_i, z_i) &= \frac{\exp[\beta_i'x_j + \alpha_i (y_i - p_j) ]}{\sum_{k = 0}^J \exp[\beta_i'x_k + \alpha_i (y_i - p_k) ]}\\ &= \frac{\exp[\beta_i'x_j - \alpha_i p_j]}{\sum_{k = 0}^J \exp[\beta_i'x_k - \alpha_i p_k]}. \end{split} \end{equation}\]
- If we normalize the characteristics vector so that \(w_0 = 0\) holds for the outside option, that is, \(\beta_i'x_0 - \alpha_i p_0 = 0\), it becomes: \[ \sigma_{j}(p, x, y_i, z_i) = \frac{\exp[\beta_i'x_j - \alpha_i p_j]}{1 + \sum_{k = 1}^J \exp[\beta_i'x_k - \alpha_i p_k]}. \]
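
The closed-form choice probability can be checked by simulation. The sketch below is a minimal illustration with hypothetical mean utilities; it draws i.i.d. standard type-I extreme value shocks with NumPy's `Generator.gumbel`, lets each simulated consumer pick the alternative with the highest utility, and compares the choice frequencies with the logit formula. The function name `logit_probs` is introduced here.

```python
import numpy as np

def logit_probs(v):
    """Multinomial logit choice probabilities from mean utilities v (length J + 1)."""
    e = np.exp(v - np.max(v))   # subtracting the max avoids overflow, ratios unchanged
    return e / e.sum()

rng = np.random.default_rng(0)
v = np.array([0.0, 1.0, 0.5, -0.3])        # hypothetical; v[0] = 0 is the outside option
eps = rng.gumbel(size=(200_000, v.size))   # i.i.d. standard type-I extreme value shocks
choice = np.argmax(v + eps, axis=1)        # utility-maximizing alternative per consumer
freq = np.bincount(choice, minlength=v.size) / choice.size

probs = logit_probs(v)
```

The simulated frequencies `freq` should be close to `probs`, with a gap that shrinks as the number of draws grows.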

### 4.4.17 Multinomial Logit Model: Inclusive Value

- The expected utility for consumer \(i\) before the preference shocks are drawn under multinomial logit model is given by: \[\begin{equation} \begin{split} &\mathbb{E}\{\max_{j = 0, \cdots, J} v(p_j, x_j, y_i, z_i) + \epsilon_{ij}\} \\ &= \text{ ...after some algebra: leave as an exercise...}\\ &= \ln \Bigg\{\sum_{j = 0}^J \exp[v(p_j, x_j, y_i, z_i)] \Bigg\} + constant. \end{split} \end{equation}\]
- This is sometimes called the **inclusive value** of the choice set.
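
For the standard type-I extreme value distribution, the constant in the expression above is Euler's constant (approximately 0.5772). The following minimal sketch, with hypothetical mean utilities, compares the simulated expected maximum utility with the log-sum formula.

```python
import numpy as np

rng = np.random.default_rng(0)
v = np.array([0.0, 1.0, 0.5, -0.3])        # hypothetical mean utilities
eps = rng.gumbel(size=(200_000, v.size))   # i.i.d. standard type-I extreme value shocks

simulated = np.max(v + eps, axis=1).mean() # Monte Carlo estimate of E[max_j (v_j + eps_j)]
inclusive_value = np.log(np.exp(v).sum())  # log-sum of exponentiated mean utilities
closed_form = inclusive_value + np.euler_gamma
```

The difference between `simulated` and `closed_form` should be of the order of the simulation error.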

### 4.4.18 Maximum Likelihood Estimation of Multinomial Logit Model

- Suppose we observe a sequence of income \(y_i\), consumer characteristics \(z_i\), choice \(q_{i}\), product characteristics \(x_j\) and price \(p_j\).
- \(q_i = (q_{i0}, \cdots, q_{iJ})'\) and \(q_{ij} = 1\) if \(j\) is chosen and \(0\) otherwise.
- The parameter of interest is the mean indirect utility function \(v\).
- Then the log likelihood of \(\{q_i\}_{i = 1}^N\) conditional on \(\{y_i, z_i\}_{i = 1}^N\) and \(\{x_j,p_j\}_{j = 1}^J\) is: \[\begin{equation} \begin{split} l(v; q, y, z, w) &= \sum_{i = 1}^N \ln \mathbb{P}\{q_i = q(p, x, y_i, z_i)|p, x, y_i, z_i\}\\ & = \sum_{i = 1}^N \ln \Bigg\{ \prod_{j = 0}^{J} \sigma_{j}(p, x, y_i, z_i)^{q_{ij}} \Bigg\}\\ &= \sum_{i = 1}^N \sum_{j = 0}^J q_{ij} \ln \sigma_{j}(p, x, y_i, z_i). \end{split} \end{equation}\]
- We can estimate the parameters by finding parameters that maximize the log likelihood.
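
As a minimal sketch of the estimation step, the example below simulates choices from a logit model with \(v_j = \beta x_j - \alpha p_j\), a scalar characteristic, coefficients common to all consumers, and an outside option with utility zero, and then maximizes the log likelihood by Newton's method. These simplifications, the function names, and the parameter values are assumptions made for illustration; in practice a general-purpose optimizer and consumer-specific covariates would typically be used.

```python
import numpy as np

def choice_probs(theta, W):
    """Logit probabilities; W has one row per alternative (row 0 = outside option)."""
    v = W @ theta
    e = np.exp(v - v.max())
    return e / e.sum()

def log_likelihood(theta, W, counts):
    """Sum over consumers of ln sigma_j for the chosen j, grouped by alternative."""
    return counts @ np.log(choice_probs(theta, W))

def fit_logit(W, counts, n_iter=50, tol=1e-10):
    """Newton iterations on the (globally concave) logit log likelihood."""
    theta = np.zeros(W.shape[1])
    N = counts.sum()
    for _ in range(n_iter):
        P = choice_probs(theta, W)
        grad = W.T @ (counts - N * P)
        hess = -N * W.T @ (np.diag(P) - np.outer(P, P)) @ W
        step = np.linalg.solve(hess, grad)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

rng = np.random.default_rng(1)
x = np.array([1.0, 2.0, 0.5, 1.5])                       # hypothetical characteristics
p = np.array([1.0, 2.5, 0.8, 1.2])                       # hypothetical prices
W = np.vstack([np.zeros(2), np.column_stack([x, -p])])   # v_j = beta * x_j - alpha * p_j
theta_true = np.array([1.0, 0.8])                        # (beta, alpha)

N = 100_000
choices = rng.choice(W.shape[0], size=N, p=choice_probs(theta_true, W))
counts = np.bincount(choices, minlength=W.shape[0])
theta_hat = fit_logit(W, counts)
```

With this sample size, `theta_hat` should be close to `theta_true`, and the log likelihood at `theta_hat` should be at least as large as at `theta_true`.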

### 4.4.19 Nonlinear Least Squares Estimation of Multinomial Logit Model

The multinomial logit model can also be estimated by the nonlinear least squares method.

Suppose that the share of product \(j\) among consumers with characteristics \(z\) and income \(y\) was: \[\begin{equation} \sigma_j(p, x, y, z). \end{equation}\]

Note that: \[\begin{equation} \begin{split} \ln \sigma_{j}(p, x, y, z) &= \ln \Bigg\{ \frac{\exp[v(p_j, x_j, y, z) ]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y, z)] } \Bigg\}\\ &= v(p_j, x_j, y, z) - \ln\Bigg\{ \sum_{k = 0}^J \exp[v(p_k, x_k, y, z)] \Bigg\}. \end{split} \end{equation}\]

Moreover, because of the location normalization of the utility function, the mean indirect utility of the outside option is zero, so that: \[\begin{equation} \sigma_{0}(p, x, y, z) = \frac{1}{\sum_{k = 0}^J \exp[v(p_k, x_k, y, z)] }. \end{equation}\]

Hence, \[\begin{equation} \ln \sigma_{j}(p, x, y, z) - \ln \sigma_{0}(p, x, y, z) = v(p_j, x_j, y, z). \end{equation}\]

The left-hand side is observed in the data.

Let \(s_j(y, z)\) be the share of product \(j\) among consumers with characteristics \(z\) and income \(y\) **in the data**.

This can be calculated from the consumer-level data.

More importantly, this approach can be used even when only the total sales for each demographic group are available.

Then, we can estimate the parameter by NLLS such that: \[\begin{equation} \min \sum_{(y, z)} \sum_{j = 1}^J \{\ln[s_{j}(y, z)/s_{0}(y, z)] - v(p_j, x_j, y, z)\}^2. \end{equation}\]

If \(v\) is linear in parameters, this reduces to ordinary least squares. With \[\begin{equation} v(p_j, x_j, y, z) = \beta_z' x_j - \alpha_z p_j, \end{equation}\]

the estimating equation is: \[\begin{equation} \ln[s_{j}(y, z)/s_{0}(y, z)] = \beta_z' x_j - \alpha_z p_j. \end{equation}\]
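
The linear case can be sketched as a regression of log share ratios on characteristics and prices. The example below assumes a single demographic group, a scalar characteristic, and shares generated exactly from the model (all values are hypothetical), so that least squares recovers the coefficients without sampling error; with shares computed from a finite sample the recovery would only be approximate.

```python
import numpy as np

x = np.array([1.0, 2.0, 0.5, 1.5, 3.0])   # hypothetical product characteristics
p = np.array([1.0, 2.5, 0.8, 1.2, 3.5])   # hypothetical prices
beta, alpha = 1.0, 0.8                    # coefficients to be recovered

v = beta * x - alpha * p
s = np.exp(v) / (1.0 + np.exp(v).sum())   # inside shares, outside utility normalized to 0
s0 = 1.0 - s.sum()                        # outside share

y = np.log(s / s0)                        # left-hand side: ln(s_j / s_0)
X = np.column_stack([x, -p])              # regressors so that coef = (beta, alpha)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here `coef` should reproduce `(beta, alpha)` up to floating-point error.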

### 4.4.20 IIA Problem

- The multinomial logit model is intuitive and easy to implement.
- However, there are several problems in the model.
- The most important problem is the **independence of irrelevant alternatives (IIA)** problem.
- Notice that: \[\begin{equation} \frac{\sigma_j(p, x, y, z)}{\sigma_{k}(p, x, y, z)} = \frac{\exp[v(p_j, x_j, y, z)]}{\exp[v(p_k, x_k, y, z)]}. \end{equation}\]
- The ratio of choice probabilities between two alternatives depends only on the mean indirect utilities of these two alternatives and is **independent of irrelevant alternatives (IIA)**.
- Why is this a problem?

### 4.4.21 Blue Bus and Red Bus Problem

- Suppose that you can go to a town by bus or by train.
- Half of commuters use a bus and the other half use a train.
- The existing bus was blue. Now, the county introduced a red bus, which is identical to the existing blue bus.
- No one cares about the color of the bus. So the mean indirect utilities of the blue bus and the red bus are equal.
- What is the new share across blue bus, red bus, and train?
- IIA \(\to\) share of blue bus = share of train.
- Buses are identical \(\to\) share of blue bus = share of red bus.
- Therefore, each share has to be 1/3.
- But shouldn’t the train keep half of the share, with the two buses splitting the other half?
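
The arithmetic of the example can be reproduced directly from the logit formula. In this minimal sketch all mean utilities are set to zero, which matches the equal split between bus and train before the red bus is introduced; the function name is introduced here.

```python
import numpy as np

def logit_probs(v):
    """Multinomial logit choice probabilities from mean utilities v."""
    e = np.exp(v - np.max(v))
    return e / e.sum()

before = logit_probs(np.array([0.0, 0.0]))       # blue bus, train
after = logit_probs(np.array([0.0, 0.0, 0.0]))   # blue bus, red bus, train

bus_total_after = after[0] + after[1]
train_after = after[2]
```

Under the logit model the train share falls from 1/2 to 1/3 and the total bus share rises to 2/3, even though the red bus adds nothing new for commuters.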

### 4.4.22 Restrictive Price Elasticity

- The IIA property restricts price elasticities in an undesirable manner.
- This is a serious problem because the main purpose for us to estimate demand functions is to identify the price elasticity.
- Let \(v(p_j, x_j, y, z) = \beta_z'x_j - \alpha_z p_j\). Then, the elasticity of the choice probability of good \(j\) with respect to the price of good \(k\) is: \[\begin{equation} e_{jk} = \begin{cases} -\alpha_z p_{j} (1 - \sigma_j(p, x, y, z)) &\text{ if } k = j\\ \alpha_z p_{k} \sigma_k(p, x, y, z) &\text{ if } k \neq j. \end{cases} \end{equation}\]
- The price elasticity is completely determined by the existing choice probabilities of the relevant alternatives.
- Suppose that there are Coca-Cola, Pepsi, and coffee.
- The shares were 1/2, 1/6, 1/3, respectively.
- Suppose that the price of Coca-Cola increased.
- We expect that consumers instead purchase Pepsi because Pepsi is more similar to Coca-Cola than coffee is.
- However, according to the previous result, twice as many consumers substitute to coffee as to Pepsi.
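
The example can be verified numerically. The sketch below assumes \(\alpha = 1\), unit prices, no outside option, and intercepts chosen so that the logit shares equal 1/2, 1/6, and 1/3; these values and the function name are hypothetical. It computes the response of each share to the price of Coca-Cola by finite differences.

```python
import numpy as np

def shares(p, c, alpha):
    """Logit shares with mean utility v_j = c_j - alpha * p_j (no outside option)."""
    v = c - alpha * p
    e = np.exp(v - v.max())
    return e / e.sum()

alpha = 1.0
p = np.array([1.0, 1.0, 1.0])                        # Coca-Cola, Pepsi, coffee
c = np.log(np.array([1/2, 1/6, 1/3])) + alpha * p    # intercepts matching the shares
s = shares(p, c, alpha)

h = 1e-6
dp = np.array([h, 0.0, 0.0])                         # perturb the price of Coca-Cola
ds = (shares(p + dp, c, alpha) - shares(p - dp, c, alpha)) / (2 * h)

cross_elasticity = ds[1:] * p[0] / s[1:]             # Pepsi and coffee w.r.t. Coca-Cola price
```

The derivative of the coffee share is twice that of the Pepsi share, while the two cross elasticities are identical and equal to \(\alpha p_k \sigma_k = 1/2\), as the formula above implies.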

### 4.4.23 Monotonic Inclusive Value

- Suppose that there is a good whose mean indirect utility is \(v\).
- The inclusive value for this choice set is \(\ln[1 + \exp(v)]\).
- Suppose that we put \(J\) identical goods on the shelf and the consumer can choose any of them.
- The inclusive value is \(\ln[1 + J \exp(v)]\).
- We just added the same goods. But the expected utility of the consumer increases monotonically in the number of alternatives.

### 4.4.24 The Source of the Problem

**The source of the problem is that there is no correlation in the preference shock across products**.

- When the preference shock to Coca-Cola is high, the preference shock to Pepsi should also be high, while the preference shock to coffee should be relatively independent.
- Because the expected value of the maximum of the preference shocks increases with the number of alternatives, the inclusive value is increasing in the number of alternatives.
- However, the preference shocks should be the same for the same good. Then, the expected value of the maximum of the preference shocks should not increase even if we add the same products to the shelf.
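
The role of the independence assumption can be illustrated by simulation. The sketch below, with hypothetical values \(v = 0.5\) and \(J = 5\), compares the expected maximum utility when the \(J\) identical goods receive independent shocks with the case in which they share one common shock. The outside option has mean utility zero and its own shock in both cases.

```python
import numpy as np

rng = np.random.default_rng(0)
v, J, R = 0.5, 5, 200_000                 # hypothetical mean utility, copies, draws

eps0 = rng.gumbel(size=R)                 # shock of the outside option
eps_iid = rng.gumbel(size=(R, J))         # independent shocks for the J copies
eps_common = rng.gumbel(size=R)           # one shock shared by all J copies

u_iid = np.maximum(eps0, (v + eps_iid).max(axis=1)).mean()
u_common = np.maximum(eps0, v + eps_common).mean()

iid_formula = np.log(1 + J * np.exp(v)) + np.euler_gamma
common_formula = np.log(1 + np.exp(v)) + np.euler_gamma
```

With independent shocks the expected utility matches \(\ln[1 + J \exp(v)]\) plus Euler's constant, whereas with a common shock it stays at the single-good value regardless of \(J\).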

### 4.4.25 Correlation in Preference Shocks

- Therefore, the preference shocks of two alternatives should be more correlated when the alternatives are closer in the characteristics space.
- So we have to allow the covariance matrix of the preference shocks to have free parameters.
- If we allow a flexible covariance matrix, the curse of dimensionality in the number of alternatives comes back: the dimensionality of the covariance matrix is \(J^2\).
- Another way is to remove \(\epsilon_{ij}\): this is called a **pure characteristics model** (Berry & Pakes, 2007).
- But the pure characteristics model is computationally not straightforward.
- We explore ways of introducing mild correlation across similar products in the preference shocks.

### 4.4.26 Observed and Unobserved Consumer Heterogeneity

- Consider beverage demand and let \(x_j = \text{carbonated}_j\) and \(z_i = \text{teenager}_i\).
- Suppose that the mean indirect utility is: \[\begin{equation} v(p_j, x_j, y_i, z_i) = \beta_i (\text{carbonated})_j - \alpha_i p_j, \end{equation}\]

\[\begin{equation} \beta_i = 0.1 + 0.2 \cdot (\text{teenager})_i. \end{equation}\]

The mean utility of a carbonated drink is 0.3 for a teenager but only 0.1 for others.

When Coca-Cola is not available, teenagers will substitute more to Pepsi than non-teenagers.

IIA holds at the market-segment level but not at the market level.

How can we avoid IIA at the market-segment level? Introduce unobserved consumer heterogeneity.

Suppose that the coefficient on the carbonated dummy is: \[\begin{equation} \beta_i = 0.1 + 0.2 \cdot (\text{teenager})_i + \nu_i. \end{equation}\]

Consumers with high \(\nu_i\) value carbonated drinks more than those with low \(\nu_i\).

When Coca-Cola is not available, consumers with high \(\nu_i\) will substitute more to Pepsi than those with low \(\nu_i\).

IIA holds at the market-segment-\(\nu\) level but not at the market-segment level.

In the above example, “\(0.2 \cdot (\text{teenager})_i\)” captures the consumer heterogeneity by observed characteristics and “\(\nu_i\)” by unobserved characteristics.

### 4.4.27 Mixed Logit Model

- Suppose that the mean indirect utility is: \[\begin{equation} v(p_j, x_j, y_i, z_i, \beta_i, \alpha_i) = \beta_i' x_j - \alpha_i p_j, \end{equation}\] with \[\begin{equation} (\beta_i, \alpha_i) \sim f(\beta_i, \alpha_i|y_i, z_i). \end{equation}\]
- If \(\epsilon_{ij}\) is drawn i.i.d. from the type-I extreme value distribution, the choice probability of good \(j\) by consumer \(i\) conditional on \(p, x, y_i, z_i\) is: \[\begin{equation} \sigma_{j}(p, x, y_i, z_i) = \int_{\beta_i, \alpha_i} \frac{\exp[v(p_j, x_j, y_i, z_i, \beta_i, \alpha_i)]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y_i, z_i, \beta_i, \alpha_i)]} f(\beta_i, \alpha_i|y_i, z_i) d\beta_i d\alpha_i. \end{equation}\]
- This is called the **mixed-logit model**.
- If the distribution of \(\epsilon_{ij}\) is different, it is no longer mixed logit.
- Conditional on \((\beta_i, \alpha_i)\), the choice probability takes the same form as in the multinomial logit model.
- \(\beta_i, \alpha_i\) are marginalized out, because the econometrician does not observe them.

### 4.4.28 Mixed Logit Model : Parametric Assumptions

- It is often assumed that: \[\begin{equation} v(p_j, x_j, y_i, z_i, \beta_i, \alpha_i) = \beta_i' x_j - \alpha_i p_j. \end{equation}\]
- McFadden & Train (2000) showed that any discrete choice model that is consistent with random utility maximization can be approximated arbitrarily closely by this class of mixed-logit models.
- The distribution of \(\beta_i\) and \(\alpha_i\) is often assumed to be: \[\begin{equation} \begin{split} &\beta_i = \beta_0 + \Gamma z_i + \Sigma \nu_i,\\ &\alpha_i = \alpha_0 + \pi' z_i + \omega \upsilon_i, \end{split} \end{equation}\] where \(\nu_i\) and \(\upsilon_i\) are i.i.d. standard normal random vectors.

### 4.4.29 Mixed Logit Model: IIA

- There is no IIA at the market-segment level: \[\begin{equation} \frac{\sigma_{j}(p, x, y_i, z_i)}{\sigma_{l}(p, x, y_i, z_i)} = \frac{\int_{\beta_i, \alpha_i} \frac{\exp[v(p_j, x_j, y_i, z_i, \beta_i, \alpha_i)]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y_i, z_i, \beta_i, \alpha_i)]} f(\beta_i, \alpha_i|y_i, z_i) d\beta_i d\alpha_i}{\int_{\beta_i, \alpha_i} \frac{\exp[v(p_l, x_l, y_i, z_i, \beta_i, \alpha_i)]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y_i, z_i, \beta_i, \alpha_i)]} f(\beta_i, \alpha_i|y_i, z_i) d\beta_i d\alpha_i}. \end{equation}\]
- The share ratio depends on the price and characteristics of all the other products.
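The dependence of the share ratio on a third product can be checked numerically. The following is a minimal sketch under assumed values: three inside goods, an outside good with utility normalized to zero, one characteristic shared by goods 1 and 3, and a normal random coefficient on that characteristic. The function name `shares` and all numbers are hypothetical.

```python
import numpy as np

def shares(p, x, alpha, beta_draws):
    """Average logit shares of the inside goods over draws of beta (outside utility = 0)."""
    v = beta_draws[:, None] * x[None, :] - alpha * p[None, :]   # (R, J)
    e = np.exp(v)
    s_i = e / (1.0 + e.sum(axis=1, keepdims=True))              # individual shares
    return s_i.mean(axis=0)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0])         # goods 1 and 3 share the characteristic
p_before = np.array([1.0, 1.0, 1.0])
p_after = np.array([1.0, 1.0, 3.0])   # only the price of good 3 changes

cases = {
    "logit": np.zeros(1),                          # no heterogeneity in beta
    "mixed logit": 2.0 * rng.standard_normal(5000) # beta_i ~ N(0, 2^2), assumed
}
ratios = {}
for label, draws in cases.items():
    s_b = shares(p_before, x, 1.0, draws)
    s_a = shares(p_after, x, 1.0, draws)
    ratios[label] = (s_b[0] / s_b[1], s_a[0] / s_a[1])
    print(label, "sigma_1 / sigma_2 before and after:", ratios[label])
```

Under the plain logit the ratio of the shares of goods 1 and 2 is unaffected by the price of good 3, whereas with the random coefficient consumers who leave good 3 move disproportionately to the similar good 1.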

### 4.4.30 Mixed Logit Model: Price Elasticities

- Let: \[\begin{equation} v(p_j, x_j, y_i, z_i, \beta_i, \alpha_i) = \beta_i' x_j - \alpha_i p_j. \end{equation}\]
- The price elasticities of the choice probabilities conditional on \(p, x, y_i, z_i\) are: \[\begin{equation} e_{jk} = \begin{cases} -\frac{p_j}{\sigma_j} \int \alpha_i \sigma_{ij}(1 - \sigma_{ij})f(\beta_i, \alpha_i|y_i, z_i) d\beta_i d\alpha_i &\text{ if } j = k\\ \frac{p_k}{\sigma_j} \int \alpha_i \sigma_{ij} \sigma_{ik} f(\beta_i, \alpha_i|y_i, z_i) d\beta_i d\alpha_i &\text{ otherwise}, \end{cases} \end{equation}\] where \[\begin{equation} \sigma_{ij} = \frac{\exp(\beta_i'x_j - \alpha_i p_j)}{\sum_{k = 0}^J \exp(\beta_i'x_k - \alpha_i p_k)}. \end{equation}\]
- The price elasticity depends on the density of unobserved consumer types.
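As an illustration, the elasticity formula can be evaluated by replacing the integral with an average over simulated consumers. This is a sketch, not a definitive implementation: the function names, the normalization of the outside good's utility to zero, and the numerical values are assumptions made for the example.

```python
import numpy as np

def individual_probs(x, p, betas, alphas):
    """(R, J) logit probabilities sigma_ij of the inside goods; outside utility is 0."""
    v = betas @ x.T - alphas[:, None] * p[None, :]
    e = np.exp(v)
    return e / (1.0 + e.sum(axis=1, keepdims=True))

def market_shares(x, p, betas, alphas):
    """Simulated choice probabilities sigma_j: average of sigma_ij over draws."""
    return individual_probs(x, p, betas, alphas).mean(axis=0)

def elasticities(x, p, betas, alphas):
    """Matrix e[j, k] of elasticities of sigma_j with respect to p_k."""
    s_i = individual_probs(x, p, betas, alphas)
    s = s_i.mean(axis=0)
    # Off-diagonal derivatives: average of alpha_i * sigma_ij * sigma_ik.
    deriv = np.einsum("r,rj,rk->jk", alphas, s_i, s_i) / len(alphas)
    # Diagonal derivatives: minus the average of alpha_i * sigma_ij * (1 - sigma_ij).
    own = -(alphas[:, None] * s_i * (1.0 - s_i)).mean(axis=0)
    np.fill_diagonal(deriv, own)
    return deriv * p[None, :] / s[:, None]

rng = np.random.default_rng(1)
R = 1000
x = np.array([[1.0, 0.5], [1.0, 1.5], [0.0, 1.0]])   # J = 3 goods, K = 2 characteristics
p = np.array([1.0, 1.5, 0.8])
betas = np.array([0.5, 0.3]) + rng.standard_normal((R, 2)) * np.array([0.5, 0.2])
alphas = 1.0 + 0.3 * rng.standard_normal(R)
e = elasticities(x, p, betas, alphas)
print(e)
```

Because the same draws are used for the shares and for the derivatives, the matrix `e` is the exact elasticity of the simulated shares, which makes a finite-difference check straightforward.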

### 4.4.31 Simulated Maximum Likelihood Estimation of the Mixed Logit Model

- The choice probability of the mixed logit model is an integration of the multinomial logit choice probability.
- This integral has no closed form in general.
- We can use simulation to evaluate the choice probability:
- Draw \(R\) values of \(\beta\) and \(\alpha\), \(\{\beta^r, \alpha^r \}_{r = 1}^R\).
- Compute the multinomial choice probabilities associated with \((\beta^r, \alpha^r)\) for each \(r = 1, \cdots, R\).
- Approximate the choice probability with the mean of the simulated multinomial choice share: \[\begin{equation} \sigma_{j}(p, x, y_i, z_i) \approx \hat{\sigma}_{j}(p, x, y_i, z_i) \equiv \frac{1}{R} \sum_{r = 1}^R \frac{\exp[v(p_j, x_j, y_i, z_i, \beta^r, \alpha^r)]}{\sum_{k = 0}^J \exp[v(p_k, x_k, y_i, z_i, \beta^r, \alpha^r)]}. \end{equation}\]
- This is one method of numerical integration: **Monte Carlo integration**.
- Another approach is to use **quadrature**. See Judd (1998) for reference.
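The simulation steps above can be sketched as follows. This is a simplified illustration that assumes independent normal coefficients, drops the dependence on \(y_i, z_i\), and normalizes the utility of the outside good (index 0) to zero. The function name and arguments are hypothetical.

```python
import numpy as np

def simulated_choice_probs(x, p, beta0, alpha0, sd_beta, sd_alpha, R=1000, seed=0):
    """Monte Carlo approximation of mixed-logit choice probabilities.

    x: (J, K) characteristics, p: (J,) prices.
    Returns a vector of length J + 1; entry 0 is the outside good.
    """
    rng = np.random.default_rng(seed)
    J, K = x.shape
    # Step 1: draw R values of (beta, alpha).
    betas = beta0 + sd_beta * rng.standard_normal((R, K))
    alphas = alpha0 + sd_alpha * rng.standard_normal(R)
    # Step 2: multinomial logit probabilities for each draw.
    v = betas @ x.T - alphas[:, None] * p[None, :]        # (R, J)
    v = np.column_stack([np.zeros(R), v])                  # prepend the outside good
    v -= v.max(axis=1, keepdims=True)                      # guard against overflow
    e = np.exp(v)
    probs_r = e / e.sum(axis=1, keepdims=True)
    # Step 3: average over the draws.
    return probs_r.mean(axis=0)

x = np.array([[1.0, 0.5], [1.0, 1.5], [0.0, 1.0]])
p = np.array([1.0, 1.2, 0.8])
sigma_hat = simulated_choice_probs(
    x, p, beta0=np.array([0.5, 0.3]), alpha0=1.0,
    sd_beta=np.array([0.5, 0.2]), sd_alpha=0.3,
)
print(sigma_hat)
```

Setting both standard deviations to zero makes every draw identical, so the function then returns the multinomial logit probabilities.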

### 4.4.32 Simulated Maximum Likelihood Estimation of the Mixed Logit Model

There are \(t = 1, \cdots, T\) markets and there are \(i = 1, \cdots, N\) consumers in each market.

Let \(\mathcal{J}_t\) be the set of products that are available in market \(t\).

Suppose that we observe income \(y_{it}\), characteristics \(z_{it}\), and choice \(q_{it}\) for each consumer in a market.

Suppose that we observe product characteristics \(x_{jt}\) and price \(p_{jt}\) of each product in each market.

The simulated conditional log likelihood is: \[\begin{equation} \begin{split} &\sum_{i = 1}^N \sum_{t = 1}^T \ln \mathbb{P}\{q_{it} = q(p_t, x_t, y_{it}, z_{it})|p_t, x_t, y_{it}, z_{it}\} \\ &\approx \sum_{i = 1}^N \sum_{t = 1}^T \ln \Bigg\{ \prod_{j \in \mathcal{J}_t \cup \{0\}} \hat{\sigma}_{j}(p_t, x_t, y_{it}, z_{it})^{q_{itj}} \Bigg\}, \end{split} \end{equation}\] where \(q_{itj}\) takes value 1 if consumer \(i\) in market \(t\) chooses good \(j\) and 0 otherwise.

We find parameters that maximize the simulated conditional log likelihood.
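A minimal sketch of the simulated log likelihood for a single market is given below. With several markets the same expression is summed over \(t\). It assumes one product characteristic, a coefficient \(\beta_i = \beta_0 + \gamma z_i + \sigma \nu_i\), a common price coefficient, and an outside good with zero utility. The parameterization, the names, and the placeholder data are hypothetical.

```python
import numpy as np

def simulated_loglik(theta, x, p, z, choice, draws):
    """Simulated log likelihood of observed choices in one market.

    theta = (beta0, gamma, sigma, alpha); x, p: (J,); z: (N,);
    choice: (N,) integers in 0..J (0 = outside good); draws: (N, R) standard normals.
    """
    beta0, gamma, sigma, alpha = theta
    beta = beta0 + gamma * z[:, None] + sigma * draws                    # (N, R)
    v = beta[:, :, None] * x[None, None, :] - alpha * p[None, None, :]   # (N, R, J)
    v = np.concatenate([np.zeros(v.shape[:2] + (1,)), v], axis=2)        # outside good first
    v = v - v.max(axis=2, keepdims=True)
    prob = np.exp(v)
    prob /= prob.sum(axis=2, keepdims=True)
    sim_prob = prob.mean(axis=1)                                         # (N, J + 1)
    return np.log(sim_prob[np.arange(len(choice)), choice]).sum()

rng = np.random.default_rng(0)
N, R = 200, 100
x = np.array([1.0, 2.0])
p = np.array([1.0, 1.5])
z = rng.integers(0, 2, N).astype(float)
choice = rng.integers(0, 3, N)           # placeholder choices, not generated by the model
draws = rng.standard_normal((N, R))      # drawn once and reused for every theta
ll = simulated_loglik((0.5, 0.2, 1.0, 1.0), x, p, z, choice, draws)
print(ll)
```

The draws are generated once outside the function and held fixed across parameter values, so that the simulated objective is a smooth function of the parameters when it is passed to an optimizer.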

### 4.4.33 Simulated Non-linear Least Square Estimation of the Mixed Logit Model

- Suppose that we only know the sales or share at the market-segment level.
- That is, we only observe the share of product \(j\) in market \(t\) among consumers of characteristics \(z\) and income \(y\), \(s_{jt}(y, z)\).
- Then we can estimate the parameter by: \[\begin{equation} \min \sum_{t = 1}^T \sum_{j \in \mathcal{J}_t \cup \{0\}} \sum_{(y, z) \in \mathcal{Y} \times \mathcal{Z}} \{s_{jt}(y, z) - \hat{\sigma}_{j}(p_t, x_t, y, z)\}^2. \end{equation}\]

### 4.4.34 Nested Logit Model: A Special Case of Mixed Logit Model

- Let \(w_{j1}, \cdots, w_{jG}\) be the indicators of product category, i.e., \(w_{jg}\) takes value 1 if good \(j\) belongs to category \(g\) and 0 otherwise.
- e.g., car category = {Sports, Luxury, Large, Midsize, Small}.
- We have: \[\begin{equation} v(p, x_j, y_i, z_i) = \beta'x_j - \alpha_i p_j + \sum_{g = 1}^G \zeta_{ig} w_{jg} + \epsilon_{ij}. \end{equation}\]
- If \(\zeta_{ig}\) takes a high value, the consumer attaches a higher value to the category.
- When a product in category \(g\) is not available, consumers with high \(\zeta_{ig}\) will substitute more to the other products in the same category than consumers with low \(\zeta_{ig}\).

### 4.4.35 Nested Logit Model: Distributional Assumption

- Let \[\begin{equation} \varepsilon_{ij} \equiv \sum_{g = 1}^G \zeta_{ig} w_{jg} + \epsilon_{ij}. \end{equation}\]
- Under certain distributional assumptions on \(\zeta_{ig}\) and \(\epsilon_{ij}\), the term \(\varepsilon_{ij}\) has the cumulative distribution (Cardell, 1997): \[\begin{equation} F(\varepsilon_i) = \exp\Bigg\{- \sum_{g = 1}^G \Bigg(\sum_{j \in \text{ category } g} \exp[-\varepsilon_{ij}/\lambda_g] \Bigg)^{\lambda_g} \Bigg\}. \end{equation}\]

### 4.4.36 Nested Logit Model: Choice Probability

- Under this distributional assumption, the choice probability is: \[\begin{equation} \sigma_{j}(p, x, y_i, z_i) = \frac{\exp[v(p, x_j, y_i, z_i)/\lambda_g] \Bigg(\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]\Bigg)^{\lambda_g - 1}}{\sum_{g = 1}^G \Bigg(\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]\Bigg)^{\lambda_g}}, \end{equation}\] if good \(j\) belongs to category \(g\).
- A higher \(\lambda_g \in [0, 1]\) implies a lower correlation within category \(g\).
- \(\lambda_g = 1\) for all \(g\) coincides with the multinomial logit model.

### 4.4.37 Nested Logit Model: Decomposition of the Choice Probability

- The choice probability can be decomposed into two parts: \[\begin{equation} \sigma_{j}(p, x, y_i, z_i) = \frac{\exp[v(p, x_j, y_i, z_i)/\lambda_g]}{\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]} \frac{\Bigg(\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]\Bigg)^{\lambda_g}}{\sum_{g = 1}^G \Bigg(\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]\Bigg)^{\lambda_g}}. \end{equation}\]
- Letting: \[ I_{g}(p, x, y_i, z_i) \equiv \log \sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g], \] we have: \[\begin{equation} \sigma_{j}(p, x, y_i, z_i) = \frac{\exp[v(p, x_j, y_i, z_i)/\lambda_g]}{\sum_{k \in \text{ category } g} \exp[v(p, x_k, y_i, z_i)/\lambda_g]} \frac{\exp[\lambda_g I_{g}(p, x, y_i, z_i)]}{\sum_{g = 1}^G \exp[\lambda_g I_{g}(p, x, y_i, z_i)]}. \end{equation}\]
- The first term can be interpreted as the probability of choosing product \(j\) conditional on choosing category \(g\) and the second term as the probability of choosing category \(g\).
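The decomposition can be sketched in a few lines. The code below takes the mean utilities as given and assumes that every category contains at least one product. The function name, the category coding as integers `0, ..., G - 1`, and the numbers are hypothetical.

```python
import numpy as np

def nested_logit_probs(v, nest, lam):
    """Nested logit choice probabilities via the two-part decomposition.

    v: (J,) mean utilities, nest: (J,) integer category of each good,
    lam: (G,) nesting parameters lambda_g.
    Returns (choice probabilities, within-category probabilities, category probabilities).
    """
    scaled = np.exp(v / lam[nest])
    S = np.bincount(nest, weights=scaled, minlength=len(lam))  # within-category sums
    within = scaled / S[nest]            # P(j | category g)
    incl = np.log(S)                     # inclusive value I_g
    w = np.exp(lam * incl)
    nest_prob = w / w.sum()              # P(category g)
    return within * nest_prob[nest], within, nest_prob

v = np.array([1.0, 0.5, 0.2, -0.3])
nest = np.array([0, 0, 1, 1])
lam = np.array([0.5, 0.8])
probs, within, nest_prob = nested_logit_probs(v, nest, lam)
print(probs, probs.sum())
```

With `lam` set to a vector of ones the function returns the multinomial logit probabilities, in line with the remark that \(\lambda_g = 1\) for all \(g\) coincides with the multinomial logit model.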

### 4.4.38 Discrete Choice Model with Unobserved Fixed Effects

- We have assumed that good \(j\) is characterized by a vector of observed characteristics \(x_j\).
- Can the econometrician observe all the relevant characteristics of the products in the choice set? Maybe not. For example, the econometrician may not observe brand values that are created by advertisement and recognized by consumers.
- Such unobserved product characteristics are likely to be correlated with the price.
- This can cause **endogeneity problems**.
- In the following, we consider the situation where only market-segment level share data is available.
- Because we can construct the market-share level data from individual choice level data, all the arguments should go through with the individual choice level data.

### 4.4.39 Unobserved Fixed Effects in Multinomial Logit Model

- To fix the idea, let’s revisit the multinomial logit model.
- For now, we do not consider either observed or unobserved consumer heterogeneity.
- Including observed heterogeneity is straightforward.
- We discuss how to include unobserved heterogeneity in the subsequent sections.
- Suppose that the indirect utility function of good \(j\) for consumer \(i\) in market \(t\) is: \[\begin{equation} \beta' x_{jt} - \alpha p_{jt} + \xi_{jt} + \epsilon_{ijt}. \end{equation}\]
- \(\epsilon_{ijt}\) is i.i.d. Type-I extreme value.
- \(\xi_{jt}\) is the *unobserved product-market-specific fixed effect* of product \(j\) in market \(t\), which can be correlated with \(p_{jt}\).
- We maintain the assumption that \(x_{jt}\) is uncorrelated with \(\xi_{jt}\).
- The choice probability of good \(j\) for this consumer and hence the choice share in this market is: \[\begin{equation} \sigma_j(p_t, x_t, \xi_t) = \frac{\exp(\beta' x_{jt} - \alpha p_{jt} + \xi_{jt})}{1 + \sum_{k = 1}^J\exp(\beta' x_{kt} - \alpha p_{kt} + \xi_{kt} ) }. \end{equation}\]
- How to deal with the endogeneity between \(p_{jt}\) and \(\xi_{jt}\)?

### 4.4.40 Instrumental Variables and Inversion

- Suppose that we have a vector of instrumental variables \(w_{jt}\) such that: \[\begin{equation} \mathbb{E}\{\xi_{jt}|w_{jt}\} = 0. \end{equation}\]
- In a linear model, we **invert** the model for the unobserved fixed effects: \[\begin{equation} \xi_{jt} = y_{jt} - \beta'x_{jt}. \end{equation}\]
- Notice that the unobserved fixed effect is written as a function of parameters and data.
- Then we exploit the moment condition by: \[\begin{equation} \begin{split} &\mathbb{E}\{\xi_{jt}|w_{jt}\} = 0,\\ &\Rightarrow \mathbb{E}\{ \xi_{jt} w_{jt}\} = 0,\\ &\Leftrightarrow \mathbb{E}\{(y_{jt} - \beta'x_{jt}) w_{jt} \} = 0 \end{split} \end{equation}\]
- We can estimate \(\beta\) by finding the value that makes the sample analogue of the above expectation zero.

### 4.4.41 Inversion in Multinomial Logit Model

- Can we invert the multinomial logit model for \(\xi_{jt}\)?
- We have: \[\begin{equation} \begin{split} &\ln [\sigma_{j}(p_t, x_t, \xi_t) / \sigma_{0}(p_t, x_t, \xi_t)] = \beta' x_{jt} - \alpha p_{jt} + \xi_{jt}\\ &\Leftrightarrow \xi_{jt} = \ln [\sigma_j(p_t, x_t, \xi_t) / \sigma_0(p_t, x_t, \xi_t)] - [\beta' x_{jt} - \alpha p_{jt}]. \end{split} \end{equation}\]
- Therefore, the moment condition can be written as: \[\begin{equation} \begin{split} &\mathbb{E}\{\xi_{jt}|w_{jt}\} = 0,\\ &\Rightarrow \mathbb{E}\{\xi_{jt} w_{jt}\} = 0,\\ &\Leftrightarrow \mathbb{E}\{(\ln [\sigma_{j}(p_t, x_t, \xi_t) / \sigma_{0}(p_t, x_t, \xi_t)] - [\beta' x_{jt} - \alpha p_{jt}]) w_{jt} \} = 0. \end{split} \end{equation}\]
- We can evaluate the sample analogue of the expectation by replacing the theoretical choice probability \(\sigma\) with the observed share \(s\).
- In the end, it is no different from the linear model where the dependent variable is \(\ln (s_{jt}/s_{0t})\).
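To make the inversion concrete, the following sketch simulates market shares from a multinomial logit model with an endogenous price, inverts them to \(\ln(s_{jt}/s_{0t})\), and compares OLS with a just-identified IV estimator. The data-generating process, the single excluded cost shifter `w`, and all parameter values are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, J = 500, 3
x = rng.standard_normal((T, J))
w = rng.standard_normal((T, J))             # excluded instrument (hypothetical cost shifter)
xi = 0.5 * rng.standard_normal((T, J))      # unobserved fixed effect
p = 2.0 + 0.5 * x + w + 0.8 * xi            # price is correlated with xi
beta, alpha = np.array([1.0, 1.0]), 1.0

# Market shares implied by the multinomial logit model.
delta = beta[0] + beta[1] * x - alpha * p + xi
e = np.exp(delta)
s = e / (1.0 + e.sum(axis=1, keepdims=True))
s0 = 1.0 - s.sum(axis=1, keepdims=True)

# Inversion: the dependent variable of the linear model.
y = (np.log(s) - np.log(s0)).ravel()

n = T * J
X = np.column_stack([np.ones(n), x.ravel(), -p.ravel()])   # regressors (1, x, -p)
Z = np.column_stack([np.ones(n), x.ravel(), w.ravel()])    # instruments (1, x, w)

theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
theta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # sample analogue of E[(y - X theta) z] = 0
alpha_ols, alpha_iv = theta_ols[2], theta_iv[2]
print("alpha (OLS):", alpha_ols, "alpha (IV):", alpha_iv)
```

In this design the price is positively correlated with \(\xi_{jt}\), so OLS understates the price coefficient while the IV estimate is close to the value used in the simulation.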

### 4.4.42 Market-invariant Product-specific Fixed Effects

- Furthermore, if you can assume \(\xi_{jt} = \xi_j\), then \[\begin{equation} \ln [\sigma_j(p_t, x_t, \xi_t) / \sigma_0(p_t, x_t, \xi_t)] = \beta' x_{jt} - \alpha p_{jt} + \xi_{j}. \end{equation}\]
- This is nothing but a linear regression on \(x_{jt}\) and \(p_{jt}\) with a product-specific unobserved fixed effect.
- This can be estimated by a within-estimator.
- This specification is a good starting point: it is better to start with the simplest specification and use the estimate as the initial guess for the following specifications.

### 4.4.43 Unobserved Consumer Heterogeneity and Unobserved Fixed Effects in Mixed-logit Model

- So far we abstracted away from the unobserved consumer heterogeneity.
- Next, suppose that the indirect utility function of good \(j\) for consumer \(i\) in market \(t\) is: \[\begin{equation} \beta_{it}' x_{jt} - \alpha_{it} p_{jt} + \xi_{jt} + \epsilon_{ijt}, \end{equation}\] where \(\epsilon_{ijt}\) is i.i.d. Type-I extreme value.
- The coefficients are drawn according to: \[\begin{equation} \begin{split} &\beta_{it} = \beta_0 + \Sigma \nu_{it},\\ &\alpha_{it} = \alpha_0 + \Omega \upsilon_{it}, \end{split} \end{equation}\] where \(\nu_{it}\) and \(\upsilon_{it}\) are i.i.d. standard normal random variables.
- Then the indirect utility of good \(j\) for consumer \(i\) in market \(t\) is written as: \[\begin{equation} \underbrace{\beta_0' x_{jt} - \alpha_0 p_{jt} + \xi_{jt}}_{\text{(conditional) mean}} + \underbrace{\nu_{it}' \Sigma x_{jt} - \upsilon_{it}' \Omega p_{jt}}_{\text{deviation from the mean}} \end{equation}\]
- We refer to \(\beta_0, \alpha_0\) as **linear parameters** and \(\Sigma, \Omega\) as **non-linear parameters**, for a reason explained in a subsequent section.
- Let \(\theta_1\) be the linear parameters and \(\theta_2\) the non-linear parameters and let \(\theta = (\theta_1', \theta_2')'\).

### 4.4.44 Unobserved Fixed Effects in Mixed-logit Model

- The choice share of good \(j\) in market \(t\) is: \[\begin{equation} \begin{split} &\sigma_{j}(p_t, x_t, \xi_t; \theta)\\ &= \int \frac{\exp[\beta_0' x_{jt} - \alpha_0 p_{jt} + \xi_{jt} + \nu' \Sigma x_{jt} - \upsilon' \Omega p_{jt}]}{1 + \sum_{k \in \mathcal{J}_t} \exp[\beta_0' x_{kt} - \alpha_0 p_{kt} + \xi_{kt} + \nu' \Sigma x_{kt} - \upsilon' \Omega p_{kt}]} f(\nu, \upsilon) d \nu d \upsilon. \end{split} \end{equation}\]
- How can we represent \(\xi_{jt}\) as a function of parameters of interest to exploit the moment condition?

### 4.4.45 Representing \(\xi_{jt}\) as a Function of Parameters of Interest

- Let \(s_{jt}\) be the share of product \(j\) in market \(t\).
- The following system of equations implicitly determines \(\xi_{jt}\) as a function of parameters of interest: \[\begin{equation} s_{jt} = \sigma_j(p_t, x_t, \xi_t; \theta). \end{equation}\]
- Let \(\xi_{jt}(\theta)\) be the solution to the system of equations above given parameter \(\theta\).
- If it exists, it is the unobserved fixed effect as a function of parameters and data.
- Does this solution exist?
- Is it unique?
- Is there an efficient method to find the solution?

### 4.4.46 Summarizing the Conditional Mean Term

- Now, let \(\delta_{jt}\) be the conditional mean term in the indirect utility: \[\begin{equation} \delta_{jt} \equiv \beta_0' x_{jt} - \alpha_0 p_{jt} + \xi_{jt}. \end{equation}\]
- I call it the average utility of the product in the market.
- Then, the choice share of product \(j\) in market \(t\) is written as: \[\begin{equation} \begin{split} &\sigma_{jt}(\delta_t, \theta_2) \\ &\equiv \int \frac{\exp\Bigg(\delta_{jt} + \nu' \Sigma x_{jt} - \upsilon' \Omega p_{jt}\Bigg)}{1 + \sum_{k \in \mathcal{J}_t} \exp\Bigg(\delta_{kt} + \nu' \Sigma x_{kt} - \upsilon' \Omega p_{kt}\Bigg)} f(\nu, \upsilon) d\nu d\upsilon, \end{split} \end{equation}\] for \(j = 1, \cdots, J, t = 1, \cdots, T\).

### 4.4.47 Contraction Mapping for \(\delta_t\).

- Now, fix \(\theta_2\) and define an operator \(T_t\) such that: \[\begin{equation} T_t(\delta_t) = \delta_t + \ln \underbrace{s_{t}}_{\text{data}} - \ln \underbrace{\sigma_{t}(\delta_t, \theta_2)}_{\text{model}}, \end{equation}\] where \(\delta_t = (\delta_{1t}, \cdots, \delta_{Jt})'\), \(s_t = (s_{1t}, \cdots, s_{Jt})'\) and \(\sigma_t = (\sigma_{1t}, \cdots, \sigma_{Jt})'\).
- Let \(\delta_t^{(0)} = (\delta_{1t}^{(0)}, \cdots, \delta_{Jt}^{(0)})'\) be an arbitrary starting vector of average utility of products in a market.
- Using the operator above, we update \(\delta_{t}^{(r)}\) by: \[\begin{equation} \delta_{t}^{(r + 1)} = T_t(\delta_{t}^{(r)}) = \delta_t^{(r)} + \ln s_{t} - \ln \sigma_{t}(\delta_t^{(r)}, \theta_2), \end{equation}\] for \(r = 0, 1, \cdots\).
- Berry, Levinsohn, & Pakes (1995) proved that \(T_t\) as specified above is a **contraction mapping with modulus less than one**.
- This means that:
- \(T_t\) has a unique fixed point;
- For an arbitrary starting vector \(\delta_t^{(0)}\), \(\lim_{r \to \infty} T_t^r(\delta_t^{(0)})\) is the unique fixed point.

- The fixed point of \(T_t\) is \(\delta_t^*\) such that \(\delta_t^* = T_t(\delta_t^*)\), i.e., \[\begin{equation} \begin{split} &\delta_t^* = \delta_t^* + \ln s_{t} - \ln \sigma_{t}(\delta_t^*, \theta_2),\\ &\Leftrightarrow s_{t} = \sigma_{t}(\delta_t^*, \theta_2). \end{split} \end{equation}\]
- So, the fixed point \(\delta_t^*\) is the conditional mean indirect utility that solves the equality given non-linear parameter \(\theta_2\).
- Moreover, the solution is unique.
- Moreover, it can be found by iterating the operator.
- Let \(\delta_t(\theta_2)\) be the solution to this equation, i.e., the limit of this operation.
- The above result is useful because it ensures the inversion and provides the algorithm to find the solution.
- The invertibility itself holds under more general settings (Berry, Gandhi, & Haile, 2013).
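The fixed-point iteration can be sketched for a single market as follows. The shares are simulated with a fixed set of draws, the matrix `mu` holds the individual deviations from the mean utility (here \(\sigma \nu_i x_{jt}\) with one random coefficient), and the starting value is the logit inversion \(\ln s_{jt} - \ln s_{0t}\). The function names, the tolerance, and the numbers are assumptions of this sketch.

```python
import numpy as np

def simulated_shares(delta, mu):
    """Model shares sigma_t(delta_t, theta_2) of the inside goods.

    delta: (J,) mean utilities; mu: (R, J) individual deviations from the mean.
    The outside good has utility 0.
    """
    e = np.exp(delta[None, :] + mu)
    return (e / (1.0 + e.sum(axis=1, keepdims=True))).mean(axis=0)

def solve_delta(s_obs, mu, tol=1e-12, max_iter=10000):
    """Iterate delta <- delta + ln(s) - ln(sigma(delta)) until the update is below tol."""
    delta = np.log(s_obs) - np.log(1.0 - s_obs.sum())   # logit starting value
    for _ in range(max_iter):
        new = delta + np.log(s_obs) - np.log(simulated_shares(delta, mu))
        if np.max(np.abs(new - delta)) < tol:
            return new
        delta = new
    raise RuntimeError("contraction mapping did not converge")

rng = np.random.default_rng(0)
R = 500
x_t = np.array([1.0, 0.5, 1.5])              # characteristic with a random coefficient
nu = rng.standard_normal(R)
mu = 1.0 * nu[:, None] * x_t[None, :]        # sigma = 1.0 (assumed)
delta_true = np.array([-1.0, -0.5, 0.2])
s_obs = simulated_shares(delta_true, mu)     # "data" generated from the model
delta_hat = solve_delta(s_obs, mu)
print(delta_hat)
```

Because the observed shares are generated from the same draws, the iteration recovers `delta_true` up to the stopping tolerance, which illustrates both the uniqueness of the fixed point and the algorithm used to find it.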

### 4.4.48 Solving for \(\xi_{jt}(\theta)\)

- We defined the average utility as: \[\begin{equation} \delta_{jt} = \beta_0' x_{jt} - \alpha_0 p_{jt} + \xi_{jt}. \end{equation}\]
- Hence, if we set: \[\begin{equation} \xi_{jt}(\theta) \equiv \delta_{jt}(\theta_2) - \Bigg[\beta_0' x_{jt} - \alpha_0 p_{jt} \Bigg], \end{equation}\] then \(\xi_{jt}(\theta)\) solves the equality: \[\begin{equation} s_{jt} = \sigma_{j}(p_t, x_t, \xi_t; \theta). \end{equation}\]

### 4.4.49 Solving for \(\xi_{jt}(\theta)\): Summary

- In summary, \(\xi_{jt}\) that solves the equality exists and is unique, and can be computed by:
- Fix \(\theta = \{\theta_1, \theta_2\}\).
- Fix an arbitrary starting value \(\delta_t^{(0)}\) for \(t = 1, \cdots, T\).
- Let \(\delta_t(\theta_2)\) be the limit of \(T_t^r(\delta_t^{(0)})\) as \(r \to \infty\) for each \(t = 1, \cdots, T\).
- In practice, stop the iteration when \(|\delta_t^{(r + 1)} - \delta_t^{(r)}|\) is below a threshold.
- Let \(\xi_{jt}(\theta)\) be such that: \[\begin{equation} \xi_{jt}(\theta) = \delta_{jt}(\theta_2) - \Bigg[\beta_0' x_{jt} - \alpha_0 p_{jt}\Bigg]. \end{equation}\]
- Then we can evaluate the moment at \(\theta\) by: \[\begin{equation} \mathbb{E}\{\xi_{jt}(\theta)|w_{jt}\} = 0. \end{equation}\]
- We run this algorithm every time we evaluate the moment condition at a parameter value.

### 4.4.50 GMM Objective Function

- Find \(\theta\) that solves: \[\begin{equation} \min_{\theta} \xi(\theta)' W \Phi^{-1} W' \xi(\theta), \end{equation}\] where \(\Phi\) is a weight matrix, \[\begin{equation} \xi(\theta) = \begin{pmatrix} \xi_{11}(\theta)\\ \vdots\\ \xi_{J_1 1}(\theta)\\ \vdots\\ \xi_{1T}(\theta) \\ \vdots\\ \xi_{J_T T}(\theta) \end{pmatrix}, W = \begin{pmatrix} w_{11}' \\ \vdots \\ w_{J_11}' \\ \vdots \\ w_{1T}' \\ \vdots \\ w_{J_TT}' \\ \end{pmatrix}. \end{equation}\]
- There are \(J \to \infty\) and \(T \to \infty\) asymptotics. Either is fine to consistently estimate the parameters.
- \(w_{jt} = (x_{jt}', w_{jt}^*)'\) where \(w_{jt}^*\) is an excluded instrument that is relevant to \(p_{jt}\).

### 4.4.51 Estimating Linear Parameters

- The first-order condition for \(\theta_1\) is: \[\begin{equation} \theta_1 = (X_1'W \Phi^{-1} W'X_1)^{-1} X_1' W \Phi^{-1} W' \delta(\theta_2), \end{equation}\] where \[\begin{equation} X_1 = \begin{pmatrix} x_{11}' & - p_{11}\\ \vdots & \vdots \\ x_{J_1 1}' & - p_{J_1 1}\\ \vdots & \vdots \\ x_{1T}' & - p_{1T}\\ \vdots & \vdots \\ x_{J_T T}' & - p_{J_T T} \end{pmatrix}, \delta(\theta_2) = \begin{pmatrix} \delta_1(\theta_2)\\ \vdots\\ \delta_T(\theta_2) \end{pmatrix}. \end{equation}\]
- If \(\theta_2\) is given, the optimal \(\theta_1\) is computed by the above formula.
- \(\rightarrow\) We only have to search over \(\theta_2\).
- This is the reason why we called \(\theta_1\) linear parameters and \(\theta_2\) non-linear parameters.

### 4.4.52 BLP Algorithm

- Find \(\theta_2\) that minimizes the GMM objective function.
- To do so:

- Pick \(\theta_2\).
- Compute \(\delta(\theta_2)\) by the fixed-point algorithm.
- Compute the associated \(\theta_1\) by the formula: \[\begin{equation} \theta_1 = (X_1'W \Phi^{-1} W'X_1)^{-1} X_1' W \Phi^{-1} W' \delta(\theta_2). \end{equation}\]
- Compute \(\xi(\theta)\) from the above \(\delta(\theta_2)\) and \(\theta_1\).
- Evaluate the GMM objective function with the \(\xi(\theta)\).
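The steps above can be sketched as a nested fixed-point routine. To keep the example short it assumes a balanced panel with \(J\) products in each of \(T\) markets, a single random coefficient with standard deviation `sigma` on one characteristic `x_rc`, no random coefficient on price, the weight \(\Phi = W'W\), and a grid search for the outer loop in place of a numerical optimizer. All function names are hypothetical.

```python
import numpy as np

def sim_shares(delta, mu):
    """Simulated inside-good shares for one market; outside utility is 0."""
    e = np.exp(delta[None, :] + mu)
    return (e / (1.0 + e.sum(axis=1, keepdims=True))).mean(axis=0)

def solve_delta(s_obs, mu, tol=1e-12, max_iter=10000):
    """Inner loop: fixed-point iteration for the mean utilities of one market."""
    delta = np.log(s_obs) - np.log(1.0 - s_obs.sum())
    for _ in range(max_iter):
        new = delta + np.log(s_obs) - np.log(sim_shares(delta, mu))
        if np.max(np.abs(new - delta)) < tol:
            return new
        delta = new
    raise RuntimeError("contraction mapping did not converge")

def gmm_objective(sigma, s_obs, x_rc, X1, W, Phi_inv, nu):
    """GMM objective at the non-linear parameter sigma, with theta_1 concentrated out.

    s_obs, x_rc: (T, J); X1: (T*J, K1); W: (T*J, L); nu: (R,) fixed draws.
    """
    T, J = s_obs.shape
    # Compute delta(theta_2) market by market.
    delta = np.vstack([
        solve_delta(s_obs[t], sigma * nu[:, None] * x_rc[t][None, :])
        for t in range(T)
    ])
    d = delta.ravel()
    # Linear parameters from the first-order condition.
    A = X1.T @ W @ Phi_inv @ W.T
    theta1 = np.linalg.solve(A @ X1, A @ d)
    # Implied xi(theta) and the value of the objective.
    xi = d - X1 @ theta1
    g = W.T @ xi
    return float(g @ Phi_inv @ g), theta1

def estimate(s_obs, x_rc, X1, W, nu, sigma_grid):
    """Outer loop: search over candidate values of sigma."""
    Phi_inv = np.linalg.inv(W.T @ W)
    values = [gmm_objective(sig, s_obs, x_rc, X1, W, Phi_inv, nu)[0] for sig in sigma_grid]
    best = sigma_grid[int(np.argmin(values))]
    return best, gmm_objective(best, s_obs, x_rc, X1, W, Phi_inv, nu)[1]
```

Here `X1` stacks the rows \((x_{jt}', -p_{jt})\) and `W` stacks the instruments \(w_{jt}'\) in the same product-market order as `s_obs.ravel()`. The simulation draws `nu` are held fixed across evaluations so that the objective is a deterministic function of `sigma`.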

### 4.4.53 Mathematical Program with Equilibrium Constraints (MPEC)

- In the BLP algorithm, for each parameter \(\theta\), find \(\xi(\theta)\) that solves: \[\begin{equation} s = \sigma(p, x, \xi; \theta) \end{equation}\] by the fixed-point algorithm and then evaluate the GMM objective function.
- This inner loop takes time if the stopping criterion is tight.
- If the stopping criterion is loose, the loop may stop earlier but the error may be unacceptably large.
- Dubé, Fox, & Su (2012) suggest minimizing the GMM objective function with the above equation as the constraint, treating \(\xi\) as a choice variable alongside \(\theta\): \[\begin{equation} \min_{\theta, \xi} \xi' W \Phi^{-1} W' \xi \text{ s.t. } s = \sigma(p, x, \xi; \theta). \end{equation}\]
- To enjoy the benefit of this approach, we have to analytically derive the gradient and Hessian of the objective function and the constraints, which are needed anyway if we estimate the standard error with the plug-in method.
- If the problem is of small scale, the BLP algorithm will be fast enough and easier to implement.
- If the problem is of large scale, the MPEC approach may be preferable.

### 4.4.54 Control Function Approach

- The BLP method requires share data. It does not work when each consumer faces a different choice set.
- How to deal with the endogeneity between the price and unobserved fixed effect in such a case?
- The equilibrium price vector in a market is determined by product characteristics, and productivity and demand conditions.
- It will depend on the unobserved product characteristics \(\xi_{jt}\).
- Suppose that the equilibrium price vector in a market is determined by: \[\begin{equation} p_{t} = P(x_t, w_t, \xi_t), \end{equation}\] where \(w_t\) is a vector of variables that affect the price but are excluded from the indirect utility function.
- If we can estimate it and it is invertible in \(\xi_t\), then we can have a proxy for \(\xi_t\) such as: \[\begin{equation} \hat{\xi}_t = \hat{P}^{-1}(p_t, x_t, w_t). \end{equation}\]
- If we insert this into the previous models, \(\xi_{jt}\) is no longer an **unobserved** product characteristic.

### 4.4.55 Control Function Approach

- For example, suppose that \(\xi_{jt}\) is decomposed into anticipated shock \(\nu_{jt}\) and ex-post shock \(\eta_{jt}\) as \[\begin{equation} \xi_{jt} = \rho \nu_{jt} + \eta_{jt}. \end{equation}\]
- Then, the equilibrium price should only depend on the anticipated shock as \[\begin{equation} p_{jt} = \beta x_{jt} + \gamma w_{jt} + \nu_{jt}. \end{equation}\]
- \(p_{jt}\) and \(\xi_{jt}\) are correlated due to ex-ante shock \(\nu_{jt}\).
- In the first stage, regress \(p_{jt}\) on \(x_{jt}\) and \(w_{jt}\) to obtain \(\hat{\beta}\) and \(\hat{\gamma}\).
- Then, construct \(\hat{\nu}_{jt} = p_{jt} - \hat{\beta} x_{jt} - \hat{\gamma} w_{jt}\).
- Then, we have \(\xi_{jt} = \rho \hat{\nu}_{jt} + \eta_{jt}\).
- In the second stage, insert this into the choice probability.
- \(\eta_{jt}\) is an unobserved **random** shock.
- We integrate out \(\eta_{jt}\) to calculate the likelihood.
- The instrument \(w_{jt}\) is necessary, because, otherwise, \(\nu_{jt}\) has no variation conditional on \(p_{jt}\) and \(x_{jt}\) and \(\rho\) cannot be identified.
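The two stages can be illustrated in a deliberately simplified setting. Instead of the choice-probability likelihood with \(\eta_{jt}\) integrated out, the sketch below uses a linear index `y` (think of \(\ln(s_{jt}/s_{0t})\) in the logit case) so that the second stage is an OLS regression that includes the first-stage residual. This substitution, the data-generating process, and the variable names are assumptions of the example.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 2000
x, w, nu = rng.standard_normal((3, n))
eta = 0.3 * rng.standard_normal(n)
p = 1.0 + 0.5 * x + 1.0 * w + nu       # price depends on the anticipated shock nu
xi = 0.6 * nu + eta                    # xi = rho * nu + eta with rho = 0.6
y = 1.0 + 1.0 * x - 1.0 * p + xi       # linear index with alpha = 1

# First stage: regress p on (1, x, w) and keep the residual as nu_hat.
Z = np.column_stack([np.ones(n), x, w])
nu_hat = p - Z @ ols(Z, p)

# Second stage: include nu_hat as a control for the endogenous part of xi.
X_naive = np.column_stack([np.ones(n), x, -p])
X_cf = np.column_stack([X_naive, nu_hat])
alpha_naive = ols(X_naive, y)[2]
b_cf = ols(X_cf, y)
alpha_cf, rho_hat = b_cf[2], b_cf[3]
print("alpha without control:", alpha_naive, "with control:", alpha_cf, "rho:", rho_hat)
```

In this linear special case the control-function estimate of the price coefficient coincides numerically with the 2SLS estimate that uses `w` as the excluded instrument. The approach becomes distinct from IV once the second stage is a non-linear choice model.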

### 4.4.56 Instrumental Variables

- The remaining problem is how to choose the excluded instrumental variable \(w_{jt}^*\) for each product/market.

**Cost shifters**:

- Traditional instruments.

**Hausman-type IV** (Hausman, Leonard, & Zona, 1994):

- Assume that demand shocks are independent across markets, whereas the cost shocks are correlated.
- The latter will be true if the product is produced by the same manufacturer.
- Then, the prices of the same product in the other markets \(p_{j, -t}\) will be valid instruments for the price of the product in a given market, \(p_{jt}\).

**BLP-type IV** (Berry et al., 1995):

- In oligopoly, the price of a good in a market depends on the market structure, i.e., what kind of products are available in the market.
- For example, if there are similar products in the market, the price will tend to be lower.
- Then, the product characteristics of the other products in the market will be valid instruments for the price of a good in a given market, \(p_{jt}\).
- If there are multi-product firms, whether the other good is owned by the same company will also affect the price.
- Specifically, Berry et al. (1995) use:

\[\begin{equation} \sum_{k \neq j \in \mathcal{J}_t \cap \mathcal{F}_{f}} x_{kt}, \end{equation}\]

\[\begin{equation} \sum_{k \neq j \in \mathcal{J}_t \setminus \mathcal{F}_{f}} x_{kt}. \end{equation}\]

- \(f\) is the firm that owns product \(j\) and \(\mathcal{F}_{f}\) is the set of products firm \(f\) owns.

**Differentiation IV** (Gandhi & Houde, 2015):

- Let \(d_{jkt} = d(x_{jt}, x_{kt})\) be some distance between product characteristics.
- They showed that under certain conditions the optimal BLP-type IV is a function of \(d_{-jt} = \{d_{jkt}\}_{k \neq j \in \mathcal{J}_t}\).
- They suggest using the moments of \(d_{-jt}\) as the excluded instrumental variables.

- Weak instruments problem of BLP-type IV:
- Armstrong (2016) argued that estimates based on BLP-type IV may be inconsistent when \(J \to \infty\) asymptotics is considered, because then the market approaches the competitive market and the correlation between the markup and the product characteristics of the rivals disappears.
- Specifically, the estimator is inconsistent if all of the following conditions are met:

- \(J \to \infty\) but \(T\) is fixed;
- The demand/cost functions are such that the correlation between markups and characteristics of other products decreases quickly enough as \(J \to \infty\);
- There are no cost instruments or other sources of identification.
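The two BLP-type sums (characteristics of the other products of the same firm, and characteristics of rival products in the same market) can be constructed as follows. This is a sketch that assumes the data are stored as flat arrays with one row per product-market pair. The function name and the toy data are hypothetical.

```python
import numpy as np

def blp_instruments(x, market, firm):
    """Sums of characteristics over other own-firm products and over rival products.

    x: (n, K) characteristics; market, firm: (n,) identifiers of each row.
    Returns two (n, K) arrays: (own-firm sums excluding product j, rival sums).
    """
    x = np.asarray(x, dtype=float)
    own = np.zeros_like(x)
    rival = np.zeros_like(x)
    for t in np.unique(market):
        in_t = market == t
        total_t = x[in_t].sum(axis=0)             # all products in market t
        for f in np.unique(firm[in_t]):
            rows = in_t & (firm == f)
            firm_total = x[rows].sum(axis=0)      # products of firm f in market t
            own[rows] = firm_total - x[rows]      # exclude the product itself
            rival[rows] = total_t - firm_total    # products of the other firms
    return own, rival

# Toy example: one market, firm 1 owns two products, firm 2 owns one.
x = np.array([[1.0], [2.0], [4.0]])
market = np.array([0, 0, 0])
firm = np.array([1, 1, 2])
own_iv, rival_iv = blp_instruments(x, market, firm)
print(own_iv.ravel(), rival_iv.ravel())
```

In the toy market the own-firm sums are 2, 1, and 0 and the rival sums are 4, 4, and 3, which can be verified by hand from the definitions.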

### 4.4.57 PyBLP

- Conlon & Gortmaker (2020) summarized the best practices for BLP estimation and implemented them in a Python package, `PyBLP`.
- From R, we can use `reticulate` to call Python functions.

### References

*International Economic Review*, *8*(1).

*Economics and Consumer Behavior*. New York: Cambridge University Press.

*Management Science*, *57*(8), 1485–1509.

*Econometrica*, *84*(5), 1961–1980.

*A Unified Framework for Measuring Preferences for Schools and Neighborhoods* (No. 4) (Vol. 115).

*Econometrica*, *81*(5), 2087–2111.

*Econometrica*, *63*(4), 841–890.

*International Economic Review*, *48*(4), 1193–1225.

*Revealed Preference Analysis of Characteristics Models* (Vol. 75, pp. 371–389).

*The Journal of Industrial Economics*, *35*(4), 457–482.

*Econometric Theory*, *13*, 185–213.

*Management Science*, *48*(12).

*The RAND Journal of Economics*, *51*(4), 1108–1161.

*An Almost Ideal Demand System* (No. 3) (Vol. 70, pp. 312–326).

*Econometrica*, *80*(5), 2231–2267.

*Measuring Substitution Patterns in Differentiated Products Industries*.

*Valuing new goods in a model with complementarity: Online newspapers* (No. 3) (Vol. 97, pp. 713–744).

*A Possible Procedure for Analysing Quality Differentials in the Egg Market* (No. 5) (Vol. 47, pp. 843–856).

*The Review of Economic Studies*, *82*(1), 258–296.

*Annales d’Économie et de Statistique*, (34), 159–180.

*Econometrica*, *79*(1), 253–302.

*Economica*, *17*(66), 159–174.

*Numerical methods in economics*. MIT Press.

*Journal of Political Economy*, *74*(2), 132–157.

*Frontiers in Econometrics*. New York: Academic Press.

*Journal of Applied Econometrics*, *15*(5), 447–470.

*The American Economic Review*, *64*(6), 977–994.

*Econometrica*, *44*(5), 979–999.

*Household Production and Consumer Demand Functions* (No. 3) (Vol. 34, pp. 699–708).

*Mergers with differentiated products: The case of the ready-to-eat cereal industry* (No. 3) (Vol. 31).

*The Measurement of Consumers’ Expenditure and Behaviour in the United Kingdom 1920-1938, vol. 1*. Cambridge: Cambridge University Press.

*Econometrica*, *34*(3).

*Journal of Political Economy*, *82*(1), 34–55.

*Competition Between Networks: A Study of the Market for Yellow Pages* (Vol. 71, pp. 483–512).

*Economica*, *5*(17), 61–71.

*Discrete Choice Methods with Simulation*. Cambridge: Cambridge University Press.

*Econometrica*, *50*(4), 945–973.