Chapter 15 Assignment 5: Merger Simulation

The deadline is April 1 1:30pm.

15.1 Simulate data

We simulate data from a discrete choice model that is the same as in Assignment 4, except that the price is derived from the Nash equilibrium. There are \(T\) markets and each market has \(N\) consumers. There are \(J\) products and the indirect utility of consumer \(i\) in market \(t\) for product \(j\) is: \[ u_{ijt} = \beta_{it}' x_j + \alpha_{it} p_{jt} + \xi_{jt} + \epsilon_{ijt}, \] where \(\epsilon_{ijt}\) is an i.i.d. type-I extreme value random variable. \(x_j\) is the \(K\)-dimensional vector of observed characteristics of the product. \(p_{jt}\) is the retail price of the product in the market.

\(\xi_{jt}\) is a product-market-specific fixed effect. \(p_{jt}\) can be correlated with \(\xi_{jt}\), but the \(x_{jt}\)s are independent of \(\xi_{jt}\). \(j = 0\) is an outside option whose indirect utility is: \[ u_{i0t} = \epsilon_{i0t}, \] where \(\epsilon_{i0t}\) is an i.i.d. type-I extreme value random variable.

\(\beta_{it}\) and \(\alpha_{it}\) differ across consumers, and they are distributed as: \[ \beta_{itk} = \beta_{0k} + \sigma_k \nu_{itk}, \] \[ \alpha_{it} = - \exp(\mu + \omega \upsilon_{it}) = - \exp(\mu + \frac{\omega^2}{2}) + [- \exp(\mu + \omega \upsilon_{it}) + \exp(\mu + \frac{\omega^2}{2})] \equiv \alpha_0 + \tilde{\alpha}_{it}, \] where \(\nu_{itk}\) for \(k = 1, \cdots, K\) and \(\upsilon_{it}\) are i.i.d. standard normal random variables. \(\alpha_0\) is the mean of \(\alpha_{it}\) and \(\tilde{\alpha}_{it}\) is the deviation from the mean.
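The decomposition of the price coefficient relies on the mean of a log-normal random variable, \(E[\exp(\mu + \omega \upsilon)] = \exp(\mu + \omega^2/2)\). The following is a minimal simulation check of that identity, not part of the assignment code; the number of draws is an arbitrary choice for illustration.

```r
# Illustrative check: the mean of alpha = -exp(mu + omega * upsilon),
# where upsilon is standard normal
set.seed(1)
mu <- 0.5
omega <- 1
upsilon <- rnorm(1e6)                # arbitrary number of draws
alpha <- -exp(mu + omega * upsilon)  # simulated price coefficients
alpha_0 <- -exp(mu + omega^2 / 2)    # analytical mean
alpha_tilde <- alpha - alpha_0       # deviation from the mean
c(simulated = mean(alpha), analytical = alpha_0)
```

The two numbers should agree up to simulation error, and every draw of alpha is negative, so demand slopes downward for every consumer.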

Given a choice set in the market, \(\mathcal{J}_t \cup \{0\}\), a consumer chooses the alternative that maximizes her utility: \[ q_{ijt} = 1\{u_{ijt} = \max_{k \in \mathcal{J}_t \cup \{0\}} u_{ikt}\}. \] The choice probability of product \(j\) for consumer \(i\) in market \(t\) is: \[ \sigma_{ijt}(p_t, x_t, \xi_t) = \mathbb{P}\{u_{ijt} = \max_{k \in \mathcal{J}_t \cup \{0\}} u_{ikt}\}. \]

Suppose that we only observe the (smooth) share data: \[ s_{jt}(p_t, x_t, \xi_t) = \frac{1}{N} \sum_{i = 1}^N \sigma_{ijt}(p_t, x_t, \xi_t) = \frac{1}{N} \sum_{i = 1}^N \frac{\exp(\bar{u}_{ijt})}{1 + \sum_{k \in \mathcal{J}_t} \exp(\bar{u}_{ikt})}, \] where \(\bar{u}_{ijt} \equiv u_{ijt} - \epsilon_{ijt}\) is the indirect utility net of the idiosyncratic shock and the \(1\) in the denominator is the contribution of the outside option. We observe the shares along with the product-market characteristics \(x_{jt}\) and the retail prices \(p_{jt}\) for \(j \in \mathcal{J}_t \cup \{0\}\) for \(t = 1, \cdots, T\). We observe neither the choice data \(q_{ijt}\) nor the shocks \(\xi_{jt}, \nu_{it}, \upsilon_{it}, \epsilon_{ijt}\).
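To make the smooth share concrete, here is a minimal sketch for a single toy market. The matrix u_bar holds hypothetical values of utility without the \(\epsilon\) shock for 3 consumers and 2 inside products; the names u_bar, sigma_ij, and s_j are made up for this illustration.

```r
# Toy market: 3 simulated consumers (rows), 2 inside products (columns).
# u_bar[i, j] is utility without the epsilon shock (hypothetical values).
u_bar <- matrix(c( 1.0,  0.5,
                   0.2, -0.3,
                  -1.0,  2.0), nrow = 3, byrow = TRUE)
# individual choice probabilities; the outside option contributes exp(0) = 1
sigma_ij <- exp(u_bar) / (1 + rowSums(exp(u_bar)))
sigma_i0 <- 1 - rowSums(sigma_ij)
# smooth market shares: average the probabilities over consumers
s_j <- colMeans(sigma_ij)
s_0 <- mean(sigma_i0)
```

The inside shares and the outside share sum to one by construction.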

We draw \(\xi_{jt}\) from an i.i.d. normal distribution with mean 0 and standard deviation \(\sigma_{\xi}\).

  1. Set the seed, constants, and parameters of interest as follows.
# set the seed
set.seed(1)
# number of products
J <- 10
# dimension of product characteristics including the intercept
K <- 3
# number of markets
T <- 100
# number of consumers per market
N <- 500
# number of Monte Carlo draws
L <- 500
# set parameters of interest
beta <- rnorm(K); 
beta[1] <- 4
beta
## [1]  4.0000000  0.1836433 -0.8356286
sigma <- abs(rnorm(K)); sigma
## [1] 1.5952808 0.3295078 0.8204684
mu <- 0.5
omega <- 1

Generate the covariates as follows.

The product-market characteristics: \[ x_{j1} = 1, x_{jk} \sim N(0, \sigma_x), k = 2, \cdots, K, \] where \(\sigma_x\) is referred to as sd_x in the code.

The product-market-specific unobserved fixed effect: \[ \xi_{jt} \sim N(0, \sigma_\xi), \] where \(\sigma_\xi\) is referred to as sd_xi in the code.

The marginal cost of product \(j\) in market \(t\): \[ c_{jt} \sim \text{logNormal}(0, \sigma_c), \] where \(\sigma_c\) is referred to as sd_c in the code.

The price is determined by a Nash equilibrium. Let \(\Delta_t\) be the \(J_t \times J_t\) ownership matrix in which the \((j, k)\)-th element \(\delta_{tjk}\) is equal to 1 if products \(j\) and \(k\) are owned by the same firm and 0 otherwise. Assume that \(\delta_{tjk} = 1\) if and only if \(j = k\) for all \(t = 1, \cdots, T\), i.e., each firm owns only one product. Next, let \(\Omega_t\) be the \(J_t \times J_t\) matrix whose \((j, k)\)-th element \(\omega_{tjk}(p_t, x_t, \xi_t, \Delta_t)\) is: \[ \omega_{tjk}(p_t, x_t, \xi_t, \Delta_t) = - \frac{\partial s_{jt}(p_t, x_t, \xi_t)}{\partial p_{kt}} \delta_{tjk}. \] Then, the equilibrium price vector \(p_t\) is determined by solving the following equilibrium condition: \[ p_t = c_t + \Omega_t(p_t, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t, x_t, \xi_t). \]
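As a worked special case: under the single-product ownership assumed here, \(\Delta_t\) is the identity matrix, so \(\Omega_t\) is diagonal and the equilibrium condition can be written product by product as \[ p_{jt} = c_{jt} - \frac{s_{jt}(p_t, x_t, \xi_t)}{\partial s_{jt}(p_t, x_t, \xi_t) / \partial p_{jt}}, \quad j \in \mathcal{J}_t. \] Because \(\alpha_{it} < 0\) for every consumer, the own-price derivative is negative and the markup \(p_{jt} - c_{jt}\) is positive.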

The values of the auxiliary parameters are set as follows:

# set auxiliary parameters
price_xi <- 1
sd_x <- 2
sd_xi <- 0.5
sd_c <- 0.05
sd_p <- 0.05
  1. X is the data frame in which each row contains the characteristics vector \(x_{j}\) of a product, and the columns are the product index and the observed product characteristics. The dimension of the characteristics \(K\) is specified above. Add a row for the outside option, whose index is \(0\) and whose characteristics are all zero.
X
## # A tibble: 11 x 4
##        j   x_1     x_2     x_3
##    <dbl> <dbl>   <dbl>   <dbl>
##  1     0     0  0       0     
##  2     1     1  0.975  -0.0324
##  3     2     1  1.48    1.89  
##  4     3     1  1.15    1.64  
##  5     4     1 -0.611   1.19  
##  6     5     1  3.02    1.84  
##  7     6     1  0.780   1.56  
##  8     7     1 -1.24    0.149 
##  9     8     1 -4.43   -3.98  
## 10     9     1  2.25    1.24  
## 11    10     1 -0.0899 -0.112
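One possible way to build such a data frame in base R is sketched below. The name X_sketch is hypothetical, and the order of the random draws is an assumption, so the numbers will not reproduce the output above.

```r
set.seed(1)
J <- 10
K <- 3
sd_x <- 2
# x_1 is the intercept; x_2, ..., x_K are normal draws with sd sd_x
X_inside <- cbind(1, matrix(rnorm(J * (K - 1), sd = sd_x), nrow = J))
colnames(X_inside) <- paste0("x_", 1:K)
X_sketch <- data.frame(j = 1:J, X_inside)
# outside option: index 0 and all characteristics equal to zero
outside <- X_sketch[1, ]
outside[1, ] <- 0
X_sketch <- rbind(outside, X_sketch)
rownames(X_sketch) <- NULL
```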
  1. M is the data frame in which each row contains the product-market-specific fixed effect \(\xi_{jt}\), the marginal cost \(c_{jt}\), and the price \(p_{jt}\). For now, set \(p_{jt} = 0\) and fill in the equilibrium price later. After generating the variables, drop some products in each market. To vary the number of available products across markets, first draw \(J_t\) for each market from a discrete uniform distribution between \(1\) and \(J\). Then, drop products from each market using the dplyr::sample_frac function with the fraction implied by the realized number of available products (for example, \(J_t / J\)). The variation in the available products is important for the identification of the distribution of consumer-level unobserved heterogeneity. Add a row for the outside option to each market, whose index is \(0\) and whose variables all take the value zero.
M
## # A tibble: 689 x 5
##        j     t      xi     c     p
##    <dbl> <int>   <dbl> <dbl> <dbl>
##  1     0     1  0      0         0
##  2     1     1 -0.0779 0.951     0
##  3     2     1 -0.735  1.04      0
##  4     7     1  0.194  0.961     0
##  5    10     1 -0.207  1.02      0
##  6     0     2  0      0         0
##  7     8     2  0.278  0.955     0
##  8     0     3  0      0         0
##  9     3     3 -0.0562 1.02      0
## 10     0     4  0      0         0
## # … with 679 more rows
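The steps for M can be sketched in base R as follows. This is only one possible implementation: it uses sample.int in place of dplyr::sample_frac, the name M_sketch is hypothetical, and the draw order is an assumption, so the numbers will differ from the output above.

```r
set.seed(1)
J <- 10
T <- 100
sd_xi <- 0.5
sd_c <- 0.05
M_list <- lapply(1:T, function(t) {
  # draw xi and c for every product, with the price set to zero for now
  full <- data.frame(j = 1:J, t = t,
                     xi = rnorm(J, sd = sd_xi),
                     c = rlnorm(J, meanlog = 0, sdlog = sd_c),
                     p = 0)
  # number of available products: discrete uniform on 1, ..., J
  J_t <- sample.int(J, size = 1)
  # keep J_t products chosen at random without replacement
  keep <- sort(sample.int(J, size = J_t))
  # prepend the outside option: index 0, all variables zero
  rbind(data.frame(j = 0, t = t, xi = 0, c = 0, p = 0), full[keep, ])
})
M_sketch <- do.call(rbind, M_list)
rownames(M_sketch) <- NULL
```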
  1. Generate the consumer-level heterogeneity. V is the data frame such that a row contains the vector of shocks to consumer-level heterogeneity, \((\nu_{i}', \upsilon_i)\). They are all i.i.d. standard normal random variables.
V
## # A tibble: 50,000 x 6
##        i     t   v_x_1    v_x_2  v_x_3      v_p
##    <int> <int>   <dbl>    <dbl>  <dbl>    <dbl>
##  1     1     1  0.559  -0.362   -0.707  0.594  
##  2     2     1 -1.00   -0.306    0.324 -0.368  
##  3     3     1  0.900   0.464    0.253 -0.994  
##  4     4     1  0.152  -0.640   -0.622 -0.290  
##  5     5     1 -0.301  -2.18     0.151  0.475  
##  6     6     1  0.0512 -1.05     0.430  0.159  
##  7     7     1  0.292   0.00469  1.29   0.761  
##  8     8     1  0.245  -0.330   -0.420 -0.00911
##  9     9     1  0.0827 -0.00644 -1.59  -1.02   
## 10    10     1 -0.0404 -0.0764   0.259 -0.911  
## # … with 49,990 more rows
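A minimal base-R sketch of the consumer-level shocks is given below. The name V_sketch is hypothetical and the draw order is an assumption, so the values will not match the output above.

```r
set.seed(1)
N <- 500
T <- 100
K <- 3
# one row per consumer-market pair, with i varying fastest
V_sketch <- expand.grid(i = 1:N, t = 1:T)
# K shocks for the taste parameters and one shock for the price coefficient
shocks <- matrix(rnorm(N * T * (K + 1)), nrow = N * T)
colnames(shocks) <- c(paste0("v_x_", 1:K), "v_p")
V_sketch <- cbind(V_sketch, shocks)
```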
  1. We use compute_indirect_utility(df, beta, sigma, mu, omega), compute_choice_smooth(X, M, V, beta, sigma, mu, omega), and compute_share_smooth(X, M, V, beta, sigma, mu, omega) to compute \(s_t(p_t, x_t, \xi_t)\). On top of this, we need a function compute_derivative_share_smooth(X, M, V, beta, sigma, mu, omega) that approximates:

\[ \frac{\partial s_{jt}(p_t, x_t, \xi_t)}{\partial p_{kt}} = \begin{cases} \frac{1}{N} \sum_{i = 1}^N \alpha_{it} \sigma_{ijt}(p_t, x_t, \xi_t)[1 - \sigma_{ijt}(p_t, x_t, \xi_t)] &\text{ if } j = k\\ - \frac{1}{N}\sum_{i = 1}^N \alpha_{it} \sigma_{ijt}(p_t, x_t, \xi_t)\sigma_{ikt}(p_t, x_t, \xi_t) &\text{ if } j \neq k. \end{cases} \]

The returned object should be a list across markets, and each element of the list should be the \(J_t \times J_t\) matrix whose \((j, k)\)-th element is \(\partial s_{jt}/\partial p_{kt}\) (do not include the outside option). The computation is looped across markets. I recommend using parallel computing for this loop.

derivative_share_smooth <-
  compute_derivative_share_smooth(X, M, V, beta, sigma, mu, omega)
derivative_share_smooth[[1]]
##             [,1]        [,2]        [,3]        [,4]
## [1,] -0.55323517  0.07416782  0.22222078  0.24032215
## [2,]  0.07416782 -0.17952618  0.05347946  0.04757626
## [3,]  0.22222078  0.05347946 -0.47677137  0.18862102
## [4,]  0.24032215  0.04757626  0.18862102 -0.48803506
derivative_share_smooth[[T]]
##             [,1]        [,2]        [,3]        [,4]        [,5]
## [1,] -0.07358769  0.01724997  0.01539716  0.00728682  0.01037598
## [2,]  0.01724997 -0.18464980  0.03941164  0.02590020  0.05210013
## [3,]  0.01539716  0.03941164 -0.14947032  0.01760727  0.02950338
## [4,]  0.00728682  0.02590020  0.01760727 -0.11933630  0.04443484
## [5,]  0.01037598  0.05210013  0.02950338  0.04443484 -0.18480392
## [6,]  0.02286198  0.04875629  0.04659132  0.02338740  0.04568463
##             [,6]
## [1,]  0.02286198
## [2,]  0.04875629
## [3,]  0.04659132
## [4,]  0.02338740
## [5,]  0.04568463
## [6,] -0.18893897
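For a single market, the derivative matrix can be computed from the matrix of individual choice probabilities with two matrix operations. The sketch below uses hypothetical inputs (u_bar, alpha_i, sigma_i) for 4 consumers and 3 inside products; it illustrates the formula and is not the reference implementation of compute_derivative_share_smooth.

```r
set.seed(1)
N <- 4    # consumers in the toy market
J_t <- 3  # inside products in the toy market
u_bar <- matrix(rnorm(N * J_t), nrow = N)          # utility without epsilon
alpha_i <- -exp(0.5 + rnorm(N))                    # price coefficients, all negative
sigma_i <- exp(u_bar) / (1 + rowSums(exp(u_bar)))  # N x J_t choice probabilities

# (j, k) element: (1 / N) * sum_i alpha_i * sigma_ij * (1{j = k} - sigma_ik)
weighted <- alpha_i * sigma_i                      # row i is scaled by alpha_i
ds_dp <- (diag(colSums(weighted), nrow = J_t) - t(weighted) %*% sigma_i) / N
```

As in the output above, the diagonal is negative, the off-diagonal elements are positive, and the matrix is symmetric.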
  1. Make a list Delta such that each element of the list is the \(J_t \times J_t\) ownership matrix \(\Delta_t\).
Delta[[1]]
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1
Delta[[T]]
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    0    0    0    0    0
## [2,]    0    1    0    0    0    0
## [3,]    0    0    1    0    0    0
## [4,]    0    0    0    1    0    0
## [5,]    0    0    0    0    1    0
## [6,]    0    0    0    0    0    1
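With single-product firms, each \(\Delta_t\) is an identity matrix of size \(J_t\). A minimal sketch, where products_by_market is a hypothetical list of the inside product indices of three markets:

```r
# hypothetical inside-product indices for three markets
products_by_market <- list(c(1, 2, 7, 10), 8, 3)
Delta_sketch <- lapply(products_by_market, function(j_t) {
  # identity matrix: each product is owned by its own firm
  diag(1, nrow = length(j_t))
})
```

Passing nrow explicitly keeps the result a \(1 \times 1\) matrix in markets with a single product.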
  1. Write a function update_price(logp, X, M, V, beta, sigma, mu, omega, Delta) that receives the log of a price vector \(p_t^{(r)}\) and returns \(p_t^{(r + 1)}\) by: \[ p_t^{(r + 1)} = c_t + \Omega_t(p_t^{(r)}, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t^{(r)}, x_t, \xi_t). \] The returned object should be a vector with one element for each inside product in each market. To impose the non-negativity constraint on the price vector, we pass the log price and exponentiate it inside the function. Iterate this until \(\max_{jt}|p_{jt}^{(r + 1)} - p_{jt}^{(r)}| < \lambda\), for example with \(\lambda = 10^{-6}\). This iteration may or may not converge. The convergence depends on the parameters and the realization of the shocks. If the algorithm does not converge, first check the code.
# set the threshold
lambda <- 1e-6
# set the initial price
p <- M[M$j > 0, "p"]
logp <- log(rep(1, dim(p)[1]))
p_new <- update_price(logp, X, M, V, beta, sigma, mu, omega, Delta)
# iterate
distance <- 10000
while (distance > lambda) {
  p_old <- p_new
  p_new <- update_price(log(p_old), X, M, V, beta, sigma, mu, omega, Delta)
  distance <- max(abs(p_new - p_old))
  print(distance)
}
# save
p_actual <- p_new
save(p_actual, file = "data/A5_price_actual.RData")
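Because the iteration is not guaranteed to converge, it can help to cap the number of iterations. The sketch below shows the same fixed-point loop with such a cap on a toy plain-logit market without random coefficients. All numbers and the names share, update_price_toy, and max_iter are hypothetical and only serve the illustration.

```r
delta <- c(1, 0.5, 0)  # mean utilities net of price (hypothetical)
alpha <- -2            # common price coefficient (hypothetical)
cost <- c(1, 1, 1)     # marginal costs (hypothetical)
Delta_t <- diag(3)     # single-product firms

share <- function(p) {
  e <- exp(delta + alpha * p)
  e / (1 + sum(e))
}
update_price_toy <- function(logp) {
  p <- exp(logp)  # exponentiate so that the price stays positive
  s <- share(p)
  # plain-logit share derivatives: alpha * (diag(s) - s s')
  ds_dp <- alpha * (diag(s, nrow = length(s)) - tcrossprod(s))
  Omega_t <- -ds_dp * Delta_t
  cost + as.vector(solve(Omega_t, s))
}

lambda <- 1e-6
max_iter <- 1000  # cap on the number of iterations
p_new <- update_price_toy(log(rep(1, 3)))
distance <- Inf
iter <- 0
while (distance > lambda && iter < max_iter) {
  p_old <- p_new
  p_new <- update_price_toy(log(p_old))
  distance <- max(abs(p_new - p_old))
  iter <- iter + 1
}
converged <- distance <= lambda
```

If converged is FALSE after the loop, the cap was reached and the last iterate should not be treated as an equilibrium price.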

15.2 Estimate the parameters

  1. Write a function estimate_marginal_cost() that estimates \(c_t\) by the equilibrium condition as: \[ c_t = p_t - \Omega_t(p_t, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t, x_t, \xi_t). \]

Of course, in reality, we first draw Monte Carlo shocks to approximate the share, estimate the demand parameters, and use these shocks and estimates to estimate the marginal costs. In this assignment, we check whether the estimated marginal costs coincide with the true marginal costs to confirm that the code is written correctly.

# load
load(file = "data/A5_price_actual.RData")
# take the logarithm
logp <- log(p_actual)
# estimate the marginal cost
marginal_cost_estimate <- estimate_marginal_cost(logp, X, M, V, beta, sigma, mu, omega, Delta)
marginal_cost_actual <- M[M$j > 0, ]$c
# plot the estimate vs actual marginal costs (requires ggplot2)
library(ggplot2)
marginal_cost_df <-
  data.frame(actual = marginal_cost_actual,
             estimate = marginal_cost_estimate)
ggplot(marginal_cost_df, aes(x = estimate, y = actual)) +
  geom_point()
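The inversion itself is a single linear solve per market. A minimal sketch with hypothetical prices, shares, and share derivatives for one market with three single-product firms:

```r
# hypothetical observed prices and shares
p_t <- c(1.5, 1.7, 1.3)
s_t <- c(0.2, 0.3, 0.1)
# hypothetical share derivatives, d s_j / d p_k
ds_dp <- matrix(c(-0.40,  0.10,  0.05,
                   0.10, -0.50,  0.08,
                   0.05,  0.08, -0.25), nrow = 3, byrow = TRUE)
Delta_t <- diag(3)           # single-product firms
Omega_t <- -ds_dp * Delta_t  # elementwise product with the ownership matrix
# marginal cost implied by the equilibrium condition
c_hat <- p_t - as.vector(solve(Omega_t, s_t))
```

Using solve(Omega_t, s_t) avoids forming the inverse of \(\Omega_t\) explicitly.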

  1. (Optional) Translate compute_indirect_utility, compute_choice_smooth, compute_derivative_share_smooth, and update_price into C++ using Rcpp and Eigen. Check that the outputs coincide at the machine precision level. I will give you 2 extra points for this task on top of the usual 10 points for this assignment.

15.3 Conduct counterfactual simulation

  1. Suppose that the firm that owns product 1 purchases the firms that own products 2 and 3. Let Delta_counterfactual be the relevant ownership matrix. Make Delta_counterfactual.
Delta_counterfactual[[1]]
##      [,1] [,2] [,3] [,4]
## [1,]    1    1    0    0
## [2,]    1    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1
Delta_counterfactual[[T]]
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    0    0    0    0    0
## [2,]    0    1    0    0    0    0
## [3,]    0    0    1    0    0    0
## [4,]    0    0    0    1    0    0
## [5,]    0    0    0    0    1    0
## [6,]    0    0    0    0    0    1
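One way to build the post-merger ownership matrices is to map each product to an owner and compare owners pairwise. In the sketch below, products_by_market and owner_of are hypothetical names introduced for illustration.

```r
# hypothetical inside-product indices for three markets
products_by_market <- list(c(1, 2, 7, 10), 8, 3)
# after the merger, products 1, 2, and 3 share owner 1;
# every other product is owned by its own firm
owner_of <- function(j) ifelse(j %in% c(1, 2, 3), 1, j)
Delta_cf_sketch <- lapply(products_by_market, function(j_t) {
  owner <- owner_of(j_t)
  # element (j, k) is 1 if the two products have the same owner
  1 * outer(owner, owner, "==")
})
```

For a market that contains products 1, 2, 7, and 10, this gives a block of ones for products 1 and 2 and an identity block for the rest. A market that contains at most one of the merged products keeps the identity matrix.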
  1. Compute the counterfactual price using the iteration with update_price. You can start the iteration from the equilibrium price. Show the average percentage change in the price for each product. In theory, no price should drop after the merger. However, some prices can drop slightly because of numerical errors.
logp <- log(p_actual)
p_new <- update_price(logp, X, M, V, beta, sigma, mu, omega, Delta_counterfactual)
distance <- 10000
while (distance > lambda) {
  p_old <- p_new
  p_new <- update_price(log(p_old), X, M, V, beta, sigma, mu, omega, Delta_counterfactual)
  distance <- max(abs(p_new - p_old))
  print(distance)
}
p_counterfactual <- p_new
save(p_counterfactual, file = "data/A5_price_counterfactual.RData")

 j    p_change
 1   0.0714188
 2   0.1408300
 3   0.1568128
 4  -0.0016428
 5   0.0053848
 6  -0.0004226
 7   0.0020979
 8  -0.0033997
 9   0.0003623
10   0.0006657
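The per-product average can be computed by taking the relative price change in each product-market row and then averaging within product. A minimal sketch on a hypothetical data frame price_df:

```r
# hypothetical product-market rows with actual and counterfactual prices
price_df <- data.frame(
  j = c(1, 2, 7, 1, 2),
  t = c(1, 1, 1, 2, 2),
  p_actual = c(2.0, 1.5, 1.8, 2.2, 1.6),
  p_counterfactual = c(2.2, 1.8, 1.8, 2.2, 1.76)
)
# relative change in each product-market row
price_df$p_change <-
  (price_df$p_counterfactual - price_df$p_actual) / price_df$p_actual
# average across markets within each product
p_change_by_product <- aggregate(p_change ~ j, data = price_df, FUN = mean)
```

Here the change is expressed as a fraction of the actual price, which matches the scale of the table above.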
  1. Write a function compute_producer_surplus(p, marginal_cost, X, M, V, beta, sigma, mu, omega) that returns the producer surplus for each product in each market. Compute the actual and counterfactual producer surplus under the estimated marginal costs. Show the average percentage change in the producer surplus for each product.
# compute actual producer surplus
producer_surplus_actual <-
  compute_producer_surplus(p_actual, marginal_cost_estimate, X, M, V, beta, sigma, mu, omega)
summary(producer_surplus_actual)
##        s           
##  Min.   :0.008895  
##  1st Qu.:0.038264  
##  Median :0.068219  
##  Mean   :0.163312  
##  3rd Qu.:0.135748  
##  Max.   :1.678255
# compute counterfactual producer surplus
producer_surplus_counterfactual <-
  compute_producer_surplus(p_counterfactual, marginal_cost_estimate, X, M, V, beta, sigma, mu, omega)
summary(producer_surplus_counterfactual)
##        s           
##  Min.   :0.009512  
##  1st Qu.:0.040473  
##  Median :0.071149  
##  Mean   :0.167135  
##  3rd Qu.:0.140125  
##  Max.   :1.678255

 j  producer_surplus_change
 1                0.0347098
 2                0.0118844
 3                0.0058072
 4                0.0557049
 5                0.0686250
 6                0.0756544
 7                0.0418440
 8                0.0073660
 9                0.0530648
10                0.0389568
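As a rough illustration, assume that the producer surplus of a product in a market is its per-capita variable profit, \((p_{jt} - c_{jt}) s_{jt}\). This definition is an assumption made for the sketch, and the numbers are hypothetical.

```r
# hypothetical values for three products in one market
marginal_cost <- c(1.0, 1.1, 0.9)
p_before <- c(1.5, 1.7, 1.3)
s_before <- c(0.20, 0.30, 0.10)
p_after <- c(1.6, 1.8, 1.3)
s_after <- c(0.19, 0.28, 0.11)
# per-capita variable profit before and after the merger
ps_before <- (p_before - marginal_cost) * s_before
ps_after <- (p_after - marginal_cost) * s_after
# relative change for each product
ps_change <- (ps_after - ps_before) / ps_before
```

In this toy example the non-merging third product gains because its share rises at an unchanged price.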
  1. Write a function compute_consumer_surplus(p, X, M, V, beta, sigma, mu, omega) that returns the consumer surplus for each consumer in each market. Compute the actual and counterfactual consumer surplus under the estimated marginal costs. Show the percentage change in the total consumer surplus.
# compute actual consumer surplus
consumer_surplus_actual <- 
  compute_consumer_surplus(p_actual, X, M, V, beta, sigma, mu, omega)
summary(consumer_surplus_actual)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##   0.00000   0.02023   0.95853   4.14497   4.46279 236.27783
# compute counterfactual consumer surplus
consumer_surplus_counterfactual <- 
  compute_consumer_surplus(p_counterfactual, X, M, V, beta, sigma, mu, omega)
summary(consumer_surplus_counterfactual)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##   0.00000   0.01637   0.91300   4.11099   4.40189 236.29217
consumer_surplus_change <- 
  (sum(consumer_surplus_counterfactual) - 
     sum(consumer_surplus_actual)) /
  sum(consumer_surplus_actual)
consumer_surplus_change
## [1] -0.008198756
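A common closed form for the expected consumer surplus in logit models is the log-sum of the exponentiated utilities divided by the absolute value of the price coefficient. The sketch below assumes that this formula is the intended definition; the inputs u_bar and alpha_i are hypothetical.

```r
# hypothetical utilities without the epsilon shock: 2 consumers, 2 inside products
u_bar <- matrix(c(1.0,  0.5,
                  0.2, -0.3), nrow = 2, byrow = TRUE)
# hypothetical price coefficients, one per consumer
alpha_i <- c(-2, -0.5)
# log-sum (the outside option contributes exp(0) = 1), converted to money units
consumer_surplus <- log(1 + rowSums(exp(u_bar))) / abs(alpha_i)
```

Consumers with a price coefficient close to zero get a very large surplus under this formula, which is consistent with the long right tail in the summaries above.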