Chapter 15 Assignment 5: Merger Simulation

15.1 Simulate data

We simulate data from a discrete choice model that is the same with in assignment 4 except for that the price is derived from the Nash equlibrium. There are T markets and each market has N consumers. There are J products and the indirect utility of consumer i in market t for product j is: uitj=βitxj+αitpjt+ξjt+ϵijt, where ϵijt is an i.i.d. type-I extreme random variable. xj is K-dimensional observed characteristics of the product. pjt is the retail price of the product in the market.

ξjt is product-market specific fixed effect. pjt can be correlated with ξjt but xjts are independent of ξjt. j=0 is an outside option whose indirect utility is: uit0=ϵi0t, where ϵi0t is an i.i.d. type-I extreme random variable.

βit and αit are different across consumers, and they are distributed as: βitk=β0k+σkνitk, αit=exp(μ+ωυit)=exp(μ+ω22)+[exp(μ+ωυit)+exp(μ+ω22)]α0+˜αit, where νitk for k=1,,K and υit are i.i.d. standard normal random variables. α0 is the mean of αi and ˜αi is the deviation from the mean.

Given a choice set in the market, Jt{0}, a consumer chooses the alternative that maximizes her utility: qijt=1{uijt=max The choice probability of product j for consumer i in market t is: \sigma_{ijt}(p_t, x_t, \xi_t) = \mathbb{P}\{u_{ijt} = \max_{k \in \mathcal{J}_t \cup \{0\}} u_{ikt}\}.

Suppose that we only observe the (smooth) share data: s_{jt}(p_t, x_t, \xi_t) = \frac{1}{N} \sum_{i = 1}^N \sigma_{ijt}(p_t, x_t, \xi_t) = \frac{1}{N} \sum_{i = 1}^N \frac{\exp(u_{ijt})}{1 + \sum_{k \in \mathcal{J}_t \cup \{0\}} \exp(u_{ikt})}. along with the product-market characteristics x_{jt} and the retail prices p_{jt} for j \in \mathcal{J}_t \cup \{0\} for t = 1, \cdots, T. We do not observe the choice data q_{ijt} nor shocks \xi_{jt}, \nu_{it}, \upsilon_{it}, \epsilon_{ijt}.

We draw \xi_{jt} from i.i.d. normal distribution with mean 0 and standard deviation \sigma_{\xi}.

  1. Set the seed, constants, and parameters of interest as follows.
# set the seed
set.seed(1)
# number of products
J <- 10
# dimension of product characteristics including the intercept
K <- 3
# number of markets
T <- 100
# number of consumers per market
N <- 500
# number of Monte Carlo
L <- 500
# set parameters of interests
beta <- rnorm(K); 
beta[1] <- 4
beta
## [1]  4.0000000  0.1836433 -0.8356286
sigma <- abs(rnorm(K)); sigma
## [1] 1.5952808 0.3295078 0.8204684
mu <- 0.5
omega <- 1

Generate the covariates as follows.

The product-market characteristics: x_{j1} = 1, x_{jk} \sim N(0, \sigma_x), k = 2, \cdots, K, where \sigma_x is referred to as sd_x in the code.

The product-market-specific unobserved fixed effect: \xi_{jt} \sim N(0, \sigma_\xi), where \sigma_xi is referred to as sd_xi in the code.

The marginal cost of product j in market t: c_{jt} \sim \text{logNormal}(0, \sigma_c), where \sigma_c is referred to as sd_c in the code.

The price is determined by a Nash equilibrium. Let \Delta_t be the J_t \times J_t ownership matrix in which the (j, k)-th element \delta_{tjk} is equal to 1 if product j and k are owned by the same firm and 0 otherwise. Assume that \delta_{tjk} = 1 if and only if j = k for all t = 1, \cdots, T, i.e., each firm owns only one product. Next, define \Omega_t be J_t \times J_t matrix such that whose (j, k)-the element \omega_{tjk}(p_t, x_t, \xi_t, \Delta_t) is: \omega_{tjk}(p_t, x_t, \xi_t, \Delta_t) = - \frac{\partial s_{jt}(p_t, x_t, \xi_t)}{\partial p_{kt}} \delta_{tjk}. Then, the equilibrium price vector p_t is determined by solving the following equilibrium condition: p_t = c_t + \Omega_t(p_t, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t, x_t, \xi_t).

The value of the auxiliary parameters are set as follows:

# set auxiliary parameters
price_xi <- 1
sd_x <- 2
sd_xi <- 0.5
sd_c <- 0.05
sd_p <- 0.05
  1. X is the data frame such that a row contains the characteristics vector x_{j} of a product and columns are product index and observed product characteristics. The dimension of the characteristics K is specified above. Add the row of the outside option whose index is 0 and all the characteristics are zero.
# make product characteristics data
X <- 
  matrix(
       sd_x * rnorm(J * (K - 1)), 
       nrow = J
       )
X <- 
  cbind(
       rep(1, J),
       X
       )
colnames(X) <- paste("x", 1:K, sep = "_")
X <- 
  data.frame(j = 1:J, X) %>%
  tibble::as_tibble()
# add outside option
X <- 
  rbind(
  rep(0, dim(X)[2]),
  X
  ) 
X
## # A tibble: 11 × 4
##        j   x_1     x_2     x_3
##    <dbl> <dbl>   <dbl>   <dbl>
##  1     0     0  0       0     
##  2     1     1  0.975  -0.0324
##  3     2     1  1.48    1.89  
##  4     3     1  1.15    1.64  
##  5     4     1 -0.611   1.19  
##  6     5     1  3.02    1.84  
##  7     6     1  0.780   1.56  
##  8     7     1 -1.24    0.149 
##  9     8     1 -4.43   -3.98  
## 10     9     1  2.25    1.24  
## 11    10     1 -0.0899 -0.112
  1. M is the data frame such that a row contains the price \xi_{jt}, marginal cost c_{jt}, and price p_{jt}. For now, set p_{jt} = 0 and fill the equilibrium price later. After generating the variables, drop some products in each market. In order to change the number of available products in each market, for each market, first draw J_t from a discrete uniform distribution between 1 and J. Then, drop products from each market using dplyr::sample_frac function with the realized number of available products. The variation in the available products is important for the identification of the distribution of consumer-level unobserved heterogeneity. Add the row of the outside option to each market whose index is 0 and all the variables take value zero.
# make market-product data
M <- 
  expand.grid(
    j = 1:J, 
    t = 1:T
    ) %>%
  tibble::as_tibble() %>%
  dplyr::mutate(
    xi = sd_xi * rnorm(J*T),
    c = exp(sd_c * rnorm(J*T)),
    p = 0
  ) 
M <- 
  M %>%
  dplyr::group_by(t) %>%
  dplyr::sample_frac(size = purrr::rdunif(1, J)/J) %>%
  dplyr::ungroup()
# add outside option
outside <- 
  data.frame(
    j = 0, 
    t = 1:T, 
    xi = 0, 
    c = 0, 
    p = 0
    )
M <- 
  rbind(
    M,
    outside
    ) %>%
  dplyr::arrange(
    t, 
    j
    )
M
## # A tibble: 696 × 5
##        j     t      xi     c     p
##    <dbl> <int>   <dbl> <dbl> <dbl>
##  1     0     1  0      0         0
##  2     2     1 -0.735  1.04      0
##  3     6     1 -0.0514 0.980     0
##  4     7     1  0.194  0.961     0
##  5     8     1 -0.0269 0.989     0
##  6     0     2  0      0         0
##  7     1     2 -0.197  0.988     0
##  8     2     2 -0.0297 1.04      0
##  9     4     2  0.382  1.00      0
## 10     5     2 -0.0823 1.02      0
## # ℹ 686 more rows
  1. Generate the consumer-level heterogeneity. V is the data frame such that a row contains the vector of shocks to consumer-level heterogeneity, (\nu_{i}', \upsilon_i). They are all i.i.d. standard normal random variables.
# make consumer-market data
V <- 
  matrix(
    rnorm(N * T * (K + 1)), 
    nrow = N * T
    ) 
colnames(V) <- 
  c(
    paste("v_x", 1:K, sep = "_"), 
    "v_p"
    )
V <- 
  data.frame(
  expand.grid(
    i = 1:N, 
    t = 1:T
    ),
  V
  ) %>%
  tibble::as_tibble()
V
## # A tibble: 50,000 × 6
##        i     t   v_x_1   v_x_2   v_x_3     v_p
##    <int> <int>   <dbl>   <dbl>   <dbl>   <dbl>
##  1     1     1  0.448   0.985   0.611   0.408 
##  2     2     1 -0.386  -0.389  -1.11   -0.667 
##  3     3     1  0.0567  0.0510  0.0329 -0.119 
##  4     4     1  0.585   0.303   0.860  -1.20  
##  5     5     1 -0.449  -1.17    0.599   0.212 
##  6     6     1 -0.782   0.0596  1.30    0.485 
##  7     7     1  1.60   -1.62   -1.91    0.669 
##  8     8     1 -1.65    2.10    0.726  -0.108 
##  9     9     1 -0.848  -0.933   2.29   -0.0195
## 10    10     1 -0.130  -0.897  -1.90   -0.185 
## # ℹ 49,990 more rows
  1. We use compute_indirect_utility(df, beta, sigma, mu, omega), compute_choice_smooth(X, M, V, beta, sigma, mu, omega), and compute_share_smooth(X, M, V, beta, sigma, mu, omega) to compute s_t(p_t, x_t, \xi_t). On top of this, we need a function compute_derivative_share_smooth(X, M, V, beta, sigma, mu, omega) that approximate:

\frac{\partial s_{jt}(p_t, x_t, \xi_t)}{\partial p_{kt}} = \begin{cases} \frac{1}{N} \sum_{i = 1}^N \alpha_i \sigma_{ijt}(p_t, x_t, \xi_t)[1 - \sigma_{ijt}(p_t, x_t, \xi_t)] &\text{ if } j = k\\ - \frac{1}{N}\sum_{i = 1}^N \alpha_i \sigma_{ijt}(p_t, x_t, \xi_t)\sigma_{ikt}(p_t, x_t, \xi_t)] &\text{ if } j \neq k. \end{cases}

The returned object should be a list across markets and each element of the list should be J_t \times J_t matrix whose (j, k)-th element is \partial s_{jt}/\partial p_{it} (do not include the outside option). The computation will be looped across markets. I recommend to use a parallel computing for this loop.

derivative_share_smooth <-
  compute_derivative_share_smooth(
    X = X,
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega
    )
derivative_share_smooth[[1]]
##             [,1]        [,2]        [,3]        [,4]
## [1,] -0.15195456  0.07243405  0.04067066  0.03615775
## [2,]  0.07243405 -0.21334454  0.06758143  0.06900886
## [3,]  0.04067066  0.06758143 -0.22465048  0.11247770
## [4,]  0.03615775  0.06900886  0.11247770 -0.22508246
derivative_share_smooth[[T]]
##               [,1]         [,2]         [,3]         [,4]         [,5]
##  [1,] -0.096590746  0.003580709  0.003713348  0.011204782  0.012198589
##  [2,]  0.003580709 -0.054948229  0.003309376  0.009438704  0.009893682
##  [3,]  0.003713348  0.003309376 -0.055735044  0.009505457  0.009233146
##  [4,]  0.011204782  0.009438704  0.009505457 -0.160285167  0.021922796
##  [5,]  0.012198589  0.009893682  0.009233146  0.021922796 -0.136324417
##  [6,]  0.008403911  0.007465135  0.007249620  0.022645270  0.019978806
##  [7,]  0.006573094  0.003856270  0.004084570  0.016649978  0.009219389
##  [8,]  0.033096170  0.006496123  0.007885498  0.038995016  0.016868502
##  [9,]  0.013495765  0.008866219  0.008606532  0.022541027  0.030984223
## [10,]  0.003846416  0.001815922  0.001912074  0.006631000  0.005317866
##               [,6]         [,7]         [,8]         [,9]        [,10]
##  [1,]  0.008403911  0.006573094  0.033096170  0.013495765  0.003846416
##  [2,]  0.007465135  0.003856270  0.006496123  0.008866219  0.001815922
##  [3,]  0.007249620  0.004084570  0.007885498  0.008606532  0.001912074
##  [4,]  0.022645270  0.016649978  0.038995016  0.022541027  0.006631000
##  [5,]  0.019978806  0.009219389  0.016868502  0.030984223  0.005317866
##  [6,] -0.119320239  0.009873958  0.019723897  0.018980139  0.004460069
##  [7,]  0.009873958 -0.101062317  0.035849014  0.010662149  0.003862923
##  [8,]  0.019723897  0.035849014 -0.207076797  0.027381997  0.018831945
##  [9,]  0.018980139  0.010662149  0.027381997 -0.148409509  0.006094688
## [10,]  0.004460069  0.003862923  0.018831945  0.006094688 -0.053007318
  1. Make a list delta such that each element of the list is J_t \times J_t matrix \delta_t.
delta[[1]]
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1
delta[[T]]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    0    0    0    0    0    0    0    0     0
##  [2,]    0    1    0    0    0    0    0    0    0     0
##  [3,]    0    0    1    0    0    0    0    0    0     0
##  [4,]    0    0    0    1    0    0    0    0    0     0
##  [5,]    0    0    0    0    1    0    0    0    0     0
##  [6,]    0    0    0    0    0    1    0    0    0     0
##  [7,]    0    0    0    0    0    0    1    0    0     0
##  [8,]    0    0    0    0    0    0    0    1    0     0
##  [9,]    0    0    0    0    0    0    0    0    1     0
## [10,]    0    0    0    0    0    0    0    0    0     1
  1. Write a function update_price(logp, X, M, V, beta, sigma, mu, omega, delta) that receives a price vector p_t^{(r)} and returns p_t^{(r + 1)} by: p_t^{(r + 1)} = c_t + \Omega_t(p_t^{(r)}, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t^{(r)}, x_t, \xi_t). The returned object should be a vector whose row represents the condition for an inside product of each market. To impose non-negativity constraint on the price vector, we pass log price and exponentiate inside the function. Iterate this until \max_{jt}|p_{jt}^{(r + 1)} - p_{jt}^{(r)}| < \lambda, for example with \lambda = 10^{-6}. This iteration may or may not converge. The convergence depends on the parameters and the realization of the shocks. If the algorithm does not converge, first check the code.
# set the threshold
lambda <- 1e-6
# set the initial price
p <- M[M$j > 0, "p"]
logp <- log(rep(1, dim(p)[1]))
p_new <- 
  update_price(
    logp = logp, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega, 
    delta = delta
    )
# iterate
distance <- 10000
while (distance > lambda) {
  p_old <- p_new
  p_new <- 
    update_price(
      log(p_old), 
      X, 
      M, 
      V, 
      beta, 
      sigma, 
      mu, 
      omega, 
      delta
      )
  distance <- max(abs(p_new - p_old))
  print(distance)
}
# save
p_actual <- p_new
saveRDS(
  p_actual, 
  file = "lecture/data/a5/price_actual.rds" %>% here::here()
)
# load
p_actual <- 
  readRDS(
    file = "lecture/data/a5/price_actual.rds" %>% here::here()
  )
p_actual %>% head()
##          [,1]
## [1,] 2.009611
## [2,] 2.024696
## [3,] 2.099574
## [4,] 6.374277
## [5,] 1.664490
## [6,] 1.808518

15.2 Estimate the parameters

  1. Write a function estimate_marginal_cost() that estimate c_t by the equilibrium condition as: c_t = p_t - \Omega_t(p_t, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t, x_t, \xi_t)

Of course, in reality, we first draw Monte Carlo shocks to approximate the share, estimate the demand parameters, and use these shocks and estimates to estimate the marginal costs. In this assignment, we check the if the estimated marginal costs coincide with the true marginal costs to confirm that the codes are correctly written.

# take the logarithm
logp <- log(p_actual)
# estimate the marginal cost
marginal_cost_estimate <- 
  estimate_marginal_cost(
    logp = logp, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega, 
    delta = delta)
marginal_cost_actual <- M[M$j > 0, ]$c
# plot the estimate vs actual marginal costs
marginal_cost_df <-
  data.frame(
    actual = marginal_cost_actual,
    estimate = marginal_cost_estimate
    )
ggplot(
  marginal_cost_df, 
  aes(
    x = estimate, 
    y = actual
    )
  ) +
  geom_point() + 
  theme_classic()

  1. (Optional) Translate compute_indirect_utility, compute_choice_smooth, compute_derivative_share_smooth, update_price into C++ using Rcpp and Eigen. Check that the outputs coincide at the machine precision level. I give you extra 2 points for this task on top of the usual 10 points for this assignment.

15.3 Conduct counterfactual simulation

  1. Suppose that the firm of product 1 owner purchase the firms that own product 2 and 3. Let delta_counterfactual be the relevant ownership matrix. Make delta_counterfactual.
delta_counterfactual[[1]]
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1
delta_counterfactual[[T]]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    1    1    0    0    0    0    0    0     0
##  [2,]    1    1    1    0    0    0    0    0    0     0
##  [3,]    1    1    1    0    0    0    0    0    0     0
##  [4,]    0    0    0    1    0    0    0    0    0     0
##  [5,]    0    0    0    0    1    0    0    0    0     0
##  [6,]    0    0    0    0    0    1    0    0    0     0
##  [7,]    0    0    0    0    0    0    1    0    0     0
##  [8,]    0    0    0    0    0    0    0    1    0     0
##  [9,]    0    0    0    0    0    0    0    0    1     0
## [10,]    0    0    0    0    0    0    0    0    0     1
  1. Compute the counterfactual price using the iteration with update_price. You can start the iteration from the equilibrium price. Show the average percentage change in the price for each product. In theory, the price of any product should not drop. But some prices can slightly drop because of the numerical errors.
logp <- log(p_actual)
p_new <- 
  update_price(
    logp = logp, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega, 
    delta = delta_counterfactual
    )
distance <- 10000
while (distance > lambda) {
  p_old <- p_new
  p_new <- 
    update_price(
      log(p_old), 
      X, 
      M, 
      V, 
      beta, 
      sigma, 
      mu, 
      omega, 
      delta_counterfactual
      )
  distance <- max(abs(p_new - p_old))
  print(distance)
}
p_counterfactual <- p_new
saveRDS(
  p_counterfactual, 
  file = "lecture/data/a5/price_counterfactual.rds" %>% here::here()
)
j p_change
1 0.0501744
2 0.1179955
3 0.1479700
4 0.0013059
5 0.0004309
6 -0.0002448
7 0.0008458
8 -0.0033448
9 0.0004042
10 0.0015040
  1. Write a function compute_producer_surplus(p, marginal_cost, X, M, V, beta, sigma, mu, omega) that returns the producer surplus for each product in each market. Compute the actual and counterfactual producer surplus under the estimated marginal costs. Show the average percentage change in the producer surplus for each product.
# compute actual producer surplus
producer_surplus_actual <-
  compute_producer_surplus(
    p = p_actual, 
    marginal_cost = marginal_cost_estimate, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega
    )
summary(producer_surplus_actual)
##        s           
##  Min.   :0.008247  
##  1st Qu.:0.041636  
##  Median :0.071153  
##  Mean   :0.168761  
##  3rd Qu.:0.150643  
##  Max.   :1.607658
# compute counterfactual producer surplus
producer_surplus_counterfactual <-
  compute_producer_surplus(
    p = p_counterfactual, 
    marginal_cost = marginal_cost_estimate, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega
    )
summary(producer_surplus_counterfactual)
##        s           
##  Min.   :0.008212  
##  1st Qu.:0.042687  
##  Median :0.073644  
##  Mean   :0.171809  
##  3rd Qu.:0.153746  
##  Max.   :1.607658
j producer_surplus_change
1 0.0231358
2 0.0131566
3 0.0073846
4 0.0534174
5 0.0449804
6 0.0563809
7 0.0372859
8 0.0072057
9 0.0421801
10 0.0385013
  1. Write a function compute_consumer_surplus(p, X, M, V, beta, sigma, mu, omega) that returns the consumer surplus for each consumer in each market. Compute the actual and counterfactual consumer surplus under the estimated marginal costs. Show the percentage change in the total consumer surplus.
# compute actual consumer surplus
consumer_surplus_actual <- 
  compute_consumer_surplus(
    p = p_actual, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega
    )
summary(consumer_surplus_actual)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##   0.00000   0.03447   1.10805   4.33269   4.65720 230.98190
# compute counterfactual consumer surplus
consumer_surplus_counterfactual <- 
  compute_consumer_surplus(
    p = p_counterfactual, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega
    )
summary(consumer_surplus_counterfactual)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##   0.00000   0.03054   1.07265   4.31048   4.61989 231.02355
consumer_surplus_change <- 
  (sum(consumer_surplus_counterfactual) - 
     sum(consumer_surplus_actual)) /
  sum(consumer_surplus_actual)
consumer_surplus_change
## [1] -0.005127025