Chapter 15 Assignment 5: Merger Simulation

15.1 Simulate data

We simulate data from a discrete choice model that is the same with in assignment 4 except for that the price is derived from the Nash equlibrium. There are \(T\) markets and each market has \(N\) consumers. There are \(J\) products and the indirect utility of consumer \(i\) in market \(t\) for product \(j\) is: \[ u_{itj} = \beta_{it}' x_j + \alpha_{it} p_{jt} + \xi_{jt} + \epsilon_{ijt}, \] where \(\epsilon_{ijt}\) is an i.i.d. type-I extreme random variable. \(x_j\) is \(K\)-dimensional observed characteristics of the product. \(p_{jt}\) is the retail price of the product in the market.

\(\xi_{jt}\) is product-market specific fixed effect. \(p_{jt}\) can be correlated with \(\xi_{jt}\) but \(x_{jt}\)s are independent of \(\xi_{jt}\). \(j = 0\) is an outside option whose indirect utility is: \[ u_{it0} = \epsilon_{i0t}, \] where \(\epsilon_{i0t}\) is an i.i.d. type-I extreme random variable.

\(\beta_{it}\) and \(\alpha_{it}\) are different across consumers, and they are distributed as: \[ \beta_{itk} = \beta_{0k} + \sigma_k \nu_{itk}, \] \[ \alpha_{it} = - \exp(\mu + \omega \upsilon_{it}) = - \exp(\mu + \frac{\omega^2}{2}) + [- \exp(\mu + \omega \upsilon_{it}) + \exp(\mu + \frac{\omega^2}{2})] \equiv \alpha_0 + \tilde{\alpha}_{it}, \] where \(\nu_{itk}\) for \(k = 1, \cdots, K\) and \(\upsilon_{it}\) are i.i.d. standard normal random variables. \(\alpha_0\) is the mean of \(\alpha_i\) and \(\tilde{\alpha}_i\) is the deviation from the mean.

Given a choice set in the market, \(\mathcal{J}_t \cup \{0\}\), a consumer chooses the alternative that maximizes her utility: \[ q_{ijt} = 1\{u_{ijt} = \max_{k \in \mathcal{J}_t \cup \{0\}} u_{ikt}\}. \] The choice probability of product \(j\) for consumer \(i\) in market \(t\) is: \[ \sigma_{ijt}(p_t, x_t, \xi_t) = \mathbb{P}\{u_{ijt} = \max_{k \in \mathcal{J}_t \cup \{0\}} u_{ikt}\}. \]

Suppose that we only observe the (smooth) share data: \[ s_{jt}(p_t, x_t, \xi_t) = \frac{1}{N} \sum_{i = 1}^N \sigma_{ijt}(p_t, x_t, \xi_t) = \frac{1}{N} \sum_{i = 1}^N \frac{\exp(u_{ijt})}{1 + \sum_{k \in \mathcal{J}_t \cup \{0\}} \exp(u_{ikt})}. \] along with the product-market characteristics \(x_{jt}\) and the retail prices \(p_{jt}\) for \(j \in \mathcal{J}_t \cup \{0\}\) for \(t = 1, \cdots, T\). We do not observe the choice data \(q_{ijt}\) nor shocks \(\xi_{jt}, \nu_{it}, \upsilon_{it}, \epsilon_{ijt}\).

We draw \(\xi_{jt}\) from i.i.d. normal distribution with mean 0 and standard deviation \(\sigma_{\xi}\).

Set the seed, constants, and parameters of interest as follows.

# set the seed
set.seed(1)
# number of products
J <- 10
# dimension of product characteristics including the intercept
K <- 3
# number of markets
T <- 100
# number of consumers per market
N <- 500
# number of Monte Carlo
L <- 500

# set parameters of interests
beta <- rnorm(K); 
beta[1] <- 4
beta

## [1]  4.0000000  0.1836433 -0.8356286

sigma <- abs(rnorm(K)); sigma

## [1] 1.5952808 0.3295078 0.8204684

mu <- 0.5
omega <- 1

Generate the covariates as follows.

The product-market characteristics: \[ x_{j1} = 1, x_{jk} \sim N(0, \sigma_x), k = 2, \cdots, K, \] where \(\sigma_x\) is referred to as sd_x in the code.

The product-market-specific unobserved fixed effect: \[ \xi_{jt} \sim N(0, \sigma_\xi), \] where \(\sigma_xi\) is referred to as sd_xi in the code.

The marginal cost of product \(j\) in market \(t\): \[ c_{jt} \sim \text{logNormal}(0, \sigma_c), \] where \(\sigma_c\) is referred to as sd_c in the code.

The price is determined by a Nash equilibrium. Let \(\Delta_t\) be the \(J_t \times J_t\) ownership matrix in which the \((j, k)\)-th element \(\delta_{tjk}\) is equal to 1 if product \(j\) and \(k\) are owned by the same firm and 0 otherwise. Assume that \(\delta_{tjk} = 1\) if and only if \(j = k\) for all \(t = 1, \cdots, T\), i.e., each firm owns only one product. Next, define \(\Omega_t\) be \(J_t \times J_t\) matrix such that whose \((j, k)\)-the element \(\omega_{tjk}(p_t, x_t, \xi_t, \Delta_t)\) is: \[ \omega_{tjk}(p_t, x_t, \xi_t, \Delta_t) = - \frac{\partial s_{jt}(p_t, x_t, \xi_t)}{\partial p_{kt}} \delta_{tjk}. \] Then, the equilibrium price vector \(p_t\) is determined by solving the following equilibrium condition: \[ p_t = c_t + \Omega_t(p_t, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t, x_t, \xi_t). \]

The value of the auxiliary parameters are set as follows:

# set auxiliary parameters
price_xi <- 1
sd_x <- 2
sd_xi <- 0.5
sd_c <- 0.05
sd_p <- 0.05

X is the data frame such that a row contains the characteristics vector \(x_{j}\) of a product and columns are product index and observed product characteristics. The dimension of the characteristics \(K\) is specified above. Add the row of the outside option whose index is \(0\) and all the characteristics are zero.

# make product characteristics data
X <- 
  matrix(
       sd_x * rnorm(J * (K - 1)), 
       nrow = J
       )
X <- 
  cbind(
       rep(1, J),
       X
       )
colnames(X) <- paste("x", 1:K, sep = "_")
X <- 
  data.frame(j = 1:J, X) %>%
  tibble::as_tibble()
# add outside option
X <- 
  rbind(
  rep(0, dim(X)[2]),
  X
  )

## # A tibble: 11 × 4
##        j   x_1     x_2     x_3
##    <dbl> <dbl>   <dbl>   <dbl>
##  1     0     0  0       0     
##  2     1     1  0.975  -0.0324
##  3     2     1  1.48    1.89  
##  4     3     1  1.15    1.64  
##  5     4     1 -0.611   1.19  
##  6     5     1  3.02    1.84  
##  7     6     1  0.780   1.56  
##  8     7     1 -1.24    0.149 
##  9     8     1 -4.43   -3.98  
## 10     9     1  2.25    1.24  
## 11    10     1 -0.0899 -0.112

M is the data frame such that a row contains the price \(\xi_{jt}\), marginal cost \(c_{jt}\), and price \(p_{jt}\). For now, set \(p_{jt} = 0\) and fill the equilibrium price later. After generating the variables, drop some products in each market. In order to change the number of available products in each market, for each market, first draw \(J_t\) from a discrete uniform distribution between \(1\) and \(J\). Then, drop products from each market using dplyr::sample_frac function with the realized number of available products. The variation in the available products is important for the identification of the distribution of consumer-level unobserved heterogeneity. Add the row of the outside option to each market whose index is \(0\) and all the variables take value zero.

# make market-product data
M <- 
  expand.grid(
    j = 1:J, 
    t = 1:T
    ) %>%
  tibble::as_tibble() %>%
  dplyr::mutate(
    xi = sd_xi * rnorm(J*T),
    c = exp(sd_c * rnorm(J*T)),
    p = 0
  ) 
M <- 
  M %>%
  dplyr::group_by(t) %>%
  dplyr::sample_frac(size = purrr::rdunif(1, J)/J) %>%
  dplyr::ungroup()
# add outside option
outside <- 
  data.frame(
    j = 0, 
    t = 1:T, 
    xi = 0, 
    c = 0, 
    p = 0
    )
M <- 
  rbind(
    M,
    outside
    ) %>%
  dplyr::arrange(
    t, 
    j
    )

## # A tibble: 696 × 5
##        j     t      xi     c     p
##    <dbl> <int>   <dbl> <dbl> <dbl>
##  1     0     1  0      0         0
##  2     2     1 -0.735  1.04      0
##  3     6     1 -0.0514 0.980     0
##  4     7     1  0.194  0.961     0
##  5     8     1 -0.0269 0.989     0
##  6     0     2  0      0         0
##  7     1     2 -0.197  0.988     0
##  8     2     2 -0.0297 1.04      0
##  9     4     2  0.382  1.00      0
## 10     5     2 -0.0823 1.02      0
## # ℹ 686 more rows

Generate the consumer-level heterogeneity. V is the data frame such that a row contains the vector of shocks to consumer-level heterogeneity, \((\nu_{i}', \upsilon_i)\). They are all i.i.d. standard normal random variables.

# make consumer-market data
V <- 
  matrix(
    rnorm(N * T * (K + 1)), 
    nrow = N * T
    ) 
colnames(V) <- 
  c(
    paste("v_x", 1:K, sep = "_"), 
    "v_p"
    )
V <- 
  data.frame(
  expand.grid(
    i = 1:N, 
    t = 1:T
    ),
  V
  ) %>%
  tibble::as_tibble()

## # A tibble: 50,000 × 6
##        i     t   v_x_1   v_x_2   v_x_3     v_p
##    <int> <int>   <dbl>   <dbl>   <dbl>   <dbl>
##  1     1     1  0.448   0.985   0.611   0.408 
##  2     2     1 -0.386  -0.389  -1.11   -0.667 
##  3     3     1  0.0567  0.0510  0.0329 -0.119 
##  4     4     1  0.585   0.303   0.860  -1.20  
##  5     5     1 -0.449  -1.17    0.599   0.212 
##  6     6     1 -0.782   0.0596  1.30    0.485 
##  7     7     1  1.60   -1.62   -1.91    0.669 
##  8     8     1 -1.65    2.10    0.726  -0.108 
##  9     9     1 -0.848  -0.933   2.29   -0.0195
## 10    10     1 -0.130  -0.897  -1.90   -0.185 
## # ℹ 49,990 more rows

We use compute_indirect_utility(df, beta, sigma, mu, omega), compute_choice_smooth(X, M, V, beta, sigma, mu, omega), and compute_share_smooth(X, M, V, beta, sigma, mu, omega) to compute \(s_t(p_t, x_t, \xi_t)\). On top of this, we need a function compute_derivative_share_smooth(X, M, V, beta, sigma, mu, omega) that approximate:

\[ \frac{\partial s_{jt}(p_t, x_t, \xi_t)}{\partial p_{kt}} = \begin{cases} \frac{1}{N} \sum_{i = 1}^N \alpha_i \sigma_{ijt}(p_t, x_t, \xi_t)[1 - \sigma_{ijt}(p_t, x_t, \xi_t)] &\text{ if } j = k\\ - \frac{1}{N}\sum_{i = 1}^N \alpha_i \sigma_{ijt}(p_t, x_t, \xi_t)\sigma_{ikt}(p_t, x_t, \xi_t)] &\text{ if } j \neq k. \end{cases} \]

The returned object should be a list across markets and each element of the list should be \(J_t \times J_t\) matrix whose \((j, k)\)-th element is \(\partial s_{jt}/\partial p_{it}\) (do not include the outside option). The computation will be looped across markets. I recommend to use a parallel computing for this loop.

derivative_share_smooth <-
  compute_derivative_share_smooth(
    X = X,
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega
    )

derivative_share_smooth[[1]]

##             [,1]        [,2]        [,3]        [,4]
## [1,] -0.15195456  0.07243405  0.04067066  0.03615775
## [2,]  0.07243405 -0.21334454  0.06758143  0.06900886
## [3,]  0.04067066  0.06758143 -0.22465048  0.11247770
## [4,]  0.03615775  0.06900886  0.11247770 -0.22508246

derivative_share_smooth[[T]]

##               [,1]         [,2]         [,3]         [,4]         [,5]
##  [1,] -0.096590746  0.003580709  0.003713348  0.011204782  0.012198589
##  [2,]  0.003580709 -0.054948229  0.003309376  0.009438704  0.009893682
##  [3,]  0.003713348  0.003309376 -0.055735044  0.009505457  0.009233146
##  [4,]  0.011204782  0.009438704  0.009505457 -0.160285167  0.021922796
##  [5,]  0.012198589  0.009893682  0.009233146  0.021922796 -0.136324417
##  [6,]  0.008403911  0.007465135  0.007249620  0.022645270  0.019978806
##  [7,]  0.006573094  0.003856270  0.004084570  0.016649978  0.009219389
##  [8,]  0.033096170  0.006496123  0.007885498  0.038995016  0.016868502
##  [9,]  0.013495765  0.008866219  0.008606532  0.022541027  0.030984223
## [10,]  0.003846416  0.001815922  0.001912074  0.006631000  0.005317866
##               [,6]         [,7]         [,8]         [,9]        [,10]
##  [1,]  0.008403911  0.006573094  0.033096170  0.013495765  0.003846416
##  [2,]  0.007465135  0.003856270  0.006496123  0.008866219  0.001815922
##  [3,]  0.007249620  0.004084570  0.007885498  0.008606532  0.001912074
##  [4,]  0.022645270  0.016649978  0.038995016  0.022541027  0.006631000
##  [5,]  0.019978806  0.009219389  0.016868502  0.030984223  0.005317866
##  [6,] -0.119320239  0.009873958  0.019723897  0.018980139  0.004460069
##  [7,]  0.009873958 -0.101062317  0.035849014  0.010662149  0.003862923
##  [8,]  0.019723897  0.035849014 -0.207076797  0.027381997  0.018831945
##  [9,]  0.018980139  0.010662149  0.027381997 -0.148409509  0.006094688
## [10,]  0.004460069  0.003862923  0.018831945  0.006094688 -0.053007318

Make a list delta such that each element of the list is \(J_t \times J_t\) matrix \(\delta_t\).

delta[[1]]

##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1

delta[[T]]

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    0    0    0    0    0    0    0    0     0
##  [2,]    0    1    0    0    0    0    0    0    0     0
##  [3,]    0    0    1    0    0    0    0    0    0     0
##  [4,]    0    0    0    1    0    0    0    0    0     0
##  [5,]    0    0    0    0    1    0    0    0    0     0
##  [6,]    0    0    0    0    0    1    0    0    0     0
##  [7,]    0    0    0    0    0    0    1    0    0     0
##  [8,]    0    0    0    0    0    0    0    1    0     0
##  [9,]    0    0    0    0    0    0    0    0    1     0
## [10,]    0    0    0    0    0    0    0    0    0     1

Write a function update_price(logp, X, M, V, beta, sigma, mu, omega, delta) that receives a price vector \(p_t^{(r)}\) and returns \(p_t^{(r + 1)}\) by: \[ p_t^{(r + 1)} = c_t + \Omega_t(p_t^{(r)}, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t^{(r)}, x_t, \xi_t). \] The returned object should be a vector whose row represents the condition for an inside product of each market. To impose non-negativity constraint on the price vector, we pass log price and exponentiate inside the function. Iterate this until \(\max_{jt}|p_{jt}^{(r + 1)} - p_{jt}^{(r)}| < \lambda\), for example with \(\lambda = 10^{-6}\). This iteration may or may not converge. The convergence depends on the parameters and the realization of the shocks. If the algorithm does not converge, first check the code.

# set the threshold
lambda <- 1e-6
# set the initial price
p <- M[M$j > 0, "p"]
logp <- log(rep(1, dim(p)[1]))
p_new <- 
  update_price(
    logp = logp, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega, 
    delta = delta
    )
# iterate
distance <- 10000
while (distance > lambda) {
  p_old <- p_new
  p_new <- 
    update_price(
      log(p_old), 
      X, 
      M, 
      V, 
      beta, 
      sigma, 
      mu, 
      omega, 
      delta
      )
  distance <- max(abs(p_new - p_old))
  print(distance)
}
# save
p_actual <- p_new
saveRDS(
  p_actual, 
  file = "lecture/data/a5/price_actual.rds" %>% here::here()
)

# load
p_actual <- 
  readRDS(
    file = "lecture/data/a5/price_actual.rds" %>% here::here()
  )
p_actual %>% head()

##          [,1]
## [1,] 2.009611
## [2,] 2.024696
## [3,] 2.099574
## [4,] 6.374277
## [5,] 1.664490
## [6,] 1.808518

15.2 Estimate the parameters

Write a function estimate_marginal_cost() that estimate \(c_t\) by the equilibrium condition as: \[ c_t = p_t - \Omega_t(p_t, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t, x_t, \xi_t) \]

Of course, in reality, we first draw Monte Carlo shocks to approximate the share, estimate the demand parameters, and use these shocks and estimates to estimate the marginal costs. In this assignment, we check the if the estimated marginal costs coincide with the true marginal costs to confirm that the codes are correctly written.

# take the logarithm
logp <- log(p_actual)
# estimate the marginal cost
marginal_cost_estimate <- 
  estimate_marginal_cost(
    logp = logp, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega, 
    delta = delta)
marginal_cost_actual <- M[M$j > 0, ]$c
# plot the estimate vs actual marginal costs
marginal_cost_df <-
  data.frame(
    actual = marginal_cost_actual,
    estimate = marginal_cost_estimate
    )
ggplot(
  marginal_cost_df, 
  aes(
    x = estimate, 
    y = actual
    )
  ) +
  geom_point() + 
  theme_classic()

(Optional) Translate compute_indirect_utility, compute_choice_smooth, compute_derivative_share_smooth, update_price into C++ using Rcpp and Eigen. Check that the outputs coincide at the machine precision level. I give you extra 2 points for this task on top of the usual 10 points for this assignment.

15.3 Conduct counterfactual simulation

Suppose that the firm of product 1 owner purchase the firms that own product 2 and 3. Let delta_counterfactual be the relevant ownership matrix. Make delta_counterfactual.

delta_counterfactual[[1]]

##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1

delta_counterfactual[[T]]

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    1    1    0    0    0    0    0    0     0
##  [2,]    1    1    1    0    0    0    0    0    0     0
##  [3,]    1    1    1    0    0    0    0    0    0     0
##  [4,]    0    0    0    1    0    0    0    0    0     0
##  [5,]    0    0    0    0    1    0    0    0    0     0
##  [6,]    0    0    0    0    0    1    0    0    0     0
##  [7,]    0    0    0    0    0    0    1    0    0     0
##  [8,]    0    0    0    0    0    0    0    1    0     0
##  [9,]    0    0    0    0    0    0    0    0    1     0
## [10,]    0    0    0    0    0    0    0    0    0     1

Compute the counterfactual price using the iteration with update_price. You can start the iteration from the equilibrium price. Show the average percentage change in the price for each product. In theory, the price of any product should not drop. But some prices can slightly drop because of the numerical errors.

logp <- log(p_actual)
p_new <- 
  update_price(
    logp = logp, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega, 
    delta = delta_counterfactual
    )
distance <- 10000
while (distance > lambda) {
  p_old <- p_new
  p_new <- 
    update_price(
      log(p_old), 
      X, 
      M, 
      V, 
      beta, 
      sigma, 
      mu, 
      omega, 
      delta_counterfactual
      )
  distance <- max(abs(p_new - p_old))
  print(distance)
}
p_counterfactual <- p_new
saveRDS(
  p_counterfactual, 
  file = "lecture/data/a5/price_counterfactual.rds" %>% here::here()
)

j	p_change
1	0.0501744
2	0.1179955
3	0.1479700
4	0.0013059
5	0.0004309
6	-0.0002448
7	0.0008458
8	-0.0033448
9	0.0004042
10	0.0015040

Write a function compute_producer_surplus(p, marginal_cost, X, M, V, beta, sigma, mu, omega) that returns the producer surplus for each product in each market. Compute the actual and counterfactual producer surplus under the estimated marginal costs. Show the average percentage change in the producer surplus for each product.

# compute actual producer surplus
producer_surplus_actual <-
  compute_producer_surplus(
    p = p_actual, 
    marginal_cost = marginal_cost_estimate, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega
    )
summary(producer_surplus_actual)

##        s           
##  Min.   :0.008247  
##  1st Qu.:0.041636  
##  Median :0.071153  
##  Mean   :0.168761  
##  3rd Qu.:0.150643  
##  Max.   :1.607658

# compute counterfactual producer surplus
producer_surplus_counterfactual <-
  compute_producer_surplus(
    p = p_counterfactual, 
    marginal_cost = marginal_cost_estimate, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega
    )
summary(producer_surplus_counterfactual)

##        s           
##  Min.   :0.008212  
##  1st Qu.:0.042687  
##  Median :0.073644  
##  Mean   :0.171809  
##  3rd Qu.:0.153746  
##  Max.   :1.607658

j	producer_surplus_change
1	0.0231358
2	0.0131566
3	0.0073846
4	0.0534174
5	0.0449804
6	0.0563809
7	0.0372859
8	0.0072057
9	0.0421801
10	0.0385013

Write a function compute_consumer_surplus(p, X, M, V, beta, sigma, mu, omega) that returns the consumer surplus for each consumer in each market. Compute the actual and counterfactual consumer surplus under the estimated marginal costs. Show the percentage change in the total consumer surplus.

# compute actual consumer surplus
consumer_surplus_actual <- 
  compute_consumer_surplus(
    p = p_actual, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega
    )
summary(consumer_surplus_actual)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##   0.00000   0.03447   1.10805   4.33269   4.65720 230.98190

# compute counterfactual consumer surplus
consumer_surplus_counterfactual <- 
  compute_consumer_surplus(
    p = p_counterfactual, 
    X = X, 
    M = M, 
    V = V, 
    beta = beta, 
    sigma = sigma, 
    mu = mu, 
    omega = omega
    )
summary(consumer_surplus_counterfactual)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##   0.00000   0.03054   1.07265   4.31048   4.61989 231.02355

consumer_surplus_change <- 
  (sum(consumer_surplus_counterfactual) - 
     sum(consumer_surplus_actual)) /
  sum(consumer_surplus_actual)
consumer_surplus_change

## [1] -0.005127025