Chapter 15 Assignment 5: Merger Simulation
15.1 Simulate data
We simulate data from a discrete choice model that is the same with in assignment 4 except for that the price is derived from the Nash equlibrium. There are \(T\) markets and each market has \(N\) consumers. There are \(J\) products and the indirect utility of consumer \(i\) in market \(t\) for product \(j\) is: \[ u_{itj} = \beta_{it}' x_j + \alpha_{it} p_{jt} + \xi_{jt} + \epsilon_{ijt}, \] where \(\epsilon_{ijt}\) is an i.i.d. type-I extreme random variable. \(x_j\) is \(K\)-dimensional observed characteristics of the product. \(p_{jt}\) is the retail price of the product in the market.
\(\xi_{jt}\) is product-market specific fixed effect. \(p_{jt}\) can be correlated with \(\xi_{jt}\) but \(x_{jt}\)s are independent of \(\xi_{jt}\). \(j = 0\) is an outside option whose indirect utility is: \[ u_{it0} = \epsilon_{i0t}, \] where \(\epsilon_{i0t}\) is an i.i.d. type-I extreme random variable.
\(\beta_{it}\) and \(\alpha_{it}\) are different across consumers, and they are distributed as: \[ \beta_{itk} = \beta_{0k} + \sigma_k \nu_{itk}, \] \[ \alpha_{it} = - \exp(\mu + \omega \upsilon_{it}) = - \exp(\mu + \frac{\omega^2}{2}) + [- \exp(\mu + \omega \upsilon_{it}) + \exp(\mu + \frac{\omega^2}{2})] \equiv \alpha_0 + \tilde{\alpha}_{it}, \] where \(\nu_{itk}\) for \(k = 1, \cdots, K\) and \(\upsilon_{it}\) are i.i.d. standard normal random variables. \(\alpha_0\) is the mean of \(\alpha_i\) and \(\tilde{\alpha}_i\) is the deviation from the mean.
Given a choice set in the market, \(\mathcal{J}_t \cup \{0\}\), a consumer chooses the alternative that maximizes her utility: \[ q_{ijt} = 1\{u_{ijt} = \max_{k \in \mathcal{J}_t \cup \{0\}} u_{ikt}\}. \] The choice probability of product \(j\) for consumer \(i\) in market \(t\) is: \[ \sigma_{ijt}(p_t, x_t, \xi_t) = \mathbb{P}\{u_{ijt} = \max_{k \in \mathcal{J}_t \cup \{0\}} u_{ikt}\}. \]
Suppose that we only observe the (smooth) share data: \[ s_{jt}(p_t, x_t, \xi_t) = \frac{1}{N} \sum_{i = 1}^N \sigma_{ijt}(p_t, x_t, \xi_t) = \frac{1}{N} \sum_{i = 1}^N \frac{\exp(u_{ijt})}{1 + \sum_{k \in \mathcal{J}_t \cup \{0\}} \exp(u_{ikt})}. \] along with the product-market characteristics \(x_{jt}\) and the retail prices \(p_{jt}\) for \(j \in \mathcal{J}_t \cup \{0\}\) for \(t = 1, \cdots, T\). We do not observe the choice data \(q_{ijt}\) nor shocks \(\xi_{jt}, \nu_{it}, \upsilon_{it}, \epsilon_{ijt}\).
We draw \(\xi_{jt}\) from i.i.d. normal distribution with mean 0 and standard deviation \(\sigma_{\xi}\).
- Set the seed, constants, and parameters of interest as follows.
# set the seed
set.seed(1)
# number of products
J <- 10
# dimension of product characteristics including the intercept
K <- 3
# number of markets
T <- 100
# number of consumers per market
N <- 500
# number of Monte Carlo
L <- 500
## [1] 4.0000000 0.1836433 -0.8356286
## [1] 1.5952808 0.3295078 0.8204684
Generate the covariates as follows.
The product-market characteristics:
\[
x_{j1} = 1, x_{jk} \sim N(0, \sigma_x), k = 2, \cdots, K,
\]
where \(\sigma_x\) is referred to as sd_x
in the code.
The product-market-specific unobserved fixed effect:
\[
\xi_{jt} \sim N(0, \sigma_\xi),
\]
where \(\sigma_xi\) is referred to as sd_xi
in the code.
The marginal cost of product \(j\) in market \(t\):
\[
c_{jt} \sim \text{logNormal}(0, \sigma_c),
\]
where \(\sigma_c\) is referred to as sd_c
in the code.
The price is determined by a Nash equilibrium. Let \(\Delta_t\) be the \(J_t \times J_t\) ownership matrix in which the \((j, k)\)-th element \(\delta_{tjk}\) is equal to 1 if product \(j\) and \(k\) are owned by the same firm and 0 otherwise. Assume that \(\delta_{tjk} = 1\) if and only if \(j = k\) for all \(t = 1, \cdots, T\), i.e., each firm owns only one product. Next, define \(\Omega_t\) be \(J_t \times J_t\) matrix such that whose \((j, k)\)-the element \(\omega_{tjk}(p_t, x_t, \xi_t, \Delta_t)\) is: \[ \omega_{tjk}(p_t, x_t, \xi_t, \Delta_t) = - \frac{\partial s_{jt}(p_t, x_t, \xi_t)}{\partial p_{kt}} \delta_{tjk}. \] Then, the equilibrium price vector \(p_t\) is determined by solving the following equilibrium condition: \[ p_t = c_t + \Omega_t(p_t, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t, x_t, \xi_t). \]
The value of the auxiliary parameters are set as follows:
X
is the data frame such that a row contains the characteristics vector \(x_{j}\) of a product and columns are product index and observed product characteristics. The dimension of the characteristics \(K\) is specified above. Add the row of the outside option whose index is \(0\) and all the characteristics are zero.
# make product characteristics data
X <-
matrix(
sd_x * rnorm(J * (K - 1)),
nrow = J
)
X <-
cbind(
rep(1, J),
X
)
colnames(X) <- paste("x", 1:K, sep = "_")
X <-
data.frame(j = 1:J, X) %>%
tibble::as_tibble()
# add outside option
X <-
rbind(
rep(0, dim(X)[2]),
X
)
## # A tibble: 11 × 4
## j x_1 x_2 x_3
## <dbl> <dbl> <dbl> <dbl>
## 1 0 0 0 0
## 2 1 1 0.975 -0.0324
## 3 2 1 1.48 1.89
## 4 3 1 1.15 1.64
## 5 4 1 -0.611 1.19
## 6 5 1 3.02 1.84
## 7 6 1 0.780 1.56
## 8 7 1 -1.24 0.149
## 9 8 1 -4.43 -3.98
## 10 9 1 2.25 1.24
## 11 10 1 -0.0899 -0.112
M
is the data frame such that a row contains the price \(\xi_{jt}\), marginal cost \(c_{jt}\), and price \(p_{jt}\). For now, set \(p_{jt} = 0\) and fill the equilibrium price later. After generating the variables, drop some products in each market. In order to change the number of available products in each market, for each market, first draw \(J_t\) from a discrete uniform distribution between \(1\) and \(J\). Then, drop products from each market usingdplyr::sample_frac
function with the realized number of available products. The variation in the available products is important for the identification of the distribution of consumer-level unobserved heterogeneity. Add the row of the outside option to each market whose index is \(0\) and all the variables take value zero.
# make market-product data
M <-
expand.grid(
j = 1:J,
t = 1:T
) %>%
tibble::as_tibble() %>%
dplyr::mutate(
xi = sd_xi * rnorm(J*T),
c = exp(sd_c * rnorm(J*T)),
p = 0
)
M <-
M %>%
dplyr::group_by(t) %>%
dplyr::sample_frac(size = purrr::rdunif(1, J)/J) %>%
dplyr::ungroup()
# add outside option
outside <-
data.frame(
j = 0,
t = 1:T,
xi = 0,
c = 0,
p = 0
)
M <-
rbind(
M,
outside
) %>%
dplyr::arrange(
t,
j
)
## # A tibble: 696 × 5
## j t xi c p
## <dbl> <int> <dbl> <dbl> <dbl>
## 1 0 1 0 0 0
## 2 2 1 -0.735 1.04 0
## 3 6 1 -0.0514 0.980 0
## 4 7 1 0.194 0.961 0
## 5 8 1 -0.0269 0.989 0
## 6 0 2 0 0 0
## 7 1 2 -0.197 0.988 0
## 8 2 2 -0.0297 1.04 0
## 9 4 2 0.382 1.00 0
## 10 5 2 -0.0823 1.02 0
## # ℹ 686 more rows
- Generate the consumer-level heterogeneity.
V
is the data frame such that a row contains the vector of shocks to consumer-level heterogeneity, \((\nu_{i}', \upsilon_i)\). They are all i.i.d. standard normal random variables.
# make consumer-market data
V <-
matrix(
rnorm(N * T * (K + 1)),
nrow = N * T
)
colnames(V) <-
c(
paste("v_x", 1:K, sep = "_"),
"v_p"
)
V <-
data.frame(
expand.grid(
i = 1:N,
t = 1:T
),
V
) %>%
tibble::as_tibble()
## # A tibble: 50,000 × 6
## i t v_x_1 v_x_2 v_x_3 v_p
## <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 0.448 0.985 0.611 0.408
## 2 2 1 -0.386 -0.389 -1.11 -0.667
## 3 3 1 0.0567 0.0510 0.0329 -0.119
## 4 4 1 0.585 0.303 0.860 -1.20
## 5 5 1 -0.449 -1.17 0.599 0.212
## 6 6 1 -0.782 0.0596 1.30 0.485
## 7 7 1 1.60 -1.62 -1.91 0.669
## 8 8 1 -1.65 2.10 0.726 -0.108
## 9 9 1 -0.848 -0.933 2.29 -0.0195
## 10 10 1 -0.130 -0.897 -1.90 -0.185
## # ℹ 49,990 more rows
- We use
compute_indirect_utility(df, beta, sigma, mu, omega)
,compute_choice_smooth(X, M, V, beta, sigma, mu, omega)
, andcompute_share_smooth(X, M, V, beta, sigma, mu, omega)
to compute \(s_t(p_t, x_t, \xi_t)\). On top of this, we need a functioncompute_derivative_share_smooth(X, M, V, beta, sigma, mu, omega)
that approximate:
\[ \frac{\partial s_{jt}(p_t, x_t, \xi_t)}{\partial p_{kt}} = \begin{cases} \frac{1}{N} \sum_{i = 1}^N \alpha_i \sigma_{ijt}(p_t, x_t, \xi_t)[1 - \sigma_{ijt}(p_t, x_t, \xi_t)] &\text{ if } j = k\\ - \frac{1}{N}\sum_{i = 1}^N \alpha_i \sigma_{ijt}(p_t, x_t, \xi_t)\sigma_{ikt}(p_t, x_t, \xi_t)] &\text{ if } j \neq k. \end{cases} \]
The returned object should be a list across markets and each element of the list should be \(J_t \times J_t\) matrix whose \((j, k)\)-th element is \(\partial s_{jt}/\partial p_{it}\) (do not include the outside option). The computation will be looped across markets. I recommend to use a parallel computing for this loop.
derivative_share_smooth <-
compute_derivative_share_smooth(
X = X,
M = M,
V = V,
beta = beta,
sigma = sigma,
mu = mu,
omega = omega
)
## [,1] [,2] [,3] [,4]
## [1,] -0.15195456 0.07243405 0.04067066 0.03615775
## [2,] 0.07243405 -0.21334454 0.06758143 0.06900886
## [3,] 0.04067066 0.06758143 -0.22465048 0.11247770
## [4,] 0.03615775 0.06900886 0.11247770 -0.22508246
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.096590746 0.003580709 0.003713348 0.011204782 0.012198589 0.008403911
## [2,] 0.003580709 -0.054948229 0.003309376 0.009438704 0.009893682 0.007465135
## [3,] 0.003713348 0.003309376 -0.055735044 0.009505457 0.009233146 0.007249620
## [4,] 0.011204782 0.009438704 0.009505457 -0.160285167 0.021922796 0.022645270
## [5,] 0.012198589 0.009893682 0.009233146 0.021922796 -0.136324417 0.019978806
## [6,] 0.008403911 0.007465135 0.007249620 0.022645270 0.019978806 -0.119320239
## [7,] 0.006573094 0.003856270 0.004084570 0.016649978 0.009219389 0.009873958
## [8,] 0.033096170 0.006496123 0.007885498 0.038995016 0.016868502 0.019723897
## [9,] 0.013495765 0.008866219 0.008606532 0.022541027 0.030984223 0.018980139
## [10,] 0.003846416 0.001815922 0.001912074 0.006631000 0.005317866 0.004460069
## [,7] [,8] [,9] [,10]
## [1,] 0.006573094 0.033096170 0.013495765 0.003846416
## [2,] 0.003856270 0.006496123 0.008866219 0.001815922
## [3,] 0.004084570 0.007885498 0.008606532 0.001912074
## [4,] 0.016649978 0.038995016 0.022541027 0.006631000
## [5,] 0.009219389 0.016868502 0.030984223 0.005317866
## [6,] 0.009873958 0.019723897 0.018980139 0.004460069
## [7,] -0.101062317 0.035849014 0.010662149 0.003862923
## [8,] 0.035849014 -0.207076797 0.027381997 0.018831945
## [9,] 0.010662149 0.027381997 -0.148409509 0.006094688
## [10,] 0.003862923 0.018831945 0.006094688 -0.053007318
- Make a list
delta
such that each element of the list is \(J_t \times J_t\) matrix \(\delta_t\).
## [,1] [,2] [,3] [,4]
## [1,] 1 0 0 0
## [2,] 0 1 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 1
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 0 0 0 0 0 0 0 0 0
## [2,] 0 1 0 0 0 0 0 0 0 0
## [3,] 0 0 1 0 0 0 0 0 0 0
## [4,] 0 0 0 1 0 0 0 0 0 0
## [5,] 0 0 0 0 1 0 0 0 0 0
## [6,] 0 0 0 0 0 1 0 0 0 0
## [7,] 0 0 0 0 0 0 1 0 0 0
## [8,] 0 0 0 0 0 0 0 1 0 0
## [9,] 0 0 0 0 0 0 0 0 1 0
## [10,] 0 0 0 0 0 0 0 0 0 1
- Write a function
update_price(logp, X, M, V, beta, sigma, mu, omega, delta)
that receives a price vector \(p_t^{(r)}\) and returns \(p_t^{(r + 1)}\) by: \[ p_t^{(r + 1)} = c_t + \Omega_t(p_t^{(r)}, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t^{(r)}, x_t, \xi_t). \] The returned object should be a vector whose row represents the condition for an inside product of each market. To impose non-negativity constraint on the price vector, we pass log price and exponentiate inside the function. Iterate this until \(\max_{jt}|p_{jt}^{(r + 1)} - p_{jt}^{(r)}| < \lambda\), for example with \(\lambda = 10^{-6}\). This iteration may or may not converge. The convergence depends on the parameters and the realization of the shocks. If the algorithm does not converge, first check the code.
# set the threshold
lambda <- 1e-6
# set the initial price
p <- M[M$j > 0, "p"]
logp <- log(rep(1, dim(p)[1]))
p_new <-
update_price(
logp = logp,
X = X,
M = M,
V = V,
beta = beta,
sigma = sigma,
mu = mu,
omega = omega,
delta = delta
)
# iterate
distance <- 10000
while (distance > lambda) {
p_old <- p_new
p_new <-
update_price(
log(p_old),
X,
M,
V,
beta,
sigma,
mu,
omega,
delta
)
distance <- max(abs(p_new - p_old))
print(distance)
}
# save
p_actual <- p_new
saveRDS(
p_actual,
file = "data/a5/A5_price_actual.rds"
)
## [,1]
## [1,] 2.009611
## [2,] 2.024696
## [3,] 2.099574
## [4,] 6.374277
## [5,] 1.664490
## [6,] 1.808518
15.2 Estimate the parameters
- Write a function
estimate_marginal_cost()
that estimate \(c_t\) by the equilibrium condition as: \[ c_t = p_t - \Omega_t(p_t, x_t, \xi_t, \Delta_t)^{-1} s_t(p_t, x_t, \xi_t) \]
Of course, in reality, we first draw Monte Carlo shocks to approximate the share, estimate the demand parameters, and use these shocks and estimates to estimate the marginal costs. In this assignment, we check the if the estimated marginal costs coincide with the true marginal costs to confirm that the codes are correctly written.
# take the logarithm
logp <- log(p_actual)
# estimate the marginal cost
marginal_cost_estimate <-
estimate_marginal_cost(
logp = logp,
X = X,
M = M,
V = V,
beta = beta,
sigma = sigma,
mu = mu,
omega = omega,
delta = delta)
marginal_cost_actual <- M[M$j > 0, ]$c
# plot the estimate vs actual marginal costs
marginal_cost_df <-
data.frame(
actual = marginal_cost_actual,
estimate = marginal_cost_estimate
)
ggplot(
marginal_cost_df,
aes(
x = estimate,
y = actual
)
) +
geom_point() +
theme_classic()
- (Optional) Translate
compute_indirect_utility
,compute_choice_smooth
,compute_derivative_share_smooth
,update_price
into C++ using Rcpp and Eigen. Check that the outputs coincide at the machine precision level. I give you extra 2 points for this task on top of the usual 10 points for this assignment.
15.3 Conduct counterfactual simulation
- Suppose that the firm of product 1 owner purchase the firms that own product 2 and 3. Let
delta_counterfactual
be the relevant ownership matrix. Makedelta_counterfactual
.
## [,1] [,2] [,3] [,4]
## [1,] 1 0 0 0
## [2,] 0 1 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 1
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 1 1 0 0 0 0 0 0 0
## [2,] 1 1 1 0 0 0 0 0 0 0
## [3,] 1 1 1 0 0 0 0 0 0 0
## [4,] 0 0 0 1 0 0 0 0 0 0
## [5,] 0 0 0 0 1 0 0 0 0 0
## [6,] 0 0 0 0 0 1 0 0 0 0
## [7,] 0 0 0 0 0 0 1 0 0 0
## [8,] 0 0 0 0 0 0 0 1 0 0
## [9,] 0 0 0 0 0 0 0 0 1 0
## [10,] 0 0 0 0 0 0 0 0 0 1
- Compute the counterfactual price using the iteration with
update_price
. You can start the iteration from the equilibrium price. Show the average percentage change in the price for each product. In theory, the price of any product should not drop. But some prices can slightly drop because of the numerical errors.
logp <- log(p_actual)
p_new <-
update_price(
logp = logp,
X = X,
M = M,
V = V,
beta = beta,
sigma = sigma,
mu = mu,
omega = omega,
delta = delta_counterfactual
)
distance <- 10000
while (distance > lambda) {
p_old <- p_new
p_new <-
update_price(
log(p_old),
X,
M,
V,
beta,
sigma,
mu,
omega,
delta_counterfactual
)
distance <- max(abs(p_new - p_old))
print(distance)
}
p_counterfactual <- p_new
saveRDS(
p_counterfactual,
file = "data/a5/A5_price_counterfactual.rds"
)
j | p_change |
---|---|
1 | 0.0501744 |
2 | 0.1179955 |
3 | 0.1479700 |
4 | 0.0013059 |
5 | 0.0004309 |
6 | -0.0002448 |
7 | 0.0008458 |
8 | -0.0033448 |
9 | 0.0004042 |
10 | 0.0015040 |
- Write a function
compute_producer_surplus(p, marginal_cost, X, M, V, beta, sigma, mu, omega)
that returns the producer surplus for each product in each market. Compute the actual and counterfactual producer surplus under the estimated marginal costs. Show the average percentage change in the producer surplus for each product.
# compute actual producer surplus
producer_surplus_actual <-
compute_producer_surplus(
p = p_actual,
marginal_cost = marginal_cost_estimate,
X = X,
M = M,
V = V,
beta = beta,
sigma = sigma,
mu = mu,
omega = omega
)
summary(producer_surplus_actual)
## s
## Min. :0.008247
## 1st Qu.:0.041636
## Median :0.071153
## Mean :0.168761
## 3rd Qu.:0.150643
## Max. :1.607658
# compute counterfactual producer surplus
producer_surplus_counterfactual <-
compute_producer_surplus(
p = p_counterfactual,
marginal_cost = marginal_cost_estimate,
X = X,
M = M,
V = V,
beta = beta,
sigma = sigma,
mu = mu,
omega = omega
)
summary(producer_surplus_counterfactual)
## s
## Min. :0.008212
## 1st Qu.:0.042687
## Median :0.073644
## Mean :0.171809
## 3rd Qu.:0.153746
## Max. :1.607658
j | producer_surplus_change |
---|---|
1 | 0.0231358 |
2 | 0.0131566 |
3 | 0.0073846 |
4 | 0.0534174 |
5 | 0.0449804 |
6 | 0.0563809 |
7 | 0.0372859 |
8 | 0.0072057 |
9 | 0.0421801 |
10 | 0.0385013 |
- Write a function
compute_consumer_surplus(p, X, M, V, beta, sigma, mu, omega)
that returns the consumer surplus for each consumer in each market. Compute the actual and counterfactual consumer surplus under the estimated marginal costs. Show the percentage change in the total consumer surplus.
# compute actual consumer surplus
consumer_surplus_actual <-
compute_consumer_surplus(
p = p_actual,
X = X,
M = M,
V = V,
beta = beta,
sigma = sigma,
mu = mu,
omega = omega
)
summary(consumer_surplus_actual)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.03447 1.10805 4.33269 4.65720 230.98190
# compute counterfactual consumer surplus
consumer_surplus_counterfactual <-
compute_consumer_surplus(
p = p_counterfactual,
X = X,
M = M,
V = V,
beta = beta,
sigma = sigma,
mu = mu,
omega = omega
)
summary(consumer_surplus_counterfactual)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.03054 1.07265 4.31048 4.61989 231.02355
consumer_surplus_change <-
(sum(consumer_surplus_counterfactual) -
sum(consumer_surplus_actual)) /
sum(consumer_surplus_actual)
consumer_surplus_change
## [1] -0.005127025