# Chapter 14 Assignment 4: Demand Function Estimation II

## 14.1 Simulate data

Be carefull that some parameters are changed from assignment 3. We simulate data from a discrete choice model that is the same with in assignment 3 except for the existence of unobserved product-specific fixed effects. There are $$T$$ markets and each market has $$N$$ consumers. There are $$J$$ products and the indirect utility of consumer $$i$$ in market $$t$$ for product $$j$$ is: $u_{itj} = \beta_{it}' x_j + \alpha_{it} p_{jt} + \xi_{jt} + \epsilon_{ijt},$ where $$\epsilon_{ijt}$$ is an i.i.d. type-I extreme random variable. $$x_j$$ is $$K$$-dimensional observed characteristics of the product. $$p_{jt}$$ is the retail price of the product in the market.

$$\xi_{jt}$$ is product-market specific fixed effect. $$p_{jt}$$ can be correlated with $$\xi_{jt}$$ but $$x_{jt}$$s are independent of $$\xi_{jt}$$. $$j = 0$$ is an outside option whose indirect utility is: $u_{it0} = \epsilon_{i0t},$ where $$\epsilon_{i0t}$$ is an i.i.d. type-I extreme random variable.

$$\beta_{it}$$ and $$\alpha_{it}$$ are different across consumers, and they are distributed as: $\beta_{itk} = \beta_{0k} + \sigma_k \nu_{itk},$ $\alpha_{it} = - \exp(\mu + \omega \upsilon_{it}) = - \exp(\mu + \frac{\omega^2}{2}) + [- \exp(\mu + \omega \upsilon_{it}) + \exp(\mu + \frac{\omega^2}{2})] \equiv \alpha_0 + \tilde{\alpha}_{it},$ where $$\nu_{itk}$$ for $$k = 1, \cdots, K$$ and $$\upsilon_{it}$$ are i.i.d. standard normal random variables. $$\alpha_0$$ is the mean of $$\alpha_i$$ and $$\tilde{\alpha}_i$$ is the deviation from the mean.

Given a choice set in the market, $$\mathcal{J}_t \cup \{0\}$$, a consumer chooses the alternative that maximizes her utility: $q_{ijt} = 1\{u_{ijt} = \max_{k \in \mathcal{J}_t \cup \{0\}} u_{ikt}\}.$ The choice probability of product $$j$$ for consumer $$i$$ in market $$t$$ is: $\sigma_{jt}(p_t, x_t, \xi_t) = \mathbb{P}\{u_{ijt} = \max_{k \in \mathcal{J}_t \cup \{0\}} u_{ikt}\}.$

Suppose that we only observe the share data: $s_{jt} = \frac{1}{N} \sum_{i = 1}^N q_{ijt},$ along with the product-market characteristics $$x_{jt}$$ and the retail prices $$p_{jt}$$ for $$j \in \mathcal{J}_t \cup \{0\}$$ for $$t = 1, \cdots, T$$. We do not observe the choice data $$q_{ijt}$$ nor shocks $$\xi_{jt}, \nu_{it}, \upsilon_{it}, \epsilon_{ijt}$$.

We draw $$\xi_{jt}$$ from i.i.d. normal distribution with mean 0 and standard deviation $$\sigma_{\xi}$$.

1. Set the seed, constants, and parameters of interest as follows.
# set the seed
set.seed(1)
# number of products
J <- 10
# dimension of product characteristics including the intercept
K <- 3
# number of markets
T <- 100
# number of consumers per market
N <- 500
# number of Monte Carlo
L <- 500
# set parameters of interests
beta <- rnorm(K);
beta <- 4
beta
##   4.0000000  0.1836433 -0.8356286
sigma <- abs(rnorm(K)); sigma
##  1.5952808 0.3295078 0.8204684
mu <- 0.5
omega <- 1

Generate the covariates as follows.

The product-market characteristics: $x_{j1} = 1, x_{jk} \sim N(0, \sigma_x), k = 2, \cdots, K,$ where $$\sigma_x$$ is referred to as sd_x in the code.

The product-market-specific unobserved fixed effect: $\xi_{jt} \sim N(0, \sigma_\xi),$ where $$\sigma_xi$$ is referred to as sd_xi in the code.

The marginal cost of product $$j$$ in market $$t$$: $c_{jt} \sim \text{logNormal}(0, \sigma_c),$ where $$\sigma_c$$ is referred to as sd_c in the code.

The retail price: $p_{jt} - c_{jt} \sim \text{logNorm}(\gamma \xi_{jt}, \sigma_p),$ where $$\gamma$$ is referred to as price_xi and $$\sigma_p$$ as sd_p in the code. This price is not the equilibrium price. We will revisit this point in a subsequent assignment.

The value of the auxiliary parameters are set as follows:

# set auxiliary parameters
price_xi <- 1
sd_x <- 2
sd_xi <- 0.5
sd_c <- 0.05
sd_p <- 0.05
1. X is the data frame such that a row contains the characteristics vector $$x_{j}$$ of a product and columns are product index and observed product characteristics. The dimension of the characteristics $$K$$ is specified above. Add the row of the outside option whose index is $$0$$ and all the characteristics are zero.
X
## # A tibble: 11 x 4
##        j   x_1     x_2     x_3
##    <dbl> <dbl>   <dbl>   <dbl>
##  1     0     0  0       0
##  2     1     1  0.975  -0.0324
##  3     2     1  1.48    1.89
##  4     3     1  1.15    1.64
##  5     4     1 -0.611   1.19
##  6     5     1  3.02    1.84
##  7     6     1  0.780   1.56
##  8     7     1 -1.24    0.149
##  9     8     1 -4.43   -3.98
## 10     9     1  2.25    1.24
## 11    10     1 -0.0899 -0.112
1. M is the data frame such that a row contains the price $$\xi_{jt}$$, marginal cost $$c_{jt}$$, and price $$p_{jt}$$. After generating the variables, drop some products in each market. In this assignment, we drop products in a different way from the last assignment. In order to change the number of available products in each market, for each market, first draw $$J_t$$ from a discrete uniform distribution between $$1$$ and $$J$$. Then, drop products from each market using dplyr::sample_frac function with the realized number of available products. The variation in the available products is important for the identification of the distribution of consumer-level unobserved heterogeneity. Add the row of the outside option to each market whose index is $$0$$ and all the variables take value zero.
M
## # A tibble: 633 x 5
##        j     t      xi     c     p
##    <dbl> <int>   <dbl> <dbl> <dbl>
##  1     0     1  0      0      0
##  2     1     1 -0.0779 0.951  1.86
##  3     4     1  0.209  1.03   2.31
##  4     0     2  0      0      0
##  5     1     2 -0.197  0.988  1.90
##  6     3     2  0.550  1.09   2.78
##  7     4     2  0.382  1.000  2.54
##  8     6     2 -0.127  1.01   1.91
##  9     7     2  0.348  0.952  2.32
## 10     8     2  0.278  0.955  2.16
## # ... with 623 more rows
1. Generate the consumer-level heterogeneity. V is the data frame such that a row contains the vector of shocks to consumer-level heterogeneity, $$(\nu_{i}', \upsilon_i)$$. They are all i.i.d. standard normal random variables.
V
## # A tibble: 50,000 x 6
##        i     t  v_x_1   v_x_2  v_x_3     v_p
##    <int> <int>  <dbl>   <dbl>  <dbl>   <dbl>
##  1     1     1 -1.37   0.211   1.65   0.0141
##  2     2     1  1.37   0.378   1.35   0.387
##  3     3     1 -2.06  -0.0662 -2.45  -1.17
##  4     4     1 -0.992 -0.727  -1.33  -1.42
##  5     5     1  0.252  1.87    0.751  0.317
##  6     6     1 -1.06  -0.531   1.34  -0.224
##  7     7     1 -0.217  1.03    0.909 -0.593
##  8     8     1 -0.838 -0.861  -0.612  1.54
##  9     9     1  0.659 -1.43   -1.77   0.340
## 10    10     1  0.452 -0.239   0.138  0.695
## # ... with 49,990 more rows
1. Join X, M, V using dplyr::left_join and name it df. df is the data frame such that a row contains variables for a consumer about a product that is available in a market.
df
## # A tibble: 316,500 x 13
##        t     i     j  v_x_1   v_x_2 v_x_3     v_p   x_1    x_2     x_3
##    <int> <int> <dbl>  <dbl>   <dbl> <dbl>   <dbl> <dbl>  <dbl>   <dbl>
##  1     1     1     0 -1.37   0.211   1.65  0.0141     0  0      0
##  2     1     1     1 -1.37   0.211   1.65  0.0141     1  0.975 -0.0324
##  3     1     1     4 -1.37   0.211   1.65  0.0141     1 -0.611  1.19
##  4     1     2     0  1.37   0.378   1.35  0.387      0  0      0
##  5     1     2     1  1.37   0.378   1.35  0.387      1  0.975 -0.0324
##  6     1     2     4  1.37   0.378   1.35  0.387      1 -0.611  1.19
##  7     1     3     0 -2.06  -0.0662 -2.45 -1.17       0  0      0
##  8     1     3     1 -2.06  -0.0662 -2.45 -1.17       1  0.975 -0.0324
##  9     1     3     4 -2.06  -0.0662 -2.45 -1.17       1 -0.611  1.19
## 10     1     4     0 -0.992 -0.727  -1.33 -1.42       0  0      0
## # ... with 316,490 more rows, and 3 more variables: xi <dbl>, c <dbl>,
## #   p <dbl>
1. Draw a vector of preference shocks e whose length is the same as the number of rows of df.
head(e)
##   0.1917775 -0.3312816  0.2428217  1.0164097  1.4761643  2.8297340
1. Write a function compute_indirect_utility(df, beta, sigma, mu, omega) that returns a vector whose element is the mean indirect utility of a product for a consumer in a market. The output should have the same length with $$e$$. (This function is the same with assignment 3. You can use the function.)
# compute indirect utility
u <-
compute_indirect_utility(
df, beta, sigma,
mu, omega)
head(u)
##               u
## [1,]  0.0000000
## [2,] -1.1415602
## [3,] -1.3798736
## [4,]  0.0000000
## [5,]  1.8935027
## [6,]  0.9258409

In the previous assingment, we computed predicted share by simulating choice and taking their average. Instead, we compute the actual share by: $s_{jt} = \frac{1}{N} \sum_{i = 1}^N \frac{\exp[\beta_{it}' x_j + \alpha_{it} p_{jt} + \xi_{jt}]}{1 + \sum_{k \in \mathcal{J}_t} \exp[\beta_{it}' x_k + \alpha_{it} p_{kt} + \xi_{jt}]}$ and the predicted share by: $\widehat{\sigma}_{j}(x, p_t, \xi_t) = \frac{1}{L} \sum_{l = 1}^L \frac{\exp[\beta_{t}^{(l)\prime} x_j + \alpha_{t}^{(l)} p_{jt} + \xi_{jt}]}{1 + \sum_{k \in \mathcal{J}_t} \exp[\beta_{t}^{(l)\prime} x_k + \alpha_{t}^{(l)} p_{kt} + \xi_{jt}]}.$

1. To do so, write a function compute_choice_smooth(X, M, V, beta, sigma, mu, omega) in which the choice of each consumer is not: $q_{ijt} = 1\{u_{ijt} = \max_{k \in \mathcal{J}_t \cup \{0\}} u_{ikt}\},$ but $\tilde{q}_{ijt} = \frac{\exp(u_{ijt})}{1 + \sum_{k \in \mathcal{J}_t} \exp(u_{ikt})}.$
df_choice_smooth <-
compute_choice_smooth(X, M, V, beta, sigma, mu, omega)
summary(df_choice_smooth)
##        t                i               j             v_x_1
##  Min.   :  1.00   Min.   :  1.0   Min.   : 0.00   Min.   :-4.302781
##  1st Qu.: 23.00   1st Qu.:125.8   1st Qu.: 2.00   1st Qu.:-0.685539
##  Median : 48.00   Median :250.5   Median : 4.00   Median : 0.001041
##  Mean   : 49.67   Mean   :250.5   Mean   : 4.49   Mean   :-0.002541
##  3rd Qu.: 77.00   3rd Qu.:375.2   3rd Qu.: 7.00   3rd Qu.: 0.673061
##  Max.   :100.00   Max.   :500.0   Max.   :10.00   Max.   : 3.809895
##      v_x_2               v_x_3                v_p
##  Min.   :-4.542122   Min.   :-3.957618   Min.   :-4.218131
##  1st Qu.:-0.679702   1st Qu.:-0.672701   1st Qu.:-0.669446
##  Median : 0.000935   Median : 0.003104   Median : 0.001976
##  Mean   : 0.000478   Mean   : 0.003428   Mean   : 0.000017
##  3rd Qu.: 0.673109   3rd Qu.: 0.678344   3rd Qu.: 0.670699
##  Max.   : 4.313621   Max.   : 4.244194   Max.   : 4.074300
##       x_1             x_2               x_3                xi
##  Min.   :0.000   Min.   :-4.4294   Min.   :-3.9787   Min.   :-1.498475
##  1st Qu.:1.000   1st Qu.:-0.6108   1st Qu.: 0.0000   1st Qu.:-0.263684
##  Median :1.000   Median : 0.7797   Median : 1.1878   Median : 0.000000
##  Mean   :0.842   Mean   : 0.3015   Mean   : 0.5034   Mean   :-0.002574
##  3rd Qu.:1.000   3rd Qu.: 1.4766   3rd Qu.: 1.6424   3rd Qu.: 0.278332
##  Max.   :1.000   Max.   : 3.0236   Max.   : 1.8877   Max.   : 1.905138
##        c                p               u                  q
##  Min.   :0.0000   Min.   :0.000   Min.   :-432.189   Min.   :0.000000
##  1st Qu.:0.9376   1st Qu.:1.527   1st Qu.:  -3.045   1st Qu.:0.001456
##  Median :0.9870   Median :1.885   Median :   0.000   Median :0.032012
##  Mean   :0.8425   Mean   :1.809   Mean   :  -1.901   Mean   :0.157978
##  3rd Qu.:1.0282   3rd Qu.:2.324   3rd Qu.:   1.493   3rd Qu.:0.159143
##  Max.   :1.1572   Max.   :8.211   Max.   :  22.343   Max.   :1.000000
1. Next, write a function compute_share_smooth(X, M, V, beta, sigma, mu, omega) that calls compute_choice_smooth and then returns the share based on above $$\tilde{q}_{ijt}$$. If we use these functions with the Monte Carlo shocks, it gives us the predicted share of the products.
df_share_smooth <- compute_share_smooth(X, M, V, beta, sigma, mu, omega)
summary(df_share_smooth)
##        t                j              x_1             x_2
##  Min.   :  1.00   Min.   : 0.00   Min.   :0.000   Min.   :-4.4294
##  1st Qu.: 23.00   1st Qu.: 2.00   1st Qu.:1.000   1st Qu.:-0.6108
##  Median : 48.00   Median : 4.00   Median :1.000   Median : 0.7797
##  Mean   : 49.67   Mean   : 4.49   Mean   :0.842   Mean   : 0.3015
##  3rd Qu.: 77.00   3rd Qu.: 7.00   3rd Qu.:1.000   3rd Qu.: 1.4766
##  Max.   :100.00   Max.   :10.00   Max.   :1.000   Max.   : 3.0236
##       x_3                xi                  c                p
##  Min.   :-3.9787   Min.   :-1.498475   Min.   :0.0000   Min.   :0.000
##  1st Qu.: 0.0000   1st Qu.:-0.263684   1st Qu.:0.9376   1st Qu.:1.527
##  Median : 1.1878   Median : 0.000000   Median :0.9870   Median :1.885
##  Mean   : 0.5034   Mean   :-0.002574   Mean   :0.8425   Mean   :1.809
##  3rd Qu.: 1.6424   3rd Qu.: 0.278332   3rd Qu.:1.0282   3rd Qu.:2.324
##  Max.   : 1.8877   Max.   : 1.905138   Max.   :1.1572   Max.   :8.211
##        q                 s                 y
##  Min.   :  5.094   Min.   :0.01019   Min.   :-3.2018
##  1st Qu.: 19.712   1st Qu.:0.03942   1st Qu.:-1.8590
##  Median : 34.968   Median :0.06994   Median :-1.3580
##  Mean   : 78.989   Mean   :0.15798   Mean   :-1.1288
##  3rd Qu.:119.805   3rd Qu.:0.23961   3rd Qu.:-0.2189
##  Max.   :349.796   Max.   :0.69959   Max.   : 1.0355

Use this df_share_smooth as the data to estimate the parameters in the following section.

## 14.2 Estimate the parameters

1. First draw Monte Carlo consumer-level heterogeneity V_mcmc and Monte Carlo preference shocks e_mcmc. The number of simulations is L. This does not have to be the same with the actual number of consumers N.
V_mcmc
## # A tibble: 50,000 x 6
##        i     t   v_x_1  v_x_2   v_x_3    v_p
##    <int> <int>   <dbl>  <dbl>   <dbl>  <dbl>
##  1     1     1  0.488  -1.51   0.528  -0.468
##  2     2     1  1.16    0.507 -0.527  -0.516
##  3     3     1 -2.49   -0.318 -0.0996 -0.893
##  4     4     1  0.0952 -0.133 -2.05    1.92
##  5     5     1 -1.11    0.103  2.24    0.753
##  6     6     1  0.903   0.496  0.287   1.53
##  7     7     1  0.913  -0.144  0.129  -1.17
##  8     8     1 -1.52    0.357 -0.475  -0.736
##  9     9     1  0.643   0.219  0.815  -1.27
## 10    10     1 -0.358   0.272 -0.650  -2.09
## # ... with 49,990 more rows
head(e_mcmc)
##   0.1006013  2.7039824  1.0540278  2.4697389  1.6721181 -1.0283872
1. Vectorize the parameters to a vector theta because optim requires the maximiand to be a vector.
# set parameters
theta <- c(beta, sigma, mu, omega)
theta
##   4.0000000  0.1836433 -0.8356286  1.5952808  0.3295078  0.8204684
##   0.5000000  1.0000000
1. Estimate the parameters assuming there is no product-specific unobserved fixed effects $$\xi_{jt}$$, i.e., using the functions in assignment 3. To do so, first modify M to M_no in which xi is replaced with 0 and estimate the model with M_no. Otherwise, your function will compute the share with the true xi.
M_no <- M %>%
dplyr::mutate(xi = 0)
## $par ##  3.1292612 0.1834495 -0.9357865 1.5678279 0.3768438 1.1698887 ##  -0.1108147 1.9156776 ## ##$value
##  0.0004291768
##
## $counts ## function gradient ## 297 NA ## ##$convergence
##  0
##
## $message ## NULL ## true estimates ## 1 4.0000000 3.1292612 ## 2 0.1836433 0.1834495 ## 3 -0.8356286 -0.9357865 ## 4 1.5952808 1.5678279 ## 5 0.3295078 0.3768438 ## 6 0.8204684 1.1698887 ## 7 0.5000000 -0.1108147 ## 8 1.0000000 1.9156776 Next, we estimate the model allowing for the product-market-specific unobserved fixed effect $$\xi_{jt}$$ using the BLP algorithm. To do so, we slightly modify the compute_indirect_utility, compute_choice_smooth, and compute_share_smooth functions so that they receive $$\delta_{jt}$$ to compute the indirect utilities, choices, and shares. Be careful that the treatment of $$\alpha_i$$ is slightly different from the lecture note, because we assumed that $$\alpha_i$$s are log-normal random variables. 1. Compute and print out $$\delta_{jt}$$ at the true parameters, i.e.: $\delta_{jt} = \beta_0' x_j + \alpha_0' p_{jt} + \xi_{jt}.$ delta ## # A tibble: 633 x 3 ## t j delta ## <int> <dbl> <dbl> ## 1 1 0 0 ## 2 1 1 -0.918 ## 3 1 4 -3.17 ## 4 2 0 0 ## 5 2 1 -1.15 ## 6 2 3 -4.17 ## 7 2 4 -3.63 ## 8 2 6 -2.48 ## 9 2 7 -2.32 ## 10 2 8 0.924 ## # ... with 623 more rows 1. Write a function compute_indirect_utility_delta(df, delta, sigma, mu, omega) that returns a vector whose element is the mean indirect utility of a product for a consumer in a market. The output should have the same length with $$e$$. Print out the output with $$\delta_{jt}$$ evaluated at the true parameters. Check if the output is close to the true indirect utilities. # compute indirect utility from delta u_delta <- compute_indirect_utility_delta(df, delta, sigma, mu, omega) head(u_delta) ## u ## [1,] 0.0000000 ## [2,] -1.1415602 ## [3,] -1.3798736 ## [4,] 0.0000000 ## [5,] 1.8935027 ## [6,] 0.9258409 summary(u - u_delta) ## u ## Min. :-2.842e-14 ## 1st Qu.:-4.441e-16 ## Median : 0.000e+00 ## Mean : 5.030e-18 ## 3rd Qu.: 3.331e-16 ## Max. : 5.684e-14 1. Write a function compute_choice_smooth_delta(X, M, V, delta, sigma, mu, omega) that first construct df from X, M, V, second call compute_indirect_utility_delta to obtain the vector of mean indirect utilities u, third compute the (smooth) choice vector q based on the vector of mean indirect utilities, and finally return the data frame to which u and q are added as columns. Print out the output with $$\delta_{jt}$$ evaluated at the true parameters. Check if the output is close to the true (smooth) choice vector. # compute choice df_choice_smooth_delta <- compute_choice_smooth_delta(X, M, V, delta, sigma, mu, omega) df_choice_smooth_delta ## # A tibble: 316,500 x 15 ## t i j v_x_1 v_x_2 v_x_3 v_p x_1 x_2 x_3 ## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 1 1 0 -1.37 0.211 1.65 0.0141 0 0 0 ## 2 1 1 1 -1.37 0.211 1.65 0.0141 1 0.975 -0.0324 ## 3 1 1 4 -1.37 0.211 1.65 0.0141 1 -0.611 1.19 ## 4 1 2 0 1.37 0.378 1.35 0.387 0 0 0 ## 5 1 2 1 1.37 0.378 1.35 0.387 1 0.975 -0.0324 ## 6 1 2 4 1.37 0.378 1.35 0.387 1 -0.611 1.19 ## 7 1 3 0 -2.06 -0.0662 -2.45 -1.17 0 0 0 ## 8 1 3 1 -2.06 -0.0662 -2.45 -1.17 1 0.975 -0.0324 ## 9 1 3 4 -2.06 -0.0662 -2.45 -1.17 1 -0.611 1.19 ## 10 1 4 0 -0.992 -0.727 -1.33 -1.42 0 0 0 ## # ... with 316,490 more rows, and 5 more variables: xi <dbl>, c <dbl>, ## # p <dbl>, u <dbl>, q <dbl> summary(df_choice_smooth_delta) ## t i j v_x_1 ## Min. : 1.00 Min. : 1.0 Min. : 0.00 Min. :-4.302781 ## 1st Qu.: 23.00 1st Qu.:125.8 1st Qu.: 2.00 1st Qu.:-0.685539 ## Median : 48.00 Median :250.5 Median : 4.00 Median : 0.001041 ## Mean : 49.67 Mean :250.5 Mean : 4.49 Mean :-0.002541 ## 3rd Qu.: 77.00 3rd Qu.:375.2 3rd Qu.: 7.00 3rd Qu.: 0.673061 ## Max. :100.00 Max. :500.0 Max. :10.00 Max. : 3.809895 ## v_x_2 v_x_3 v_p ## Min. :-4.542122 Min. :-3.957618 Min. :-4.218131 ## 1st Qu.:-0.679702 1st Qu.:-0.672701 1st Qu.:-0.669446 ## Median : 0.000935 Median : 0.003104 Median : 0.001976 ## Mean : 0.000478 Mean : 0.003428 Mean : 0.000017 ## 3rd Qu.: 0.673109 3rd Qu.: 0.678344 3rd Qu.: 0.670699 ## Max. : 4.313621 Max. : 4.244194 Max. : 4.074300 ## x_1 x_2 x_3 xi ## Min. :0.000 Min. :-4.4294 Min. :-3.9787 Min. :-1.498475 ## 1st Qu.:1.000 1st Qu.:-0.6108 1st Qu.: 0.0000 1st Qu.:-0.263684 ## Median :1.000 Median : 0.7797 Median : 1.1878 Median : 0.000000 ## Mean :0.842 Mean : 0.3015 Mean : 0.5034 Mean :-0.002574 ## 3rd Qu.:1.000 3rd Qu.: 1.4766 3rd Qu.: 1.6424 3rd Qu.: 0.278332 ## Max. :1.000 Max. : 3.0236 Max. : 1.8877 Max. : 1.905138 ## c p u q ## Min. :0.0000 Min. :0.000 Min. :-432.189 Min. :0.000000 ## 1st Qu.:0.9376 1st Qu.:1.527 1st Qu.: -3.045 1st Qu.:0.001456 ## Median :0.9870 Median :1.885 Median : 0.000 Median :0.032012 ## Mean :0.8425 Mean :1.809 Mean : -1.901 Mean :0.157978 ## 3rd Qu.:1.0282 3rd Qu.:2.324 3rd Qu.: 1.493 3rd Qu.:0.159143 ## Max. :1.1572 Max. :8.211 Max. : 22.343 Max. :1.000000 summary(df_choice_smooth$q - df_choice_smooth_delta$q) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## -1.110e-15 -6.939e-18 0.000e+00 -4.120e-20 1.735e-18 9.992e-16 1. Write a function compute_share_delta(X, M, V, delta, sigma, mu, omega) that first construct df from X, M, V, second call compute_choice_delta to obtain a data frame with u and q, third compute the share of each product at each market s and the log difference in the share from the outside option, $$\ln(s_{jt}/s_{0t})$$, denoted by y, and finally return the data frame that is summarized at the product-market level, dropped consumer-level variables, and added s and y. # compute share df_share_smooth_delta <- compute_share_smooth_delta(X, M, V, delta, sigma, mu, omega) df_share_smooth_delta ## # A tibble: 633 x 11 ## t j x_1 x_2 x_3 xi c p q s ## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 1 0 0 0 0 0 0 0 222. 0.444 ## 2 1 1 1 0.975 -0.0324 -0.0779 0.951 1.86 219. 0.438 ## 3 1 4 1 -0.611 1.19 0.209 1.03 2.31 58.6 0.117 ## 4 2 0 0 0 0 0 0 0 123. 0.247 ## 5 2 1 1 0.975 -0.0324 -0.197 0.988 1.90 27.5 0.0550 ## 6 2 3 1 1.15 1.64 0.550 1.09 2.78 14.5 0.0291 ## 7 2 4 1 -0.611 1.19 0.382 1.000 2.54 10.4 0.0207 ## 8 2 6 1 0.780 1.56 -0.127 1.01 1.91 18.9 0.0378 ## 9 2 7 1 -1.24 0.149 0.348 0.952 2.32 14.1 0.0282 ## 10 2 8 1 -4.43 -3.98 0.278 0.955 2.16 260. 0.521 ## # ... with 623 more rows, and 1 more variable: y <dbl> summary(df_share_smooth_delta) ## t j x_1 x_2 ## Min. : 1.00 Min. : 0.00 Min. :0.000 Min. :-4.4294 ## 1st Qu.: 23.00 1st Qu.: 2.00 1st Qu.:1.000 1st Qu.:-0.6108 ## Median : 48.00 Median : 4.00 Median :1.000 Median : 0.7797 ## Mean : 49.67 Mean : 4.49 Mean :0.842 Mean : 0.3015 ## 3rd Qu.: 77.00 3rd Qu.: 7.00 3rd Qu.:1.000 3rd Qu.: 1.4766 ## Max. :100.00 Max. :10.00 Max. :1.000 Max. : 3.0236 ## x_3 xi c p ## Min. :-3.9787 Min. :-1.498475 Min. :0.0000 Min. :0.000 ## 1st Qu.: 0.0000 1st Qu.:-0.263684 1st Qu.:0.9376 1st Qu.:1.527 ## Median : 1.1878 Median : 0.000000 Median :0.9870 Median :1.885 ## Mean : 0.5034 Mean :-0.002574 Mean :0.8425 Mean :1.809 ## 3rd Qu.: 1.6424 3rd Qu.: 0.278332 3rd Qu.:1.0282 3rd Qu.:2.324 ## Max. : 1.8877 Max. : 1.905138 Max. :1.1572 Max. :8.211 ## q s y ## Min. : 5.094 Min. :0.01019 Min. :-3.2018 ## 1st Qu.: 19.712 1st Qu.:0.03942 1st Qu.:-1.8590 ## Median : 34.968 Median :0.06994 Median :-1.3580 ## Mean : 78.989 Mean :0.15798 Mean :-1.1288 ## 3rd Qu.:119.805 3rd Qu.:0.23961 3rd Qu.:-0.2189 ## Max. :349.796 Max. :0.69959 Max. : 1.0355 summary(df_share_smooth$s - df_share_smooth_delta$s) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## -1.388e-16 -1.735e-17 0.000e+00 -1.644e-19 1.041e-17 2.220e-16 1. Write a function solve_delta(df_share_smooth, X, M, V, delta, sigma, mu, omega) that finds $$\delta_{jt}$$ that equates the actua share and the predicted share based on compute_share_smooth_delta by the fixed-point algorithm with an operator: $T(\delta_{jt}^{(r)}) = \delta_{jt}^{(r)} + \kappa \cdot \log\left(\frac{s_{jt}}{\sigma_{jt}[\delta^{(r)}]}\right),$ where $$s_{jt}$$ is the actual share of product $$j$$ in market $$t$$ and $$\sigma_{jt}[\delta^{(r)}]$$ is the predicted share of product $$j$$ in market $$t$$ given $$\delta^{(r)}$$. Multiplying $$\kappa$$ is for the numerical stability. I set the value at $$\kappa = 1$$. Adjust it if the algorithm did not work. Set the stopping criterion at $$\max_{jt}|\delta_{jt}^{(r + 1)} - \delta_{jt}^{(r)}| < \lambda$$. Set $$\lambda$$ at $$10^{-3}$$. Make sure that $$\delta_{i0t}$$ is always set at zero while the iteration. Start the algorithm with the true $$\delta_{jt}$$ and check if the algorithm returns (almost) the same $$\delta_{jt}$$ when the actual and predicted smooth share are equated. kappa <- 1 lambda <- 1e-3 delta_new <- solve_delta(df_share_smooth, X, M, V, delta, sigma, mu, omega, kappa, lambda) head(delta_new) ## # A tibble: 6 x 3 ## t j delta ## <int> <dbl> <dbl> ## 1 1 0 0 ## 2 1 1 -0.918 ## 3 1 4 -3.17 ## 4 2 0 0 ## 5 2 1 -1.15 ## 6 2 3 -4.17 summary(delta_new$delta - delta$delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## -1.776e-15 -4.441e-16 0.000e+00 -6.033e-17 0.000e+00 1.776e-15 1. Check how long it takes to compute the limit $$\delta$$ under the Monte Carlo shocks starting from the true $$\delta$$ to match with df_share_smooth. This is approximately the time to evaluate the objective function. delta_new <- solve_delta(df_share_smooth, X, M, V_mcmc, delta, sigma, mu, omega, kappa, lambda) save(delta_new, file = "data/A4_delta_new.RData") delta_new <- get(load(file = "data/A4_delta_new.RData")) summary(delta_new$delta - delta$delta) ## Warning in delta_new$delta - delta$delta: longer object length is not a ## multiple of shorter object length ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## -14.32511 -1.55451 -0.04656 -0.06849 1.45225 13.35118 1. We use the marginal cost $$c_{jt}$$ as the excluded instrumental variable for $$p_{jt}$$. Let $$\Psi$$ be the weighing matrix for the GMM estimator. For now, let it be the identity matrix. Write a function compute_theta_linear(df_share_smooth, delta, mu, omega, Psi) that returns the optimal linear parameters associated with the data and $$\delta$$. Notice that we only obtain $$\beta_0$$ in this way because $$\alpha_0$$ is directly computed from the non-linear parameters by $$-\exp(\mu + \omega^2/2)$$. The first order condition for $$\beta_0$$ is: $\begin{equation} \beta_0 = (X'W \Phi^{-1} W'X)^{-1} X' W \Phi^{-1} W' [\delta - \alpha_0 p], \end{equation}$ where $\begin{equation} X = \begin{pmatrix} x_{11}'\\ \vdots \\ x_{J_1 1}'\\ \vdots \\ x_{1T}' \\ \vdots \\ x_{J_T T} \end{pmatrix} \end{equation}$ $\begin{equation} W = \begin{pmatrix} x_{11}' & c_{11}\\ \vdots & \vdots \\ x_{J_1 1}' & c_{J_1 1}\\ \vdots & \vdots \\ x_{1T}' & c_{1T}\\ \vdots & \vdots \\ x_{J_T T} & c_{J_T T} \end{pmatrix}, \end{equation}$ $\begin{equation} \delta = \begin{pmatrix} \delta_11\\ \vdots\\ \delta_{J_1 1}\\ \vdots\\ \delta_1T\\ \vdots\\ \delta_{J_T T}\\ \end{pmatrix} \end{equation}$, where $$\alpha_0 = - \exp(\mu + \omega^2/2)$$. Notice that $$X$$ and $$W$$ does not include rows for the outwide option. Psi <- diag(length(beta) + 1) theta_linear <- compute_theta_linear(df_share_smooth, delta, mu, omega, Psi) cbind(theta_linear, beta) ## delta beta ## x_1 3.9799719 4.0000000 ## x_2 0.1405106 0.1836433 ## x_3 -0.7821397 -0.8356286 1. Write a function solve_xi(df_share_smooth, delta, beta, mu, omega) that computes the values of $$\xi$$ that are implied from the data, $$\delta$$, and the linear parameters. Check that the (almost) true values are returned when true $$\delta$$ and the true linear parmaeters are passed to the function. Notice that the returend $$\xi$$ should not include rows for the outside option. xi_new <- solve_xi(df_share_smooth, delta, beta, mu, omega) head(xi_new) ## xi ## [1,] -0.07789775 ## [2,] 0.20897078 ## [3,] -0.19714498 ## [4,] 0.55001269 ## [5,] 0.38158787 ## [6,] -0.12668084 xi_true <- df_share_smooth %>% dplyr::filter(j != 0) %>% dplyr::select(xi) summary(xi_true - xi_new) ## xi ## Min. :-4.441e-16 ## 1st Qu.:-9.714e-17 ## Median : 0.000e+00 ## Mean :-9.388e-18 ## 3rd Qu.: 5.551e-17 ## Max. : 8.882e-16 1. Write a function GMM_objective_A4(theta_nonlinear, delta, df_share_smooth, Psi, X, M, V_mcmc, kappa, lambda) that returns the value of the GMM objective function as a function of non-linear parameters mu, omega, and sigma: $\min_{\theta} \xi(\theta)' W \Phi^{-1} W' \xi(\theta),$ where $$\xi(\theta)$$ is the values of $$\xi$$ that solves: $s = \sigma(p, x, \xi),$ given parameters $$\theta$$. Note that the row of $$\xi(\theta)$$ and $$W$$ do not include the rows for the outside options. # non-linear parmaeters theta_nonlinear <- c(mu, omega, sigma) # compute GMM objective function objective <- GMM_objective_A4(theta_nonlinear, delta, df_share_smooth, Psi, X, M, V_mcmc, kappa, lambda) save(objective, file = "data/A4_objective.RData") objective <- get(load(file = "data/A4_objective.RData")) objective ## xi ## xi 0.1368324 1. Draw a graph of the objective function that varies each non-linear parameter from 0, 0.2, $$\cdots$$, 2.0 of the true value. Try with the actual shocks V. ## [] ## ## [] ## ## [] ## ## [] ## ## [] 1. Find non-linear parameters that minimize the GMM objective function. Because standard deviations of the same absolute value with positive and negative values have almost the same implication for the data, you can take the absolute value if the estimates of the standard deviations happened to be negative (Another way is to set the non-negativity constraints on the standard deviations). ##$par
##  0.3635051 0.7502512 1.5852184 0.4223104 0.8757237
##
## $value ##  2.064906e-18 ## ##$counts
## $convergence ##  0 ## ##$message
## NULL
##           true  estimate
## [5,] 0.8204684 0.8757237