Goodness-of-fit for fixed effect logit model using 'bife' package












4














I am using the 'bife' package to run the fixed effect logit model in R. However, I cannot compute any goodness-of-fit to measure the model's overall fit given the result I have below. I would appreciate if I can know how to measure the goodness-of-fit given this limited information. I prefer chi-square test but still cannot find a way to implement this either.



    ---------------------------------------------------------------                 
Fixed effects logit model
with analytical bias-correction

Estimated model:
Y ~ X1 +X2 + X3 + X4 + X5 | Z

Log-Likelihood= -9153.165
n= 20383, number of events= 5104
Demeaning converged after 6 iteration(s)
Offset converged after 3 iteration(s)

Corrected structural parameter(s):

Estimate Std. error t-value Pr(> t)
X1 -8.67E-02 2.80E-03 -31.001 < 2e-16 ***
X2 1.79E+00 8.49E-02 21.084 < 2e-16 ***
X3 -1.14E-01 1.91E-02 -5.982 2.24E-09 ***
X4 -2.41E-04 2.37E-05 -10.171 < 2e-16 ***
X5 1.24E-01 3.33E-03 37.37 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

AIC= 18730.33 , BIC= 20409.89


Average individual fixed effects= 1.6716
---------------------------------------------------------------









share|improve this question




















  • 1




    Exactly what kind of goodness-of-fit measures are you after? It's possible to extract residuals from bife objects and you may also estimate different specifications. So you are not so restricted after all.
    – Julius Vainora
    Nov 10 at 16:45












  • Julius Vainora: I prefer chi-square test.
    – Eric
    Nov 11 at 17:45
















4














I am using the 'bife' package to run the fixed effect logit model in R. However, I cannot compute any goodness-of-fit to measure the model's overall fit given the result I have below. I would appreciate if I can know how to measure the goodness-of-fit given this limited information. I prefer chi-square test but still cannot find a way to implement this either.



    ---------------------------------------------------------------                 
Fixed effects logit model
with analytical bias-correction

Estimated model:
Y ~ X1 +X2 + X3 + X4 + X5 | Z

Log-Likelihood= -9153.165
n= 20383, number of events= 5104
Demeaning converged after 6 iteration(s)
Offset converged after 3 iteration(s)

Corrected structural parameter(s):

Estimate Std. error t-value Pr(> t)
X1 -8.67E-02 2.80E-03 -31.001 < 2e-16 ***
X2 1.79E+00 8.49E-02 21.084 < 2e-16 ***
X3 -1.14E-01 1.91E-02 -5.982 2.24E-09 ***
X4 -2.41E-04 2.37E-05 -10.171 < 2e-16 ***
X5 1.24E-01 3.33E-03 37.37 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

AIC= 18730.33 , BIC= 20409.89


Average individual fixed effects= 1.6716
---------------------------------------------------------------









share|improve this question




















  • 1




    Exactly what kind of goodness-of-fit measures are you after? It's possible to extract residuals from bife objects and you may also estimate different specifications. So you are not so restricted after all.
    – Julius Vainora
    Nov 10 at 16:45












  • Julius Vainora: I prefer chi-square test.
    – Eric
    Nov 11 at 17:45














4












4








4







I am using the 'bife' package to run the fixed effect logit model in R. However, I cannot compute any goodness-of-fit to measure the model's overall fit given the result I have below. I would appreciate if I can know how to measure the goodness-of-fit given this limited information. I prefer chi-square test but still cannot find a way to implement this either.



    ---------------------------------------------------------------                 
Fixed effects logit model
with analytical bias-correction

Estimated model:
Y ~ X1 +X2 + X3 + X4 + X5 | Z

Log-Likelihood= -9153.165
n= 20383, number of events= 5104
Demeaning converged after 6 iteration(s)
Offset converged after 3 iteration(s)

Corrected structural parameter(s):

Estimate Std. error t-value Pr(> t)
X1 -8.67E-02 2.80E-03 -31.001 < 2e-16 ***
X2 1.79E+00 8.49E-02 21.084 < 2e-16 ***
X3 -1.14E-01 1.91E-02 -5.982 2.24E-09 ***
X4 -2.41E-04 2.37E-05 -10.171 < 2e-16 ***
X5 1.24E-01 3.33E-03 37.37 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

AIC= 18730.33 , BIC= 20409.89


Average individual fixed effects= 1.6716
---------------------------------------------------------------









share|improve this question















I am using the 'bife' package to run the fixed effect logit model in R. However, I cannot compute any goodness-of-fit to measure the model's overall fit given the result I have below. I would appreciate if I can know how to measure the goodness-of-fit given this limited information. I prefer chi-square test but still cannot find a way to implement this either.



    ---------------------------------------------------------------                 
Fixed effects logit model
with analytical bias-correction

Estimated model:
Y ~ X1 +X2 + X3 + X4 + X5 | Z

Log-Likelihood= -9153.165
n= 20383, number of events= 5104
Demeaning converged after 6 iteration(s)
Offset converged after 3 iteration(s)

Corrected structural parameter(s):

Estimate Std. error t-value Pr(> t)
X1 -8.67E-02 2.80E-03 -31.001 < 2e-16 ***
X2 1.79E+00 8.49E-02 21.084 < 2e-16 ***
X3 -1.14E-01 1.91E-02 -5.982 2.24E-09 ***
X4 -2.41E-04 2.37E-05 -10.171 < 2e-16 ***
X5 1.24E-01 3.33E-03 37.37 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

AIC= 18730.33 , BIC= 20409.89


Average individual fixed effects= 1.6716
---------------------------------------------------------------






r logistic-regression goodness-of-fit log-likelihood






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 11 at 17:45

























asked Nov 8 at 11:11









Eric

701416




701416








  • 1




    Exactly what kind of goodness-of-fit measures are you after? It's possible to extract residuals from bife objects and you may also estimate different specifications. So you are not so restricted after all.
    – Julius Vainora
    Nov 10 at 16:45












  • Julius Vainora: I prefer chi-square test.
    – Eric
    Nov 11 at 17:45














  • 1




    Exactly what kind of goodness-of-fit measures are you after? It's possible to extract residuals from bife objects and you may also estimate different specifications. So you are not so restricted after all.
    – Julius Vainora
    Nov 10 at 16:45












  • Julius Vainora: I prefer chi-square test.
    – Eric
    Nov 11 at 17:45








1




1




Exactly what kind of goodness-of-fit measures are you after? It's possible to extract residuals from bife objects and you may also estimate different specifications. So you are not so restricted after all.
– Julius Vainora
Nov 10 at 16:45






Exactly what kind of goodness-of-fit measures are you after? It's possible to extract residuals from bife objects and you may also estimate different specifications. So you are not so restricted after all.
– Julius Vainora
Nov 10 at 16:45














Julius Vainora: I prefer chi-square test.
– Eric
Nov 11 at 17:45




Julius Vainora: I prefer chi-square test.
– Eric
Nov 11 at 17:45












1 Answer
1






active

oldest

votes


















6





+50









Let the DGP be



n <- 1000
x <- rnorm(n)
id <- rep(1:2, each = n / 2)
y <- 1 * (rnorm(n) > 0)


so that we will be under the null hypothesis. As it says in ?bife, when there is no bias-correction, everything is the same as with glm, except for the speed. So let's start with glm.



modGLM <- glm(y ~ 1 + x + factor(id), family = binomial())
modGLM0 <- glm(y ~ 1, family = binomial())


One way to perform the LR test is with



library(lmtest)
lrtest(modGLM0, modGLM)
# Likelihood ratio test
#
# Model 1: y ~ 1
# Model 2: y ~ 1 + x + factor(id)
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 1 -692.70
# 2 3 -692.29 2 0.8063 0.6682


But we may also do it manually,



1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modGLM))),
modGLM0$df.residual - modGLM$df.residual)
# [1] 0.6682207


Now let's proceed with bife.



library(bife)
modBife <- bife(y ~ x | id)
modBife0 <- bife(y ~ 1 | id)


Here modBife is the full specification and modBife0 is only with fixed effects. For convenience, let



logLik.bife <- function(object, ...) object$logl_info$loglik


for loglikelihood extraction. Then we may compare modBife0 with modBife as in



1 - pchisq((-2 * logLik(modBife0)) - (-2 * logLik(modBife)), length(modBife$par$beta))
# [1] 1


while modGLM0 and modBife can be compared by running



1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modBife))), 
length(modBife$par$beta) + length(unique(id)) - 1)
# [1] 0.6682207


which gives the same result as before, even though with bife we, by default, have bias correction.



Lastly, as a bonus, we may simulate data and see it the test works as it's supposed to. 1000 iterations below show that both test (since two tests are the same) indeed reject as often as they are supposed to under the null.



enter image description here






share|improve this answer



















  • 1




    The vector modBife$par$beta contains all the beta coefficients (not fixed effects, no intercept). When testing modBife0 (full model) vs. modBife (only fixed effects), it is exactly those beta coefficients that we set to zero. So, if I understand your question correctly, length(modBife$par$beta) in the the regular output would correspond to the number of variables: 5 in your example (X1, ..., X5).
    – Julius Vainora
    Nov 11 at 21:24






  • 1




    I'll also jump ahead to explain length(modBife$par$beta) + length(unique(id)) - 1. Here we are testing the full model against only the intercept. Then the reason for length(modBife$par$beta) remains the same. Next, we set all the fixed effects to zero, and there are length(unique(id)) of them. But in the full model we also don't have the intercept. So, from length(unique(id)) non-beta coefficients we go to 1 (intercept), hence the length(unique(id)) - 1 degrees of freedom.
    – Julius Vainora
    Nov 11 at 21:31






  • 1




    In my example, for instance, we have a1*id1 + a2*id2 + b1*x in the full model (where a1 and a2 are fixed effects, id1 and id2 are individual dummy variables). Then the minimal model would be just intercept*1. So, the number of degrees of freedom = 2 = 1 (beta) + 2 (fixed effects) - 1 (intercept is coming back). In other words, while there are length(unique(id)) fixed effects, we lose one degree of freedom due to the restriction that all those dummy variables always sum to 1.
    – Julius Vainora
    Nov 11 at 21:42








  • 2




    @Eric, re 1st comment: length(unique(Z)) being 207 should make sense (I'll add test simulation results today or tomorrow in this case). Re 2nd comment: right, that's a value to report, and indeed chi square takes larger value with more degrees of freedom. My id is just like your Z: they are fixed effects (dummy variables for each individual or, in your case, each time period) with estimated values given at modBife$par$alpha. Re R^2: in logistic regression there no longer is a clear R^2; there are multiple proposals. One is McFadden's R^2, given by c(1- logLik(modBife) / logLik(modGLM0)).
    – Julius Vainora
    Nov 11 at 22:29






  • 1




    @Eric, coming back to length(unique(Z)), everything is fine, except you need to keep in mind the ratio n / length(unique(Z)). The larger it is the better. Otherwise the test may perform poorly.
    – Julius Vainora
    Nov 12 at 18:30











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53206564%2fgoodness-of-fit-for-fixed-effect-logit-model-using-bife-package%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









6





+50









Let the DGP be



n <- 1000
x <- rnorm(n)
id <- rep(1:2, each = n / 2)
y <- 1 * (rnorm(n) > 0)


so that we will be under the null hypothesis. As it says in ?bife, when there is no bias-correction, everything is the same as with glm, except for the speed. So let's start with glm.



modGLM <- glm(y ~ 1 + x + factor(id), family = binomial())
modGLM0 <- glm(y ~ 1, family = binomial())


One way to perform the LR test is with



library(lmtest)
lrtest(modGLM0, modGLM)
# Likelihood ratio test
#
# Model 1: y ~ 1
# Model 2: y ~ 1 + x + factor(id)
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 1 -692.70
# 2 3 -692.29 2 0.8063 0.6682


But we may also do it manually,



1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modGLM))),
modGLM0$df.residual - modGLM$df.residual)
# [1] 0.6682207


Now let's proceed with bife.



library(bife)
modBife <- bife(y ~ x | id)
modBife0 <- bife(y ~ 1 | id)


Here modBife is the full specification and modBife0 is only with fixed effects. For convenience, let



logLik.bife <- function(object, ...) object$logl_info$loglik


for loglikelihood extraction. Then we may compare modBife0 with modBife as in



1 - pchisq((-2 * logLik(modBife0)) - (-2 * logLik(modBife)), length(modBife$par$beta))
# [1] 1


while modGLM0 and modBife can be compared by running



1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modBife))), 
length(modBife$par$beta) + length(unique(id)) - 1)
# [1] 0.6682207


which gives the same result as before, even though with bife we, by default, have bias correction.



Lastly, as a bonus, we may simulate data and see it the test works as it's supposed to. 1000 iterations below show that both test (since two tests are the same) indeed reject as often as they are supposed to under the null.



enter image description here






share|improve this answer



















  • 1




    The vector modBife$par$beta contains all the beta coefficients (not fixed effects, no intercept). When testing modBife0 (full model) vs. modBife (only fixed effects), it is exactly those beta coefficients that we set to zero. So, if I understand your question correctly, length(modBife$par$beta) in the the regular output would correspond to the number of variables: 5 in your example (X1, ..., X5).
    – Julius Vainora
    Nov 11 at 21:24






  • 1




    I'll also jump ahead to explain length(modBife$par$beta) + length(unique(id)) - 1. Here we are testing the full model against only the intercept. Then the reason for length(modBife$par$beta) remains the same. Next, we set all the fixed effects to zero, and there are length(unique(id)) of them. But in the full model we also don't have the intercept. So, from length(unique(id)) non-beta coefficients we go to 1 (intercept), hence the length(unique(id)) - 1 degrees of freedom.
    – Julius Vainora
    Nov 11 at 21:31






  • 1




    In my example, for instance, we have a1*id1 + a2*id2 + b1*x in the full model (where a1 and a2 are fixed effects, id1 and id2 are individual dummy variables). Then the minimal model would be just intercept*1. So, the number of degrees of freedom = 2 = 1 (beta) + 2 (fixed effects) - 1 (intercept is coming back). In other words, while there are length(unique(id)) fixed effects, we lose one degree of freedom due to the restriction that all those dummy variables always sum to 1.
    – Julius Vainora
    Nov 11 at 21:42








  • 2




    @Eric, re 1st comment: length(unique(Z)) being 207 should make sense (I'll add test simulation results today or tomorrow in this case). Re 2nd comment: right, that's a value to report, and indeed chi square takes larger value with more degrees of freedom. My id is just like your Z: they are fixed effects (dummy variables for each individual or, in your case, each time period) with estimated values given at modBife$par$alpha. Re R^2: in logistic regression there no longer is a clear R^2; there are multiple proposals. One is McFadden's R^2, given by c(1- logLik(modBife) / logLik(modGLM0)).
    – Julius Vainora
    Nov 11 at 22:29






  • 1




    @Eric, coming back to length(unique(Z)), everything is fine, except you need to keep in mind the ratio n / length(unique(Z)). The larger it is the better. Otherwise the test may perform poorly.
    – Julius Vainora
    Nov 12 at 18:30
















6





+50









Let the DGP be



n <- 1000
x <- rnorm(n)
id <- rep(1:2, each = n / 2)
y <- 1 * (rnorm(n) > 0)


so that we will be under the null hypothesis. As it says in ?bife, when there is no bias-correction, everything is the same as with glm, except for the speed. So let's start with glm.



modGLM <- glm(y ~ 1 + x + factor(id), family = binomial())
modGLM0 <- glm(y ~ 1, family = binomial())


One way to perform the LR test is with



library(lmtest)
lrtest(modGLM0, modGLM)
# Likelihood ratio test
#
# Model 1: y ~ 1
# Model 2: y ~ 1 + x + factor(id)
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 1 -692.70
# 2 3 -692.29 2 0.8063 0.6682


But we may also do it manually,



1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modGLM))),
modGLM0$df.residual - modGLM$df.residual)
# [1] 0.6682207


Now let's proceed with bife.



library(bife)
modBife <- bife(y ~ x | id)
modBife0 <- bife(y ~ 1 | id)


Here modBife is the full specification and modBife0 is only with fixed effects. For convenience, let



logLik.bife <- function(object, ...) object$logl_info$loglik


for loglikelihood extraction. Then we may compare modBife0 with modBife as in



1 - pchisq((-2 * logLik(modBife0)) - (-2 * logLik(modBife)), length(modBife$par$beta))
# [1] 1


while modGLM0 and modBife can be compared by running



1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modBife))), 
length(modBife$par$beta) + length(unique(id)) - 1)
# [1] 0.6682207


which gives the same result as before, even though with bife we, by default, have bias correction.



Lastly, as a bonus, we may simulate data and see it the test works as it's supposed to. 1000 iterations below show that both test (since two tests are the same) indeed reject as often as they are supposed to under the null.



enter image description here






share|improve this answer



















  • 1




    The vector modBife$par$beta contains all the beta coefficients (not fixed effects, no intercept). When testing modBife0 (full model) vs. modBife (only fixed effects), it is exactly those beta coefficients that we set to zero. So, if I understand your question correctly, length(modBife$par$beta) in the the regular output would correspond to the number of variables: 5 in your example (X1, ..., X5).
    – Julius Vainora
    Nov 11 at 21:24






  • 1




    I'll also jump ahead to explain length(modBife$par$beta) + length(unique(id)) - 1. Here we are testing the full model against only the intercept. Then the reason for length(modBife$par$beta) remains the same. Next, we set all the fixed effects to zero, and there are length(unique(id)) of them. But in the full model we also don't have the intercept. So, from length(unique(id)) non-beta coefficients we go to 1 (intercept), hence the length(unique(id)) - 1 degrees of freedom.
    – Julius Vainora
    Nov 11 at 21:31






  • 1




    In my example, for instance, we have a1*id1 + a2*id2 + b1*x in the full model (where a1 and a2 are fixed effects, id1 and id2 are individual dummy variables). Then the minimal model would be just intercept*1. So, the number of degrees of freedom = 2 = 1 (beta) + 2 (fixed effects) - 1 (intercept is coming back). In other words, while there are length(unique(id)) fixed effects, we lose one degree of freedom due to the restriction that all those dummy variables always sum to 1.
    – Julius Vainora
    Nov 11 at 21:42








  • 2




    @Eric, re 1st comment: length(unique(Z)) being 207 should make sense (I'll add test simulation results today or tomorrow in this case). Re 2nd comment: right, that's a value to report, and indeed chi square takes larger value with more degrees of freedom. My id is just like your Z: they are fixed effects (dummy variables for each individual or, in your case, each time period) with estimated values given at modBife$par$alpha. Re R^2: in logistic regression there no longer is a clear R^2; there are multiple proposals. One is McFadden's R^2, given by c(1- logLik(modBife) / logLik(modGLM0)).
    – Julius Vainora
    Nov 11 at 22:29






  • 1




    @Eric, coming back to length(unique(Z)), everything is fine, except you need to keep in mind the ratio n / length(unique(Z)). The larger it is the better. Otherwise the test may perform poorly.
    – Julius Vainora
    Nov 12 at 18:30














6





+50







6





+50



6




+50




Let the DGP be



n <- 1000
x <- rnorm(n)
id <- rep(1:2, each = n / 2)
y <- 1 * (rnorm(n) > 0)


so that we will be under the null hypothesis. As it says in ?bife, when there is no bias-correction, everything is the same as with glm, except for the speed. So let's start with glm.



modGLM <- glm(y ~ 1 + x + factor(id), family = binomial())
modGLM0 <- glm(y ~ 1, family = binomial())


One way to perform the LR test is with



library(lmtest)
lrtest(modGLM0, modGLM)
# Likelihood ratio test
#
# Model 1: y ~ 1
# Model 2: y ~ 1 + x + factor(id)
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 1 -692.70
# 2 3 -692.29 2 0.8063 0.6682


But we may also do it manually,



1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modGLM))),
modGLM0$df.residual - modGLM$df.residual)
# [1] 0.6682207


Now let's proceed with bife.



library(bife)
modBife <- bife(y ~ x | id)
modBife0 <- bife(y ~ 1 | id)


Here modBife is the full specification and modBife0 is only with fixed effects. For convenience, let



logLik.bife <- function(object, ...) object$logl_info$loglik


for loglikelihood extraction. Then we may compare modBife0 with modBife as in



1 - pchisq((-2 * logLik(modBife0)) - (-2 * logLik(modBife)), length(modBife$par$beta))
# [1] 1


while modGLM0 and modBife can be compared by running



1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modBife))), 
length(modBife$par$beta) + length(unique(id)) - 1)
# [1] 0.6682207


which gives the same result as before, even though with bife we, by default, have bias correction.



Lastly, as a bonus, we may simulate data and see it the test works as it's supposed to. 1000 iterations below show that both test (since two tests are the same) indeed reject as often as they are supposed to under the null.



enter image description here






share|improve this answer














Let the DGP be



n <- 1000
x <- rnorm(n)
id <- rep(1:2, each = n / 2)
y <- 1 * (rnorm(n) > 0)


so that we will be under the null hypothesis. As it says in ?bife, when there is no bias-correction, everything is the same as with glm, except for the speed. So let's start with glm.



modGLM <- glm(y ~ 1 + x + factor(id), family = binomial())
modGLM0 <- glm(y ~ 1, family = binomial())


One way to perform the LR test is with



library(lmtest)
lrtest(modGLM0, modGLM)
# Likelihood ratio test
#
# Model 1: y ~ 1
# Model 2: y ~ 1 + x + factor(id)
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 1 -692.70
# 2 3 -692.29 2 0.8063 0.6682


But we may also do it manually,



1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modGLM))),
modGLM0$df.residual - modGLM$df.residual)
# [1] 0.6682207


Now let's proceed with bife.



library(bife)
modBife <- bife(y ~ x | id)
modBife0 <- bife(y ~ 1 | id)


Here modBife is the full specification and modBife0 is only with fixed effects. For convenience, let



logLik.bife <- function(object, ...) object$logl_info$loglik


for loglikelihood extraction. Then we may compare modBife0 with modBife as in



1 - pchisq((-2 * logLik(modBife0)) - (-2 * logLik(modBife)), length(modBife$par$beta))
# [1] 1


while modGLM0 and modBife can be compared by running



1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modBife))), 
length(modBife$par$beta) + length(unique(id)) - 1)
# [1] 0.6682207


which gives the same result as before, even though with bife we, by default, have bias correction.



Lastly, as a bonus, we may simulate data and see it the test works as it's supposed to. 1000 iterations below show that both test (since two tests are the same) indeed reject as often as they are supposed to under the null.



enter image description here







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 11 at 19:47

























answered Nov 11 at 18:53









Julius Vainora

31.8k75978




31.8k75978








  • 1




    The vector modBife$par$beta contains all the beta coefficients (not fixed effects, no intercept). When testing modBife0 (full model) vs. modBife (only fixed effects), it is exactly those beta coefficients that we set to zero. So, if I understand your question correctly, length(modBife$par$beta) in the the regular output would correspond to the number of variables: 5 in your example (X1, ..., X5).
    – Julius Vainora
    Nov 11 at 21:24






  • 1




    I'll also jump ahead to explain length(modBife$par$beta) + length(unique(id)) - 1. Here we are testing the full model against only the intercept. Then the reason for length(modBife$par$beta) remains the same. Next, we set all the fixed effects to zero, and there are length(unique(id)) of them. But in the full model we also don't have the intercept. So, from length(unique(id)) non-beta coefficients we go to 1 (intercept), hence the length(unique(id)) - 1 degrees of freedom.
    – Julius Vainora
    Nov 11 at 21:31






  • 1




    In my example, for instance, we have a1*id1 + a2*id2 + b1*x in the full model (where a1 and a2 are fixed effects, id1 and id2 are individual dummy variables). Then the minimal model would be just intercept*1. So, the number of degrees of freedom = 2 = 1 (beta) + 2 (fixed effects) - 1 (intercept is coming back). In other words, while there are length(unique(id)) fixed effects, we lose one degree of freedom due to the restriction that all those dummy variables always sum to 1.
    – Julius Vainora
    Nov 11 at 21:42








  • 2




    @Eric, re 1st comment: length(unique(Z)) being 207 should make sense (I'll add test simulation results today or tomorrow in this case). Re 2nd comment: right, that's a value to report, and indeed chi square takes larger value with more degrees of freedom. My id is just like your Z: they are fixed effects (dummy variables for each individual or, in your case, each time period) with estimated values given at modBife$par$alpha. Re R^2: in logistic regression there no longer is a clear R^2; there are multiple proposals. One is McFadden's R^2, given by c(1- logLik(modBife) / logLik(modGLM0)).
    – Julius Vainora
    Nov 11 at 22:29






  • 1




    @Eric, coming back to length(unique(Z)), everything is fine, except you need to keep in mind the ratio n / length(unique(Z)). The larger it is the better. Otherwise the test may perform poorly.
    – Julius Vainora
    Nov 12 at 18:30














  • 1




    The vector modBife$par$beta contains all the beta coefficients (not fixed effects, no intercept). When testing modBife0 (full model) vs. modBife (only fixed effects), it is exactly those beta coefficients that we set to zero. So, if I understand your question correctly, length(modBife$par$beta) in the the regular output would correspond to the number of variables: 5 in your example (X1, ..., X5).
    – Julius Vainora
    Nov 11 at 21:24






  • 1




    I'll also jump ahead to explain length(modBife$par$beta) + length(unique(id)) - 1. Here we are testing the full model against only the intercept. Then the reason for length(modBife$par$beta) remains the same. Next, we set all the fixed effects to zero, and there are length(unique(id)) of them. But in the full model we also don't have the intercept. So, from length(unique(id)) non-beta coefficients we go to 1 (intercept), hence the length(unique(id)) - 1 degrees of freedom.
    – Julius Vainora
    Nov 11 at 21:31






  • 1




    In my example, for instance, we have a1*id1 + a2*id2 + b1*x in the full model (where a1 and a2 are fixed effects, id1 and id2 are individual dummy variables). Then the minimal model would be just intercept*1. So, the number of degrees of freedom = 2 = 1 (beta) + 2 (fixed effects) - 1 (intercept is coming back). In other words, while there are length(unique(id)) fixed effects, we lose one degree of freedom due to the restriction that all those dummy variables always sum to 1.
    – Julius Vainora
    Nov 11 at 21:42








  • 2




    @Eric, re 1st comment: length(unique(Z)) being 207 should make sense (I'll add test simulation results today or tomorrow in this case). Re 2nd comment: right, that's a value to report, and indeed chi square takes larger value with more degrees of freedom. My id is just like your Z: they are fixed effects (dummy variables for each individual or, in your case, each time period) with estimated values given at modBife$par$alpha. Re R^2: in logistic regression there no longer is a clear R^2; there are multiple proposals. One is McFadden's R^2, given by c(1- logLik(modBife) / logLik(modGLM0)).
    – Julius Vainora
    Nov 11 at 22:29






  • 1




    @Eric, coming back to length(unique(Z)), everything is fine, except you need to keep in mind the ratio n / length(unique(Z)). The larger it is the better. Otherwise the test may perform poorly.
    – Julius Vainora
    Nov 12 at 18:30








1




1




The vector modBife$par$beta contains all the beta coefficients (not fixed effects, no intercept). When testing modBife0 (full model) vs. modBife (only fixed effects), it is exactly those beta coefficients that we set to zero. So, if I understand your question correctly, length(modBife$par$beta) in the the regular output would correspond to the number of variables: 5 in your example (X1, ..., X5).
– Julius Vainora
Nov 11 at 21:24




The vector modBife$par$beta contains all the beta coefficients (not fixed effects, no intercept). When testing modBife0 (full model) vs. modBife (only fixed effects), it is exactly those beta coefficients that we set to zero. So, if I understand your question correctly, length(modBife$par$beta) in the the regular output would correspond to the number of variables: 5 in your example (X1, ..., X5).
– Julius Vainora
Nov 11 at 21:24




1




1




I'll also jump ahead to explain length(modBife$par$beta) + length(unique(id)) - 1. Here we are testing the full model against only the intercept. Then the reason for length(modBife$par$beta) remains the same. Next, we set all the fixed effects to zero, and there are length(unique(id)) of them. But in the full model we also don't have the intercept. So, from length(unique(id)) non-beta coefficients we go to 1 (intercept), hence the length(unique(id)) - 1 degrees of freedom.
– Julius Vainora
Nov 11 at 21:31




I'll also jump ahead to explain length(modBife$par$beta) + length(unique(id)) - 1. Here we are testing the full model against only the intercept. Then the reason for length(modBife$par$beta) remains the same. Next, we set all the fixed effects to zero, and there are length(unique(id)) of them. But in the full model we also don't have the intercept. So, from length(unique(id)) non-beta coefficients we go to 1 (intercept), hence the length(unique(id)) - 1 degrees of freedom.
– Julius Vainora
Nov 11 at 21:31




1




1




In my example, for instance, we have a1*id1 + a2*id2 + b1*x in the full model (where a1 and a2 are fixed effects, id1 and id2 are individual dummy variables). Then the minimal model would be just intercept*1. So, the number of degrees of freedom = 2 = 1 (beta) + 2 (fixed effects) - 1 (intercept is coming back). In other words, while there are length(unique(id)) fixed effects, we lose one degree of freedom due to the restriction that all those dummy variables always sum to 1.
– Julius Vainora
Nov 11 at 21:42






In my example, for instance, we have a1*id1 + a2*id2 + b1*x in the full model (where a1 and a2 are fixed effects, id1 and id2 are individual dummy variables). Then the minimal model would be just intercept*1. So, the number of degrees of freedom = 2 = 1 (beta) + 2 (fixed effects) - 1 (intercept is coming back). In other words, while there are length(unique(id)) fixed effects, we lose one degree of freedom due to the restriction that all those dummy variables always sum to 1.
– Julius Vainora
Nov 11 at 21:42






2




2




@Eric, re 1st comment: length(unique(Z)) being 207 should make sense (I'll add test simulation results today or tomorrow in this case). Re 2nd comment: right, that's a value to report, and indeed chi square takes larger value with more degrees of freedom. My id is just like your Z: they are fixed effects (dummy variables for each individual or, in your case, each time period) with estimated values given at modBife$par$alpha. Re R^2: in logistic regression there no longer is a clear R^2; there are multiple proposals. One is McFadden's R^2, given by c(1- logLik(modBife) / logLik(modGLM0)).
– Julius Vainora
Nov 11 at 22:29




@Eric, re 1st comment: length(unique(Z)) being 207 should make sense (I'll add test simulation results today or tomorrow in this case). Re 2nd comment: right, that's a value to report, and indeed chi square takes larger value with more degrees of freedom. My id is just like your Z: they are fixed effects (dummy variables for each individual or, in your case, each time period) with estimated values given at modBife$par$alpha. Re R^2: in logistic regression there no longer is a clear R^2; there are multiple proposals. One is McFadden's R^2, given by c(1- logLik(modBife) / logLik(modGLM0)).
– Julius Vainora
Nov 11 at 22:29




1




1




@Eric, coming back to length(unique(Z)), everything is fine, except you need to keep in mind the ratio n / length(unique(Z)). The larger it is the better. Otherwise the test may perform poorly.
– Julius Vainora
Nov 12 at 18:30




@Eric, coming back to length(unique(Z)), everything is fine, except you need to keep in mind the ratio n / length(unique(Z)). The larger it is the better. Otherwise the test may perform poorly.
– Julius Vainora
Nov 12 at 18:30


















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53206564%2fgoodness-of-fit-for-fixed-effect-logit-model-using-bife-package%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Full-time equivalent

Bicuculline

さくらももこ