Use commands rather than point-and-click. For example, type set scheme s1mono, permanently and press Enter to make s1mono your default graph scheme in future Stata sessions.
In accounting archival research, we often take it for granted that we must do something about potential outliers before we run a regression. The commonly used methods are truncation, winsorization, studentized residuals, and Cook's distance. In this post I discuss which Stata commands to use to implement these four methods.
First of all, why and how we should deal with potential outliers is perhaps one of the messiest issues accounting researchers encounter, because no one has ever given a definitive and satisfactory answer. In my opinion, only outliers resulting from apparent data errors should be deleted from the sample. That said, this post is not going to answer that messy question. Instead, its purpose is to summarize the Stata commands for the commonly used methods of dealing with outliers (even if we are not sure whether these methods are appropriate; we all know that is true in accounting research!).
In my opinion, the best Stata commands for truncating and winsorizing are truncateJ and winsorizeJ, written by Judson Caskey. I will not take the time to explain why, but I highly recommend his work. To install these two user-written commands, you can type:
After the installation, you can type help truncateJ or help winsorizeJ to learn how to use these two commands.
For studentized residuals, suppose the dependent variable is y and the independent variables are x1 and x2. The first step is to run the regression in Stata without specifying any vce parameter (i.e., without robust or clustered standard errors). Next, predict the studentized residuals (the rstudent option of predict) and store them in a variable, here called rstu. If the absolute value of rstu exceeds a certain critical value, the data point is considered an outlier and is deleted from the final sample.
Stata's manual indicates that "studentized residuals can be interpreted as the t statistic for testing the significance of a dummy variable equal to 1 in the observation in question and 0 elsewhere. Such a dummy variable would effectively absorb the observation and so remove its influence in determining the other coefficients in the model." To be honest, I do not fully understand this explanation, but since rstu is a t statistic, the critical value for a conventional significance level should apply, for example 1.96 (or 2) for the 5% significance level. That is why in the literature we often see data points with absolute studentized residuals greater than 2 deleted. Some papers use a critical value of 3, which corresponds to a 0.27% significance level and seems to me not very reasonable.
Now use the following command to drop "outliers" based on the critical value of 2:
The last step is to re-run the regression, but this time we can add appropriate vce parameters to address additional issues such as heteroskedasticity:
The Cook's distance method is similar to the studentized residuals method. We predict an influence statistic, namely Cook's distance, and then delete any data points with Cook's distance greater than 4/N, where N is the number of observations (Cook's distance is never negative, so only this upper cutoff matters). Next, re-run the regression with appropriate vce parameters:
Lastly, I thank the authors of the following articles, which I benefited from:
A more formal and complete econometrics book is Belsley, D. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity.
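The difference between truncation and winsorization can be made concrete with a minimal Python sketch. This is not Stata, and it is not how truncateJ or winsorizeJ are implemented. Truncation drops observations beyond the percentile cutoffs, so the sample shrinks. Winsorization replaces those observations with the cutoff values, so the sample size is unchanged. The function names (percentile, truncate, winsorize), the linear-interpolation percentile rule, and the 1st/99th percentile default cutoffs are my own assumptions for illustration.

```python
# Minimal sketch of truncation vs. winsorization at percentile cutoffs.
# Assumptions (hypothetical, for illustration only): linear-interpolation
# percentiles and 1st/99th percentile defaults. truncateJ/winsorizeJ may
# define their cutoffs differently.


def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 100) by linear interpolation."""
    s = sorted(values)
    if not s:
        raise ValueError("percentile of empty data")
    k = (len(s) - 1) * p / 100.0      # fractional rank of the target value
    lo = int(k)                       # index just below the target rank
    hi = min(lo + 1, len(s) - 1)      # index just above (clamped at the end)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def truncate(values, lower_pct=1, upper_pct=99):
    """Drop observations outside the percentile cutoffs (sample shrinks)."""
    lo, hi = percentile(values, lower_pct), percentile(values, upper_pct)
    return [v for v in values if lo <= v <= hi]


def winsorize(values, lower_pct=1, upper_pct=99):
    """Clamp observations to the percentile cutoffs (sample size unchanged)."""
    lo, hi = percentile(values, lower_pct), percentile(values, upper_pct)
    return [float(min(max(v, lo), hi)) for v in values]


if __name__ == "__main__":
    x = list(range(1, 11)) + [1000]   # 1000 is an obvious extreme value
    print(truncate(x, 10, 90))        # [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(winsorize(x, 10, 90))       # [2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 10.0]
```

With 10th/90th percentile cutoffs, the value 1000 is removed by truncate but pulled down to 10 by winsorize. Packages differ slightly in how they compute percentiles, so cutoffs from this sketch may not match Stata's exactly.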
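The studentized-residual and Cook's-distance screens can also be illustrated outside Stata. The Python sketch below fits OLS of y on a constant, x1, and x2. It then computes the standard externally studentized residual t_i = e_i / (s_(i) * sqrt(1 - h_i)) and Cook's distance D_i = e_i^2 * h_i / (p * s^2 * (1 - h_i)^2), where h_i is the leverage, s_(i) is the residual standard error with observation i left out, and p is the number of coefficients including the constant. These are the textbook definitions behind Stata's rstudent and cooksd options of predict. Treat the sketch as an illustration rather than a substitute for Stata's own output. It applies the |rstu| > 2 and 4/N rules from the text. The helper names (invert, ols_diagnostics) and the data are hypothetical.

```python
# Minimal sketch (pure Python, not Stata) of externally studentized residuals
# and Cook's distance after an OLS regression with an intercept.
# Assumptions: unweighted OLS; helper names and data are hypothetical.
import math


def invert(a):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfectly collinear regressors?)")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_diagnostics(y, regressors):
    """Regress y on a constant plus `regressors` (a list of columns).

    Returns (beta, rstudent, cooksd), with one diagnostic per observation.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    p = len(X[0])                                   # coefficients incl. constant
    xtx_inv = invert([[sum(X[i][a] * X[i][b] for i in range(n))
                       for b in range(p)] for a in range(p)])
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(p)) for i in range(n)]
    # Leverage h_i: diagonal of the hat matrix X (X'X)^-1 X'.
    hat = [sum(X[i][a] * xtx_inv[a][b] * X[i][b]
               for a in range(p) for b in range(p)) for i in range(n)]
    sse = sum(e * e for e in resid)
    s2 = sse / (n - p)
    rstudent, cooksd = [], []
    for e, h in zip(resid, hat):
        # Residual variance when observation i is left out of the regression.
        s2_i = (sse - e * e / (1 - h)) / (n - p - 1)
        rstudent.append(e / math.sqrt(s2_i * (1 - h)))
        cooksd.append(e * e * h / (p * s2 * (1 - h) ** 2))
    return beta, rstudent, cooksd


if __name__ == "__main__":
    # Hypothetical data: y = 1 + 2*x1 + 0.5*x2 + small noise, plus one outlier.
    x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    x2 = [1, -1, -1, 1, 1, -1, -1, 1, 0, 0]
    noise = [0.05, -0.1, 0.08, -0.02, 0.03, -0.07, 0.1, -0.04, 0.06, -0.05]
    y = [1 + 2 * a + 0.5 * b + u for a, b, u in zip(x1, x2, noise)]
    y[9] += 20                                      # planted outlier

    beta, rstu, cooks = ols_diagnostics(y, [x1, x2])
    n_obs = len(y)
    drop_rstu = [i for i, t in enumerate(rstu) if abs(t) > 2]      # |rstu| > 2
    drop_cook = [i for i, d in enumerate(cooks) if d > 4 / n_obs]  # D > 4/N
    print("drop by studentized residual:", drop_rstu)
    print("drop by Cook's distance:", drop_cook)
```

In this made-up sample, both rules should flag the planted outlier (the last observation). The diagnostics come from the plain OLS fit, consistent with the advice to run the first regression without a vce parameter. The vce choice changes only the standard errors, not the coefficients, residuals, or leverage.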
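The critical values quoted for studentized residuals (about 1.96 for a two-sided 5% test, and about 0.27% for a cutoff of 3) can be checked with Python's standard library. This minimal sketch uses the normal approximation. Strictly, a studentized residual follows a t distribution with N - p - 1 degrees of freedom, so in small samples the exact critical values are somewhat larger.

```python
# Two-sided critical values and tail probabilities under a standard normal
# approximation (an assumption: the exact reference distribution is t).
from statistics import NormalDist

z = NormalDist()                          # standard normal N(0, 1)
crit_5pct = z.inv_cdf(1 - 0.05 / 2)       # two-sided 5% critical value
tail_at_2 = 2 * (1 - z.cdf(2))            # P(|Z| > 2)
tail_at_3 = 2 * (1 - z.cdf(3))            # P(|Z| > 3)

print(f"5% two-sided critical value: {crit_5pct:.2f}")   # 1.96
print(f"P(|Z| > 2) = {tail_at_2:.4f}")                    # 0.0455
print(f"P(|Z| > 3) = {tail_at_3:.4f}")                    # 0.0027
```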