This is the best regression result I have obtained so far. For those not familiar with what I'm working on, I'm interested in the relationship between firm-specific human capital and employment outcomes following job termination. Basically, I would expect individuals with large amounts of job training to spend more time unemployed following termination.
wksVOTECH is a proxy for firm specialization: the number of weeks the employee spent in any training or vocational programs for the job. Tenure is the number of weeks spent at the job, hisp is a dummy for Hispanic, black is a dummy for black, and male is a dummy controlling for gender. numWKSunemp is the total number of weeks spent in unemployment following termination from a primary job, and is treated as the response variable in this model.
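Written out, the model described above would look roughly like the following. This is a sketch that assumes a plain linear (OLS) specification with every listed variable entering once; the coefficient names and the error term are my notation, not something taken from the output.

```latex
\[
\text{numWKSunemp}_i = \beta_0 + \beta_1\,\text{wksVOTECH}_i + \beta_2\,\text{Tenure}_i
  + \beta_3\,\text{hisp}_i + \beta_4\,\text{black}_i + \beta_5\,\text{male}_i + \varepsilon_i
\]
```

Under this reading, the hypothesis that more job training goes with more time in unemployment corresponds to $\beta_1 > 0$.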
At first glance it would appear that wksVOTECH is positively correlated with numWKSunemp. However, two problems exist. First, the R-squared is exceptionally low, below 0.01, so this model is not a good fit of the data at all. Why is this the case? Unfortunately, the problem is the sample. I am working from the entire NLSY79 cohort, so non-responses and other response errors are affecting my results. Several data points therefore do not belong in the regression, and my attempts at filtering them out simply did not work. In short, these results are meaningless.
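For anyone curious what the filtering step might look like, here is a minimal sketch in Python. It assumes the non-responses show up as the negative codes NLSY79 conventionally uses for missing data (-1 refusal, -2 don't know, -3 invalid skip, -4 valid skip, -5 non-interview); that should be checked against the codebook for each variable. The function names and the sample rows are made up for illustration.

```python
# Negative codes assumed to mark missing data (check the NLSY79 codebook).
MISSING_CODES = {-1, -2, -3, -4, -5}

# Variables used in the regression, as named in this post.
MODEL_VARS = ["numWKSunemp", "wksVOTECH", "Tenure", "hisp", "black", "male"]


def is_complete(row, variables=MODEL_VARS):
    """Return True if every model variable is present and not a missing code."""
    for name in variables:
        value = row.get(name)
        if value is None or value in MISSING_CODES:
            return False
    return True


def drop_incomplete(rows, variables=MODEL_VARS):
    """Keep only the rows that are usable for the regression."""
    return [row for row in rows if is_complete(row, variables)]


# Made-up rows, purely for illustration.
sample = [
    {"numWKSunemp": 12, "wksVOTECH": 4, "Tenure": 150, "hisp": 0, "black": 1, "male": 1},
    {"numWKSunemp": -4, "wksVOTECH": 2, "Tenure": 80, "hisp": 0, "black": 0, "male": 1},
    {"numWKSunemp": 0, "wksVOTECH": 0, "Tenure": 52, "hisp": 1, "black": 0, "male": 0},
    {"numWKSunemp": 30, "wksVOTECH": -5, "Tenure": 200, "hisp": 0, "black": 0, "male": 0},
]

clean = drop_incomplete(sample)
print(len(sample), "rows before,", len(clean), "rows after")
```

Note that a genuine zero (no weeks of training, no weeks unemployed) is kept; only the negative codes are dropped.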
Going forward, I would like to (correctly) filter out the unnecessary data that is unintentionally affecting my results, and to create new variables that combine existing ones (not by adding them together, but by merging one variable with another to form a single variable with a larger number of observations). After this, I believe I can add the variables I have not yet included: an industry breakdown and educational attainment.
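One way to read "placing one variable and another together" is to take, for each person, the first valid value found across several source variables. The sketch below does that in Python; the column names wksVOTECH_a and wksVOTECH_b are hypothetical stand-ins for two partially filled source variables, and the negative missing-data codes are the same assumption as before.

```python
# Negative codes assumed to mark missing data (check the NLSY79 codebook).
MISSING_CODES = {-1, -2, -3, -4, -5}


def coalesce(row, sources):
    """Return the first valid value among the source variables, else None."""
    for name in sources:
        value = row.get(name)
        if value is not None and value not in MISSING_CODES:
            return value
    return None


def combine_columns(rows, sources):
    """Build one combined variable from several partially filled ones."""
    return [coalesce(row, sources) for row in rows]


# Made-up rows; wksVOTECH_a and wksVOTECH_b are hypothetical column names.
rows = [
    {"wksVOTECH_a": 6, "wksVOTECH_b": -4},
    {"wksVOTECH_a": -4, "wksVOTECH_b": 10},
    {"wksVOTECH_a": -5, "wksVOTECH_b": -5},
]

combined = combine_columns(rows, ["wksVOTECH_a", "wksVOTECH_b"])
print(combined)
```

The combined variable has a valid value wherever at least one source does, which is where the larger number of observations comes from. Rows that are missing in every source stay missing.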
Although you did not achieve the results you were expecting, it's good to know that you recognize the problems with your regression and are attempting to correct them. I'm looking forward to seeing the results of your future regressions once the remaining variables are included.