Spring, 2005 (Roos, Soc. 502)
Assignment 2: Regression and Correlation (due Wednesday, February 16th)
This week's exercise is designed to make the computer work for you in carrying out your own analyses. This is your chance to begin the data analyses for your final project.
Let's start simply. Choose several variables from your data (a dependent variable and two or three interval-level independent variables). On later assignments you can get more complicated by adding more independent variables and categorical variables as necessary, but for this assignment keep it simple and concentrate on interpreting your results.
Propose a theory relating your variables that can be estimated using multiple regression, recode or transform the variables as necessary, code any missing values, and estimate the equation using your statistical program's regression procedure.
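If it helps to see what the regression procedure is doing behind the scenes, here is a minimal sketch in Python (standard library only) of ordinary least squares with an intercept. It returns the unstandardized coefficients (b), the standardized coefficients (beta = b times sd(x) over sd(y)), and R-squared. The function names `solve` and `ols`, the variable names, and the toy numbers are all made up for illustration; they are not from any real dataset, and your statistical package will do this (plus standard errors and tests) for you.

```python
from statistics import mean, stdev


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column (numerical stability).
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, xs):
    """Regress y on the predictors in xs (a list of columns).

    Returns (intercept, slopes, betas, r_squared).
    """
    n = len(y)
    design = [[1.0] + [x[i] for x in xs] for i in range(n)]  # leading 1 = intercept
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in design]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    # Standardized coefficient: b rescaled into standard-deviation units.
    betas = [b * stdev(x) / stdev(y) for b, x in zip(coefs[1:], xs)]
    return coefs[0], coefs[1:], betas, 1 - ss_res / ss_tot


# Hypothetical toy data (invented for illustration only).
educ = [10, 12, 12, 14, 16, 16, 18, 20]      # years of schooling
exper = [20, 5, 15, 10, 2, 12, 8, 4]         # years of work experience
earn = [28, 25, 33, 36, 34, 45, 47, 50]      # earnings in thousands

intercept, slopes, betas, r2 = ols(earn, [educ, exper])
print("intercept:", round(intercept, 3))
print("unstandardized b:", [round(b, 3) for b in slopes])
print("standardized beta:", [round(b, 3) for b in betas])
print("R-squared:", round(r2, 3))
```

The unstandardized b is in the units of the variables (thousands of dollars per year of schooling, in this made-up example), while beta lets you compare predictors measured on different scales.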
Also, generate a correlation matrix among all of your variables (including the dependent variable). Use listwise deletion of missing values for both your regression and correlation runs (in SAS, this is the default for PROC REG, but you'll need the NOMISS option for PROC CORR). Ask for the SIMPLE option in PROC REG to get descriptive statistics. If you're using SPSS, Stata, or GSS, find the equivalents.
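To make "listwise deletion" concrete, here is a small sketch in Python (standard library only). A case is dropped if it is missing on any variable in the analysis, so the means, standard deviations, and correlations are all computed on the same set of cases. The helper names (`listwise`, `pearson`, `describe_and_correlate`), the use of `None` for a missing value, and the toy cases are assumptions made up for this illustration.

```python
from math import sqrt
from statistics import mean, stdev


def listwise(rows):
    """Keep only cases with no missing value (None) on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def describe_and_correlate(rows, names):
    """Return (n, {name: (mean, sd)}, correlation matrix) after listwise deletion."""
    complete = listwise(rows)
    cols = {n: [r[n] for r in complete] for n in names}
    stats = {n: (mean(cols[n]), stdev(cols[n])) for n in names}
    corr = {a: {b: pearson(cols[a], cols[b]) for b in names} for a in names}
    return len(complete), stats, corr


# Hypothetical toy cases (invented); None marks a missing value.
cases = [
    {"earn": 28, "educ": 10, "exper": 20},
    {"earn": 25, "educ": 12, "exper": None},   # dropped: missing exper
    {"earn": 33, "educ": 12, "exper": 15},
    {"earn": 36, "educ": 14, "exper": 10},
    {"earn": None, "educ": 16, "exper": 2},    # dropped: missing earn
    {"earn": 45, "educ": 16, "exper": 12},
    {"earn": 47, "educ": 18, "exper": 8},
    {"earn": 50, "educ": 20, "exper": 4},
]

names = ["earn", "educ", "exper"]
n, stats, corr = describe_and_correlate(cases, names)
print("N after listwise deletion:", n)
for name in names:
    m, sd = stats[name]
    print(name, "mean =", round(m, 2), "sd =", round(sd, 2))
for a in names:
    print(a, [round(corr[a][b], 3) for b in names])
```

Check that the N in your correlation run matches the N in your regression run; if it doesn't, the two were not computed on the same cases.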
Write up your results in four to five double-spaced pages (five at most). In the discussion of your results, address the following (not point by point, but as if you were writing a journal article):
(1) describe your theory, and what you expect to find in your analyses;
(2) describe how your variables are measured, how you handled missing values, and how you transformed the variables for the analysis. For example, if you have dichotomous variables, transform them to a 0,1 format (e.g., if the values for sex are "1" for male and "2" for female, recode 2 to 0 for easier interpretation). In this example, the mean of the dichotomous variable sex is the proportion of the sample that is male (since male is the value coded "1"). As you decide which value to code "0" and which to code "1", think ahead to your analysis. I always code "1"=male because I'm interested in issues like sex differences in occupational and earnings attainment. By coding "1"=male, I ensure that I'll get positive associations between sex and earnings (since men, on average, earn more than women). Positive correlations are generally easier to interpret than negative ones.
(3) discuss the means, standard deviations, and correlations, and the standardized and unstandardized regression coefficients (don't talk about every number; be selective; think substantively);
(4) report the amount of variance in the dependent variable "explained" by all the independent variables;
(5) include at least two tables: (a) the means, standard deviations, and correlations; and (b) the regression results. Feel free to do any additional crosstabulations to further explore your data. You might also want to include a table that summarizes your coding (see Table 1 in Clarke and Estes for an example).
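As an illustration of the recode described in item (2), here is a short Python sketch (standard library only) that turns a sex variable coded 1=male, 2=female into a 0,1 dummy and shows that its mean is the proportion male among the valid cases. The function name `recode_sex`, the choice to treat any other code as missing, and the toy values are assumptions made up for this example; your own codebook determines the actual codes.

```python
from statistics import mean


def recode_sex(values):
    """Recode 1=male, 2=female into 1=male, 0=female.

    Any other code (including None) becomes missing (None).
    """
    mapping = {1: 1, 2: 0}
    return [mapping.get(v) for v in values]


# Hypothetical raw codes (invented); None marks a missing value.
sex_raw = [1, 2, 2, 1, 2, None, 1, 2]

male = recode_sex(sex_raw)
valid = [v for v in male if v is not None]   # drop missing cases before averaging
prop_male = mean(valid)

print("recoded:", male)
print("valid N:", len(valid))
print("proportion male:", round(prop_male, 3))
```

Because the dummy takes only the values 0 and 1, its mean is simply the number of cases coded 1 divided by the number of valid cases.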
Turn in a copy of your final log and output with your write-up. Have fun!