File: Open Data File: house.txt
Basic: Bivariate: Price as "Y", Assess and Months as "X"
    Hot Spot - Fit Line
    Note: Assessed price is significant, but months on market is not
    R-squares are .925 and .064
Model: Fit Model: Price as "Y", add Assess and Months in "Construct Model Effects"
    Fitted regression equation is -44.99 + 1.75*Assess + 0.37*Months
    R-square is .943, adjusted R-square is .938
    Overall model is significant (F-test, under "Analysis of Variance")
    Both Assess and Months are significant (individual t-tests)
    Residual plot? Look for trends; don't make decisions based on possible outliers
    Predicted sale price for an assessed value of 80 on the market for 6 months:
        -44.99 + 1.75*80 + 0.37*6 = 97.23

File: Open Data File: speed.txt
Basic: Bivariate: MPG as "Y", MPH as "X"
    Hot Spot - Fit Line
        Terrible fit, low R-square, not statistically significant
    Hot Spot - Plot Residuals
        Clearly non-linear
Model: Fit Model: MPG as "Y", Construct Model Effects: MPH, "Macro", "Polynomial to Degree" (default degree is 2, a quadratic)
    Fit is clearly better
    R-square is .919, adjusted R-square is .912
    F-test of overall model shows significance
    Both terms are significant by t-tests
    Residual plot?
    Try "Polynomial to Degree" with Degree=4
        Now R-square is .977, adjusted R-square is .973
        Overall model is significant
        Leading term is significant (keep lower-order terms for polynomials)

The next data set consists of recordings of levels of a photochemical pollutant (some kind of oxidant) made hourly at several recording stations in LA by the LA Pollution Control District. Interest lies in the relationship of pollutant levels to one or more meteorological variables. (This is a small sample from a very large data set.) The data recorded are daily maximum levels of the oxidant over a series of days, together with morning averages of wind speed, temperature, humidity and insolation (a measure of the amount of sunlight), during one summer in the early 1990s.
File: Open Data File: oxide.txt
Multivariate: Multivariate: all as "Y"
    Makes a scatterplot matrix and computes correlations
    Which variables look most helpful in predicting the oxidant level?
    Are any of the "independent" variables highly correlated?
    Need to do model building
Model: Fit Model: Oxidant.Level as "Y", others as effects
    R-square is .798, adjusted R-square is .756
    Overall model is significant, but not all slopes are significant
    Day is least significant, so try again without it
        R-square is .798, adjusted R-square is .756
        Humidity and insolation still not significant (although humidity's p-value is much smaller, since it was correlated with day)
    Remove insolation (largest p-value) and try again
        R-square is .796, adjusted R-square is .773 (higher)
        Humidity still not significant, so now remove it and try again
    R-square is .777, adjusted R-square is .761
        Overall model is significant
        All effects are significant
        Adjusted R-square is slightly lower, but it is better to have all effects significant
        Residual plot looks fine
    Regression equation: oxidant-hat = -5.2 - 0.43*wind + 0.52*temp
    Predicted oxidant level for wind speed of 50 mph and temp of 80 degrees:
        -5.2 - .43*50 + .52*80 = 14.9
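
The two hand predictions in these notes can be checked with a few lines of Python. This is only a sketch outside of JMP: the function names predict_price and predict_oxidant are made up for illustration, and the coefficients are simply the rounded values quoted in the fitted equations above, so the results match the notes only to that rounding.

```python
# Coefficients are the rounded values quoted in the notes, not refit from the data.

def predict_price(assess, months):
    """Fitted house model: -44.99 + 1.75*Assess + 0.37*Months."""
    return -44.99 + 1.75 * assess + 0.37 * months


def predict_oxidant(wind, temp):
    """Fitted oxidant model: -5.2 - 0.43*wind + 0.52*temp."""
    return -5.2 - 0.43 * wind + 0.52 * temp


if __name__ == "__main__":
    # Assessed value of 80, on the market for 6 months
    print(round(predict_price(80, 6), 2))
    # Wind speed of 50, temperature of 80
    print(round(predict_oxidant(50, 80), 1))
```

Plugging in other values is a quick way to see how each slope moves the prediction, for example that each additional unit of wind speed lowers the predicted oxidant level by 0.43.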