When I retired from teaching, I lost my access to SPSS and could no longer play around with data sets. With time on my hands, I decided it was a good moment to get a new statistical package, Stata 17, and get back into the game (thanks to Westmont sociologist Blake Victor Kent for the endorsement!). I’ve always found that the best way to learn a new statistical package is just to dive in. Thankfully, the Association of Religion Data Archives has a treasure trove of data sets available for further exploration.

A recent addition to the ARDA archive was a 2017 survey PRRI conducted with support from MTV. The survey focused on youth in their teens and early 20s. It asked interesting questions about discrimination and support for same-sex marriage. After some initial exploration of the data, I focused on white self-identified evangelicals. This subgroup is almost evenly divided between young Millennials and older GenZ. There is a measurable difference between these two groups of young people when it comes to support for same-sex marriage (SSM): while 31.5% of the Millennials support SSM, the figure for GenZ rises to 52.4%.

I began by testing a version of the standard Contact Hypothesis as it relates to intergroup relations. In the 1950s, Gordon Allport argued that bias can be combatted through equal-status contact between differing groups. The PRRI data set had questions about the nature of respondents' friendships (including having gay friends or family members), and it was possible to compare those with attitudes toward same-sex marriage. The initial results of this exploration are below.

Those with gay friends are more than twice as likely to support same-sex marriage as those who have no gay friends or family. Interestingly, only three in ten of those with gay family members support SSM. Granted, the n is very small and the question doesn’t distinguish between immediate and extended family.

After exploring a number of factors in a similar two-dimensional fashion, it became clear that I needed to use a multivariate approach given the ways the various factors might interact. To do so, I had to teach myself how to do logistic regression on a dichotomous variable. The results of a logistic regression are given as “odds ratios” for each variable, net of the effects of the other variables in the equation. An odds ratio below one means the variable lowers the odds of the dependent variable, and an odds ratio above one means it raises them.
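To make the idea of an odds ratio concrete, here is a minimal Python sketch that converts the two raw percentages quoted earlier (31.5% support among Millennials, 52.4% among GenZ) into odds and then into a single odds ratio. This is a simple unadjusted comparison, not the regression output, and the function names are my own illustration.

```python
def odds(p):
    """Convert a probability (strictly between 0 and 1) into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_group, p_reference):
    """Odds of the outcome in one group relative to a reference group."""
    return odds(p_group) / odds(p_reference)


# Raw support for same-sex marriage among white evangelicals in the survey
millennial_support = 0.315
genz_support = 0.524

# Unadjusted odds ratio for GenZ relative to Millennials
or_genz = odds_ratio(genz_support, millennial_support)
print(round(or_genz, 2))
```

Note that the percentage of supporters rises by a factor of about 1.7, while the odds rise by a larger factor. The two measures are related but not interchangeable.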

Following an instructional video for Stata logistic regression processes, I added batches of variables that plausibly would relate to support for or opposition to same-sex marriage. A number of factors I thought would be of interest washed out as not being statistically significant: gender, region, generation. I also played around with various ways of splitting categorical variables: weekly attendance wasn’t significant, nor was having some college or being a Democrat (largely because the number of Democrats was so small in the white evangelical population). With all my playing around, I was able to come up with a pretty robust equation that speaks to levels of support for same-sex marriage. The results follow.

I’m still learning how to interpret this stuff, but I find it pretty interesting. Overall, the equation does a decent job of explaining the variability of support for same-sex marriage (the pseudo R-squared is pretty good for survey data). Looking at the odds ratios, three factors significantly decrease the odds of supporting same-sex marriage: attending church two or more times a week (about one-eighth the odds), being a Republican (cutting the odds by nearly three-quarters), and believing that evangelicals face social discrimination (cutting the odds by nearly two-thirds). Conversely, earning a BA or higher more than triples the odds of supporting same-sex marriage, and having a gay friend quadruples them. I should note here that this is very consistent with earlier pieces I have written about anecdotal support for same-sex marriage at Christian colleges.
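One caution when reading these figures: an odds ratio multiplies the odds, not the percentage of supporters. The short Python sketch below shows what an odds ratio of 4 (roughly the gay-friend effect described above) would do to a baseline level of support. The 30% baseline is a made-up number chosen purely for illustration, and the function name is hypothetical.

```python
def apply_odds_ratio(baseline_p, odds_ratio):
    """Return the probability implied by applying an odds ratio to a baseline probability."""
    if not 0.0 < baseline_p < 1.0:
        raise ValueError("baseline_p must be strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    base_odds = baseline_p / (1.0 - baseline_p)   # probability -> odds
    new_odds = base_odds * odds_ratio             # the odds ratio scales the odds
    return new_odds / (1.0 + new_odds)            # odds -> probability


# Hypothetical baseline: 30% support among respondents without the characteristic
baseline = 0.30

# Assumed odds ratio of 4, matching the "quadruples" figure in the text
with_gay_friend = apply_odds_ratio(baseline, 4.0)
print(round(with_gay_friend, 3))
```

Under these assumed numbers, support roughly doubles rather than quadruples, because probabilities are capped at 100% while odds are not.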

In addition to the equation and odds ratios, additional output allows examination of how well the predicted values match the actual responses. As the bottom of the chart shows, the equation correctly classified over three-quarters of the cases. I kept trying new things to see if I could drive this figure up, but am fairly satisfied with progress to date.
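For readers curious how a "correctly classified" figure is usually computed, here is a minimal Python sketch. It assumes the common convention of predicting support whenever the model's predicted probability is 0.5 or higher, which I believe is the default cutoff in Stata's `estat classification` table. The eight cases below are invented for illustration and are not from the PRRI data.

```python
def classification_rate(actual, predicted_prob, cutoff=0.5):
    """Share of cases where the predicted class (prob >= cutoff) matches the actual 0/1 response."""
    if len(actual) != len(predicted_prob):
        raise ValueError("actual and predicted_prob must have the same length")
    if not actual:
        raise ValueError("need at least one case")
    correct = sum(
        1 for y, p in zip(actual, predicted_prob)
        if (p >= cutoff) == bool(y)
    )
    return correct / len(actual)


# Hypothetical responses (1 = supports SSM) and model-predicted probabilities
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [0.81, 0.22, 0.43, 0.67, 0.55, 0.10, 0.90, 0.35]

rate = classification_rate(actual, predicted)
print(rate)
```

A useful benchmark for any such figure is the rate you would get by simply guessing the more common answer for everyone.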

I’m trusting my statistically savvy readers to correct any errors I’ve made in logic or statistical analysis. For now, I think it is pretty interesting data and look forward to further testing with other data sets.

I appreciate your research, John! So interesting. I’m jealous of your statistics acumen!

Great to see you back!