Mining Reddit's love life

The code and analysis are available on GitHub.


/r/relationships is part kvetching-fest, part touching, and occasionally just bizarre. But aren't most real-life relationships? From the subreddit's sidebar:

"/r/Relationships is a community built around helping people and the goal of providing a platform for interpersonal relationship advice between redditors. We seek posts from users who have specific and personal relationship quandaries that other redditors can help them try to solve."

So I scraped 888 of the top posts to better understand these quandaries.

Research Questions

  • What are the topics of r/relationships? I have always wanted to see a typology or statistical distribution of relationship issues. Perhaps this project can shed light on that.
  • What do people say about different romantic partners? Do posts about different partners (male/female, married/unmarried, etc.) raise different concerns?

Post Distributions / Frequencies

Posts are fairly evenly distributed by time of day and by month, but 2015 and 2016 saw a lot more top posts than 2017-2019. Part of me wonders if this peak is explained by intra-couple political conflicts.

[Figure: number of top posts by year]
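Tallies like these come from bucketing each post's creation time. Here is a minimal sketch, assuming every post carries a Unix timestamp in UTC (Reddit's API exposes one as `created_utc`); the function name `post_counts` and the sample timestamps are hypothetical, not taken from the original code.

```python
from collections import Counter
from datetime import datetime, timezone

def post_counts(timestamps, unit="year"):
    """Count posts per year, month, or hour of day (all in UTC).

    `timestamps` are Unix epoch seconds, e.g. a post's `created_utc` field.
    """
    if unit not in ("year", "month", "hour"):
        raise ValueError("unit must be 'year', 'month', or 'hour'")
    counts = Counter()
    for ts in timestamps:
        created = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[getattr(created, unit)] += 1  # e.g. created.year
    return dict(sorted(counts.items()))

# Hypothetical timestamps: two posts in 2015, one in 2016, one in 2017.
sample = [1420070400, 1425000000, 1451606400, 1483228800]
print(post_counts(sample))  # {2015: 2, 2016: 1, 2017: 1}
```

Bucketing in UTC keeps the counts reproducible; an hour-of-day view in posters' local time would need time-zone information that Reddit posts do not carry.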

By looking at the titles of posts, I could tell who a post was about. 60% of posts about a romantic subject were about a husband or boyfriend.
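Title-based labeling like this can be done with keyword matching. The sketch below is one plausible approach, not necessarily the rule the original analysis used: the keyword patterns in `SUBJECT_PATTERNS` and the helper names are hypothetical, and a title naming two different partners is left unlabeled rather than guessed.

```python
import re

# Hypothetical keyword patterns; \b keeps "wife" from matching inside other words.
SUBJECT_PATTERNS = {
    "boyfriend": r"\b(boyfriend|bf)\b",
    "girlfriend": r"\b(girlfriend|gf)\b",
    "husband": r"\b(husband|hubby)\b",
    "wife": r"\b(wife)\b",
}

def romantic_subject(title):
    """Return the single romantic subject named in a title, or None if zero or several."""
    found = {subject for subject, pattern in SUBJECT_PATTERNS.items()
             if re.search(pattern, title, flags=re.IGNORECASE)}
    return found.pop() if len(found) == 1 else None

def partner_share(titles, partners):
    """Among titles with a romantic subject, the fraction whose subject is in `partners`."""
    subjects = [romantic_subject(t) for t in titles]
    romantic = [s for s in subjects if s is not None]
    return sum(s in partners for s in romantic) / len(romantic) if romantic else None

titles = [
    "My [25F] boyfriend [27M] spends every weekend with a coworker",
    "My husband's sister won't leave our house",
    "My wife wants to move across the country",
    "My coworker keeps stealing my lunch",
]
print(partner_share(titles, {"husband", "boyfriend"}))
```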

All posts in this dataset are popular, but some are really, really popular. The median post had a score of over 2,333, but the long right tail shows a few posts that blew that number away. The top post had a score of 14,4888 when I scraped this dataset. The OP (original poster) of that post was suspicious of the amount of time her boyfriend was spending with a female coworker. It turns out the coworker was helping OP's boyfriend plan his proposal!

[Figure: distribution of post scores]
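With a long right tail, the median is a better "typical post" summary than the mean, because a handful of viral posts drag the mean upward while barely moving the median. A small illustration with made-up scores (not the real data):

```python
from statistics import mean, median

def score_summary(scores):
    """Median, mean, and max of post scores; a mean well above the median signals a right tail."""
    return {"median": median(scores), "mean": mean(scores), "max": max(scores)}

# Made-up scores: most posts cluster together, one post goes viral.
scores = [1800, 2100, 2300, 2400, 2600, 3100, 14000]
summary = score_summary(scores)
print(summary["median"])        # 2400
print(round(summary["mean"]))   # 4043
```

One outlier pulls the mean to roughly 1.7 times the median, while the median stays at the middle post.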

Subreddit topics

Method

Alright, so what can we learn about relationships from this subreddit? Let's look at some of the latent topics in posts. To find topics in a large set of documents, you can use a statistical method called "topic modeling". Topic modeling is the activity of finding groups of words ("topics") that best represent documents.

The specific topic modeling algorithm I used is called Non-Negative Matrix Factorization (NMF). Here's the intuition. Suppose you have a document-term matrix, A: each row is a post, each column is a word, and each entry records how often (or how strongly) that word appears in that post. Now we want to find two new non-negative matrices: one containing topics (H), and another containing the weight of each topic in each post (W). Ideally, multiplying them approximately recreates the original matrix: A ≈ W × H. That is, we find topics and weights such that every document is some weighted combination of topics. "Topics", in this situation, have a specific meaning: they are sets of co-occurring terms.
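To make the factorization concrete, here is a minimal sketch in Python with NumPy using Lee and Seung's multiplicative-update rules for the squared-error loss. It is an illustration, not necessarily what the original analysis ran (scikit-learn's `sklearn.decomposition.NMF` is the usual off-the-shelf choice). Raw word counts, the toy posts, and the helper names `build_doc_term_matrix` and `top_terms` are assumptions; a real pipeline would typically use TF-IDF weights and drop stop words.

```python
import numpy as np

def build_doc_term_matrix(docs):
    """Raw word counts: rows are documents (posts), columns are vocabulary words."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: j for j, word in enumerate(vocab)}
    A = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for word in doc.lower().split():
            A[i, index[word]] += 1
    return A, vocab

def nmf(A, k, n_iter=500, seed=0, eps=1e-10):
    """Approximate A (docs x terms) as W @ H, with W (docs x k) and H (k x terms) >= 0.

    Multiplicative updates for the squared Frobenius loss ||A - WH||^2. Entries start
    positive and are only multiplied by non-negative ratios, so they stay non-negative.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))
    H = rng.random((k, A.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)  # refit topics given the weights
        W *= (A @ H.T) / (W @ H @ H.T + eps)  # refit weights given the topics
    return W, H

def top_terms(H, vocab, n=3):
    """The n highest-weighted words in each topic (row of H)."""
    return [[vocab[j] for j in np.argsort(row)[::-1][:n]] for row in H]

# Hypothetical toy "posts": two about a boyfriend, two about a fading friendship.
docs = [
    "boyfriend cheating phone boyfriend",
    "boyfriend phone cheating",
    "friend party drifting friend",
    "friend drifting party",
]
A, vocab = build_doc_term_matrix(docs)
W, H = nmf(A, k=2)
print(top_terms(H, vocab))  # each row of H is one "topic"
print(W.round(2))           # each row of W mixes the topics for one post
```

Reading off the top words of each row of H is how topics get their human-readable labels; the number of topics, k, is a modeling choice.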

Results

After running a topic model analysis, I found 5 key topics of the subreddit.

Sometimes people are just at a loss. The first topic, Can't Even, is about relationship bewilderment. These are situations where you literally don't know what to do or if the thing on your mind is really a problem. These posts revolve around uncertainty, disbelief, and confusion.

The second set of topics revolves around boyfriend concerns. These concerns range from a boyfriend who takes too many risks to suspicions of an affair. But the common thread is that OP is worried about their boyfriend.

Just as OPs are concerned about their boyfriends, the third set of topics is about concerns with husbands. But relative to boyfriend concerns, husband concerns are more likely to be suspicions that something is going on, as opposed to reactions to something that already happened. So either wives are more suspicious than girlfriends, or husbands are more secretive than boyfriends.

Even if your relationship is not new, you still might find yourself in crazy situations. The fourth set of topics is about unexpected things happening with a person OP has known for years. Examples include a wife of 9 years who wants to put a daughter up for adoption, and a fiancée of 2 years who wants to be a "surrendered wife" after reading a book.

The last set of topics -- and by far the largest -- is about friends. In general, these posts were about what I would call "liminal friends": friends heading towards non-friendship. This can happen in two ways. Some friends in these posts were moving into the romantic category. In other situations, the friendship was decaying. But in both cases, these posts are about friends becoming non-friends.

[Figure: main topics of r/relationships]

What do people say about different romantic partners?

By assigning a "romantic subject" of a post based on a title, I was able to zoom in on the topics surrounding different romantic partners.

[Figure: number of posts by romantic subject]

Boyfriends

[Figure: topics in posts about boyfriends]

The first set of topics is really depressing. These are relationships that OP knows or very strongly suspects are broken. Examples include a boyfriend trying to stop OP from taking antidepressants and another post about a couple that breaks up every year.

The second set of topics is about a boyfriend whose friends are terrible. In one case, OP dumped her boyfriend for his cocaine use and now all of his friends hate her. In a different case, people in OP's friend circle keep implying OP is cheating on her boyfriend.

The third set of topics is about OP's boyfriend in relation to either his family or OP's family. A lot of these posts have to do with brothers. In one post, OP's boyfriend wanted OP to dial back her relationship with her brother. In a very different case, OP realized she slept with her boyfriend's brother.

The final set of topics is about communication technology. In some of these posts, OP's boyfriend just cut off contact. And in other posts, OP saw her boyfriend was on some dating websites and didn't know how to broach this.

Husbands

[Figure: topics in posts about husbands]

The first set of topics is about an improvement in a relationship since the last time OP posted. Some of these improvements are because the relationship actually improved. Other improvements are just in OP's mental state. But either way, OP feels better about their relationship with their husband. The fact that "improvement" topics are over-represented in the husband category may signal that (a) wives are resilient or (b) husbands are amenable to feedback. In either case, it seems reasonable to think married partners have a larger incentive than non-married partners to either improve their marriage or their conception of it.

The second set of topics is about sisters. Some posts are about OP and OP's husband not wanting to spend time with OP's sister. Other posts are about the husband of OP's sister. It's interesting that very few posts about husbands also talk about brothers.

The third set of topics is about friends. In one post, the friends of OP's husband are classist and it really bothers OP. In another post, OP's friends ignore her husband because he's foreign.

The final set of topics is about divorce and cheating -- basically, the breakdown of marriages. In one of these posts, OP found out her husband hired an escort while she was in the hospital. There is another (very interesting) post about OP's husband coming out as gay and divorcing her: while he is lauded for coming out, she feels he ruined a large part of her life.

Wives

[Figure: topics in posts about wives]

The first topic consists of updates on tense situations, in which OP tells r/relationships that he talked the issue over with his wife. In itself, this isn't that interesting. But remember how "improvement" topics were over-represented in posts about husbands? This is marriage therapy in action.

The second topic dealt with the strained relationship between wives and daughters. In some cases, the wife was smothering the daughter. In one case, OP's wife kicked the daughter out of the house while he was away. I found that posts about wives (classified based on title), relative to posts about husbands, were more likely to also reference daughters in their post title (P = 0.04).
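The post does not say which test produced P = 0.04. For a comparison like this (two groups of posts, each either mentioning daughters or not), one reasonable choice with small counts is Fisher's exact test on a 2x2 table. Below is a pure-Python sketch of the two-sided version under that assumption; `scipy.stats.fisher_exact` is the standard library routine, and the counts in the usage line are made up.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: groups (e.g. posts about wives vs. husbands).
    Columns: outcome (e.g. title mentions a daughter vs. does not).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in (table_prob(x) for x in range(low, high + 1))
               if p <= p_observed * (1 + 1e-7))

# Made-up counts: [[wife posts mentioning daughters, wife posts not],
#                  [husband posts mentioning daughters, husband posts not]]
print(fisher_exact_two_sided(9, 41, 2, 48))
```

Conditioning on the table's margins makes the test exact, which matters when one cell holds only a handful of posts.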

The last topic is mainly about the parents of OP's wife. Sometimes these parents make unreasonable requests. Other times the parents just have insufferable personalities.

Girlfriends

[Figure: topics in posts about girlfriends]

The first topic is about mounting tension between OP and his girlfriend. In one case, OP's girlfriend posted a long public note on social media after a fight. And OP was very pissed off. Overall, these posts are about tension arising from a specific dramatic act (contrast this with the third topic).

The second topic is about sisters. In these posts, OP's sister and his girlfriend are not getting along. In one case, OP's girlfriend is threatening to tell the bosses of OP's sister that the sister was a sugar baby. I found that posts about girlfriends (classified based on title), relative to boyfriends, were more likely to mention sisters in the body of the post (P = 0.04).

The third topic is about red flags. These are habits of OP's girlfriend that he finds disturbing or disconcerting. For example, one OP's girlfriend does chores until she has a meltdown, and she does this often. Another OP is concerned that his girlfriend never cuddles.

The fourth topic is about parents. In particular, this topic is about Christmas and holiday conflicts. The general theme is that OP's girlfriend and his parents do not get along, and it's all going to blow up during the holidays.

Takeaways

Friendship

It's interesting that so many of the top posts on r/relationships are about friendship. My interpretation of this is that "friendship" is an under-specified relationship. A friend with benefits can be called a "friend". The HR acquaintance who handles your benefits might be a "friend". Something very sneaky can happen. Your relationship with a person can change drastically, while your label for that relationship stays the same. As a result, changes in friendships don't register as much as changes in more highly specified relationships. And this allows tension to build and build without one recognizing it.

Gender

If there is a post about a romantic partner that also mentions a family member, that family member is often the same gender as the romantic partner. There are two possible explanations.

Explanation 1 (Affinity): Consider a boyfriend (BF) who posts about his girlfriend (GF) and his girlfriend's sister. If siblings of the same gender are closer than siblings of different genders, GF's sister is likely to play a larger role in her life than GF's brother. (If the same logic applies to children, then it makes sense that husbands mentioning wives also mention daughters, not sons).

Explanation 2 (Jealousy): Consider a girlfriend (GF) who posts about her boyfriend (BF) and her boyfriend's brother. It might be that siblings are jealous of their siblings' romantic partners, causing needless drama for these partners.
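The same-gender pattern itself can be measured by pairing the partner and the family member named in each title. A rough sketch, assuming hypothetical gendered term lists and using only the first family member a title mentions:

```python
import re

# Hypothetical gendered term lists, for illustration only.
PARTNER_GENDER = {"boyfriend": "M", "husband": "M", "girlfriend": "F", "wife": "F"}
RELATIVE_GENDER = {"brother": "M", "son": "M", "father": "M", "dad": "M",
                   "sister": "F", "daughter": "F", "mother": "F", "mom": "F"}

def first_term(text, terms):
    """First whole-word match of any term in `terms`, lowercased, or None."""
    pattern = r"\b(" + "|".join(terms) + r")\b"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None

def same_gender_share(titles):
    """Among titles naming both a partner and a relative, the share whose genders match."""
    same = total = 0
    for title in titles:
        partner = first_term(title, PARTNER_GENDER)
        relative = first_term(title, RELATIVE_GENDER)
        if partner and relative:
            total += 1
            same += PARTNER_GENDER[partner] == RELATIVE_GENDER[relative]
    return same / total if total else None

titles = [
    "My girlfriend and my sister can't stand each other",
    "My wife kicked our daughter out",
    "My boyfriend's brother keeps texting me",
    "My husband won't visit my sister",
    "Coworker drama",
]
print(same_gender_share(titles))  # 0.75
```

A share alone does not separate the affinity and jealousy explanations; it only confirms that the pattern exists in a given set of titles.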

Further Directions

I was thinking of looking at more targeted subreddits like /r/weddings to see relationship topics for specific relationship types or stages.

It would also be interesting to look at posts occurring during election cycles versus non-election cycles, and whether there is a difference in content.

And if you have any cool ideas, feel free to use the data and code from my Github repo!

Joshua Ashkinaze