Clusters of Democratic candidates

Click for GitHub files with code and analysis


Which Democratic candidates are similar?

There are many ways to cluster candidates. I chose to go with a data-driven, middle school-ish approach. Here's the intuition. If all the conversations about Jack also involve Jill (and vice versa), they're probably pretty similar. And conversely, if gossip about Jack never mentions Jill (and vice versa) then they're probably pretty different.

In other words, look at who is talked about together and who isn't.

1. Data & research questions

Data

Normalized Google Distance (NGD) is a measure of semantic similarity based on search result counts. If two keywords share many pages relative to the number of pages each returns on its own, they are thought to be semantically similar.

Importantly, NGD draws on more or less the whole indexed internet.

I calculated the NGD for 24 candidates to see which Democrats the internet tends to group together.
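As a rough sketch of the calculation (not the analysis code itself), the standard NGD formula can be written in a few lines of Python. The hit counts, the placeholder names "A", "B", "C", and the assumed index size `n_pages` are all made up for illustration.

```python
import math
from itertools import combinations

def ngd(fx, fy, fxy, n_pages):
    """Normalized Google Distance from raw hit counts.

    fx, fy  -- number of pages matching each keyword alone
    fxy     -- number of pages matching both keywords
    n_pages -- assumed total number of indexed pages
    """
    if fxy == 0:
        return math.inf  # the two terms never appear together
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n_pages) - min(log_fx, log_fy))

def pairwise_ngd(single_hits, joint_hits, n_pages):
    """Build a symmetric {(a, b): distance} table for every pair of names."""
    table = {}
    for a, b in combinations(sorted(single_hits), 2):
        d = ngd(single_hits[a], single_hits[b], joint_hits[frozenset((a, b))], n_pages)
        table[(a, b)] = table[(b, a)] = d
    return table

# Made-up hit counts for three placeholder names (not real search results).
single_hits = {"A": 1_000_000, "B": 800_000, "C": 50_000}
joint_hits = {
    frozenset(("A", "B")): 400_000,
    frozenset(("A", "C")): 1_000,
    frozenset(("B", "C")): 900,
}
distances = pairwise_ngd(single_hits, joint_hits, n_pages=10**10)
```

With these toy numbers, "A" and "B" share a large fraction of their pages and so end up much closer to each other than either is to "C".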

Research questions

My questions are:

  • What are the clusters of candidates?
  • Who is the most unique candidate?

2. Overall clusters

How to read a dendrogram

To view overall clusters, I made a "dendrogram". This is a tree diagram used to view hierarchical clusters. In the accompanying grid, each square shows the distance between two candidates (darker squares imply lower NGD). We use these distances to put candidates into clusters. The height at which two clusters join gives the distance between them (greater height = greater distance).

To view who a candidate is most similar to, we read from the bottom up to look at the first branch the candidate joins. Reading from the bottom up, we can see that the first branch Warren joins is a branch containing Sanders, Biden, and Harris. So we say she's most similar to them.
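To make the bottom-up reading concrete, here is a minimal sketch of agglomerative clustering in plain Python. It assumes average linkage, which is only one of several possible linkage rules and may not be the one behind the figure. The names and distances are made up. In practice a library routine such as `scipy.cluster.hierarchy.linkage` would do this work.

```python
def average_linkage(names, dist):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, height).

    names -- list of labels
    dist  -- dict mapping frozenset({a, b}) to the distance between a and b
    """
    clusters = [(name,) for name in names]
    merges = []

    def between(c1, c2):
        # Average linkage (an assumption): mean distance over all cross-cluster pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        height, i, j = min(
            (between(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

def first_partners(name, merges):
    """Members of the first branch `name` joins, reading the tree bottom-up."""
    for left, right, _ in merges:
        if name in left:
            return right
        if name in right:
            return left
    return ()

# Made-up distances for four placeholder names, not values from the analysis.
dist = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.6,
    frozenset(("A", "D")): 0.9,
    frozenset(("B", "D")): 0.8,
    frozenset(("C", "D")): 0.7,
}
merges = average_linkage(["A", "B", "C", "D"], dist)
partners_of_c = first_partners("C", merges)
```

Here "A" and "B" merge first, and the first branch "C" joins is the one that already contains "A" and "B", so "C" would be read as most similar to that pair.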

dendro25.png

Clusters

The clusters that emerge:

  • Stars (Biden, Warren, Sanders, Harris)
  • Upstarts (Mayor Pete, Beto, Booker)
  • Non-Star Women (Gabbard, Klobuchar, Gillibrand)
  • Economic Pragmatists (Delaney, Ryan, Bullock)

Note: For the correlation graphs that follow, shades are normalized to the cluster(s) in question.
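One plausible reading of "normalized to the cluster(s) in question" is a min-max rescale that uses only the distances among the candidates being plotted. The sketch below assumes that interpretation. The function name, placeholder names, and distances are hypothetical.

```python
def normalize_within(subset, dist):
    """Min-max rescale the distances among `subset` to the range [0, 1].

    dist maps frozenset({a, b}) to the distance between a and b.
    """
    pairs = [frozenset((a, b)) for i, a in enumerate(subset) for b in subset[i + 1:]]
    lo = min(dist[p] for p in pairs)
    hi = max(dist[p] for p in pairs)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all distances are equal
    return {p: (dist[p] - lo) / span for p in pairs}

# Made-up distances for placeholder names, not values from the analysis.
dist = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.3,
    frozenset(("B", "C")): 0.6,
    frozenset(("A", "D")): 0.9,
    frozenset(("B", "D")): 0.8,
    frozenset(("C", "D")): 0.7,
}
shades = normalize_within(["A", "B", "C"], dist)
```

Under this assumption the darkest and lightest squares in each graph are relative to that graph alone, so shades should not be compared across graphs.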

Cluster 1: The Stars

Biden, Warren, Sanders, and Harris are talked about together. These are clearly the "Stars". As of August 2nd, they were in 1st, 2nd, 3rd, and 4th place respectively (by national polling average).

Within the Stars, Biden and Harris are closest to one another. This is probably because of Harris's attack on Biden during the first debate. Warren is the most peripheral Star, having a higher average distance to the other Stars than Biden or Sanders.

Of the Stars, Warren is the closest to the Economic Pragmatists. This is pretty interesting because I think she is viewed by many as very far to the left. During a debate, she explicitly sparred with Delaney over him being too pragmatic and small-minded. Yet she is talked about with the Economic Pragmatists more so than Sanders or Biden are. I attribute this to her having more concrete economic policies, and a more visible wonkish streak, than the other Stars.

Cluster 2: The Upstarts

The next group is the Upstarts. This is Mayor Pete, Beto, and Booker. Beto is about equally close to Booker and Mayor Pete. But Mayor Pete and Booker are farther from each other than either is from Beto.

This group is young and they're doing well. As of August 2nd, they were in 5th (Mayor Pete), 7th (Beto), and 8th (Booker) place. Mayor Pete is the most peripheral Upstart. He has the highest average distance to the other Upstarts. Beto, by contrast, is the most central Upstart.

Cluster Comparison: Stars and Upstarts

Stars + Upstarts.png

Since both Upstarts and Stars are leading in the polls, it's worth looking at how the two groups relate to each other.

Of the Upstarts, Beto is the least connected to the Stars.

Between Booker and Mayor Pete, the answer is less clear. Booker is very close to Kamala Harris, but Mayor Pete is reasonably close to all the Stars. So by median, Mayor Pete is the closest Upstart to the Stars; by mean, Booker is the closest to the Stars.
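The mean and the median can disagree like this because one very small distance pulls the mean down without moving the median. A small sketch with made-up numbers for two placeholder Upstarts ("U1", "U2") shows the effect. None of these values come from the analysis.

```python
from statistics import mean, median

# Made-up distances from two placeholder Upstarts to four placeholder Stars.
to_stars = {
    "U1": [0.10, 0.60, 0.60, 0.60],  # very close to one Star, far from the rest
    "U2": [0.45, 0.50, 0.50, 0.55],  # moderately close to every Star
}

# The same comparison under two different summaries of "distance to the Stars".
closest_by_mean = min(to_stars, key=lambda u: mean(to_stars[u]))
closest_by_median = min(to_stars, key=lambda u: median(to_stars[u]))
```

In this toy setup "U1" wins by mean thanks to its single close tie, while "U2" wins by median because it is consistently near everyone.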

Cluster 3: Non-Star Women

Non-Star Women.png

To be fair, this cluster technically includes Seth Moulton, but I am leaving him out because he's polling at 0% and will be out of the race soon.

The Non-Star Women cluster is made up of female candidates who are not in the Stars (Harris or Warren) and are reasonably moderate (not Williamson). Of these candidates, Klobuchar is doing the best, in 9th place.

Interestingly, Klobuchar and Gillibrand have the lowest mean and median distance to other candidates. This suggests that they are what I call "relative centrists", not incredibly far removed from anyone in the race.

Cluster 4: Economic Pragmatists

Economic Pragmatists.png

This cluster includes economic pragmatists. Delaney is a former businessman. Bullock and Ryan are Midwest Democrats. None of these candidates are doing very well, though. If anything, the pragmatist niche is filled by Amy Klobuchar.

This actually raises the question of why Klobuchar isn't talked about with this bunch rather than Gabbard and Gillibrand. It could be that her gender was more salient than her policies. It will be interesting to see if her cluster changes as the field narrows, and more attention is paid to each individual candidate.

3. The most unique candidate

There are a couple of ways to measure the most unique candidate. But one way is just to look at the candidate who, on average, is farthest from all other candidates. This is a candidate that is not talked about with others.
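As a sketch of that measure, assuming the distances are stored in a square matrix, the most unique candidate is simply the one with the largest average distance to everyone else. The names and numbers below are placeholders, not results.

```python
from statistics import mean

def mean_distance_to_others(names, matrix):
    """Map each name to its average distance to every other name.

    matrix[i][j] is the distance between names[i] and names[j].
    """
    return {
        name: mean(d for j, d in enumerate(matrix[i]) if j != i)
        for i, name in enumerate(names)
    }

# Made-up symmetric distance matrix for four placeholder names.
names = ["A", "B", "C", "D"]
matrix = [
    [0.0, 0.2, 0.3, 0.8],
    [0.2, 0.0, 0.4, 0.9],
    [0.3, 0.4, 0.0, 0.7],
    [0.8, 0.9, 0.7, 0.0],
]
averages = mean_distance_to_others(names, matrix)
most_unique = max(averages, key=averages.get)
```

The diagonal is skipped so that a candidate's zero distance to itself does not drag its average down.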

Perhaps unsurprisingly, that candidate is Andrew Yang.

Funnily enough, the cluster he is most similar to is the Economic Pragmatist cluster. This seems strange at first, but it's kind of fitting. He is uniquely absurd and pragmatic.

During one of the debates, he stated his view on climate change: The climate is screwed. So let's get our people to higher ground.

Absurd and pragmatic.

4. Takeaways

There are a couple of interesting things here.

First, we see that there is a clear "frontrunner" cluster. This cluster (Stars) leads in the polls and is talked about together. And similarly, the group behind the frontrunners (Upstarts) in the polls is also talked about together. Of the Upstarts, Mayor Pete and Booker are the most connected to the Stars.

Second, Andrew Yang is the most unique candidate. He had the highest average NGD to other candidates. Yet if we had to place him in a cluster based on NGD, he is most similar to the Economic Pragmatist cluster.

Joshua Ashkinaze