SCOTUS Case Embedding Visualization (35,781 cases)

I made this visualization to give an at-a-glance overview of SCOTUS opinions. It was built with UMAP and Spectral Clustering, using data from Kaggle. NB! This is a work in progress: I am working on applying better summarization algorithms :-). A mini-version of this visualization (5,000 cases) is available here: Mini SCOTUS Visualization

What Does This Mean?

The Supreme Court of the United States (SCOTUS) decides cases that shape legal precedent across the country. This visualization aims to give a rough overview of SCOTUS opinions by clustering cases based on their textual similarity. The summaries shown are extracts from each opinion. A second clustering algorithm, operating on sentence embeddings, selects the sentences it deems most important (this algorithm still needs some work :-D).
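One common way to implement this kind of extractive step is to embed each sentence, cluster the sentence embeddings with KMeans, and keep the sentence nearest each cluster centroid. This is a minimal sketch that assumes the sentence embeddings are already computed. The encoder and the number of sentences per summary are not specified in the post, so `extractive_summary` and `n_sentences` are hypothetical names. Treat it as an illustration of the technique, not the exact code behind the visualization.

```python
import numpy as np
from sklearn.cluster import KMeans


def extractive_summary(sentences: list[str], embeddings: np.ndarray, n_sentences: int = 3) -> list[str]:
    """Pick representative sentences by clustering their embeddings (hypothetical helper).

    `embeddings` is an (n, d) array with one row per sentence, produced by any
    sentence encoder (assumption: the encoder is left open here).
    """
    k = min(n_sentences, len(sentences))
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    chosen = set()
    for centroid in kmeans.cluster_centers_:
        # The sentence closest to each centroid stands in for that cluster.
        distances = np.linalg.norm(embeddings - centroid, axis=1)
        chosen.add(int(np.argmin(distances)))
    # Sentences are copied verbatim and kept in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Choosing the centroid-nearest sentence, rather than an arbitrary member of each cluster, favors sentences that are typical of their topic. It also explains why the extracts can read a little disjointed.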

How to Interpret This?

- **Close points** → Cases with semantically similar text.
- **Different clusters** → Rough groupings of cases that lie in the same neighborhood of the embedding space.
- **Hover to read summaries** → See a short summary of the case when you hover over a point.
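UMAP only approximately preserves each case's nearest neighbors from the original embedding space, so "close" in the 3D plot is a rough proxy. This minimal sketch shows what "nearest" means in that original space. It assumes cosine similarity, which is a common choice for text embeddings but is not stated as the metric used here. `nearest_cases` is a hypothetical helper.

```python
import numpy as np


def nearest_cases(embeddings: np.ndarray, query: int, k: int = 5) -> list[int]:
    """Return indices of the k cases most similar to case `query` by cosine similarity."""
    # Normalize rows so that a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarity = unit @ unit[query]
    similarity[query] = -np.inf  # exclude the case itself
    # Sort by descending similarity and keep the top k.
    return [int(i) for i in np.argsort(-similarity)[:k]]
```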

About This Visualization

I used Hugging Face Transformers and KMeans clustering (scikit-learn) to summarize the SCOTUS opinions. The summarization is extractive: it selects sentences from the original text without modifying them. This is why the summaries aren't fluent. The goal is a very brief overview that stays faithful to the source, since language models that rewrite text can hallucinate.

I then embedded the summaries in a vector space with the OpenAI API, reduced the embeddings to 3D with UMAP, and clustered the cases with Spectral Clustering. Each point represents a case, colored by its Spectral Clustering label. Hover over a point to see a brief summary of the case.
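The reduction and clustering steps could look roughly like this with `umap-learn` and scikit-learn. The post does not give parameters, so `n_components=3` is the only setting taken from the text. The following are all assumptions: the cosine metric, `n_clusters=20`, the nearest-neighbors affinity, and clustering on the 3D coordinates rather than the full embeddings. The function names are hypothetical too.

```python
import numpy as np
from sklearn.cluster import SpectralClustering


def reduce_to_3d(embeddings: np.ndarray, seed: int = 42) -> np.ndarray:
    """Project high-dimensional embeddings to 3D with UMAP (requires umap-learn)."""
    import umap  # imported lazily so the rest of this module works without umap-learn

    # Assumption: cosine metric; only n_components=3 comes from the post.
    reducer = umap.UMAP(n_components=3, metric="cosine", random_state=seed)
    return reducer.fit_transform(embeddings)


def cluster_points(points: np.ndarray, n_clusters: int = 20, seed: int = 42) -> np.ndarray:
    """Assign a cluster label to each point with spectral clustering on a k-NN graph."""
    model = SpectralClustering(
        n_clusters=n_clusters,  # assumption: the actual cluster count is not stated
        affinity="nearest_neighbors",  # sparse k-NN graph instead of a dense RBF kernel
        n_neighbors=10,
        random_state=seed,
    )
    return model.fit_predict(points)
```

A typical call would be `labels = cluster_points(reduce_to_3d(embeddings))`, where `embeddings` is an (n_cases, d) array of summary embeddings. The nearest-neighbors affinity keeps the affinity matrix sparse. That matters at roughly 35,000 cases, where a dense kernel matrix would have over a billion entries.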

The code is available here.