SCOTUS Case Embedding Mini-Visualization (only 5000 cases)

I made this visualization to give an at-a-glance view of SCOTUS cases. It was built with UMAP and Spectral Clustering on data from Kaggle. To avoid overplotting, it shows only 5000 of the 35,781 cases in the full dataset. NB! This is a work in progress: I am working on applying better summarization algorithms :-). The (crowded) visualization of the full dataset is available here: Full SCOTUS Visualization

What Does This Mean?

The Supreme Court of the United States (SCOTUS) decides cases that shape legal precedent across the country. This visualization gives a rough overview of SCOTUS opinions by clustering cases based on their textual similarity. The summaries are extracts from the opinions, chosen as important by a second clustering algorithm that operates on sentence embeddings (this algorithm needs some work :-D).
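The extract selection described above can be sketched roughly as follows. This is a minimal sketch, not the author's code. It assumes the common "pick the sentence closest to each KMeans centroid" approach, a naive regex sentence splitter, and a caller-supplied `embed` function (in practice this would be a HuggingFace sentence encoder; any function that maps a list of sentences to an array of vectors works). The names `split_sentences`, `extractive_summary`, and `embed` are hypothetical.

```python
import re
from typing import Callable, List

import numpy as np
from sklearn.cluster import KMeans


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extractive_summary(
    text: str,
    embed: Callable[[List[str]], np.ndarray],  # hypothetical sentence encoder
    n_sentences: int = 3,
    random_state: int = 0,
) -> List[str]:
    """Pick one representative sentence per KMeans cluster, kept in original order."""
    sentences = split_sentences(text)
    if len(sentences) <= n_sentences:
        return sentences  # nothing to select from
    vectors = np.asarray(embed(sentences), dtype=float)  # one row per sentence
    km = KMeans(n_clusters=n_sentences, n_init=10, random_state=random_state)
    labels = km.fit_predict(vectors)
    chosen = []
    for c in range(n_sentences):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue  # guard against an empty cluster
        # Distance from each member sentence to its cluster centroid.
        dists = np.linalg.norm(vectors[members] - km.cluster_centers_[c], axis=1)
        chosen.append(int(members[np.argmin(dists)]))
    # Return the selected sentences verbatim, in the order they appear in the text.
    return [sentences[i] for i in sorted(chosen)]
```

Because every returned sentence is copied verbatim, the summary never contains wording the opinion does not, which is the faithfulness-over-fluency trade-off this page describes.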

How to Interpret This?

- **Close points** → Cases whose texts are semantically similar.
- **Different clusters** → Rough groupings of cases that lie near each other.
- **Hover to read summaries** → Hover over a point to see a short summary of the case.

About This Visualization

I used HuggingFace Transformers and KMeans clustering (sklearn) to summarize the SCOTUS opinions. The summarization is extractive: it selects sentences from the original text without modifying them. This is why the summaries are not fluent, but the goal is a very brief overview that stays faithful to the original wording (language models that rewrite text can hallucinate). I used the OpenAI API to embed the summaries in a vector space, UMAP to reduce the embeddings to 3D, and Spectral Clustering to cluster the cases. Each point represents a case, colored by its Spectral Clustering cluster. Hover over a point to see a brief summary of the case.

The code is available here.