Source: arXiv

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

§02

Snippets

  1. LOCUS provides machine-readable access to nearly all publicly available U.S. municipal and county ordinance codes from 9,239 cities and counties.

    Local ordinances govern everyday life but have been largely inaccessible for AI research due to fragmented vendor platforms.

  2. A county-harmonized access layer covers the largest 2,309 U.S. counties, representing the majority of the American population.

    Enables researchers to study local law at national scale while maintaining geographic coherence and practical utility.

  3. OCR was applied to handle diverse document formats that previously prevented local law from becoming a public digital resource.

    Overcomes a major technical barrier that has historically kept local ordinances locked in unstructured formats.

  4. ModernBERT-based classifiers enable analysis of U.S. local law across dimensions like opacity and paternalism not previously studied at scale.

    Opens new research avenues into the characteristics and quality of local regulation across jurisdictions.

§03

Synthesis

## Local Law, Finally Machine-Readable

Local ordinances shape daily life—where you can build, what businesses can operate, noise limits, pet regulations—yet they've been invisible to legal AI research. While machine-readable corpora exist for federal and state law, local codes remain scattered across fragmented vendor websites designed for humans to browse one city at a time. The authors introduce LOCUS, a comprehensive, machine-readable corpus of U.S. municipal and county ordinance codes, filling a critical gap that has kept local law out of AI's reach.

## Building the Corpus

The raw LOCUS dataset covers ordinance codes from 9,239 cities and counties across the United States. Creating this required solving a practical problem: local ordinances arrive in dozens of incompatible formats—PDFs, HTML, scanned images, inconsistent layouts—because they're maintained by thousands of independent jurisdictions with no standardization mandate. The authors used OCR (optical character recognition) to convert these heterogeneous documents into machine-readable text.
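
As a rough illustration of the format problem described above, the following Python sketch routes each document to an extractor based on its file suffix. It is not the authors' pipeline: the suffix sets, `extract_text`, and the injected `ocr` callable are hypothetical names, and no particular OCR engine is assumed.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping script and style bodies."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(markup: str) -> str:
    """Return the visible text of an HTML page, one text node per line."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical routing table: which suffixes go to OCR and which are parsed directly.
OCR_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".pdf"}
HTML_SUFFIXES = {".html", ".htm"}


def extract_text(filename: str, payload: bytes, ocr: Callable[[bytes], str]) -> str:
    """Route one document to the extractor that suits its format."""
    suffix = Path(filename).suffix.lower()
    if suffix in HTML_SUFFIXES:
        return html_to_text(payload.decode("utf-8", errors="replace"))
    if suffix in OCR_SUFFIXES:
        return ocr(payload)  # the OCR engine is supplied by the caller (assumption)
    raise ValueError(f"unsupported format: {suffix!r}")
```

Passing the OCR step in as a callable keeps the sketch independent of any specific engine, so the routing logic can be exercised with a stub.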

A smaller but more polished "county-harmonized" access layer covers the largest 2,309 of 3,144 U.S. counties, representing the majority of the American population. This curated version addresses a key challenge: local law is deeply hierarchical and overlapping. Cities sit within counties, which sit within states, and conflicts arise. The harmonized layer normalizes these relationships, making the data usable for systematic research.
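
One way to picture a county-level access layer is as a grouping of jurisdiction records under a shared county key, restricted to the most populous counties. The Python sketch below shows that idea only. The `Jurisdiction` record, the `county_fips` field, and `largest_counties` are hypothetical, and the source does not describe how the authors' harmonization is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Jurisdiction:
    name: str
    county_fips: str  # county identifier used as the grouping key (hypothetical field)


def largest_counties(
    jurisdictions: list[Jurisdiction],
    county_population: dict[str, int],
    top_n: int,
) -> dict[str, list[Jurisdiction]]:
    """Group jurisdictions by county and keep the top_n most populous counties."""
    by_county: dict[str, list[Jurisdiction]] = defaultdict(list)
    for j in jurisdictions:
        by_county[j.county_fips].append(j)
    # Rank counties by population, largest first; unknown counties sort last.
    ranked = sorted(
        by_county,
        key=lambda fips: county_population.get(fips, 0),
        reverse=True,
    )
    return {fips: by_county[fips] for fips in ranked[:top_n]}


# Toy records with placeholder codes and populations, not real data.
rows = [
    Jurisdiction("City A", "00001"),
    Jurisdiction("County X", "00001"),
    Jurisdiction("Town B", "00002"),
    Jurisdiction("Village C", "00003"),
]
pops = {"00001": 500, "00002": 50, "00003": 5000}
layer = largest_counties(rows, pops, top_n=2)
```

Here `layer` keeps counties `00003` and `00001`, with the city and county records for `00001` grouped together.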

The authors released the corpus with metadata documenting coverage, format quality, and processing steps—crucial for reproducibility and for others to incrementally improve machine-readable local law access over time.

## Demonstrating What's Possible

To show why this matters, the authors trained ModernBERT-based classifiers (neural language models) to analyze local law along dimensions previously unstudied at scale: opacity (how difficult the text is to parse), paternalism (how much rules constrain individual choice), and other regulatory characteristics. These models can now measure how local governance varies across regions and identify patterns invisible to manual review.
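
To make the regional comparison concrete, the sketch below averages a per-section score within each county. It assumes section-level scores come from some classifier passed in as a callable. `mean_score_by_county` is a hypothetical helper, and `toy_opacity` is a deliberately crude stand-in (mean words per sentence), not the authors' ModernBERT-based opacity classifier.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Iterable


def mean_score_by_county(
    sections: Iterable[tuple[str, str]],
    score: Callable[[str], float],
) -> dict[str, float]:
    """Average a per-section score within each county.

    sections: (county_id, section_text) pairs.
    score: any callable returning a float for one section of text.
    """
    buckets: dict[str, list[float]] = defaultdict(list)
    for county, text in sections:
        buckets[county].append(score(text))
    return {county: mean(values) for county, values in buckets.items()}


def toy_opacity(text: str) -> float:
    """Stand-in scorer: mean words per sentence. NOT a trained classifier."""
    sentences = [s for s in text.split(".") if s.strip()]
    return len(text.split()) / max(len(sentences), 1)


# Toy sections with placeholder county codes, not real ordinance text.
sections = [
    ("00001", "No dogs. No cats."),
    ("00001", "Keep off the grass."),
    ("00002", "Permits are required for all fences."),
]
result = mean_score_by_county(sections, toy_opacity)
```

Swapping `toy_opacity` for a real model's scoring function would leave the aggregation step unchanged, which is why the scorer is a parameter.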

## Why This Matters

Legal AI has lagged behind other domains partly because authoritative text is locked away. LOCUS opens local law to researchers, enabling new questions: How do zoning rules differ by neighborhood wealth? Do housing codes cluster certain constraints? What patterns emerge across thousands of jurisdictions? Beyond research, opening local law as a public digital resource serves democratic values—citizens should be able to read and analyze the rules governing their communities without vendor lock-in or paywalls.

The corpus is an infrastructure foundation rather than a finished product. By releasing the raw and harmonized versions together with trained models and coverage metadata, the authors invite the research community to build on it. Local ordinances affect housing affordability, business formation, public health, and equity, and machine learning can now study those effects at the scale and granularity they deserve.
