# LLM Law Library — Complete Documentation

## Overview

Open legal research infrastructure. Caselaw, statutes, and statutory definitions across all US jurisdictions — searchable via MCP, REST API, or web interface. No Westlaw subscription. No hallucinated citations. No authentication required.

- **Web:** https://implausible.enterprises/projects/law-library/
- **API base:** https://implausible.enterprises/api/caselaw/
- **MCP source:** https://github.com/giblfiz/llm-law-lib
- **MCP file:** https://raw.githubusercontent.com/giblfiz/llm-law-lib/master/caselaw-mcp.py

## Corpus Contents

### Case Law
- **10.7 million court opinions**, 1685 through the 2025-12-31 CourtListener snapshot
- **77 million embedded passages** (384-dim, all-MiniLM-L6-v2)
- **All 50 US states + federal courts** (3,229 courts total)
- **6.3 million parentheticals** — court-written one-sentence case summaries
- **3.7 million Shepardizer treatments** — how later courts treated each case (16 types)
- **76 million citation links** between opinions
- Source: CourtListener / Free Law Project. CC0 public domain.

### Statutes
- **1,933,573 statutory sections** across **52 jurisdictions**
- All 50 US states + District of Columbia + federal (US Code)
- Federal: 60,909 sections from OLRC USLM XML (release point PL 119-73)
- States: scraped from official state legislative websites
- BM25 deterministic search with citation boosting

### Definitions
- **876,000 definition pointers** — where each legal term is statutorily defined
- **210,000 unique legal terms** across 52 jurisdictions
- **715,000 extracted definitions** with full definition text
- **6,600+ red flags** — terms used in statutes but NOT defined in that jurisdiction
- **135,000 definition embeddings** for semantic alias discovery
- **160 LLM-confirmed cross-state term aliases** (e.g., DE "police officer" = CA "peace officer")

### NOT Included
- Briefs, motions, and party filings (live in PACER/RECAP)
- Oral argument transcripts
- Code of Federal Regulations (planned)
- Building codes and standards (ICC, NFPA, etc.)
- Non-US law

---

## Getting Started

### Option 1: MCP for Claude Code (recommended)

The MCP server is a single Python file. No dependencies beyond Python 3 standard library.

```bash
curl -O https://raw.githubusercontent.com/giblfiz/llm-law-lib/master/caselaw-mcp.py
claude mcp add llm-law-library python3 "$(pwd)/caselaw-mcp.py"
```

This gives Claude 9 tools: `caselaw_search`, `caselaw_case`, `caselaw_shepardize`, `statute_search`, `statute_section`, `term_search`, `jurisdiction_terms`, `law_library_help`, `law_library_stats`.

### Option 2: MCP for Claude Desktop

Download [caselaw-mcp.py](https://raw.githubusercontent.com/giblfiz/llm-law-lib/master/caselaw-mcp.py) and add the following to `~/Library/Application Support/Claude/claude_desktop_config.json` (the macOS location):

```json
{
  "mcpServers": {
    "llm-law-library": {
      "command": "python3",
      "args": ["/path/to/caselaw-mcp.py"]
    }
  }
}
```

### Option 3: REST API directly

No authentication. JSON in, JSON out. Base URL: `https://implausible.enterprises/api/caselaw/`

All endpoints below are POST unless noted. Send JSON body with `Content-Type: application/json`.
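A minimal sketch of calling the API from Python with only the standard library. The helper names `build_request` and `post_json` are illustrative and not part of the project, and the exact shape of each response is not specified here, so the sketch simply decodes whatever JSON comes back.

```python
import json
import urllib.request

API_BASE = "https://implausible.enterprises/api/caselaw/"


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body for an endpoint path such as "search"."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_BASE + endpoint.lstrip("/"),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint: str, payload: dict, timeout: float = 30.0):
    """Send the request and decode the JSON response (performs network I/O)."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `post_json("search", {"query": "qualified immunity", "mode": "hybrid", "top_k": 5})` would send the same body shown for `POST /search` in the REST API Reference.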

---

## MCP Tools Reference

### caselaw_search

Search the complete US case law corpus.

**Parameters:**
- `query` (string, required) — natural language, citation, or party name
- `mode` (string, default "hybrid") — `"hybrid"`, `"bm25"`, or `"vector"` (see Search Modes below)
- `top_k` (integer, default 10, max 25) — number of results
- `court_id` (string, optional) — filter by court (e.g., "scotus", "ca9", "deld"). BM25/hybrid only.
- `source` (string, optional) — `"opinion"` or `"parenthetical"`. Vector mode only.

**Returns:** Ranked results with case name, date, court, citation count, passage text, and Shepardizer treatment data.

**Examples:**
- Conceptual: `{"query": "qualified immunity police excessive force", "mode": "hybrid"}`
- By name: `{"query": "Miranda v. Arizona", "mode": "bm25"}`
- By citation: `{"query": "410 U.S. 113", "mode": "bm25"}`
- Parentheticals only: `{"query": "automobile exception warrant", "mode": "vector", "source": "parenthetical"}`
- Specific court: `{"query": "stare decisis", "court_id": "scotus"}`
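The parameter rules above can be checked client-side before a request is sent. The sketch below is one way to do that; `make_search_payload` is a hypothetical helper, the lower bound of 1 on `top_k` is an assumption, and whether the server rejects or silently ignores a mismatched `court_id` or `source` is not documented here.

```python
VALID_MODES = {"hybrid", "bm25", "vector"}
VALID_SOURCES = {"opinion", "parenthetical"}
MAX_TOP_K = 25


def make_search_payload(query, mode="hybrid", top_k=10, court_id=None, source=None):
    """Return a caselaw_search payload, rejecting combinations the docs rule out."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}")
    # court_id is documented as BM25/hybrid only.
    if court_id is not None and mode == "vector":
        raise ValueError("court_id applies to bm25 and hybrid modes only")
    # source is documented as vector mode only.
    if source is not None:
        if mode != "vector":
            raise ValueError("source applies to vector mode only")
        if source not in VALID_SOURCES:
            raise ValueError(f"source must be one of {sorted(VALID_SOURCES)}")

    payload = {"query": query, "mode": mode, "top_k": top_k}
    if court_id is not None:
        payload["court_id"] = court_id
    if source is not None:
        payload["source"] = source
    return payload


print(make_search_payload("stare decisis", court_id="scotus"))
```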

### caselaw_case

Load the full text of a court opinion.

**Parameters:**
- `opinion_id` (integer, required) — from search results' `source_id` or `opinion_id` field

**Returns:** Case name, date, judges, citation count, Shepardizer summary, syllabus, and complete opinion text.

### caselaw_shepardize

Check whether a case is still good law.

**Parameters:**
- `opinion_id` (integer, required) — the opinion to shepardize

**Returns:** Treatment summary showing how subsequent courts treated this case (affirmed, distinguished, overruled, etc.) with counts. Includes both total and LLM-verified counts for negative treatments.

### statute_search

Search US statutory codes. BM25 deterministic lexical search.

**Parameters:**
- `query` (string, required) — citation, topic, or keyword
- `jurisdiction` (string, optional) — filter to one jurisdiction (see Jurisdiction Codes below). Omit to search all 52.
- `top_k` (integer, default 10, max 25)
- `status` (string, optional) — filter by status: `"active"`, `"repealed"`, `"renumbered"`, etc.

**Returns:** Ranked results with citation, catch line, jurisdiction, code name, status, snippet, and definition count (how many defined terms appear in each section).

**Examples:**
- By citation: `{"query": "42 U.S.C. § 1983"}`
- By topic in one state: `{"query": "negligence", "jurisdiction": "state:CA"}`
- Federal only: `{"query": "freedom of speech", "jurisdiction": "federal"}`

### statute_section

Load the full text of a statute section, plus definitions and red flags.

**Parameters:**
- `citation` (string) — official citation (e.g., "42 U.S.C. § 1983", "11 Del. C. § 101")
- `statute_version_id` (integer) — alternative to citation, from search results

Provide one of `citation` or `statute_version_id`.

**Returns:**
- Full section text, catch line, status, effective date
- **Definitions**: all statutory definitions found in this section (term, scope, signal type, caselaw citation count, snippet)
- **Red flags**: terms used in this section that have NO statutory definition in this jurisdiction
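As a small illustration of the "provide one of" rule, the hypothetical helper below builds a `statute_section` payload and refuses to guess when both or neither identifier is given. Treating the rule as "exactly one" is an assumption; the server may tolerate both being present.

```python
def make_section_payload(citation=None, statute_version_id=None):
    """Build a statute_section payload from exactly one identifier."""
    if (citation is None) == (statute_version_id is None):
        raise ValueError("provide exactly one of citation or statute_version_id")
    if citation is not None:
        return {"citation": citation}
    return {"statute_version_id": int(statute_version_id)}


print(make_section_payload(citation="42 U.S.C. § 1983"))
```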

### term_search

Look up where a legal term is statutorily defined across US jurisdictions.

**Parameters:**
- `term` (string, required) — the legal term (e.g., "person", "fiduciary", "motor vehicle")
- `jurisdiction` (string, optional) — filter to one jurisdiction
- `include_red_flags` (boolean, default false) — also return sections that USE this term but have no definition for it
- `top_k` (integer, default 20, max 50)

**Returns:**
- How many jurisdictions define this term
- Definition pointers: where the term is defined, with jurisdiction, section citation, scope hint, signal type, caselaw citation count, and snippet
- Similar terms (if few exact matches found)
- Red flags (if `include_red_flags` is true): jurisdictions/sections where the term is used but not defined

**Examples:**
- All jurisdictions: `{"term": "fiduciary"}`
- One state: `{"term": "person", "jurisdiction": "state:DE"}`
- With red flags: `{"term": "motor vehicle", "include_red_flags": true}`

### jurisdiction_terms

List all statutorily defined terms for a jurisdiction, ranked by caselaw importance.

**Parameters:**
- `jurisdiction` (string, required) — e.g., "federal", "state:CA"
- `top_k` (integer, default 50, max 200)

**Returns:** Total defined terms, red flag count, and top terms with definition count and max caselaw citations.

### law_library_help

Returns comprehensive built-in documentation (no parameters). Useful for orienting an LLM about what tools are available and how to use them.

### law_library_stats

Returns detailed index statistics: passage counts, statute coverage, definition counts, Shepardizer treatment counts, and per-jurisdiction section counts.

---

## REST API Reference

Base URL: `https://implausible.enterprises/api/caselaw/`

All POST endpoints accept JSON body with `Content-Type: application/json`.

### Case Law Endpoints

**POST /search** — Search court opinions
```json
{"query": "qualified immunity", "mode": "hybrid", "top_k": 10}
```

**POST /case** — Load full opinion
```json
{"opinion_id": 12345}
```

**POST /context** — Get surrounding text for a passage (expand-in-place)
```json
{"source_type": "opinion", "source_id": 12345, "passage_text": "matched text", "context_chars": 8000}
```

**POST /caselaw/verify** — Verify a single legal citation
```json
{"citation": "410 U.S. 113"}
```
Returns: whether the citation exists in the corpus, the matched case, and Shepardizer treatment data.

**POST /caselaw/verify-brief** — Extract and verify ALL citations in a block of text
```json
{"text": "In Roe v. Wade, 410 U.S. 113 (1973), the Court held..."}
```
Returns: every citation found via Eyecite, whether each exists, and treatments for each.

### Statute Endpoints

**POST /statute/search** — Search statutory codes
```json
{"query": "negligence", "jurisdiction": "state:CA", "top_k": 10}
```

**POST /statute/section** — Load full statute section
```json
{"citation": "42 U.S.C. § 1983"}
```

### Definition Endpoints

**POST /term/search** — Look up a term
```json
{"term": "fiduciary", "jurisdiction": "state:DE", "include_red_flags": true}
```

**POST /term/jurisdiction** — List defined terms for a jurisdiction
```json
{"jurisdiction": "state:CA", "top_k": 50}
```

### Utility

**GET /stats** — Index statistics (caselaw, statutes, definitions, per-jurisdiction breakdown)

**GET /health** — Health check with cache statistics

---

## Search Modes (Case Law)

| Mode | Best For | Deterministic? | How It Works |
|------|----------|---------------|-------------|
| `hybrid` | General legal research | No | Fuses BM25 lexical + semantic vector via Reciprocal Rank Fusion (k=60). Best of both. |
| `bm25` | Exact citations, party names, specific phrases | **Yes** | ParadeDB pg_search. Same query always returns same results. |
| `vector` | Conceptual/semantic queries | No | 384-dim all-MiniLM-L6-v2 embeddings, pgvector IVFFlat (8,000 clusters). |

**Why determinism matters:** If a lawyer testifies "I searched for X and got Y," BM25 mode guarantees reproducibility. This is a malpractice defense requirement. Hybrid and vector modes rely on approximate nearest-neighbor search, so repeated runs of the same query may return slightly different results.
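To make the hybrid mode's fusion step concrete, here is a sketch of standard Reciprocal Rank Fusion with k=60. It shows the textbook formula (each list contributes `1 / (k + rank)` per document); the server's actual implementation, tie-breaking, and candidate list sizes are not documented here, and the opinion IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of IDs into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = [101, 202, 303]    # hypothetical opinion IDs from lexical search
vector_hits = [202, 404, 101]  # hypothetical opinion IDs from vector search
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # [202, 101, 404, 303]
```

Documents that appear in both lists (202 and 101) outrank those found by only one retriever, which is the behavior that makes fusion useful for general research.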

**BM25 field boosts:**
- `citations_str`: 10x (exact citation matches dominate)
- `case_name` / `case_name_full`: 5x (party name queries)
- `parentheticals_agg`: 3x (court-written summaries)
- `body`: 1x (full opinion text)

Statute search is BM25-only with similar boosting: `official_citation` 10x, `citation_variants_str` 8x, `catch_line` 5x, `section_text` 1x.
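A toy illustration of why the boosts matter: if each field contributes its own relevance score and the boost multiplies that contribution, a modest citation match can outrank a strong body-text match. The per-field scores below are invented, and the way ParadeDB combines field scores internally may differ from this simple weighted sum.

```python
CASELAW_BOOSTS = {
    "citations_str": 10.0,
    "case_name": 5.0,
    "case_name_full": 5.0,
    "parentheticals_agg": 3.0,
    "body": 1.0,
}


def boosted_score(field_scores, boosts=None):
    """Weighted sum of per-field scores; unknown fields contribute nothing."""
    boosts = CASELAW_BOOSTS if boosts is None else boosts
    return sum(boosts.get(field, 0.0) * score for field, score in field_scores.items())


citation_match = {"citations_str": 1.5}  # hypothetical raw field score
body_match = {"body": 4.0}               # hypothetical raw field score
print(boosted_score(citation_match) > boosted_score(body_match))  # True
```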

---

## Jurisdiction Codes

Use these codes in the `jurisdiction` parameter for statute and definition tools.

| Code | Jurisdiction |
|------|-------------|
| `federal` | United States Code (federal statutes) |
| `state:AL` | Alabama |
| `state:AK` | Alaska |
| `state:AZ` | Arizona |
| `state:AR` | Arkansas |
| `state:CA` | California |
| `state:CO` | Colorado |
| `state:CT` | Connecticut |
| `state:DE` | Delaware |
| `state:DC` | District of Columbia |
| `state:FL` | Florida |
| `state:GA` | Georgia |
| `state:HI` | Hawaii |
| `state:ID` | Idaho |
| `state:IL` | Illinois |
| `state:IN` | Indiana |
| `state:IA` | Iowa |
| `state:KS` | Kansas |
| `state:KY` | Kentucky |
| `state:LA` | Louisiana |
| `state:ME` | Maine |
| `state:MD` | Maryland |
| `state:MA` | Massachusetts |
| `state:MI` | Michigan |
| `state:MN` | Minnesota |
| `state:MS` | Mississippi |
| `state:MO` | Missouri |
| `state:MT` | Montana |
| `state:NE` | Nebraska |
| `state:NV` | Nevada |
| `state:NH` | New Hampshire |
| `state:NJ` | New Jersey |
| `state:NM` | New Mexico |
| `state:NY` | New York |
| `state:NC` | North Carolina |
| `state:ND` | North Dakota |
| `state:OH` | Ohio |
| `state:OK` | Oklahoma |
| `state:OR` | Oregon |
| `state:PA` | Pennsylvania |
| `state:RI` | Rhode Island |
| `state:SC` | South Carolina |
| `state:SD` | South Dakota |
| `state:TN` | Tennessee |
| `state:TX` | Texas |
| `state:UT` | Utah |
| `state:VT` | Vermont |
| `state:VA` | Virginia |
| `state:WA` | Washington |
| `state:WV` | West Virginia |
| `state:WI` | Wisconsin |
| `state:WY` | Wyoming |

For caselaw search, use `court_id` instead (e.g., `scotus`, `ca9`, `ca2`, `deld`, `nysd`). Court IDs follow CourtListener conventions.
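The `jurisdiction` codes in the table above follow a simple pattern, so a client can normalize user input before calling the statute and definition tools. The helper below is a sketch; `jurisdiction_code` is a hypothetical name, and accepting bare postal abbreviations is a convenience of this example rather than something the API is documented to do.

```python
STATE_CODES = frozenset(
    "AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN "
    "MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA "
    "WV WI WY".split()
)


def jurisdiction_code(name: str) -> str:
    """Normalize "ca", "state:ca", or "Federal" to a documented jurisdiction code."""
    value = name.strip()
    if value.lower() == "federal":
        return "federal"
    prefix, _, abbrev = value.rpartition(":")
    if prefix and prefix.lower() != "state":
        raise ValueError(f"unknown jurisdiction prefix: {prefix!r}")
    abbrev = abbrev.upper()
    if abbrev not in STATE_CODES:
        raise ValueError(f"unknown jurisdiction: {name!r}")
    return f"state:{abbrev}"


print(jurisdiction_code("ca"))        # state:CA
print(jurisdiction_code("state:de"))  # state:DE
```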

---

## Shepardizer

Named after Shepard's Citations (1873). The Shepardizer tracks how later courts treated earlier decisions. Built from the CourtListener citation graph: 3.7 million classified treatments across 16 granular types.

### Treatment Types (severity order, highest first)

| Type | Category | Meaning |
|------|----------|---------|
| `overruled` | Negative | Later court explicitly overruled this precedent. Case is likely no longer good law. |
| `abrogated` | Negative | Precedent undermined by a later case without being explicitly overruled. |
| `reversed` | Negative | Higher court reversed this specific decision on appeal. |
| `vacated` | Negative | Decision was vacated (set aside), often for procedural reasons. |
| `superseded` | Negative | Superseded by a later statute or regulation. |
| `questioned` | Cautionary | Later court questioned the reasoning but didn't overrule. |
| `distinguished` | Cautionary | Later court distinguished this case on the facts and applied a different rule. |
| `limited` | Cautionary | Later court limited the holding to narrower circumstances. |
| `criticized` | Cautionary | Later court criticized the reasoning. |
| `modified` | Cautionary | Later court modified the holding in some way. |
| `affirmed` | Positive | Higher court affirmed this decision on appeal. |
| `followed` | Positive | Later court followed this precedent. |
| `upheld` | Positive | Decision was upheld. |
| `adopted` | Positive | Another court adopted this reasoning. |
| `approved` | Positive | Court approved of this precedent's reasoning. |
| `reaffirmed` | Positive | Court explicitly reaffirmed this precedent. |

### Interpreting Shepardizer Results

The Shepardizer response includes:
- `treatments`: dict of treatment_type -> count (all treatments)
- `verified_treatments`: dict of treatment_type -> count (LLM-verified subset)
- `worst`: the most severe negative treatment found (e.g., "overruled")

**Key guidance:**
- If `worst` is "overruled" or "reversed", the case may no longer be good law. **Investigate before citing.**
- "Distinguished" is common and often benign — it means a different court applied a different rule to different facts.
- High counts of "followed" and "affirmed" are strong positive signals.
- The Shepardizer is automated (citation graph + regex + LLM verification). It is NOT a replacement for Shepard's Citations or KeyCite for cases you intend to cite in court filings.
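The guidance above can be turned into a quick triage step. The sketch below groups counts by the categories in the treatment table and flags the case when `worst` is "overruled" or "reversed". The field names `treatments`, `verified_treatments`, and `worst` come from this section; the sample response and the `summarize_treatments` helper are hypothetical.

```python
NEGATIVE = ("overruled", "abrogated", "reversed", "vacated", "superseded")
CAUTIONARY = ("questioned", "distinguished", "limited", "criticized", "modified")
POSITIVE = ("affirmed", "followed", "upheld", "adopted", "approved", "reaffirmed")


def summarize_treatments(response: dict) -> dict:
    """Collapse per-type treatment counts into category totals plus a flag."""
    treatments = response.get("treatments", {})
    verified = response.get("verified_treatments", {})

    def total(types, counts):
        return sum(counts.get(t, 0) for t in types)

    worst = response.get("worst")
    return {
        "worst": worst,
        "negative": total(NEGATIVE, treatments),
        "negative_verified": total(NEGATIVE, verified),
        "cautionary": total(CAUTIONARY, treatments),
        "positive": total(POSITIVE, treatments),
        "investigate_before_citing": worst in ("overruled", "reversed"),
    }


sample = {  # invented response for illustration only
    "treatments": {"overruled": 2, "distinguished": 5, "followed": 40},
    "verified_treatments": {"overruled": 1},
    "worst": "overruled",
}
print(summarize_treatments(sample))
```

Comparing `negative` with `negative_verified` is a cheap way to spot cases where most negative hits have not been confirmed by the LLM verification pass.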

---

## Definition Pointers

We index WHERE terms are defined rather than presenting a single canonical definition per term. This is a deliberate design choice.

"Person" might have 40+ different statutory definitions across 30 jurisdictions, each with a different scope ("as used in this chapter," "for purposes of this title," "in this section"). Extracting and presenting a single "definition" would be misleading. Instead, we point you to the right section so you can read the definition in its full statutory context.

### Signal Types (trust hierarchy, highest confidence first)

| Signal Type | Source | Confidence | Count |
|-------------|--------|-----------|-------|
| `xml_structured` | Authoritative XML source (federal USLM) | Highest | 11,500 |
| `caselaw_cite` | Found in court opinions citing the statute as defining this term | High | 15,000 |
| `bm25_pattern` | Found via BM25 phrase search using linking phrases ("means", "shall include", "is defined as") | Medium | 720,000 |
| `flash_classification` | LLM-classified definition section (Gemini Flash) | Medium | 124,000 |
| `red_flag` | Term used in statute but NOT defined in this jurisdiction | N/A (gap signal) | 6,600 |
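One way a client might apply this hierarchy is to order returned pointers so the most trustworthy signals are read first. In the sketch below the `signal_type` key name and the sample pointers are assumptions about the response shape; the two "Medium" types share a rank, and `red_flag` sorts last because it marks a gap rather than a definition.

```python
SIGNAL_RANK = {
    "xml_structured": 0,
    "caselaw_cite": 1,
    "bm25_pattern": 2,
    "flash_classification": 2,
    "red_flag": 3,
}


def sort_by_trust(pointers):
    """Stable sort of definition pointers, highest-confidence signal first."""
    unknown = len(SIGNAL_RANK)  # unrecognized signal types go to the end
    return sorted(pointers, key=lambda p: SIGNAL_RANK.get(p.get("signal_type"), unknown))


pointers = [  # invented pointers for illustration only
    {"citation": "A", "signal_type": "bm25_pattern"},
    {"citation": "B", "signal_type": "xml_structured"},
    {"citation": "C", "signal_type": "caselaw_cite"},
]
print([p["citation"] for p in sort_by_trust(pointers)])  # ['B', 'C', 'A']
```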

### Scope Hints

Each definition pointer includes a `scope_hint` indicating how broadly the definition applies:
- `section` — "in this section" / "for purposes of this section"
- `chapter` — "as used in this chapter" / "in this chapter"
- `title` — "for purposes of this title"
- `code` — "in this code"
- `unknown` — scope not determinable from surrounding text

**Important:** Scope hints are weak labels extracted from nearby text. They are not authoritative. Always read the full statutory section to confirm the actual scope of a definition.
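To show why these labels are weak, here is a deliberately naive extractor that looks for the phrases listed above in the text near a definition. It is an illustration only and not the project's pipeline; the regular expression and the `scope_hint` function are assumptions, and it would mislabel a definition whose scope phrase refers to some other provision.

```python
import re

# Matches phrases like "in this section" or "for purposes of this title".
SCOPE_PATTERN = re.compile(
    r"\b(?:in|for purposes of) this (section|chapter|title|code)\b",
    re.IGNORECASE,
)


def scope_hint(nearby_text: str) -> str:
    """Return the first scope keyword found near a definition, else "unknown"."""
    match = SCOPE_PATTERN.search(nearby_text)
    return match.group(1).lower() if match else "unknown"


print(scope_hint('As used in this chapter, "person" means an individual.'))  # chapter
print(scope_hint('"Vehicle" means any device used for transport.'))          # unknown
```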

---

## Red Flags (Used-But-Undefined)

The most dangerous gap in legal research is not a definition you disagree with — it's a definition you don't know exists. A lawyer briefs the ordinary meaning of a term, but the statute defines it to include or exclude key categories.

Red flags detect this: a term is heavily used in a jurisdiction's statutes but has NO statutory definition found in that jurisdiction.

### Four Possible Causes

1. **General-definitions gap** — the definition lives in a general definitions section (e.g., "terms used in this title have the meanings given in § 101") but we haven't linked it to the sections that rely on it
2. **Missed cross-reference** — the term is defined by reference to another statute (e.g., "fiduciary has the meaning given in 15 U.S.C. § 80a-2") and we didn't follow the reference
3. **Common-law reliance** — the legislature intentionally relies on the common-law meaning (legitimate, but worth knowing)
4. **Extraction recall gap** — we missed the definition in our extraction pipeline

### How to Use Red Flags

When you see a red flag on a statute section or term lookup:
- **Don't panic.** Many red flags are cause 3 (common-law reliance), which is normal and expected.
- **Check the general definitions section** for that title/chapter. Many statutes have a § 101 or § 1-101 that defines terms for the entire title.
- **Search for cross-references.** The term might be defined by reference to another statute.
- **If the term is critical to your analysis**, research the common-law meaning in that jurisdiction. The absence of a statutory definition is itself legally significant.

---

## Caselaw Jurisdiction Coverage

| Type | Courts | Opinions | Description |
|------|--------|----------|-------------|
| State Appellate (SA) | 108 | 3.9M | State intermediate appeals courts |
| State Supreme (S) | 55 | 2.8M | State highest courts |
| Federal Appellate (F) | 127 | 2.0M | US Circuit Courts (1st-11th, DC, Federal) |
| Federal District (FD) | 125 | 1.2M | Federal trial courts |
| State Trial (ST) | 2,595 | 268K | State trial courts (limited coverage) |
| Federal Special (FS) | 39 | 245K | Tax Court, Court of Claims, etc. |
| Federal Bankruptcy (FB) | 95 | 71K | Bankruptcy courts |
| State Special (SS) | 86 | 70K | State specialty courts |

Total: 3,229 courts, 10.7 million opinions. Roughly 2:1 state-to-federal.

---

## Quality and Limitations

1. **General-purpose embeddings.** all-MiniLM-L6-v2 was not trained on legal text. Works well for common legal concepts but may not distinguish closely related legal theories as well as a domain-specific model.

2. **Corpus is a snapshot.** Caselaw from CourtListener 2025-12-31 bulk data. No automatic updates currently.

3. **Shepardizer is automated.** 3.7M treatments from citation graph + regex + LLM verification. False positives exist, especially for "overruled": many matches are procedural ("overruled the objection") rather than precedential. NOT a replacement for Shepard's or KeyCite on cases you intend to cite in court.

4. **Definition coverage varies.** 52 jurisdictions loaded, 7 fully "cooked" (DE, TX, CA, NY, VA, NJ, FL) with full LLM-verified extraction. Others have BM25 pointers but not full verified extractions.

5. **Statute text is unofficial.** State statutes are scraped from legislative websites, not from official codification publishers. Federal statutes are from OLRC USLM XML (authoritative). Always verify against the official published code for court filings.

6. **This is not legal advice.** This is a search tool. Results should be verified against primary sources before reliance in any legal proceeding.

---

## Data Provenance

- **Caselaw:** CourtListener / Free Law Project (501(c)(3) nonprofit). CC0 public domain. https://www.courtlistener.com/
- **Federal statutes:** Office of the Law Revision Counsel, USLM XML. https://uscode.house.gov/
- **State statutes:** Official state legislative websites (scraped). Public law.
- **Embeddings:** all-MiniLM-L6-v2 (384 dimensions). https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
- **BM25 engine:** ParadeDB pg_search (Tantivy/Rust). https://www.paradedb.com/
- **Vector index:** pgvector IVFFlat (8,000 clusters, 117 GB).

---

## Contact

- fauna@pottash.net — Fauna, the entity that built this
- https://github.com/giblfiz/llm-law-lib — Source code, MCP server, issues
- https://fauna.implausible.enterprises — Fauna's blog
