# OpenContracts - Full MCP Documentation

> OpenContracts is an open-source document analytics platform for analyzing, annotating, and querying complex documents. It provides a Model Context Protocol (MCP) server for AI agent access to public data.

## MCP Server Overview

OpenContracts exposes a read-only MCP server so that AI assistants can access public corpuses, documents, annotations, and discussion threads without authentication.

- Global endpoint: http://portal.glitnirhealth.com/mcp/
- Corpus-scoped endpoint: http://portal.glitnirhealth.com/mcp/corpus/{corpus_slug}/
- Protocol: JSON-RPC 2.0 (MCP specification 2025-03-26)
- Transport: Streamable HTTP (recommended), SSE (deprecated)
- Authentication: None required (public data only)
- Rate limit: 100 requests/minute per IP
- Security: Read-only, slug-based identifiers, no internal IDs exposed
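
Clients that issue many requests may want to stay under the 100 requests/minute limit on their own side. The following is a minimal sketch of a sliding-window throttle; the class name and the windowing strategy are illustrative assumptions, and the server's actual rate-limiting algorithm is not described here.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_calls` per `window` seconds.

    Hypothetical helper; it mirrors the documented limit but does not
    claim to match how the server counts requests.
    """

    def __init__(self, max_calls=100, window=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.calls = deque()  # timestamps of recently recorded requests

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Note that a request was just sent."""
        self.calls.append(self.clock())
```

Before each request, sleep for `wait_time()` seconds, send the request, then call `record()`.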

## Connecting

### Claude Desktop (Global Access)

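Add to the Claude Desktop configuration file (`~/.config/Claude/claude_desktop_config.json` on Linux, `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, `%APPDATA%\Claude\claude_desktop_config.json` on Windows):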

```json
{
  "mcpServers": {
    "opencontracts": {
      "command": "npx",
      "args": ["mcp-remote", "http://portal.glitnirhealth.com/mcp/"]
    }
  }
}
```

### Claude Desktop (Corpus-Scoped)

Replace `MY_CORPUS_SLUG` with the slug of the public corpus to expose:

```json
{
  "mcpServers": {
    "my-corpus": {
      "command": "npx",
      "args": ["mcp-remote", "http://portal.glitnirhealth.com/mcp/corpus/MY_CORPUS_SLUG/"]
    }
  }
}
```

### Direct HTTP (curl)

```bash
curl -X POST http://portal.glitnirhealth.com/mcp/ \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc": "2.0", "method": "tools/list", "id": 1}'
```
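
The same request can be assembled with the Python standard library. This sketch only builds the request object and does not send it; the helper name `build_rpc_request` is an illustrative assumption, not part of OpenContracts.

```python
import json
import urllib.request

MCP_URL = "http://portal.glitnirhealth.com/mcp/"


def build_rpc_request(method, params=None, request_id=1, url=MCP_URL):
    """Build (but do not send) a JSON-RPC 2.0 POST request."""
    payload = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


req = build_rpc_request("tools/list")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it. Depending on the server, the response body may be plain JSON or a `text/event-stream` payload, so a real client should check the response `Content-Type` before decoding.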

## Tools Reference

### list_public_corpuses

List public corpuses visible to anonymous users.

Parameters:
- limit (int, default 20, max 100): Number of results
- offset (int, default 0): Pagination offset
- search (string, optional): Filter by title or description

Returns: { total_count, corpuses: [{ slug, title, description, document_count, created }] }

Example request:
```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "list_public_corpuses",
    "arguments": { "limit": 10 }
  },
  "id": 1
}
```
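
Because results are paginated with `limit` and `offset`, listing every corpus takes a loop. A minimal sketch, assuming a hypothetical `call_tool(name, arguments)` callable that performs the `tools/call` request and returns the decoded return object as a dict:

```python
def iter_corpuses(call_tool, page_size=20):
    """Yield every public corpus by advancing `offset` until `total_count`.

    `call_tool` is a hypothetical transport function supplied by the caller.
    """
    offset = 0
    while True:
        page = call_tool(
            "list_public_corpuses", {"limit": page_size, "offset": offset}
        )
        corpuses = page["corpuses"]
        yield from corpuses
        offset += len(corpuses)
        # Stop on an empty page as well, in case total_count changes mid-walk.
        if not corpuses or offset >= page["total_count"]:
            break
```

The empty-page check is a defensive choice: it prevents an endless loop if corpuses are removed while the client is paging.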

### list_documents

List documents in a public corpus.

Parameters:
- corpus_slug (string, required): Corpus identifier
- limit (int, default 50, max 100): Number of results
- offset (int, default 0): Pagination offset
- search (string, optional): Filter by title or description

Returns: { total_count, documents: [{ slug, title, description, file_type, page_count, created }] }

### get_document_text

Retrieve full extracted text from a document.

Parameters:
- corpus_slug (string, required): Corpus identifier
- document_slug (string, required): Document identifier

Returns: { document_slug, page_count, text }
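
Return objects like the one above arrive wrapped in a JSON-RPC response. The sketch below unwraps one under two assumptions: that the response follows the MCP `tools/call` result shape (`result.content` as a list of typed items), and that this server JSON-encodes the return object into the first text item. The second assumption is not confirmed by this documentation, and `decode_tool_result` is a hypothetical helper.

```python
import json


def decode_tool_result(response):
    """Extract a tool's return object from a decoded JSON-RPC response."""
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    # Assumption: the first text item carries the JSON-encoded return object.
    for item in result["content"]:
        if item.get("type") == "text":
            return json.loads(item["text"])
    raise ValueError("no text content in tool result")


# Made-up response for illustration only.
sample = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [
            {
                "type": "text",
                "text": json.dumps(
                    {"document_slug": "contract-2024", "page_count": 3, "text": "..."}
                ),
            }
        ]
    },
}
doc = decode_tool_result(sample)
print(doc["document_slug"], doc["page_count"])
```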

### list_annotations

List annotations on a document with optional filtering.

Parameters:
- corpus_slug (string, required): Corpus identifier
- document_slug (string, required): Document identifier
- page (int, optional): Filter by page number
- label_text (string, optional): Filter by label text
- limit (int, default 100, max 100): Number of results
- offset (int, default 0): Pagination offset

Returns: { total_count, annotations: [{ id, page, raw_text, annotation_label: { text, color, label_type }, structural, created }] }
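
Once decoded, the `annotations` list can be regrouped on the client. A small sketch that collects `raw_text` by label text, using the field names from the return shape above; the helper name is an illustrative assumption.

```python
from collections import defaultdict


def group_by_label(annotations):
    """Map each annotation label's text to the raw_text values tagged with it."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[ann["annotation_label"]["text"]].append(ann["raw_text"])
    return dict(groups)
```

Grouping client-side complements the `label_text` filter: the filter narrows a request to one label, while this helper summarizes a page that spans several labels.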

### search_corpus

Semantic vector search within a corpus. Falls back to text search if embeddings are unavailable.

Parameters:
- corpus_slug (string, required): Corpus identifier
- query (string, required): Search query text
- limit (int, default 10, max 50): Number of results

Returns: { query, results: [{ type, slug, title, similarity_score }] }
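
Search results can be post-filtered by `similarity_score` on the client. This is a sketch only: the scale of the score and its value when the text-search fallback is used are not documented here, so the `0.5` threshold and the handling of missing scores are assumptions.

```python
def top_hits(search_result, min_score=0.5):
    """Return results scoring at least `min_score`, best first.

    Results without a numeric similarity_score are skipped (assumption:
    the text-search fallback may not provide one).
    """
    scored = [
        r
        for r in search_result["results"]
        if r.get("similarity_score") is not None
        and r["similarity_score"] >= min_score
    ]
    return sorted(scored, key=lambda r: r["similarity_score"], reverse=True)
```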

### list_threads

List discussion threads in a corpus or document.

Parameters:
- corpus_slug (string, required): Corpus identifier
- document_slug (string, optional): Filter to a specific document
- limit (int, default 20, max 100): Number of results
- offset (int, default 0): Pagination offset

Returns: { total_count, threads: [{ id, title, message_count, is_pinned, is_locked }] }
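
The `limit` defaults and maxima differ per tool. A client can keep them in one table and normalize values before sending a request. This sketch copies the numbers from the parameter lists above; the lower bound of 1 and the helper name are assumptions, and the server may clamp or reject out-of-range values on its own.

```python
# (default, max) for each tool's `limit` parameter, as documented above.
LIMITS = {
    "list_public_corpuses": (20, 100),
    "list_documents": (50, 100),
    "list_annotations": (100, 100),
    "search_corpus": (10, 50),
    "list_threads": (20, 100),
}


def normalize_limit(tool, limit=None):
    """Return the documented default, or `limit` clamped to [1, max]."""
    default, maximum = LIMITS[tool]
    if limit is None:
        return default
    return max(1, min(int(limit), maximum))
```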

### get_thread_messages

Retrieve all messages in a thread.

Parameters:
- corpus_slug (string, required): Corpus identifier
- thread_id (int, required): Thread identifier
- flatten (bool, default false): Return flat list instead of tree

Returns: { thread_id, title, messages: [{ id, content, author, created_at, replies? }] }
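
When the tree form is returned, nested `replies` can be walked depth-first on the client. A minimal sketch, assuming each message optionally carries a `replies` list of the same shape; the ordering produced by the server's own `flatten` option is not documented here and may differ.

```python
def flatten_messages(messages, depth=0):
    """Yield (depth, message) pairs in depth-first, parent-before-child order."""
    for msg in messages:
        yield depth, msg
        # `replies` is optional, so treat a missing or null value as empty.
        yield from flatten_messages(msg.get("replies") or [], depth + 1)
```

The `depth` value makes it easy to render an indented transcript, for example `"  " * depth + msg["content"]`.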

## Resources Reference

Resources use URI patterns for direct content access via the `resources/read` method.

### corpus://{corpus_slug}

Corpus metadata including title, description, document count, label set, and timestamps.

### document://{corpus_slug}/{document_slug}

Document metadata and full extracted text. Returns JSON with the following fields:

- slug
- title
- description
- file_type
- page_count
- text_preview: first 500 characters of extracted text
- full_text: complete extracted text
- created: ISO 8601 timestamp
- corpus: corpus slug

The text_preview field is useful for quick inspection without consuming the full text, which can be large.

### annotation://{corpus_slug}/{document_slug}/{annotation_id}

Annotation details including raw text, label, page number, bounding box coordinates, and created timestamp.

### thread://{corpus_slug}/threads/{thread_id}

Discussion thread with hierarchical message tree.

Example:
```json
{
  "jsonrpc": "2.0",
  "method": "resources/read",
  "params": { "uri": "document://my-corpus/contract-2024" },
  "id": 1
}
```
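
The URI patterns above are simple enough to build with string formatting. A sketch with hypothetical helper names; it assumes slugs are already URL-safe and performs no escaping.

```python
def corpus_uri(corpus_slug):
    return f"corpus://{corpus_slug}"


def document_uri(corpus_slug, document_slug):
    return f"document://{corpus_slug}/{document_slug}"


def annotation_uri(corpus_slug, document_slug, annotation_id):
    return f"annotation://{corpus_slug}/{document_slug}/{annotation_id}"


def thread_uri(corpus_slug, thread_id):
    return f"thread://{corpus_slug}/threads/{thread_id}"


def read_request(uri, request_id=1):
    """Build the JSON-RPC payload for a resources/read call."""
    return {
        "jsonrpc": "2.0",
        "method": "resources/read",
        "params": {"uri": uri},
        "id": request_id,
    }


print(read_request(document_uri("my-corpus", "contract-2024")))
```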

## REST Search API

A lightweight JSON search endpoint is available at `http://portal.glitnirhealth.com/api/search/` for crawlers and integrations that prefer a simple HTTP GET over GraphQL or MCP.

### GET /api/search/

Parameters (query string):
- q (string, required): Search query text
- corpus (string, optional): Corpus slug to scope the search
- limit (int, optional, default 10, max 50): Number of results

Example:
```bash
curl 'http://portal.glitnirhealth.com/api/search/?q=indemnification&corpus=my-corpus&limit=5'
```

Returns: { query, corpus?, results: [{ type, slug, title, description, similarity_score }] }

When a corpus is specified, semantic vector search is attempted first and falls back to text matching. Without a corpus, the endpoint searches across all public corpus titles/descriptions and document titles/descriptions.
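
Query strings should be URL-encoded, which matters as soon as the query contains spaces or punctuation. A sketch using `urllib.parse.urlencode`; the helper name `search_url` is an illustrative assumption.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://portal.glitnirhealth.com/api/search/"


def search_url(q, corpus=None, limit=None):
    """Build a GET URL for the search endpoint, omitting unset options."""
    if not q:
        raise ValueError("q is required")
    params = {"q": q}
    if corpus is not None:
        params["corpus"] = corpus
    if limit is not None:
        params["limit"] = int(limit)
    return f"{SEARCH_URL}?{urlencode(params)}"


print(search_url("indemnification", corpus="my-corpus", limit=5))
```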

## Corpus-Scoped Endpoints

When using `http://portal.glitnirhealth.com/mcp/corpus/{corpus_slug}/`, the `corpus_slug` parameter is automatically injected into all tool calls. The `list_public_corpuses` tool is replaced by `get_corpus_info`, which returns detailed information about the scoped corpus.

Scoped endpoints are convenient for sharing: the URL carries the corpus context, so collaborators do not need to supply the corpus slug in their tool calls.
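
A client that switches between the global and scoped endpoints can derive the scoped URL and strip `corpus_slug` from its arguments. This is a sketch with hypothetical helper names; whether the scoped server ignores or rejects an explicit `corpus_slug` is not stated here, so dropping it is a cautious assumption.

```python
BASE_URL = "http://portal.glitnirhealth.com/mcp/"


def scoped_endpoint(corpus_slug):
    """Return the corpus-scoped endpoint URL for a slug."""
    return f"{BASE_URL}corpus/{corpus_slug}/"


def scoped_arguments(arguments):
    """Copy tool arguments without corpus_slug, which the endpoint injects."""
    return {k: v for k, v in arguments.items() if k != "corpus_slug"}


args = {"corpus_slug": "my-corpus", "query": "indemnification", "limit": 5}
print(scoped_endpoint(args["corpus_slug"]))
print(scoped_arguments(args))
```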

## Architecture

```
MCP Client  <--JSON-RPC 2.0-->  ASGI Router (/mcp/*)
                                     |
                            +--------+--------+
                            |                 |
                    Global Server     Corpus-Scoped Server
                    (all corpuses)    (single corpus, cached)
                            |                 |
                            +--------+--------+
                                     |
                              Django ORM
                          visible_to_user()
                           (AnonymousUser)
```

## Links

- [Source code](https://github.com/Open-Source-Legal/OpenContracts)
- [Project site](https://contracts.opensource.legal)
- [MCP specification](https://modelcontextprotocol.io)