# 📦 MVP Patent Ingestion Web Portal

This document outlines the architecture and design of a minimal web-based interface for ingesting, validating, and inspecting folders of patent files. The tool is intended to be hosted on a local or remote Linux server and to provide visibility into the ingestion results for large sets of manually collected patent files.

---

## 🧩 Components

| Layer    | Tech                   | Purpose                                    |
|----------|------------------------|--------------------------------------------|
| Frontend | HTML + JS              | Upload interface, feedback display         |
| Backend  | Python (Flask/FastAPI) | Handle uploads, run ingestion, serve pages |
| Storage  | Filesystem             | Save uploaded and processed data           |
| Viewer   | HTML templates         | Display patent data and validation results |

---

## 🛠️ Step-by-Step Architecture and Flow

### 1. Web Portal (Frontend)

A simple HTML form with:

  1. File/folder upload (drag-and-drop or browse)
  2. Submit/upload button

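```html
<form action="/upload" method="post" enctype="multipart/form-data">
  <!-- webkitdirectory turns the picker into a folder picker: each file in the
       chosen folder is sent as a separate "patent_zip" part, not as a .zip -->
  <input type="file" name="patent_zip" webkitdirectory directory multiple />
  <button type="submit">Upload Patent Folder</button>
</form>
```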

---

### 2. Backend (Python Flask or FastAPI)

#### Key Endpoints

| Route                | Description                                          |
|----------------------|------------------------------------------------------|
| `GET /`              | Homepage with upload form                            |
| `POST /upload`       | Accepts a `.zip` or folder, unpacks it, runs validation |
| `GET /patents`       | Lists ingested patents                               |
| `GET /patents/<id>`  | Displays structured data for a specific patent       |
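
Because the `<id>` in `GET /patents/<id>` ends up as a directory name under `/data/patents/`, the route should not hand it to the filesystem unchecked. A minimal sketch, assuming IDs always look like `US_10101845_B2` (country code, number, kind code); the regex and the name `resolve_patent_dir` are hypothetical and should be widened if real IDs differ:

```python
import re
from pathlib import Path

# Hypothetical ID shape: two-letter country, number, kind code (e.g. US_10101845_B2).
# Only letters, digits and underscores pass, so "/" and ".." never reach the filesystem.
PATENT_ID_RE = re.compile(r"[A-Z]{2}_[0-9A-Z]+_[A-Z][0-9]?")


def resolve_patent_dir(data_root: Path, patent_id: str) -> Path:
    """Map the <id> from GET /patents/<id> to <data_root>/patents/<id>, or raise ValueError."""
    if not PATENT_ID_RE.fullmatch(patent_id):
        raise ValueError(f"invalid patent id: {patent_id!r}")
    patents_root = (data_root / "patents").resolve()
    patent_dir = (patents_root / patent_id).resolve()
    # Defence in depth: the resolved path must still sit under data_root/patents.
    if not patent_dir.is_relative_to(patents_root):
        raise ValueError(f"path escapes data root: {patent_id!r}")
    return patent_dir
```

The route would call this first and answer 404 on `ValueError` or when the returned folder does not exist.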

#### `/upload` Flow

1. Save the uploaded `.zip` to `/uploads`.
2. Unpack it into `/data/patents/[patent_id]/`.
3. Run the ingestion script on the folder:
   1. Validate filenames and content
   2. Normalize structure
   3. Extract structured JSON
   4. Log errors/warnings
4. Save results to `/data/patents/[patent_id]/result.json`.
5. Redirect to the detail view.
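
The `/upload` steps above could be wired together roughly as follows. This sketch uses only the standard library so it stays framework-neutral; the name `handle_upload`, the rule that the patent ID is the zip's file name, and passing `validate_folder` in as a callable are all assumptions. A Flask or FastAPI route would save the upload to `/uploads`, call this, and then redirect. The folder (non-zip) upload from the form above would need its own step that writes the individual files; that case is not covered here.

```python
import json
import shutil
import zipfile
from pathlib import Path
from typing import Callable


def handle_upload(zip_path: Path, data_root: Path,
                  validate_folder: Callable[[Path], dict]) -> Path:
    """Steps 2-4 of the /upload flow for a zip already saved under /uploads.

    Hypothetical convention: the patent ID is the zip name without ".zip",
    so uploads/US_10101845_B2.zip is unpacked into data/patents/US_10101845_B2/.
    """
    patent_id = zip_path.stem
    if not patent_id.replace("_", "").isalnum():
        raise ValueError(f"cannot derive a patent id from {zip_path.name!r}")
    target = data_root / "patents" / patent_id
    if target.exists():
        shutil.rmtree(target)  # a re-upload replaces the previous ingestion
    target.mkdir(parents=True)

    # Step 2: unpack, rejecting entries that would land outside target ("zip slip").
    with zipfile.ZipFile(zip_path) as zf:
        root = target.resolve()
        for name in zf.namelist():
            if not (target / name).resolve().is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {name!r}")
        zf.extractall(target)

    # Step 3: run the ingestion script (validate, normalize, extract, log).
    result = validate_folder(target)

    # Step 4: store the structured result next to the source files.
    (target / "result.json").write_text(json.dumps(result, indent=2), encoding="utf-8")
    return target  # step 5 (redirect to /patents/<id>) belongs to the web route
```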

---

## 📂 Storage Layout

```
/uploads/
    [raw uploaded folders or zips]

/data/patents/
    US_10101845_B2/
        bibliographic_data.md
        description.md
        claims.md
        legal_events.md
        patent_family.md
        original.pdf
        result.json
        log.txt

/data/index.json   (optional)
```
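
The optional `/data/index.json` could be generated by scanning this layout. A minimal sketch; `build_index`, the entry fields, and the assumption that each `result.json` carries a top-level `status` field are hypothetical conventions, not something the layout defines:

```python
import json
from pathlib import Path


def build_index(data_root: Path) -> list[dict]:
    """Scan <data_root>/patents/*/ and write a summary to <data_root>/index.json.

    Assumption (hypothetical): result.json has a top-level "status" field.
    Folders with no result.json yet are reported as "pending".
    """
    patents_dir = data_root / "patents"
    folders = sorted(p for p in patents_dir.iterdir() if p.is_dir()) if patents_dir.is_dir() else []
    entries = []
    for folder in folders:
        result_file = folder / "result.json"
        status = "pending"
        if result_file.is_file():
            result = json.loads(result_file.read_text(encoding="utf-8"))
            status = result.get("status", "unknown")
        entries.append({
            "id": folder.name,
            "status": status,
            "files": sorted(f.name for f in folder.iterdir() if f.is_file()),
        })
    (data_root / "index.json").write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```

Rebuilding the whole index after each upload is the simplest option; with thousands of folders, updating a single entry would be cheaper.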

---

## 👀 Viewing Ingested Patents

- `/patents`: List of all ingested patents
- `/patents/<id>`: Summary view of:
  1. Extracted metadata
  2. File presence checklist
  3. Ingestion log
  4. LLM summary (future)
  5. Option to download source
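
The `/patents` list page is mostly string assembly. A sketch, assuming each entry is a dict shaped like `{"id": ..., "status": ...}` (a hypothetical shape); `render_patent_list` is an illustrative name. IDs are URL-quoted inside links and every displayed value is HTML-escaped, so an odd folder name cannot inject markup:

```python
from html import escape
from urllib.parse import quote


def render_patent_list(entries: list[dict]) -> str:
    """Build the /patents page from entries like {"id": "US_10101845_B2", "status": "validated"}."""
    items = [
        # quote() makes the id URL-safe inside href; escape() makes it HTML-safe as text.
        f'  <li><a href="/patents/{quote(e["id"], safe="")}">{escape(e["id"])}</a>'
        f' ({escape(str(e.get("status", "unknown")))})</li>'
        for e in entries
    ]
    return "\n".join([f"<h1>Ingested Patents ({len(entries)})</h1>", "<ul>", *items, "</ul>"])
```

In Flask or FastAPI the same markup would normally live in a Jinja2 template, which escapes values by default.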

### Example Detail View

```html
<h1>Patent: US_10101845_B2</h1>
<p>Status: ✅ Validated</p>
<ul>
  <li><b>Inventor:</b> Kate Stone</li>
  <li><b>Assignee:</b> Novalia</li>
  <li><b>Files Present:</b> Bibliographic, Claims, Description</li>
</ul>
<a href="/files/US_10101845_B2.zip">Download Folder</a>
```
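
A detail page like the example can be produced from a parsed `result.json`. A sketch; the keys `status`, `metadata`, and `files_present` and the function name `render_patent_detail` are hypothetical and should be matched to whatever `patent_ingestor.py` actually writes:

```python
from html import escape


def render_patent_detail(patent_id: str, result: dict) -> str:
    """Build the /patents/<id> summary from a parsed result.json.

    The keys read here ("status", "metadata", "files_present") are hypothetical.
    """
    if result.get("status") == "validated":
        status = "✅ Validated"
    else:
        status = "❌ " + escape(str(result.get("status", "unknown")))
    # One list row per extracted metadata field, e.g. {"Inventor": "Kate Stone"}.
    rows = [f"  <li><b>{escape(str(k))}:</b> {escape(str(v))}</li>"
            for k, v in result.get("metadata", {}).items()]
    files = ", ".join(result.get("files_present", [])) or "none"
    rows.append(f"  <li><b>Files Present:</b> {escape(files)}</li>")
    return "\n".join([
        f"<h1>Patent: {escape(patent_id)}</h1>",
        f"<p>Status: {status}</p>",
        "<ul>", *rows, "</ul>",
        f'<a href="/files/{escape(patent_id)}.zip">Download Folder</a>',
    ])
```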

---

## 🐍 Serving the Script

- The ingestion script will be a Python module (`patent_ingestor.py`).
- The backend imports it and calls `validate_folder(path)`.
- Output is saved as JSON plus a log file.
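
A first cut of `validate_folder(path)` might only check the files from the storage layout and record the outcome. A minimal sketch; which files count as required, the result keys, and the log line format are all assumptions, and real parsing of the `.md` files would go where the TODO sits:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical split of the storage layout's files into mandatory and optional.
REQUIRED = ["bibliographic_data.md", "claims.md", "description.md"]
OPTIONAL = ["legal_events.md", "patent_family.md", "original.pdf"]


def validate_folder(path: Path) -> dict:
    """Check one /data/patents/<id>/ folder, then write result.json and log.txt.

    Only presence and non-emptiness are checked: problems with required files
    are errors, problems with optional files are warnings.
    """
    errors, warnings = [], []
    for name in REQUIRED + OPTIONAL:
        bucket = errors if name in REQUIRED else warnings
        f = path / name
        if not f.is_file():
            bucket.append(f"missing {name}")
        elif f.stat().st_size == 0:
            bucket.append(f"empty {name}")
    # TODO: parse bibliographic_data.md etc. into a "metadata" dict here.
    result = {
        "id": path.name,
        "status": "failed" if errors else "validated",
        "files_present": [n for n in REQUIRED + OPTIONAL if (path / n).is_file()],
        "errors": errors,
        "warnings": warnings,
    }
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    log = [f"{stamp} ERROR {e}" for e in errors] + [f"{stamp} WARN {w}" for w in warnings]
    (path / "result.json").write_text(json.dumps(result, indent=2), encoding="utf-8")
    (path / "log.txt").write_text("\n".join(log) + "\n", encoding="utf-8")
    return result
```

Keeping the function free of any web-framework imports means it can also be run from a shell loop over existing folders.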

---

## 🧱 Future (Optional)

- SQLite database for structured queries
- LLM-generated summaries with tagging
- Cross-patent linking via family info
- Search, filter, and tagging UI
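
For the SQLite item, one possible starting point is to load each `result.json` into a single table with Python's built-in `sqlite3` module. The table name, columns, and `load_results` are hypothetical; storing the whole result as a JSON string keeps the schema small until the real queries are known:

```python
import json
import sqlite3
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS patents (
    id      TEXT PRIMARY KEY,   -- folder name, e.g. US_10101845_B2
    status  TEXT NOT NULL,
    result  TEXT NOT NULL       -- full result.json kept as a JSON string
)
"""


def load_results(db: sqlite3.Connection, data_root: Path) -> int:
    """Upsert every <data_root>/patents/*/result.json into the hypothetical patents table."""
    db.execute(SCHEMA)
    count = 0
    for result_file in sorted((data_root / "patents").glob("*/result.json")):
        result = json.loads(result_file.read_text(encoding="utf-8"))
        db.execute(
            "INSERT OR REPLACE INTO patents (id, status, result) VALUES (?, ?, ?)",
            (result_file.parent.name, result.get("status", "unknown"), json.dumps(result)),
        )
        count += 1
    db.commit()
    return count
```

Re-running the loader is idempotent because of `INSERT OR REPLACE` on the primary key, and a question such as "which ingestions failed?" becomes a single `SELECT`.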

---

## 🔐 Permissions & Security

- SSL/TLS termination via a reverse proxy (Apache/Nginx)
- HTTP Basic Auth if exposed beyond the LAN
- Upload size limits and file-type validation
- Automatic cleanup of old temporary files
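
The last two items could start as two small helpers: one run on each upload before unpacking, one run on a timer (for example from cron). A sketch; the limits, the allowed suffixes (taken from the storage layout), and the function names are assumptions. Note that `ZipInfo.file_size` is the size declared in the archive, so a crafted zip can understate it; for untrusted uploads, also cap the bytes actually written during extraction.

```python
import time
import zipfile
from pathlib import Path

# Hypothetical limits; tune them to the largest real patent folder you expect.
MAX_UNCOMPRESSED_BYTES = 200 * 1024 * 1024
MAX_ENTRIES = 500
ALLOWED_SUFFIXES = {".md", ".pdf", ".json", ".txt"}


def check_zip(zip_path: Path) -> None:
    """Raise ValueError for non-zips, oversized archives, or unexpected file types."""
    if not zipfile.is_zipfile(zip_path):
        raise ValueError("upload is not a zip archive")
    with zipfile.ZipFile(zip_path) as zf:
        infos = [i for i in zf.infolist() if not i.is_dir()]
        if len(infos) > MAX_ENTRIES:
            raise ValueError(f"too many files: {len(infos)}")
        # file_size is the header-declared uncompressed size (see note above).
        if sum(i.file_size for i in infos) > MAX_UNCOMPRESSED_BYTES:
            raise ValueError("archive too large when uncompressed")
        bad = [i.filename for i in infos if Path(i.filename).suffix.lower() not in ALLOWED_SUFFIXES]
        if bad:
            raise ValueError(f"unexpected file types: {bad}")


def clean_old_uploads(upload_dir: Path, max_age_hours: float = 24) -> list[Path]:
    """Delete files in upload_dir older than max_age_hours; return the removed paths."""
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    for f in upload_dir.iterdir():
        if f.is_file() and f.stat().st_mtime < cutoff:
            f.unlink()
            removed.append(f)
    return removed
```

A request-size cap at the reverse proxy (or in the web framework) is still worth having, since it rejects huge uploads before they are written to `/uploads` at all.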

---

## ✅ MVP Milestone Checklist

- [ ] File upload works via the web form
- [ ] Upload is unzipped and normalized
- [ ] Ingestion script parses and validates content
- [ ] JSON and logs are saved
- [ ] `/patents` lists available records
- [ ] `/patents/<id>` shows the detail view
- [ ] Missing files and malformed content are logged
- [ ] Tests with 2–3 real datasets succeed

---