# 📦 MVP Patent Ingestion Web Portal
This document outlines the architecture and design for a minimal web-based interface to ingest, validate, and inspect folders of patent files. This tool is intended to be hosted on a local or remote Linux server and provide visibility into the ingestion results for large sets of manually collected patent files.

---

## 🧩 Components
| Layer | Tech | Purpose |
| --- | --- | --- |
| Frontend | HTML + JS | Upload interface, feedback display |
| Backend | Python (Flask/FastAPI) | Handle uploads, run ingestion, serve pages |
| Storage | Filesystem | Save uploaded and processed data |
| Viewer | HTML templates | Display patent data and validation results |

---

## 🛠️ Step-by-Step Architecture and Flow
### 1. Web Portal (Frontend)
A simple HTML form for selecting and uploading a patent folder:

```html
<form action="/upload" method="post" enctype="multipart/form-data">
  <input type="file" name="patent_zip" webkitdirectory directory multiple />
  <button type="submit">Upload Patent Folder</button>
</form>
```

---

### 2. Backend (Python Flask or FastAPI)
#### Key Endpoints
| Route | Description |
| --- | --- |
| `GET /` | Homepage with upload form |
| `POST /upload` | Accepts .zip/folder, unpacks, runs validation |
| `GET /patents` | Lists ingested patents |
| `GET /patents/<id>` | Displays structured data for specific patent |

#### `/upload` Flow
1. Save uploaded `.zip` to `/uploads`
2. Unpack into `/data/patents/[patent_id]/`
3. Run the ingestion script on the folder by calling `validate_folder(path)`
4. Save results into `/data/patents/[patent_id]/result.json`
5. Redirect to detail view
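
The five steps above could be sketched roughly as follows. This is a framework-agnostic sketch under stated assumptions, not the portal's actual handler: the helper name `ingest_upload`, the `validate` callable parameter, and deriving `patent_id` from the zip file name are all hypothetical. The redirect in step 5 is left to the web framework, so the function only returns the id. The path check before extraction is a defensive choice so that a crafted archive cannot write outside the patent folder.

```python
import json
import shutil
import zipfile
from pathlib import Path


def ingest_upload(src_zip, uploads_dir, patents_dir, validate):
    """Hypothetical helper covering steps 1-4 of the /upload flow.

    `validate` is any callable that takes a folder path and returns a
    JSON-serializable dict (for example the module's validate_folder).
    """
    src_zip = Path(src_zip)
    uploads_dir, patents_dir = Path(uploads_dir), Path(patents_dir)
    uploads_dir.mkdir(parents=True, exist_ok=True)

    # 1. Save the uploaded .zip under the uploads directory.
    saved_zip = uploads_dir / src_zip.name
    shutil.copyfile(src_zip, saved_zip)

    # Assumption: the patent id is the zip file name without its extension.
    patent_id = saved_zip.stem
    target = (patents_dir / patent_id).resolve()
    target.mkdir(parents=True, exist_ok=True)

    # 2. Unpack, refusing entries that would land outside the target folder.
    with zipfile.ZipFile(saved_zip) as archive:
        for member in archive.namelist():
            destination = (target / member).resolve()
            if not destination.is_relative_to(target):  # Python 3.9+
                raise ValueError(f"unsafe path in archive: {member}")
        archive.extractall(target)

    # 3. Run the ingestion/validation step on the unpacked folder.
    result = validate(target)

    # 4. Persist the result next to the patent files.
    (target / "result.json").write_text(json.dumps(result, indent=2))

    # 5. The caller is expected to redirect to /patents/<patent_id>.
    return patent_id
```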

---

## Storage Layout
```
/uploads/
  [raw uploaded folders or zips]
/data/patents/
  US_10101845_B2/
    - bibliographic_data.md
    - description.md
    - claims.md
    - legal_events.md
    - patent_family.md
    - original.pdf
    - result.json
    - log.txt
/data/index.json (optional)
```
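
One way the optional `/data/index.json` could be produced is by scanning the patent folders and recording what each one contains. The sketch below is an illustration only: the function name `build_index` and the shape of each index entry (`id`, `missing`, `has_result`) are assumptions, while the expected file names are taken from the layout above.

```python
import json
from pathlib import Path

# File names taken from the storage layout above.
EXPECTED_FILES = [
    "bibliographic_data.md",
    "description.md",
    "claims.md",
    "legal_events.md",
    "patent_family.md",
    "original.pdf",
]


def build_index(patents_dir, index_path):
    """Scan each patent folder and write a summary list to index_path.

    The entry format is hypothetical; adapt it to whatever the list
    view actually needs.
    """
    entries = []
    for folder in sorted(Path(patents_dir).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the patent folders
        present = {p.name for p in folder.iterdir() if p.is_file()}
        entries.append({
            "id": folder.name,
            "missing": [name for name in EXPECTED_FILES if name not in present],
            "has_result": "result.json" in present,
        })
    Path(index_path).write_text(json.dumps(entries, indent=2))
    return entries


# Example call, using the paths from the layout:
# build_index("/data/patents", "/data/index.json")
```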

---

## Viewing Ingested Patents
- `/patents`: List of all ingested patents
- `/patents/<id>`: Summary view of a single patent: validation status, key bibliographic fields, and which files are present

### Example Detail View
```html
<h1>Patent: US_10101845_B2</h1>
<p>Status: ✅ Validated</p>
<ul>
  <li><b>Inventor:</b> Kate Stone</li>
  <li><b>Assignee:</b> Novalia</li>
  <li><b>Files Present:</b> Bibliographic, Claims, Description</li>
</ul>
<a href="/files/US_10101845_B2.zip">Download Folder</a>
```
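
As an illustration of how the detail view might be filled in from a stored result, here is a small standard-library sketch. The function name `render_detail` and the result keys (`valid`, `inventor`, `assignee`, `files_present`) are assumptions, as is the wording of the failure status. In a Flask or FastAPI app this would more likely live in an HTML template, but the escaping concern is the same: values parsed from patent files should not be inserted into the page unescaped.

```python
from html import escape
from urllib.parse import quote


def render_detail(patent_id, result):
    """Return the detail-view HTML for one patent as a string.

    `result` is assumed to be the dict loaded from result.json; the keys
    used here are hypothetical.
    """
    status = "✅ Validated" if result.get("valid") else "❌ Not validated"
    files = ", ".join(escape(name) for name in result.get("files_present", []))
    inventor = escape(result.get("inventor", "unknown"))
    assignee = escape(result.get("assignee", "unknown"))
    return "\n".join([
        f"<h1>Patent: {escape(patent_id)}</h1>",
        f"<p>Status: {status}</p>",
        "<ul>",
        f"  <li><b>Inventor:</b> {inventor}</li>",
        f"  <li><b>Assignee:</b> {assignee}</li>",
        f"  <li><b>Files Present:</b> {files}</li>",
        "</ul>",
        # Percent-encode the id for the URL, then escape it for the attribute.
        f'<a href="/files/{escape(quote(patent_id))}.zip">Download Folder</a>',
    ])
```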

---

## Serving the Script
- Ingestion script will be a Python module (`patent_ingestor.py`)
- Backend imports and calls: `validate_folder(path)`
- Output is saved as JSON + log
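
To make the contract a little more concrete, here is one possible shape for `validate_folder(path)`. Only the function name and its `path` argument come from this document; the set of required files, the checks performed, the returned dictionary, and the log wording are assumptions for illustration. In this sketch the function writes `log.txt` itself and returns a dict that the backend can save as `result.json`.

```python
from pathlib import Path

# Assumption: these three files must exist for a folder to count as valid.
REQUIRED_FILES = ("bibliographic_data.md", "description.md", "claims.md")
# Files produced by the portal itself, ignored when inspecting the folder.
OUTPUT_FILES = {"result.json", "log.txt"}


def validate_folder(path):
    """Check a patent folder, write log.txt, and return a result dict."""
    folder = Path(path)
    present = sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.name not in OUTPUT_FILES
    )
    missing = [name for name in REQUIRED_FILES if name not in present]
    empty = [name for name in present if (folder / name).stat().st_size == 0]

    # Record every problem so missing or malformed input is traceable later.
    messages = [f"missing file: {name}" for name in missing]
    messages += [f"empty file: {name}" for name in empty]
    log_lines = messages or ["no problems found"]
    (folder / "log.txt").write_text("\n".join(log_lines) + "\n")

    return {
        "patent_id": folder.name,
        "valid": not missing and not empty,
        "files_present": present,
        "missing": missing,
        "empty": empty,
    }
```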

---

## 🧱 Future (Optional)
- SQLite database for structured queries
- LLM-generated summaries with tagging
- Cross-patent linking via family info
- Search, filter, and tagging UI
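
For the SQLite item, a first version could be as small as one table keyed by patent id. The sketch below uses Python's built-in `sqlite3` module; the table name, columns, and helper names (`open_index`, `upsert_patent`, `find_by_assignee`) are hypothetical and only meant to show what "structured queries" might look like.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the patent index database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS patents (
               id TEXT PRIMARY KEY,
               inventor TEXT,
               assignee TEXT,
               valid INTEGER NOT NULL
           )"""
    )
    return conn


def upsert_patent(conn, patent_id, inventor, assignee, valid):
    """Insert a patent row, replacing any earlier row with the same id."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO patents (id, inventor, assignee, valid) "
            "VALUES (?, ?, ?, ?)",
            (patent_id, inventor, assignee, int(valid)),
        )


def find_by_assignee(conn, assignee):
    """Return the ids of all patents held by the given assignee."""
    rows = conn.execute(
        "SELECT id FROM patents WHERE assignee = ? ORDER BY id", (assignee,)
    )
    return [row[0] for row in rows]
```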

---

## Permissions & Security
- SSL via reverse proxy (Apache/Nginx)
- Basic Auth if exposed beyond LAN
- Upload limits + type validation
- Auto-clean old temp files
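
The last two items can be handled in application code. The following is a minimal sketch, assuming uploads arrive as `.zip` files on disk: the size limit, the retention period, and the helper names `check_upload` and `clean_old_uploads` are illustrative assumptions, not settings defined by this document. SSL and Basic Auth are left to the reverse proxy and are not shown.

```python
import time
import zipfile
from pathlib import Path

MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # assumed limit: 200 MiB
MAX_AGE_SECONDS = 7 * 24 * 3600       # assumed retention: one week


def check_upload(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return None if the upload is acceptable, else a short reason."""
    path = Path(path)
    if path.stat().st_size > max_bytes:
        return "file too large"
    # Inspect the file contents rather than trusting the extension.
    if not zipfile.is_zipfile(path):
        return "not a zip archive"
    return None


def clean_old_uploads(uploads_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Delete files older than max_age seconds and return their names."""
    now = time.time() if now is None else now
    removed = []
    for entry in Path(uploads_dir).iterdir():
        if entry.is_file() and now - entry.stat().st_mtime > max_age:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)


# Example: run clean_old_uploads("/uploads") from a daily scheduled job.
```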

---

## ✅ MVP Milestone Checklist
- [ ] File upload works via web form
- [ ] Upload is unzipped and normalized
- [ ] Ingestion script parses and validates content
- [ ] JSON and logs are saved
- [ ] `/patents` lists available records
- [ ] `/patents/<id>` shows detail view
- [ ] Missing files or malformed content is logged
- [ ] Tests with 2–3 real data sets are successful

---