# 📦 MVP Patent Ingestion Web Portal
This document outlines the architecture and design of a minimal web interface for ingesting, validating, and inspecting folders of patent files. The tool is intended to run on a local or remote Linux server and to show ingestion results for large sets of manually collected patent files.
---
## 🧩 Components
| Layer | Tech | Purpose |
|--------------|--------------------|--------------------------------------------|
| Frontend | HTML + JS | Upload interface, feedback display |
| Backend | Python (Flask/FastAPI) | Handle uploads, run ingestion, serve pages |
| Storage | Filesystem | Save uploaded and processed data |
| Viewer | HTML templates | Display patent data and validation results |
---
## 🛠️ Step-by-Step Architecture and Flow
### 1. Web Portal (Frontend)
- Simple HTML form with:
  - File/folder upload (drag-and-drop or browse)
  - Submit/upload button
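```html
<!-- Minimal upload form: posts a .zip archive to POST /upload.
     The field name "file" is an assumption the backend must match. -->
<form action="/upload" method="post" enctype="multipart/form-data">
  <input type="file" name="file" accept=".zip" required>
  <button type="submit">Upload</button>
</form>
```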
---
### 2. Backend (Python Flask or FastAPI)
#### Key Endpoints
| Route | Description |
|-------------------------|--------------------------------------------------|
| `GET /` | Homepage with upload form |
| `POST /upload` | Accepts .zip/folder, unpacks, runs validation |
| `GET /patents` | Lists ingested patents |
| `GET /patents/` | Displays structured data for specific patent |
#### `/upload` Flow
1. Save uploaded `.zip` to `/uploads`
2. Unpack into `/data/patents/[patent_id]/`
3. Run ingestion script on the folder:
   - Validate filenames and content
   - Normalize structure
   - Extract structured JSON
   - Log errors/warnings
4. Save results into `/data/patents/[patent_id]/result.json`
5. Redirect to detail view
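Steps 2 to 4 of the `/upload` flow can be sketched as a framework-independent function. This is a minimal sketch using only the standard library, assuming the caller supplies the patent ID (the document does not say how it is derived) and that `validate_folder` returns a JSON-serializable dict; `handle_upload`, `unpack_zip`, and the ID character rule are hypothetical. The unpack step refuses archive members that would land outside the patent folder ("zip slip").

```python
import json
import re
import zipfile
from pathlib import Path
from typing import Callable

# Hypothetical rule: IDs such as "US_10101845_B2" may only contain letters,
# digits and underscores, so they are safe to use as folder names.
PATENT_ID_RE = re.compile(r"^[A-Za-z0-9_]+$")


def unpack_zip(zip_path: Path, dest: Path) -> None:
    """Extract zip_path into dest, refusing members that escape dest."""
    dest = dest.resolve()
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            target = (dest / member).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member}")
        zf.extractall(dest)


def handle_upload(
    upload: Path,
    patent_id: str,
    data_root: Path,
    validate_folder: Callable[[Path], dict],
) -> Path:
    """Run steps 2-4 of the /upload flow and return the patent folder.

    Step 1 (saving the upload to /uploads) and step 5 (the redirect)
    belong to the web framework and are left out here.
    """
    if not PATENT_ID_RE.fullmatch(patent_id):
        raise ValueError(f"invalid patent id: {patent_id!r}")
    folder = data_root / patent_id
    folder.mkdir(parents=True, exist_ok=True)
    unpack_zip(upload, folder)            # step 2: unpack
    result = validate_folder(folder)      # step 3: ingestion script
    (folder / "result.json").write_text(  # step 4: save results
        json.dumps(result, indent=2), encoding="utf-8"
    )
    return folder
```

Passing `validate_folder` in as a parameter keeps the pipeline testable without importing the real ingestion module.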
---
## 📂 Storage Layout
```
/uploads/
  [raw uploaded folders or zips]
/data/patents/
  US_10101845_B2/
    - bibliographic_data.md
    - description.md
    - claims.md
    - legal_events.md
    - patent_family.md
    - original.pdf
    - result.json
    - log.txt
/data/index.json (optional)
```
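The per-patent layout above suggests a simple file-presence check, which could feed the "File presence checklist" in the detail view and the missing-file warnings. A minimal sketch; `EXPECTED_FILES` mirrors the layout, while treating every source file as expected and the names `check_files` and `missing_files` are assumptions.

```python
from pathlib import Path

# Source files expected in each /data/patents/[patent_id]/ folder, taken from
# the storage layout; result.json and log.txt are produced by ingestion itself.
EXPECTED_FILES = (
    "bibliographic_data.md",
    "description.md",
    "claims.md",
    "legal_events.md",
    "patent_family.md",
    "original.pdf",
)


def check_files(folder: Path) -> dict[str, bool]:
    """Map each expected file name to whether it exists in folder."""
    return {name: (folder / name).is_file() for name in EXPECTED_FILES}


def missing_files(folder: Path) -> list[str]:
    """Expected files that are absent; these can be logged as warnings."""
    return [name for name, present in check_files(folder).items() if not present]
```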
---
## 👀 Viewing Ingested Patents
- `/patents`: List of all patents ingested
- `/patents/`: Summary view of:
  - Extracted metadata
  - File presence checklist
  - Ingestion log
  - LLM summary (future)
  - Option to download source
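The `/patents` listing can be built by scanning `/data/patents/` for patent folders and reading each `result.json`. A minimal sketch; the `status` key and the `list_patents` name are hypothetical, since the document does not fix the schema of `result.json`.

```python
import json
from pathlib import Path


def list_patents(data_root: Path) -> list[dict]:
    """Return one summary entry per patent folder, sorted by ID.

    Folders without a result.json (e.g. an ingestion that crashed) are
    reported with status "not ingested" instead of being hidden.
    """
    entries = []
    for folder in sorted(p for p in data_root.iterdir() if p.is_dir()):
        result_path = folder / "result.json"
        if result_path.is_file():
            result = json.loads(result_path.read_text(encoding="utf-8"))
            # Hypothetical schema: result.json carries a "status" field.
            status = result.get("status", "unknown")
        else:
            status = "not ingested"
        entries.append({"patent_id": folder.name, "status": status})
    return entries
```

The optional `/data/index.json` could cache this list so the page does not rescan every folder on each request.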
### Example Detail View
```text
Patent: US_10101845_B2
Status: ✅ Validated
- Inventor: Kate Stone
- Assignee: Novalia
- Files Present: Bibliographic, Claims, Description
Download Folder
```
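The detail view above can be rendered from `result.json`. Because the values come from uploaded files, they should be HTML-escaped before they reach the page. A minimal sketch using the standard-library `html.escape`; the `status`, `metadata`, and `files` keys and the `download_url` parameter are hypothetical.

```python
from html import escape


def render_detail(patent_id: str, result: dict, download_url: str) -> str:
    """Build the HTML body of the detail view from a parsed result.json."""
    meta = result.get("metadata", {})
    present = [name for name, ok in result.get("files", {}).items() if ok]
    rows = [
        f"<h1>Patent: {escape(patent_id)}</h1>",
        f"<p>Status: {escape(result.get('status', 'unknown'))}</p>",
        "<ul>",
        f"<li>Inventor: {escape(meta.get('inventor', 'n/a'))}</li>",
        f"<li>Assignee: {escape(meta.get('assignee', 'n/a'))}</li>",
        f"<li>Files Present: {escape(', '.join(present) or 'none')}</li>",
        "</ul>",
        # escape() also escapes quotes by default, so the attribute stays intact.
        f'<a href="{escape(download_url)}">Download Folder</a>',
    ]
    return "\n".join(rows)
```

If the pages are built with Flask's Jinja2 templates, autoescaping already covers `.html` templates; explicit escaping matters only for hand-built strings like this one.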
---
## 🐍 Serving the Script
- Ingestion script will be a Python module (`patent_ingestor.py`)
- Backend imports and calls: `validate_folder(path)`
- Output is saved as `result.json` plus `log.txt` in the patent's folder
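One way to produce the JSON + log pair is to have the ingestion module log through a named `logging` logger and attach a per-run `FileHandler` pointed at the patent's `log.txt`. A minimal sketch: the body of `validate_folder` here is a hypothetical stand-in (it only checks for `claims.md`), and `run_ingestion` and the status values are assumptions.

```python
import json
import logging
from pathlib import Path


def validate_folder(path: Path) -> dict:
    """Hypothetical stand-in for patent_ingestor.validate_folder."""
    log = logging.getLogger("patent_ingestor")
    has_claims = (path / "claims.md").is_file()
    if not has_claims:
        log.warning("claims.md is missing")
    return {"status": "validated" if has_claims else "incomplete"}


def run_ingestion(path: Path) -> dict:
    """Call validate_folder and save result.json plus log.txt in path."""
    logger = logging.getLogger("patent_ingestor")
    logger.setLevel(logging.INFO)
    # mode="w" starts a fresh log.txt for every run of this patent.
    handler = logging.FileHandler(path / "log.txt", mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        result = validate_folder(path)
    except Exception:
        logger.exception("ingestion failed")
        result = {"status": "failed"}
    finally:
        # Detach so later runs do not write into this patent's log.
        logger.removeHandler(handler)
        handler.close()
    (path / "result.json").write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result
```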
---
## 🧱 Future (Optional)
- SQLite database for structured queries
- LLM-generated summaries with tagging
- Cross-patent linking via family info
- Search, filter, and tagging UI
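If the SQLite option is taken up, parsed `result.json` contents could be loaded into a small table so the listing can be filtered by assignee or status. A minimal sketch with the standard-library `sqlite3` module; the table layout, the `metadata`/`status` keys, and the function names are hypothetical.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, results: dict[str, dict]) -> None:
    """Load one row per patent from parsed result.json contents."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS patents (
               patent_id TEXT PRIMARY KEY,
               status    TEXT,
               inventor  TEXT,
               assignee  TEXT
           )"""
    )
    rows = [
        (
            patent_id,
            result.get("status"),
            result.get("metadata", {}).get("inventor"),
            result.get("metadata", {}).get("assignee"),
        )
        for patent_id, result in results.items()
    ]
    # INSERT OR REPLACE keeps re-ingested patents from creating duplicates.
    conn.executemany("INSERT OR REPLACE INTO patents VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def patents_by_assignee(conn: sqlite3.Connection, assignee: str) -> list[str]:
    """Patent IDs for one assignee, using a parameterized query."""
    cur = conn.execute(
        "SELECT patent_id FROM patents WHERE assignee = ? ORDER BY patent_id",
        (assignee,),
    )
    return [row[0] for row in cur.fetchall()]
```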
---
## 🔐 Permissions & Security
- SSL via reverse proxy (Apache/Nginx)
- Basic Auth if exposed beyond LAN
- Upload limits + type validation
- Auto-clean old temp files
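Upload limits, type validation, and cleanup can also be enforced in the backend, independently of the reverse proxy. A minimal sketch with illustrative values (the size limits, the `.md`/`.pdf` allowlist, the seven-day retention, and the names `check_upload` and `clean_old_files` are all assumptions): it checks that the file really is a zip, caps the declared uncompressed size as a guard against zip bombs, rejects unexpected file types, and deletes stale files from `/uploads`.

```python
import time
import zipfile
from pathlib import Path

MAX_UPLOAD_BYTES = 200 * 1024 * 1024      # illustrative limit on the .zip itself
MAX_UNPACKED_BYTES = 1024 * 1024 * 1024   # illustrative limit after unpacking
ALLOWED_SUFFIXES = {".md", ".pdf"}        # file types seen in the storage layout


def check_upload(zip_path: Path) -> None:
    """Raise ValueError if the upload is too large, not a zip, or has odd files."""
    if zip_path.stat().st_size > MAX_UPLOAD_BYTES:
        raise ValueError("upload too large")
    if not zipfile.is_zipfile(zip_path):
        raise ValueError("not a zip archive")
    with zipfile.ZipFile(zip_path) as zf:
        infos = [i for i in zf.infolist() if not i.is_dir()]
        if sum(i.file_size for i in infos) > MAX_UNPACKED_BYTES:
            raise ValueError("archive expands beyond the size limit")
        for info in infos:
            if Path(info.filename).suffix.lower() not in ALLOWED_SUFFIXES:
                raise ValueError(f"unexpected file type: {info.filename}")


def clean_old_files(directory: Path, max_age_days: float = 7) -> list[Path]:
    """Delete regular files older than max_age_days; return what was removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in directory.iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

The suffix allowlist would need loosening if real archives carry extra files (for example, metadata entries added by the OS that created the zip).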
---
## ✅ MVP Milestone Checklist
- [ ] File upload works via web form
- [ ] Upload is unzipped and normalized
- [ ] Ingestion script parses and validates content
- [ ] JSON and logs are saved
- [ ] `/patents` lists available records
- [ ] `/patents/` shows detail view
- [ ] Missing files or malformed content is logged
- [ ] Tests with 2–3 real data sets are successful
---