# 📦 MVP Patent Ingestion Web Portal

This document outlines the architecture and design for a minimal web-based interface to ingest, validate, and inspect folders of patent files. This tool is intended to be hosted on a local or remote Linux server and provide visibility into the ingestion results for large sets of manually collected patent files.

---

## 🧩 Components

| Layer    | Tech                   | Purpose                                    |
|----------|------------------------|--------------------------------------------|
| Frontend | HTML + JS              | Upload interface, feedback display         |
| Backend  | Python (Flask/FastAPI) | Handle uploads, run ingestion, serve pages |
| Storage  | Filesystem             | Save uploaded and processed data           |
| Viewer   | HTML templates         | Display patent data and validation results |

---

## 🛠️ Step-by-Step Architecture and Flow

### 1. Web Portal (Frontend)

- Simple HTML form with:
  - File/folder upload (drag-and-drop or browse)
  - Submit/upload button

---

### 2. Backend (Python Flask or FastAPI)

#### Key Endpoints

| Route           | Description                                   |
|-----------------|-----------------------------------------------|
| `GET /`         | Homepage with upload form                     |
| `POST /upload`  | Accepts .zip/folder, unpacks, runs validation |
| `GET /patents`  | Lists ingested patents                        |
| `GET /patents/` | Displays structured data for specific patent  |

#### `/upload` Flow

1. Save uploaded `.zip` to `/uploads`
2. Unpack into `/data/patents/[patent_id]/`
3. Run ingestion script on the folder:
   - Validate filenames and content
   - Normalize structure
   - Extract structured JSON
   - Log errors/warnings
4. Save results into `/data/patents/[patent_id]/result.json`
5. Redirect to detail view

---

## 📂 Storage Layout

```
/uploads/
  [raw uploaded folders or zips]
/data/patents/
  US_10101845_B2/
    - bibliographic_data.md
    - description.md
    - claims.md
    - legal_events.md
    - patent_family.md
    - original.pdf
    - result.json
    - log.txt
/data/index.json (optional)
```

---

## 👀 Viewing Ingested Patents

- `/patents`: List of all patents ingested
- `/patents/`: Summary view of:
  - Extracted metadata
  - File presence checklist
  - Ingestion log
  - LLM summary (future)
  - Option to download source

### Example Detail View

```html
Patent: US_10101845_B2

Status: ✅ Validated

Download Folder
```

---

## 🐍 Serving the Script

- Ingestion script will be a Python module (`patent_ingestor.py`)
- Backend imports and calls: `validate_folder(path)`
- Output is saved as JSON + log

---

## 🧱 Future (Optional)

- SQLite database for structured queries
- LLM-generated summaries with tagging
- Cross-patent linking via family info
- Search, filter, and tagging UI

---

## 🔐 Permissions & Security

- SSL via reverse proxy (Apache/Nginx)
- Basic Auth if exposed beyond LAN
- Upload limits + type validation
- Auto-clean old temp files

---

## ✅ MVP Milestone Checklist

- [ ] File upload works via web form
- [ ] Upload is unzipped and normalized
- [ ] Ingestion script parses and validates content
- [ ] JSON and logs are saved
- [ ] `/patents` lists available records
- [ ] `/patents/` shows detail view
- [ ] Missing files or malformed content is logged
- [ ] Tests with 2–3 real data sets are successful

---
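
As a rough illustration of the `validate_folder(path)` call described under "Serving the Script", the following is a minimal sketch, not the actual ingestion script. It only covers the file presence checklist and the `result.json` / `log.txt` outputs. The `EXPECTED_FILES` list is an assumption taken from the storage layout, and the `result.json` fields (`patent_id`, `status`, `files`) and the status values are hypothetical.

```python
import json
from pathlib import Path

# Assumption: the required files are the ones shown in the storage layout.
EXPECTED_FILES = [
    "bibliographic_data.md",
    "description.md",
    "claims.md",
    "legal_events.md",
    "patent_family.md",
    "original.pdf",
]


def validate_folder(path):
    """Check one patent folder and write result.json and log.txt into it.

    Returns the same dict that is written to result.json. The dict layout
    is hypothetical and only meant to illustrate the flow.
    """
    folder = Path(path)
    if not folder.is_dir():
        raise NotADirectoryError(f"not a patent folder: {folder}")

    messages = []   # lines that end up in log.txt
    checklist = {}  # file name -> "ok" | "empty" | "missing"
    for name in EXPECTED_FILES:
        file_path = folder / name
        if not file_path.is_file():
            checklist[name] = "missing"
            messages.append(f"ERROR missing file: {name}")
        elif file_path.stat().st_size == 0:
            checklist[name] = "empty"
            messages.append(f"WARNING empty file: {name}")
        else:
            checklist[name] = "ok"

    result = {
        "patent_id": folder.name,  # e.g. "US_10101845_B2"
        # Any error or warning marks the folder for manual review.
        "status": "needs_review" if messages else "validated",
        "files": checklist,
    }

    (folder / "result.json").write_text(
        json.dumps(result, indent=2), encoding="utf-8"
    )
    (folder / "log.txt").write_text(
        "".join(f"{line}\n" for line in messages), encoding="utf-8"
    )
    return result
```

In this sketch the backend would call `validate_folder("/data/patents/US_10101845_B2")` after unpacking an upload and could render the returned dict directly in the detail view. Content-level validation, structure normalization, and JSON extraction from the Markdown files are left out here.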