OmniParse Introduction
OmniParse is a platform designed to ingest and parse any unstructured data into structured, actionable data that is optimized for GenAI (LLM) applications. Whether you're working with documents, tables, images, videos, audio files, or web pages, OmniParse prepares your data to be clean, structured, and ready for AI applications such as RAG, fine-tuning, and more.
OmniParse Features
Key Features of OmniParse
- Completely Local: No external APIs required.
- Resource Efficient: Fits in a T4 GPU.
- Multiple File Types: Handles approximately 20 file types across documents, images, audio, video, and web pages.
- High-Quality Structured Output: Converts documents, multimedia, and web pages to high-quality structured markdown.
- Versatile Parsing: Includes table extraction, image extraction/captioning, audio/video transcription, and web page crawling.
- Easy Deployment: Deployable using Docker and Skypilot.
- Colab Friendly: Easily usable in Google Colab environments.
- Interactive UI: Powered by Gradio for a user-friendly experience.
Detailed Feature List
- Document Parsing: Parses PDF, PowerPoint, or Word documents.
- Media Parsing: Transcribes audio and video files using the Whisper model.
- Website Parsing: Sets up a Selenium crawler for parsing websites.
OmniParse Usage
Installation
To install OmniParse, follow these steps:
- Clone the repository:
git clone https://github.com/adithya-s-k/omniparse
cd omniparse
- Create a Virtual Environment:
conda create --name omniparse-venv python=3.10
conda activate omniparse-venv
- Install Dependencies:
poetry install # or pip install -e .
Docker Usage
To use OmniParse with Docker:
- Pull the OmniParse API Docker image from Docker Hub:
docker pull savatar101/omniparse:0.1
- Run the Docker container, exposing port 8000:
docker run --gpus all -p 8000:8000 savatar101/omniparse:0.1
# or, if running without a GPU:
docker run -p 8000:8000 savatar101/omniparse:0.1
Running the Server
To run the server, use the following command:
python server.py --host 0.0.0.0 --port 8000 --documents --media --web
- --documents: Load models for parsing and ingesting documents.
- --media: Load the Whisper model for transcribing audio and video files.
- --web: Set up the Selenium crawler for website parsing.
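The flags above gate which models the server loads at startup. As a hypothetical sketch (the real server.py may wire this differently), the same CLI surface could be expressed with argparse:

```python
# Hypothetical sketch of the server's CLI flags using argparse;
# the actual server.py implementation may differ.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="OmniParse server (sketch)")
    parser.add_argument("--host", default="0.0.0.0", help="bind address")
    parser.add_argument("--port", type=int, default=8000, help="listen port")
    parser.add_argument("--documents", action="store_true",
                        help="load models for parsing and ingesting documents")
    parser.add_argument("--media", action="store_true",
                        help="load the Whisper model for audio/video transcription")
    parser.add_argument("--web", action="store_true",
                        help="set up the Selenium crawler for website parsing")
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args())
```

Each flag is independent, so you can load only the models you need and keep GPU memory usage down.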
Supported Data Types
OmniParse supports a wide range of data types:
- Documents: .doc, .docx, .pdf, .ppt, .pptx
- Images: .png, .jpg, .jpeg, .tiff, .bmp, .heic
- Video: .mp4, .mkv, .avi, .mov
- Audio: .mp3, .wav, .aac
- Web: Dynamic webpages and URLs
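A client can use these extension lists to route a file to the right parser before uploading. A minimal sketch (the category names here are illustrative, not part of the OmniParse API):

```python
# Map a filename to the OmniParse data-type category listed above.
# Extension lists are taken from this documentation; anything else
# is reported as unsupported.
from pathlib import Path

CATEGORIES = {
    "documents": {".doc", ".docx", ".pdf", ".ppt", ".pptx"},
    "images": {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".heic"},
    "video": {".mp4", ".mkv", ".avi", ".mov"},
    "audio": {".mp3", ".wav", ".aac"},
}

def classify(filename: str) -> str:
    """Return the data-type category for a filename, or 'unsupported'."""
    ext = Path(filename).suffix.lower()
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return "unsupported"
```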
API Endpoints
OmniParse provides various API endpoints for parsing different types of data:
Document Parsing
- Parse Any Document: Endpoint /parse_document (Method: POST)
- Parse PDF: Endpoint /parse_document/pdf (Method: POST)
- Parse PowerPoint: Endpoint /parse_document/ppt (Method: POST)
- Parse Word Document: Endpoint /parse_document/docs (Method: POST)
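A file is uploaded to these endpoints as a POST request. The sketch below uses only the standard library; the multipart field name "file" is an assumption, so check the OmniParse API documentation for the exact contract:

```python
# Sketch: upload a PDF to /parse_document/pdf with stdlib-only multipart.
# The field name "file" is an assumption, not confirmed by this document.
import uuid
import urllib.request

def build_multipart(field: str, filename: str, data: bytes):
    """Return (body, content_type) for a single-file multipart/form-data POST."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def parse_pdf(path: str, base_url: str = "http://localhost:8000") -> bytes:
    """POST a local PDF to a running OmniParse server and return the raw response."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("file", path, f.read())
    req = urllib.request.Request(
        f"{base_url}/parse_document/pdf",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

The same pattern applies to the other document and media endpoints; only the URL path changes.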
Media Parsing
- Parse Image: Endpoint /parse_image/image (Method: POST)
- Process Image: Endpoint /parse_image/process_image (Method: POST)
- Parse Video: Endpoint /parse_media/video (Method: POST)
- Parse Audio: Endpoint /parse_media/audio (Method: POST)
Website Parsing
- Parse Website: Endpoint /parse_website (Method: POST)
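Unlike the file endpoints, website parsing takes a URL rather than an upload. A minimal stdlib sketch; the JSON payload shape ({"url": ...}) is an assumption to verify against the OmniParse API documentation:

```python
# Sketch: build a POST request for /parse_website.
# The {"url": ...} payload shape is an assumption.
import json
import urllib.request

def build_website_request(url: str, base_url: str = "http://localhost:8000"):
    """Return a urllib Request that asks the server to crawl and parse `url`."""
    payload = json.dumps({"url": url}).encode()
    return urllib.request.Request(
        f"{base_url}/parse_website",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Against a running server:
# with urllib.request.urlopen(build_website_request("https://example.com")) as resp:
#     result = json.load(resp)
```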
OmniParse FAQs
What is the goal of OmniParse?
The long-term goal of OmniParse is to replace the many separate models it currently uses with a single multimodal model that can parse any type of data and extract the necessary information.
Under which license is OmniParse released?
OmniParse is licensed under the GPL-3.0 license.
Are there any upcoming features?
Yes, upcoming features include integrations with LlamaIndex, LangChain, and Haystack, as well as batch data processing, dynamic chunking, and structured data extraction based on a specified schema.
Acknowledgements
This project builds upon the Marker project created by Vik Paruchuri. Special thanks to Surya-OCR and Texify for the OCR models, and to Crawl4AI for their contributions.