Prepare your unstructured data for training or augmenting an LLM.

DocDat is a Python package that uses object detection, OCR, and LLMs to transform raw documents into structured data for use in RAG or fine-tuning systems.