Package ragindexer
RAG Indexer
Usage
Documentation
https://ydethe.github.io/ragindexer/ragindexer/
Testing
Run the tests
Run the test suite with:
pytest
Test reports
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
Added
- Added publication of Qdrant snapshot on GitHub Pages (e54301c by Yann de The).
- Added link to doc pages (080304e by Yann de The).
- Added GitHub Pages publication (3fb26fb by Yann de The).
- Added MIN_EXPECTED_CHAR environment variable (c5011fc by Yann de The).
- Added test documents (9dc3019 by Yann de The).
- Added log message indicating whether the OCR cache is reused (168a97d by Yann de The).
- Added emails watcher (9de852c by Yann de The).
- Added emails observer (511da4a by Yann de The).
- Added API keys and removed the deprecated method QdrantClient.search (12d727f by Yann de The).
- Added page progress (fd78e72 by Yann de The).
- Added new_ocr for later integration (c56d9d7 by Yann de The).
- Added a cache for OCR (c4ce7d9 by Yann de The).
- Added OCR cache (8dc4205 by Yann de The).
- Added unit test to check embedding relevance (d9e072f by Yann de The).
- Added progress bars for file analysis; PDF tested OK (be49834 by Yann de The).
- Added SQLite database to keep track of the indexed files (8f5c67f by Yann de The).
Fixed
- Fixed exception raised when no text is read (88c078f by Yann de The).
- Fixed empty chunks (a72e49a by Yann de The).
- Fixed pid computation (e3840c1 by Yann de The).
- Fixed OCR cache determination (91dbc4d by Yann de The).
- Fixed Qdrant hostname (def96ad by Yann de The).
- Fixed NLTK downloads (10e1dae by Yann de The).
- Fixed bug where a whole PDF file was skipped if one page had no text (301e5eb by Yann de The).
- Fixed image name (eadcf30 by Yann de The).
- Fixed PDF error (cff80bb by Yann de The).
- Fixed action (694bde7 by Yann de The).
- Fixed Docker Compose stack (814adf6 by Yann de The).
- Fixed pipeline (00107b2 by Yann de The).
Removed
Sub-modules
ragindexer.DocumentIndexer
ragindexer.QdrantIndexer
ragindexer.config
ragindexer.documents
ragindexer.index_database
ragindexer.models