Package ragindexer

RAG Indexer

Usage

Documentation

https://ydethe.github.io/ragindexer/ragindexer/

Testing

Run the tests

To run the test suite, run:

pytest

Test reports

See test report

See coverage

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

Unreleased

Added

  • Added publication of the Qdrant snapshot on GitHub Pages (e54301c by Yann de The).
  • Added link to doc pages (080304e by Yann de The).
  • Added GitHub Pages publication (3fb26fb by Yann de The).
  • Added MIN_EXPECTED_CHAR env var (c5011fc by Yann de The).
  • Added test documents (9dc3019 by Yann de The).
  • Added logging msg to tell if OCR cache is reused (168a97d by Yann de The).
  • Added emails watcher (9de852c by Yann de The).
  • Added emails observer (511da4a by Yann de The).
  • Added API keys and removed the deprecated method QdrantClient.search (12d727f by Yann de The).
  • Added page progress (fd78e72 by Yann de The).
  • Added new_ocr for later integration (c56d9d7 by Yann de The).
  • Added a cache for OCR (c4ce7d9 by Yann de The).
  • Added OCR cache (8dc4205 by Yann de The).
  • Added unit test to check embedding relevance (d9e072f by Yann de The).
  • Added progress bars for file analysis. PDF tested OK (be49834 by Yann de The).
  • Added SQLite database to keep track of the indexed files (8f5c67f by Yann de The).

Fixed

  • Fixed exception raised when no text is read (88c078f by Yann de The).
  • Fixed empty chunks (a72e49a by Yann de The).
  • Fixed pid computation (e3840c1 by Yann de The).
  • Fixed OCR cache determination (91dbc4d by Yann de The).
  • Fixed Qdrant hostname (def96ad by Yann de The).
  • Fixed NLTK downloads (10e1dae by Yann de The).
  • Fixed bug where a whole PDF file was skipped if one page had no text (301e5eb by Yann de The).
  • Fixed image name (eadcf30 by Yann de The).
  • Fixed PDF error (cff80bb by Yann de The).
  • Fixed action (694bde7 by Yann de The).
  • Fixed Docker Compose stack (814adf6 by Yann de The).
  • Fixed pipeline (00107b2 by Yann de The).

Removed

  • Removed OpenVINO backend (173f987 by Yann de The).
  • Removed frontend (ae2d107 by Yann de The).
  • Removed CUDA whl files (da171ab by Yann de The).
  • Removed openapi key from ingestion image (a56abcb by Yann de The).

Sub-modules

ragindexer.DocumentIndexer
ragindexer.QdrantIndexer
ragindexer.config
ragindexer.documents
ragindexer.index_database
ragindexer.models
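
The changelog mentions a SQLite database that keeps track of the indexed files. The sketch below illustrates one way such tracking could work, using only the Python standard library. It is not taken from ragindexer.index_database: the table name indexed_files and the functions file_digest, open_index_db, needs_indexing and mark_indexed are hypothetical names chosen for this example, and hashing file contents with SHA-256 is an assumption about how changes might be detected.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def open_index_db(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the tracking database and create its table if needed.

    The schema (hypothetical) maps each file path to the digest it had
    when it was last indexed.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed_files ("
        "path TEXT PRIMARY KEY, digest TEXT NOT NULL)"
    )
    return conn


def needs_indexing(conn: sqlite3.Connection, path: Path) -> bool:
    """Return True if the file is unknown or changed since it was indexed."""
    row = conn.execute(
        "SELECT digest FROM indexed_files WHERE path = ?", (str(path),)
    ).fetchone()
    return row is None or row[0] != file_digest(path)


def mark_indexed(conn: sqlite3.Connection, path: Path) -> None:
    """Record the file's current digest, replacing any earlier entry."""
    conn.execute(
        "INSERT OR REPLACE INTO indexed_files (path, digest) VALUES (?, ?)",
        (str(path), file_digest(path)),
    )
    conn.commit()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "note.txt"
        doc.write_text("first version")
        conn = open_index_db()
        print(needs_indexing(conn, doc))  # True: never indexed
        mark_indexed(conn, doc)
        print(needs_indexing(conn, doc))  # False: unchanged since indexing
        doc.write_text("second version")
        print(needs_indexing(conn, doc))  # True: contents changed
        conn.close()
```

Keying on a content digest rather than a modification time means a file that is merely touched is not indexed again, at the cost of reading the file each time it is checked.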