LangChain: Retrieval-Augmented Generation (RAG)

Build RAG from Scratch

There are several steps to build a RAG system from scratch.

Load the documents.
Chunk the documents into splits and embed the splits.
Store the embedded splits in a vector store, and expose it as a retriever that finds the top ‘k’ relevant splits by computing the similarity between the question and the document splits.
Combine the retrieved document splits and the question into a prompt.
Load the LLM.
Create a chain from the prompt and the LLM, then call chain.invoke to generate the answer.
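
The steps above can be sketched without any library. This is a toy illustration, not the LangChain implementation: the "embedding" is a bag-of-words count vector, the vector store is a plain list, and fake_llm is a hypothetical stand-in for a real model call.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def split(document, chunk_size=8):
    """Steps 1-2: chunk a document into splits of at most `chunk_size` words."""
    words = document.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def build_store(documents):
    """Step 3: the 'vector store' is a list of (split, embedding) pairs."""
    return [(s, embed(s)) for doc in documents for s in split(doc)]

def retrieve(store, question, k=2):
    """Step 3: return the top-k splits most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [s for s, _ in ranked[:k]]

def build_prompt(context, question):
    """Step 4: combine the retrieved splits and the question into one prompt."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

def fake_llm(prompt):
    """Steps 5-6: hypothetical LLM stand-in; it only echoes the first context line."""
    return prompt.splitlines()[1]

documents = [
    "Task decomposition breaks a large task into smaller steps that an agent can plan.",
    "Vector stores keep embedded splits and return the nearest ones for a query.",
]
store = build_store(documents)
question = "What is task decomposition?"
context = retrieve(store, question, k=2)
answer = fake_llm(build_prompt(context, question))
```

In a real pipeline each function would be replaced by a document loader, a text splitter, an embedding model, a vector store and an LLM; the data flow stays the same.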

Section 1: Query Translation

Rewrite the user's question so that it is better suited to retrieval from the indexes (documents).

General approaches:

Step-back questions (step-back prompting)
Question rewriting (RAG-Fusion, Multi-Query)
Sub-questions (‘Least-to-Most’ prompting from Google)

Multi-Query

Use an LLM to rewrite a question from multiple perspectives, then run retrieval for each rewrite.

[Figure: Multi-Query Intuition]
[Figure: Parallelized Retrieval with Multi-Query]
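
A minimal sketch of the Multi-Query idea, under loud assumptions: rewrite is a hypothetical stand-in that returns fixed rewrites instead of calling an LLM, and retrieve is a toy word-overlap retriever over a three-line corpus. The part that carries over is the last function: retrieve once per rewrite, then keep the unique union.

```python
import re

CORPUS = [
    "Agents use task decomposition to break goals into steps.",
    "Chain of thought prompts a model to reason step by step.",
    "Vector stores index embedded document splits.",
]

def tokens(text):
    """Lowercased set of words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def rewrite(question):
    """Hypothetical stand-in for the LLM: fixed rewrites of one question."""
    return [
        question,
        "How do agents break goals into steps?",
        "How does a model reason step by step?",
    ]

def retrieve(query, k=1):
    """Toy retriever: rank corpus entries by word overlap with the query."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:k]

def multi_query_retrieve(question, k=1):
    """Retrieve for every rewrite and keep the unique union, preserving order."""
    seen, merged = set(), []
    for query in rewrite(question):
        for doc in retrieve(query, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

results = multi_query_retrieve("What is task decomposition?")
```

The loop here is sequential for simplicity; because the per-rewrite retrievals are independent, they can be run in parallel.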

Section 2: Routing

Route the question to the right data source (relational DB, graph DB, vector store).
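
A rough sketch of routing. The keyword cues and source names below are invented for illustration; in practice the decision is often delegated to an LLM or to embedding similarity rather than hand-written rules.

```python
# Hypothetical cue phrases per data source (illustration only).
ROUTES = {
    "relational_db": {"how many", "total", "average", "count"},
    "graph_db": {"connected", "related to", "path", "relationship"},
}

def route(question):
    """Return the name of the data source that should answer the question."""
    text = question.lower()
    for source, cues in ROUTES.items():
        if any(cue in text for cue in cues):
            return source
    # Fall back to unstructured retrieval when no rule matches.
    return "vector_store"
```

For example, route("How many orders were placed in May?") picks the relational DB, while a question with no matching cue falls through to the vector store.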

Section 3: Query Construction

Convert a natural-language question into the DSL (Domain-Specific Language) required by the target data source.

Construction Examples:

Text to SQL (relational DBs)
Text to Cypher (graph DBs)
Self-query retriever (vector DBs)
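
As an illustration of the text-to-SQL case, here is a minimal sketch using Python's built-in sqlite3. The videos table and the text_to_sql function are hypothetical; the function handles a single question pattern and only stands in for the LLM translation step.

```python
import sqlite3

def text_to_sql(question):
    """Hypothetical stand-in for an LLM: translates one question pattern to SQL."""
    prefix = "how many videos were published in "
    text = question.lower().rstrip("?")
    if text.startswith(prefix):
        year = int(text[len(prefix):])
        # Parameterized query: the extracted value is passed separately from the SQL.
        return "SELECT COUNT(*) FROM videos WHERE year = ?", (year,)
    raise ValueError("unsupported question")

# Small in-memory database to run the generated query against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO videos VALUES (?, ?)",
    [("RAG overview", 2023), ("Routing", 2024), ("Indexing", 2024)],
)

sql, params = text_to_sql("How many videos were published in 2024?")
count = conn.execute(sql, params).fetchone()[0]
```

The same shape applies to the other targets: only the output language changes (Cypher for a graph DB, a metadata filter plus a search string for a self-query retriever).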

Section 4: Indexing (VectorStores Implementation)

“Indexing makes documents easier to retrieve.”

Indexing Process:

The documents are split into small chunks, embedded, and stored in an ‘Index’.
The question is embedded in the same way.
The ‘Index’ performs a similarity search and returns the splits relevant to the question.

OpenAI tokenizer library: tiktoken, based on BPE (Byte-Pair Encoding).
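
To make the BPE idea concrete, here is a toy training loop: repeatedly find the most frequent adjacent pair of symbols and merge it into one symbol. This is a character-level sketch on a made-up word list, not tiktoken's code; tiktoken works on bytes and ships pretrained merge tables.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if symbols[i:i + 2] == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Made-up word frequencies; each word starts as a tuple of single characters.
words = {tuple("low"): 5, tuple("lowest"): 2, tuple("newer"): 6, tuple("wider"): 3}
merges = []
for _ in range(3):
    pair = most_frequent_pair(words)
    merges.append(pair)
    words = merge_pair(words, pair)
```

After three merges the frequent word "low" has become a single token, while rarer sequences are still split into smaller pieces.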

Numerical Representation for Search

Text Representation

Question  --->   Retriever  ---> Documents
                     ⬆
                     |
      Load Documents |
                     |
                 Documents

Numerical Representation

Question  --->   Cosine Similarity, etc.  ---> [x,y,z...]
                             ⬆
                             |
              Load Documents |
                             |
                         [x1,y1,z1...]
                        [x2,y2,z2...]
                       [x3,y3,z3...]
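
For reference, the cosine similarity named in the diagram, written out for a question vector q and a document vector d of the same dimension (the symbols q and d are introduced here only for illustration):

```latex
\[
\operatorname{sim}(q, d) = \cos\theta
  = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}
  = \frac{\sum_i q_i d_i}{\sqrt{\sum_i q_i^2}\,\sqrt{\sum_i d_i^2}}
\]
```

For example, q = [1, 0] and d = [1, 1] give 1 / √2 ≈ 0.707; identical directions give 1 and orthogonal vectors give 0.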

Statistical and Machine Learned Representations

                                        Bag of words        Representation      Search
               Statistical           [0,0,2,0,3,5,0...]         Sparse           BM25

Documents

             Machine Learned         [0.002, -0.004...]         Dense          KNN, HNSW
                                         Embedding          Representation      Search
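
The sparse (statistical) row can be illustrated with a small BM25 scorer. This is a sketch of the common Okapi BM25 formula with conventional default parameters (k1 = 1.5, b = 0.75) and a simple regex tokenizer, both chosen here as assumptions; the three documents are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    docs = [tokenize(d) for d in documents]
    avg_len = sum(len(d) for d in docs) / len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        counts = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in counts:
                continue
            n = doc_freq[term]
            idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1)
            tf = counts[term]
            # Term frequency saturates via k1 and is normalized by document length via b.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avg_len))
        scores.append(score)
    return scores

documents = [
    "BM25 ranks documents using sparse term counts.",
    "Dense embeddings are searched with KNN or HNSW.",
    "Sparse vectors mostly contain zeros.",
]
scores = bm25_scores("sparse term counts", documents)
best = documents[scores.index(max(scores))]
```

A document with no query terms scores exactly zero, which is the main contrast with dense embeddings: those can still match on meaning when no words overlap.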

Loading, Splitting and Embedding

            embedding
Question  ------------> [x,y,z,...] ---->  Index  ---> Relevant Splits
                                             ⬆
                                             |
                                         [x1,y1,z1...]
                                        [x2,y2,z2...]
                                       [x3,y3,z3...]
                                             ⬆
                                             |
                                             |  Embedding
                                             |
                                           Splits
                                             ⬆             |--> Characters
                                             |             |--> Sections
                                             |  splitting ->|
                                             |             |--> Semantic Meaning
                                             |             |--> Delimiters
                                         Documents
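
Two of the splitting strategies in the diagram (by characters and by delimiters) can be sketched in a few lines. The function names and the overlap parameter are illustrative assumptions, not a specific library's API.

```python
def split_by_characters(text, chunk_size, overlap=0):
    """Fixed-size character windows; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def split_by_delimiter(text, delimiter="\n\n"):
    """Split on a delimiter (blank lines by default) and drop empty pieces."""
    return [part.strip() for part in text.split(delimiter) if part.strip()]

char_chunks = split_by_characters("abcdefghij", chunk_size=4, overlap=2)
para_chunks = split_by_delimiter("Intro paragraph.\n\nSecond paragraph.")
```

Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice; delimiter-based splitting keeps paragraphs intact but yields chunks of uneven size.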

RAG_from_scratch