RAG_from_scratch

LangChain: Retrieval-Augmented Generation (RAG)

Build RAG from Scratch

There are several steps to building a RAG system from scratch:

  1. Load the documents.
  2. Chunk the documents into splits and embed the splits.
  3. Store the embedded splits in a vector store, and turn it into a retriever that finds the top-‘k’ relevant splits by computing the similarity between the question and the document splits.
  4. Combine the retrieved document splits and the question into a prompt.
  5. Load the LLM.
  6. Create a chain from the prompt and the LLM, then call chain.invoke to generate the answer (see the sketch after this list).
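
A minimal end-to-end sketch of these six steps (assuming the langchain-openai, langchain-community, and chromadb packages, an OPENAI_API_KEY in the environment, and an illustrative URL; the helper name format_docs is mine):

```python
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load documents (any loader works; a web page here for illustration)
docs = WebBaseLoader("https://example.com/some-post").load()

# 2. Chunk the documents into splits
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = splitter.split_documents(docs)

# 3. Embed and store the splits, then expose the store as a retriever
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})  # top-k splits

# 4. Combine the retrieved splits and the question into a prompt
prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on the following context:\n\n"
    "{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    """Join the retrieved splits into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)

# 5. Load the LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# 6. Create the chain, then invoke it to generate the answer
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print(rag_chain.invoke("What is the post about?"))
```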

Section 1: Query Translation

Rewrite users' questions to make them better suited for retrieval from the indexes (documents).

General approaches:

  • Step-back questions (Step-back prompting)
  • Question re-writing (RAG-Fusion, Multi-Query)
  • Sub-questions (‘Least-to-Most’ prompting, from Google)

Multi-Query

Use an LLM to rewrite a question from multiple perspectives, then run retrieval for each rewrite in parallel and merge the results.

(Figures: ‘Multi-Query Intuition’ and ‘Parallelized Retrieval with Multi-Query’.)
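
A sketch of the idea (the prompt wording and the helper names generate_queries / retrieve_multi_query are mine; retriever is assumed to be a vector-store retriever built as in the pipeline above):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Ask the LLM for several alternative phrasings of the user's question
multi_query_prompt = ChatPromptTemplate.from_template(
    "Generate 4 different versions of the following question to improve "
    "retrieval from a vector database. Return one question per line.\n\n"
    "Question: {question}"
)

generate_queries = (
    multi_query_prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
    | (lambda text: [line for line in text.split("\n") if line.strip()])
)

def retrieve_multi_query(question, retriever):
    """Retrieve with every rewritten query and keep the unique union."""
    seen, unique_docs = set(), []
    for query in generate_queries.invoke({"question": question}):
        for doc in retriever.invoke(query):
            if doc.page_content not in seen:
                seen.add(doc.page_content)
                unique_docs.append(doc)
    return unique_docs
```

LangChain also ships a MultiQueryRetriever (langchain.retrievers.multi_query) that wraps this pattern, if you'd rather not hand-roll it.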

Section 2: Routing

Route questions to the right data source (relational DB, graph DB, vector store).
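
A minimal sketch of LLM-based logical routing (the prompt, the option names, and the route helper are illustrative; production code would more likely use structured output / function calling):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

router_prompt = ChatPromptTemplate.from_template(
    "Given the user question below, pick the best data source to answer it.\n"
    "Options: relational_db, graph_db, vector_store\n"
    "Respond with exactly one option and nothing else.\n\n"
    "Question: {question}"
)

router = router_prompt | ChatOpenAI(temperature=0) | StrOutputParser()

def route(question: str) -> str:
    """Dispatch the question to the chain for the chosen data source."""
    source = router.invoke({"question": question}).strip().lower()
    if source == "relational_db":
        return "run the SQL chain"      # placeholder for the real branch
    if source == "graph_db":
        return "run the Cypher chain"   # placeholder for the real branch
    return "run the vector-store RAG chain"
```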

Section 3: Query Construction

Take natural language and convert it into the DSL (Domain-Specific Language) of whatever data source you want to work with.

Construction Examples:

  • Text-to-SQL (relational DBs)
  • Text-to-Cypher (graph DBs)
  • Self-query retriever (vector DBs)
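
A sketch of the first case, text-to-SQL (the schema and prompt here are made up for illustration; LangChain's create_sql_query_chain covers this pattern, and SelfQueryRetriever covers the vector-DB case):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# A made-up schema, just to give the model something to target
schema = "CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT, country TEXT);"

text_to_sql_prompt = ChatPromptTemplate.from_template(
    "Given this schema:\n{schema}\n\n"
    "Write a SQL query that answers the question. Return only the SQL.\n\n"
    "Question: {question}"
)

text_to_sql = text_to_sql_prompt | ChatOpenAI(temperature=0) | StrOutputParser()

sql = text_to_sql.invoke(
    {"schema": schema, "question": "How many artists are from France?"}
)
print(sql)  # e.g. SELECT COUNT(*) FROM artists WHERE country = 'France';
```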

Section 4: Indexing (VectorStores Implementation)

“Indexing makes documents easier to retrieve.”

Indexing Process:

  1. The documents are split into small chunks, embedded, and stored in an ‘Index’.
  2. An incoming question is embedded in the same way.
  3. The ‘Index’ performs a similarity search and returns the splits most relevant to the question.

OpenAI's tokenizer library, tiktoken, is based on BPE (Byte-Pair Encoding).
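
For example, counting tokens the way an OpenAI model would (cl100k_base is the encoding used by gpt-3.5-turbo and gpt-4):

```python
import tiktoken

# cl100k_base is the BPE encoding used by gpt-3.5-turbo and gpt-4
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("Indexing makes documents easier to retrieve.")
print(len(tokens))         # token count -- useful for sizing chunks
print(enc.decode(tokens))  # round-trips back to the original text
```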

Text Representation

    Question  --->  Retriever  --->  Relevant Documents
                        ^
                        |  load
                        |
                    Documents
Numerical Representation

    Question  --->  [x, y, z, ...]
                          |
                          |  cosine similarity, etc.
                          v
    Documents ---> [x1, y1, z1, ...]
     (loaded &     [x2, y2, z2, ...]
      embedded)    [x3, y3, z3, ...]
Statistical and Machine Learned Representations

Documents can be represented in two broad families:

| Family | Method | Representation | Search |
| --- | --- | --- | --- |
| Statistical | Bag of words | Sparse: [0, 0, 2, 0, 3, 5, 0, ...] | BM25 |
| Machine-learned | Embedding | Dense: [0.002, -0.004, ...] | KNN, HNSW |
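
A tiny sparse-search example (assuming the third-party rank_bm25 package; the corpus is made up):

```python
from rank_bm25 import BM25Okapi

corpus = [
    "RAG retrieves relevant document splits",
    "vector stores hold dense embeddings",
    "BM25 scores documents with sparse term statistics",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)
query = "how does bm25 score documents".split()
print(bm25.get_scores(query))  # one relevance score per document
```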

Loading, Splitting and Embedding

                 embedding
    Question  ------------>  [x, y, z, ...]  --->  Index  --->  Relevant Splits
                                                     ^
                               [x1, y1, z1, ...]     |
                               [x2, y2, z2, ...]  ---+
                               [x3, y3, z3, ...]
                                       ^
                                       |  embedding
                                       |
                                    Splits
                                       ^
                                       |  splitting by: characters, sections,
                                       |  semantic meaning, or delimiters
                                       |
                                   Documents
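
A sketch of the splitting-and-embedding half of this picture (token-aware chunk sizing via tiktoken; the sample text is illustrative):

```python
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

long_document_text = "Indexing makes documents easier to retrieve. " * 200

# Token-aware splitting: chunk sizes are measured with tiktoken, matching
# how the embedding model itself counts tokens
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, chunk_overlap=50
)
splits = splitter.split_text(long_document_text)

embeddings = OpenAIEmbeddings()
vectors = embeddings.embed_documents(splits)           # one vector per split
query_vector = embeddings.embed_query("What is indexing?")
print(len(splits), len(vectors), len(query_vector))
```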

Section 5: Retrieval

Section 6: Generation
