# RAG Knowledge Base

A knowledge management system built on Retrieval-Augmented Generation (RAG) that helps organizations efficiently organize, search, and retrieve information.
## Features

- 🔍 Semantic search powered by the Pinecone vector database
- 🤖 Advanced question answering using Google's Gemini Pro
- 📚 Document ingestion and processing
- 💬 Interactive chat interface
- 🔄 Context-aware conversations
- 🎯 Precise information retrieval
## Prerequisites

- Python 3.8 or higher
- Pinecone API key (sign up at Pinecone)
- Google API key for Gemini Pro (get one from Google AI Studio)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/asvpappula/RAGapplication.git
   cd RAGApplication
   ```

2. Set up the environment:

   ```bash
   # For Unix/macOS
   chmod +x install.sh
   ./install.sh
   ```

   ```bash
   # For Windows, run these commands manually:
   python -m venv venv
   .\venv\Scripts\activate
   pip install -r requirements.txt
   ```

3. Configure environment variables:

   - Copy `.env.example` to `.env`
   - Add your API keys and configuration:

   ```
   GOOGLE_API_KEY=your_google_api_key_here
   PINECONE_API_KEY=your_pinecone_api_key_here
   PINECONE_ENVIRONMENT=your_pinecone_environment
   ```
4. Run the application:

   ```bash
   python app.py
   ```

5. Open your browser and navigate to:

   ```
   http://localhost:5000
   ```
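The API keys configured above are read from the process environment at startup. As a minimal sketch of defensive key loading (the `require_env` helper below is hypothetical, not the repo's actual code; `app.py` may load `.env` differently, e.g. via `python-dotenv`):

```python
import os

# Hypothetical helper: fetch a required key and fail fast with a clear
# message instead of a confusing downstream API error.
def require_env(name: str) -> str:
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Example usage (values come from your .env / shell environment):
# google_api_key = require_env("GOOGLE_API_KEY")
# pinecone_api_key = require_env("PINECONE_API_KEY")
```

Failing fast at startup makes a missing or misspelled key obvious before any Pinecone or Gemini call is attempted.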
## Adding Documents

1. Create a `data/raw_text` directory
2. Add your PDF or text documents to this directory
3. Run the indexing script:

   ```bash
   python backend/pinecone/extract_and_chunk.py
   ```
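Conceptually, the first stage of indexing is just reading the raw files into memory before chunking and upserting to Pinecone. A simplified sketch of that stage (the function name and signature are illustrative, not the script's actual API, and the real script also handles PDFs):

```python
from pathlib import Path
from typing import Dict

def load_raw_texts(raw_dir: str = "data/raw_text") -> Dict[str, str]:
    """Read every .txt file under raw_dir into a {filename: text} map.

    Illustrative only: the real extract_and_chunk.py also extracts text
    from PDFs, then chunks and upserts the results into Pinecone.
    """
    docs = {}
    for path in sorted(Path(raw_dir).glob("*.txt")):
        docs[path.name] = path.read_text(encoding="utf-8")
    return docs
```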
## Customizing the System Prompt

The system prompt defines how the AI processes and responds to questions. You can customize it in `backend/llm/response_generation.py`:

1. Locate the `get_prompt_template` method
2. Modify the template string to:
   - Change the AI's persona
   - Adjust response formatting
   - Add domain-specific instructions
   - Customize the source citation format
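As a concrete example, a customized template might look like the sketch below. The method name `get_prompt_template` comes from the file above, but the template text, the persona, and the `{context}`/`{question}` placeholder names are illustrative assumptions, not the repo's actual template:

```python
def get_prompt_template() -> str:
    # Illustrative template: {context} and {question} are filled in at
    # query time with the retrieved chunks and the user's question.
    return (
        "You are a helpful support engineer for ACME Corp.\n"      # persona
        "Answer using only the context below.\n"
        "Cite each fact as [source_filename].\n\n"                 # citation format
        "Context:\n{context}\n\n"
        "Question: {question}\n"
        "Answer:"
    )

# Filling the template with retrieved context and a user question:
prompt = get_prompt_template().format(
    context="Widgets ship in 2 days. [shipping.pdf]",
    question="How long does shipping take?",
)
```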
## Customizing Document Processing

Customize how documents are processed in `backend/pinecone/extract_and_chunk.py`:

- Modify chunk size and overlap
- Adjust text extraction rules
- Customize metadata extraction
- Change document type handling
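Chunk size and overlap control retrieval granularity: smaller chunks give more precise matches, while overlap keeps sentences cut at a boundary intact in a neighboring chunk. A minimal sketch of sliding-window chunking (parameter names are illustrative and may not match those used in `extract_and_chunk.py`):

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap,
    so content cut at a chunk boundary also appears in the next chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]
```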
## Project Structure

```
rag-knowledge-base/
├── backend/
│   ├── llm/              # LLM integration
│   └── pinecone/         # Vector database operations
├── data/
│   └── raw_text/         # Place your documents here
├── static/               # Frontend assets
├── templates/            # HTML templates
├── app.py                # Main Flask application
└── requirements.txt      # Python dependencies
```
## Key Files

- `app.py`: Main Flask application
- `backend/llm/response_generation.py`: RAG implementation
- `backend/pinecone/extract_and_chunk.py`: Document processing
- `templates/index.html`: Chat interface
- `static/js/chat.js`: Frontend logic
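The RAG implementation in `response_generation.py` follows the standard retrieve-then-generate loop: embed the question, query Pinecone for similar chunks, and ask Gemini to answer grounded in those chunks. A dependency-injected sketch of that loop (all names here are illustrative, not the repo's actual API; the callables stand in for the Gemini embedding, Pinecone query, and Gemini Pro generation calls):

```python
from typing import Callable, List

def answer_question(
    question: str,
    embed: Callable[[str], List[float]],        # e.g. a Gemini embedding call
    search: Callable[[List[float]], List[str]],  # e.g. a Pinecone index query
    generate: Callable[[str], str],              # e.g. a Gemini Pro completion
) -> str:
    """Standard RAG loop: embed the question, retrieve similar chunks,
    then ask the LLM to answer using only those chunks as context."""
    vector = embed(question)
    chunks = search(vector)
    prompt = (
        "Context:\n" + "\n".join(chunks) +
        f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```

Passing the three calls in as parameters keeps the loop testable with stubs, independent of live Pinecone and Gemini credentials.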
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## License

Apache License Version 2.0 - see the LICENSE file for details.