LangChain-Step-By-Step

This repository is a step-by-step onboarding assignment for learning how to build LLM applications with LangChain and OpenAI. Interns should implement step_1.py through step_6.py in order and finish with a small AI assistant that can handle both external RAG dataset lookup and general conversation.

This assignment assumes Python 3.11+ or higher, LangChain v1, and an OpenAI API key. For Step 2, the assignment uses the more modern ChatPromptTemplate + MessagesPlaceholder pattern instead of the older ConversationChain style.

Learning Goals

Safely manage .env files and API keys
Run basic LLM calls with ChatOpenAI
Feed conversation history back into prompts
Build a simple RAG workflow using a Hugging Face dataset
Structure LLM outputs with Pydantic
Let an agent decide when to use a tool

Repository Structure

.
├── data/
│   └── questions.txt
├── docs/
│   └── reviewer-rubric.md
├── outputs/
│   └── .gitkeep
├── .env.example
├── requirements.txt
├── step_1.py
├── step_2.py
├── step_3.py
├── step_4.py
├── step_5.py
└── step_6.py

Getting Started

Fork or clone this repository.
Create a virtual environment (you can set up your development environment in any way you prefer). Make sure to use Python 3.11 or higher.
```
python3.11 -m venv .venv
source .venv/bin/activate
```
To deactivate the virtual environment later, run:
```
deactivate
```
Install dependencies.
```
pip install -r requirements.txt
```
Copy .env.example to .env and fill in OPENAI_API_KEY.
```
cp .env.example .env
```
Implement the assignment in order, starting from step_1.py.

Assignment Rules

Each step should extend the result of the previous step.
If you get stuck, read the official docs first.
Open a Pull Request after completing Step 6.

Step Overview

Step	Topic	Goal	What to Check
1	Basic Setup	Read `questions.txt`, generate answers, and save them to `outputs/step_1_output.txt`	API integration, file I/O
2	Conversation Memory	Build a chatbot that remembers earlier turns	Name/context retention
3	Structured Output	Return `intent` and `answer` as JSON/Pydantic	Output parsing, validation
4	Simple RAG	Build dataset-based QA with `neural-bridge/rag-dataset-12000`	Dataset loading, Vector DB insertion, context retrieval, answer generation
5	Final Agent	Build an assistant that connects with the Vector DB to answer dataset questions	Tools, agent flow, reuse
6	Language Filter	Detect language and refuse to answer if not Japanese or English	Guardrails, prompt engineering or routing

Step 1 — First Conversation with an LLM

Mission

Read data/questions.txt line by line.
Send each question to ChatOpenAI.
Save the questions and answers in a readable format to outputs/step_1_output.txt.

Required

Read the API key from .env
Ignore empty lines
Overwrite the output file on each run

Step 2 — A Chatbot with Memory

Mission

Build a chatbot that can continue a multi-turn conversation.
The following flow should work:
- My name is Jimin.
- What is my name?

Required

Use MessagesPlaceholder
Choose either full-history memory or a recent-window memory approach
Manually verify at least 3 turns of conversation

Hint

In modern LangChain flows, ChatPromptTemplate + MessagesPlaceholder is more instructive than the legacy ConversationChain pattern.
Managing history as a list is perfectly fine for this assignment.

Step 3 — Structured Output

Mission

Analyze user input and return it in the following format.

{
  "intent": "question | greeting | request",
  "answer": "..."
}

Required

Use PydanticOutputParser or an equivalent explicit schema validation flow
Handle parse failures with retries or explicit error handling

Step 4 — External Data with Simple RAG

Mission

Use the Hugging Face dataset neural-bridge/rag-dataset-12000 to build a simple RAG pipeline.
Use the dataset's context, question, and answer fields to implement a retrieval + answer generation flow.
At minimum, run retrieval-based answering on a few test questions and compare your output with the provided answers.

Required

Load neural-bridge/rag-dataset-12000 with the datasets library
Use the train split context field as the retrieval corpus
Pick a few test split questions and run retrieval + generation on them
Only place relevant context or chunks into the prompt

Vector DB Implementation (Required)

A Vector DB (e.g., Chroma) is required for retrieval.
Extract the context from exactly train[:200] and convert them into Document objects.
Create embeddings and store them in a Vector DB collection.
If the retrieved context is not enough to answer, say that the answer could not be verified from the context.

Suggested Implementation Order

Load the dataset with load_dataset()
Slice the dataset to train[:200] and test[:20]
Convert train[:200] contexts into Document objects and store them in a Vector DB via embeddings
Retrieve relevant context using similarity search for each test question
Insert the retrieved context into the prompt and generate an answer
Compare your result with the dataset answer or log the differences

Short Chroma Guide

Install these packages in your local Python environment:
```
pip install langchain-chroma chromadb
```
Use persist_directory if you want a local persistent DB.
If you want to try a local server flow, install the Chroma CLI and run chroma run.
For installation details and the latest setup instructions, refer to the official Chroma website and docs:
- https://www.trychroma.com/
- https://docs.trychroma.com/

Step 5 — Autonomous Assistant

Mission

Integrate with the Vector DB from Step 4.
If a user question needs external knowledge retrieval, use the Vector DB lookup tool.
If it is general conversation, let the LLM answer directly.
In other words, let the LLM decide whether it should use the retrieval tool.

Required

Define at least one tool
Make the agent's reasoning or execution log inspectable
Demo both of the following:
- one question selected from the dataset
- one general conversation prompt

Step 6 — Language Routing / Filtering

Mission

Add a guardrail that detects the language of the user's input before answering.
If the question is not in Japanese (日本語) or English (English), the assistant must decline to answer (e.g., "I can only answer in Japanese or English.").
This rule applies to both general conversation and dataset questions.

Required

Use LangChain routing mechanisms or direct prompt engineering to filter based on language.
Demo an allowed language query and a blocked language query (e.g., Korean, Spanish).

Submission Guide

Open a Pull Request to track and submit your progress.
In your PR description, briefly include what you implemented in each step.
AI Coding Tools: You are completely free to use AI coding tools (such as Antigravity, Claude Code, etc.) for this assignment. If you need a license or support for any specific AI coding tool, please let us know and we will provide it for you.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LangChain-Step-By-Step

Learning Goals

Repository Structure

Getting Started

Assignment Rules

Step Overview

Step 1 — First Conversation with an LLM

Step 2 — A Chatbot with Memory

Step 3 — Structured Output

Step 4 — External Data with Simple RAG

Step 5 — Autonomous Assistant

Step 6 — Language Routing / Filtering

Submission Guide

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
data		data
docs		docs
outputs		outputs
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
step_1.py		step_1.py
step_2.py		step_2.py
step_3.py		step_3.py
step_4.py		step_4.py
step_5.py		step_5.py
step_6.py		step_6.py

Folders and files

Latest commit

History

Repository files navigation

LangChain-Step-By-Step

Learning Goals

Repository Structure

Getting Started

Assignment Rules

Step Overview

Step 1 — First Conversation with an LLM

Step 2 — A Chatbot with Memory

Step 3 — Structured Output

Step 4 — External Data with Simple RAG

Step 5 — Autonomous Assistant

Step 6 — Language Routing / Filtering

Submission Guide

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages