
Add Branch B CoT training run #16

Open
SemyonEpanov wants to merge 10 commits into main from run-distill-branch-b

Conversation

@SemyonEpanov
Collaborator

No description provided.

Semyon Epanov added 6 commits March 10, 2026 19:08
# Conflicts:
#	src/core/training/base_trainer.py
#	src/experiments/sft_by_complexity_splits/mmlu/llama_3b.py
…istill-branch-b

# Conflicts:
#	src/core/datasets/distillation/distillation_branch_b_cot_dataset.py
#	src/core/datasets/mmlu/mmlu_cot_response_dataset.py
#	src/core/training/base_trainer.py
#	src/experiments/distill/train_branches/train_cleaned_b_full20_prompt1.py
#	src/experiments/distill/train_branches/train_cleaned_b_new.py
use_rslora=self.config.lora_training_args.use_rslora,
)
self._model = get_peft_model(model, peft_config)
if self.config.training_args.gradient_checkpointing:
Member

Why do we need it? Isn't it handled automatically by transformers?

Collaborator Author

Agree, will remove.

Collaborator Author

Removed use_cache=False (it is handled by the Trainer), but had to keep enable_input_require_grads(), because training crashes with a RuntimeError otherwise.
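For context, the failure mode described here can be reproduced with plain PyTorch: reentrant activation checkpointing detaches its inputs, so when every layer before the checkpoint is frozen (as with a LoRA base model) nothing in the recomputed graph requires grad and `backward()` raises a RuntimeError. A minimal sketch with toy stand-in modules (not the project's code); `enable_input_require_grads()` in transformers works by registering a forward hook that does the equivalent of the `requires_grad_(True)` call below:

```python
import torch
from torch.utils.checkpoint import checkpoint

# Frozen base-model stand-in (as in LoRA, where base weights are frozen).
emb = torch.nn.Embedding(10, 4)
emb.weight.requires_grad_(False)
adapter = torch.nn.Linear(4, 4)  # trainable adapter stand-in

ids = torch.tensor([[1, 2, 3]])
hidden = emb(ids)  # requires_grad=False: the embedding is frozen

# Reentrant checkpointing detaches its inputs; with no grad-requiring
# input, the output carries no graph and backward() raises a RuntimeError.
out = checkpoint(adapter, hidden, use_reentrant=True)
try:
    out.sum().backward()
    crashed = False
except RuntimeError:
    crashed = True  # this is the crash the comment refers to

# What enable_input_require_grads() effectively does: mark the
# activations entering the checkpointed region as requiring grad.
hidden2 = emb(ids).requires_grad_(True)
out2 = checkpoint(adapter, hidden2, use_reentrant=True)
out2.sum().backward()  # the adapter now receives gradients
```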

eval_split_dir="data/out/splits/single_token_entropy/mmlu/qwen_3b",
eval_groups=6,
per_device_train_batch_size=1,
effective_train_batch_size=120,
Member

Why do you want to change effective batch size?

per_device_train_batch_size=1,
effective_train_batch_size=120,
num_train_epochs=20,
learning_rate=1e-4,
Member

Why change lr?

Collaborator Author

lr doesn't change (it is just set explicitly)

class LoRATrainingArgs(BaseTrainingArgs):
    # Sane overrides for LoRA SFT fine-tuning
    effective_train_batch_size: int = 64
    learning_rate: float = 1e-4
    warmup_ratio: float = 0.06
    weight_decay: float = 0.0

Member

In your code it is effective_train_batch_size=120, typo?

Collaborator Author

set effective_train_batch_size=64, per_device_train_batch_size=2.
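The relationship between these two settings is just gradient accumulation arithmetic: the trainer has to turn `effective_train_batch_size` into accumulation steps given the per-device batch size and device count. A hypothetical helper showing the derivation (how this repo's trainer actually computes it is an assumption):

```python
# Hypothetical helper: derive gradient accumulation steps from an
# effective (global) batch size. Not the project's actual code.
def grad_accum_steps(effective_bs: int, per_device_bs: int, num_devices: int = 1) -> int:
    world_bs = per_device_bs * num_devices  # samples per optimizer micro-step
    if effective_bs % world_bs != 0:
        raise ValueError(f"{effective_bs} is not divisible by {world_bs}")
    return effective_bs // world_bs

# The settings agreed on in this thread, on a single GPU:
print(grad_accum_steps(64, 2))  # → 32
```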

raise FileNotFoundError(f"Train parquet not found: {train_data_path}")

eval_question_ids = _collect_eval_question_ids(eval_split_dir, eval_groups)
train_row_filter = _build_train_row_filter(eval_question_ids)
Member

Instead of applying the filter dynamically, shall we preprocess the data and save it to disk? Just like with other MMLU data splits

Member

Then we could use just CausalDatasetAdapter

Collaborator Author

Ok, agree.
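The agreed change amounts to moving the filter from load time to a one-off preprocessing step: drop every training row whose question id appears in an eval split, persist the result, and let the plain dataset adapter read it. A minimal sketch of the filtering core (the `question_id` key and row shape are assumptions, not the project's actual schema):

```python
# Hypothetical sketch of the agreed preprocessing step: remove rows whose
# question id is held out for evaluation, so the saved training set can be
# loaded without any dynamic filtering.
def filter_train_rows(rows, eval_question_ids):
    """rows: iterable of dicts, each with a 'question_id' key (assumed)."""
    held_out = set(eval_question_ids)  # O(1) membership checks
    return [r for r in rows if r["question_id"] not in held_out]

rows = [
    {"question_id": 1, "text": "a"},
    {"question_id": 2, "text": "b"},
    {"question_id": 3, "text": "c"},
]
# Row 2 is held out for eval, so only rows 1 and 3 survive.
print(filter_train_rows(rows, eval_question_ids={2}))
```

In the real script the surviving rows would then be written back to parquet, mirroring the other MMLU data splits.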

@@ -0,0 +1,193 @@
"""
Member

What is the difference between src/experiments/distill/train_branches/train_cleaned_b_full20_prompt1.py and this script? Why do we need both?

Collaborator Author

train_cleaned_b_full20_prompt1.py is the entry point

Member

Oh, I see. Could you move the main script to core/... then? And keep the entry point in experiments

Collaborator Author

Ok

lambda row: self.process_row(row).model_dump(),
num_proc=4,
remove_columns=ds.column_names,
load_from_cache_file=False,
Member

Why is it needed?

Collaborator Author

Reverted; it was only used while debugging.

return (
f"Question: {question.strip()}\n\n"
f"Options:\n{opts}\n\n"
f"Answer with the option letter first, then provide reasoning inside {THINKING_START}...{THINKING_END} tags."
Member

We do not need to prompt the model to answer with reasoning tags, right? Reasoning models should use reasoning by default, meaning the prompt should come without the request to use them.

Collaborator Author

Fair point. The model will learn the answer-first + reasoning format from the training data itself.
Will simplify the prompt to a plain question format without thinking tag instructions.
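The simplified prompt described here could look like the following sketch: same question-plus-options layout as the snippet under review, minus the final instruction about thinking tags (the exact option formatting is an assumption, not the repo's code):

```python
# Hypothetical simplified prompt builder: plain question + options, with
# no instruction about THINKING_START/THINKING_END tags -- the answer-first
# + reasoning format is learned from the training data itself.
def build_prompt(question: str, options: list[str]) -> str:
    opts = "\n".join(f"{chr(ord('A') + i)}. {o}" for i, o in enumerate(options))
    return f"Question: {question.strip()}\n\nOptions:\n{opts}"

print(build_prompt("What is 2 + 2?", ["3", "4", "5", "22"]))
# → Question: What is 2 + 2?
#
#   Options:
#   A. 3
#   B. 4
#   C. 5
#   D. 22
```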

from core.prompts.thinking_markers import THINKING_START, THINKING_END


class DistillationBranchBCoTDataset(CausalDataset[CausalDatasetConfig]):
Member

Why do we need it? Could we use MMLUReasoningResponseDataset instead? Just pre-process the data to match the current format of MMLU datasets

Collaborator Author

Agreed, we'll use MMLUReasoningResponseDataset directly.

Semyon Epanov added 4 commits March 22, 2026 19:29
- Remove redundant gradient checkpointing code from LoRATrainer
- Revert load_from_cache_file=False from abstract base class
- Delete DistillationBranchBCoTDataset, use MMLUReasoningResponseDataset
- Remove single_token_sys_prompt_with_answer_first_thinking
- Add data preprocessing script (prepare_cleaned_b_data.py)
- Rewrite training orchestration (branch_b_training.py)
- Use default effective_batch_size=64, remove explicit lr=1e-4
- Delete FilteredCausalDatasetAdapter and train_cleaned_b_new.py