GitHub - Luka-998/Open-Reading-Frame-Finder: Interactive CLI tool for analyzing DNA sequences, detecting open reading frames (ORFs), and exporting annotated results in FASTA and summary formats.

ORF_Finder — Prokaryotic DNA

This repository serves as a clean and extensible bioinformatics workflow, suitable for further data science and machine learning applications in genomic and protein function analysis.

The idea is to recreate existing open reading frame finders (such as NCBI ORFfinder, https://www.ncbi.nlm.nih.gov/orffinder/) from scratch as a personal project.

The knowledge gained along the way is invaluable for my career development in bioinformatics and data analysis.

AI use for code writing was minimal: I wrote all of the code myself.
AI was used only for generating synthetic sequences and for debugging.
The logic of ORF_Finder is based on my own understanding of the underlying processes in molecular biology.

Project overview

ORF_Finder is a compact, well-documented Python tool that finds all possible open reading frames (ORFs) in prokaryotic DNA supplied as a FASTA file containing one or more sequences.
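As an illustration of the multi-sequence FASTA input described above, here is a minimal sketch of a parser in plain Python. The function name `parse_fasta` and the returned dictionary layout are assumptions made for this example and are not taken from the code in `src`.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}.

    Hypothetical helper for illustration; the real tool may read
    its input differently.
    """
    records: Dict[str, List[str]] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            current_id = header[0] if header else f"record_{len(records) + 1}"
            records[current_id] = []
        elif current_id is not None:
            # Sequence lines may be wrapped and mixed-case; normalize to upper case.
            records[current_id].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


example = """>seq1 synthetic example
ATGAAATAG
ccc
>seq2
ATGCCC
"""
parsed = parse_fasta(example.splitlines())
print(parsed)
```

Reading from a file works the same way, because an open file object is also an iterable of lines.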

`Project Goal`: Detect every start and stop combination (including ORFs that start but have no in-frame stop), capture small ORFs, translate predicted ORFs to protein sequences, and later compare translated products to known proteins (e.g., BLAST+).
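The detection step in the goal above could be sketched roughly as follows. This is not the repository's implementation: the names `ORF` and `find_orfs` are hypothetical, only the forward strand is scanned, and `ATG` is assumed as the only start codon by default (other start codons can be passed in). Each start codon is paired with the first in-frame stop codon, and a start with no in-frame stop is kept with `end=None`.

```python
from typing import List, NamedTuple, Optional, Sequence

STOP_CODONS = {"TAA", "TAG", "TGA"}


class ORF(NamedTuple):
    start: int           # 0-based index of the first base of the start codon
    end: Optional[int]   # index just past the stop codon, or None if no in-frame stop
    frame: int           # reading frame: 0, 1, or 2
    sequence: str        # nucleotides from the start codon through the stop (if any)


def find_orfs(seq: str,
              start_codons: Sequence[str] = ("ATG",),
              min_codons: int = 0) -> List[ORF]:
    """Report one ORF per start codon on the forward strand (illustrative sketch)."""
    seq = seq.upper()
    orfs: List[ORF] = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] not in start_codons:
            continue
        end = None
        # Walk codon by codon in the same frame until the first stop codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                end = j + 3
                break
        if end is None:
            # Open-ended ORF: keep only complete codons up to the sequence end.
            usable = (len(seq) - i) // 3 * 3
            orf_seq = seq[i:i + usable]
        else:
            orf_seq = seq[i:end]
        # min_codons=0 keeps even very small ORFs.
        if len(orf_seq) // 3 >= min_codons:
            orfs.append(ORF(i, end, i % 3, orf_seq))
    return orfs


orfs = find_orfs("CCATGAAATAGATGCCC")
for orf in orfs:
    print(orf)
```

Scanning the reverse strand would amount to running the same function on the reverse complement of the sequence.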

Current Phase:

1. Improving accuracy of codon identification and storage
2. Testing workflow with long synthetic prokaryotic DNA sequences containing known open reading frames and stop codons

Next Phase:

3. Batch translation of identified open reading frames
4. Comparing translated proteins to curated sequences from the Swiss-Prot database
5. Identifying potential novel genes and proteins based on sequence alignment and similarity metrics
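The planned batch-translation step might look something like the sketch below. It assumes the standard genetic code and uses hypothetical names (`translate`, `translate_batch`); unknown or ambiguous codons are written as `X`, and an alternative start codon is translated literally rather than as methionine, which is a simplification.

```python
from itertools import product
from typing import Dict

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf_seq: str, to_stop: bool = True) -> str:
    """Translate a nucleotide ORF into a protein string (illustrative sketch)."""
    protein = []
    for i in range(0, len(orf_seq) - 2, 3):
        aa = CODON_TABLE.get(orf_seq[i:i + 3].upper(), "X")  # X = unknown codon
        if aa == "*" and to_stop:
            break  # stop at the first stop codon
        protein.append(aa)
    return "".join(protein)


def translate_batch(orfs: Dict[str, str]) -> Dict[str, str]:
    """Translate many ORFs at once, keyed by a caller-chosen ORF ID."""
    return {orf_id: translate(seq) for orf_id, seq in orfs.items()}


proteins = translate_batch({"orf1": "ATGAAATAG", "orf2": "ATGCCC"})
print(proteins)
```

The resulting protein strings could then be written out in FASTA format for comparison against Swiss-Prot with a tool such as BLAST+.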

Data Analysis & ML part:

Compare ORF_Finder proteins to curated proteins
Feature engineering and data cleaning
Building machine learning models to predict protein functions based on extracted features
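As one possible starting point for the feature engineering step, the sketch below turns a translated protein into a small numeric feature vector (length plus amino acid composition). The function name `protein_features` and the choice of features are assumptions for illustration only; the project may end up using different features.

```python
from collections import Counter
from typing import Dict

AMINO_ACID_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def protein_features(protein: str) -> Dict[str, float]:
    """Return length and per-residue fractions for one protein (illustrative sketch)."""
    protein = protein.upper()
    counts = Counter(protein)
    length = len(protein)
    features: Dict[str, float] = {"length": float(length)}
    for aa in AMINO_ACID_ALPHABET:
        # Guard against empty sequences to avoid division by zero.
        features[f"frac_{aa}"] = counts[aa] / length if length else 0.0
    return features


features = protein_features("MKKA")
print(features["length"], features["frac_K"])
```

A list of such dictionaries can be loaded directly into a table (one row per ORF) before cleaning and model training.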

Folders and files (14 commits):

results
src
test_sequences
tests
venv
README.md
requirements.txt
