
sfai

LLM-backed biodiversity informatics utilities via CLI and REST API.

sfai is a tool for parsing and extracting data from biodiversity records, such as natural history specimen labels, using local AI models served by Ollama. It provides a command-line interface (CLI) for batch processing and a REST API server for integration into other systems.

Prerequisites & Setting up the LLM (Ollama)

sfai relies on a running Ollama instance with a specific model fine-tuned for parsing biodiversity data.

  1. Install Ollama: Download and install Ollama for your operating system from ollama.com.

  2. Pull the Model: With Ollama running, pull the required model, SpeciesFileGroup/ento-model-parse, from Hugging Face. Choose a quantization level that suits your machine:

    • For machines with lower RAM/processing power: We recommend the Q4 quantization (q4_K_M).
      ollama run hf.co/SpeciesFileGroup/ento-model-parse:q4_K_M
    • For more powerful machines: The Q5 or Q8 quantizations provide better accuracy if you have the resources.
      ollama run hf.co/SpeciesFileGroup/ento-model-parse:q5_K_M

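    Ensure the model has downloaded and runs correctly in Ollama before using sfai. Note that ollama run downloads the model on first use and then opens an interactive prompt, which you can leave with /bye.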
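To confirm the model is installed without opening an interactive session, you can query Ollama's REST API. The Go sketch below calls GET /api/tags, Ollama's endpoint for listing local models. It assumes Ollama is listening on its default address (http://localhost:11434) and that the model is listed under the same name used in the ollama run commands above; the helper modelAvailable is illustrative, not part of sfai.

```go
// A minimal sketch that asks a local Ollama instance whether the sfai model
// has been pulled, using Ollama's GET /api/tags endpoint.
// Assumptions: Ollama listens on its default address, and the model is listed
// under the same name used with `ollama run`. Adjust both constants as needed.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

const (
	ollamaURL = "http://localhost:11434"                          // Ollama's default address
	modelName = "hf.co/SpeciesFileGroup/ento-model-parse:q4_K_M" // or the q5_K_M tag
)

// tagsResponse holds the part of the /api/tags response this check needs.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// modelAvailable reports whether model appears in a /api/tags JSON body.
// The comparison ignores case in case the tag's capitalization differs.
func modelAvailable(body []byte, model string) (bool, error) {
	var tags tagsResponse
	if err := json.Unmarshal(body, &tags); err != nil {
		return false, err
	}
	for _, m := range tags.Models {
		if strings.EqualFold(m.Name, model) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(ollamaURL + "/api/tags")
	if err != nil {
		fmt.Fprintln(os.Stderr, "Ollama is not reachable:", err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Fprintln(os.Stderr, "unexpected status from Ollama:", resp.Status)
		return
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading response:", err)
		return
	}
	ok, err := modelAvailable(body, modelName)
	switch {
	case err != nil:
		fmt.Fprintln(os.Stderr, "unexpected response:", err)
	case ok:
		fmt.Println("model found:", modelName)
	default:
		fmt.Println("model not found; download it with: ollama run", modelName)
	}
}
```

Separating the JSON check from the HTTP call keeps the decision logic testable without a running Ollama instance.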

Installation

You can install sfai using go install:

go install github.com/gnames/sfai@latest

Alternatively, you can clone the repository and build it from source:

git clone https://github.com/gnames/sfai.git
cd sfai
just install # Or: go build -o bin/sfai && mv bin/sfai ~/go/bin/

Configuration

sfai is highly configurable. Settings are resolved in the following order of precedence, from highest to lowest:

  1. CLI Flags (e.g., --jobs 4)
  2. Environment Variables (e.g., SFAI_JOBS=4)
  3. Configuration File (~/.config/sfai/config.yaml)
  4. Defaults
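The precedence rules can be illustrated with a small, self-contained sketch. This is not sfai's actual code: the function resolveJobs and its signature are hypothetical, while the flag --jobs, the variable SFAI_JOBS, and the config key jobs come from this README. Each source is consulted only when every higher-priority source is unset.

```go
// A hypothetical illustration of sfai's documented precedence for the
// "jobs" setting: CLI flag > environment variable > config file > default.
package main

import (
	"fmt"
	"strconv"
)

// resolveJobs returns the first value that is set, checking sources from
// highest to lowest priority. Empty strings and zero mean "not set".
// The signature is illustrative only; it is not part of sfai's API.
func resolveJobs(flagVal, envVal string, fileVal, defaultVal int) (int, error) {
	if flagVal != "" {
		return strconv.Atoi(flagVal) // 1. --jobs
	}
	if envVal != "" {
		return strconv.Atoi(envVal) // 2. SFAI_JOBS
	}
	if fileVal != 0 {
		return fileVal, nil // 3. jobs: in config.yaml
	}
	return defaultVal, nil // 4. built-in default
}

func main() {
	// Env var and config file are both set; the env var wins.
	n, _ := resolveJobs("", "8", 4, 1)
	fmt.Println(n) // 8

	// The flag overrides everything else.
	n, _ = resolveJobs("2", "8", 4, 1)
	fmt.Println(n) // 2
}
```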

Configuration File

On its first run, sfai automatically generates a configuration file at ~/.config/sfai/config.yaml. You can edit this file to persistently configure settings such as your Ollama API URL and the default number of parallel jobs.

Example ~/.config/sfai/config.yaml:

# Number of parallel model workers
jobs: 4

(Note: You can also configure the model name and Ollama URL here if they differ from the defaults.)

Usage: CLI Mode

The command-line interface accepts input as an argument, a file, or standard input. Its primary command is label, which parses natural history specimen labels into structured records emitted as JSONL.

Examples

1. Parse a single label string:

sfai label "Collected by J. Smith, Montana, 1923"

2. Parse labels from a file (one label per line; encode line breaks within a single label as the two-character sequence \n):

sfai label path/to/labels.txt

3. Process data via standard input (stdin):

cat labels.txt | sfai label
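If your labels come from a source that preserves line breaks, such as a spreadsheet or database, they need to be flattened before being written to a file for example 2 above. The Go sketch below replaces each line break with the two-character sequence \n and writes one label per line to stdout. It assumes no other characters need escaping; the helper encodeLabel and the file name encode.go are illustrative.

```go
// A minimal sketch that flattens multi-line labels into the one-label-per-line
// format sfai label expects, writing them to stdout:
//
//	go run encode.go > labels.txt && sfai label labels.txt
//
// Assumption: only line breaks need escaping; no other characters are encoded.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// encodeLabel replaces real line breaks (LF or CRLF) with the two-character
// sequence backslash + "n", so each label occupies exactly one line.
func encodeLabel(label string) string {
	label = strings.ReplaceAll(label, "\r\n", "\n")
	return strings.ReplaceAll(label, "\n", `\n`)
}

func main() {
	// Example input: labels as they might come from a database or spreadsheet.
	labels := []string{
		"Collected by J. Smith\nMontana, 1923",
		"Montana\r\n1923",
	}

	w := bufio.NewWriter(os.Stdout)
	for _, l := range labels {
		fmt.Fprintln(w, encodeLabel(l))
	}
	if err := w.Flush(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```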

The output for all CLI operations is JSONL (JSON Lines) printed to standard output (stdout). This makes it easy to pipe the results into tools like jq for further processing.

cat labels.txt | sfai label | jq .

Usage: REST API Mode

sfai also includes a built-in REST API server, built with the Huma framework for Go, that exposes label parsing over HTTP.

Start the server using:

sfai rest

Main Endpoints

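  • GET /label?text=...: Parses the provided text (URL-encoded in the text query parameter) as a label.
    • Note: The input text is limited to 4096 bytes; longer requests are rejected with a 400 or 422 error. Because the label travels in the query string, it may appear in server and proxy access logs, so configure your reverse proxy to redact query strings if your labels contain sensitive locality data.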

Interactive Documentation

The REST server automatically provides OpenAPI specifications and interactive documentation:

  • GET /: Home page
  • GET /docs: Interactive API documentation (Stoplight Elements)
  • GET /openapi.yaml or GET /openapi.json: OpenAPI 3.1 specification
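The OpenAPI document can also be used programmatically, for example to generate clients or to check which endpoints a running server exposes. This Go sketch downloads /openapi.json and prints the paths it declares. The base URL is a placeholder to be replaced with your server's address, and listPaths is an illustrative helper.

```go
// A minimal sketch that lists the paths declared in sfai's OpenAPI spec.
// Assumption: the server's base URL is given as the first argument; the
// default below is a placeholder, not a documented port.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"sort"
	"strings"
	"time"
)

// listPaths returns the sorted keys of the top-level "paths" object of an
// OpenAPI 3.x JSON document.
func listPaths(spec []byte) ([]string, error) {
	var doc struct {
		Paths map[string]json.RawMessage `json:"paths"`
	}
	if err := json.Unmarshal(spec, &doc); err != nil {
		return nil, err
	}
	paths := make([]string, 0, len(doc.Paths))
	for p := range doc.Paths {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths, nil
}

func main() {
	base := "http://localhost:8080" // placeholder, not a documented default
	if len(os.Args) > 1 {
		base = os.Args[1]
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(strings.TrimRight(base, "/") + "/openapi.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading response:", err)
		return
	}
	paths, err := listPaths(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "not a JSON OpenAPI document:", err)
		return
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```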

Development & Contributing

If you wish to contribute to sfai or build upon it, here are the primary commands for local development:

just test    # Run unit tests
just build   # Build the binary to bin/sfai
just verify  # Run code formatting, tidying, tests, and build

For detailed architectural guidelines and conventions, please refer to the CLAUDE.md and TEMPLATE_GUIDE.md files in the repository root.

License

This project is licensed under the MIT License.
