Native English Tool

Stop writing like a robot. Start writing like a native speaker.

A tool that detects non-native English patterns — AI-overused words, overly formal phrasing, redundant expressions — and suggests natural American English replacements.

This is not an AI detection bypass tool. It's a genuine writing quality tool for anyone who wants their English to sound more natural.

Requires Python 3.10+. Licensed under MIT.

The Problem

Non-native English speakers (and AI-generated text) share the same tells:

| What they write | What a native would write |
| --- | --- |
| We must delve into the issue | We need to dig into the issue |
| Utilize our resources | Use our resources |
| In order to improve | To improve |
| It is worth noting that prices rose | Prices rose |
| A multifaceted and robust approach | A complex and solid approach |
| Due to the fact that it rained | Because it rained |
| This facilitates seamless integration | This helps with smooth integration |

How It Works

Three-pass scanning engine:

  1. Phrase scan: catches multi-word patterns first ("in order to" → "to", "due to the fact that" → "because"). Regex-powered for flexible matching.
  2. Word scan: flags individual words from a 154-entry dictionary across 6 categories. Skips words already covered by phrase matches.
  3. Structural analysis: detects missing contractions, em-dash overuse, uniform paragraph lengths, repeated sentence starters, and excessive hedging.
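To make the ordering of the first two passes concrete, here is a minimal sketch. It is not the code in engine/detector.py: the rule sets, the `scan` function, and the tuple layout are all hypothetical, and the structural pass is left out.

```python
import re

# Hypothetical miniature rule sets; the real ones live in dictionary/*.json.
PHRASES = {
    r"\bin order to\b": "to",
    r"\bdue to the fact that\b": "because",
}
WORDS = {
    "utilize": "use",
    "fact": "point",  # toy entry, only here to demonstrate the overlap check
}


def scan(text):
    """Return (start, end, matched_text, suggestion) tuples sorted by position."""
    issues = []
    covered = []  # character spans already claimed by phrase matches

    # Pass 1: multi-word phrases first.
    for pattern, suggestion in PHRASES.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            issues.append((m.start(), m.end(), m.group(0), suggestion))
            covered.append((m.start(), m.end()))

    # Pass 2: single words, skipping anything inside a phrase match.
    for m in re.finditer(r"[A-Za-z']+", text):
        suggestion = WORDS.get(m.group(0).lower())
        if suggestion is None:
            continue
        if any(s <= m.start() and m.end() <= e for s, e in covered):
            continue
        issues.append((m.start(), m.end(), m.group(0), suggestion))

    return sorted(issues)
```

Running the phrase pass first means a word such as "fact" is reported only when it stands alone, not when it is already part of "due to the fact that".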

Dictionary: 154 words + 89 phrases across 10 categories:

| Category | Examples | Count |
| --- | --- | --- |
| AI overused verbs | delve, leverage, foster, utilize, facilitate, harness, empower | 36 |
| Grandiose descriptors | meticulous, multifaceted, robust, seamless, unprecedented | 31 |
| Latinate verbs | investigate, eliminate, substantiate, exacerbate, mitigate | 26 |
| Formal verbs | commence, ascertain, procure, disseminate, juxtapose | 26 |
| Fancy nouns | tapestry, landscape, paradigm, nexus, zeitgeist | 19 |
| Stiff transitions | moreover, furthermore, consequently, henceforth, whereby | 16 |
| Filler phrases | "it is worth noting that", "in today's fast-paced world" | 21 |
| Redundant phrases | "in order to", "due to the fact that", "has the ability to" | 41 |
| AI clichés | "plays a significant role in", "serves as a testament to" | 22 |
| Structural rules | contractions, em dashes, paragraph uniformity, hedging | 22 |
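As an illustration of one structural rule, a missing-contraction check could look roughly like the sketch below. The `CONTRACTIONS` mapping and the `missing_contractions` function are hypothetical; the actual rules are stored in dictionary/patterns.json and may differ.

```python
import re

# Hypothetical subset of expansions; the real list is in dictionary/patterns.json.
CONTRACTIONS = {
    "do not": "don't",
    "it is": "it's",
    "we are": "we're",
    "cannot": "can't",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def missing_contractions(text):
    """Return (matched_text, suggested_contraction) pairs in order of appearance."""
    results = []
    for m in _PATTERN.finditer(text):
        suggestion = CONTRACTIONS[m.group(1).lower()]
        # Preserve a leading capital, e.g. "It is" -> "It's".
        if m.group(1)[0].isupper():
            suggestion = suggestion[0].upper() + suggestion[1:]
        results.append((m.group(1), suggestion))
    return results
```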

Quick Start

git clone https://github.com/wang2-lat/native-english-tool.git
cd native-english-tool
pip install -r requirements.txt
uvicorn app:app --reload --port 8000

Open http://localhost:8000 — paste text, hit Analyze.

Chrome Extension

Works standalone — no server needed. The full detection engine runs locally in your browser.

1. Open chrome://extensions/
2. Enable "Developer mode"
3. Click "Load unpacked"
4. Select the chrome-extension/ folder

Features:

  • Popup: paste text and analyze instantly
  • Right-click: analyze any selected text on a webpage
  • Floating panel: see results right on the page

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| / | GET | Web UI |
| /api/analyze | POST | Analyze text → issues + score |
| /api/fix-all | POST | Auto-fix all issues |
| /api/llm-rewrite | POST | AI rewrite (sentence-level, DeepSeek) |

Example request:

curl -X POST http://localhost:8000/api/analyze \
  -H "Content-Type: application/json" \
  -d '{"text": "We must leverage our robust capabilities.", "formality": "casual"}'
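The same request can be sent from Python using only the standard library. This is a sketch: the helper names are made up for illustration, and because the response schema is not documented here, the decoded JSON is returned as-is. "casual" is the only formality value shown in this README.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/analyze"  # default port from Quick Start


def build_analyze_request(text, formality="casual", url=API_URL):
    """Build the POST request without sending it."""
    payload = json.dumps({"text": text, "formality": formality}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, formality="casual"):
    """Send the request and return the decoded JSON body, whatever its shape."""
    request = build_analyze_request(text, formality)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `analyze("We must leverage our robust capabilities.")` requires the server from Quick Start to be running.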

Nativeness Score

Each issue carries a severity weight:

| Severity | Weight | Examples |
| --- | --- | --- |
| High (red) | 3 | delve, utilize, tapestry, "it is worth noting that" |
| Medium (orange) | 1.5 | landscape, robust, nuanced, "in order to" |
| Low (yellow) | 0.5 | investigate, "when it comes to", missing contractions |

Score = max(0, 100 - (penalty / word_count × 100) × 8)

A clean text scores 90-100. Heavy AI-speak scores below 30.
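The formula translates directly into a few lines of Python. The sketch below assumes that penalty is the sum of the severity weights of all detected issues, and that empty text scores 100; neither assumption is stated in this README, and the function name is hypothetical.

```python
SEVERITY_WEIGHTS = {"high": 3.0, "medium": 1.5, "low": 0.5}


def nativeness_score(severities, word_count):
    """Compute the score from a list of severity labels and the text's word count."""
    if word_count <= 0:
        return 100.0  # assumption: empty text has nothing to penalize
    # Assumption: penalty is the sum of the weights of all detected issues.
    penalty = sum(SEVERITY_WEIGHTS[s] for s in severities)
    return max(0.0, 100.0 - (penalty / word_count * 100.0) * 8.0)
```

Under these assumptions, one high-severity issue in a 100-word text costs 3 / 100 × 100 × 8 = 24 points, for a score of 76. The multiplier of 8 makes short texts drop quickly: one high and one medium issue in a six-word sentence already clamp the score to 0.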

Security

  • Input validation: max 50K chars, formality enum whitelist
  • XSS protection: all DOM rendering via createElement/textContent, no innerHTML with user data
  • Rate limiting: LLM endpoint capped at 10 req/min per IP
  • CORS: localhost-only by default
  • Security headers: X-Content-Type-Options, X-Frame-Options, X-XSS-Protection
  • .gitignore excludes .env, keeping API keys out of version control
  • 34 tests including ReDoS, adversarial input, null bytes, and unicode edge cases
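One common way to enforce a per-IP cap like the 10 req/min limit is a sliding-window counter. The sketch below shows that technique only; the middleware in app.py may work differently, and the class and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests=10, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(client_ip)` and return HTTP 429 when it yields False. This in-memory version is per process, so it would not share counts across multiple workers.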

For AI Rewrite (Optional)

The rule-based engine works without any API key. For sentence-level AI rewriting:

cp .env.example .env
# Add your DeepSeek API key to .env

Project Structure

native-english-tool/
├── app.py                  # FastAPI app + security middleware
├── engine/
│   ├── detector.py         # Three-pass scanning engine
│   ├── replacer.py         # Back-to-front replacement
│   └── llm_fallback.py     # DeepSeek API fallback
├── dictionary/
│   ├── words.json          # 154 word-level rules
│   ├── phrases.json        # 89 phrase-level rules
│   └── patterns.json       # Structural rules + contractions
├── chrome-extension/       # Standalone Chrome extension
│   ├── manifest.json
│   ├── popup.html
│   ├── engine.js           # Full JS port of detection engine
│   ├── content.js          # In-page highlighting
│   └── dictionary.js       # Auto-generated from Python dicts
├── templates/
│   └── index.html          # Web UI
└── tests/
    └── test_engine.py      # 34 tests (security included)
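The "back-to-front replacement" noted for replacer.py refers to applying fixes from the end of the text toward the start, so that earlier character offsets stay valid while later spans change length. A minimal sketch of the idea, assuming non-overlapping (start, end, replacement) spans and a hypothetical `apply_fixes` function:

```python
def apply_fixes(text, fixes):
    """Apply (start, end, replacement) fixes to text, last span first.

    Replacing from the back means a change in length never shifts the
    offsets of spans that have not been processed yet.
    Assumes the spans do not overlap.
    """
    for start, end, replacement in sorted(fixes, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text
```

For example, `apply_fixes("In order to utilize it", [(0, 11, "To"), (12, 19, "use")])` rewrites "utilize" first and then "In order to", so the second span's offsets are still correct when it is applied.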

Who This Is For

  • Non-native English speakers who want to sound more natural
  • Content writers cleaning up AI-assisted drafts
  • Students improving academic writing
  • Developers writing documentation
  • Anyone tired of reading (or writing) corporate buzzword soup

Contributing

PRs welcome. The easiest ways to contribute:

  1. Add dictionary entries — open dictionary/words.json or phrases.json, add entries following the existing format
  2. Report false positives — open an issue with the text that was incorrectly flagged
  3. Add language variants — British English, academic English, etc.

License

MIT
