Document AST Studio

Document AST Studio is a Node.js application that extracts text from uploaded documents and generates AST output with two parsers, remark and markdown-it. It produces:

  • MDAST with remark
  • Token-based AST mapping with markdown-it
  • Simplified JSON output for remark (heading, paragraph, list, listItem)

Supported File Types

  • .pdf
  • .docx
  • .md
  • .markdown
  • .txt
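
A minimal sketch of how an upload could be checked against the list above before parsing. The helper name `isSupportedFile` and the case-insensitive comparison are assumptions for illustration, not the project's actual code.

```javascript
const path = require("node:path");

// Extensions listed under "Supported File Types".
const SUPPORTED_EXTENSIONS = new Set([".pdf", ".docx", ".md", ".markdown", ".txt"]);

// Hypothetical helper: true when the file name ends with a supported extension.
// Lowercasing the extension is an assumption; the server may compare differently.
function isSupportedFile(fileName) {
  return SUPPORTED_EXTENSIONS.has(path.extname(fileName).toLowerCase());
}
```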

Installation

npm install

Run

npm start

Then open the following address in your browser:

http://localhost:3000

Development Mode

npm run dev

API Overview

POST /api/ast

Accepts Markdown text in the markdown field of a JSON body and returns the generated AST output.

Example body:

{
  "markdown": "# Title\n\nHello **world**"
}
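
The request above could be sent from Node.js 18+ (which ships a global `fetch`) roughly as follows. This is a sketch: the base URL is taken from the Run section, and the helper names `buildAstRequest` and `requestAst` are hypothetical.

```javascript
// Assumed base URL: the address given in the Run section.
const BASE_URL = "http://localhost:3000";

// Builds the URL and fetch options for POST /api/ast.
function buildAstRequest(markdown, baseUrl = BASE_URL) {
  return {
    url: `${baseUrl}/api/ast`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ markdown }),
    },
  };
}

// Sends the request and returns the parsed JSON response.
async function requestAst(markdown, baseUrl = BASE_URL) {
  const { url, options } = buildAstRequest(markdown, baseUrl);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

With the server running, `requestAst("# Title\n\nHello **world**").then(console.log)` should print the response object described under Response Format.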

POST /api/ast-file

Extracts text from an uploaded file and generates AST output.

  • multipart/form-data
  • Field: document

Response Format (Summary)

{
  "remarkAst": {},
  "simplifiedRemarkAst": [],
  "markdownItAst": {},
  "markdownItTokens": [],
  "fileName": "sample.docx",
  "extractedText": "..."
}

fileName and extractedText are included only in responses from the file upload endpoint (POST /api/ast-file).
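
To illustrate what a simplified remark tree might look like, here is a sketch that reduces an MDAST tree to the four node types named earlier (heading, paragraph, list, listItem). The output shape (`text`, `depth`, `ordered`, `children`) and the function names are assumptions; the actual simplifiedRemarkAst format may differ. The sample tree is hand-written in standard MDAST form so the sketch runs without remark installed.

```javascript
const KEPT_TYPES = new Set(["heading", "paragraph", "list", "listItem"]);

// Concatenates the text of a node and all of its descendants.
function textOf(node) {
  if (typeof node.value === "string") return node.value;
  return (node.children || []).map(textOf).join("");
}

// Hypothetical simplifier: keeps the four node types and flattens inline content to text.
function simplify(node) {
  const children = node.children || [];
  if (!KEPT_TYPES.has(node.type)) {
    // Other containers (root, blockquote, ...) pass their kept descendants upward.
    return children.flatMap(simplify);
  }
  if (node.type === "heading") {
    return [{ type: "heading", depth: node.depth, text: textOf(node) }];
  }
  if (node.type === "paragraph") {
    return [{ type: "paragraph", text: textOf(node) }];
  }
  if (node.type === "list") {
    return [{ type: "list", ordered: Boolean(node.ordered), children: children.flatMap(simplify) }];
  }
  return [{ type: "listItem", children: children.flatMap(simplify) }];
}

// MDAST for "# Title\n\nHello **world**", written out by hand.
const sampleTree = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Title" }] },
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Hello " },
        { type: "strong", children: [{ type: "text", value: "world" }] },
      ],
    },
  ],
};

console.log(JSON.stringify(simplify(sampleTree), null, 2));
```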

Folder Structure

.
|- public/
|  |- index.html
|  |- app.js
|  `- styles.css
|- docs/
|  `- api.md
|- examples/
|  `- ornek.md
|- server.js
|- package.json
`- README.md

Common Issues

  • EADDRINUSE: The port is already in use. Start with a different port by setting the PORT environment variable:
    • PowerShell: $env:PORT=3001; npm start
    • bash/zsh: PORT=3001 npm start
  • PDF parsing issue: The PDF file may be corrupted or password-protected.
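
The EADDRINUSE workaround relies on the server reading the PORT environment variable. How server.js does this is not shown here; a common pattern, sketched under that assumption with a hypothetical `resolvePort` helper, looks like this:

```javascript
// Hypothetical helper: picks the port from the environment, falling back to 3000
// (the port shown in the Run section) when PORT is missing or invalid.
function resolvePort(env = process.env, fallback = 3000) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}
```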

License

This project is licensed under the MIT License. See the LICENSE file for details.