structured-extraction

Here are 26 public repositories matching this topic...

NameetP / pdfmux

PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.

python pdf ocr mcp self-healing structured-extraction rag pdf-to-json pdf-extraction ai-agent llm document-parsing pdf-to-markdown docling opendataloader

Updated May 5, 2026
Python

jndiogo / sibila

Extract structured data from local or remote LLM models

python ai openai structured-data gpt structured-extraction dataclasses local-models pydantic large-language-models llamacpp llm-inference local-ai gguf structured-generation

Updated Jun 21, 2024
Python

ByteStack-Labs / slm-autopsy

Reproducible diagnostic investigation of a fine-tuned SLM that scored 99.75% on evaluation and failed silently on 10% of production inputs. Full pipeline. Every number verified.

machine-learning evaluation diagnostics slm fine-tuning structured-extraction mlops production-ml qlora silent-failure

Updated Apr 6, 2026
Python

sputnicyoji / Structured-Extractor

Claude Code Skill for structured information extraction from code/docs/logs. 6-step Python pipeline (source grounding, dedup, confidence scoring, entity resolution, relation inference, KG injection). Zero dependencies, no API keys. Replaces LangExtract.

python nlp knowledge-graph structured-extraction claude-code post-processing-pipeline

Updated Feb 9, 2026
Python

vikyw89 / llmtext

A simple LLM library

python agent async asynchronous openai gpt structured-extraction tool-use instructor large-language-models llm chatgpt prompt-optimization agentic-ai

Updated Apr 5, 2026
Python

chigwell / news-summizr

news-summizr extracts structured summaries from headlines, labeling key points like announcement, products, region for quick insight.

pattern-matching data-analysis structured-extraction reporting-tools news-summary key-information-extraction workflow-integration headline-analysis retry-mechanisms reliable-output concise-summarization labeled-summaries

Updated Dec 21, 2025
Python

adi2355 / MCP-Server-Collection

Collection of purpose-built MCP servers for AI agent workflows.

python typescript mcp web-scraping data-extraction jsonpath ai-agents structured-extraction llm deepseek firecrawl model-context-protocol mcp-server codebase-analysis agent-workflows

Updated Apr 7, 2026
HTML

vikyw89 / togetherai-playground

python machine-learning async artificial-intelligence openai structured-extraction instructor openai-api llm togetherai

Updated Apr 5, 2026
Python

chigwell / summaryxtract

A new package is designed to facilitate structured, reliable extraction of key insights from user-provided texts about cultural topics. It accepts a text input, such as an article or discussion prompt

Updated Dec 21, 2025
Python

doctruthhq / DocTruth

Auditable LLM extraction for Java enterprise — every field cites its source page+line, with bi-temporal provenance and W3C PROV-O JSON-LD audit export.

java json-schema provenance citations audit-trail structured-extraction pydantic document-ai llm doctruth

Updated May 9, 2026
Java

BhaveshBytess / Research-Paper-Analyzer

Automated research paper analysis: PDF → JSON with evidence extraction using LLMs (DeepSeek, Gemma). Extracts methods, results, datasets, and claims with precise evidence grounding.

nlp machine-learning pdf-parsing scientific-papers structured-extraction academic-research evidence-extraction streamlit-app llm research-paper-analysis

Updated Nov 14, 2025
Python

jomen93 / sec-filing-parser

Extract structured data from SEC EDGAR 10-K filings using LLMs (Claude/GPT-4o) + Pydantic v2 validation

python openai financial-data data-pipeline sec-edgar edgar structured-extraction pydantic llm anthropic

Updated Apr 30, 2026
Python

sooperD00 / Connectome

Human-in-the-loop LLM orchestration with structured signal extraction and session persistence. Annotate confusion and curiosity—feedback shapes responses, topology accumulates over time. API-first design, no gamification. FastAPI + Claude + SQLite + D3.

sqlite knowledge-graph d3js human-in-the-loop education-technology feedback-loops structured-extraction fastapi session-persistence llm-orchestration

Updated Jan 22, 2026
HTML

hwang-yh-cto / space-ocr-mcp

Multilingual structured OCR (11+ languages, CJK-tuned) — MCP server with verified per-character bboxes for AI agents

ocr mcp vision-api claude structured-extraction chinese-ocr anthropic japanese-ocr mcp-server korean-ocr multilingual-ocr

Updated Apr 30, 2026
JavaScript

notabotchef / a0-langextract

Agent Zero plugin for structured document extraction — invoices, recipes, prep lists. Powered by google/langextract with source grounding.

nlp structured-extraction restaurant-tech langextract agent-zero

Updated Apr 12, 2026
Python

vstorm-co / blog-content

Source content for Vstorm blog posts—carefully crafted to provide both depth and clarity, with practical insights readers can apply immediately.

retrieval embeddings sparse-vectors structured-extraction rag vector-search vector-database

Updated Oct 30, 2025
Python

citation-cosmograph / citation-astrolabe

AI-agent-driven venue governance database. Extracts editorial boards and program committees from journal websites using local LLMs, with entity resolution against OpenAlex.

entity-resolution web-scraping knowledge-base scientometrics structured-extraction editorial-board ai-agent openalex llm-extraction venue-governance bibliometric-infrastructure

Updated Mar 29, 2026

JLHC-AI-portfolio / community-workshop-packet-structurer

AI-assisted PDF/DOCX packet structuring workflow with source citations, semantic retrieval, deterministic validation, and reviewer-facing run sheets.

python validation openai docx structured-extraction rag pdf-extraction document-automation source-citations human-review

Updated Apr 28, 2026
Python

eeshansrivastava89 / local-llm-bench

Evaluate local LLM accuracy on structured data extraction. Tests models' ability to extract JSON from unstructured text with ground-truth comparison, F1 scoring, and fuzzy matching. Supports MLX and Ollama backends. Generates interactive reports with charts and per-model analysis.

structured-extraction local-llm

Updated Feb 27, 2026
Python

anthonyonazure / signal-forge

Robust extraction of structured signals from messy unstructured text. Hybrid LLM + tool-use schema + source span linking + eval harness.

nodejs nlp typescript information-extraction claude structured-extraction document-extraction llm prompt-engineering anthropic

Updated May 6, 2026
TypeScript
