MinerU

MinerU was created by opendatalab 1 year ago and was last updated 13 days ago.

A high-quality, one-stop, open-source data extraction tool for converting PDF to Markdown and JSON.
Python 131.44MB AGPL-3.0 Github
Stars: 40.8k
Forks: 3.4k
Watchers: 178
Open Issues: 113

kkFileView

13.1k Java

Universal File Online Preview Project based on Spring-Boot

1 7 year(s) ago 11 day(s) ago

OpenBB

44.1k Python NOASSERTION

Investment Research for Everyone, Everywhere.

1 4 year(s) ago 19 day(s) ago

markitdown

70.5k Python MIT

Python tool for converting files and office documents to Markdown.

1 9 month(s) ago 4 day(s) ago

OCRmyPDF

30.5k Python MPL-2.0

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

1 11 year(s) ago 13 day(s) ago

browser-use

66.3k Python MIT

🌐 Make websites accessible for AI agents. Automate tasks online with ease.

1 9 month(s) ago 15 day(s) ago

instructor

11.1k Python MIT

structured outputs for llms

1 2 year(s) ago 15 day(s) ago

kokoro-onnx

2k Python MIT

TTS with kokoro and onnx runtime

1 7 month(s) ago 1 month(s) ago

Jobs_Applier_AI_Agent_AIHawk

28.5k Python AGPL-3.0

AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it enables users to apply for multiple jobs in a tailored way.

1 1 year(s) ago 16 day(s) ago

MoneyPrinterTurbo

19.9k Python MIT

Generate high-definition short videos with one click using large AI models.

1 1 year(s) ago 6 month(s) ago

tensorflow

191.1k C++ Apache-2.0

An Open Source Machine Learning Framework for Everyone

1 9 year(s) ago 3 day(s) ago

PDFMathTranslate

24.6k Python AGPL-3.0

PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supporting services such as Google/DeepL/Ollama/OpenAI, with CLI/GUI/MCP/Docker/Zotero provided.

1 11 month(s) ago 2 month(s) ago

LLMs-from-scratch

60.2k Jupyter Notebook NOASSERTION

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

1 2 year(s) ago 13 day(s) ago

ChatTTS

37.4k Python AGPL-3.0

A generative speech model for daily dialogue.

1 1 year(s) ago 5 day(s) ago

dify

109.8k TypeScript NOASSERTION

Production-ready platform for agentic workflow development.

1 2 year(s) ago 5 day(s) ago

cheerio

29.5k TypeScript MIT

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

1 13 year(s) ago 2 month(s) ago

paperless-ngx

29.9k Python GPL-3.0

A community-supported supercharged document management system: scan, index and archive all your documents

1 3 year(s) ago 5 day(s) ago

MinerU

40.8k Python AGPL-3.0

A high-quality, one-stop, open-source data extraction tool for converting PDF to Markdown and JSON.

1 1 year(s) ago 13 day(s) ago

yt-dlp

118.6k Python Unlicense

A feature-rich command-line audio/video downloader

1 4 year(s) ago 29 day(s) ago

fastapi

88k Python MIT

FastAPI framework, high performance, easy to learn, fast to code, ready for production

1 6 year(s) ago 5 day(s) ago