Turn any PDF into a deployable RAG corpus package with no terminal, no Python, and no manual file management. On-device machine learning generates embeddings entirely on your Mac.
Built for developers and content curators creating knowledge-first apps.
A complete pipeline from raw PDF to production-ready corpus, built on Apple's native frameworks.
📄
PDF Import
Drag in any PDF. CorpusKit Studio extracts text using PDFKit — Apple's native framework, no third-party tools or Python required.
🧩
Smart Text Chunking
Configurable chunk size and overlap in pure Swift. Preview your chunks before embedding to dial in the right settings for your content.
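To make the size and overlap settings concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Swift. It assumes both settings are measured in characters, and `chunkText` is a hypothetical helper written for illustration; the app's actual chunker and its units are not documented here.

```swift
/// Splits `text` into chunks of at most `size` characters. Each chunk after the
/// first repeats the last `overlap` characters of the chunk before it.
/// Hypothetical helper for illustration only, not CorpusKit Studio's API.
func chunkText(_ text: String, size: Int, overlap: Int) -> [String] {
    precondition(size > 0, "size must be positive")
    precondition(overlap >= 0 && overlap < size, "overlap must be in 0..<size")

    let characters = Array(text)   // index by Character so a grapheme is never split
    let step = size - overlap      // how far the window advances each iteration
    var chunks: [String] = []
    var start = 0

    while start < characters.count {
        let end = min(start + size, characters.count)
        chunks.append(String(characters[start..<end]))
        if end == characters.count { break }   // the last chunk reached the end of the text
        start += step
    }
    return chunks
}

let sampleChunks = chunkText("abcdefghij", size: 4, overlap: 1)
print(sampleChunks)   // ["abcd", "defg", "ghij"]
```

A larger overlap keeps more shared context between neighboring chunks at the cost of more chunks to embed, which is the trade-off a chunk preview helps you judge.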
⚡️
On-Device Embeddings
MiniLM-L6-v2 runs via Core ML on your Mac's Neural Engine. Fast, private, and identical to the model used in CorpusKit iOS apps.
🔍
Live Retrieval Testing
Query your corpus in real time before you ship. See ranked results with cosine similarity scores to validate your chunking strategy.
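For readers who want to see what a cosine similarity ranking involves, the following is a rough sketch in plain Swift. The function names `cosineSimilarity` and `rankChunks` are assumptions made for this example, and the three-dimensional vectors are toy stand-ins for real embeddings; this is not the app's retrieval code.

```swift
/// Cosine similarity between two equal-length vectors.
/// Returns 0 when either vector has zero length, to avoid dividing by zero.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator == 0 ? 0 : dot / denominator
}

/// Scores every chunk embedding against the query and returns the `limit`
/// best matches, highest similarity first. Hypothetical helper for illustration.
func rankChunks(query: [Float], embeddings: [[Float]], limit: Int) -> [(index: Int, score: Float)] {
    let scored = embeddings.enumerated().map {
        (index: $0.offset, score: cosineSimilarity(query, $0.element))
    }
    return Array(scored.sorted { $0.score > $1.score }.prefix(limit))
}

// Toy vectors stand in for real chunk embeddings.
let toyEmbeddings: [[Float]] = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
let ranked = rankChunks(query: [1, 0, 0], embeddings: toyEmbeddings, limit: 2)
for hit in ranked {
    print("chunk \(hit.index): \(hit.score)")
}
```

In this toy run, chunk 0 ranks first with a score of 1.0 because it points in the same direction as the query, and chunk 2 follows at roughly 0.707. Scores near 1 indicate close semantic matches, so low top scores for an obvious query can be a hint to revisit chunk size or overlap.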
✏️
Highlight & Rate
Read your source document inside the app. Highlight passages and rate their importance — curator signals travel with the exported corpus.
📦
Signed .corpus Export
Export a signed bundle consumable by any CorpusKit iOS app. Includes chunks, embeddings, metadata, and your curation data.
Built for people who work with knowledge — and the developers who build for them.
Developers
Building RAG / AI Apps
Stop wrangling Python pipelines to create embeddings. CorpusKit Studio is a native Mac app that produces corpus bundles your iOS app can consume directly — no backend required.
Content Curators
Researchers & Knowledge Workers
Highlight the passages that matter, rate their importance, and export a corpus that reflects your expertise — not just raw text extraction.
Organizations
Private Document Collections
Process sensitive documents entirely on your Mac. No data leaves your device during embedding. Distribute corpora to your team without a cloud intermediary.
No Python, no heavy dependencies. Native Mac performance throughout.
Privacy Policy
Effective Date: December 2024 · Last Updated: December 2024
Summary: CorpusKit Studio is a privacy-first application. All processing happens on your device, no data is transmitted to any server, and you have complete control over your documents and data.
What We Collect
CorpusKit Studio does not collect, transmit, or store any of the following:
- Personal identification information
- User account data or contact information
- Usage analytics or crash reports
- Device identifiers
Local Data Storage
All data processed by CorpusKit Studio is stored locally on your Mac at:
~/Documents/CorpusKitStudio/
This includes PDF documents you import, extracted text chunks, Core ML embeddings, highlights and annotations, and exported corpus bundles.
Machine Learning
The app uses an on-device machine learning model (MiniLM-L6-v2 via Core ML) to generate text embeddings. This processing happens entirely on your Mac, and no data is sent to external servers.
Third-Party Services
CorpusKit Studio does not integrate with any third-party services, analytics platforms, or advertising networks.
File Access Permissions
CorpusKit Studio requests the following macOS permissions:
- File Access — to read PDF files you select for import
- Downloads Folder — to save exported corpus bundles
- User-Selected Folders — to save exports to folders you choose
These permissions are required for core functionality and are managed by macOS's security system.
Your Rights
Because all data is stored locally, you have complete control. To remove all data, delete the app and the ~/Documents/CorpusKitStudio/ folder. You can use the app's export feature to create portable corpus bundles at any time.
Contact
Questions about this privacy policy: support@robroy.online
Open Source Components
- MiniLM (via Hugging Face) — Apache 2.0 License
- Apple Frameworks (SwiftUI, PDFKit, Core ML, Accelerate) — Apple EULA