CorpusKit Studio

Build semantic search corpora from your documents

Coming Soon — Mac App Store
Turn any PDF into a deployable RAG corpus package — no terminal, no Python, no manual file management. On-device machine learning generates embeddings entirely on your Mac. Built for developers and content curators building knowledge-first apps.

Everything you need. Nothing you don't.

A complete pipeline from raw PDF to production-ready corpus, built on Apple's native frameworks.

📄
PDF Import
Drag in any PDF. CorpusKit Studio extracts text using PDFKit — Apple's native framework, no third-party tools or Python required.
🧩
Smart Text Chunking
Configurable chunk size and overlap in pure Swift. Preview your chunks before embedding to dial in the right settings for your content.
⚡️
On-Device Embeddings
MiniLM-L6-v2 runs via Core ML on your Mac's Neural Engine. Fast, private, and identical to the model used in CorpusKit iOS apps.
🔍
Live Retrieval Testing
Query your corpus in real time before you ship. See ranked results with cosine similarity scores to validate your chunking strategy.
✏️
Highlight & Rate
Read your source document inside the app. Highlight passages and rate their importance — curator signals travel with the exported corpus.
📦
Signed .corpus Export
Export a signed bundle consumable by any CorpusKit iOS app. Includes chunks, embeddings, metadata, and your curation data.
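For readers curious how chunk size and overlap interact, here is a minimal sketch in Swift. It is not CorpusKit Studio's actual implementation: the function name `chunk(_:size:overlap:)` is hypothetical, and splitting by character count is an assumption (the app may split on tokens, sentences, or paragraphs instead).

```swift
/// Hypothetical helper: splits `text` into chunks of at most `size` characters,
/// where each chunk begins with the last `overlap` characters of the previous one.
func chunk(_ text: String, size: Int, overlap: Int) -> [String] {
    precondition(size > 0, "size must be positive")
    precondition(overlap >= 0 && overlap < size, "overlap must be in 0..<size")

    let characters = Array(text)
    var chunks: [String] = []
    var start = 0
    while start < characters.count {
        let end = min(start + size, characters.count)
        chunks.append(String(characters[start..<end]))
        // Stop once the final chunk reaches the end of the text.
        if end == characters.count { break }
        // Step back by `overlap` so neighbouring chunks share context.
        start = end - overlap
    }
    return chunks
}

let preview = chunk("abcdefghij", size: 4, overlap: 1)
print(preview)  // ["abcd", "defg", "ghij"]
```

A larger overlap keeps more shared context across chunk boundaries at the cost of more chunks to embed, which is the trade-off a chunk preview helps you judge.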

Who it's for

Built for people who work with knowledge — and the developers who build for them.

Developers
Building RAG / AI Apps
Stop wrangling Python pipelines to create embeddings. CorpusKit Studio is a native Mac app that produces corpus bundles your iOS app can consume directly — no backend required.
Content Curators
Researchers & Knowledge Workers
Highlight the passages that matter, rate their importance, and export a corpus that reflects your expertise — not just raw text extraction.
Organizations
Private Document Collections
Process sensitive documents entirely on your Mac. No data leaves your device during embedding. Distribute corpora to your team without a cloud intermediary.
🔒
All Local
Every document, chunk, and embedding stays in ~/Documents/CorpusKitStudio/. Nothing is sent to external servers.
🧠
On-Device ML
MiniLM embeddings run via Core ML on your Neural Engine. No API calls, no cloud inference, no data leaves your Mac.
🗝
No Account Required
Download and use immediately. No sign-up, no email, no subscription. Sensitive settings are stored in the macOS Keychain.
📦
macOS Sandbox
The app operates within Apple's security sandbox. File access is managed by macOS — only files you explicitly open are accessible.

Built on Apple's Frameworks

No Python, no heavy dependencies. Native Mac performance throughout.

PDF extraction: PDFKit
Embeddings: Core ML · MiniLM-L6-v2
Vector search: Accelerate vDSP
Persistence: SwiftData
Export signing: CryptoKit
Platform: macOS 14+
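As an illustration of what ranking by cosine similarity involves, the sketch below scores a query embedding against a set of chunk embeddings and returns the best matches. It is a simplified stand-in rather than the app's code: it uses plain Swift loops instead of Accelerate vDSP so that it runs anywhere, and the names `cosineSimilarity` and `topMatches` are hypothetical.

```swift
/// Cosine similarity between two equal-length vectors; returns 0 if either has zero norm.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? dot / denominator : 0
}

/// Hypothetical helper: indices and scores of the `k` chunks most similar to `query`, best first.
func topMatches(query: [Float], chunkEmbeddings: [[Float]], k: Int) -> [(index: Int, score: Float)] {
    let scored = chunkEmbeddings.enumerated().map {
        (index: $0.offset, score: cosineSimilarity(query, $0.element))
    }
    return Array(scored.sorted { $0.score > $1.score }.prefix(max(k, 0)))
}

// Toy 2-dimensional embeddings; real MiniLM-L6-v2 vectors have 384 dimensions.
let embeddings: [[Float]] = [[1, 0], [0, 1], [1, 1]]
let results = topMatches(query: [1, 0], chunkEmbeddings: embeddings, k: 2)
for match in results {
    print("chunk \(match.index): \(match.score)")
}
```

With these toy vectors, chunk 0 ranks first (identical direction to the query) and chunk 2 second. A vDSP-based version would replace the inner loop with vectorized dot-product and sum-of-squares calls.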

Requirements

Privacy Policy

Effective Date: December 2024  ·  Last Updated: December 2024

Summary: CorpusKit Studio is a privacy-first application. All processing happens on your device, no data is transmitted to any server, and you have complete control over your documents and data.

What We Collect

CorpusKit Studio does not collect, transmit, or store any of the following:

Local Data Storage

All data processed by CorpusKit Studio is stored locally on your Mac at:

~/Documents/CorpusKitStudio/

This includes PDF documents you import, extracted text chunks, Core ML embeddings, highlights and annotations, and exported corpus bundles.

Machine Learning

The app uses an on-device machine learning model (MiniLM via Core ML) to generate text embeddings. This processing happens entirely on your Mac — no data is sent to external servers.

Third-Party Services

CorpusKit Studio does not integrate with any third-party services, analytics platforms, or advertising networks.

File Access Permissions

CorpusKit Studio requests the following macOS permissions:

These permissions are required for core functionality and are managed by macOS's security system.

Your Rights

Because all data is stored locally, you have complete control. To remove all data, delete the app and the ~/Documents/CorpusKitStudio/ folder. Use the app's export feature to create portable corpus bundles at any time.

Contact

Questions about this privacy policy: support@robroy.online

Open Source Components