The P.I.D.O.R.A.
Collective Dataset Hub
Predictive Integrated Distributed Object Repository Archive
A unified, machine-readable home for high-quality datasets — engineered for researchers, builders, analysts, and autonomous agents working at scale.
A collective dataset hub built for the next era of intelligence
P.I.D.O.R.A. is a shared, neutral repository where researchers, builders, analysts, and autonomous agents discover, curate, and trust datasets together. We treat data as a first-class, governed asset — not an afterthought.
Every object is indexed with responsible practices, complete lineage, and rich machine-readable metadata, so both humans and models can understand exactly what they are using and where it came from.
{
  "object_id": "pid://archive/0x9f3c-aa12",
  "title": "Global Climate Sensor Mesh",
  "format": "parquet",
  "vector_index": true,
  "agent_readable": true,
  "lineage": {
    "source": "verified-contributor",
    "transforms": 7,
    "reviewed_by": "human-in-loop"
  },
  "cluster": "eu-curation-03",
  "availability": "99.97%"
}
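As a rough illustration of how a client might read a record like the sample above, the sketch below parses the same fields with Python's standard `json` module. The `summarize` helper and the choice of fields to check are assumptions made for this example, not part of any documented P.I.D.O.R.A. interface.

```python
import json

# The sample record from above, wrapped in braces so it parses as one JSON object.
RECORD = """
{
  "object_id": "pid://archive/0x9f3c-aa12",
  "title": "Global Climate Sensor Mesh",
  "format": "parquet",
  "vector_index": true,
  "agent_readable": true,
  "lineage": {
    "source": "verified-contributor",
    "transforms": 7,
    "reviewed_by": "human-in-loop"
  },
  "cluster": "eu-curation-03",
  "availability": "99.97%"
}
"""


def summarize(record_text: str) -> dict:
    """Hypothetical helper: extract the fields an agent might check first."""
    record = json.loads(record_text)
    lineage = record["lineage"]
    return {
        "object_id": record["object_id"],
        "format": record["format"],
        "agent_readable": record["agent_readable"],
        "transforms": lineage["transforms"],
        "human_reviewed": lineage["reviewed_by"] == "human-in-loop",
    }


summary = summarize(RECORD)
print(summary)
```

An agent could use a summary like this to decide whether an object meets its own requirements, such as format or review status, before requesting the data.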
Designed for agents and humans alike
A modern data backbone where discovery, provenance, and routing are native capabilities — not bolted-on extras.
Agent-readable datasets
Structured schemas and consistent interfaces let autonomous agents parse, query, and consume datasets without bespoke glue code.
Vector-native discovery
Semantic embeddings power similarity search across the archive, surfacing relevant objects far beyond keyword matching.
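To make the idea of similarity search concrete, here is a minimal sketch that ranks objects by cosine similarity between embedding vectors. The three-dimensional vectors and object names are toy values invented for the example; the embedding model and index structure used by the archive are not described here, so treat this as a generic illustration rather than the actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k object names whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy embeddings, made up for illustration only.
INDEX = {
    "climate-sensors": [0.9, 0.1, 0.0],
    "ocean-buoys": [0.8, 0.3, 0.1],
    "retail-receipts": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.2, 0.0], INDEX, k=2)
print(results)
```

A brute-force scan like this is only practical for small collections; large archives typically rely on approximate nearest-neighbor indexes instead.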
Provenance-aware records
Every record carries verifiable origin and transformation history, so trust travels with the data itself.
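One common way to make a transformation history verifiable is a hash chain, where each entry commits to the one before it. The sketch below assumes that approach; the page does not say which mechanism the archive uses, and the function names and step fields are hypothetical.

```python
import hashlib
import json


def entry_hash(step: dict, prev_hash: str) -> str:
    """Hash a lineage step together with the hash of the previous entry."""
    payload = json.dumps({"step": step, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_lineage(steps):
    """Turn an ordered list of steps into a hash-linked chain."""
    chain = []
    prev = ""  # the first entry has no predecessor
    for step in steps:
        digest = entry_hash(step, prev)
        chain.append({"step": step, "prev": prev, "hash": digest})
        prev = digest
    return chain


def verify_lineage(chain) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = ""
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(entry["step"], prev):
            return False
        prev = entry["hash"]
    return True


chain = build_lineage([
    {"op": "ingest", "source": "verified-contributor"},
    {"op": "deduplicate"},
    {"op": "convert", "format": "parquet"},
])
print(verify_lineage(chain))
```

Because each hash covers the previous one, changing any earlier step invalidates every later entry, which is what lets a consumer check the history without trusting the party that delivered it.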
Distributed object indexing
A resilient, sharded index spans clusters and regions, keeping objects discoverable and durable at petabyte scale.
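As a sketch of how objects could be assigned to shards in a way that tolerates shards coming and going, the example below uses rendezvous (highest-random-weight) hashing. This technique is an assumption chosen for illustration; the archive's real placement scheme is not described here, and all shard names other than `eu-curation-03` are invented.

```python
import hashlib


def _score(object_id: str, shard: str) -> int:
    """Deterministic pseudo-random weight for an (object, shard) pair."""
    digest = hashlib.sha256(f"{shard}|{object_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_shard(object_id: str, shards: list[str]) -> str:
    """Assign the object to the shard with the highest score."""
    return max(shards, key=lambda shard: _score(object_id, shard))


# Only "eu-curation-03" appears in the sample record; the rest are hypothetical.
SHARDS = ["eu-curation-03", "us-curation-01", "ap-curation-02"]

chosen = pick_shard("pid://archive/0x9f3c-aa12", SHARDS)
print(chosen)
```

With this scheme, removing a shard only moves the objects that lived on it; every other object keeps its assignment, which limits reshuffling when the cluster changes.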
Predictive dataset routing
Demand-aware models pre-position hot datasets close to where workloads run, cutting latency for training and inference.
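A very simple form of demand-aware placement can be sketched with an exponentially weighted moving average of request counts per region. The smoothing factor, region names, and request log below are invented for the example, and the archive's actual routing models are not described here.

```python
from collections import defaultdict


def forecast_demand(request_log, alpha=0.5):
    """Smooth per-region request counts, weighting recent intervals more.

    request_log is a list of {region: request_count} dicts, oldest first.
    """
    forecast = defaultdict(float)
    for interval in request_log:
        regions = set(forecast) | set(interval)
        for region in regions:
            observed = interval.get(region, 0)
            forecast[region] = alpha * observed + (1 - alpha) * forecast[region]
    return dict(forecast)


def pick_replica_region(request_log, alpha=0.5):
    """Choose the region with the highest forecast demand (log must be non-empty)."""
    forecast = forecast_demand(request_log, alpha)
    return max(forecast, key=forecast.get)


# Hypothetical request counts for one dataset over three intervals.
LOG = [
    {"eu": 10, "us": 40},
    {"eu": 30, "us": 20},
    {"eu": 60, "us": 10},
]

target = pick_replica_region(LOG)
print(target)
```

Here demand is shifting toward `eu`, so the smoothed forecast favors placing a replica there even though `us` had more requests in the earliest interval.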
Human-in-the-loop review
Expert reviewers validate sensitive and high-impact datasets, balancing automation with accountable oversight.
Where we are heading
Clear commitments that guide how we build the collective archive.
Make high-quality datasets easier to discover.
Preserve metadata lineage across every transformation.
Support both AI agents and human researchers equally.
Improve dataset interoperability across tools and teams.
Enable secure, collaborative curation at scale.
Choose how you build
Every plan requires registration with a valid invite code. No payment is processed on this preview.
Free
- Browse the public dataset index
- 5 GB curated download quota
- Basic vector discovery
- Read-only lineage view
- Community support
Pro
- Everything in Free
- 2 TB high-throughput access
- Full vector-native search
- Provenance API & lineage export
- Collaborative curation workspaces
- Priority routing & support
- Everything in Pro
- Unlimited archive access
- Dedicated dataset clusters
- Predictive routing controls
- Human-in-the-loop review SLAs
- Governance & audit tooling
Join the collective
Register with your invite code to request access, or sign in to your existing workspace.