How FormatSense works & why it exists
FormatSense is a service for automated analysis and decoding of binary data files — unknown, proprietary, and legacy storage and serialization formats — using coordinated LLM agents.
The project grew out of years of work with proprietary, legacy, and undocumented data formats. Over that time, drawing on sources that ranged from legacy systems to archives found in specialized communities, we accumulated a set of binary analysis tools and an extensive knowledge base of format signatures. Some of these tools were built for specific tasks that no public solution could handle: non-standard packing schemes, hybrid containers, and formats with platform-dependent alignment.
In parallel, we built a collection of test samples: real files from dozens of domains, including telemetry files from industrial controllers, configuration dumps from medical equipment, datasets from geographic information systems, serialized database structures, financial mainframe exports, and game engine archives. This collection became the foundation for algorithm validation and agent benchmarking.
In early 2025, we began experimenting with applying LLM agents to the analysis of unknown binary files. Early results showed the direction was promising but revealed systemic limitations: agents were reluctant to use complex tools, lost context on long files, and failed to build coherent hypotheses. It took a year of iterative work (redesigning tools into forms that agents can understand, designing multi-level orchestration, and building an evaluation pipeline on real samples) to achieve consistently reproducible analysis quality.
The core of the service is a pipeline of coordinated LLM agents, each with access to a set of specialized tools:
The tools are designed specifically for use by LLM agents — their output is optimized for machine-readable interpretation rather than human viewing.
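To illustrate what "machine-readable rather than human-oriented" output can look like, here is a minimal sketch in Python. It is not one of the FormatSense tools: the function name `inspect_region` and the JSON field names are hypothetical, chosen only for this example. Instead of a hex dump laid out for the eye, the function returns a compact JSON object that an agent can parse directly.

```python
import json
import struct


def inspect_region(data: bytes, offset: int, length: int = 8) -> str:
    """Describe a byte region as JSON (hypothetical tool, for illustration only)."""
    chunk = data[offset:offset + length]
    report = {
        "offset": offset,
        "length": len(chunk),
        "hex": chunk.hex(),
        # Non-printable bytes are replaced with "." so the field stays plain ASCII.
        "ascii": "".join(chr(b) if 32 <= b < 127 else "." for b in chunk),
    }
    if len(chunk) >= 4:
        # Offer both byte orders; the caller decides which one is plausible.
        report["u32_le"] = struct.unpack_from("<I", chunk)[0]
        report["u32_be"] = struct.unpack_from(">I", chunk)[0]
    return json.dumps(report, sort_keys=True)


if __name__ == "__main__":
    sample = b"RIFF\x10\x00\x00\x00WAVE"
    print(inspect_region(sample, offset=4, length=4))
```

Returning both little-endian and big-endian readings of the same bytes is a design choice of this sketch: it lets the consumer compare interpretations without issuing a second tool call.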
The service operates on a Bring Your Own Key model — you connect your own LLM provider API key. We do not act as an intermediary and do not resell tokens. Keys are encrypted at rest and deleted along with job data after analysis is complete.
Analysis quality depends on the amount of structural information available in the file. The most complete results are achieved on formats with prominent markers: magic bytes, fixed-length headers, repeating record structures, text labels.
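Two of the markers mentioned above, magic bytes and repeating record structures, can be illustrated with a short Python sketch. This is a simplified illustration, not the service's actual detection logic: the `SIGNATURES` table is a tiny hypothetical stand-in for a real signature knowledge base, and `guess_record_size` is a naive stride heuristic assumed for this example.

```python
from typing import Optional

# Hypothetical, deliberately tiny signature table (well-known public magic bytes).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "ZIP archive",
    b"\x1f\x8b": "gzip stream",
    b"%PDF-": "PDF document",
}


def match_magic(data: bytes) -> Optional[str]:
    """Return the format whose signature prefixes the data, trying longest first."""
    for magic in sorted(SIGNATURES, key=len, reverse=True):
        if data.startswith(magic):
            return SIGNATURES[magic]
    return None


def guess_record_size(data: bytes, min_size: int = 2, max_size: int = 64) -> Optional[int]:
    """Guess a fixed record length: the stride at which bytes most often repeat.

    Naive heuristic: on ties the smallest stride wins, and None is returned
    when no stride produces any matching bytes.
    """
    best_size, best_score = None, 0.0
    for size in range(min_size, min(max_size, len(data) // 2) + 1):
        comparisons = len(data) - size
        matches = sum(1 for i in range(comparisons) if data[i] == data[i + size])
        score = matches / comparisons
        if score > best_score:
            best_size, best_score = size, score
    return best_size


if __name__ == "__main__":
    print(match_magic(b"PK\x03\x04..."))
    # Twenty 4-byte records: a constant 2-byte marker followed by two varying bytes.
    records = b"".join(b"\xAA\x55" + bytes([16 + i, 100 + i]) for i in range(20))
    print(guess_record_size(records))
```

The stride heuristic works only because constant fields (markers, padding, type tags) line up at multiples of the record length; data without such fields gives it nothing to latch onto.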
More challenging to analyze:
In such cases, the service returns a partial result (discovered patterns, a statistical profile of the data, and structural hypotheses with confidence levels) rather than an empty response or an error.
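As a rough illustration of what a partial result with a statistical profile and confidence-scored hypotheses might contain, consider the Python sketch below. The `PartialResult` and `Hypothesis` types, the entropy and printable-ratio thresholds, and the confidence values are all assumptions made for this example; they do not describe the service's real output schema or scoring.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    description: str
    confidence: float  # 0.0 to 1.0; values below are illustrative only


@dataclass
class PartialResult:
    entropy_bits_per_byte: float
    printable_ratio: float
    hypotheses: List[Hypothesis] = field(default_factory=list)


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def profile(data: bytes) -> PartialResult:
    """Build a statistical profile and attach hedged structural hypotheses."""
    entropy = shannon_entropy(data)
    printable = sum(1 for b in data if 32 <= b < 127) / len(data) if data else 0.0
    result = PartialResult(entropy, printable)
    # Thresholds are assumptions for this sketch, not tuned values.
    if entropy > 7.5:
        result.hypotheses.append(
            Hypothesis("near-uniform byte distribution, possibly compressed or encrypted", 0.7)
        )
    elif printable > 0.9:
        result.hypotheses.append(Hypothesis("mostly printable ASCII, likely text-based", 0.8))
    return result


if __name__ == "__main__":
    print(profile(b"key=value;mode=legacy;unit=mm"))
```

Even when no hypothesis fires, the caller still receives the entropy and printable-ratio figures, which is the point of returning a profile instead of an error.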
The service is designed for engineers, data analysts, and teams who encounter unknown binary formats: during legacy system migrations, data integration from external partners, digital archiving, security incident investigations, or reverse engineering.