From a single drag-and-drop to batch-processing hundreds of legacy files through the API.
Standard tools came up empty?
—I found a file and have no idea what it is or how to open it
—Our support team gets files from customers that they can’t open, and they need instant format answers
—I need to know what tool or app can work with this file format
When standard identification tools such as TrID, or a manual look in a hex editor, come up empty, FormatSense brings in LLM-powered analysis: it identifies the format, encoding, and data category from the file's content rather than from signatures. You get the format name (or the most likely hypothesis), recommended tools, and a brief description of the structure.
Need to work with data in an unknown format?
—I have a binary blob from a legacy system and need to understand its structure
—We inherited a codebase that reads proprietary file formats with zero documentation
—Our analysts spend days reverse-engineering vendor exports that could be parsed in minutes
Upload the file, and FormatSense will determine the data structure and generate a formal format description in Kaitai Struct (a declarative language for describing binary formats) plus a ready-to-run Python parser. From there, continue development in any language, on your own or with an AI assistant: generate a parser, build a visualization, or write a converter. What used to take hours of manual reverse engineering takes minutes.
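To give a feel for what a small Python parser for a binary format looks like, here is a minimal sketch. The layout (a `DEMO` magic, a version, a record count, then fixed-size records) is invented purely for illustration and is not actual FormatSense output; a generated parser would follow whatever structure the analysis finds in your file.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

# Hypothetical layout, invented for this example only:
#   4 bytes  magic b"DEMO"
#   2 bytes  version       (little-endian unsigned)
#   2 bytes  record count  (little-endian unsigned)
#   then `count` records of: 4-byte unsigned id + 8-byte float value
HEADER = struct.Struct("<4sHH")
RECORD = struct.Struct("<Id")


@dataclass
class Record:
    id: int
    value: float


def parse(data: bytes) -> tuple[int, list[Record]]:
    """Return (version, records) parsed from the hypothetical DEMO layout."""
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != b"DEMO":
        raise ValueError(f"unexpected magic: {magic!r}")
    records = []
    offset = HEADER.size
    for _ in range(count):
        rec_id, value = RECORD.unpack_from(data, offset)
        records.append(Record(rec_id, value))
        offset += RECORD.size
    return version, records
```

Calling `parse(open("sample.bin", "rb").read())` on a file with this layout would return the version number and a list of `Record` objects, ready to be dumped to JSON or loaded into a table.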
Need to pull data from an unknown file?
—I want to extract usable data from a file I can’t parse myself
—I want a schema so my code can consume this file programmatically
FormatSense analyzes the internal structure and produces an interactive HTML report with a breakdown of the file structure, extracted data in JSON format, and a schema describing the data structure. With JSON and a schema in hand, you can immediately move on to visualization, analysis, or transformation — manually or with an AI assistant.
An archive of files in legacy or proprietary formats?
—Our data pipeline breaks on unexpected file formats and we need automated classification
—We need to catalog thousands of files from an acquisition before we can integrate them
—Our digital preservation team needs format identification at scale for long-term storage planning
Use the FormatSense API to automatically process a collection and get format classification (type, encoding, recommended tools) plus extracted data in JSON for each file. The results can be used to build a search index, catalog, or RAG system — even if the original formats are long unsupported.
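A batch run over a collection could be sketched roughly as follows. The endpoint URL, the bearer-token header, and the shape of the JSON reply are all assumptions made for this example, not the documented FormatSense API; consult the actual API reference for the real request format. The catalog-building loop takes the classifier as a parameter, so it can be tried with a stub before pointing it at a live service.

```python
from __future__ import annotations

import json
import urllib.request
from pathlib import Path
from typing import Callable

# Hypothetical endpoint: a placeholder, not the real FormatSense URL.
API_URL = "https://api.example.com/v1/classify"


def classify_via_api(path: Path, api_key: str) -> dict:
    """POST the raw file bytes and return the decoded JSON reply (assumed shape)."""
    request = urllib.request.Request(
        API_URL,
        data=path.read_bytes(),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def build_catalog(root: Path, classify: Callable[[Path], dict]) -> list[dict]:
    """Classify every file under `root` and collect one catalog entry per file."""
    catalog = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        try:
            result = classify(path)
        except Exception as exc:
            # Record the failure and keep going so one bad file
            # does not stop the whole run.
            result = {"error": str(exc)}
        catalog.append({"file": str(path.relative_to(root)), **result})
    return catalog
```

A run might look like `build_catalog(Path("archive"), lambda p: classify_via_api(p, api_key))`, with the resulting list written out as JSON or fed into a search index.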
Hundreds of files in various formats, documentation lost?
—We’re migrating off a legacy system and nobody remembers what half these file types are
—We have mainframe data exports (EBCDIC, packed decimal) that nobody on staff can interpret anymore
—Government/regulatory archives require us to prove we can still read files from 20 years ago
FormatSense helps at every stage. Reconnaissance: run the files through classification via the API; the service identifies the format, encoding, and category of each file, so you can sort the collection into known formats and files that need deep analysis. Extraction: for data files, it produces structured JSON and a schema, which simplifies designing the target tables in your database. Conversion: if a file uses a legacy encoding (such as EBCDIC) or an intermediate format (such as base64 or uuencode), FormatSense automatically converts it to a readable form. The entire process can be automated via the API: a script of a dozen lines can process a file collection and load the results directly into your target system.
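For readers unfamiliar with the mainframe encodings mentioned above, this sketch shows what the conversion step involves, using only the Python standard library. It is not FormatSense's implementation. The choice of code page 037 is an assumption (EBCDIC exports come in several code pages), and the packed-decimal decoder follows the common convention of two digits per byte with the sign in the final nibble.

```python
import base64
from decimal import Decimal


def decode_ebcdic(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC text; cp037 is assumed here and may differ per system."""
    return raw.decode(codepage)


def unpack_packed_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3 style) field.

    Each byte holds two 4-bit digits; the last nibble is the sign
    (0xB or 0xD means negative, 0xA, 0xC, 0xE, 0xF mean positive/unsigned).
    `scale` is the number of implied decimal places.
    """
    if not raw:
        raise ValueError("empty packed-decimal field")
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()
    if sign < 0xA or any(n > 9 for n in nibbles):
        raise ValueError("not a valid packed-decimal field")
    value = int("".join(str(n) for n in nibbles))
    if sign in (0xB, 0xD):
        value = -value
    return Decimal(value).scaleb(-scale)


def decode_base64(text: str) -> bytes:
    """Undo a base64 wrapper to get back the original bytes."""
    return base64.b64decode(text)
```

For example, `unpack_packed_decimal(bytes.fromhex("12345C"), scale=2)` yields `Decimal("123.45")`, and `decode_ebcdic(bytes.fromhex("C8C5D3D3D6"))` yields `"HELLO"`.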
Ready to try it on your own file?
Upload a file