The Windows Prefetch file format across versions

Most people who work with Prefetch never look at the bytes. They run a parser, they read a table, and they move on. That is usually fine. It is not fine when the parser is wrong, and the only way to know whether your parser is wrong is to understand what it is supposed to be doing.

The Windows Prefetch format has changed at least five times in twenty years: four version bumps and one compression wrapper. Each change broke at least one popular parser at the time. If you are reading a .pf file from a Win11 host with a tool that was last updated in 2019, you should care about this.

SCCA: the magic and the shape

Every legacy (uncompressed) Prefetch file carries the ASCII string SCCA at offset 0x04. The four bytes before it, at offset 0x00, are the version field, little-endian. The magic and the version are the first things any parser checks, and the version is the gate to everything else.

The known values:

0x11 (decimal 17) for Windows XP and 2003
0x17 (23) for Vista and Windows 7
0x1A (26) for Windows 8 and 8.1
0x1E (30) for Windows 10 (uncompressed era and the first MAM-compressed builds)
0x1F (31) for later Windows 10/11 builds. Treat this as "Win10+ recent" rather than tied to a specific build.

If bytes 4 through 7 of the file are not SCCA and the file does not begin with the MAM signature (more on that in a minute), you do not have a Prefetch file. You have something else with a .pf extension. The most common cause I have seen is somebody copying the file across SMB with an encoding-aware tool that mangled it. Re-acquire.
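The gate described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any particular parser: the function name and the version table are made up for the example, and the table only contains the values listed above.

```python
import struct

# Version values as listed in this article; the labels are informal.
KNOWN_VERSIONS = {
    0x11: "XP / 2003",
    0x17: "Vista / 7",
    0x1A: "8 / 8.1",
    0x1E: "10",
    0x1F: "10 / 11 (recent)",
}

def identify_prefetch(data: bytes):
    """Return ("mam", None) or ("scca", version); raise ValueError otherwise."""
    if len(data) < 8:
        raise ValueError("too short to be a Prefetch file")
    if data[:3] == b"MAM":
        # Compressed wrapper: the version is only visible after decompression.
        return ("mam", None)
    if data[4:8] == b"SCCA":
        (version,) = struct.unpack_from("<I", data, 0)
        if version not in KNOWN_VERSIONS:
            raise ValueError(f"SCCA magic present but unknown version {version:#x}")
        return ("scca", version)
    raise ValueError("neither SCCA nor MAM signature found")
```

Raising on an unknown version rather than guessing is deliberate: a parser that keeps going past this point is the one that prints wrong numbers.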

After the version and magic, you get a small header block: total file size, the executable name encoded as UTF-16LE (a 60-byte field: up to 29 characters plus a terminating null, null-padded), and a path hash (covered in detail separately). The executable name in the header should match the prefix on the filename. When it does not, you are looking at a renamed file, manual tampering, or, more rarely, a binary whose name was longer than 29 characters and got truncated.
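A sketch of reading that header block and comparing it to the filename. The byte offsets used here (file size at 0x0C, name at 0x10, hash at 0x4C) are not stated above; they follow the layout published in the libscca format documentation and should be treated as assumptions to verify against a known sample. The function names are illustrative.

```python
import struct

def parse_scca_header(data: bytes) -> dict:
    """Parse the fixed header of an uncompressed SCCA file (assumed 84 bytes)."""
    if len(data) < 0x54:
        raise ValueError("truncated header")
    version, magic, _unknown, file_size = struct.unpack_from("<I4sII", data, 0)
    if magic != b"SCCA":
        raise ValueError("missing SCCA magic")
    # 60-byte UTF-16LE field; keep only what precedes the first null.
    raw_name = data[0x10:0x10 + 60]
    executable = raw_name.decode("utf-16-le", errors="replace").split("\x00", 1)[0]
    (path_hash,) = struct.unpack_from("<I", data, 0x4C)
    return {
        "version": version,
        "file_size": file_size,
        "executable": executable,
        "path_hash": path_hash,
    }

def filename_matches_header(pf_filename: str, header: dict) -> bool:
    """Compare NAME-HASH.pf on disk with the name and hash inside the file."""
    stem = pf_filename.rsplit(".", 1)[0]
    name, _, hash_hex = stem.rpartition("-")
    try:
        disk_hash = int(hash_hex, 16)
    except ValueError:
        return False
    return (name.upper() == header["executable"].upper()
            and disk_hash == header["path_hash"])
```

A mismatch is not proof of tampering on its own; it is a prompt to look at the file by hand.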

Sections A, B, C, D, and the rest

Inside the file, Prefetch is organized into sections that have stayed roughly identical in purpose across versions, even as offsets and field widths have changed:

Section A is the file metrics array. One entry per file that the binary touched during the trace window. Each entry has a start time, a duration, a filename string offset into Section C, and a file reference into the MFT (the latter only on newer versions).
Section B is the trace chains array. This is the page-fault sequencing data the operating system actually cares about. From a forensic standpoint it is mostly noise; you will rarely parse it explicitly.
Section C is the filename strings. UTF-16LE paths, one after another, referenced by offset from Section A. This is what becomes the "file load list" when you print a report.
Section D is the volume information array. One entry per volume the binary touched. Each entry has a volume path (typically \DEVICE\HARDDISKVOLUMEn), a creation time, a serial number, and an index into the directory strings table.
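Section C is the easiest of these to handle by hand. A minimal sketch, assuming you have already sliced the filename strings block out of the file: it walks the null-terminated UTF-16LE strings and keys each one by its byte offset inside the block, since that offset is what Section A refers to. The function name is illustrative.

```python
def split_filename_strings(block: bytes) -> dict:
    """Map byte offset within the block -> decoded UTF-16LE path."""
    strings = {}
    pos = 0
    while pos + 1 < len(block):
        end = pos
        # Step two bytes at a time so the terminator stays aligned to a code unit.
        while end + 1 < len(block) and block[end:end + 2] != b"\x00\x00":
            end += 2
        if end > pos:
            strings[pos] = block[pos:end].decode("utf-16-le", errors="replace")
        pos = end + 2
    return strings
```

Keeping the offsets, rather than returning a plain list, lets you detect a Section A entry that points into the middle of a string, which is a sign of a wrong entry size or a damaged file.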

Following Section D you usually find the directory strings and, on newer versions, additional metadata that varies by build. The eight execution timestamps live in the header on Win8+, in a slightly different layout than on Win7 (which only had one timestamp). The run counter sits next to them.
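The execution timestamps are Windows FILETIME values: 64-bit little-endian counts of 100-nanosecond intervals since 1601-01-01 UTC. A small sketch of decoding them, assuming the caller has already sliced the timestamp bytes out of the header (8 bytes for a single timestamp, 64 for eight). Where that slice starts depends on the version, so no offset is hardcoded here; the function names are illustrative.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME (100 ns ticks since 1601) to an aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

def decode_run_times(raw: bytes) -> list:
    """Decode consecutive FILETIMEs, skipping unused (all-zero) slots."""
    count = len(raw) // 8
    values = struct.unpack_from(f"<{count}Q", raw, 0)
    return [filetime_to_datetime(v) for v in values if v != 0]
```

Zero slots are skipped because a binary that has run fewer than eight times leaves the remaining entries empty.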

The two things that move across versions are the offsets to each section (recorded in the header) and the size of individual entries (the file metrics array entry grew from 20 bytes on XP to 32 bytes on Vista and later, which is where the MFT reference appears). A parser that hardcodes "section A starts at offset 0x98" instead of reading the offset from the header will fail silently on the next OS release. There are several of those still floating around on GitHub.

MAM: the Win10 compression wrapper

Windows 10 wraps Prefetch files in MAM compression. The change is often dated to around Windows 10 1607, but compressed files have been reported from the earliest Windows 10 builds, so do not infer the wrapper from the build number; check the signature. A wrapped file no longer has SCCA at offset 0x04. It starts with MAM\x04 (the byte 0x04 is the algorithm identifier, Xpress Huffman). The next four bytes are the uncompressed size, little-endian. Everything after that is the Xpress-Huffman compressed stream containing what would otherwise be the plain SCCA file.

If you decompress correctly, you get back exactly the SCCA structure described above, version 0x1E or 0x1F. The inner format is the same; only the wrapper changed.
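A sketch of the two checks that bracket decompression: splitting the wrapper into its fields beforehand, and confirming afterwards that what came out is the size the wrapper promised and looks like SCCA. The decompression step itself is left out; the function names are illustrative.

```python
import struct

MAM_XPRESS_HUFFMAN = 0x04  # algorithm byte described above

def parse_mam_header(data: bytes):
    """Return (algorithm, uncompressed_size, compressed_payload)."""
    if len(data) < 8 or data[:3] != b"MAM":
        raise ValueError("no MAM signature")
    algorithm = data[3]
    (uncompressed_size,) = struct.unpack_from("<I", data, 4)
    return algorithm, uncompressed_size, data[8:]

def check_decompressed(plain: bytes, expected_size: int) -> int:
    """Validate decompressor output and return the inner SCCA version."""
    if len(plain) != expected_size:
        raise ValueError(f"expected {expected_size} bytes, got {len(plain)}")
    if plain[4:8] != b"SCCA":
        raise ValueError("decompressed data lacks the SCCA magic")
    (version,) = struct.unpack_from("<I", plain, 0)
    if version not in (0x1E, 0x1F):
        raise ValueError(f"unexpected inner version {version:#x}")
    return version
```

Running check_decompressed on every file is cheap and catches the parsers that dump garbage and exit zero.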

The relevant Windows API is RtlDecompressBufferEx with COMPRESSION_FORMAT_XPRESS_HUFF. Eric Zimmerman's PECmd calls into the Windows API on Windows hosts and uses a managed Xpress-Huffman implementation when run on Linux. Several Python parsers ship a pure-Python Xpress-Huffman decoder that is slow but correct. The ones that do not handle MAM at all will print "not a Prefetch file" or, worse, dump garbage and exit zero.
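For reference, calling that API from Python through ctypes looks roughly like the sketch below. It runs on Windows only and assumes the payload is the compressed stream with the 8-byte MAM header already stripped. The function name is illustrative, and allocating the larger of the two reported workspace sizes is a conservative choice made for this example, not a documented requirement.

```python
import ctypes
import sys

COMPRESSION_FORMAT_XPRESS_HUFF = 0x0004

def xpress_huffman_decompress(payload: bytes, uncompressed_size: int) -> bytes:
    """Decompress an Xpress-Huffman stream via ntdll (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("RtlDecompressBufferEx is only available on Windows")
    if not payload:
        raise ValueError("empty payload")
    ntdll = ctypes.windll.ntdll
    fmt = ctypes.c_uint16(COMPRESSION_FORMAT_XPRESS_HUFF)

    # Ask ntdll how much scratch space the algorithm needs.
    buffer_ws = ctypes.c_uint32()
    fragment_ws = ctypes.c_uint32()
    status = ntdll.RtlGetCompressionWorkSpaceSize(
        fmt, ctypes.byref(buffer_ws), ctypes.byref(fragment_ws))
    if status != 0:
        raise OSError(f"RtlGetCompressionWorkSpaceSize: NTSTATUS {status & 0xFFFFFFFF:#010x}")

    src = (ctypes.c_ubyte * len(payload)).from_buffer_copy(payload)
    dst = (ctypes.c_ubyte * uncompressed_size)()
    workspace = (ctypes.c_ubyte * max(buffer_ws.value, fragment_ws.value, 1))()
    final_size = ctypes.c_uint32()
    status = ntdll.RtlDecompressBufferEx(
        fmt,
        ctypes.byref(dst), ctypes.c_uint32(uncompressed_size),
        ctypes.byref(src), ctypes.c_uint32(len(payload)),
        ctypes.byref(final_size),
        ctypes.byref(workspace))
    if status != 0:
        raise OSError(f"RtlDecompressBufferEx: NTSTATUS {status & 0xFFFFFFFF:#010x}")
    return bytes(dst)[:final_size.value]
```

On a non-Windows host this raises immediately, which is the honest behavior: falling back silently to "not a Prefetch file" is how wrong output gets into reports.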

Always confirm your parser's MAM support by feeding it a known-good Win10 Prefetch file (any file from C:\Windows\Prefetch\ on a current Win10/11 host will do) and checking that the run count and execution times match what PECmd reports.

What changed between Win10 v30 and Win11 v31

Less than the bump in version number suggests, on paper. The internal layout is nearly identical. What changed in practice:

The way the path hash incorporates context (command-line arguments, "App-V" environment, package full name on UWP apps) was extended. The same EXE in two different sandboxes produces two different .pf files, where on earlier builds it might have collided.
Prefetch trace times became more granular. On v31 you sometimes see sub-second resolution that earlier versions rounded.
The expiry behavior shifted. The eight-timestamp ring is documented as a fixed-size FIFO, but Win10 1709+ aggressively trims older entries during idle maintenance. v31 trims even more aggressively on SSD-backed systems.

A parser that handles v30 will usually handle v31 without modification, because the structure barely moved. What moves underneath the structure is what Windows itself writes into it, and that is the part you cannot fix in a parser.

The file metrics array: what to look at

If you only have time to look at one part of a Prefetch file by hand, look at Section A.

Each entry tells you: when in the ten-second trace window the file was first touched, how long the touch lasted, and the path to the file. On newer versions, the MFT reference number gives you the file ID, which is gold when the path no longer exists. You can cross-reference the file reference to the live MFT and recover the original entry even if the path has been re-used by a different file since.
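A sketch of decoding one 32-byte entry, the variant that carries the MFT reference. The field order is taken from the libscca format documentation and is an assumption here, as are the names. The split of the reference is standard NTFS: the low 48 bits are the MFT entry number, the high 16 bits are the sequence number.

```python
import struct
from collections import namedtuple

FileMetric = namedtuple(
    "FileMetric",
    "start_time duration name_offset name_chars mft_entry mft_sequence",
)

def parse_metrics_entry_32(entry: bytes) -> FileMetric:
    """Decode one 32-byte file metrics entry (assumed field order)."""
    if len(entry) != 32:
        raise ValueError("expected exactly 32 bytes")
    (start_time, duration, _average, name_offset,
     name_chars, _flags, file_reference) = struct.unpack("<IIIIIIQ", entry)
    return FileMetric(
        start_time=start_time,          # as stored; units not interpreted here
        duration=duration,
        name_offset=name_offset,        # byte offset into Section C
        name_chars=name_chars,          # length in UTF-16 code units
        mft_entry=file_reference & 0xFFFFFFFFFFFF,
        mft_sequence=file_reference >> 48,
    )
```

The sequence number is what tells you whether the MFT entry still belongs to the same file: if the live MFT shows the same entry number with a higher sequence, the slot has been reused since the binary ran.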

A specific pattern worth remembering: a .pf file whose load list contains references that resolve to MFT entries marked deleted in the live MFT means those files were on disk when the binary ran and have been removed since. Pair this with the USN journal and you can rebuild the directory at the moment of execution.

Which parsers handle which versions

Operational notes, current as I write this:

Eric Zimmerman's PECmd is the reference. SCCA v17, v23, v26, v30, v31, and MAM-wrapped variants of v30/v31 are all handled correctly. This is the offline tool I default to.
libscca (Joachim Metz) is the C library underneath several other tools. It tracks the format closely and is what I would build against if I were writing my own parser today.
Windows-Prefetch-Parser (PoorBillionaire) is the Python parser most people will find first on GitHub. It is fine on pre-Win10 files. It does not handle MAM-compressed Prefetch out of the box and will silently produce wrong output on Win10/11 hosts unless you patch in a decompressor. Read the README carefully and check the date.
The parser on this site handles SCCA and MAM-wrapped files entirely in the browser. Useful when you want to peek at one or two files without spinning up tooling. For bulk work, use PECmd.

If you have a parser whose name is not on that list and whose last commit was more than two years ago, test it against a known sample before trusting its output.

Why this matters when you write a report

The version field is the first thing you check. The MAM wrapper is the second. The section offsets, read from the header, are the third. Get those three right and you can parse any Prefetch file from any production Windows release, XP to current. Get any of them wrong and your tooling will still produce a number, but the number will be wrong, and you will not know it is wrong until somebody more careful than you looks at the raw bytes during cross-examination.

I have been that more careful person, on the wrong side of the engagement, more than once.