Over 29 years ago, Phil Katz (PKWARE Inc.) and Gary Conway released their great new file format, PKZIP, into the public domain, and before you ask: it was on February 14, 1989! Now, 29 years later, I had to study the format myself to complete and update a really basic ZIP library written in PHP. I know that ZipArchive is the primary API for handling ZIP files in PHP, but the PECL zip package isn’t available in every environment, and I simply had fun “playing” with the format and coding such a fallback option.

My “road to PKZip” was really easy and unproblematic, because the simplest version of the ZIP structure consists of just 2 “containers” per file (a local file header and a central directory record), the compressed file data itself and one “final archive record” (By the way: you will find all my sources below this article, if you want to ride this road yourself or just want to read a really good guide.).

The basic PKZIP structure

local file header / file
The local file header always contains the basic information about the format of the entry (such as the version number, bit flags and the compression method) and, of course, the meta information about the single file / file data (like the modification date/time as an MS-DOS timestamp, the CRC-32 checksum, the compressed and uncompressed file size, the length of the filename within the ZIP archive, the filename itself and optional / configurable extra fields).
file data / file
The compressed or plain, encrypted or unencrypted file data, produced for example by gzcompress($data, $compression, ZLIB_ENCODING_DEFLATE). There are other methods besides DEFLATE, but it is the most widely used one, and the PHP gzcompress function is really easy to use. It comes with one unexpected issue, though: gzcompress wraps the DEFLATE stream in the zlib format, so the 2-byte zlib header at the start and the 4-byte Adler-32 checksum at the end (it is not a CRC-32) corrupt your ZIP archive and MUST be stripped like: substr($gzcval, 2, strlen($gzcval) - 6).
central directory / file
The central directory contains ALL the information which is provided within the local file header, too. However, this data is extended with the following items: the version number / flags of the software the archive was made by, the file comment (as well as its length), the disk / part number where the file data is stored, some internal and external file attribute flags (you will read about them later in this article) and, of course, the offset of the local file header.
end of central directory / disk
While the local file header and the central directory record always exist once per file, the end of central directory record is unique per part / disk. It contains the current disk / part number as well as the disk / part number where the central directory starts, the number of items which are located on this disk, the total number of items, the complete size of the central directory, the offset on the respective disk / part where the central directory starts and the (length of the) ZIP archive comment.
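
The wrapper mentioned under "file data" can be made visible with a few lines. This is only a minimal sketch, assuming the zlib extension is loaded; the sample string and the variable names are illustrative and not part of the library discussed here. PHP also offers gzdeflate(), which returns a raw DEFLATE stream without any wrapper, so it is shown next to the manual stripping for comparison.

```php
<?php

    // Illustrative sample data (an assumption for this sketch).
    $data = str_repeat("Hello PKZIP! ", 20);

    // zlib format: 2-byte header + raw DEFLATE stream + 4-byte Adler-32 checksum.
    $zlib = gzcompress($data, 6, ZLIB_ENCODING_DEFLATE);

    // Manual stripping: drop 2 bytes at the start and 4 bytes at the end.
    $stripped = substr($zlib, 2, strlen($zlib) - 6);

    // gzdeflate() produces a raw DEFLATE stream directly.
    $raw = gzdeflate($data, 6);

    // Both variants inflate back to the original data.
    var_dump(gzinflate($stripped) === $data);
    var_dump(gzinflate($raw) === $data);

    // Show the two header bytes that were stripped.
    printf("zlib header: %02x %02x\n", ord($zlib[0]), ord($zlib[1]));
```

Whether you strip the wrapper by hand or call gzdeflate() is a matter of taste; the article sticks to the manual variant.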

How does this look in PHP?

Step 1: Prepare the data

Before we start to PK-ZIP our files, we need to prepare the file data and some meta information about it. Let’s start with the MS-DOS time/date stamp, which is completely different from the UNIX timestamp and MUST be packed into 16 bits per value (time and date). To simplify this, we use a small function which converts UNIX timestamps into MS-DOS stamps. Last but not least: get the CRC-32 checksum, get the length of the file data, DEFLATE the file data itself and strip the zlib wrapper (the 2 leading header bytes and the 4 trailing checksum bytes) from the result.

<?php

    $filedata = " your filedata ";

    /*
     |  CONVERT UNIX TIMESTAMP INTO msDOS TIMESTAMP
     |  @source     https://gist.github.com/SamBrishes/2c933ad07bc109c216e9f71d19e7c146
     */
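    function unix_to_msDOS($time = 0){
        // Fall back to "now" if the value is outside the MS-DOS range (1980 to 2107).
        if(!is_int($time) || $time < 315550800 || $time > 4354837199){
            $time = time();
        }
        $array = getdate($time);
        // Upper 16 bits: date (years since 1980, month, day).
        // Lower 16 bits: time (hours, minutes, seconds in 2-second steps).
        return (
            (($array["year"] - 1980) << 25) |
            ($array["mon"]           << 21) |
            ($array["mday"]          << 16) |
            ($array["hours"]         << 11) |
            ($array["minutes"]       <<  5) |
            ($array["seconds"]       >>  1)
        );
    }
    $time = unix_to_msDOS(time());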
    
    /*
     |  GENERAL FILE DATA
     */
    $path = "in-archive/my-file.ext";
    $crc32 = crc32($filedata);
    $length = strlen($filedata);
    $deflate = gzcompress($filedata, 6, ZLIB_ENCODING_DEFLATE);
    
    /*
     |  STRIP THE ZLIB WRAPPER (2 HEADER BYTES, 4 CHECKSUM BYTES)
     */
    $deflate = substr($deflate, 2, strlen($deflate) - 6);
    $dlength = strlen($deflate);
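
To double-check the bit layout of the MS-DOS stamp, the packed value can be decoded again. The following is a minimal sketch: msdos_to_parts() and dos_stamp() are hypothetical helpers that are not part of my sources, and the encoder is repeated in a condensed form only to keep the example self-contained.

```php
<?php

    // Condensed encoder with the same bit layout as unix_to_msDOS() (hypothetical name).
    function dos_stamp(int $y, int $mo, int $d, int $h, int $mi, int $s): int {
        return (($y - 1980) << 25) | ($mo << 21) | ($d << 16)
             | ($h << 11) | ($mi << 5) | ($s >> 1);
    }

    // Hypothetical helper: split a 32-bit MS-DOS date/time stamp into its parts.
    function msdos_to_parts(int $stamp): array {
        return [
            "year"    => (($stamp >> 25) & 0x7F) + 1980,   // 7 bits, years since 1980
            "mon"     => ($stamp >> 21) & 0x0F,            // 4 bits
            "mday"    => ($stamp >> 16) & 0x1F,            // 5 bits
            "hours"   => ($stamp >> 11) & 0x1F,            // 5 bits
            "minutes" => ($stamp >>  5) & 0x3F,            // 6 bits
            "seconds" => ($stamp & 0x1F) * 2,              // 5 bits, 2-second steps
        ];
    }

    // Odd seconds are rounded down to the previous even second (57 becomes 56).
    $parts = msdos_to_parts(dos_stamp(2018, 2, 14, 12, 34, 57));
    print_r($parts);
```

The 2-second resolution is the reason why the seconds are shifted right by one bit in the encoder.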


Step 2: Create the local file header

Each local file header MUST start with (01) the static signature HEX: "\x50\x4b\x03\x04", followed by (02) the version which is required to extract this entry: HEX: "\x14\x00" (decimal 20, which stands for version 2.0), and (03) the general purpose bit flag, which we leave empty in our example. (04) The compression method is our fourth field, with HEX: "\x08\x00" as value (which means that this file is DEFLATED).

Now we can just pack and write: (05) the MS-DOS time/date stamp, (06) the CRC-32 checksum, (07) the compressed filesize followed by (08) the uncompressed one, (09) the length of the in-archive filepath (including the filename, of course), (10) the length of the used extra fields (which is 0 in our example), (11) the plain in-archive filepath (again with the filename) and the extra fields, if you want to add any. All numbers are stored in little-endian byte order, which is exactly what the pack codes "V" (32 bits) and "v" (16 bits) produce.

In our example below we also already add the deflated / compressed file data:

<?php
    
    /*
     |  LOCAL FILE HEADER
     |  01      SIGNATURE
     |  02      Version needed to extract this archive.
     |  03      General purpose bit flag.
     |  04      Compression method.
     |  05      Last modification DOS datetime.
     |  06      CRC32 value.
     |  07      Compressed Filesize.
     |  08      Uncompressed Filesize.
     |  09      Length of the filename inside the archive.
     |  10      Length of the extra fields.
     |  11      The relative path / filename inside the archive.
     |  12      The main file data value.
     */
    $header = 
        /* 01 */   "\x50\x4b\x03\x04" . 
        /* 02 */   "\x14\x00" . 
        /* 03 */   "\x00\x00" . 
        /* 04 */   "\x08\x00" . 
        /* 05 */   pack("V", $time) . 
        /* 06 */   pack("V", $crc32) . 
        /* 07 */   pack("V", $dlength) . 
        /* 08 */   pack("V", $length) . 
        /* 09 */   pack("v", strlen($path)) . 
        /* 10 */   pack("v", 0) . 
        /* 11 */   $path . 
        /* 12 */   $deflate;
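
If you want to check such a header, you can read it back with unpack(). This is a minimal sketch under a few assumptions: the input values are illustrative, the DOS stamp is fixed to 1980-01-01 to keep the code short, gzdeflate() is used instead of the stripped gzcompress() output, and a 64-bit PHP build is assumed so that 32-bit values stay positive integers.

```php
<?php

    // Illustrative input values (assumptions for this sketch).
    $path    = "in-archive/my-file.ext";
    $data    = "your filedata";
    $deflate = gzdeflate($data, 6);

    // Same field order as in the local file header above.
    $header =
        "\x50\x4b\x03\x04" . "\x14\x00" . "\x00\x00" . "\x08\x00" .
        pack("V", (1 << 21) | (1 << 16)) .   // 1980-01-01 00:00:00
        pack("V", crc32($data)) .
        pack("V", strlen($deflate)) .
        pack("V", strlen($data)) .
        pack("v", strlen($path)) .
        pack("v", 0) .
        $path . $deflate;

    // The fixed part of the local file header (items 01 to 10) is 30 bytes long.
    $fields = unpack(
        "Vsignature/vversion/vflags/vmethod/Vdostime/Vcrc32/" .
        "Vcsize/Vusize/vnamelen/vextralen",
        substr($header, 0, 30)
    );

    // The variable part follows: path, extra fields, file data.
    $name    = substr($header, 30, $fields["namelen"]);
    $payload = substr($header, 30 + $fields["namelen"] + $fields["extralen"], $fields["csize"]);

    printf("signature: 0x%08x\n", $fields["signature"]);
    var_dump($name === $path);
    var_dump(gzinflate($payload) === $data);
    var_dump(crc32(gzinflate($payload)) === $fields["crc32"]);
```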


Step 3: Create the central directory record

The central directory record also always starts with (01) a specific and static signature value HEX: "\x50\x4b\x01\x02". Before (03) the version which is required to extract this file, we first need to add (02) the “version made by” field (or just HEX: "\x00\x00"). After that we can just copy the items 02 to 10 from the local file header above and paste them as items 03 to 11 into the central directory record.

Now we need to pack and write (12) the length of the file comment (or just 0), (13) the disk number where the file is located, some (14) internal and (15) external attributes (if needed), (16) the offset of the local file header (which is 0, because our archive starts with this local file header), (17) the plain in-archive filepath (again with the filename), the extra fields (whose length is defined in 11) and last but not least a file-specific comment (whose length is defined in 12).

Sounds easy, ehh?

<?php
    
    /*
     |  CENTRAL DIRECTORY RECORD
     |  01      SIGNATURE
     |  02      MadeBy Version numbers.
     |  03      Version needed to extract this archive.
     |  04      General purpose bit flag.
     |  05      Compression method.
     |  06      Last modification DOS datetime.
     |  07      CRC32 value.
     |  08      Compressed Filesize.
     |  09      Uncompressed Filesize.
     |  10      Length of the filename inside the archive.
     |  11      Length of the extra fields.
     |  12      Length of the file comment.
     |  13      The disk number where the file exists.
     |  14      Internal file attributes.
     |  15      External file attributes.
     |  16      Offset of the local file header.
     |  17      The relative path / filename inside the archive.
     */
    $central = 
        /* 01 */   "\x50\x4b\x01\x02" . 
        /* 02 */   "\x00\x00" . 
        /* 03 */   "\x14\x00" . 
        /* 04 */   "\x00\x00" . 
        /* 05 */   "\x08\x00" . 
        /* 06 */   pack("V", $time) . 
        /* 07 */   pack("V", $crc32) . 
        /* 08 */   pack("V", $dlength) . 
        /* 09 */   pack("V", $length) . 
        /* 10 */   pack("v", strlen($path)) . 
        /* 11 */   pack("v", 0) . 
        /* 12 */   pack("v", 0) . 
        /* 13 */   pack("v", 0) . 
        /* 14 */   pack("v", 0) . 
        /* 15 */   pack("V", 32) . 
        /* 16 */   pack("V", 0) . 
        /* 17 */   $path;


Step 4: The end of central directory record

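You need to repeat Step 1 to 3 for each file BEFORE you finish your new archive file with the end of central directory record. If your archive contains more than one file, write all local file headers (each followed by its file data) first and all central directory records after them, and store in item (16) of each central directory record the byte offset at which the respective local file header starts. The end of central directory record starts, like all containers, with (01) the static signature: HEX: "\x50\x4b\x05\x06" followed by: (02) the number of this disk / part (or just 0), (03) the number of the disk / part where the central directory starts (or just 0, again), (04) the total number of items which are located on this disk / part and (05) the number of items on all disks / parts.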

Now pack and write (06) the length of all central directory records (on this disk / part), (07) the offset where the central directory starts (on the respective disk / part), (08) the length of the archive comment and (09) the plain archive comment itself. The following code also puts (0a) the local file header and (0b) the central directory in front of these fields to directly complete our new ZIP archive file.

<?php
    
    $comment = "Created with PHP and the pytesNET Tutorial!";
    /*
     |  COMPLETE THE ZIP FILE
     |  0a      The file header item.
     |  0b      The central directory item.
     |  01      The signature for the end of the central directory record.
     |  02      The number of this disk / part.
     |  03      The number of the disk / part where the central directory starts.
     |  04      Total number of entries on this disk / part.
     |  05      Total number of entries in general.
     |  06      Length of the central directory.
     |  07      Offset where the central directory starts.
     |  08      The length of the following comment field.
     |  09      The archive comment.
     */
    $zip = 
        /* 0a */   $header . 
        /* 0b */   $central . 
        /* 01 */   "\x50\x4b\x05\x06" . 
        /* 02 */   "\x00\x00" . 
        /* 03 */   "\x00\x00" . 
        /* 04 */   pack("v", 1) . 
        /* 05 */   pack("v", 1) . 
        /* 06 */   pack("V", strlen($central)) . 
        /* 07 */   pack("V", strlen($header)) . 
        /* 08 */   pack("v", strlen($comment)) . 
        /* 09 */   $comment;
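
With more than one file, every central directory record has to point at its own local file header. The following is a minimal sketch of how Steps 1 to 4 could be combined for several files. build_zip() is a hypothetical helper and not the code of my library; it uses a fixed DOS stamp and gzdeflate() to stay short, and it ignores extra fields, file comments and multi-disk archives.

```php
<?php

    // Hypothetical helper: build a ZIP archive from an array "path => data".
    function build_zip(array $files, string $comment = ""): string {
        $body    = "";   // all local file headers, each followed by its file data
        $central = "";   // all central directory records
        // Fixed DOS stamp (1980-01-01 00:00:00) keeps the sketch short.
        $time = pack("V", (1 << 21) | (1 << 16));

        foreach ($files as $path => $data) {
            $path    = (string) $path;
            $deflate = gzdeflate($data, 6);
            $offset  = strlen($body);   // where this local file header starts

            // Items 02 to 10 of the local file header, reused as 03 to 11 below.
            $shared =
                "\x14\x00" . "\x00\x00" . "\x08\x00" . $time .
                pack("V", crc32($data)) .
                pack("V", strlen($deflate)) .
                pack("V", strlen($data)) .
                pack("v", strlen($path)) .
                pack("v", 0);

            $body .= "\x50\x4b\x03\x04" . $shared . $path . $deflate;

            $central .= "\x50\x4b\x01\x02" . "\x00\x00" . $shared .
                pack("v", 0) .         // length of the file comment
                pack("v", 0) .         // disk number
                pack("v", 0) .         // internal attributes
                pack("V", 32) .        // external attributes
                pack("V", $offset) .   // offset of the local file header
                $path;
        }

        // End of central directory record.
        return $body . $central .
            "\x50\x4b\x05\x06" . "\x00\x00" . "\x00\x00" .
            pack("v", count($files)) .
            pack("v", count($files)) .
            pack("V", strlen($central)) .
            pack("V", strlen($body)) .
            pack("v", strlen($comment)) .
            $comment;
    }

    $zip = build_zip([
        "in-archive/a.txt" => "first file",
        "in-archive/b.txt" => "second file",
    ]);
```

The important detail is $offset: it is taken before the new entry is appended, so it always marks the first byte of the signature of the respective local file header.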


Step 5: Finish

Now we just need to send our new ZIP archive variable directly to the browser or write it into a file (file_put_contents("my-pkzip-file.zip", $zip);). That’s it, you have successfully written a PK-ZIP archive file, congratulations!
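
A quick way to check the result without any ZIP extension is to read the central directory back. The following is a minimal sketch: list_zip_entries() is a hypothetical helper which assumes a single-disk archive without ZIP64 extensions and simply searches for the last occurrence of the end of central directory signature (which could be fooled by a comment that contains the same bytes).

```php
<?php

    // Hypothetical helper: list the entries of a ZIP archive held in a string.
    function list_zip_entries(string $zip): array {
        // Locate the end of central directory record by its signature.
        $pos = strrpos($zip, "\x50\x4b\x05\x06");
        if ($pos === false) {
            throw new RuntimeException("No end of central directory record found.");
        }
        $eocd = unpack(
            "vdisk/vcddisk/vhere/vtotal/Vcdsize/Vcdoffset/vclen",
            substr($zip, $pos + 4, 18)
        );

        $entries = [];
        $cursor  = $eocd["cdoffset"];
        for ($i = 0; $i < $eocd["total"]; $i++) {
            // The fixed part of a central directory record is 46 bytes long.
            $rec = unpack(
                "Vsig/vmadeby/vversion/vflags/vmethod/Vdostime/Vcrc/Vcsize/Vusize/" .
                "vnamelen/vextralen/vcommentlen/vdisk/vinternal/Vexternal/Voffset",
                substr($zip, $cursor, 46)
            );
            if ($rec["sig"] !== 0x02014b50) {
                throw new RuntimeException("Broken central directory record.");
            }
            $entries[] = [
                "path"   => substr($zip, $cursor + 46, $rec["namelen"]),
                "size"   => $rec["usize"],
                "offset" => $rec["offset"],
            ];
            // Skip the path, the extra fields and the file comment.
            $cursor += 46 + $rec["namelen"] + $rec["extralen"] + $rec["commentlen"];
        }
        return $entries;
    }

    // Smallest valid archive: nothing but an end of central directory record.
    $empty = "\x50\x4b\x05\x06" . str_repeat("\x00", 18);
    var_dump(list_zip_entries($empty));
```

Called with the contents of the written file, for example list_zip_entries(file_get_contents("my-pkzip-file.zip")), it should return one entry per archived file.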


Sincerely,
Sam.


Bonus Step: Create empty Directories

Just a small script later and you have a recursive ZIP program, completely written in PHP, to back up your own awesome project. But what’s that? This small script doesn’t back up empty directories! Okay, that’s theoretically not really important, because the directory is empty anyway. However, if you love this empty directory as much as your keyboard and if you REALLY don’t want to part with it, use the following instructions to keep all of your wonderful empty stuff:

We need to add a local file header as well as a central directory record to include the empty directory. Use just an empty string ("") as file data, let the in-archive path end with a slash and complete Step 1 and Step 2 without any other change. The magic happens in (15) the external attribute item of Step 3: this time we use HEX: "\x00\x00\xFF\x41", whose upper 16 bits hold the Unix file mode 040777 (a directory with full permissions). This attribute tells the respective operating system that this lonely item is a folder! And now complete the PKZip archive with Step 4!

Don’t forget: You should zip empty directories ONLY if you really want to keep the folder structure, because such records otherwise only waste space that could be used for “real” content!

<?php

    $time = unix_to_msDOS(time());   // the helper function from Step 1
    
    /*
     |  GENERAL FILE DATA
     */
    $path = "in-archive/my-folder/";
    $crc32 = crc32("");
    $length = strlen("");
    $deflate = gzcompress("", 6, ZLIB_ENCODING_DEFLATE);
    
    /*
     |  STRIP THE ZLIB WRAPPER (2 HEADER BYTES, 4 CHECKSUM BYTES)
     */
    $deflate = substr($deflate, 2, strlen($deflate) - 6);
    $dlength = strlen($deflate);

    /*
     |  LOCAL FILE HEADER
     |  The same as above (STEP 2)!
     */
    $header = 
        "\x50\x4b\x03\x04" . 
        "\x14\x00" . 
        "\x00\x00" . 
        "\x08\x00" . 
        pack("V", $time) . 
        pack("V", $crc32) . 
        pack("V", $dlength) . 
        pack("V", $length) . 
        pack("v", strlen($path)) . 
        pack("v", 0) . 
        $path . 
        $deflate;
    
    /*
     |  CENTRAL DIRECTORY RECORD
     |  ..      Look @ STEP 3
     |  14      Internal file attributes.
     |  15      External file attributes.
     |  16      Offset of the local file header.
     |  ..      Look @ STEP 3
     */
    $central = 
        "\x50\x4b\x01\x02" . 
        "\x00\x00" . 
        "\x14\x00" . 
        "\x00\x00" . 
        "\x08\x00" . 
        pack("V", $time) . 
        pack("V", $crc32) . 
        pack("V", $dlength) . 
        pack("V", $length) . 
        pack("v", strlen($path)) . 
        pack("v", 0) . 
        pack("v", 0) . 
        pack("v", 0) . 
        pack("v", 0) . 
        "\x00\x00\xFF\x41" . 
        pack("V", 0) . 
        $path;
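
What the 4 bytes of the external attribute item mean becomes clearer when they are decoded. This is a minimal sketch under an assumption about common extractors: they read the upper 16 bits as a Unix file mode and the lowest byte as MS-DOS attribute flags, and which of the two is honored depends on the “version made by” item (02), which is left at 0 in this article. The variable names are illustrative.

```php
<?php

    // The 4 bytes from the central directory record as a little-endian integer.
    $attr = unpack("V", "\x00\x00\xFF\x41")[1];

    // Upper 16 bits: Unix file mode. Lowest byte: MS-DOS attribute flags.
    $mode = ($attr >> 16) & 0xFFFF;
    $dos  = $attr & 0xFF;

    printf("mode: %o\n", $mode);               // prints "mode: 40777"
    var_dump(($mode & 0170000) === 0040000);   // file type bits say "directory"

    // Building the value from a mode instead of writing the bytes by hand.
    $plain = pack("V", 0040777 << 16);
    var_dump($plain === "\x00\x00\xFF\x41");

    // Variant that additionally sets the MS-DOS directory flag (0x10).
    $withDos = pack("V", (0040777 << 16) | 0x10);
```

If an extractor does not recognize your folder, the $withDos variant may be worth a try, because it marks the item as a directory in both readings.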