Understanding Serialisation in PHP

Serialisation, or serialization, is a process that takes in a piece of data and generates a storable or transportable representation of the data.

The serialised representation can use various formats. JSON is very common, as most languages have some form of JSON encoding and decoding. You could also use XML, YAML or even a raw string of bytes.

You're probably familiar with PHP's serialize() function. This function accepts a single value and returns the serialised version as a string.

What's unique about serialisation in PHP is that it actually uses a special serialisation format to represent various types of data, including arrays and objects.

Each data type that PHP can serialise has its own representation and notation when serialised. Let's break these down and look at what they represent.

Booleans are very simple.

b:0;
b:1;

The type specifier for boolean values is b. That is then followed by a colon and an integer representation of the boolean itself, so false becomes 0 and true becomes 1.
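A quick way to confirm the notation is to round-trip both values. This is only a small sketch that uses PHP's built-in serialize() and unserialize(); the variable names are just for illustration.

```php
<?php

// Booleans serialise to the type specifier "b" plus 0 or 1.
$false = serialize(false);
$true  = serialize(true);

echo $false, PHP_EOL; // b:0;
echo $true, PHP_EOL;  // b:1;

// unserialize() restores a real boolean, not the integer 0 or 1.
var_dump(unserialize($true));  // bool(true)
var_dump(unserialize($false)); // bool(false)
```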

Serialising null produces the following.

N;

Null doesn't have any additional data to include, so it is represented by a single N character.

Serialising the value 100 with serialize() returns the following string.

i:100;

Serialised integers are represented with the character i, followed by a colon and the integer's value.

NOTE

The value of the integer is always serialised in "decimal" form (base 10). The serialised version does not contain information about the original format of the value (hexadecimal, octal, binary, etc).
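To see this in practice, the sketch below serialises the same number written in four different notations. The variable names are made up for the example.

```php
<?php

// All four literals describe the same integer, so the serialised
// output is identical: the original notation is not preserved.
$decimal = serialize(100);
$hex     = serialize(0x64);
$octal   = serialize(0144);
$binary  = serialize(0b1100100);

echo $decimal, PHP_EOL; // i:100;
echo $hex, PHP_EOL;     // i:100;
echo $octal, PHP_EOL;   // i:100;
echo $binary, PHP_EOL;  // i:100;
```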

Serialising the value 100.5 has a very similar format.

d:100.5;

The only difference is that instead of an i for integer, the type specifier is d for "double".
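One thing worth knowing is that the exact digits depend on the serialize_precision ini setting. The sketch below assumes the value -1, which is the default in modern PHP versions, and sets it explicitly so the output is predictable.

```php
<?php

// Assumption: serialize_precision is -1, which makes PHP pick the
// shortest representation that still round-trips to the same float.
ini_set('serialize_precision', '-1');

$fractional = serialize(100.5);
echo $fractional, PHP_EOL; // d:100.5;

// A float with no fractional part still uses the "d" specifier,
// so it comes back as a float rather than an integer.
$whole = serialize(100.0);
echo $whole, PHP_EOL;          // d:100;
var_dump(unserialize($whole)); // float(100)
```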

Serialised strings carry some extra information with them. The code below is the result of serialising the string "Hello, world".

s:12:"Hello, world";

The type specifier is s. This is then followed by a colon and the result of strlen($value). Then that is followed by another colon and the original value wrapped in double quotes.

s:[strlen(value)]:"[value]"

The length of the string is important because it tells the code doing the unserialising how many bytes it needs to read to find the string value.

It also helps in lower level languages where strings might be stored as an array of bytes. If you know how many bytes the string is, you can allocate the correct amount and avoid potential memory issues.
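Two small cases show why the length matters. The sketch below is only illustrative: it uses the "\u{e9}" escape to get the UTF-8 bytes for é, so it does not depend on how the file is saved.

```php
<?php

// "\u{e9}" is the letter é, which takes 2 bytes in UTF-8, so the
// length is 6 bytes even though the string has 5 characters.
$accented = serialize("h\u{e9}llo");
echo $accented, PHP_EOL; // s:6:"héllo";

// Double quotes inside the value are not escaped. The unserialiser
// relies on the length, not the quotes, to find the end of the string.
$quoted = serialize('say "hi"');
echo $quoted, PHP_EOL; // s:8:"say "hi"";

var_dump(unserialize($quoted) === 'say "hi"'); // bool(true)
```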

Array serialisation is a little more involved, since a PHP array has keys and values. Let's first serialise an empty array [].

a:0:{}

The type specifier for an array is a. The second component in the serialised data is the number of elements in the array. The third component, wrapped in curly braces, is where the keys and values in the array are placed.

a:[count(value)]:{...values}

To see how the values are serialised, we can serialise a simple array with 3 values [1, 2, 3].

a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}

The content inside the curly braces describes the keys and values inside the array.

INFO

Remember that although we didn't specify any keys in the array, PHP will automatically index the values starting from 0.

The general format for the values is as follows.

{key;value;key;value;key;value}

Expanding the original array into a more explicit, keyed equivalent gives the following.

[
    0 => 1,
    1 => 2,
    2 => 3,
]

We can look at each key-value pair and see that their serialised values are placed inside of the curly braces, in order.

{i:0;i:1;i:1;i:2;i:2;i:3;}
 ^0  ^1  ^1  ^2  ^2  ^3
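We can check that the implicit and explicit arrays really are the same thing to serialize(). The sketch below also adds a hypothetical array with gaps in its keys to show that the keys are written out exactly as they are.

```php
<?php

$implicit = [1, 2, 3];
$explicit = [0 => 1, 1 => 2, 2 => 3];

// Both arrays are identical once PHP has assigned the keys,
// so they serialise to the same string.
var_dump(serialize($implicit) === serialize($explicit)); // bool(true)

// Keys are serialised as they are, so gaps in the indexes are kept.
$sparse = serialize([5 => 'a', 9 => 'b']);
echo $sparse, PHP_EOL; // a:2:{i:5;s:1:"a";i:9;s:1:"b";}
```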

If we change the array slightly and use strings for the keys instead, the keys are serialised as strings.

[
    'a' => 1,
    'b' => 2,
    'c' => 3,
]

a:3:{s:1:"a";i:1;s:1:"b";i:2;s:1:"c";i:3;}
     ^a      ^1  ^b      ^2  ^c      ^3

It's also possible to serialise nested arrays, which is useful for more complex structures. Serialising [[1, 2, 3]] produces the following.

a:1:{i:0;a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}
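A short sketch of how that string comes about: the outer array has a single element at key 0, and its value is the serialised inner array.

```php
<?php

// One outer element (key 0) whose value is the three-element array.
$nested = serialize([[1, 2, 3]]);
echo $nested, PHP_EOL; // a:1:{i:0;a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}

// The inner array is closed by "}" with no trailing semicolon.
// Unserialising restores the full nested structure.
$restored = unserialize($nested);
var_dump($restored[0][2]); // int(3)
```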

The most complex serialisation format can be seen when serialising objects.

Let's take a simple Nothing class, instantiate it and serialise it.

class Nothing
{
    // ...
}

serialize(new Nothing);

O:7:"Nothing":0:{}

The type specifier for serialised objects is O. That is followed by the length of the class name, and the class name itself wrapped in double quotes.

The number following the class name is the number of properties that the object has. The curly braces then hold the serialised properties, in a similar format to the arrays.
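Here is a minimal sketch that ties each part of the string back to the class, and then turns the string back into an object with unserialize().

```php
<?php

class Nothing
{
}

$serialised = serialize(new Nothing);
echo $serialised, PHP_EOL;            // O:7:"Nothing":0:{}
echo strlen(Nothing::class), PHP_EOL; // 7

// unserialize() creates a new Nothing instance from the string.
$restored = unserialize($serialised);
var_dump($restored instanceof Nothing); // bool(true)
```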

Taking a new class User that has 2 properties — $name and $age — we can see how property values are serialised.

class User
{
    public function __construct(
        public $name,
        public $age,
    ) {}
}

serialize(new User(
    name: "Ryan",
    age: 23
));

O:6:"User":2:{s:4:"name";s:4:"Ryan";s:3:"age";i:23;}

The User object has 2 properties, so the name for each property is serialised as a string and the serialised value comes after.

Converting this format into a simplified grammar might look something like this.

O:[strlen(value::class)]:"[value::class]":[count(properties)]:{...property}

property ::= [name];[value]
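To check that the grammar holds up, we can rebuild the string by hand and compare it with what serialize() returns. The serialiseByHand() function below is a hypothetical helper written for this post, not part of PHP, and it only handles public properties. It assumes PHP 8.0 or newer.

```php
<?php

// Hypothetical helper: rebuilds the serialised form of an object
// from the grammar, handling public properties only.
function serialiseByHand(object $value): string
{
    $class = $value::class;

    // Called from outside the class, get_object_vars() only
    // returns the public properties.
    $properties = get_object_vars($value);

    $body = '';
    foreach ($properties as $name => $propertyValue) {
        // Each property is its serialised name followed by its serialised value.
        $body .= serialize($name) . serialize($propertyValue);
    }

    return sprintf('O:%d:"%s":%d:{%s}', strlen($class), $class, count($properties), $body);
}

class User
{
    public function __construct(
        public $name,
        public $age,
    ) {}
}

$user = new User(name: 'Ryan', age: 23);

echo serialiseByHand($user), PHP_EOL;
// O:6:"User":2:{s:4:"name";s:4:"Ryan";s:3:"age";i:23;}

var_dump(serialiseByHand($user) === serialize($user)); // bool(true)
```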

Non-public properties

The User class only has public properties, but the serialize() function can also serialise protected and private properties.

Starting with protected properties, let's write a new class SensitiveStuff that has a single protected property $password.

class SensitiveStuff
{
    public function __construct(
        protected $password,
    ) {}
}

serialize(new SensitiveStuff(
    password: 'password123'
));

O:14:"SensitiveStuff":1:{s:11:"\0*\0password";s:11:"password123";}

Everything looks very similar, but you'll notice there are some extra characters where the property name is. Instead of the property name being serialised to s:8:"password", there are 3 additional characters: \0*\0.

The \0 here stands for a single byte with the value 0, known as the "null byte" or "null terminator", which is why each one only adds 1 to the length. The * (asterisk) is what marks this property as protected.
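Null bytes are invisible when you print the string, which can make the length look wrong. The sketch below swaps each null byte for a visible \0 before printing, and counts the bytes in the property name.

```php
<?php

class SensitiveStuff
{
    public function __construct(
        protected $password,
    ) {}
}

$serialised = serialize(new SensitiveStuff(password: 'password123'));

// Replace each real null byte with the two visible characters \0.
$visible = str_replace("\0", '\0', $serialised);
echo $visible, PHP_EOL;
// O:14:"SensitiveStuff":1:{s:11:"\0*\0password";s:11:"password123";}

// In a double-quoted PHP string "\0" is one byte:
// 3 bytes for the prefix plus 8 for "password".
echo strlen("\0*\0password"), PHP_EOL; // 11
```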

If we change the visibility of the property to private and re-serialise it, it will produce a slightly different value.

O:14:"SensitiveStuff":1:{s:24:"\0SensitiveStuff\0password";s:11:"password123";}

This time, the property name has been prefixed with a null byte character, the class name of the object and another null byte. This pattern represents a private property.
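Putting the three patterns together, a small function can work out the visibility from the raw property name. propertyVisibility() is a hypothetical helper for illustration and assumes PHP 8.0 or newer for str_starts_with(). The last part relies on the fact that casting an object to an array produces keys with the same null-byte prefixes.

```php
<?php

// Hypothetical helper: works out the visibility of a property from
// the raw name found in the serialised string.
function propertyVisibility(string $rawName): string
{
    if ($rawName === '' || $rawName[0] !== "\0") {
        return 'public';
    }

    // "\0*\0name" is protected; "\0ClassName\0name" is private.
    return str_starts_with($rawName, "\0*\0") ? 'protected' : 'private';
}

class SensitiveStuff
{
    public function __construct(
        private $password,
    ) {}
}

echo propertyVisibility('name'), PHP_EOL;                       // public
echo propertyVisibility("\0*\0password"), PHP_EOL;              // protected
echo propertyVisibility("\0SensitiveStuff\0password"), PHP_EOL; // private

// The array cast exposes the same prefixed names that serialize() writes.
$rawNames = array_keys((array) new SensitiveStuff('password123'));
echo propertyVisibility($rawNames[0]), PHP_EOL; // private
```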

It's not possible to serialise resource values in PHP. When you attempt to serialise an object with a resource stored inside of a property, it will be cast to an integer and serialised as 0.
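As a rough illustration, the sketch below stores an in-memory stream in a property of a made-up Upload class and serialises the object. The class and property names are hypothetical.

```php
<?php

// Hypothetical class holding a stream resource in a property.
class Upload
{
    public $handle;
}

$upload = new Upload;
$upload->handle = fopen('php://memory', 'r+');

$serialised = serialize($upload);
echo $serialised, PHP_EOL; // O:6:"Upload":1:{s:6:"handle";i:0;}

// The resource is gone after a round trip; only the integer 0 remains.
$restored = unserialize($serialised);
var_dump($restored->handle); // int(0)

fclose($upload->handle);
```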

Thanks for reading! Hopefully this has given you a bit of an introduction to how various value types are serialised in PHP.

In another post I'll write about writing your own deserialisation logic in another programming language, most likely JavaScript!
