PHP: Frankenstein arrays

Posted in PHP, Development, 2 years ago Reading time: 7 minutes

One of the core data types in PHP is the array, mostly unchanged since the early days of the language. The name "array" is a bit unfortunate, as is the implementation. It is not really an array.

In fact it is some sort of Frankenstein combination of a list and a dictionary, known from other languages. This is quite confusing and can cause unexpected and sometimes nasty effects. Sometimes, it breaks stuff. That happened to me last week. More on that later.

First, to get things clear, let's talk about the difference between lists and dictionaries.

A list, sometimes also known as an array, is, as the name suggests, a list of elements of any type. These elements are ordered, and every element has a numeric index, starting with 0.

Example in JavaScript:

let myList = [2, 1, 'foo', new Date()];
let myElem = myList[2]; // myElem contains 'foo'

The other data type goes by many names. In Python it's called a dictionary, Perl and Ruby call it a hash, and in JavaScript / JSON it's known as an object. That is also rather confusing, but that's for another time.

Whatever its name, a dictionary is a collection of key/value pairs. Those key/value pairs don't necessarily have a fixed order. The keys are usually strings, the values can be anything. And every key is unique.

An example in Python:

myDict = {'foo': 'bar', 'boo': 'baz'}
myElem = myDict['foo']  # myElem contains 'bar'
myDict['boo'] = 'bla'
print(myDict)  # output: {'foo': 'bar', 'boo': 'bla'}

Once upon a time, the creators of PHP thought it would be a Good Idea to merge lists and dictionaries into one data type, which, to make things worse, they named "array". With the following effects:

  • elements in a PHP array are always ordered
  • elements in a PHP array can have a string based key, or a numeric index
  • these numeric indexes can be consecutive, but they don't have to be (spoiler alert: this is essential!)

Hmmm, I wonder if that could lead to problems. Let's see how this works.

$myArray = [
    'element 1',
    'element 2',
    'element 3',
];
print_r($myArray);
print_r($myArray[1]);

returns as output:

Array
(
    [0] => element 1
    [1] => element 2
    [2] => element 3
)
element 2

This looks intuitive and works just like in many other languages. The elements all get assigned a consecutive numeric index, by which they can be accessed.

With key/value pairs it works in a way you would expect too.

$myArray = [
    'foo' => 'bar',
    'boo' => 'baz',
];
print_r($myArray);
print_r($myArray['boo']);

Output:

Array
(
    [foo] => bar
    [boo] => baz
)
baz

Because the keys are explicitly defined, there are no auto assigned numerical indexes.

It gets slightly more confusing if we combine lists and dictionaries, which is perfectly fine in PHP:

$myArray = [
    'foo' => 'bar',
    'blarp',
    4 => 'elem with numeric index',
    2 => 'another elem with lower numeric index',
    'boo' => 'baz',
];

// add an element to the end of the array
$myArray[] = 'zonk';
print_r($myArray);
$myArray[3] = 'three';
print_r($myArray);

This gives the following output:

Array
(
    [foo] => bar
    [0] => blarp
    [4] => elem with numeric index
    [2] => another elem with lower numeric index
    [boo] => baz
    [5] => zonk
)
Array
(
    [foo] => bar
    [0] => blarp
    [4] => elem with numeric index
    [2] => another elem with lower numeric index
    [boo] => baz
    [5] => zonk
    [3] => three
)

What we can learn from this:

  • if no explicit key is defined, a numeric key is assigned automatically by PHP
  • that key will be the highest numeric key + 1, or 0 if there are none
  • the position of each element does not depend on its numeric key
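
The key assignment rules above can be tried out in isolation. This is a minimal sketch with made-up variable names ($a, $b, $c). The last case shows a detail the rules don't spell out: PHP remembers the highest integer key that was ever used, so unsetting the last element does not free its key.

```php
<?php
// Appending uses the largest integer key seen so far, plus one.
$a = [5 => 'five'];
$a[] = 'next';
print_r(array_keys($a)); // keys: 5, 6

// String keys do not influence the next integer key.
$b = ['foo' => 'bar'];
$b[] = 'first';
print_r(array_keys($b)); // keys: foo, 0

// Removing the last element does not make its key available again.
$c = ['x', 'y', 'z'];
unset($c[2]);
$c[] = 'w';
print_r(array_keys($c)); // keys: 0, 1, 3
```

If you need the keys renumbered from 0 again, array_values() returns a re-indexed copy.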

As a PHP array doesn't necessarily have consecutive numeric keys, it is not always possible to iterate over it this way:

for ($i = 0; $i < count($myArray); $i++) {
    $elem = $myArray[$i];
    // ...
}

With the array above, this gives unexpected behaviour: the elements with string keys are never reached, and you get Undefined array key warnings:

blarp
PHP Warning:  Undefined array key 1 in /home/lennart/Development/php/arrays.php on line 47

another elem with lower numeric index
three
elem with numeric index
zonk
PHP Warning:  Undefined array key 6 in /home/lennart/Development/php/arrays.php on line 47

Therefore, instead of a for loop, it is always better to use a foreach loop:

foreach ($myArray as $key => $elem) {
    // ...
}

That will always work as expected.
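
To see what foreach does with the mixed array from before, here is a small sketch. The $seen variable is only there for illustration; it collects the keys in the order foreach visits them.

```php
<?php
$myArray = [
    'foo' => 'bar',
    'blarp',
    4 => 'elem with numeric index',
    2 => 'another elem with lower numeric index',
    'boo' => 'baz',
];

// foreach walks the elements in insertion order, whatever the keys are.
$seen = [];
foreach ($myArray as $key => $elem) {
    $seen[] = $key;
    echo $key, ' => ', $elem, PHP_EOL;
}
// $seen is now ['foo', 0, 4, 2, 'boo']
```

Every element is visited exactly once, string keys included, and no warnings are raised.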

So, everything's fine then?

No. The world is bigger than PHP alone. You often need to exchange data with other languages or applications, and that is usually done using JSON. And then it can become an issue.

JSON has become the de facto standard to exchange data between applications and hosts. Practically every programming language has functions to decode JSON strings to an internal format, and the other way around, to encode internal data to JSON strings.

For those conversions, it is essential that they are symmetric. In other words: if you convert data to JSON and back, it should still be the same unchanged data.

With a language where an array can be both a list and a dictionary, this will cause some issues.

$json = '{"0": "No", "1": "Yes"}';
$array = json_decode($json, true);
print json_encode($array);

You would expect the original unchanged JSON string, but in fact you get something else:

["No","Yes"]

A dictionary suddenly turned into a list! That means that the conversion is not symmetric!

It also happens the other way around:

$array = [
    'first',
    'second',
    'third',
];
print json_encode($array) . PHP_EOL;
// remove the second element
unset($array[1]);
print json_encode($array) . PHP_EOL;

Here a list suddenly becomes a dictionary!

["first","second","third"]
{"0":"first","2":"third"}

What is happening here?

An array in PHP is a list if it has consecutive, numerical keys, starting with 0. If you convert it to JSON, it will also become a list.

In all other cases it actually is a dictionary and it will be converted to a JSON object.
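
A few cases make that rule concrete. This is just a sketch with throwaway literals; note that not only the key values matter, but also their order.

```php
<?php
// Keys 0..n-1 in order: encoded as a JSON list.
echo json_encode(['a', 'b']), PHP_EOL;            // ["a","b"]

// Keys start at 1: encoded as a JSON object.
echo json_encode([1 => 'a', 2 => 'b']), PHP_EOL;  // {"1":"a","2":"b"}

// Right keys, wrong order: also an object.
echo json_encode([1 => 'b', 0 => 'a']), PHP_EOL;  // {"1":"b","0":"a"}

// Any string key: an object.
echo json_encode(['a', 'k' => 'b']), PHP_EOL;     // {"0":"a","k":"b"}

// The empty array counts as a list.
echo json_encode([]), PHP_EOL;                    // []
```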

Until very recently there was no separate function to test whether an array is a list or not. But now, with PHP 8.1, there finally is the function array_is_list. Better late than never, amirite?

But if you use an older PHP version, you could emulate it with this polyfill:

if (!function_exists('array_is_list')) {
    function array_is_list(array $a): bool
    {
        if ($a === []) {
            return true;
        }

        return array_keys($a) === range(0, count($a) - 1);
    }
}

This function can be useful sometimes, but it is not a fix for everything. It couldn't have saved me from the pitfall in the example above, which is based on a True Event.

$json = '{"0": "No", "1": "Yes"}';
$array = json_decode($json, true);
print json_encode($array);

The above JSON fragment was part of a much bigger JSON document, somewhere in a database. I needed to decode it to PHP to be able to work with it; do some transformations; and then re-encode it to JSON and update the database.

Take note of the second argument for json_decode. Passing true makes the return value an array, instead of a stdClass object. Most of the time I do it like that, because arrays are in general much easier to work with than stdClass objects.

There are quite a lot of array_* functions, but there's almost nothing in that respect for objects. If you want to, for example, merge two objects, it is easiest to cast them to arrays, throw them through array_merge and then cast to objects again if needed. So why even use stdClass objects?

Well, now I know! The JSON dictionary {"0": "No", "1": "Yes"} has consecutive numeric keys! Yes, they are strings, but hey, this is PHP! They are silently cast to integers! So the dictionary changed into a PHP list-like array, and then into a JSON list, causing a form somewhere to break. Thank god for backups.
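
The silent cast is easy to check for yourself. A small sketch follows; the variable names are mine, and the last case is there to show that only strings that look exactly like a decimal integer are affected.

```php
<?php
$json = '{"0": "No", "1": "Yes"}';

// Decoded to an array: the numeric string keys become integers.
$asArray = json_decode($json, true);
var_dump(array_keys($asArray) === [0, 1]); // bool(true)

// The same cast happens in plain PHP code.
$plain = ['0' => 'No', '1' => 'Yes'];
var_dump(array_keys($plain) === [0, 1]);   // bool(true)

// A key like "01" is not a canonical integer, so it stays a string.
$odd = ['01' => 'x'];
var_dump(array_keys($odd) === ['01']);     // bool(true)
```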

  • Arrays can be deceptive in PHP.

  • They can "act as" a list or a dictionary. Most of the time you don't really notice this at all, until you convert them to a format like JSON.

  • If you are decoding JSON that you need to re-encode back to JSON, it can be a good idea to decode to an object, not an array.

  • If you want to force an array to encode to a JSON list (all array keys will be discarded), use:

json_encode(array_values($array));

  • And if you want to force an array to encode to a JSON object, use:

json_encode((object)$array);
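
Put together, the three tips look roughly like this. It is a sketch with example data of my own ($sparse and $list are made up), not the code from the real incident.

```php
<?php
$json = '{"0":"No","1":"Yes"}';

// Decoding to stdClass (no second argument) keeps the object shape.
$roundTrip = json_encode(json_decode($json));
echo $roundTrip, PHP_EOL;                          // {"0":"No","1":"Yes"}

// Force a list: array_values() renumbers the keys from 0.
$sparse = [0 => 'first', 2 => 'third'];
echo json_encode(array_values($sparse)), PHP_EOL;  // ["first","third"]

// Force an object: cast the array before encoding.
$list = ['No', 'Yes'];
echo json_encode((object) $list), PHP_EOL;         // {"0":"No","1":"Yes"}
```

Keep in mind that both array_values and the (object) cast only affect the top level. Nested arrays still follow the list rule on their own.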
