Convert PHP Strings to Arrays using preg_split

The PHP preg_split function converts strings to arrays in a manner similar to the explode function. The main difference is that preg_split uses a regular expression to specify the delimiter. The preg_split function also includes options that give it more flexibility. We describe and demonstrate these on this page.

In the example below, we use preg_split to convert the same string that we converted with explode in our PHP Strings Tutorial. We show how to specify the delimiter string as a regular expression:

$fruits = 'apple, orange, pear, banana, raspberry, peach';
// defining ', ' delimiter as regular expression
$fruits_ar = preg_split('/, /', $fruits);
var_dump($fruits_ar);
/* View Source display:
array(6) {
  [0]=>
  string(5) "apple"
  [1]=>
  string(6) "orange"
  [2]=>
  string(4) "pear"
  [3]=>
  string(6) "banana"
  [4]=>
  string(9) "raspberry"
  [5]=>
  string(5) "peach"
} */

The result is the same as shown for explode. Keep in mind that it is more efficient to use explode when you don't need the regular expression or other capabilities of the preg_split function.
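As a quick check of that equivalence, the sketch below compares the two functions on the same string, then shows a case where a pattern is needed: commas with inconsistent whitespace around them. The second input string and the variable names $same, $messy, and $messy_ar are made up for illustration.

```php
<?php
$fruits = 'apple, orange, pear, banana, raspberry, peach';

// With a fixed delimiter, both functions return identical arrays.
$same = explode(', ', $fruits) === preg_split('/, /', $fruits);
var_dump($same); // bool(true)

// Hypothetical input with inconsistent spacing around the commas.
$messy = 'apple ,orange,  pear , banana';
// \s*,\s* matches a comma plus any whitespace on either side of it.
$messy_ar = preg_split('/\s*,\s*/', $messy);
print_r($messy_ar); // apple, orange, pear, banana
```

Because explode only accepts a literal delimiter string, it cannot absorb the variable whitespace in the second case.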

The following example shows a situation where explode would not be sufficient to specify a delimiter. Our strings consist of words separated by either hyphens or underscores. We show how to convert them to arrays using preg_split:

$str = 'a-string-with-hyphens';
$str2 = 'a_string_with_underscores';
$str3 = 'a_string-with_some-of_each';
// hyphen underscore delimiter as regular expression
$regexp = '/[-_]/';
// display arrays returned by preg_split using print_r
print_r( preg_split($regexp, $str) );
print_r( preg_split($regexp, $str2) );
print_r( preg_split($regexp, $str3) );
/* Array
(
    [0] => a
    [1] => string
    [2] => with
    [3] => hyphens
)

Array
(
    [0] => a
    [1] => string
    [2] => with
    [3] => underscores
)

Array
(
    [0] => a
    [1] => string
    [2] => with
    [3] => some
    [4] => of
    [5] => each
) */

We declare three strings that use hyphens and/or underscores to delimit words, and we declare a regular expression to use as a delimiter. We then pass the regular expression delimiter and each of these strings to preg_split and display the returned arrays using print_r.
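One case the strings above do not cover is two or more delimiter characters in a row. The sketch below, using a hypothetical string $str4, compares the single-character pattern with a variant that adds the + quantifier so that a run of delimiters counts as one split point.

```php
<?php
// Hypothetical string with runs of mixed delimiters.
$str4 = 'a--string__with-_runs';

// Single-character class: every delimiter splits, so runs leave empty strings.
$single = preg_split('/[-_]/', $str4);
// With +, a whole run of delimiters is treated as one split point.
$runs = preg_split('/[-_]+/', $str4);

print_r($single); // a, '', string, '', with, '', runs
print_r($runs);   // a, string, with, runs
```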

Limit and Flag Optional Parameters

An optional third argument to preg_split limits the number of array elements returned; once the limit is reached, the rest of the string is placed in the last element. An optional fourth argument passes flags that specify whether to:

  • Exclude empty elements in the resulting array: PREG_SPLIT_NO_EMPTY.
  • Include the parenthesized (captured) part of the delimiter in the returned array: PREG_SPLIT_DELIM_CAPTURE.
  • Hold the offset for each array element, that is, its location within the original string: PREG_SPLIT_OFFSET_CAPTURE.

The flags can be combined using the bitwise OR operator (|).
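The following minimal sketch illustrates both optional arguments on short strings chosen for illustration: a limit of 2, and two flags combined with |. The variable names $limited, $flags, and $pieces are assumptions made for this example.

```php
<?php
$fruits = 'apple, orange, pear, banana';

// Limit of 2: the first piece, then the unsplit remainder.
$limited = preg_split('/, /', $fruits, 2);
print_r($limited); // [0] => apple, [1] => orange, pear, banana

// Combine two flags with |; -1 as the limit means "no limit".
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE;
$pieces = preg_split('/,/', 'a,,b', -1, $flags);
print_r($pieces); // 'a' at offset 0 and 'b' at offset 3; the empty piece is dropped
```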

As we demonstrated with explode, empty array elements can be returned when converting a string to an array. We perform the same conversion using preg_split with the PREG_SPLIT_NO_EMPTY flag to exclude those empty array elements from the result:

$path = '/home/someuser/Documents/notes/misc/';
// split string on forward slash
// -1 means no limit (NULL, used in older code, is deprecated as of PHP 8.1)
$dirs = preg_split('/\//', $path, -1, PREG_SPLIT_NO_EMPTY);
var_dump($dirs);
/* array(5) {
  [0]=>
  string(4) "home"
  [1]=>
  string(8) "someuser"
  [2]=>
  string(9) "Documents"
  [3]=>
  string(5) "notes"
  [4]=>
  string(4) "misc"
} */

To use a forward slash inside a pattern that is itself delimited by forward slashes, escape it with a backslash so that it is not treated as the end of the regular expression. To pass a flag as the fourth argument without limiting the number of elements, pass -1 as the third argument; NULL also works in older PHP versions, but passing it is deprecated as of PHP 8.1. Notice that the resulting array contains no empty elements.
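An alternative to escaping is to choose a different pattern delimiter. The sketch below uses # to delimit the pattern, so the forward slash needs no backslash, and it also counts the elements returned when the flag is omitted. The variable name $raw is an assumption made for this example.

```php
<?php
$path = '/home/someuser/Documents/notes/misc/';

// '#' delimits the pattern, so '/' needs no escaping.
$dirs = preg_split('#/#', $path, -1, PREG_SPLIT_NO_EMPTY);
print_r($dirs); // home, someuser, Documents, notes, misc

// Without the flag, the leading and trailing slashes yield empty strings.
$raw = preg_split('#/#', $path);
var_dump(count($raw)); // int(7)
```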

Capturing the Delimiter

Whether you use explode or preg_split, the delimiter used to split the string is normally discarded. However, when you pass the PREG_SPLIT_DELIM_CAPTURE flag to preg_split, any part of the delimiter pattern enclosed in parentheses is captured and returned in a separate array entry.

We demonstrate with a string that consists of several sentences, each ending with a period, a question mark, or an exclamation mark. Our delimiter pattern places these three characters in square brackets, surrounded by parentheses so that the character ending each sentence is captured. The \s* that follows is outside the parentheses, so the whitespace between sentences is discarded. With the PREG_SPLIT_DELIM_CAPTURE flag passed as the fourth argument, the punctuation mark that ended each sentence is held in the array element following the one holding the sentence:

$str = 'This is an example string with several sentences. 
    The purpose is to demonstrate the preg_split function. 
    The function provides optional limit and flag parameters. 
    It is one of several regular expression functions. 
    Don\'t let that scare you.
    Regular expressions can be fun!
    Are you game? Let\'s continue.';

// -1 as the third argument means no limit
$ar = preg_split('/([.?!])\s*/', $str, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
var_dump($ar);
/* array(16) {
  [0]=>
  string(48) "This is an example string with several sentences"
  [1]=>
  string(1) "."
  [2]=>
  string(53) "The purpose is to demonstrate the preg_split function"
  [3]=>
  string(1) "."
  [4]=>
  string(56) "The function provides optional limit and flag parameters"
  [5]=>
  string(1) "."
  [6]=>
  string(49) "It is one of several regular expression functions"
  [7]=>
  string(1) "."
  [8]=>
  string(24) "Don't let that scare you"
  [9]=>
  string(1) "."
  [10]=>
  string(30) "Regular expressions can be fun"
  [11]=>
  string(1) "!"
  [12]=>
  string(12) "Are you game"
  [13]=>
  string(1) "?"
  [14]=>
  string(14) "Let's continue"
  [15]=>
  string(1) "."
} */
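One use for the captured delimiters is to rebuild each sentence with its own punctuation. The sketch below does this on a shortened version of the string; it assumes every sentence has text before its punctuation mark, so that the returned array alternates strictly between sentence and delimiter. The names $parts and $sentences are made up for this example.

```php
<?php
$str = 'Regular expressions can be fun! Are you game? Let\'s continue.';

$parts = preg_split('/([.?!])\s*/', $str, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// Pair each sentence with the delimiter that follows it.
$sentences = array();
foreach (array_chunk($parts, 2) as $pair) {
    $sentences[] = implode('', $pair);
}
print_r($sentences);
// Regular expressions can be fun! / Are you game? / Let's continue.
```

If the input could contain repeated punctuation such as "...", the strict alternation no longer holds and the pairing would need a more careful loop.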

Capturing the Offset

When you pass the PREG_SPLIT_OFFSET_CAPTURE flag to the preg_split function, each element of the returned array is itself an array of two elements: the first is the substring, and the second is its offset (in bytes, counting from 0) within the string being converted. This will become clearer with an example. We borrow from an example above:

$str = 'a-string-with-hyphens';
$regexp = '/[-_]/';
print_r( preg_split($regexp, $str, -1, PREG_SPLIT_OFFSET_CAPTURE) );
/* Array
(
    [0] => Array
        (
            [0] => a
            [1] => 0
        )

    [1] => Array
        (
            [0] => string
            [1] => 2
        )

    [2] => Array
        (
            [0] => with
            [1] => 9
        )

    [3] => Array
        (
            [0] => hyphens
            [1] => 14
        )

) */
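To see what the offsets refer to, the sketch below cuts each piece back out of the original string with substr, using the reported offset and the length of the piece. The variable names $pieces, $text, $offset, $check, and $all_match are assumptions made for this example.

```php
<?php
$str = 'a-string-with-hyphens';
$pieces = preg_split('/[-_]/', $str, -1, PREG_SPLIT_OFFSET_CAPTURE);

$all_match = true;
foreach ($pieces as $piece) {
    list($text, $offset) = $piece;
    // Cut the same span out of the original string using the offset.
    $check = substr($str, $offset, strlen($text));
    if ($check !== $text) {
        $all_match = false;
    }
    echo "$offset: $text\n";
}
var_dump($all_match); // bool(true)
```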
