Arrays


Array Basics

Observe the following collection of animals:

Camels Wildebeests Penguins

If I were to ask you what the first item in the above list was, I imagine that you would say "Camels". I am sure that if I asked you to identify the second and third items of the list, you could also say what they were.

In English, an array can be defined as an orderly arrangement. The above list of animals is orderly since I can refer to the first, second, or third items in a consistent way, e.g. the first item always refers to "Camels", so we could refer to the collection above as an array.

PHP allows us to declare collections, or arrays, like the one above and to refer to their elements by number, i.e. first, second, etc. To declare that a variable $a is an array we type the following:

$a = array();

We could then assign the first, second, and third values of the array to represent the array above with the following:

$a[0] = "Camels";
$a[1] = "Wildebeests";
$a[2] = "Penguins";
Each integer that we assign to identify an element of an array is known as an index. Note that PHP starts counting from zero (as opposed to one) so the first, or 0th element of the array is "Camels". Given our use of the term index we could say that "PHP zero indexes its arrays".

PHP also allows us to declare arrays closer to the way that we would do it in English. Given the list in the beginning of this example, PHP lets us say:

$a = array("Camels", "Wildebeests", "Penguins");
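To check that this one-line declaration produces the same array as the three separate assignments, here is a small sketch. The variable names $by_index and $by_list are made up for this example; print_r() and var_dump() are PHP built-ins that display a variable's contents.

```php
<?php
// Build the array one index at a time.
$by_index = array();
$by_index[0] = "Camels";
$by_index[1] = "Wildebeests";
$by_index[2] = "Penguins";

// Build the same array in one statement; PHP assigns indices 0, 1, 2.
$by_list = array("Camels", "Wildebeests", "Penguins");

// print_r() shows each index next to its value, e.g. [0] => Camels
print_r($by_list);

// === is true only if both arrays have the same keys and values, in the same order.
var_dump($by_index === $by_list); // bool(true)
```

The output of print_r() makes the zero indexing visible: "Camels" is listed at index 0, not 1.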

However, you should not forget that PHP indexes its arrays with numbers since the numeric references can be very useful. For example, let's suppose that you wanted to print all of the values of the above array. You could type the following:

print "$a[0]";
print "$a[1]";
print "$a[2]";
But if you had a long array, doing the above would be tedious. Instead you should use a loop:
for($i = 0; $i < 3; $i++) {
  print "$a[$i]";
}
Note that in the above example $i will take on the values 0, 1, and 2, so that it references each of the values in the array.

One question that comes up when using a loop to iterate over the elements of an array is how to know when to stop the loop. In the above example we used the condition $i < 3 to stop our loop, since we knew that the array contained only three elements, but what if we didn't know that? PHP contains a function called count (which has an alias called sizeof) that tells you how many elements there are in an array. Using count() we could rewrite our code to print the elements of an array like this:

for($i = 0; $i < count($a); $i++) {
  print "$a[$i]";
}

The above code is more general, since it works for an array of any length, provided that its indices run consecutively from 0.
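A minimal sketch of why count() makes the loop work for any length: the same loop, wrapped in a function (print_all is a hypothetical name, not a PHP built-in), prints three lines for the original array and four after an element is added. It assumes the array is zero indexed with no gaps.

```php
<?php
// Print every element of a zero-indexed array, one per line.
// count() tells the loop where to stop, so the array can be any length.
function print_all($arr) {
    for ($i = 0; $i < count($arr); $i++) {
        print "$arr[$i]\n";
    }
}

$a = array("Camels", "Wildebeests", "Penguins");
print_all($a);            // prints 3 lines

$a[3] = "Llamas";         // grow the array by one element
print_all($a);            // the same loop now prints 4 lines

// sizeof() is an alias of count(), so both report the same length.
var_dump(count($a) === sizeof($a)); // bool(true)
```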

Strings as Arrays

Consider the following string:

"Camels"

If we were to split the above string apart so that we were left with a collection of characters (and if we inserted a new line in between each character to differentiate them) we would have the following:

C
a
m
e
l
s

How would we write a PHP function to do the above to any string?

We have previously defined strings as a collection of characters, where a character can be any letter or symbol that we can type on a keyboard. A string can also be seen as an array of characters. Some languages require you to think of strings this way; PHP lets you think of them this way when it is convenient.

In the examples above we used a function, count(), to determine the length of an array so that we could control our loop. If we consider a string to be an array of characters, then we can use the strlen function to control our loop instead. We can then refer to a string's first character (assuming that the string is called $string) as $string[0], to its second character as $string[1], and so on.
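A quick sketch of strlen() and square-bracket indexing on the string "Camels". Note that strlen() counts bytes; for plain ASCII text like this, one byte is one character.

```php
<?php
$string = "Camels";

// strlen() returns the length of the string.
$length = strlen($string);    // 6

// Square brackets pick out single characters, counting from 0.
$first  = $string[0];           // "C"
$second = $string[1];           // "a"
$last   = $string[$length - 1]; // "s", the character at index 5

print "$first $second $last\n"; // prints: C a s
```

Because indexing starts at 0, the last character is at index strlen($string) - 1, which is why the loops in this section use the condition $i < strlen($string).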

Using the above described procedure we can define a function which takes a string as its argument, and then prints every character of the string (it also prints a <br> tag to distinguish the characters):

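function char_of_string($string) {
    // Visit each character, from index 0 up to the last one
    for ($i = 0; $i < strlen($string); $i++) {
        $char = $string[$i];
        // Print the character followed by a line break
        print "$char<br>";
    }
}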
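For comparison, PHP's built-in str_split() turns a string into an actual array of one-character strings, which makes the "string as an array of characters" idea literal. This sketch reproduces the output of char_of_string("Camels") that way; it is one alternative, not a replacement for the loop above.

```php
<?php
// With its default chunk length of 1, str_split() returns an array
// with one element per character (byte) of the string.
$chars = str_split("Camels");   // ["C", "a", "m", "e", "l", "s"]

// Ordinary array tools now apply to the characters.
print count($chars) . "\n";     // 6

// Same output as char_of_string("Camels"): each character followed by <br>.
for ($i = 0; $i < count($chars); $i++) {
    print "$chars[$i]<br>";
}
```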

Looping through a string character by character can be very useful, especially when it comes to HTML processing. We will now examine a way to have very fine control over strings using the ideas above. The example will be somewhat detailed; it is meant to give you an idea of what you can do by processing a string at this level of detail. Our future examples will give us higher-level tools, but if those tools are not available, it is valuable for you to know how to create them.

Working on sub-strings within arrays of characters

For any string of length two or more we can define sub-strings: shorter strings made up of consecutive characters from the original. So the sub-strings of the string "to" are "t" and "o". The sub-strings of the word "the" are "th", "he", "t", "h", and "e".
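A small sketch that lists the sub-strings of a word in the sense used here, i.e. runs of consecutive characters shorter than the word itself. The function name sub_strings is hypothetical; substr($string, $start, $length) is the PHP built-in that extracts part of a string.

```php
<?php
// Return every run of consecutive characters of $string whose length
// is between 1 and strlen($string) - 1, longest first.
// (sub_strings is a made-up name for this example.)
function sub_strings($string) {
    $result = array();
    $n = strlen($string);
    for ($len = $n - 1; $len >= 1; $len--) {
        // Slide a window of width $len across the string.
        for ($start = 0; $start + $len <= $n; $start++) {
            $result[] = substr($string, $start, $len);
        }
    }
    return $result;
}

print_r(sub_strings("the")); // th, he, t, h, e
```

A word with repeated letters, such as "aa", will list the same sub-string once for each position where it occurs.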

Let's apply the idea of sub-strings to very large strings. A web page is a collection of strings consisting of HTML tags and content. Suppose that we wanted to remove the HTML sub-strings from a web page so that all we were left with was the content. Let's think small again. Suppose that I had a string which was defined as the following:

$string = "<i>italics</i>";

and that I had a function called rm_html($string) which would return the following for the above:

italics

How would we write such a function? The concept of sub-strings will be important for this.

Given the above example, we will say that rm_html() removes every HTML sub-string from a larger string, and that the larger string could be an entire web page. Let's examine how to define such a function. We would have to go through each character of the string and return only the characters that were not inside of HTML tags. To know whether or not a character is inside of an HTML tag we can use the following property:

A string $string is an HTML tag if it starts with "<" and ends with ">".
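The property above translates directly into a test on the first and last characters. This is a sketch with a hypothetical helper name, is_html_tag; PHP does not provide a function by that name.

```php
<?php
// Return true if $string starts with "<" and ends with ">".
// (is_html_tag is a made-up name for this example.)
function is_html_tag($string) {
    $n = strlen($string);
    return $n >= 2 && $string[0] === "<" && $string[$n - 1] === ">";
}

var_dump(is_html_tag("<i>"));      // bool(true)
var_dump(is_html_tag("</i>"));     // bool(true)
var_dump(is_html_tag("italics"));  // bool(false)
```

Notice that the whole string "<i>italics</i>" also passes this test, since it too begins with "<" and ends with ">". Checking only the ends of a string is not enough to separate tags from content, which is why the procedure below examines the string one character at a time.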

If we were going through a string character by character we could test each character with a conditional that asked whether the character was "<". If we did find such a character we would want to remember that we were inside of an HTML tag, so that we could skip every other character we encountered until we found the character ">".

To keep track of whether any sub-string of $string was an HTML tag we could create a variable that specified whether or not we were inside of an HTML tag. We will start by calling this variable $in_tag and set it to false (0) in the beginning. As we progress through the string we will set $in_tag to true (1) if we encounter the character "<". We will then only set $in_tag back to false (0) if we encounter the character ">" which indicates the closing of an HTML tag.

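If we were to go through the above string ("<i>italics</i>") one character at a time, the rules we established for setting $in_tag would give the values in the following table. Each value is the one in effect while we are on that character, so the closing ">" of a tag still shows 1; $in_tag only returns to 0 once we move past it.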

Character   Value of $in_tag
<           1
i           1
>           1
i           0
t           0
a           0
l           0
i           0
c           0
s           0
<           1
/           1
i           1
>           1
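The table can be generated by code that applies the same rules. In this sketch, trace_in_tag is a hypothetical name; it records each character together with the value of $in_tag after the "<" check and before the ">" check, which is exactly what the table shows.

```php
<?php
// Return one [character, $in_tag] pair per character of $string.
// (trace_in_tag is a made-up name for this example.)
function trace_in_tag($string) {
    $rows = array();
    $in_tag = 0;
    for ($i = 0; $i < strlen($string); $i++) {
        $char = $string[$i];
        if ($char == "<") $in_tag = 1;     // a tag starts on this character
        $rows[] = array($char, $in_tag);   // record the value for this character
        if ($char == ">") $in_tag = 0;     // the tag ends after this character
    }
    return $rows;
}

$rows = trace_in_tag("<i>italics</i>");
for ($i = 0; $i < count($rows); $i++) {
    print $rows[$i][0] . " " . $rows[$i][1] . "\n";
}
```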

We now have the following conceptual tools:

  • A way of stepping through a string character by character.
  • A notion of sub-strings.
  • A way to distinguish HTML tags from content.

We will use these ideas to define the following procedure:

  1. Go through the string whose content we want to extract, character by character.
  2. If a sub-string is not an HTML tag then it is content, so keep it.
  3. In the end what we saved will be the content.

The above procedure can be expressed in terms of variable assignment with the following:

  1. Create an empty string $content.
  2. Create a variable $in_tag and set it to false.
  3. For every character $char in the string
    1. If $char is "<" set $in_tag to true.
    2. If $in_tag is false append $char to $content.
    3. If $char is ">" set $in_tag to false.
  4. Return $content.
The above procedure is not far from the actual PHP code to do this:
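function rm_html($string="") {
    $content = "";   // the text found outside of tags
    $in_tag = 0;     // 1 while we are inside an HTML tag, 0 otherwise
    for ($i = 0; $i < strlen($string); $i++) {
        $char = $string[$i];
        if ($char == "<") $in_tag = 1;     // a tag starts here
        if (!$in_tag) $content .= $char;   // keep characters outside tags
        if ($char == ">") $in_tag = 0;     // the tag ended on this character
    }
    return $content;
}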

Note that in the above example we use the concatenating assignment operator ".=" to append each character to the string $content. We could have written:

$content = $content.$char;

But the following is more compact:

$content .= $char;
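A small sketch confirming that the two forms build the same string. The array of characters and the variable names $long and $short are only for illustration.

```php
<?php
$chars = array("C", "a", "m", "e", "l", "s");

$long = "";
$short = "";
for ($i = 0; $i < count($chars); $i++) {
    $long = $long . $chars[$i]; // spelled-out concatenation
    $short .= $chars[$i];       // concatenating assignment, same effect
}

print "$long $short\n"; // prints: Camels Camels
```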

PHP comes with a built-in function called strip_tags() that does everything that rm_html() does and more.
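One of the extras strip_tags() offers is an optional second argument listing tags to keep. This sketch uses an illustrative HTML snippet to show both uses.

```php
<?php
$html = "<p><b>Bold</b> and <i>italics</i></p>";

// With one argument, strip_tags() removes every tag, like rm_html().
print strip_tags($html) . "\n";          // Bold and italics

// The optional second argument lists tags to keep; here <b> survives.
print strip_tags($html, "<b>") . "\n";   // <b>Bold</b> and italics
```

Keeping selected tags like this would require extra logic in rm_html(), which is one reason to prefer the built-in when it is available.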

However, the exercise that we did above was not in vain. It is important to be able to process strings to the degree that we did. You might find yourself having to write code that does something like the above for which there is no predefined function. A good PHP programmer should have control over the strings he or she has to process.


jfulton [at] member.fsf.org
22 Aug 2013