Chapter 6. References
WHAT YOU WILL LEARN IN THIS CHAPTER
Creating array references
Creating hash references
Understanding other reference types
Building complex data structures
Manipulating references
In Perl, we tend to care more about how we organize our data than the kinds of data we have. As a result, Perl allows for rich, complex data structures and imposes very few limits on how you can organize your data. In fact, once you get used to the syntax, you may be pleasantly surprised. Memory management is handled for you, there is no pointer math to get wrong, and there are no external libraries to choose from and load. You just use the references.
References 101
In some languages, complex data structures are built up via pointers being stored in other data structures, with perhaps pointers to those data structures in turn being stored in other data structures. Then you get to have fun with pointer math, memory management and obscure compiler errors.
Some languages, on the other hand, offer a bewildering array of different classes to implement a variety of different data structures, depending on what you’re looking for and how much time you’re willing to spend reading obscure documentation.
Perl makes it simple. Put any kind of data in any kind of data structure. You, the programmer, are expected to know what to do with it and Perl will (usually) handle the garbage collection and pointer math for you. Like many things in Perl, it Just Works. A reference in Perl doesn’t directly contain data. It is just a scalar variable that tells Perl where some data is kept. To access that data, you need to dereference it.
There are two ways of creating a reference in Perl. You can take a reference to an existing variable by putting a backslash, \, in front of it. The other way is to create an anonymous reference and assign it to a variable.
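As a minimal sketch of those two approaches, the following creates one reference of each kind. The variable names are illustrative, and the ref() function used to inspect the results is covered later in this chapter.

```perl
use strict;
use warnings;

# 1. Take a reference to an existing, named variable with a backslash.
my @jesters   = qw(jester clown motley);
my $named_ref = \@jesters;

# 2. Create an anonymous array and keep only the reference to it.
my $anonymous_ref = [ qw(jester clown motley) ];

# Both are ordinary scalars that tell Perl where an array is kept.
print ref($named_ref),     "\n";    # ARRAY
print ref($anonymous_ref), "\n";    # ARRAY
```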
Note
There’s actually a third way of taking a reference. It’s called the *foo{THING} syntax (apparently because Perl doesn’t have enough weird names for things). It accesses the value of a typeglob, but we won’t discuss it here because it’s somewhat advanced magic. See perldoc perlref. Typeglobs should not be confused with the glob() function (Chapter 9, Files and Directories).
Array references
As you will recall, an array is just a container for a list. To assign a reference to that array to a scalar, prepend it with a backslash:
my @fools = qw(jester clown motley);
my $fools = \@fools;
The $fools variable now contains a reference to the @fools array. You can copy the contents of that array into another array by dereferencing $fools, which you do by prepending it with the @ sign (the array sigil).
my @copy_of_fools = @$fools;
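To make the difference between a reference and a copy concrete, here is a small sketch (the replacement values are made up). A change to @fools is visible through $fools, but the copy stays independent.

```perl
use strict;
use warnings;

my @fools = qw(jester clown motley);
my $fools = \@fools;

# A new array holding the same values as @fools.
my @copy_of_fools = @$fools;

$copy_of_fools[0] = 'harlequin';    # changes only the copy
$fools[1]         = 'buffoon';      # changes the array $fools refers to

print "@$fools\n";           # jester buffoon motley
print "@copy_of_fools\n";    # harlequin clown motley
```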
To access individual elements through an array reference, you use the same syntax as you would to access the original array, but you put the dereferencing operator, ->, between the reference variable’s name and the square brackets. The following prints jester - motley.
my @fools      = qw(jester clown motley);
my $aref       = \@fools;
my $first_fool = $aref->[0];
my $last_fool  = $aref->[2];
print "$first_fool - $last_fool";
Note
You will often see Perl programmers refer to array references as arefs. Hash references are hrefs. Subroutine references (Chapter 7, Subroutines) are subrefs or coderefs. Sometimes we just say ref when we are talking about references in general. Hence the $aref and $href variable names used in some of this book’s examples. Though these are not great variable names, the following is often considered worse:
my $fools = \@fools;
It’s OK in Perl to have multiple variables named $fools, @fools and %fools, but it’s confusing and should be avoided whenever possible.
Naturally, you can iterate over an array reference just like you would an array:
foreach my $fool ( @$aref ) {
    print "$fool\n";
}
And if you need to iterate over the indexes, you use the relatively obscure $# syntax in front of the array reference. The following code does the same thing as the previous code:
my @fools = qw(jester clown motley);
my $fools = \@fools;
foreach my $i ( 0 .. $#$fools ) {
    my $fool = $fools->[$i];
    print "$fool\n";
}
Though your author generally does not recommend the following (it can be confusing), be aware that you can dereference the value and interpolate it into a string just as you would a regular scalar:
foreach my $i ( 0 .. $#$fools ) {
    print "$fools->[$i]\n";
}
Hash references
You take a reference to a hash the same way you take a reference to an array. Like an array reference, you access individual elements using the dereference operator after the variable name.
my %words = (
    dog   => 'chien',
    eat   => 'manger',
    clown => 'clown',
);
my $english_to_french = \%words;
my %copy              = %$english_to_french;
my $eat               = $english_to_french->{eat};
while ( my ( $english, $french ) = each %$english_to_french ) {
    print "The French word for '$english' is '$french'\n";
}
The previous code snippet should print something like this:
The French word for 'eat' is 'manger'
The French word for 'clown' is 'clown'
The French word for 'dog' is 'chien'
Note
Though we state that the proper way to access elements in a reference is to use the dereferencing operator, it’s not the only way. You can prepend a $ sign to the variable and skip the dereferencing operator, optionally wrapping the variable in curly braces:
$foo->[7];
$$foo[7]; # same thing
${$foo}[7]; # same thing
$word_for->{laughter};
$$word_for{laughter}; # same thing
${$word_for}{laughter}; # same thing
You might notice the lack of the dereferencing operators. With these alternate ways of dereferencing, Perl can be much harder to read, particularly if the maintenance programmer is not familiar with this syntax or fails to note that something is being dereferenced.
We recommend that you limit your use of this syntax to those cases where it’s absolutely needed (as with reference slices, explained later in this chapter).
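If you want to convince yourself that the three forms are equivalent, a short sketch like the following may help. The contents of $foo and $word_for are made up for the illustration.

```perl
use strict;
use warnings;

my $foo      = [ 0 .. 9 ];
my $word_for = { laughter => 'rire' };

# All three forms fetch the same array element.
my @from_array = ( $foo->[7], $$foo[7], ${$foo}[7] );
print "@from_array\n";    # 7 7 7

# All three forms fetch the same hash value.
my @from_hash = (
    $word_for->{laughter},
    $$word_for{laughter},
    ${$word_for}{laughter},
);
print "@from_hash\n";     # rire rire rire
```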
Anonymous references
Anonymous references are commonly used to create rich data structures in Perl. They seem strange at first, but they’re very easy to use.
When you access an individual array or hash element, you wrap the index value in [] or {} respectively. Those same square brackets and curly braces are also used to construct anonymous arrays and hashes:
my $stuff  = [ 'foo', 'bar', 'baz' ];
my $colors = { red => '#FF0000', green => '#00FF00', blue => '#0000FF' };
However, it doesn’t make much sense to construct an anonymous array or hash and assign it directly to a scalar just so you can dereference it again. Instead, they are powerful when you use them inside of other data structures.
Anonymous Arrays
Here’s an array of arrays (sometimes referred to as an AoA). The formatting, as usual, is optional and used primarily to make these easier to read.
my @results = (
    [ 12,  19, 4 ],
    [ 454, 2,  42 ],
    [ 6,   9,  13, 44 ],
);
Here we’ve created an array containing three anonymous arrays, the last of which has four elements instead of three. Accessing each of these array references is as easy as you might expect:
my $aref1 = $results[0];
my $aref2 = $results[1];
my $aref3 = $results[2];
And then you can access individual elements with the normal dereferencing syntax:
my $number = $aref2->[2];
By this time, $number should contain 42. However, you can access that value directly from the @results array by dereferencing the element in place.
my @results = (
    [ 12,  19, 4 ],
    [ 454, 2,  42 ],
    [ 6,   9,  13, 44 ],
);
my $number  = $results[1]->[2]; # number is now 42
my $results = \@results;
If you had an array of arrays of arrays (AoAoA), you would repeat this:
my $number = $aoaoa[3]->[1]->[0];
As a shortcut, Perl allows you to omit the dereferencing operator between adjacent subscripts, because everything below the top level of a nested data structure must be a reference anyway:
my $number = $aoaoa[3]->[1]->[0];
my $number = $aoaoa[3][1][0]; # same thing
The latter syntax is more common than the former, but be wary of creating data structures too complex as they’re often difficult to read.
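Walking an array of arrays follows the same rules. As a rough sketch, the following reuses the @results data from above and totals each row; the @row_totals array is our own addition for the illustration.

```perl
use strict;
use warnings;

my @results = (
    [ 12,  19, 4 ],
    [ 454, 2,  42 ],
    [ 6,   9,  13, 44 ],
);

my @row_totals;

# Each element of @results is an array reference, so we dereference
# it with @$row to loop over the numbers in that row.
foreach my $row (@results) {
    my $total = 0;
    $total += $_ foreach @$row;
    push @row_totals, $total;
}

print "@row_totals\n";    # 35 498 72
```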
When using normal data manipulation builtins, just dereference the array and use it as you normally would:
push @$array, $value;
If you have a more complex data structure, use curly braces to tell Perl exactly what you’re dereferencing:
push @{ $some_array[3][0] }, $some_value;
Anonymous hashes
Anonymous hashes work the same way, but we use curly braces instead of square brackets. Here’s a hash of hashes (HoH), but we’ll make the top-level hash an anonymous hash assigned to a scalar.
my $sales = {
    monday    => { jim => 2, mary => 1 },
    tuesday   => { jim => 3, mary => 5 },
    wednesday => { jim => 7, mary => 3 },
    thursday  => { jim => 4, mary => 5 },
    friday    => { jim => 1, mary => 2 },
};
As you might expect, these are easier to read. What are Mary’s sales for Friday?
my $num_sales = $sales->{friday}{mary};
Note that you must use the dereference operator on the first element, because $sales is itself a reference, but subsequent elements no longer require said dereferencing. Of course, you can use the dereference operator multiple times, if you prefer:
my $num_sales = $sales->{friday}->{mary};
Mixing and matching anonymous data structures allows you to create powerful data structures. Example 6.1, “Working with data structures” is a smaller version of the previously shown $sales data structure, but instead of showing the number of sales for Jim and Mary, we provide anonymous array references showing the commission per sale.
Example 6.1. Working with data structures
use strict;
use warnings;
use diagnostics;
my $sales = {
    monday => {
        jim  => [ 3, 4 ],
        mary => [ 4 ],
    },
    tuesday => {
        jim  => [ 3, 5, 1 ],
        mary => [ 1, 1, 1, 1, 9 ],
    },
};
my $commissions = $sales->{tuesday}{jim};
my $num_sales   = @$commissions;
my $total       = 0;
foreach (@$commissions) {
    $total += $_;
}
print "Jim made $num_sales sales on Tuesday and earned \$$total commission\n";
Note
sales.pl available for download at Wrox.com.
That tells us that Jim isn’t earning a lot of money.
Jim made 3 sales on Tuesday and earned $9 commission
We escaped the first dollar sign in \$$total to tell Perl not to treat that dollar sign as the start of a variable, but merely to print it.
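The same data structure can be walked in full rather than one entry at a time. The following is only a sketch: it uses keys on a dereferenced hash, sorts the keys so the output order is predictable, and collects the lines in an @report array that is our own addition.

```perl
use strict;
use warnings;

my $sales = {
    monday  => { jim => [ 3, 4 ],    mary => [ 4 ] },
    tuesday => { jim => [ 3, 5, 1 ], mary => [ 1, 1, 1, 1, 9 ] },
};

my @report;
foreach my $day ( sort keys %$sales ) {

    # $sales->{$day} is a hash reference, so wrap it in %{ ... }
    foreach my $person ( sort keys %{ $sales->{$day} } ) {
        my $commissions = $sales->{$day}{$person};
        my $total       = 0;
        $total += $_ foreach @$commissions;
        push @report, "$day: $person earned \$$total";
    }
}
print "$_\n" foreach @report;
```

That should print one line per person per day, ending with tuesday: mary earned $13.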
As with arrays, data manipulation builtins behave as normal, so long as you dereference the item first.
my @days_of_the_week = keys %$sales;
my @sales_people     = keys %{ $sales->{monday} };
Other references
Arrays and hashes are the two most common types of references, but there are a variety of other references that can prove useful from time to time. The most popular is a subroutine reference. The following prints the number 9.
my $add_two = sub {
    my $number = shift;
    return $number + 2;
};
print $add_two->(7);
Don’t worry about how that works for now. We’ll be covering subroutine references more in Chapter 7, Subroutines, but we include it here for completeness.
Naturally, we can take a reference to a scalar. The following prints Ovid.
my $name = 'Ovid';
my $ref  = \$name;
print $$ref;
Scalar references might seem odd, but they do have uses at times.
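One such use is changing a variable through its reference. Here is a minimal sketch, with a replacement value we made up:

```perl
use strict;
use warnings;

my $name = 'Ovid';
my $ref  = \$name;

# Assigning through the reference changes the original variable.
$$ref = 'Publius Ovidius Naso';

print "$name\n";    # Publius Ovidius Naso
```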
Working with References
Knowing how to create references and fetch data out of them is one thing. However, many times you’ll need to copy all or part of a reference without changing the original reference. Or perhaps you can’t figure out why you’re not getting the right data, so you need to debug your reference. We’ll cover several ways of handling these issues.
Debugging
In the first Try It Out in this chapter, you saw how to work with references and even print them out. However, sometimes they’re a bit confusing and you’re not sure what you have. For example, let’s say you have the following line as line 23 of your program:
print $aref->[0]{sales};
And your program dies with the error message:
Not a HASH reference at some_program.pl line 23.
Now you want to know what you really have in the $aref variable.
One way to handle this is to just print $aref->[0]. In this case, it might print something like ARRAY(0xc51220). When you print a reference, you see the type of reference (an ARRAY in this case) followed by its hexadecimal address in memory.
Another way of dealing with this is the ref() function:
print ref $aref->[0];
For something that is not a reference, ref() returns the empty string. Here’s a handy little program that shows various reference types. You won’t understand all of these yet, but that’s OK. When you’re done with the book, this will be clear.
use strict;
use warnings;
use CGI;    # needed for CGI->new below; CGI.pm left the Perl core in 5.22 and is on CPAN
my $foo;
sub handler {}
my $scalar    = ref $foo;
my $scalarref = ref \$foo;
my $arrayref  = ref \@ARGV;
my $hashref   = ref \%ENV;
my $coderef   = ref \&handler;
my $globref   = ref \*foo;
my $regexref  = ref qr//;
my $objectref = ref CGI->new;
print <<"END_REFERENCES";
Scalar: $scalar
Scalar ref: $scalarref
Array ref: $arrayref
Hash ref: $hashref
Code ref: $coderef
Glob ref: $globref
Regex ref: $regexref
Object ref: $objectref
END_REFERENCES
And that will print:
Name "main::foo" used only once: possible typo at /var/tmp/eval_CqOi.pl line 11.
Scalar:
Scalar ref: SCALAR
Array ref: ARRAY
Hash ref: HASH
Code ref: CODE
Glob ref: GLOB
Regex ref: Regexp
Object ref: CGI
You see nothing printed for $scalar because ref() returns the empty string if called with an argument that is not a reference. The strange main::foo warning is because we’ve taken a reference to something called a typeglob. We won’t cover them much in this book, but you can read perldoc perldata for more information if you’re curious.
The rest of the names should be straightforward, even though we’ve not covered all of the types yet. Globs will be covered (slightly) in Chapter 9, Files and Directories, and regular expressions (the $regexref) will be covered in Chapter 8, Regular Expressions. Calling ref() on an object (Chapter 12, Object Oriented Perl) merely returns the name of the object’s class.
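A common use of ref() is to check what you have before you dereference it, which avoids errors such as Not a HASH reference. The following is only a sketch with made-up sample data; the @report array is our own addition.

```perl
use strict;
use warnings;

# Sample data, made up for this sketch.
my @things = ( 42, [ 1, 3 ], { jim => 2 }, \'some text' );

my @report;
foreach my $thing (@things) {
    my $type = ref $thing;
    if ( $type eq 'ARRAY' ) {
        push @report, 'array reference with ' . scalar(@$thing) . ' elements';
    }
    elsif ( $type eq 'HASH' ) {
        push @report, 'hash reference with keys: ' . join( ', ', sort keys %$thing );
    }
    elsif ( $type eq '' ) {
        push @report, "not a reference: $thing";
    }
    else {
        push @report, "some other reference type: $type";
    }
}
print "$_\n" foreach @report;
```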
Warning
All of the references we’ve used in this chapter have been hard references. Hard references tell Perl where to find some data. However, there’s also a soft reference, sometimes referred to as a symbolic reference. Rather than telling Perl where some data is kept, it contains the name of a variable or subroutine that Perl will then access or call to get the data you want. Soft references are considered dangerous because they’re very easy to get wrong. As a result, they are illegal when you use strict. We won’t discuss them further. See perldoc strict and perldoc perlref for more information.
For large data structures, you might find it very frustrating to keep printing individual elements to find out what they are. This is where the very useful Data::Dumper module comes in handy. Data::Dumper has been shipped with Perl ever since version 5.005 (released July 1998).
You can add the following before the offending line to see what you have:
use Data::Dumper;
print Dumper($aref);
That might print out something like this:
$VAR1 = [
          [
            1,
            3
          ],
          [
            2,
            5
          ]
        ];
As you can see by reading this data structure, we have an array ref of array refs, not an array ref of hashrefs. Data::Dumper is an invaluable debugging tool when trying to figure out just what went wrong with your code. See perldoc Data::Dumper to understand how to customize its output.
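As one example of such customization, the sketch below sets two of the configuration variables described in perldoc Data::Dumper. The $sales data here is made up for the illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Both are package variables documented in perldoc Data::Dumper.
$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order
$Data::Dumper::Indent   = 1;    # use a more compact indentation style

my $sales = {
    tuesday => { mary => 5, jim => 3 },
    monday  => { mary => 1, jim => 2 },
};
my $dumped = Dumper($sales);
print $dumped;
```

Sorted keys are particularly handy when you want to compare the output of two runs, because hash ordering is otherwise unpredictable.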
Note that if you want to print out the values of arrays and hashes that are not references, you should pass them to Data::Dumper by reference. Otherwise, the array or hash is flattened into a list and your output may look confusing.
use Data::Dumper;
my @words = qw( this that other );
print Dumper(@words);
That prints out:
$VAR1 = 'this';
$VAR2 = 'that';
$VAR3 = 'other';
However, when you pass the array by reference, you get a cleaner output, so long as you understand references.
print Dumper(\@words);
$VAR1 = [
          'this',
          'that',
          'other'
        ];
Copying
Sometimes you need to copy a data structure. Ordinarily you can copy a variable like this:
my $x = 3;
my $y = $x;
$y = 4;
print "$x - $y";
That prints 3 - 4. This is because the assignment operator copies the value from one expression to a variable (or variables). However, what happens when that value is a reference?
use Data::Dumper;
my $aref1 = [ 1, 3, 7 ];
my $aref2 = $aref1;
$aref2->[0] = 9;
print Dumper($aref1, $aref2);
That prints:
$VAR1 = [
          9,
          3,
          7
        ];
$VAR2 = $VAR1;
But how can the two variables be the same? We only changed the first value of the second array reference.
That’s because when we did $aref2 = $aref1, we copied the reference, not the array it refers to. Both variables now point to the same array, so a change made through one is visible through the other. To get an independent array, you must dereference the array and copy its values. In this case, we will dereference the array and use [] to create a new array reference. In Perl, this is a shallow copy: it copies only the top-level values, so any references stored in the array will still be shared between the two copies.
use Data::Dumper;
my $aref1 = [ 1, 3, 7 ];
my $aref2 = [ @$aref1 ];
$aref2->[0] = 9;
print Dumper($aref1, $aref2);
That prints:
$VAR1 = [
          1,
          3,
          7
        ];
$VAR2 = [
          9,
          3,
          7
        ];
And as you can see, the two variables no longer refer to the same array.
Shared references can be particularly confusing for programmers who are not expecting them. Here’s some broken code that attempts to copy a data structure and clear out the sales in the new structure.
use Data::Dumper;
my %old_sales = (
    monday    => { jim => 2, mary => 1 },
    tuesday   => { jim => 3, mary => 5 },
    wednesday => { jim => 7, mary => 3 },
    thursday  => { jim => 4, mary => 5 },
    friday    => { jim => 1, mary => 2 },
);
my %new_sales = %old_sales;
while ( my ( $day, $sales ) = each %new_sales ) {
    $sales->{jim}  = 0;
    $sales->{mary} = 0;
}
print Dumper(\%old_sales, \%new_sales);
And that prints (reformatted for clarity):
$VAR1 = {
    'monday'    => { 'jim' => 0, 'mary' => 0 },
    'tuesday'   => { 'jim' => 0, 'mary' => 0 },
    'wednesday' => { 'jim' => 0, 'mary' => 0 },
    'thursday'  => { 'jim' => 0, 'mary' => 0 },
    'friday'    => { 'jim' => 0, 'mary' => 0 },
};
$VAR2 = {
    'monday'    => $VAR1->{'monday'},
    'tuesday'   => $VAR1->{'tuesday'},
    'wednesday' => $VAR1->{'wednesday'},
    'thursday'  => $VAR1->{'thursday'},
    'friday'    => $VAR1->{'friday'},
};
As you can see, we have overwritten the values in the %old_sales hash. Assigning %old_sales to %new_sales copied only the top level, so both hashes hold references to the same inner hashes. It would be both tedious and error prone to dereference each hashref and take a reference to a copy of each hash by hand. A much simpler way to handle this is to use the dclone (deep clone) function from the Storable module. It does a deep copy of a reference. Example 6.2, “Using dclone to deep copy data structures” shows how it’s done.
Example 6.2. Using dclone to deep copy data structures
use strict;
use warnings;
use diagnostics;
use Data::Dumper;
use Storable 'dclone';
my %old_sales = (
    monday    => { jim => 2, mary => 1 },
    tuesday   => { jim => 3, mary => 5 },
    wednesday => { jim => 7, mary => 3 },
    thursday  => { jim => 4, mary => 5 },
    friday    => { jim => 1, mary => 2 },
);
my %new_sales = %{ dclone(\%old_sales) };
while ( my ( $day, $sales ) = each %new_sales ) {
    $sales->{jim}  = 0;
    $sales->{mary} = 0;
}
print Dumper(\%old_sales, \%new_sales);
Note
dclone.pl available for download at Wrox.com.
And running dclone.pl shows that we have the desired result (again, reformatted for clarity):
$VAR1 = {
    'monday'    => { 'jim' => 2, 'mary' => 1 },
    'tuesday'   => { 'jim' => 3, 'mary' => 5 },
    'wednesday' => { 'jim' => 7, 'mary' => 3 },
    'thursday'  => { 'jim' => 4, 'mary' => 5 },
    'friday'    => { 'jim' => 1, 'mary' => 2 },
};
$VAR2 = {
    'monday'    => { 'jim' => 0, 'mary' => 0 },
    'tuesday'   => { 'jim' => 0, 'mary' => 0 },
    'wednesday' => { 'jim' => 0, 'mary' => 0 },
    'thursday'  => { 'jim' => 0, 'mary' => 0 },
    'friday'    => { 'jim' => 0, 'mary' => 0 },
};
Remember, when copying references, if it’s a flat data structure like an array or hash, you can just dereference and assign the values (optionally creating a new reference):
my $acopy = [ @$aref ];
my %hcopy = %$href;
But if there are references in there, you will have a shallow copy and possibly unwanted side-effects.
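Here is a small sketch of that side effect, using made-up data. The top-level value is copied, but the nested array is shared between the original and the copy.

```perl
use strict;
use warnings;

my $original = [ 1, [ 2, 3 ] ];
my $copy     = [ @$original ];    # copies the top level only

$copy->[0]    = 'changed';    # top-level value: the original is unaffected
$copy->[1][0] = 'changed';    # nested array: shared with the original

print "$original->[0]\n";       # 1
print "$original->[1][0]\n";    # changed
```

When you need the nested data to be independent too, that is the case for dclone.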
Slices
When working with arrays and hashes, you sometimes want to fetch several items from the array or hash at once. You might recall that the syntax is to prefix the variable name with an @ (array) symbol and provide two or more indexes/keys.
# array slice
my @array = qw(foo bar baz quux);
my ( $var1, $var2 ) = @array[ 1, 2 ];
# hash slice
my %hash = (
    this    => 'is',
    another => 'boring',
    example => 'innit?'
);
my ( $first, $second ) = @hash{ 'another', 'example' };
print "$var1, $var2\n";
print "$first, $second\n";
And that prints:
bar, baz
boring, innit?
When you have references, you must, as expected, dereference the variables first. The following code prints the same output as the previous code. Note how we dereference the variables to get the slices.
# array slice
my $arrayref = [ qw(foo bar baz quux) ];
my ( $var1, $var2 ) = @$arrayref[ 1, 2 ];
# hash slice
my $hashref = {
    this    => 'is',
    another => 'boring',
    example => 'innit?'
};
my ( $first, $second ) = @$hashref{ 'another', 'example' };
print "$var1, $var2\n";
print "$first, $second\n";
However, if you’re trying to take a slice of a complex data structure, you must use curly braces to make it clear what you are taking a slice of.
my ( $jim, $mary, $alice )
    = @{ $sales->[12]{tuesday} }{qw/ jim mary alice /};
Yes, the syntax is painful and ugly. Taking slices from references is something that often confuses newer programmers. You may wish to avoid this feature.
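If the one-line form is hard to read, one option is to name the inner reference first and then slice it. The sketch below uses a small made-up $sales structure (not the one from earlier in the chapter) and shows both forms giving the same result.

```perl
use strict;
use warnings;

# Made-up data for this sketch.
my $sales = {
    tuesday => { jim => 3, mary => 5, alice => 4 },
};

# Step 1: name the inner hash reference.
my $tuesday = $sales->{tuesday};

# Step 2: take the slice from the simple reference.
my ( $jim, $mary, $alice ) = @$tuesday{qw/ jim mary alice /};

# The one-step form needs curly braces around the expression.
my ( $jim2, $mary2, $alice2 )
    = @{ $sales->{tuesday} }{qw/ jim mary alice /};

print "$jim $mary $alice\n";       # 3 5 4
print "$jim2 $mary2 $alice2\n";    # 3 5 4
```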
Summary
References are Perl’s answer to pointers. Instead of containing data, they tell Perl where the data is contained. The syntax is a bit different from using a normal variable, but it’s clear what’s going on once you get used to it. They’re also the key to building up complex data structures. If you want to know far more than you ever wanted to know about references, you can read the following docs included with Perl:
References:
perldoc perlref
Reference tutorial:
perldoc perlreftut
Data structures cookbook:
perldoc perldsc
Lists of lists:
perldoc perllol
Exercises
1. Create an array called @first and assign several values to it. Take a reference to that array and then dereference it into an array named @second. Print both arrays to ensure that you’ve copied it correctly.
2. Write the code to find the number of sales Jim made on Friday and the total of the sales he made on Friday. Assume each number is the total for an individual sale.
my $sales = {
    monday    => { jim => [ 2 ],       mary => [ 1, 3, 7 ] },
    tuesday   => { jim => [ 3, 8 ],    mary => [ 5, 5 ] },
    wednesday => { jim => [ 7, 0 ],    mary => [ 3 ] },
    thursday  => { jim => [ 4 ],       mary => [ 5, 7, 2, 5, 2 ] },
    friday    => { jim => [ 1, 1, 5 ], mary => [ 2 ] },
};
3. You want to print out the score for Jim and Mary, but the following code is wrong. What’s wrong with it? Show two ways of fixing it.
my $score_for = {
    jim   => 89,
    mary  => 73,
    alice => 100,
    bob   => 83,
};
my ( $jim, $mary ) = %$score_for{ qw{jim mary} };
print "$jim $mary";
WHAT YOU LEARNED IN THIS CHAPTER
| Topic | Key Concepts |
|---|---|
| Basic References | A shared data structure. Perl’s answer to pointers. |
| Anonymous References | The building blocks of complex data structures |
| Data::Dumper | A powerful debugging tool to examine variables |
| Copying | How to safely copy a reference |
| Slices | How to retrieve a subset of items from a reference |
Answers to exercises
Note that many exercises have multiple possible answers. We’ll show one way of arriving at a valid answer. Don’t worry too much if you’ve picked a different way, but make sure you understand why our answers work.
1. Create an array called @first and assign several values to it. Take a reference to that array and then dereference it into an array named @second. Print both arrays to ensure that you’ve copied it correctly.
use strict;
use warnings;
use Data::Dumper;
my @first  = 1 .. 5;
my $aref   = \@first;
my @second = @$aref;
print Dumper( \@first, \@second );
Note that in the example above, the .. operator binds more tightly than the = operator. This is one of the few cases where you can create a list without using parentheses.
2. Write the code to find the number of sales Jim made on Friday and the total of the sales he made on Friday. Assume each number is the total for an individual sale.
my $sales = {
    monday    => { jim => [ 2 ],       mary => [ 1, 3, 7 ] },
    tuesday   => { jim => [ 3, 8 ],    mary => [ 5, 5 ] },
    wednesday => { jim => [ 7, 0 ],    mary => [ 3 ] },
    thursday  => { jim => [ 4 ],       mary => [ 5, 7, 2, 5, 2 ] },
    friday    => { jim => [ 1, 1, 5 ], mary => [ 2 ] },
};
my $friday    = $sales->{friday}{jim};
my $num_sales = @$friday;
my $total     = 0;
$total += $_ foreach @$friday;
print "Jim had $num_sales sales, for a total of $total dollars\n";
The above code is fairly typical for Perl. Don’t worry if you wrote it a different way so long as you arrived at the same numbers. However, pay particular attention to how we calculated the sum of the sales. That technique is fairly common in Perl.
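If you would rather not write the loop yourself, one alternative is the sum function from the List::Util module that ships with Perl. The following sketch assumes that module is acceptable in your code and compares both approaches on Jim’s Friday sales.

```perl
use strict;
use warnings;
use List::Util 'sum';

my $friday = [ 1, 1, 5 ];

# The statement-modifier form used in the answer.
my $total = 0;
$total += $_ foreach @$friday;

# The List::Util alternative.
my $also_total = sum(@$friday);

print "$total $also_total\n";    # 7 7
```

Be aware that sum returns undef, not 0, when given an empty list.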
3. You want to print out the score for Jim and Mary, but the following code is wrong. What’s wrong with it? Show two ways of fixing it.
my $score_for = {
    jim   => 89,
    mary  => 73,
    alice => 100,
    bob   => 83,
};
my ( $jim, $mary ) = @$score_for{ qw{jim mary} };
print "$jim $mary\n";
$jim  = $score_for->{jim};
$mary = $score_for->{mary};
print "$jim $mary\n";
This one may have been tricky. You may have noticed that the qw operator used curly braces, which are the same braces used with hash keys. However, this is not a bug. Perl is smart enough to know what you mean here.
The real problem was writing %$score_for{ ... }. Remember, when writing a slice, you use the @ symbol to show that you’re trying to get a list of values. That’s one way of fixing the issue. (On Perl 5.20 and later, %$score_for{ ... } is a key/value slice that returns the keys as well as the values, so the code runs but puts the wrong data in $jim and $mary.)
The other way is to forget about using a slice and assign the variables individually. Many programmers find this solution cleaner.




