Sunday, December 19, 2010

Learn Perl in 10 lessons Lesson 4

String comparisons
The =~ operator

Perl provides an operator which you'll find very useful for parsing and searching files: "=~". If you are not familiar with this operator, think of it as a "contains" operator. For instance:

"hello world" =~ "world" returns true, as "hello world" contains "world".
"hello world" =~ "o worl" also returns true since "o worl" is included in the string "hello world".
"hello world" =~ "wrld" returns false because the string "hello world" does not contain "wrld".

Using the =~ operator you can easily test whether a variable contains a particular string, and this will help you a lot while parsing text files. You can also use regular expressions in conjunction with the =~ operator. Although it is too early at this stage to study regular expressions in detail, here are some techniques that you can use with =~.

We replace the double quotes with forward slashes to tell the =~ operator that we're no longer simply looking for a string but for a matching pattern (with a bit of logic inside it):

"hello world" =~ "world" is the same as "hello world" =~ /world/

Although "world" represents a string and /world/ represents an expression, these two instructions both return true. By adding logic to the expression, we can refine the meaning of our =~ operator.
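
As a quick check of the examples in this section, here is a minimal sketch that runs the same comparisons and prints the results. The variable names ($text, $as_string and so on) are chosen for illustration only.

```perl
use strict;
use warnings;

my $text = "hello world";

# A quoted string on the right of =~ is used as a pattern,
# so these two tests behave the same way.
my $as_string  = ($text =~ "world") ? "true" : "false";
my $as_pattern = ($text =~ /world/) ? "true" : "false";

# "o worl" is part of the text, "wrld" is not.
my $has_o_worl = ($text =~ /o worl/) ? "true" : "false";
my $has_wrld   = ($text =~ /wrld/)   ? "true" : "false";

print "string form:  $as_string\n";
print "pattern form: $as_pattern\n";
print "o worl:       $has_o_worl\n";
print "wrld:         $has_wrld\n";
```
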
=~ /^Starts with/

A leading ^ sign changes the meaning of the operator from "contains" to "starts with":

"hello world" =~ /world/ returns true because "hello world" contains "world".
"hello world" =~ /^world/ returns false since "hello world" doesn't start with "world".
"hello world" =~ /^hell/ returns true because "hello world" starts with "hell".
=~ /Ends with$/

By adding a $ sign at the end of the expression you can change the meaning of the operator from "contains" to "ends with":

"hello world" =~ /world/ returns true because "hello world" contains "world".
"hello world" =~ /world$/ also returns true, but this time it's because "hello world" ends with "world".
"hello world" =~ /hello$/ returns false, because "hello world" doesn't end with "hello".
The eq and ne operators

You can use both the ^ and $ signs in the same expression. This means you're looking for a string that your variable both starts and ends with. For instance:

"hello world" =~ /^hello world$/ returns true because "hello world" starts and ends with "hello world".
"hello world" =~ /^hello$/ returns false, because although "hello world" starts with "hello" it doesn't end with it.

Note that there is not much point in using both ^ and $ in the same expression. If your string starts and ends with something, it is likely to be equal to that something. If you want to test the equality of two strings, you can simply use the eq operator:

"hello world" eq "hello world" returns true because the two strings are identical.

The ne operator tests the non-equality of two strings. It returns true if the strings are different and false otherwise:

"hello world" ne "good night" returns true.

"hello world" ne "Hello worlD" returns true (remember that Perl is case-sensitive).

"hello world" ne "hello world" returns false because both strings are the same.

Remember to use the eq and ne operators to test the equality of strings in Perl, and their numeric equivalents == and != to test numerical values.
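
The anchors and the comparison operators described above can be tried side by side. The following is a minimal sketch; the variable names are illustrative, and the last two tests (comparing "1.0" with "1") are an added example of why strings and numbers need different operators.

```perl
use strict;
use warnings;

my $text = "hello world";

# ^ anchors the pattern at the start of the string.
my $starts_hell  = ($text =~ /^hell/)  ? "true" : "false";
my $starts_world = ($text =~ /^world/) ? "true" : "false";

# $ anchors the pattern at the end of the string.
my $ends_world = ($text =~ /world$/) ? "true" : "false";
my $ends_hello = ($text =~ /hello$/) ? "true" : "false";

# Both anchors together, compared with a plain eq test.
my $whole_match = ($text =~ /^hello world$/) ? "true" : "false";
my $eq_same     = ($text eq "hello world")   ? "true" : "false";

# ne is case-sensitive, like eq.
my $ne_case = ($text ne "Hello worlD") ? "true" : "false";

# Numeric and string comparison can disagree on the same operands.
my $num_equal = ("1.0" == "1") ? "true" : "false";
my $str_equal = ("1.0" eq "1") ? "true" : "false";

print "starts with hell:   $starts_hell\n";
print "starts with world:  $starts_world\n";
print "ends with world:    $ends_world\n";
print "ends with hello:    $ends_hello\n";
print "^hello world\$:      $whole_match\n";
print "eq hello world:     $eq_same\n";
print "ne Hello worlD:     $ne_case\n";
print "\"1.0\" == \"1\":       $num_equal\n";
print "\"1.0\" eq \"1\":       $str_equal\n";
```
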
The !~ operator

The !~ operator is used as a "does not contain" operator. !~ is to =~ what != is to == and ne is to eq. For instance:

"hello world" !~ "world" returns false because "hello world" does contain "world".

"hello world" !~ "wwt" returns true because "hello world" does not contain "wwt".
Case insensitive search

When you use the =~ operator to test whether one string matches within another, the test is case-sensitive by default. For instance:

"hello world" =~ "world" returns true.

"hello world" =~ "woRld" returns false.

If you want to make the =~ operator case-insensitive, add an "i" after the expression:

"hello world" =~ /world/i returns true.

"hello world" =~ /woRld/i also returns true.

The =~ operator can also be used to find occurrences of a string within a variable and substitute them with another string. For instance, if you have a variable which contains text, and you want to change all occurrences of "aaa" to "aab" within that text, you can simply use the following substitution:

$variable =~ s/aaa/aab/g;

All occurrences of "aaa" within $variable will then be changed to "aab". Note that we prefixed our expression with an "s" to change the meaning of the operator from "contains" to "substitute", and added a "g" after it so that every occurrence is replaced rather than only the first one.
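
The !~ operator, the i modifier and substitution can be checked with a short sketch like the one below. The sample strings and variable names are illustrative. Note that without the g modifier a substitution replaces only the first occurrence it finds.

```perl
use strict;
use warnings;

my $text = "hello world";

# !~ is true when the pattern does NOT match.
my $lacks_world = ($text !~ /world/) ? "true" : "false";
my $lacks_wwt   = ($text !~ /wwt/)   ? "true" : "false";

# Matching is case-sensitive unless the i modifier is added.
my $exact_case  = ($text =~ /woRld/)  ? "true" : "false";
my $ignore_case = ($text =~ /woRld/i) ? "true" : "false";

# Without g, only the first occurrence is replaced.
my $first_only = "aaa bbb aaa";
$first_only =~ s/aaa/aab/;

# With g, every occurrence is replaced; s///g returns how many were changed.
my $every_one = "aaa bbb aaa";
my $count = ($every_one =~ s/aaa/aab/g);

print "!~ world:    $lacks_world\n";
print "!~ wwt:      $lacks_wwt\n";
print "=~ /woRld/:  $exact_case\n";
print "=~ /woRld/i: $ignore_case\n";
print "first only:  $first_only\n";
print "with g:      $every_one ($count replacements)\n";
```
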
Parsing files

There are many ways to parse a text file. In Perl, if the file has its data organized line by line with delimiters, it is very easy to parse.

Let's study a simple example. We have a set of employees in a file called employees.txt. In this file, each line represents an employee. The information relative to each employee is delimited with tabs: the first column is the name of the employee, the second column is the department and the third is the salary. Here is an overview of the file:

Mr John Doe          R&D    21000
Miss Gloria Dunne    HR     23000
Mr Jack Stevens      HR     45000
Mrs Julie Fay        R&D    30000
Mr Patrick Reed      R&D    33000

In order to obtain some statistics, the HR department wants to establish a list of all male employees who work in the R&D department and whose salary is more than 25000.

To obtain this list, we design a simple Perl script, which:


opens the employees.txt file

loops through each line

identifies the name, department and salary of the employee

ignores and goes to the next line if the employee is female (the name does not start with Mr)

ignores and goes to the next line if the salary is less than or equal to 25000.

ignores and goes to the next line if the department is not "R&D".

prints the name and the salary of the employee on the screen.

To do this, we'll introduce two Perl functions:


"chomp" is used to remove the newline found at the end of the line. For instance chomp $variable removes the trailing newline from the variable, if there is one.

"split" is used to cut the line into parts wherever it finds a delimiter. For instance split /o/, "hello world" returns an array containing "hell", " w" and "rld". In our example we'll split the lines on the tab delimiter, which in Perl is written "\t".
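
Here is a minimal sketch of the two functions at work. The sample line is made up to look like one line of employees.txt, and the variable names are illustrative.

```perl
use strict;
use warnings;

# A sample line as it would be read from the file:
# tab-delimited and ending with a newline.
my $line = "Mr John Doe\tR&D\t21000\n";

# Remove the trailing newline so it does not end up in the last field.
chomp $line;

# Cut the line at each tab and store the three parts.
my ($name, $department, $salary) = split /\t/, $line;
print "name: $name, department: $department, salary: $salary\n";

# The example from the text: splitting on the letter "o".
my @parts = split /o/, "hello world";
print join("|", @parts), "\n";
```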

Here is the script which establishes the list of male employees from the R&D department with a salary greater than 25000. To make things a bit clearer, comments were introduced within the script (comments in Perl start with a # sign):

#open the employees file
open (EMPLOYEES, "<", "employees.txt") or die "Cannot open employees.txt: $!";

#for each line
while ($line = <EMPLOYEES>) {

    #remove the newline
    chomp $line;

    #split the line between tabs
    #and get the different elements
    ($name, $department, $salary) = split /\t/, $line;

    #go to the next line unless the name starts with "Mr "
    next unless $name =~ /^Mr /;

    #go to the next line unless the salary is more than 25000.
    next unless $salary > 25000;

    #go to the next line unless the department is R&D.
    next unless $department eq "R&D";

    #since all employees here are male,
    #remove the title in front of their name
    $name =~ s/^Mr //;

    #print the name and the salary of the employee
    print "$name $salary\n";
}

close (EMPLOYEES);
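
To try the same logic without creating employees.txt first, here is a self-contained sketch that reads the five sample lines from an in-memory string instead of a file. This is a variation for experimenting, not the script from the lesson: it uses "use strict", a lexical filehandle ($fh), and the names @rows, $data and @selected, all of which are assumptions made for this example.

```perl
use strict;
use warnings;

# Sample data standing in for employees.txt.
my @rows = (
    ["Mr John Doe",       "R&D", 21000],
    ["Miss Gloria Dunne", "HR",  23000],
    ["Mr Jack Stevens",   "HR",  45000],
    ["Mrs Julie Fay",     "R&D", 30000],
    ["Mr Patrick Reed",   "R&D", 33000],
);

# Build tab-delimited lines, one employee per line.
my $data = join "", map { join("\t", @$_) . "\n" } @rows;

# Open the string as if it were a file.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @selected;
while (my $line = <$fh>) {
    chomp $line;
    my ($name, $department, $salary) = split /\t/, $line;

    next unless $name =~ /^Mr /;        # male employees only
    next unless $department eq "R&D";   # R&D department only
    next unless $salary > 25000;        # salary above 25000 only

    $name =~ s/^Mr //;                  # drop the title
    push @selected, "$name $salary";
    print "$name $salary\n";
}
close $fh;
```

With the sample data above, only Patrick Reed passes all three tests: John Doe earns too little, Jack Stevens works in HR, and "Mrs Julie Fay" does not match /^Mr / because "Mr" is followed by "s" rather than a space.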

Study the script carefully and make sure you understand every part of it. Each instruction was either explained in this lesson or in one of the previous ones. If you have any questions, do not hesitate to ask.

In the next lesson we'll look at how to interact with the filesystem and the Linux operating system from our Perl scripts.
