====== Length of words in a list ======
Task: given a list of words, count how many of them consist of one, two, three ... n characters. Additional information: the words contain Unicode characters, not just ASCII.
===== Everyday job: read in a file =====
First we make sure Perl treats the data as Unicode characters, then we read the file into an array (one line per element) and chomp the array (i.e. remove the trailing newline from each element):
#!/usr/bin/perl -w
# cntstr.pl -- count characters in Unicode string
# usage: perl cntstr.pl filename
use strict;
use warnings;
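use utf8;                 # the script source itself is UTF-8
binmode STDOUT, ":utf8";  # encode everything we print as UTF-8
my $filename = $ARGV[0];
die "usage: perl cntstr.pl filename\n" unless defined $filename;
# decode the file from UTF-8 while reading, so that length() counts characters:
open my $fh, "< :encoding(UTF-8)", $filename or die "open: $!";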
# file into array:
my @str = <$fh>;
# chomp array:
chomp (@str);
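Why the decoding matters can be seen in a minimal sketch. The sample word and the use of the core Encode module are assumptions made for illustration; they are not part of the script above. The same word gives two different lengths depending on whether Perl sees characters or raw UTF-8 bytes:

```perl
use strict;
use warnings;
use utf8;                  # the literal below is written in UTF-8
use Encode qw(encode);     # core module, used here only for the demonstration

my $word  = "riječ";                  # hypothetical sample word, decoded characters
my $bytes = encode("UTF-8", $word);   # the same word as raw UTF-8 bytes

print length($word),  "\n";   # 5: counted in characters
print length($bytes), "\n";   # 6: "č" takes two bytes in UTF-8
```

Reading the file through the :encoding(UTF-8) layer puts every line in the first form, so length counts characters.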
===== Magic: sort words by length =====
We sort the words by length to get the range (from shortest to longest); out of curiosity, we also print the shortest, the second-longest, and the longest word. The recipe was [[http://stackoverflow.com/questions/13372784/sorting-by-length-in-perl|found on Stack Overflow]].
# sort by length, from the shortest word to the longest:
my @sorted = sort { length $a <=> length $b } @str;
print "Shortest word: ", $sorted[0], ", ", length($sorted[0]), "\n";
# negative indices count from the end of the array:
print "Second-longest word: ", $sorted[-2], ", ", length($sorted[-2]), "\n";
print "Longest word: ", $sorted[-1], ", ", length($sorted[-1]), "\n";
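Words of equal length come out of this sort in no particular alphabetical order. A possible refinement, sketched here with a hypothetical sample list instead of the file contents, adds a second sort key so that ties are broken by plain string comparison:

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ":encoding(UTF-8)";

# hypothetical sample data, standing in for the words read from the file
my @words = ("kruška", "a", "šljiva", "no", "dud");

# first key: length; second key: string comparison for words of equal length
my @by_length = sort { length($a) <=> length($b) or $a cmp $b } @words;

print "Shortest: $by_length[0]\n";          # a
print "Second-longest: $by_length[-2]\n";   # kruška
print "Longest: $by_length[-1]\n";          # šljiva
```

Note that cmp compares code points, so this is not a proper alphabetical order for Croatian; it only makes the output predictable.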
===== Challenge: array of arrays =====
Once we know the range, we want to create a separate list for words with one character, a list for words with two characters, then three, four ... all the way to 27, the top of the range found above. Then we want to count the elements in each list.
In Perl, a list of lists is called an [[http://www.perlhowto.com/array_of_arrays|array of arrays]]. Quite a challenge: understanding how it works was not the hard part; following how it is actually done was.
# we want an array of arrays: 5s, 6s, 7s etc.
# initialize top array:
my @wordlengths = ();
# create top array, holding 27 lists:
foreach my $i ( 0 .. 26 ) {
    # loop over the indices of the sorted list:
    foreach my $singleword ( 0 .. scalar(@sorted) - 1 ) {
        # test whether the length of this word fits the current category:
        if ( length($sorted[$singleword]) == $i + 1 ) {
            # if so, push it into the current subarray;
            # mind the curly brackets!
            push @{ $wordlengths[$i] }, $sorted[$singleword];
        }
    }
}
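The nested loops walk the whole word list once for every length. Since the length of a word already says which subarray it belongs to, the same structure can probably be built in a single pass. This is a sketch with a hypothetical sample list and a hypothetical array name (@buckets), not the method used in the script:

```perl
use strict;
use warnings;
use utf8;

# hypothetical sample data; the empty string stands for a blank line in the file
my @sample = ("a", "no", "da", "", "dud", "riječ");

my @buckets;
for my $word (@sample) {
    next unless length $word;   # a blank line would otherwise give index -1
    # the subarray is created automatically on the first push (autovivification)
    push @{ $buckets[ length($word) - 1 ] }, $word;
}

for my $i ( 0 .. $#buckets ) {
    my @words = @{ $buckets[$i] || [] };   # lengths with no words stay undefined
    my $len   = $i + 1;
    print "$len characters: ", scalar(@words), " word(s)\n";
}
```

With this sample the slot for four-character words is never filled, which is why the printing loop falls back to an empty list.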
===== Let's see what we have =====
Finally, we have to print something to see where we are and what we've got.
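# $i rather than $b, because $a and $b are the special variables used by sort
foreach my $i ( 0 .. 26 ) {
    # a length with no words leaves its slot undefined, so fall back to an empty list:
    my @words = @{ $wordlengths[$i] || [] };
    print "Number of words with ", ($i + 1), " characters: ", scalar(@words), "\n";
    print "First word with ", ($i + 1), " characters: ", $words[0], "\n" if @words;
}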
Think I'll dream about foreach loops. And accessing the array of arrays. Have to do it several more times to get used to it.
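The 0 .. 26 range fits only a list whose longest word has 27 characters. If only the counts are needed, a hash keyed by length may be simpler, because it needs no fixed range at all. A minimal sketch, again with a hypothetical sample list and a hypothetical hash name (%count):

```perl
use strict;
use warnings;
use utf8;

# hypothetical sample data, standing in for the words read from the file
my @sample = ("a", "no", "da", "dud", "riječ");

# one counter per length; a key appears only when some word has that length
my %count;
$count{ length $_ }++ for @sample;

# numeric sort, so that 10 comes after 9 and not after 1
for my $len ( sort { $a <=> $b } keys %count ) {
    print "Number of words with $len characters: $count{$len}\n";
}
```

The trade-off is that the hash keeps only the numbers; printing the first word of each length still needs the array of arrays.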
====== The original script ======
Here's what I originally wrote, with Croatian variable names and messages. About 35 lines of code.
#!/usr/bin/perl -w
# cntstr.pl -- count characters in Unicode string
use strict;
use warnings;
use utf8;
binmode STDOUT, ":utf8";
my $filename = $ARGV[0];
open my $fh, "< :encoding(UTF-8)", $filename or die "open: $!";
my @str = <$fh>;
# chomp array:
chomp (@str);
# sort by length, from the shortest word to the longest:
my @sorted = sort { length $a <=> length $b } @str;
print "Najkraća riječ: ", $sorted[0], ", ", length($sorted[0]), "\n";
print "Predzadnja najduža riječ: ", $sorted[scalar(@sorted) - 2], ", ", length($sorted[scalar(@sorted) - 2]), "\n";
print "Najduža riječ: ", $sorted[scalar(@sorted) - 1], ", ", length($sorted[scalar(@sorted) - 1]), "\n";
# here we should have an array of arrays: 5s, 6s, 7s etc.
# initialize top array
my @brojevi = ();
foreach my $i ( 0 .. 26 ) {
    foreach my $duzina (0.. scalar(@sorted) - 1) {
        if (length($sorted[$duzina]) == $i + 1 ) {
            push @{ $brojevi[$i] }, $sorted[$duzina];
        }
    }
}
foreach my $b ( 0 .. 26 ) {
    print "Broj riječi od ", ($b + 1), " slova: ", scalar(@{$brojevi[$b]}), "\n";
    print "Prva riječ s ", ($b + 1), " slova: ", $brojevi[$b][0], "\n";
}