Corpora: kwic concordances with Perl

Christer Geisler (christer.geisler@engelska.uu.se)
Thu, 07 Oct 1999 14:20:21 +0000

The Perl script below (adapted from Dan Malamed's 2kwic.pl) produces
KWIC concordances for a matched string, but it
a) does not detect multiple occurrences on a line, and
b) cannot find complex patterns that span several lines.

Can anyone suggest other ways of writing simple KWIC programs in Perl?
Should I split the text into an array, use Perl's format, etc.?

All suggestions are welcome.

Christer Geisler, post-graduate fellow

#----------------------------------------------------
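#!/usr/bin/perl
#
# check for correct usage: three arguments are required
if ($#ARGV < 2) {
    print "usage: kwic <filename1> <string1> <max. total output width>\n";
    exit;
}
open(F, '<', $ARGV[0]) || die "Couldn't open $ARGV[0]: $!\n";
shift;
$str1 = shift(@ARGV);
$maxwid = shift;
$len1 = length($str1);
# width of the context shown on each side of the keyword
$halfcon1 = int(($maxwid - $len1) / 2);
$pad1 = " " x $halfcon1;
while (<F>) {
    $F = $_;
    # index() reports only the first occurrence on the line
    $p1pos = index($F, $str1);
    if ($p1pos >= 0) {
        chomp($F);    # strip the newline only (chop would eat any last character)
        # the left padding moves the match to $p1pos + $halfcon1, so starting
        # at $p1pos leaves $halfcon1 characters of context in front of it
        $out = substr($pad1 . $F . $pad1, $p1pos, $maxwid);
        print "$.\n";
        print "$out\n";
    }
}
close(F);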
#-----------------------------------------------------
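
One possible way around limitations a) and b) is sketched below. It is only a sketch under stated assumptions, not a tested replacement for the script above. It assumes the whole text fits in memory as one string, that line breaks can be treated as ordinary spaces, and that the search string is given as a Perl regular expression. The names kwic_lines, $flat and $sample are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# kwic_lines($text, $pattern, $width): hypothetical helper that returns
# one formatted concordance line per match of $pattern in $text.
sub kwic_lines {
    my ($text, $pattern, $width) = @_;

    # Assumption: line breaks carry no meaning, so all whitespace runs
    # (including newlines) are collapsed to a single space. A pattern
    # such as 'sat on' can then match across an original line break.
    (my $flat = $text) =~ s/\s+/ /g;

    my @out;
    # With /g in scalar context each iteration resumes after the previous
    # match, so every occurrence is reported, not just the first.
    while ($flat =~ /$pattern/g) {
        my ($start, $end) = ($-[0], $+[0]);    # offsets of the whole match
        my $key  = substr($flat, $start, $end - $start);
        my $half = int(($width - length $key) / 2);
        $half = 0 if $half < 0;

        # Left context: up to $half characters before the match.
        my $lstart = $start - $half;
        $lstart = 0 if $lstart < 0;
        my $left  = substr($flat, $lstart, $start - $lstart);
        # Right context: up to $half characters after the match.
        my $right = substr($flat, $end, $half);

        # Right-align the left context and left-align the right context
        # so that the keywords line up in a column.
        push @out, sprintf("%*s%s%-*s", $half, $left, $key, $half, $right);
    }
    return @out;
}

# Small demonstration on an inline sample instead of a file.
my $sample = "the cat sat\non the mat; the cat ran";
print "$_\n" for kwic_lines($sample, 'cat', 11);
```

To run this on a file, the text could be read in one go (for example by setting $/ to undef before reading) and passed to kwic_lines. The line numbers printed by the original script are lost once the text is flattened; keeping them would require recording the offset of each line break first.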

----------------------------------------------------
Christer Geisler tel. +46 (0)18-471 1268
Department of English fax. +46 (0)18 471 1229
Uppsala University
Box 527, SE-75120 Uppsala, Sweden
----------------------------------------------------