Re: Corpora: code for random selection of concordance lines

From: Alexander Clark (asc@aclark.demon.co.uk)
Date: Fri Mar 22 2002 - 09:43:53 MET


    Rosie Jones wrote:
    >
    > On Thu, 21 Mar 2002, Tony Berber Sardinha wrote:
    > > I wonder if anyone has a bit of perl or java code (or unix utilities)
    > > for drawing an x number of lines at random from a concordance?
    > [...]
    >
    >

    An alternative is to use the Fisher-Yates algorithm to shuffle the whole
    file (linear in the number of lines) and then take the head. This is
    more efficient in time, provided the whole file fits in memory.

    shuffle.pl < file | head -n N

    (where N is the number of lines wanted)

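    #!/usr/bin/perl -w
    # Shuffle the lines of the input at random
    # using the Fisher-Yates algorithm.

    use strict;

    my @lines = <>;    # read all input lines into memory

    # Walk from the last index down to 1, swapping each line with a
    # randomly chosen line at or below it. Starting from $#lines means
    # empty input skips the loop instead of running past index 0.
    for (my $i = $#lines; $i > 0; --$i) {
        my $j = int rand($i + 1);    # 0 <= $j <= $i
        ($lines[$i], $lines[$j]) = ($lines[$j], $lines[$i]);
    }
    print @lines;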
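
If only N lines are wanted, the shuffle can also stop after N swaps instead of permuting the whole file. The following is a minimal sketch of that variant, not part of the script above: the file name sample.pl and the sub name sample_lines are made up for illustration, and it still assumes the whole input fits in memory.

```perl
#!/usr/bin/perl
# sample.pl (hypothetical name): print N lines chosen at random from the input.
use strict;
use warnings;

# Return $n elements drawn at random, without replacement, from @items.
# This is a partial Fisher-Yates shuffle: only the first $n positions are
# filled, so the loop does $n swaps rather than one per input line.
sub sample_lines {
    my ($n, @items) = @_;
    $n = @items if $n > @items;    # cannot return more lines than we have
    for my $i (0 .. $n - 1) {
        my $j = $i + int(rand(@items - $i));    # $i <= $j <= $#items
        @items[$i, $j] = @items[$j, $i];
    }
    return @items[0 .. $n - 1];
}

# Usage: sample.pl N [file ...]   (reads standard input if no file is given)
if (@ARGV) {
    my $n = shift @ARGV;
    print sample_lines($n, <>);
}
```

Called as "sample.pl 50 < file", this should give the same kind of result as piping the full shuffle through head, with fewer swaps when N is small relative to the file.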

    -- 
    Alexander Clark
    asc@aclark.demon.co.uk 
    http://www.issco.unige.ch/staff/clark/index.html
    ISSCO/ETI, University of Geneva
    


