Re: Corpora: code for random selection of concordance lines

From: Bruce L. Lambert, Ph.D. (lambertb@uic.edu)
Date: Thu Mar 21 2002 - 21:23:35 MET


    At 04:05 PM 3/21/2002 -0300, Tony Berber Sardinha wrote:
    >Dear list members
    >
    >I wonder if anyone has a bit of perl or java code (or unix utilities) for
    >drawing an x number of lines at random from a concordance?

    #!/bin/sh

    IFILE="$1"
    N="$2"

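    # Prefix each line with a random key, sort on the key, strip the key
    # again (leaving the rest of the line untouched), and keep N lines.
    gawk 'BEGIN {srand()} {print rand(), $0}' "$IFILE" | sort |
        gawk '{sub(/^[^ ]* /, ""); print}' | head -n "$N"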

    On a Unix system that has gawk: Copy this into a file called 'randomize'.
    At the prompt (~>) type:

    ~> chmod +x randomize

    then

    ~> ./randomize some_input_file N > some_output_file

    N is the number of lines desired in the output. If your system does not
    have gawk, you can download and install it, or try awk instead (you'll
    need to change gawk to awk in both places in the script).
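
If the concordance is very large, sorting the whole file just to keep N lines may be more work than needed. Here is a rough sketch of a different technique, reservoir sampling, which reads the file once and only ever holds N lines in memory. It is not the script above, just an alternative under the same assumptions (one concordance line per input line, plain awk available). The function name sample_lines is made up for illustration.

```shell
#!/bin/sh
# sample_lines FILE N: print N lines chosen uniformly at random from FILE.
# Reservoir sampling in one pass; "sample_lines" is a hypothetical name.
sample_lines() {
    awk -v n="$2" '
        BEGIN { srand(); n += 0 }              # seed from the clock; force n numeric
        NR <= n { pool[NR] = $0; next }        # the first n lines fill the reservoir
        {
            i = int(rand() * NR) + 1           # uniform integer in 1..NR
            if (i <= n) pool[i] = $0           # keep this line with probability n/NR
        }
        END {
            m = (NR < n) ? NR : n              # the file may hold fewer than n lines
            for (j = 1; j <= m; j++) print pool[j]
        }
    ' "$1"
}
```

Called as `sample_lines some_input_file 50 > some_output_file`, this should give 50 distinct lines of the input. The output order is neither the original order nor a full shuffle, and because srand() seeds from the time of day, two runs within the same second will pick the same lines.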

    -bruce


