Re:Corpora: MS Word to text

mike_maxwell@sil.org
Fri, 03 Sep 1999 13:31:54 -0400

Several people responded to me asking me to post the Word macro for converting a
large number of MS Word files to text. The attached is a Word VBA (Visual
Basic, I guess) program that does that, assuming you have Word97. The version
of WordBasic included in earlier versions of Word was quite different, and this
program would not work there. So far as I know, Word2000's VBA is similar to
Word97's, so this should work in Word2000--but no guarantees! If the Word
documents were created in an earlier version of Word, I *think* they'll import
into Word97 OK. (There were bugs with import when Word97 first came out, but I
think those were fixed in a service release of Word97.)

The attached program reads in all the Word files in a given directory, so it
bypasses the DOS for-loop I thought would be required in my earlier msg.
However, the directory names and the file extension are coded in (see the
variables near the top of the file). You'll need to change at least the
directory names before running the program (the file extension will probably be
correct). Be sure *not* to use the same dir names for both FromDir and ToDir!
It tries to save you from this and other possible errors, but BACKUP YOUR FILES
FIRST!!!

To run, save the text below my signature to a file (preferably one with the .BAS
suffix, e.g. CONVERSIONS.BAS). Then launch Word, and choose the Tools | Macro |
Visual Basic Editor menu choice. From the Visual Basic editor, do File | Import
File, and open the BAS file you just saved. In the left-hand pane of the VB
editor, you should see a folder labeled "Modules." If it isn't already open,
double-click on it, and you should see some kind of icon (don't ask me what it
is) called "Conversions". Double-click on that to open the conversion macro for
editing. Assign appropriate values to the variables FromDir and ToDir (and
WordSuffix, if necessary), as per the previous paragraph, then save. Now go
back to Word, and click on the Tools | Macro | Macros menu choice. You should
see a macro named 'ConvertAllWordFilesToText'. Highlight that, then click on
the 'Run' button in the right-hand side of this dialog box to run it. Go home
for the night, and with any luck when you come back tomorrow, you'll find the
converted files in your ToDir. Count them to make sure they all got converted.

Oh, one other thing. Email may wrap some of the lines below. If that happens,
you'll see some lines that start all the way over at the left-hand margin
(besides the 'Attribute', 'Sub', and 'End Sub' lines). Either join those lines
back with the line before them in the VB editor, or put a space + a '_' char
(underscore) at the end of the previous line. (That marks the continuation. I
hate Basic...)
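
For example, if the SaveAs line arrived split in two, the second fix (adding
the space + underscore) would give something like this; the indentation is
only for readability:

    ActiveDocument.SaveAs FileName:=ToDir + AFileName, _
        FileFormat:=wdFormatText

One caveat: the underscore cannot go inside a quoted string. If the break falls
in the middle of a string, either join the two lines, or close the string and
continue with + and an underscore, roughly like this:

    MsgBox "No Word files were found in " + FromDir + _
        "; change directory name in program?"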

Mike Maxwell
Mike_Maxwell@sil.org
Summer Institute of Linguistics

--------------------BAS FILE------------------------------
Attribute VB_Name = "Conversions"

Sub ConvertAllWordFilesToText()
    'By Mike Maxwell (Mike_Maxwell@sil.org) 3 Sept 99
    'Use at your own risk! Values of the following variables may need
    'to be changed:
    'FromDir: the directory containing your Word files
    FromDir = "C:\Foo\"
    'ToDir: the directory where you want the text files to go
    ToDir = "C:\Bar\"
    'WordSuffix: suffix of filenames, as saved by Word (default "DOC")
    WordSuffix = "DOC"

    If Not (Right(FromDir, 1) = "\") Then
        'Ensure we use a directory, rather than a filename
        FromDir = FromDir + "\"
    End If
    If Not (Right(ToDir, 1) = "\") Then
        ToDir = ToDir + "\"
    End If

    If StrComp(LCase(FromDir), LCase(ToDir)) = 0 Then
        MsgBox "FromDir and ToDir must be different; please fix. Aborting."
        GoTo Done
    End If

    If Dir(FromDir, vbDirectory) = "" Then
        MsgBox "Directory " + FromDir + " does not exist. Aborting."
        GoTo Done
    End If
    'Dir returns a string, so test for "" to see whether ToDir exists
    If Dir(ToDir, vbDirectory) = "" Then
        MkDir ToDir
    End If

    'Everything seems to be in order, so run through the Word files:
    Set FoundFiles = Application.FileSearch
    With FoundFiles
        .LookIn = FromDir
        .FileName = "*." + WordSuffix
        If .Execute > 0 Then
            For FileNum = 1 To .FoundFiles.Count
                Documents.Open FileName:=.FoundFiles(FileNum)
                AFileName = ActiveDocument.Name
                ActiveDocument.SaveAs FileName:=ToDir + AFileName, _
                    FileFormat:=wdFormatText
                ActiveDocument.Close
            Next FileNum
        Else
            MsgBox "No Word files were found in the directory " + _
                FromDir + "; change directory name in program?"
        End If
    End With
Done:
End Sub
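
'--- Illustrative sketch, not part of the original macro ---
'The message suggests counting the converted files to make sure they
'all got converted. The two routines below are one possible way to
'do that. The names CountConvertedFiles and CountFiles are made up
'for this example, and the count for ToDir assumes that directory
'held no other files before the conversion was run.
Sub CountConvertedFiles()
    'Use the same values as in ConvertAllWordFilesToText,
    'including the trailing backslash on the directory names
    FromDir = "C:\Foo\"
    ToDir = "C:\Bar\"
    WordSuffix = "DOC"

    NumFrom = CountFiles(FromDir + "*." + WordSuffix)
    NumTo = CountFiles(ToDir + "*.*")
    MsgBox "Word files in FromDir: " + CStr(NumFrom) + _
        "; files in ToDir: " + CStr(NumTo)
End Sub

Function CountFiles(Pattern)
    'Counts the files matching Pattern (a path plus a wildcard).
    'The first call to Dir starts the search; Dir with no arguments
    'returns the next match, or "" when there are no more.
    N = 0
    AName = Dir(Pattern)
    Do While AName <> ""
        N = N + 1
        AName = Dir
    Loop
    CountFiles = N
End Function
'After importing, run CountConvertedFiles from Tools | Macro | Macros
'in the same way as the conversion macro; if the two numbers differ,
'some files were probably not converted.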