Re: an expert question!


Posted by webmaster Guido on November 23, 2001 at 04:09:46:

In Reply to: an expert question! posted by CrAz 2001 on November 21, 2001 at 20:25:03:

: I need to program an application which reads from one or more external documents, named "dok1.txt", "dok2.txt", etc. The program shall print out the words which are found in "dok1.txt", "dok2.txt", etc., but if it finds words (one or more) that occur in more than one of the documents it reads, it shall list each such word only once and tell the user in which document(s) it is found.

: Example:

: dok1.txt contains the text : hello world! Anybody.

: dok2.txt contains the text : Anybody can sing.

: dok3.txt contains the text : I can if you can.

: The program shall then print out something like this:

: hello : dok1
: world : dok1
: anybody : dok1,dok2
: can : dok2,dok3
: sing: dok2
: i : dok3
: if : dok3
: you : dok3

: Is there anybody out there who can reply with the complete source code showing how this could be solved?


A possible solution, although not the "complete" code (I'd be doing your job in that case ;-), is as follows:

1. Store the names and locations of the documents in table 1, together with a unique ID-code for each document. You store 1 record per document.

2. Extract each word out of the first document (converted to lowercase and stripped of punctuation such as "!" and ".", as in the example output) and save it in table 2, together with the ID of doc 1, but ONLY if the word/ID combination is not yet in table 2.
So you have 1 record per unique word of doc 1.

3. Repeat step 2 for every document.
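Steps 1 to 3 can be sketched in a few lines. This is a minimal Python sketch, not Delphi code: it assumes a "word" is a run of the letters a-z (after lowercasing), keeps both tables in memory, and takes each document's text as a string instead of reading the file (in a real program you would read each file first, e.g. with `open(name).read()`). The names `build_tables` and `extract_words` are hypothetical.

```python
import re

# Assumption: a "word" is a run of ASCII letters; everything else is a separator
WORD_RE = re.compile(r"[a-z]+")

def extract_words(text):
    """Lowercase the text and return its words, with punctuation dropped."""
    return WORD_RE.findall(text.lower())

def build_tables(documents):
    """documents: list of (name, text) pairs, in the order they are read.

    Returns table1 as {doc_id: name} and table2 as a sorted list of
    (word, doc_id) pairs, each combination stored only once.
    """
    table1 = {}
    table2 = set()  # a set ignores a word/ID pair that is already present
    for number, (name, text) in enumerate(documents, start=1):
        doc_id = f"d{number:02d}"       # unique ID per document: d01, d02, ...
        table1[doc_id] = name            # step 1: one record per document
        for word in extract_words(text):
            table2.add((word, doc_id))   # step 2, repeated per document (step 3)
    return table1, sorted(table2)        # ordered on word, then document ID

if __name__ == "__main__":
    docs = [
        ("dok1.txt", "hello world! Anybody."),
        ("dok2.txt", "Anybody can sing."),
        ("dok3.txt", "I can if you can."),
    ]
    table1, table2 = build_tables(docs)
    for word, doc_id in table2:
        print(word, doc_id)
```

The Python `set` plays the role of the "ONLY if not yet in table 2" check; sorting it at the end yields the records ordered on word, then document ID.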

For your example, this would give:

Table 1 records:
document1 location1 d01
document2 location1 d02
document3 location1 d03

Table 2 records:
anybody d01
anybody d02
can d02
can d03
hello d01
i d03
if d03
sing d02
world d01
you d03
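Because table 2 is sorted on the word first, all records for one word sit next to each other, so producing the output the question asks for takes a single pass. A minimal Python sketch, assuming table 2 is a sorted list of `(word, doc_id)` tuples and that a hypothetical dictionary maps table 1's IDs to the short names used in the output; `format_report` is an illustrative name.

```python
from itertools import groupby

def format_report(table2, id_to_label):
    """Turn sorted (word, doc_id) pairs into lines like 'anybody : dok1,dok2'.

    table2 must be sorted by word, so all records for one word are adjacent
    and groupby() can collect them in one pass.
    """
    lines = []
    for word, records in groupby(table2, key=lambda record: record[0]):
        labels = [id_to_label[doc_id] for _, doc_id in records]
        lines.append(f"{word} : {','.join(labels)}")
    return lines

if __name__ == "__main__":
    table2 = [
        ("anybody", "d01"), ("anybody", "d02"),
        ("can", "d02"), ("can", "d03"),
        ("hello", "d01"), ("i", "d03"), ("if", "d03"),
        ("sing", "d02"), ("world", "d01"), ("you", "d03"),
    ]
    # Hypothetical mapping from table 1's IDs to the names used in the output
    labels = {"d01": "dok1", "d02": "dok2", "d03": "dok3"}
    print("\n".join(format_report(table2, labels)))
```

This prints the words alphabetically rather than in order of first appearance, as in the question's example.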

Note that table 2 is sorted alphabetically on the combination of the two "fields", as this makes it faster to find out whether a word/doc-ID combination is already stored.
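The faster lookup that the sort order buys is a binary search. A minimal Python sketch using the standard `bisect` module, with a hypothetical `add_if_missing` helper and table 2 as a sorted list of `(word, doc_id)` tuples: it finds where the pair would go and inserts it there only if it is not already present. In Delphi, a TStringList with Sorted set to True and Duplicates set to dupIgnore behaves similarly for plain strings.

```python
from bisect import bisect_left

def add_if_missing(table, word, doc_id):
    """Insert (word, doc_id) into the sorted list `table` unless already there.

    bisect_left does a binary search, so each check needs O(log n)
    comparisons instead of a scan of the whole list.
    Returns True if a record was added, False if it was already stored.
    """
    record = (word, doc_id)
    pos = bisect_left(table, record)
    if pos < len(table) and table[pos] == record:
        return False               # combination already stored: skip it
    table.insert(pos, record)      # keeps the list sorted on (word, doc_id)
    return True

if __name__ == "__main__":
    table2 = []
    for word in ["i", "can", "if", "you", "can"]:   # words of dok3.txt
        add_if_missing(table2, word, "d03")
    print(table2)
```

`list.insert` still shifts the elements after the insertion point, which is acceptable at the few-thousand to ten-thousand-word scale discussed here.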

For millions of words, use a full-blown database such as dBase, Paradox, Access, Interbase...
For a few thousand to tens of thousands of words, even text-based tables will do fine, as today's PCs are fast and powerful enough to keep the sorted list of words in RAM.
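For the database route, the two tables map naturally onto SQL. The sketch below swaps in SQLite through Python's built-in `sqlite3` module purely for illustration (it is not one of the databases named above); the table and column names and the helper functions are hypothetical. A composite primary key on (word, doc_id) provides the sorted index, and `INSERT OR IGNORE` skips combinations that are already stored.

```python
import sqlite3

def open_index(path=":memory:"):
    """Create table 1 (documents) and table 2 (words); ':memory:' keeps the demo self-contained."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            doc_id   TEXT PRIMARY KEY,
            name     TEXT NOT NULL,
            location TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS words (
            word   TEXT NOT NULL,
            doc_id TEXT NOT NULL,
            PRIMARY KEY (word, doc_id)  -- indexed lookup; rejects duplicate pairs
        );
    """)
    return conn

def add_document(conn, doc_id, name, location, words):
    """Store one document record, then its words, skipping known word/ID pairs."""
    conn.execute("INSERT INTO documents VALUES (?, ?, ?)", (doc_id, name, location))
    # INSERT OR IGNORE drops rows that would violate the primary key,
    # i.e. word/ID combinations that are already in the table
    conn.executemany(
        "INSERT OR IGNORE INTO words (word, doc_id) VALUES (?, ?)",
        [(word, doc_id) for word in words],
    )
    conn.commit()

def report(conn):
    """Return {word: [doc_id, ...]} with the words in alphabetical order."""
    result = {}
    rows = conn.execute("SELECT word, doc_id FROM words ORDER BY word, doc_id")
    for word, doc_id in rows:
        result.setdefault(word, []).append(doc_id)
    return result

if __name__ == "__main__":
    conn = open_index()
    add_document(conn, "d01", "dok1.txt", "location1", ["hello", "world", "anybody"])
    add_document(conn, "d02", "dok2.txt", "location1", ["anybody", "can", "sing"])
    add_document(conn, "d03", "dok3.txt", "location1", ["i", "can", "if", "you", "can"])
    for word, ids in report(conn).items():
        print(word, ":", ",".join(ids))
```

The grouping is done in Python rather than with SQLite's `group_concat`, because `group_concat` does not guarantee the order of the concatenated IDs.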

In that case, why not do some tests with a listbox (with its Sorted property set to True) to keep the word list sorted? Above a couple of thousand words, though, it's better to keep them in a sorted TStringList and only show the results at the end of the operation.
