Re: an expert question!
Posted by webmaster Guido
on November 23, 2001 at 04:09:46:
In Reply to: an expert question! posted by CrAz 2001 on November 21, 2001 at 20:25:03:
: I need to program an application which reads from one or more external documents, named "dok1.txt", "dok2.txt", etc. The program shall print out the words which are found in "dok1.txt", "dok2.txt", etc. If it finds a word (or several words) occurring in more than one of the documents it reads, it shall list that word only once, but tell the user in which document(s) it is to be found.
: dok1.txt contains the text : hello world! Anybody.
: dok2.txt contains the text : Anybody can sing.
: dok3.txt contains the text : I can if you can.
: The program shall then print out something like this:
: hello : dok1
: world : dok1
: anybody : dok1,dok2
: can : dok2,dok3
: sing: dok2
: i : dok3
: if : dok3
: you : dok3
: Is there anybody out there who can reply with the complete source code showing how this could be solved?
A possible solution, although not the "complete" code (I'd be doing your job in that case ;-) is as follows:
1. Store the names and locations of the documents in table 1, together with a unique ID-code for each document. You store 1 record per document.
2. Extract each word out of the first document and save it in table 2, together with the ID of doc 1, but ONLY if the word/ID combination is not yet in table 2.
So you have 1 record per unique word of doc 1.
3. Repeat step 2 for every document.
For your example, this would give:
Table 1 records:
document1 location1 d01
document2 location1 d02
document3 location1 d03
Table 2 records:
anybody d01
anybody d02
can d02
can d03
hello d01
i d03
if d03
sing d02
world d01
you d03
Note that table 2 is sorted alphabetically on the combination of the 2 "fields", as this makes it faster to find out if a word / doc-ID is already stored.
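Steps 2 and 3 could be sketched in Delphi roughly as follows. This is only a minimal illustration, not the complete program: the routine AddWords, the "word docID" string layout, and the use of a sorted TStringList as table 2 are my own assumptions, and the three sample texts are typed in directly instead of being read from the files.

```delphi
program CollectWords;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: True for the plain letters A..Z and a..z.
function IsLetter(C: Char): Boolean;
begin
  Result := ((C >= 'A') and (C <= 'Z')) or ((C >= 'a') and (C <= 'z'));
end;

// Hypothetical helper: stores one 'word docID' entry in APairs for each
// unique word of AText. APairs must be sorted, so Find can search it quickly.
procedure AddWords(const AText, ADocID: string; APairs: TStringList);
var
  i, Index: Integer;
  CurWord, Pair: string;
begin
  CurWord := '';
  // Run one position past the end, so the last word is stored too.
  for i := 1 to Length(AText) + 1 do
  begin
    if (i <= Length(AText)) and IsLetter(AText[i]) then
      CurWord := CurWord + AText[i]
    else if CurWord <> '' then
    begin
      Pair := LowerCase(CurWord) + ' ' + ADocID;
      // Store the pair ONLY if this word/ID combination is not yet present.
      if not APairs.Find(Pair, Index) then
        APairs.Add(Pair);
      CurWord := '';
    end;
  end;
end;

var
  Pairs: TStringList;
  i: Integer;
begin
  Pairs := TStringList.Create;
  try
    Pairs.Sorted := True;  // keeps "table 2" in alphabetical order
    // Sample texts from the question, standing in for the file contents.
    AddWords('hello world! Anybody.', 'd01', Pairs);
    AddWords('Anybody can sing.', 'd02', Pairs);
    AddWords('I can if you can.', 'd03', Pairs);
    for i := 0 to Pairs.Count - 1 do
      WriteLn(Pairs[i]);
  finally
    Pairs.Free;
  end;
end.
```

For real documents, the text of each file could be loaded into a second TStringList with LoadFromFile and handed to AddWords through its Text property. Words containing digits or accented letters would need a wider test than the simple IsLetter shown here.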
For millions of words, use a full-blown database such as dBase, Paradox, Access, Interbase...
For a few thousand to tens of thousands of words, even text-based tables will do fine, as today's PCs are fast and powerful enough to keep the sorted list of words in RAM.
In that case, why not do some tests with a listbox to keep the word list sorted? Above a couple of thousand words though, it is better to keep them in a sorted "stringlist" and only show the results at the end of the operation.
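Showing the results at the end could look roughly like the sketch below. It assumes that the sorted stringlist holds one "word docID" string per entry (a layout I chose for illustration, the entries are simply typed in here), and it prints the document IDs rather than the file names; translating an ID back to a name would be a lookup in table 1.

```delphi
program PrintWordIndex;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  Pairs: TStringList;
  i, SpacePos: Integer;
  CurWord, DocID, LastWord, Line: string;
begin
  Pairs := TStringList.Create;
  try
    Pairs.Sorted := True;
    // Assumed contents of the sorted list: one 'word docID' string per entry.
    Pairs.Add('hello d01');
    Pairs.Add('world d01');
    Pairs.Add('anybody d01');
    Pairs.Add('anybody d02');
    Pairs.Add('can d02');
    Pairs.Add('sing d02');
    Pairs.Add('can d03');
    Pairs.Add('i d03');

    LastWord := '';
    Line := '';
    for i := 0 to Pairs.Count - 1 do
    begin
      // Split the entry into the word and the document ID.
      SpacePos := Pos(' ', Pairs[i]);
      CurWord := Copy(Pairs[i], 1, SpacePos - 1);
      DocID := Copy(Pairs[i], SpacePos + 1, MaxInt);
      if CurWord = LastWord then
        // Same word as the previous entry: append this document ID.
        Line := Line + ',' + DocID
      else
      begin
        // New word: print the finished line and start the next one.
        if Line <> '' then
          WriteLn(Line);
        Line := CurWord + ' : ' + DocID;
        LastWord := CurWord;
      end;
    end;
    if Line <> '' then
      WriteLn(Line);  // do not forget the last word
  finally
    Pairs.Free;
  end;
end.
```

Because the list is sorted, all entries for the same word sit next to each other, so a single pass from top to bottom is enough to group them.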