Gunter Gerdenitsch's ONLINE COMMUNICATION
Electronic Reading - Let The Computer Sort Out The Pearls!
February 19, 2003
"Reading from the Internet is like drinking water from a fire hose. If you don't know how to do it, in the end you are wet all over. But you are still thirsty!" - Electronic reading is a good way to master that problem.
This adage is true in particular for those people who have to do a lot of reading *professionally*. For ezine publishers, editors, resume distributors, etc., it is just part of their business to select among numerous texts, to categorize them and to treat them accordingly.
Notice the word "select" above. I didn't say "read" - for a reason: In order to select or categorize a text you are not bound to READ it. You don't even have to look at it. Let the computer "e-read" it! If the outcome indicates that it is worth reading yourself, you still can do that.
PREPARE A LIST OF KEYWORDS
All you have to do for e-reading is to prepare a list of keywords that are likely to appear in the texts that interest you. If you want to make your e-reading more selective, you can assign different weights to your keywords: keywords that might signify a highly relevant text are assigned a higher "weight". In e-reading the computer compares the text, word for word, with each of the keywords. If they match, it's a "hit". The weights of all hits are added up, and as the result of e-reading the total weight of each text is displayed.
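The word-for-word comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not eRead's actual implementation: the keyword list, the weights and the function name `total_weight` are all made up for the example, and splitting the text into words with a simple regular expression is an assumption.

```python
import re

# Hypothetical keyword list with illustrative weights (not taken from eRead).
KEYWORDS = {"keyword": 10, "reading": 5, "text": 2}

def total_weight(text, keywords=KEYWORDS):
    """Compare the text word for word with the keywords.

    Returns (number of hits, sum of the weights of all hits).
    """
    # Assumption: a "word" is a run of letters, digits, apostrophes or hyphens.
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    hits = [word for word in words if word in keywords]
    return len(hits), sum(keywords[word] for word in hits)

hits, weight = total_weight("Reading a text means reading every keyword.")
print("hits:", hits, "total weight:", weight)
```

A keyword that occurs twice counts as two hits here, so a long text collects weight simply by being long.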
However, the relevance of a text cannot be gauged by a linear scale. One kind of text might constantly be interesting to you. Another kind of text might be interesting only if the author addressed some points specifically, while scarcely mentioning the others. In the end it might well be that a text comes out at a fairly low "total weight", yet you would miss the boat by ignoring it.
That's aggravated by the fact that some authors seem to take it as a challenge to their creativity to come up with the most fanciful titles. To the reader such a title might be not very meaningful, or even misleading. What these authors are aiming at is a sort of surprise when the readers realize that there was some second meaning hidden in the title. In fine literature this is a good rhetorical trick. In technical texts, though, it is rather irritating. Not to mention those texts that cannot have any meaningful title at all, such as resumes.
CATEGORIZATION
It's always a good idea not to rely solely on the "total weight" resulting from an e-reading, but to have a look at the title (if there is one) and perhaps the first few lines. Also, the context in which the keywords were encountered might give you an idea of the contents.
Therefore, an e-reading tool like eRead displays not only the text statistics but also the KWIC list - that's "keyword in context". Each line in which one or more keywords were found is displayed, along with the sum of the weights of that line. For the user, a quick look at the KWIC list can tell quite a lot about what the text is about.
Still, that's not the last word. If you have to select among hundreds of texts per day, you will not be very happy with that solution.
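A KWIC list of this kind could be produced roughly as follows. Again this is a hedged sketch and not eRead's code: the keywords, weights and the function name `kwic_list` are hypothetical, and the output format (line number, line weight, line text) is an assumption.

```python
import re

# Hypothetical keyword list with illustrative weights.
KEYWORDS = {"keyword": 10, "reading": 5}

def kwic_list(text, keywords=KEYWORDS):
    """Return (line number, line weight, line) for every line with at least one hit."""
    result = []
    for number, line in enumerate(text.splitlines(), start=1):
        words = re.findall(r"[a-z0-9'-]+", line.lower())
        hits = [word for word in words if word in keywords]
        if hits:
            # The line weight is the sum of the weights of all hits in that line.
            result.append((number, sum(keywords[word] for word in hits), line.strip()))
    return result

sample = "Reading is fun.\nNothing here.\nA keyword helps reading."
for number, weight, line in kwic_list(sample):
    print(f"{number:4d} [{weight:3d}] {line}")
```

Lines without any hit are dropped, so the list stays short enough for a quick look.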
So, categorization was introduced. To keep it user-friendly, keywords are grouped according to the heading they are written under. In the text statistics resulting from e-reading, the category with the most "hits" is displayed along with its percentage of total hits. With some 10 categories, if a category has some 30 to 40 percent or more of total hits, a closer look at that text could be justified. But if the suggested category has less than some 30 percent of hits, the text might rather be wishy-washy. (With more categories the respective numbers are smaller. Then, however, it is doubtful whether that categorization is still meaningful.)
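As a rough sketch of how a suggested category and its percentage could be derived: the category names, keywords and the function name `suggest_category` below are invented for illustration, and the real keyword file format of eRead is not shown here.

```python
import re
from collections import Counter

# Hypothetical categories: each heading maps to the keywords written under it.
CATEGORIES = {
    "Reading, Writing": {"reading", "text", "keyword"},
    "Business": {"publisher", "editor", "resume"},
}

def suggest_category(text, categories=CATEGORIES):
    """Return (suggested category, its percentage of total hits, hits per category)."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    hits = Counter()
    for word in words:
        for name, keywords in categories.items():
            if word in keywords:
                hits[name] += 1
    total = sum(hits.values())
    if total == 0:
        return None, 0.0, hits  # no hits at all, so nothing to suggest
    name, count = hits.most_common(1)[0]
    return name, 100.0 * count / total, hits

name, share, hits = suggest_category("The editor is reading a text about keyword reading.")
print(name, f"[{share:.0f}%]", dict(hits))
```

With only two categories, as in this toy example, even a random text would often reach 50 percent, which is why the thresholds above are stated for some 10 categories.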
TEXT STATISTICS
As a general remark, remember that all the following rules are true only for a keyword list that is "perfectly" tailored to a type of text - a condition that, of course, is never fulfilled in practice. You might have a text you feel to be good for its subject. But after e-reading it, the text statistics are not so brilliant. Then you should think the other way round: "What could be done to strengthen the selectivity of my keyword list?"
But let's assume you know that your keyword list is good:
There are a number of other text statistics helpful to the user. 'Lines' is the total number of lines in a text. 'Hits' is the total number of matches between a text word and a keyword.
An obvious text statistic is hits/line. A text might be quite long, so its 'total weight' would be unexpectedly high. In such a case a savvy user would intuitively have a look at the 'hits/line'. If that number is far below 1, the seemingly high 'total weight' came about simply by amassing a hit now and then. For very short articles, this number can even make up for a seemingly low 'total weight'.
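The ratios are simple divisions. The sketch below assumes that 'weight/hit' means total weight divided by hits (the name suggests this, but it is an assumption), and the function name `text_statistics` is made up. The sample numbers are the ones this article reports for itself: total weight 1520, 137 lines, 211 hits.

```python
def text_statistics(total_weight, lines, hits):
    """Derive the ratio statistics from the three basic counts."""
    return {
        # Average number of hits per line; far below 1 means hits are sparse.
        "hits/line": round(hits / lines, 1) if lines else 0.0,
        # Assumed definition: average weight contributed by a single hit.
        "weight/hit": round(total_weight / hits, 1) if hits else 0.0,
    }

print(text_statistics(total_weight=1520, lines=137, hits=211))
```

With these inputs the ratios come out at 1.5 hits/line and 7.2 weight/hit, which agrees with the figures reported for this article.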
hits/category is another useful text statistic. For all categories in your keyword file, the number of hits is displayed. This seems unnecessary if you get a 'suggested category' with a high percentage, i.e. one that clearly stands out among the categories. But if you have a number of categories with fairly similar percentages, it is in the end a matter of statistical chance which one of those "prominent" categories was selected to become the 'suggested category'. Quite a different category might have had only one or two fewer hits.
So, a proficient user will also have a quick look at the line 'hits/category'. If the percentage of the 'suggested category' seems low, you should consider lumping together several of your categories, because their separation seems to be not very meaningful. If that percentage is strikingly high, it could mean two things:
- You have too few categories. You should try to arrive at a more meaningful categorization.
- The article itself does not have very many hits (see the "hits" number). So the seemingly high level came about simply statistically: one more hit in any category meant a great jump in its percentage. This in turn could mean either that the article is not very relevant to you or, the other way round, that you chose the wrong categorization. Your best bet will be to try it on some other texts of your collection. (Did you use the right keyword file? In a tool like 'eRead' you can have a number of keyword files right at hand, for different purposes. To select one of them, it's just one click in a combo box.)
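Whether the suggested category clearly stands out, or merely won by a hit or two, can be checked from the hits/category line alone. The helper below is a hypothetical sketch (the name `category_lead` and the idea of reporting the lead over the runner-up are not part of eRead). The first sample uses the hits/category figures this article reports for itself.

```python
def category_lead(hits_per_category):
    """Return (suggested category number, its percentage of hits, lead over the runner-up)."""
    total = sum(hits_per_category)
    if total == 0:
        return None, 0, 0
    # Categories are numbered from 1, as in the hits/category line.
    ranked = sorted(enumerate(hits_per_category, start=1),
                    key=lambda pair: pair[1], reverse=True)
    best_number, best_hits = ranked[0]
    runner_up_hits = ranked[1][1] if len(ranked) > 1 else 0
    return best_number, round(100 * best_hits / total), best_hits - runner_up_hits

# hits/category of this article: category 2 leads with 150 of 211 hits.
print(category_lead([14, 150, 1, 1, 0, 0, 0, 0, 6, 20, 5, 14, 0]))

# A near tie with few hits: the "winner" is little more than chance.
print(category_lead([5, 4, 4]))
```

A small lead combined with a small number of total hits is the case described in the second point above: one more hit elsewhere would have changed the suggested category.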
Keywords: electronic reading, e-reading, keywords, text statistics
Word count: 1143
You might be interested in how this article scores in eRead:
total weight: 1520, lines: 137, hits: 211, hits/line: 1.5, weight/hit: 7.2
hits/category: 14/1, 150/2, 1/3, 1/4, 0/5, 0/6, 0/7, 0/8, 6/9, 20/10, 5/11, 14/12, 0/13
suggested category: (2) Reading, Writing [71%]
Gunter Gerdenitsch is an international IT specialist with a focus on communication. IT service providers: are you looking for a freelancer for peaks in your workload, or do you want to get your ideas across? Then you should visit http://www.ITspecial.org or mail to gg@ITspecial.org
Copyright © 2003, Gunter Gerdenitsch, All Rights Reserved.