Word – A delimited string of characters as it appears in the text.
Term – A “normalized” word (case, morphology, spelling, etc.); an equivalence class of words.
ex: The same word can appear many times in a text, in different forms; every occurrence maps to the same term, so the term is counted only once.
Token – An instance of a word or term occurring in a document.
ex: Tokens are what we count when we need to know how many times a word occurs; each occurrence is a separate token.
Type – The same as a term in most cases: an equivalence class of tokens.
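The distinction between tokens, types, and terms can be illustrated with a small sketch. It assumes a deliberately simple tokenizer (runs of letters) and a normalization step that only lowercases; the helper names `tokenize` and `normalize` are hypothetical, and real systems usually also handle morphology and spelling.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a token is any maximal run of ASCII letters.
    return re.findall(r"[A-Za-z]+", text)

def normalize(token):
    # Assumption: normalization is lowercasing only.
    return token.lower()

text = "To be or not to be"

tokens = tokenize(text)                      # every occurrence counts
types = set(tokens)                          # distinct surface forms
terms = {normalize(t) for t in tokens}       # distinct normalized forms
term_counts = Counter(normalize(t) for t in tokens)

print(len(tokens), len(types), len(terms))   # 6 5 4
print(term_counts["to"], term_counts["be"])  # 2 2
```

Here "To" and "to" are two different types but the same term once case is normalized, while each of the six occurrences is its own token.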