Search engine types
1. Crawler Based
2. Directory
3. Metasearch
Search Types
1. General (horizontal) search: e.g. Google. Results are very broad and are sometimes not relevant.
2. Vertical search: searches one specific topic or one specific part of the internet, e.g. Google Images or Amazon product search.
Web challenges for IR
---------------------
1. Distributed Data: Documents spread over millions of different web servers.
2. Volatile Data: Many documents change or disappear rapidly (e.g. dead links).
3. Large Volume: Billions of separate documents.
4. Unstructured and Redundant Data: No uniform structure, HTML errors, up to 30% (near) duplicate documents.
5. Quality of Data: No editorial control, false information, poor quality writing, typos, etc.
6. Heterogeneous Data: Multiple media types (images, video, VRML), languages, character sets, etc.
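The near-duplicate problem from item 4 is often attacked by comparing sets of overlapping word windows ("shingles"). The following is only a minimal sketch of that idea, not a method taken from these notes: the function names, the shingle length `k=3`, and the `0.8` threshold are all assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) of a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, k=3, threshold=0.8):
    """Flag two documents as near duplicates if their shingle sets overlap enough.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold
```

Two pages that differ only in case or punctuation produce identical shingle sets and are flagged; comparing every pair of documents this way does not scale to billions of pages, so real systems rely on hashing tricks on top of this idea.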
Modeling the Web
----------------
Heaps’ and Zipf’s laws also hold for the Web.
» In particular, the vocabulary grows faster (a larger Heaps exponent) and the word
distribution is more biased (a larger Zipf exponent) than in typical text collections.
Heaps’ Law
» An empirical rule which describes the vocabulary growth as a function of the text size.
» It establishes that a text of n words has a vocabulary of size O(n^β), for 0 < β < 1.
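To make the vocabulary-growth rule concrete, here is a rough sketch that measures vocabulary size V(n) over a token stream and estimates the exponent as the least-squares slope of log V(n) against log n. The function names and the fitting method are assumptions made for illustration; the notes do not prescribe how to estimate the exponent.

```python
import math


def vocabulary_growth(tokens):
    """Return [(n, V(n))]: the vocabulary size after each of the first n tokens."""
    seen = set()
    growth = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        growth.append((n, len(seen)))
    return growth


def fit_heaps_exponent(growth):
    """Estimate the Heaps exponent as the slope of log V(n) versus log n.

    Needs at least two points with different n; a plain least-squares fit,
    used here only as an illustrative assumption.
    """
    xs = [math.log(n) for n, _ in growth]
    ys = [math.log(v) for _, v in growth]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic stream in which a new word first appears at n = 1, 4, 9, 16, ...
# so that V(n) grows roughly like the square root of n.
synthetic = [str(math.isqrt(n)) for n in range(1, 10001)]
estimated_exponent = fit_heaps_exponent(vocabulary_growth(synthetic))
```

On real text the tokens would come from a tokenizer rather than a synthetic stream, and the estimate depends on how the text is tokenized and normalized.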
Zipf’s Law
» An empirical rule that describes the frequency of the text words.
» It states that the i-th most frequent word appears as many times as
the most frequent one divided by i^β, for some β > 1.
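The rank-frequency relation can be checked on any token list by ranking words by count and comparing each count with the value the law predicts. This is a small sketch under assumptions: the function names are hypothetical, and the exponent (called `beta` in the code) has to be supplied by the caller rather than fitted.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return [(rank, word, count)], ordered from most to least frequent."""
    ranked = Counter(tokens).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_prediction(top_count, rank, beta):
    """Count predicted for the word at `rank`: top_count / rank ** beta."""
    return top_count / rank ** beta


# Toy example: compare observed counts with the prediction for beta = 1.0.
tokens = ["the"] * 8 + ["web"] * 4 + ["index"] * 2
table = rank_frequency(tokens)
top_count = table[0][2]
comparison = [(word, count, zipf_prediction(top_count, rank, 1.0))
              for rank, word, count in table]
```

Each entry of `comparison` pairs a word's observed count with the predicted one, so large gaps show where the chosen exponent fits poorly.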
Different types of queries
1. Informational queries: the user wants to learn about something (40%)
2. Navigational queries: the user wants to be taken to a particular page (25%)
3. Transactional queries: the user wants to do something (35%)
Essential characteristics of a user-friendly website
1. Mobile compatibility
2. Accessible to all users
3. Well-planned information architecture
4. Well-formatted content that is easy to scan
5. Fast load times
6. Browser consistency
7. Effective navigation
8. Good error handling
9. Contrasting color scheme
10. Usable forms
Centralized Architecture - Crawler-Indexer Architecture
Important components:
1. Crawler / spider
2. Indexer
3. Query engine
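As an illustration of the crawler component only, the sketch below walks a link graph breadth-first from a seed URL, skipping pages it has already seen and tolerating dead links. Everything here is a stand-in: `TOY_WEB`, `toy_fetch`, and the `.example` URLs are hypothetical, and a real crawler would fetch over HTTP, honor robots.txt, and rate-limit its requests.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "http://a.example/": ("welcome to site a",
                          ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("site b links back", ["http://a.example/"]),
    "http://c.example/": ("site c has a dead link", ["http://gone.example/"]),
}


def toy_fetch(url):
    """Stand-in for an HTTP fetch: returns (text, links), or None for a dead link."""
    return TOY_WEB.get(url)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`; returns {url: text} for fetched pages."""
    frontier = deque([seed])   # URLs waiting to be fetched
    seen = {seed}              # URLs already queued, to avoid refetching
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        result = fetch(url)
        if result is None:     # dead link: nothing to store or follow
            continue
        text, links = result
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

The returned `{url: text}` mapping is what the indexer would consume; the query engine then works against the index rather than against the crawled pages directly.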
Indexing process
1. Text acquisition
2. Text transformation
3. Index creation
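The three indexing steps above can be sketched as three small functions feeding an inverted index. This is a toy under stated assumptions: the corpus is an in-memory dictionary, the stopword list is a tiny illustrative one, and the transformation is limited to lowercasing, tokenizing, and stopword removal (no stemming).

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "of", "the", "to"}  # tiny illustrative list


def acquire(corpus):
    """Text acquisition: yield (doc_id, raw_text) pairs from an in-memory corpus."""
    yield from corpus.items()


def transform(raw_text):
    """Text transformation: lowercase, tokenize, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", raw_text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_index(corpus):
    """Index creation: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, raw_text in acquire(corpus):
        for term in transform(raw_text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


corpus = {
    1: "The crawler fetches the Web",
    2: "The indexer indexes the web pages",
}
index = build_index(corpus)
```

Looking up `index["web"]` returns the documents that contain the term together with their term frequencies, which is the information a ranking function needs later.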
Query process
1. User interaction
2. Ranking
3. Evaluation
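The user interaction, ranking, and evaluation steps above can be illustrated with a small sketch. Note the assumptions: the notes do not name a ranking function, so a simple TF-IDF score is swapped in here; the query is taken as an already tokenized list; and evaluation is reduced to precision and recall against a hand-labeled relevant set.

```python
import math
from collections import Counter


def rank(query, docs):
    """Rank doc ids by a simple TF-IDF score.

    `query` is a list of terms; `docs` maps doc_id -> list of tokens.
    Documents scoring 0 are left out; ties are broken by doc id.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = sum(tf[term] * math.log(n_docs / doc_freq[term])
                    for term in query if term in tf)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a set of relevant ids."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


docs = {
    "d1": ["web", "search", "engine"],
    "d2": ["web", "crawler"],
    "d3": ["cooking", "recipes"],
}
results = rank(["web", "crawler"], docs)
```

With this scoring, a term that occurs in every document contributes nothing, because its logarithm term is zero; that is a property of this particular TF-IDF variant, not of ranking in general.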
Distributed Architecture - Harvest
Gatherers and Brokers
User Interface
1. Query interface
2. Answer interface