I have previously written about the Top 17 Search Innovations outside of Google. Clearly, Google is not going to take this onslaught lying down. As Alex Iskold wrote in an article on the Read/WriteWeb, these types of changes slowly make their way into the mainstream. Google has already introduced personalized search; it's only a matter of time before many or all of these features are incorporated into the main Google search engine. [Naturally, I will be happy to help with suggestions!]
As more and more features get crammed in, mainstream search engines like Google and Yahoo! will face challenges from the Innovator's Dilemma: if new features are not integrated properly, their relentless addition can not only make the user interface cluttered and difficult to use, but also degrade the architecture by making it horribly complex and difficult to change.
So the key question becomes: what would the overall architecture of Google (or any mainstream web search engine) look like if it included most of these features? In this post, we will take a speculative look at a unifying architecture - a conceptual look at how a general-purpose search engine like Google or Yahoo! might set up its architecture so that these and other new features could be added easily while keeping the overall architecture coherent. This is a purely intellectual exercise - no doubt each of the major search engines will evolve its own strategy and architecture to deal with these issues.
As the above image shows, the overall architecture is split into five sections: the query interface, server components, the results interface, saved-search agents and support for alternative results platforms. Of course, not every search would use all of these features; each search would be routed only through the engines appropriate to it.
Let us take a quick look at the various sections:
1. Query Interface
One key change to the query interface in the future is the likely addition of search parameters, which can use the magic of Ajax to appear automatically as needed. Parameters can be classified into two types: general parameters, such as freshness dates and content type, and domain-specific parameters for vertical search queries.
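To make the two parameter types concrete, here is a minimal sketch in Python of how a query might carry them. Every name in it (GeneralParams, SearchQuery, VERTICAL_FIELDS, fields_to_show) and the example verticals are my own assumptions for illustration, not a description of any real search engine's interface.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class GeneralParams:
    """Parameters that could apply to any query (hypothetical names)."""
    fresh_since: Optional[date] = None    # only return content newer than this
    content_type: Optional[str] = None    # e.g. "news", "blog", "video"


@dataclass
class SearchQuery:
    text: str
    general: GeneralParams = field(default_factory=GeneralParams)
    vertical: Optional[str] = None        # e.g. "travel"; None means general web search
    domain_params: Dict[str, str] = field(default_factory=dict)


# Assumed mapping from a vertical to the extra fields the interface would reveal.
VERTICAL_FIELDS: Dict[str, List[str]] = {
    "travel": ["origin", "destination", "depart_on"],
    "jobs": ["location", "salary_min"],
}


def fields_to_show(query: SearchQuery) -> List[str]:
    """Return the parameter fields the interface should display for this query."""
    general = ["fresh_since", "content_type"]
    return general + VERTICAL_FIELDS.get(query.vertical, [])
```

In a browser, an Ajax call could ask the server for fields_to_show as soon as the vertical is detected, so that the extra fields appear only when they are relevant.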
2. Server components
In the future, the simple "search box" on the Google front page could hide a variety of specialized search engines behind it:
- Pre-processing support: Personalization, Natural language processing, semantic analysis
- Algorithmic changes: Rich content search, social input (reputation-based), self-optimization
- Source restrictions: Restricting the scope of the search to trusted sources and/or to a specific vertical
- Post-processing support: Clustering, related tags, support for services
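One way to picture this optional routing is as a pipeline of stages, each of which decides for itself whether it applies to a given search. The sketch below is only an illustration under that assumption: the stage names, the SearchContext fields and the stub logic are all hypothetical stand-ins for the real engines, and source restriction is folded into the core search stub to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SearchContext:
    query: str
    user_id: str = ""                                   # empty means anonymous
    sources: List[str] = field(default_factory=list)    # empty means the whole web
    results: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)      # names of stages that ran


@dataclass
class Stage:
    name: str
    applies: Callable[[SearchContext], bool]   # should this stage run?
    run: Callable[[SearchContext], None]       # mutates the context in place


def personalize(ctx: SearchContext) -> None:
    # Stand-in for real personalization: tag the query with the user id.
    ctx.query = f"{ctx.query} [user:{ctx.user_id}]"


def core_search(ctx: SearchContext) -> None:
    # Stand-in for the index lookup, restricted to trusted sources if any are given.
    scope = ctx.sources or ["web"]
    ctx.results = [f"{source}: hit for '{ctx.query}'" for source in scope]


def cluster(ctx: SearchContext) -> None:
    # Stand-in for clustering: simply sort the results.
    ctx.results = sorted(ctx.results)


PIPELINE = [
    Stage("personalization", lambda c: bool(c.user_id), personalize),
    Stage("core_search", lambda c: True, core_search),
    Stage("clustering", lambda c: len(c.results) > 1, cluster),
]


def search(ctx: SearchContext) -> SearchContext:
    """Route the search through only the stages that apply to it."""
    for stage in PIPELINE:
        if stage.applies(ctx):
            stage.run(ctx)
            ctx.trace.append(stage.name)
    return ctx
```

The appeal of this shape is that a new feature becomes one more Stage entry in the list rather than a change to the code around it, which is the kind of coherence the architecture is after.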
3. Results interface
Long term, the results interface should support enhanced types of results visualization (such as clustering and related tags) and query refinement (using filters or suggestions). It should also support saved searches (user agents) and alternative results platforms such as mobile, RSS feeds, RIAs, email and Web Services.
Finally, a big win for the user would be support for Discovery: a process by which the search engine knows enough about you, as a user, to find content of interest specifically for you (articles, news, blog posts and so on) and notifies you about it, preferably using an RSS feed.
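As a rough illustration of Discovery, the sketch below matches new content against a user's interests and publishes the matches as an RSS 2.0 feed using Python's standard xml.etree.ElementTree module. The matching rule (any shared tag), the item fields (title, url, tags) and the function names are assumptions made for the example; a real engine would build a far richer user profile.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def discover(interests: Set[str], items: List[Dict]) -> List[Dict]:
    """Return the items that share at least one tag with the user's interests."""
    return [item for item in items if interests & set(item["tags"])]


def to_rss(title: str, link: str, items: List[Dict]) -> str:
    """Serialize the matched items as a minimal RSS 2.0 feed."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Content matched to your interests"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["url"]
    return ET.tostring(rss, encoding="unicode")
```

A saved-search agent could run discover on a schedule and regenerate the feed, so the user's feed reader does the notifying.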
That concludes our look at a conceptual architecture for search. Is this simple, yet powerful? Or just simple-minded? Leave a comment or email me and let me know!