[This article was originally posted as part of the Rising Star Dream Team Future of Search series on the VortexDNA blog. I'm deeply grateful to Kaila Colbin, VortexDNA's resident blogger, for the opportunity to participate!]
I've been writing about Search technologies for a while now, so when Kaila Colbin of VortexDNA invited me to participate in a future-of-search marathon, I jumped at the chance!
What will the search for information look like in the future - in five years, ten, twenty? Is it just more of the same, or will it look radically different?
Before looking to the future, let us first look at how far we have come. Danny Sullivan has a great post looking at a decade of search history and the various tribulations of past and present search engines - AltaVista, Ask Jeeves, Microsoft, Yahoo! and of course, the early Google. We owe a huge debt of gratitude to the tremendous contributions of these and other early pioneers of Search; Google, in particular, deserves a great deal of the credit for making web search ubiquitous outside the tech community. Indeed, "to Google" as a verb has become virtually synonymous with the idea of Web Search, much as the Xerox brand became synonymous with the idea of the photocopier in a bygone era.
Google's venerable PageRank algorithm is certainly best-of-breed for the present, and Google keeps tweaking its results continuously. Given this progress, can we still expect to see major improvements in search in the foreseeable future? As an analogy, consider the DC-3 airplane - the first truly modern airliner, it was powerful, safe, reliable and economical (indeed, some of these are still flying today). It revolutionized air travel, and with its introduction, many considered the aviation age to have arrived for the general public. And yet, early jet aircraft had already appeared on the horizon, so to speak; within a decade, this reliable workhorse was obsolete, overtaken by jets in the competition for public air travel.
It could easily be the same with search. The key question that a search engine addresses is: what results do the maximum number of users find most useful for a given search query? PageRank is simply an approximation of the Wisdom of Crowds to answer this question. Is there a richer abstraction? Is Engagement the new black? Whatever the new approach is - in order to provide accurate results, it must work as implicitly as possible.
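To make the "approximation of the Wisdom of Crowds" idea concrete, here is a minimal power-iteration sketch of PageRank: every link acts as a vote, and repeated redistribution of those votes converges on a collective ranking. The toy link graph and damping value are illustrative only; the real algorithm runs at web scale with many refinements.

```python
# Minimal PageRank sketch: each link is a "vote", and repeated
# redistribution of rank converges on the crowd's collective ranking.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # Sum the shared-out rank of every page that links to p
            incoming = sum(rank[q] / len(links[q])
                           for q in pages if p in links[q])
            new_rank[p] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank

# Toy web: a <-> b link to each other, c links only to a.
ranks = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
```

Page "a", with the most incoming votes, ends up ranked highest; "c", which nothing links to, ends up lowest - exactly the crowd-voting intuition described above.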
We have only to envision the possibilities ...
So let us take a speculative look at search, circa 2015. To look at it systematically, we can separate the search engine into the following components from a user perspective (to see this breakdown in visual format, check out my earlier post on an abstract architecture for search):
- Query specification
- Base Index
- Relevance Algorithm
- Results Visualization
- Ongoing Interest
Let us consider the possible future directions for each one in turn.
Query Specification (Input)
Google pioneered the keyword-centric, minimalist approach to specifying the search query, and all the major search engines follow that lead. But the search criteria could be so much richer; instead of experimenting with different types of keyword searches to find the information they need, users could simply provide additional criteria up front to qualify their request.
Admittedly, this level of control will not suit everyone. The casual user would get reasonable defaults, which would automatically be updated to their favorite values with regular use; the topical researcher, on the other hand, would actively tinker with these widgets in a "power user" mode. (Google already supports this type of functionality in a limited fashion.)
Some possible advanced features for specifying the query are given below:
- Content Spec: Enabling the user to dynamically specify the data sources to be included, based on domain, reputation, social network, and so on
- Scope: Input for seamlessly limiting the scope of the search to enterprise or personal data
- Qualifiers: Allowing the user to add more information to disambiguate result matches, e.g. specifying whether "Java" means the programming language or the island
- Parameter ranges: Domain-specific parameters can be extremely valuable even to a general-purpose engine (see #5 in the section on Relevance Algorithms below)
- UI paradigms: Text keywords are a limited form of input. The actual input mechanisms could be more visual, in the form of sliders, buttons, fields and other UI widgets. Imagine, for example, that as you move a slider, the search results change or an increasing number of results appear on the page!
- Multiple Profiles: Personalization does not always have to be implicit. A user could explicitly set up profiles to represent different interests - professional, hobby, personal and so on, so that switching the profile would quickly change the areas of interest
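The qualifiers and multiple-profiles ideas above could be combined: an explicit profile enriches a bare keyword query with stored context before it ever reaches the engine. Here is a hypothetical sketch - the class, fields, and `site:` syntax usage are my own illustration, not any real search API:

```python
# Hypothetical sketch: explicit user profiles that enrich a bare
# keyword query with stored context. All names are illustrative.
class SearchProfile:
    def __init__(self, name, qualifiers=None, sources=None, scope="web"):
        self.name = name
        self.qualifiers = qualifiers or []   # disambiguation hints
        self.sources = sources or []         # preferred content sources
        self.scope = scope                   # "web", "enterprise", "personal"

    def build_query(self, keywords):
        # Combine the raw keywords with the profile's stored context
        parts = [keywords] + self.qualifiers
        if self.sources:
            parts.append("site:" + ",".join(self.sources))
        return {"q": " ".join(parts), "scope": self.scope}

# Switching from a "hobby" to a "professional" profile changes
# what a bare query like "java" actually asks for.
work = SearchProfile("professional",
                     qualifiers=["programming language"],
                     sources=["docs.oracle.com"])
query = work.build_query("java")
```

The point of the sketch: the user types one word, and the active profile supplies the disambiguation and scoping for free.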
Base Index (Content)
This is a core area of concern for search engines: what is the scope of content to be considered when searching for information?
The standard approach currently is to build web crawlers that continuously scan as many web sites and pages as possible; the scanned content is used to build a master index, which is updated regularly and serves as the basis for answering searches.
For the base index, the big changes in the future are likely to involve both the scope and understanding of the content; here is a short list:
- Rich media search, e.g. true indexing of audio and video content
- Dynamic content search (searching the invisible web)
- Integration of personal, web and corporate information
- Perspective-based search, e.g. conservative vs liberal, hard news vs opinion, and so on
- Subset creation, on-the-fly, e.g. to search for domain-specific data
Relevance Algorithm (Mechanism)
This is, of course, the most-debated topic when discussing the future of search engines. Clearly, many different approaches and technologies show promise; some of these are noted below:
- Personalization (but without storing personal info)
- Social Input / Wisdom-of-Crowds (which has its pitfalls)
- Social Graph: where your selected network of people helps improve search results (Robert Scoble has recently gotten religion about this concept; Danny Sullivan rebuts)
- Semantic Processing: of both the query AND the content (will this let the search engine find answers that we never knew we had?)
- Parametric Search: Vertical search engines already routinely offer domain-specific parametric search; for example, job search engines allow the user to specify the all-important location of the job as a primary criterion. Can this type of feature be generalized, so that as a user drills down deeper into search results, an increasing number of parameters can be offered?
- Human-powered Search, for either the short head or the long tail of search
- Swarm Intelligence: Mimicking biological search, such as Ant colony optimization, particle swarm optimization, and so on
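The parametric-search idea in the list above can be sketched as progressive filtering: each drill-down step applies another parameter to narrow the result set, exactly as a job-search engine does with location and salary. The data, field names, and helper below are invented for illustration:

```python
# Toy parametric search over job listings: each call narrows the
# results, so more parameters can be offered as the user drills down.
jobs = [
    {"title": "Java Developer", "location": "Boston", "salary": 90000},
    {"title": "Web Designer",   "location": "Boston", "salary": 70000},
    {"title": "Java Developer", "location": "Austin", "salary": 85000},
]

def parametric_search(results, **params):
    for field, wanted in params.items():
        if callable(wanted):
            # Range-style parameter, expressed as a predicate
            results = [r for r in results if wanted(r.get(field))]
        else:
            # Exact-match parameter
            results = [r for r in results if r.get(field) == wanted]
    return results

# Drill down: first by location, then by a salary range.
local = parametric_search(jobs, location="Boston")
well_paid = parametric_search(local, salary=lambda s: s >= 80000)
```

Generalizing this to web search would mean inferring which parameters (location, date, price, and so on) are meaningful for the current result set and surfacing them as the user drills in.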
Results Visualization (Output)
Again, Google leads the way with its minimalist approach: simple headings, links and snippets of text. This is slowly changing, with the new "Universal Search" approach from Microsoft, Google and others; Ask.com is a leader in this area.
Search engines of the future will likely implement completely new paradigms for users to navigate and view search results. Often, meta-results - information about the results - are as important as the results themselves: they let users see where a given result fits into the overall universe of results, and find results related to an item of interest.
Some possibilities for results display are given below:
- Tag clustering is not a new concept, but it has yet to gain traction among the majors. Quintura, with its dynamic tag cloud display, offers one of the best examples.
- Organizing results by content type is something every search engine will have to think about in the future. For example, should news stories be presented in an "overview capsule" fashion, or organized as a timeline-based view? Dale Dougherty at O'Reilly Radar has a brilliant article on this topic: Journalism is burning.
- Follow-up actions - on viewing search engine results, a very common user action (as Greg Linden points out) is to modify the current query, either to drill down further or to try a different approach to finding the required information. Google's "did you mean ..." feature is a step in this direction (although it leaves much to be desired).
- Domain-specific visualization can significantly enhance the understanding of results. This is similar to the data organization point above, but focused on the display itself; results from different vertical domains may require very different visualization techniques, such as colors, graphs, images, trend lines, heat maps, topographic charts, and so on. [For a list of the more exotic variations, check out this amazing list from Smashing Magazine.]
- Dynamic scoping - enabling users to widen or narrow search results based on different criteria - such as geography (local or global), site authority, timeliness, point of view, domain, and so on - is a powerful feature that will continue to grow in importance.
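A Quintura-style tag cloud can be approximated very simply: weight each term by its frequency across the result snippets and display the heaviest terms as navigation handles. A minimal sketch, with snippets and stop-word list invented for illustration:

```python
from collections import Counter

# Crude tag cloud: term frequency across result snippets becomes the
# display weight. Snippets and stop words here are illustrative.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}

def tag_cloud(snippets, top_n=5):
    words = Counter()
    for snippet in snippets:
        for word in snippet.lower().split():
            word = word.strip(".,!?")
            if word and word not in STOP_WORDS:
                words[word] += 1
    # Heaviest terms become the clickable cloud entries
    return words.most_common(top_n)

snippets = [
    "Java is a programming language for the enterprise",
    "The Java programming language runs on the JVM",
    "Coffee from the island of Java",
]
cloud = tag_cloud(snippets)
```

A dynamic version would recompute the cloud as the user clicks a tag, re-scoping the result set - the "dynamic scoping" idea above expressed visually.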
Ongoing Interest (Notification)
This can be best explained as a Reverse search, where it is the content that finds the user - thus turning the concept of search on its head.
Most of us have ongoing interests in certain areas; they could be professional, social or personal. It makes a great deal of sense for the search engine to keep track of these interests and proactively notify the user, at some periodic interval, of new items that fit them. Google Alerts is an early example in this direction, but future enhancements could significantly boost its utility.
Some day, search engine notifications could support the following features:
- Diverse Mediums: Many search engines already support email notifications. What's to stop them from adding many additional delivery mechanisms, such as IM, SMS, widgets, the Twitter API, and so on?
- Levels of Detail: Allowing users to set the scope and organization of information presented.
- Prioritization: This is a key feature! Once users are able to set priorities for different types of searches and for different areas, this can be used to drive the other features. For example, send me the headline about a breaking news event directly relevant to my blog, as an instant message, but email me a digest of the day's results for baseball scores.
- Schedules: Some search results make sense only at certain times of the day; e.g. traffic search results are only relevant at commute times on work days.
- Dynamic Control: Finally, empowering users to assert dynamic, granular control over their search alerts would make this functionality truly powerful. For example, once I've been notified about a breaking news story, I might want to boost its priority and upgrade its delivery method so that further updates reach me quickly and efficiently.
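The prioritization, schedules, and delivery-medium ideas above could be combined in a simple rule engine that routes each alert to the right channel at the right time. A hypothetical sketch - the topics, channels, and delivery windows are all invented for illustration:

```python
from datetime import datetime

# Hypothetical alert router: priority and time-of-day rules pick the
# delivery channel. Topics, channels, and windows are illustrative.
RULES = [
    # Lower priority number = more urgent; "hours" is the delivery window.
    {"topic": "breaking-news", "priority": 1, "channel": "im",    "hours": range(0, 24)},
    {"topic": "traffic",       "priority": 2, "channel": "sms",   "hours": range(7, 10)},
    {"topic": "baseball",      "priority": 3, "channel": "email", "hours": range(18, 22)},
]

def route_alert(topic, now=None):
    now = now or datetime.now()
    for rule in sorted(RULES, key=lambda r: r["priority"]):
        if rule["topic"] == topic and now.hour in rule["hours"]:
            return rule["channel"]
    return None  # hold the alert until its delivery window opens

# Traffic alerts go out by SMS, but only around the morning commute.
channel = route_alert("traffic", datetime(2015, 6, 1, 8, 30))
```

Dynamic control would then amount to letting the user edit these rules on the fly - promoting a breaking story's priority or switching its channel from email digest to instant message.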
Power and Responsibility
As search engines adopt a few or many of the features described above, search will grow increasingly powerful. It will become easier to find any information we want, quickly and easily. Whether the information is high-level or detailed, global or local, general or specific, past or present, in any domain - no nugget of human knowledge shall escape this relentless spotlight.
Is shining a light on the darkest corners of the web always a good thing? As a webbed superhero once told me (and a few billion others), "With great power comes great responsibility!" Privacy advocates are rightly concerned about the growing power of global web search engines; ongoing efforts from official and community channels are essential to minimizing abuse. A related issue is that web content can be archived and searched in perpetuity - the societal effects of this phenomenon have yet to be understood. A recent New York Times column highlighted this issue (paid content; here's a perspective on it from Slate magazine).
Clearly, search engines will continue to evolve, and a future engine might well incorporate many of the improvements described above within the next ten years. But how about even further out - say, 2020 or 2030? Will disruptive changes in networking, computing and information technologies radically change the way search engines operate? A change in the nature of human thinking, interaction and social customs would be even more dramatic, and could transform the nature of search itself.
This is, of course, a fertile area of speculation more in the realm of Science Fiction (for now): for example, will we one day need a galactic search engine? Can we create microscopic information-matching agents, either biological or atomic? Results that suddenly become available to the user as knowledge in the brain? An "implicit" search engine that finds information as we need it? Why not?