In a previous post, Where are the Meaning-Enabled Authoring Tools?, I argued that publishers who regularly post similar content (especially content that conforms to common formats) would gain a big advantage from using Semantic Authoring tools to create new content. Semantic tools not only bring SEO benefits and improve findability; they also make the content easier to repurpose for other uses, such as web applications and services.
This is essentially a bottom-up approach to the semantic web: adding semantic notation to the content itself. However, as the post went on to say, the prevailing view is definitely a top-down one, viz. that applications will have to extract semantic meaning from perfectly ordinary web pages, and that adding semantic knowledge to the content itself is unlikely (aside from very limited contexts, such as Microformats).
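To make the bottom-up idea concrete, here is a minimal sketch of what author-supplied markup buys a consuming application. It assumes a made-up snippet tagged with a few hCard microformat class names (`vcard`, `fn`, `org`); the names `MARKED_UP`, `HCardParser` and `extract_hcard` are hypothetical and exist only for this illustration. It uses nothing beyond Python's standard `html.parser` module and is not a complete microformats parser.

```python
from html.parser import HTMLParser

# Hypothetical snippet: contact details marked up by the author
# with hCard microformat class names (vcard, fn, org).
MARKED_UP = (
    '<div class="vcard">'
    '<span class="fn">Jane Doe</span>, '
    '<span class="org">Example Press</span>'
    '</div>'
)


class HCardParser(HTMLParser):
    """Collects the text of elements whose class is a known hCard property."""

    # Only two properties are handled here; real hCards define many more.
    PROPERTIES = {"fn", "org"}

    def __init__(self):
        super().__init__()
        self.current = None  # property name of the element being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.PROPERTIES:
                self.current = name

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] = self.fields.get(self.current, "") + data

    def handle_endtag(self, tag):
        # Assumes properties are not nested, which holds for this snippet.
        self.current = None


def extract_hcard(html):
    """Return a dict of the hCard properties found in the given HTML."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.fields


print(extract_hcard(MARKED_UP))
```

Because the author has already labelled the name and the organisation, the consumer needs no guesswork to recover them. Run the same function over a page without that markup and it returns an empty dict, which is exactly the gap the top-down view expects applications to fill by inferring meaning from ordinary pages.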
Two recent podcasts with two of the leading voices in this space further confirm this view.
In a conversation with Paul Miller of ZDNet's Semantic Web blog, Sir Tim Berners-Lee said that this idea - that the Semantic Web involves users marking up web pages with semantic information - is only a minor part of it; the data will mostly come from databases, or will be scraped from HTML. (Earlier coverage.)
Recently, my friend Aaron Strout put the same question to Tim O'Reilly during one of his We Are Smarter podcasts (time index: 19:36, if you want to check it out). O'Reilly said essentially the same thing: although useful in certain contexts, semantic markup in the content is unlikely, even by publishers, and a top-down approach of extracting meaning out of regular documents is likely to prevail.
So there you have it, folks! If you've been holding your breath waiting for semantic markup tools to appear, you can go home now. And if you're working on an application to extract meaning - such as AdaptiveBlue or twine - you're on the right track!