Chad Osgood's Blog

Chad Osgood's Old, Expired Blog

Content Pipelines

On the flight from Athens to Madrid this last week I had an idea that I'd like to float in order to see what other people think.

The weblog infrastructure that I am building (still, due to little free time) has its own aggregation system that flows aggregated content through a pipeline until it's pushed into the storage system. The system pulls content from RSS feeds, Exchange public folders, websites and other sources (the "feed readers" are pluggable), maps everything into a common representation and flows the articles through the pipeline. The stages in the pipeline can look at the content and make adjustments (fix up HTML), do filtering (assign categories) and, most importantly, enrich the content with metadata. So, when the system is pulling information from an RSS source, it may invoke a stage that runs all the words in the article against a dictionary and adds links to a site like dictionary.com, a stage that finds relevant books on amazon.com, a stage that gets Google links, or even a stage that translates the original Russian text into German for me, and then add all that additional information to the "extended metadata" of the article. Everything is pluggable.
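To make the shape of such a pipeline concrete, here is a minimal sketch in Python. It is not the actual implementation described above: the `Article` fields, the stage names and the "long word" rule for picking dictionary candidates are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

@dataclass
class Article:
    """Hypothetical common representation that every feed reader maps into."""
    title: str
    body: str
    categories: List[str] = field(default_factory=list)
    metadata: Dict[str, object] = field(default_factory=dict)

# A stage is any callable that takes an article and returns it, possibly changed.
Stage = Callable[[Article], Article]

def tidy_html(article: Article) -> Article:
    # Adjustment stage: a toy HTML fix-up.
    article.body = article.body.replace("<br>", "<br/>")
    return article

def categorize(article: Article) -> Article:
    # Filtering stage: assign a category based on a keyword (made-up rule).
    if "pipeline" in article.body.lower():
        article.categories.append("architecture")
    return article

def dictionary_candidates(article: Article) -> Article:
    # Enrichment stage: record long words as candidates for dictionary links.
    words = set(re.findall(r"[A-Za-z]+", article.body.lower()))
    article.metadata["dictionary"] = sorted(w for w in words if len(w) > 8)
    return article

def run_pipeline(articles: Iterable[Article], stages: List[Stage]) -> List[Article]:
    """Flow each article through every stage, then hand it to 'storage' (a list)."""
    stored = []
    for article in articles:
        for stage in stages:
            article = stage(article)
        stored.append(article)
    return stored
```

Because a stage is just a callable, plugging in another one (book lookup, translation) means appending it to the list passed to `run_pipeline`.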

Here's the idea: I really don't want to write the Amazon, Google, Dictionary and Babelfish stages myself. What I'd rather do is enlist those sites as web services in my pipeline. Using one-way messaging and WS-Routing I could say, "here's an article, add your metadata to it and, when you're done, send it back to me or on to the next pipeline stage, here at my system or elsewhere". Or I could just walk up to an RSS provider and say, "don't reply to me directly; please route back to me through these stages".
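The routing idea can be sketched as a message that carries its own list of remaining hops, with each hop annotating the article and passing it on without replying to the sender. This is only an in-process simulation in Python, not the WS-Routing wire format: the `Message` class, the `urn:example:...` endpoint names and the stand-in annotators are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Message:
    article: dict
    forward_path: List[str]  # hops still to visit, in order

Handler = Callable[[dict], None]

def make_annotator(name: str) -> Handler:
    """Stand-in for a remote service; it only records that it saw the article."""
    def handler(article: dict) -> None:
        article.setdefault("annotated_by", []).append(name)
    return handler

# Hypothetical endpoints; the last one plays the role of my own aggregator.
ENDPOINTS: Dict[str, Handler] = {
    "urn:example:weather": make_annotator("weather"),
    "urn:example:books": make_annotator("books"),
    "urn:example:my-aggregator": make_annotator("stored"),
}

def deliver(message: Message, endpoints: Dict[str, Handler]) -> None:
    """One-way delivery: each hop handles the article, then the rest of the
    forward path is followed. Nothing is returned to the original sender."""
    path = list(message.forward_path)
    while path:
        next_hop = path.pop(0)
        endpoints[next_hop](message.article)
```

Sending an article with the forward path `["urn:example:weather", "urn:example:books", "urn:example:my-aggregator"]` corresponds to telling the provider to route back to me through those stages rather than replying directly.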

So, if such a distributed infrastructure existed, and you aggregated this entry "backrouted" through a pipeline of filters provided by Weather.com, Google.com, Dictionary.com and Amazon.com, you'd have the weather for Athens and Madrid, all relevant Google links and books on "content" and/or "pipelines" and WS-Routing, and links to explanations of all non-trivial words in this text. How's that?

[Clemens Vasters]

I like it. In a previous musing on prevalence systems I noted one of the architectural benefits of Site Server Commerce 3.0: pipelines. It worked much like the content pipelines you describe: one would pass the order (the aggregated content, in your example) through a series of components that implemented the IPipelineComponent COM interface. Each component had its own opportunity to annotate the content of the order, which in my case meant applying volatile business rules or integrating with third parties. For example, to provide "real time" shipping calculation one had to write a custom interface to UPS/FedEx (these were pre-web service days). I implemented this by creating a COM component that implemented the IPipelineComponent interface and communicated with the UPS or FedEx server. I could then insert it into the pipeline, which allowed the component to annotate the order with an accurate shipping total from the courier of their choice.
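The order pipeline has the same shape. The Python sketch below is only an analogy: it does not reproduce the real IPipelineComponent signature, and the `PipelineComponent` base class, the order fields and the flat-rate quote function (a stand-in for the call to the courier's server) are invented for illustration.

```python
from abc import ABC, abstractmethod
from decimal import Decimal
from typing import Callable, List

class PipelineComponent(ABC):
    """Hypothetical analogue of a pipeline component: it annotates an order."""
    @abstractmethod
    def execute(self, order: dict) -> None: ...

class ShippingQuote(PipelineComponent):
    def __init__(self, quote_fn: Callable[[str, Decimal], Decimal]):
        # quote_fn stands in for the custom interface to the courier.
        self._quote_fn = quote_fn

    def execute(self, order: dict) -> None:
        order["shipping_total"] = self._quote_fn(order["courier"], order["weight_kg"])

class OrderTotal(PipelineComponent):
    def execute(self, order: dict) -> None:
        order["total"] = order["subtotal"] + order["shipping_total"]

def run_order_pipeline(order: dict, components: List[PipelineComponent]) -> dict:
    # Each component gets its turn to annotate the order, in sequence.
    for component in components:
        component.execute(order)
    return order

def flat_rate_quote(courier: str, weight_kg: Decimal) -> Decimal:
    # Made-up rate table; a real component would ask the courier instead.
    return Decimal("5.00") + Decimal("1.50") * weight_kg
```

Swapping `flat_rate_quote` for a function that calls a courier's service changes one constructor argument and leaves the rest of the pipeline untouched, which is the property that makes the comparison with content stages hold.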

While the above is a bit antiquated by today's standards, you can see there are some architectural parallels with content pipelines. I'd be really interested in seeing or hearing more about how you would implement this.
