Archives / 2004 / July
  • Another solution to spam

    My previous solution to spam detailed in another entry was based upon how I, as a service-oriented technical architect, naturally approach such problems.  Having had a bit of a think about it, and a bit of enlightenment about how money isn't the only currency (CPU time, etc. also are in a way), I've come up with a new zero-infrastructure solution:

    1. A plugin is developed for mail clients such as Outlook, Eudora, etc.  This is installed on all client machines
    2. This plugin is activated whenever an item is sent.  It takes a subset of:
      • The sender's e-mail address
      • The recipient's e-mail address
      • The subject line
      • The date-time stamp
    3. Using this data, it computes some mathematically intensive function that results in a value.  This should take a few seconds (of background processing) to compute.  This function doesn't involve public/private keys - it would be a freely available algorithm that simply takes on the order of 2 seconds to come up with a value.
    4. This value is appended to the outgoing mail as a header - this is the signature
    5. --- The mail is transmitted and received by the recipient ---
    6. A plugin exists in the recipient's mail-reader that intercepts this header
    7. The function chosen for the computation must allow the "correctness" of the value to be determined within a fraction of a second rather than several seconds (there are formulae like this - I just can't remember them).  Again, this algorithm would be freely available
    8.  The validity of the header determines the validity of the e-mail
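
    As a rough illustration of steps 2 to 8, here's a minimal Python sketch.  It assumes a hashcash-style search (find a number that makes a SHA-256 hash start with a run of zero bits), which is one known function that is slow to compute but quick to check; it isn't necessarily the formula I had in mind above.  The function names, the way the four fields are joined, and the difficulty setting are all illustrative assumptions, and the difficulty is deliberately low so the demo runs quickly rather than taking 2 seconds.

```python
import hashlib
from itertools import count

# Assumption: 12 leading zero bits keeps this demo fast. A real deployment
# would raise this until signing takes roughly 2 seconds on typical hardware.
DIFFICULTY_BITS = 12


def make_payload(sender, recipient, subject, timestamp):
    """Join the four fields from step 2 into one string (hypothetical format)."""
    return "\n".join([sender, recipient, subject, timestamp])


def _hash_value(payload, nonce):
    """Hash the payload together with a candidate nonce and return it as an int."""
    data = f"{payload}|{nonce}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign(payload, bits=DIFFICULTY_BITS):
    """Slow side: try nonces until the hash has `bits` leading zero bits."""
    target = 1 << (256 - bits)
    for nonce in count():
        if _hash_value(payload, nonce) < target:
            return nonce


def verify(payload, nonce, bits=DIFFICULTY_BITS):
    """Fast side: a single hash confirms (or rejects) the claimed nonce."""
    return _hash_value(payload, nonce) < (1 << (256 - bits))


payload = make_payload(
    "alice@example.com", "bob@example.com", "Lunch?", "2004-07-20T12:00:00Z"
)
nonce = sign(payload)          # sender's plugin: goes into a mail header
print(verify(payload, nonce))  # recipient's plugin: True
```

    Each extra bit of difficulty doubles the expected signing work while the check stays at one hash, which is the asymmetry that step 7 asks for.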

    Rather than creating the signature/validating at the client, certain mail-servers could do this - both inbound and potentially outbound (from certain trusted servers).  Individual users could set up rules as to whether or not they accept unsigned mails.

    Why's this solution good?
    Basically, it's realistically free for everyone and requires no infrastructure.  If there's one thing that all the file-sharing applications have proven, it's that de-centralised peer-to-peer systems can thrive.  In terms of implementation, this solution would take minimal time to be developed as a plugin for mail clients - the triviality will lend itself to freeware implementations, leading to mail clients including it in the long term.  The fact that an e-mail takes an extra few seconds to send in the background wouldn't affect a normal user, but it would make sending signed bulk-mailings prohibitively expensive.  For companies that send genuine bulk mail-shots, they could just be added to an allow-list on an ad-hoc basis (i.e. when you sign up to a mailing list).
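
    To put a number on "prohibitively expensive", here's a quick back-of-the-envelope calculation in Python.  The 2 seconds per message comes from the description above; the one-million-message mailing is purely an assumed figure for illustration.

```python
SECONDS_PER_MESSAGE = 2            # from the text: roughly 2 seconds per signature
SECONDS_PER_DAY = 24 * 60 * 60


def messages_per_cpu_day(seconds_per_message=SECONDS_PER_MESSAGE):
    """How many signed mails one CPU can produce in a day of solid work."""
    return SECONDS_PER_DAY // seconds_per_message


def cpu_days_for(mailing_size, seconds_per_message=SECONDS_PER_MESSAGE):
    """How many CPU-days it takes to sign a mailing of the given size."""
    return mailing_size * seconds_per_message / SECONDS_PER_DAY


print(messages_per_cpu_day())             # 43200
print(round(cpu_days_for(1_000_000), 1))  # 23.1 (assumed one-million mailing)
```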



  • Energy expenditure signatures

    In an ideal world, the effort being expended at any point in a project will be the same as at any other point - the team involved would never be overstretched or under utilised, always working at an optimal pace.  One goal of any software development methodology should be to achieve this burn-rate equilibrium.

    In projects run with non-agile methodologies this very rarely happens.  The reason behind this is the same reason the projects fail - traditional methodologies involve big up-front designs that will be wrong, requirements will change, and so on.  What's interesting is the energy expenditure signature of these projects is the inverse of what the methodology advocates.  The effort *should* be expended at the beginning, ensuring all requirements are captured, that implementation and infrastructure designs are produced, that SLAs are defined, that resources are allocated and equipment acquired.  Basically, the aim is to mitigate all risk downstream by spotting it up-front.  In reality, you get an inverse signature of this - rather than having energy expenditure tailing off towards the end of the project, it picks up - bugs get found, requirements change, assumptions are proven incorrect, etc.  So, you have a rising curve (and this isn't a linear gradient), rather than a falling one.  Judging the exact graphs you're going to get of expenditure is really tricky, as you never know until the end of a project whether or not you've got all the requirements.

    These signatures are far more interesting in an Agile project, however.  Whilst the methodology allows, and is even predicated on, the execution of work at a near-constant ongoing pace, the reality is likely to be a tendency towards higher expenditure at one end of the project than the other, and this can be measured quite accurately against stories completed (in XP).  My supposition is that the signature this generates could be used to determine what state of health the team/project is in.

    If we assume that "0" is a centre-point for energy expenditure that is equivalent to optimal working pace (that's not inducing burn-out), and that a line is drawn against time on the x axis, then we can represent this signature as a standard 2d graph.  I would predict that the individual signatures will vary from team to team depending on the dynamics of the individuals within them.  Based upon experience with projects/teams I've seen, below is an initial attempt at the breakdown of a couple of signatures, and what they imply (note that these are per-iteration graphs as well as per-project).

    1. If the line starts at zero, and rises over the duration then internal facets may be underdeveloped.  One of several things may be happening:
      • Assumptions are being made that aren't holding true
      • The team members may not know each other's capabilities/responsibilities
      • Basically, the learning curve is quite high
    2. If the line starts off positive and drops to zero over the duration, then the team has spotted a deficiency and is trying to compensate.  To me, this implies a mature team in an unknown situation - this may well be a lack of supporting infrastructure such as environmental defects, or a known lack of information.  Either way, the foundations for the team to work upon aren't ideal.

    One common signature should be the case across all projects/teams, however - a line that remains static at zero implies that the team and project are mature and running healthily... This is something to aim for.
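
    The three signatures described above could be told apart mechanically.  Here's a minimal Python sketch, assuming the signature is recorded as a list of deviations from optimal pace (0 meaning optimal) in time order; the function name, the labels and the tolerance for "near zero" are all assumptions made for illustration.

```python
def classify_signature(samples, tolerance=0.1):
    """Classify a series of deviations from optimal pace (0 = optimal)."""
    if not samples:
        raise ValueError("need at least one sample")

    def near_zero(value):
        return abs(value) <= tolerance

    start, end = samples[0], samples[-1]
    if all(near_zero(v) for v in samples):
        return "steady"        # mature team, healthy project
    if near_zero(start) and end > tolerance:
        return "rising"        # signature 1: internal facets underdeveloped
    if start > tolerance and near_zero(end):
        return "falling"       # signature 2: compensating for a known deficiency
    return "unclassified"      # a shape not covered by the descriptions above


print(classify_signature([0.0, 0.3, 0.8]))   # rising
print(classify_signature([0.9, 0.4, 0.0]))   # falling
print(classify_signature([0.0, 0.05, 0.0]))  # steady
```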

    Feedback on these and other signatures would be received with interest...


  • A Solution to Spam

    I've had a constant e-mail address since 1998 or so.  I've never received more than five to ten real mails a day to that account as I've had others at the same time - whether at University, at work, on Hotmail, and so on.  I've got to keep this account because all the books and articles that are out there have it as my contact details in the biography, and so on.  The problem is, the junk mail is now dwarfing the real mail (I now only receive one or two real mails a week to that account).  Here are a few statistics:

    • An average of 250 junk mails per day are being received, totalling around 5.5MB
    • 67% of the mail is "spam" in terms of items received
    • 33% of the mail is viruses in terms of items received
    • The average size of a piece of spam (not including viruses) is 11KB
    • The average size of a virus is 46KB
    • Microsoft's spam filter (Outlook 2003) catches approximately 30% of all junk mail, set on its least aggressive level
    • My own VBA macro when combined with AVG anti-virus and Outlook's junk mail filter catches 99% of junk mail and viruses
    • On a 600Kbps Internet connection, it takes me around 10 minutes to download and process the rules on my mail each day
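
    As a sanity check, the daily volume can be recomputed from the item counts and average sizes in the list above.  This short Python calculation uses only those figures; the one assumption is that 1MB is taken as 1024KB.

```python
MAILS_PER_DAY = 250
SPAM_SHARE, SPAM_KB = 0.67, 11     # share of items, average size
VIRUS_SHARE, VIRUS_KB = 0.33, 46

# Weighted average size of one junk item, then the daily total.
average_kb = SPAM_SHARE * SPAM_KB + VIRUS_SHARE * VIRUS_KB
daily_mb = MAILS_PER_DAY * average_kb / 1024  # assumption: 1MB = 1024KB

print(round(average_kb, 2))  # 22.55
print(round(daily_mb, 1))    # 5.5
```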

    For me to have to take the time to write a junk mail filter that works better than Microsoft's, this is clearly an issue that irritates me.  When I spotted that the junk mail problem was getting out of control, I started thinking about what the key problems are that led to this, and how to fix them...

    1. One of the main problems is that you can't guarantee the validity of a sender.  Anyone can send me mail, and they can claim to be anyone.  This is due to open relays, the fact that you can set any SMTP details you want on sending, and so on.  This problem will hopefully go away over the next couple of years with new standards that are being put in place to validate servers.  This will no doubt be circumvented by servers being hacked, however.
    2. The second problem is the biggest - other than the cost of electricity/ISP bills, it's free to send e-mails.  Having worked for a marketing company in the past, it's all a matter of numbers...  If it costs you 25 pence (cents) to send an item of mail, and you get a 1% return rate on that, you'd need to make 25 pounds (dollars) from each sale to break even.  But if it costs you a fraction of 1 penny (cent), then you need a much lower return rate, so you can afford to mail a larger demographic of people, and be less selective of the recipients.  This is why there is so much junk mail.
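
    The numbers in point 2 work out as follows.  This is a small Python sketch; the 25 pence cost and 1% return rate come from the text, while the e-mail cost of one hundredth of a penny is an assumed stand-in for "a fraction of 1 penny".

```python
def break_even_revenue_per_sale(cost_per_item, return_rate):
    """Revenue each sale must bring in to cover the cost of the whole mailing."""
    return cost_per_item / return_rate


def break_even_return_rate(cost_per_item, revenue_per_sale):
    """Return rate needed to cover costs at a given revenue per sale."""
    return cost_per_item / revenue_per_sale


# Postal mail: 25 pence per item at a 1% return rate (amounts in pounds).
print(round(break_even_revenue_per_sale(0.25, 0.01), 2))  # 25.0

# E-mail: assumed cost of 0.01 pence (0.0001 pounds) per item, same 25 pounds
# per sale. Expressed as sales needed per million mails sent.
rate = break_even_return_rate(0.0001, 25)
print(round(rate * 1_000_000, 2))  # 4.0
```

    In other words, at that assumed cost a sender breaks even on about four sales per million mails, which is why being selective about recipients stops mattering.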

    The next generation of e-mail

    The way to solve this second problem is to charge for sending mails, even if just a single penny (cent).  To achieve this, there will be a network of mail-authentication servers around the world.  These expose two "Web services" - a SignMessage service, and a CheckSignature service.  Everyone that wants to send an e-mail has to open an account with one of these providers.  Whenever an e-mail is sent from within Outlook/whatever, it would first call the SignMessage service, and get a signature (which would probably just be a GUID).  This would cost a trivial amount.  The message then has a header attached containing the signature.  When the message is received, the mail-server itself or the end-user's mail client sends the signature to the server-network, which validates whether the mail has been paid for or not.  If the signature doesn't validate, or there is none, the mail is destroyed.
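
    To make the flow concrete, here's a minimal in-memory Python sketch of one provider offering the two operations.  Everything about it is an assumption for illustration: the class and method names, the prepaid-balance model, and the decision to make each signature single-use so that it can't be copied onto other mails.  A real network would expose these as Web services rather than local method calls.

```python
import uuid


class MailAuthProvider:
    """Hypothetical in-memory stand-in for one mail-authentication provider."""

    def __init__(self, price_pence=1):
        self.price_pence = price_pence
        self.balances = {}       # account id -> prepaid balance in pence
        self.signatures = set()  # signatures issued and not yet checked

    def open_account(self, account_id, deposit_pence):
        self.balances[account_id] = deposit_pence

    def sign_message(self, account_id):
        """SignMessage: charge the sender and hand back a GUID signature."""
        balance = self.balances.get(account_id, 0)
        if balance < self.price_pence:
            raise ValueError("insufficient funds to sign message")
        self.balances[account_id] = balance - self.price_pence
        signature = str(uuid.uuid4())
        self.signatures.add(signature)
        return signature

    def check_signature(self, signature):
        """CheckSignature: valid only if issued here and not already used."""
        if signature in self.signatures:
            self.signatures.remove(signature)  # assumption: single use
            return True
        return False


provider = MailAuthProvider()
provider.open_account("alice", deposit_pence=2)
header_value = provider.sign_message("alice")   # attached to the outgoing mail
print(provider.check_signature(header_value))   # True: deliver the mail
print(provider.check_signature("made-up-guid")) # False: destroy the mail
```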

    There are two main ways of this system being funded - by the charging for mails, or by a subscription/subsidy model:

    1. The nominal charge for signing the message could easily fund the infrastructure required to do this.
    2. Upon successful receipt of a message, a third Web service could be put in place to refund the sender.  If this were the case, an alternative funding model would be required:
      • All invalid mails would go towards the funding
      • Governments could subsidise the network
      • ISPs could subsidise the network (as it would lower their bandwidth costs incurred from spam)
      • Individual users could "subscribe" to the service for a per-annum charge

    To me, the obvious candidate for running this service would be Google; they've shown an interest in getting into the e-mail market, and they know how to create a massively scalable, high-availability system.


  • Anti-Patterns of Architecture

    There are many books and articles on architectural patterns available. Despite these, systems with bad architecture seem far more endemic than good architecture. Here's a quick taxonomy of the anti-patterns of architecture that I've spotted in systems over the last couple of years:

    1. Catastrophe oriented architecture (or worst-case architecture) - a common anti-pattern that I've previously discussed in a Blog entry. This is where, rather than designing a system based on the current requirements, effort is focused on dealing with designing in support for the possible negative impacts that may occur, leading to a system designed as though every undesirable feature has been made into a requirement. For instance, designing a .NET application to be built in J#, and to only make use of Java compatible functionality, on the off chance that the enterprise strategy changes to a Java platform in the future.
    2. Pattern oriented architecture - A trend that's found in systems created by architects who've recently read and digested books on architectural patterns and come to the conclusion that implementing patterns (such as front-controller, etc) is the key to a successful architecture. Rather than determining if common patterns are appropriate, and picking the most suitable one, all patterns that can be applied are done so, on top of one another, creating an entangled, schizophrenic system.
    3. Enterprise oriented architecture - This is found in applications that sit within a larger Enterprise, where overly-engineered facets are put in place "for the sake of the Enterprise". For instance, using a data-center scale database server for logging exceptions as "it's the enterprise standard", or using an application framework that's larger than the application itself. The impact of this anti-pattern is the increased infrastructural cost and lowered agility of a system.
    4. Interface oriented architecture - Noticeable by the number of interfaces in a system dwarfing the number of classes that actually implement them. This pattern regularly evolves for similar reasons to pattern oriented architecture, where certain more "advanced" features of systems (in this case interfaces), are thought to imbue quality into a system. A good example of this is a system where Customer and Employee both inherit from a base Person, even though there are no common functions in a system that could perform operations on both object-types.
    5. Service dis-oriented architecture - Service dis-orientation is a recent anti-pattern that occurs when the technical principle of SOA gets introduced to an architect without the supporting body of knowledge. The principle, having communication via loosely-coupled Web services, is a technique that's useful in circumstances where distribution and interoperation are a requirement, but it also introduces many drawbacks (transactional incapability, for one). Service dis-orientation usually manifests itself as an architecture where internal objects or large datasets are exposed via the services, ignoring one of the key reasons for using them - abstracting the communication to that of business document, rather than the internal implementation.
    6. XML oriented architecture - a design created by those that have jumped on the XML bandwagon and see it as a panacea to all. Systems designed using this anti-pattern are permeated by XML being passed through all layers - XML in the database, XML between layers, XML & XSLT for the presentation, generally resulting in an architecture with fairly loose tiering. XML is a great way of communicating between heterogeneous systems and representing complex data structures, but, like every technology, it has its limitations. For instance, linking data and functionality is difficult with XML, as, potentially, is storing and manipulating it relationally.

    I'll see if I can think of any more to add to this list as time goes on.  In the meantime, here's a question: Is it "antipattern" or "anti-pattern"?  The Google hit-count seems fairly even on the subject...


  • What is Service Oriented Architecture (SOA) Really

    I've had a number of conversations with people since SOA became the vaunted architecture of the future.  Everybody that I've come across has slightly different thoughts on what it means, and many are in the "it's exposing objects as Web services" camp.  I'm really comfortable with what I think it is now, so maybe this'll help clarify it for a few people...

    Firstly, let's start on what SOA isn't.  It's not a replacement for DCOM or .NET Remoting or any other object technology.  It's not simply exposing objects as Web Services (that approach is in some ways a replacement for those technologies).  It isn't just a way of getting round firewalls to allow remote communication - once XML Firewalls proliferate within enterprises, that problem will resurface in many situations.

    What SOA gives us is a new slant on architecture.  Rather than simply extending the object paradigm that's been evolving over the last couple of decades, it addresses a couple of fundamental shortcomings:

    • The interoperability between disparate platforms (using standards based technologies)
    • The ability to distribute these systems over an arbitrary distance

    I won't dwell on these points as everybody knows about them, and the term "Service Oriented Architecture" makes people immediately think of "Web services" and those two are purely benefits of Web services.  To me, there's a third aspect that's equally important in most scenarios, and potentially far more important in internal systems:

    The ability to design systems around business processes rather than the technical implementation details

    This is a really key point that's not immediately obvious - if an attempt at an SOA system is made where standard objects are exposed via Web services, then it may not be apparent for quite some time that this has been missed.  Designed correctly, the service interface will abstract the internal implementation from the business function.  This is a subtle but critical difference from OO - although OO offers encapsulation, the object relationships are still both apparent and externalised, creating dependencies.  A well designed service will never expose a single facet of implementation.  What this gives a business is the ability to continually introduce change within a service, rather than the slow-down and increasing cost of change you usually get over time in OO systems.

    The best way to think about this is in terms of paper forms in an office.  If you imagine application forms for accounts, invoices, and so on, these rarely change within an organisation, whilst the computer systems that produce and process them change fairly regularly.  If the essence of these forms (the fields on them and the form's purpose) can be captured rather than worrying about the internal details of the implementation, then change can be managed behind the service interface.  A good example of this is representing and managing a Customer within a system.  On first examination, you'd probably come to the conclusion that you'd need an "UpdateCustomer" option.  But that's implementation detail, and the customer object you passed in would probably have fields like CustomerStatus which clearly wouldn't ever exist on a paper form sent from a client.  More likely, there'd be a ChangeAddress or ChangePersonalDetails method; these methods should only take in the business-data required.

    If a system is designed this way, and a change to the address (perhaps to add an extra line of detail) is introduced, there will be very few parts of the system that need to be updated and tested to support this - how much of your system is dedicated to address change?  But imagine that was part of the UpdateCustomer method - that would probably be used throughout the system if implemented, and the change/risk surface would be huge.
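
    Here's a minimal Python sketch of that ChangeAddress idea.  The class names, field names and in-memory storage are all hypothetical; the point is only that the request carries what a paper change-of-address form would carry, while CustomerStatus stays inside the service and never appears in the contract.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ChangeAddressRequest:
    """Business document: only the fields a paper form would have."""
    customer_id: str
    address_lines: tuple
    postcode: str


@dataclass(frozen=True)
class CustomerRecord:
    """Internal representation; it never crosses the service boundary."""
    customer_id: str
    address_lines: tuple
    postcode: str
    customer_status: str


class CustomerService:
    """Hypothetical service with an in-memory store standing in for the back end."""

    def __init__(self, records):
        self._records = {record.customer_id: record for record in records}

    def change_address(self, request):
        # Only the address fields can change; internal status is untouched.
        record = self._records[request.customer_id]
        self._records[request.customer_id] = replace(
            record,
            address_lines=request.address_lines,
            postcode=request.postcode,
        )

    def current_address(self, customer_id):
        record = self._records[customer_id]
        return record.address_lines, record.postcode


service = CustomerService([CustomerRecord("c1", ("1 Old Road",), "AB1 2CD", "active")])
service.change_address(ChangeAddressRequest("c1", ("2 New Street", "Flat 3"), "EF3 4GH"))
print(service.current_address("c1"))
```

    Adding an extra address line here touches ChangeAddressRequest and change_address only, rather than every caller of a catch-all UpdateCustomer.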

    This is the first thing to take away about SOA - it's not about another level of technical abstraction, it's about capturing the business process as-is, defining it precisely with contracts in XML and allowing internal OO change as necessary.  Again, if you find yourself simply exposing an internal object as a parameter/return value from a Web service, then the SOA paradigm has been broken.

    I'll probably write an entry about data immutability in SOAs over the next couple of days, as that's another really important subject...


  • Thoughts from 1998

    Back in 1998 I wrote a few "articlettes" similar in length to those I'm doing for my blog now.  They were each put as a page on my website, which at the time was promoting my skills around web-design and Internet technologies.  I was refreshing a bit of content on my site over the weekend and spotted one of them from around March '98 on my predictions about the .COM boom.  I thought it might be worth posting up here for interest, so here it is, unedited in any way other than to tidy up a couple of typos:

    One of the most important, yet overlooked, facts about business on the Internet is that there is no new money. There is only ever a finite amount of money and a finite amount of consumer purchasing. The key issue is this - for every pound/dollar that is spent on the Web, one will be lost from traditional bricks-and-mortar companies. This is where the problem exists - it is very difficult to prise a loyal, existing customer base away from an established company to a start-up without a decent incentive. The main incentive that Internet based companies try to offer is price due to lower overheads. However, this is easier said than done, because the start-ups don't have the economies of scale that large, existing companies have.

    The second problem is that existing companies will quickly notice if they are losing market share to start-ups - they will create e-commerce sites themselves. These sites may well be able to offer lower prices due to the volumes they are purchasing in. Having said this, there will be successful businesses on the Web - Amazon for example should, eventually, become a profitable company because they appear to have the whole package: Good service, large product range, good prices, and a good reputation.

    On the other hand, many high-profile companies will probably not succeed. A great example of this is LastMinute - they only exist to offload under-subscribed holidays, etc. onto the public. Once the airline and tour operators realise that there is profit here, they will incorporate such deals into their own online booking systems (which most of them are currently working on). LastMinute will find itself competing with its own suppliers for business - with the disadvantage that they have to add mark-up to make a profit.

    For these reasons, the glut of Venture Capital that is being thrown at .COM ideas is sooner or later going to stop - if these new-economy companies don't make a profit then ideas will stop being funded. Valuing these companies at greater than the standard 10x earnings is acceptable if it can be shown that they have a viable business model. The majority of these companies don't as they are not making profit even when there is little competition. Once the big-guns get their new-media storefronts set up, and the world realises it isn't as rich and wise as it thought, the bottom will fall out of the new economy. This may take 2 or 3 years, but it will happen. I don't know about anyone else, but I certainly won't have my money invested in half-baked ideas when this happens!

    In case you were wondering, that text has actually been live on my site ever since it was first written.  Looking back, my only wish is that I'd had more money in .COM ideas up until just before it did all go bang a couple of years later...


  • Outsourcing the Future

    In almost all of the companies where I've worked, the issue of "who does development on the new technology" has arisen.  In a large number of cases, the work has been outsourced to contractors, leaving internal developers disenfranchised and resentful.  The general developer opinion has been that the managers don't care, that they were brought in for their skills and have been left to rot since, and that there's a lack of belief in their abilities.

    Based upon the number of times this situation occurs, I personally think the blame is more attributable to a combination of slight failings from different areas depending on the company (as always, there can be good and bad people in any position):

    1. Contractors generally don't share knowledge when they are hired in.  The contractor usually cuts a solitary silhouette, not engaging with the developers other than to assert their technical superiority, potentially trying to prolong the contract for as long as possible.  The contractor may well even be using the project to skill themselves up on a particular area.
    2. Those that hold the purse strings for the projects/programme (upper management) for generally pushing for a short-term financial gain over the long term success and health of the business
    3. Technical management for not having the belief in and the strength to stand up for their internal staff.  Again, this is taking short term success over long term health of a company.
    4. The project manager/development-lead regularly picks the contractor, basing the decision upon technical skills (as that is the perceived requirement), rather than their ability to ensure that the project is not just delivered, but *maintainable* and understood.  I.e. a contractor that understands that they are there to mentor staff as much as cut code.
    5. And finally, the developers...  The developers assume that the company "owes" them training.  Whilst different people learn in different ways, I don't believe that training courses alone (which is what many developers ask for) really add much value.  I personally find Google and MSDN far better value resources than any course.  If a developer really wants to ensure that they get to work on new technologies, they need to do their bit to stop it from (perceptibly) costing the company so much.  They need to add a percentage to how long "legacy" projects take, spending time learning new skills and techniques, basically "playing the game".

    Still on the subject of developers, the typical training budget of a couple of thousand pounds/dollars a year is nowhere near enough to learn a new technology, so the shortfall will have to be made up somehow.  Rather than a rigid stance that "the company owes it to me", developers should understand the commercial realities of business, and judge:

    • Why they want to work on the new technology - is it for fun, or is it because of current skills becoming obsolete?
    • How can they justify training on a new technology in financial terms?  (I.e. exactly what training do they need, how much will it cost, and how long will it take for the company to make that money back)
    • Is their current company really any worse than any other?
    • What are the risks and rewards of changing job?

    From a management perspective, the problems occur (in my experience) for two reasons:

    1. Bonuses and promotions are usually only based on the success of the projects the individual has delivered, promoting short-termism.
    2. The power generally lies with those that hold the purse strings; people that (rightly) generally don't care about the technicalities of the solution.  One technology may well appear much as any other to such people, so the importance of internal skilling and the relevance of this to the developers may well not be apparent.  (I'll be writing more about this in another entry soon...)

    When this problem occurs, analyse which of the five parties listed above is causing the issue and why.  Then you can determine if you have the power to affect the situation in any way, or if it's better to save your energy.


  • Post-it notes and planning tools

    As other projects have been relatively quiet, a lot of my effort from 9-5 over this last week has been on XP StoryStudio - an XP planning tool that we've been developing at work, and should be open-sourcing shortly.  It's the second version of the tool, with the first already used internally.  So, this version's nearing completion - I've been adding a few reporting statistics such as story-point burndown rate, making it know when stories and iterations should be in-progress/complete based on the tasks within them, etc.  I've got through about 13 stories this week along the lines of "As a developer, I'd like a story to be set to be in progress when I mark a task within it as in progress".
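
    To give a flavour of that last story, here's a minimal Python sketch of deriving a story's state from its tasks.  This isn't StoryStudio's actual code; the function name and status strings are made up for illustration, and treating a story with some (but not all) tasks complete as in progress is an assumption.

```python
def story_status(task_statuses):
    """Derive a story's status from the statuses of the tasks within it."""
    statuses = list(task_statuses)
    if statuses and all(status == "complete" for status in statuses):
        return "complete"
    # Assumption: any started or finished task means the story is under way.
    if any(status in ("in progress", "complete") for status in statuses):
        return "in progress"
    return "not started"


print(story_status(["not started", "in progress"]))  # in progress
print(story_status(["complete", "complete"]))        # complete
print(story_status(["not started", "not started"]))  # not started
```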

    Given that these are standard stories and that the product is nearly finished, you'd think that I'd be eating my own dog-food and using XPSS v2, right?  Well, no.  Not even v1, for that matter.  I've got a whiteboard with post-it notes for the stories on it.  <Looks embarrassed>  I didn't realise the irony of this until Friday afternoon.

    There are two ways to look at this situation: the first is that StoryStudio isn't as great as we thought.  I think everyone in the office is in agreement that it's almost a necessity to use it to get meaningful information about how projects are progressing, especially now this version supports multiple projects running at once.  That leaves the second option; that rather than simply thinking we're being a proper Agile team, going through the motions, "the simplest thing that can possibly work" is alive and kicking in the team; there was no overhead where it wasn't needed.

    That's the moral of the story:  Although it's a good starting point, following books and the like on Agile development isn't truly getting it - methodologies are, by nature, generalisations, and generalisations can always be tailored to be more appropriate to specific cases.


  • Generations of software design paradigm (or "what comes after SOA")

    As time progresses, software development becomes a more mature discipline.  With each "new generation" comes a solution to another problem that was found.  For instance, the generation of C++, Visual Basic, Delphi, and other similar technologies in the early 90s was the generation of Object Orientation - trying to model real world artefacts with their digital equivalents.  Each new generation takes what was provided by the previous generation and builds upon it through abstraction, solving another problem (along with introducing developer efficiency gains), bringing the development paradigm closer to the real-world problem domain:

    • Generation 0 is machine code - direct binary (or hexadecimal) encoding of commands.
    • Generation 1 is assembler.  It's a linguistic abstraction of hexadecimal that makes its representation more human readable.  This solved the problem of having a base language.  Obviously, the assembler was still translated into machine code for execution.
    • Generation 2 should be languages like Plankalkül, but these were never implemented, so we'll have to skip to FORTRAN and the like.  This generation adds the ability to build multiple operations up into statements and group these together.  It solved the problem of "operations" - being able to represent a real world action in programmatic terms.  Such languages regularly compile down to Assembler before machine code.
    • Generation 3 introduces object orientation - a logical grouping of commands and data to represent an entity, solving the problem of dealing with related entities.  But these are still made up of extended generation 2 code.

    If you look at each of the languages in the final two generations above, you'll see that early attempts were extended by more thorough alternatives as the generation progressed.  For instance, Pascal is often considered a richer language than FORTRAN.  The same is true of generation 4 - the point we're at now.  Generation 4 started with Component orientation - the grouping of objects into collective units - another layer of abstraction.  This is now continuing with the introduction of SOA.  What problem is being solved now?

    With the introduction of OO we're now at a point where we can effectively model the real world and the interaction of items within it.  What that's led us to is a realisation that we have a few more problems once this representation of the world is in place.  These problems are:

    • Location
    • Integration
    • Change

    The reason that this was difficult with just OO is that it encapsulates the logic and internal structure in one place, is platform dependent in its execution format, and externally exposes interrelationships, creating sprawling dependency diagrams making change difficult.  Current generation "orientations" get around this by grouping the objects into logical sets that are interdependent, and isolating them from the implementation of further components/services.  COA is almost a halfway house that we get for free by designing OO systems carefully, solving some of the change problems, and allowing earlier attempts at decentralisation (DCOM, for instance) to be more feasible.  SOA takes the COA model, and extends it by allowing us to define contracts in a standardised format that have no bearing on the underlying data model:

    • Allowing for this copy of the data to move between locations
    • Allowing heterogeneous systems to interoperate through homogeneous data items and protocols
    • Allowing rapid internal change without dealing with the dependency issues of a large OO system, due to the abstraction of the contract.

    SOA really is a huge step forwards.  In many ways, it's a bigger change than OO if understood and approached correctly.

    So, where's this 'blog entry going?  Looking to the past, we can predict the future.  Firstly we can see what will happen with SOA adoption:

    1. People will get it wrong and think that simply by using SOA technology, they'll have a good SOA architecture.  I've seen some truly non-OO systems developed in C++, and I bet there'll be some lousy SOA attempts that actually just expose objects over Web Services.
    2. It will add an overhead that some purists won't accept, and it will leave "niches" (quite big ones) in the previous generations for more specialized applications

    Secondly, we can take a guess at what the replacement for SOA will look like:

    1. It'll most likely be in the form of another level of abstraction/grouping that makes use of SOA under the hood
    2. It'll solve the most common problems that arise when creating SOA applications - the issues that are "worked around", and will make developers a bit more productive in the process.

    Having worked on some pretty big SOA systems, the biggest pain I come across is coordination.  Services simply don't currently support transactions, especially long-running ones (and in another post I described why I don't think they ever *really* can).  There's also an issue I touched on in another post about the standardisation of data and operations.  This new architectural paradigm will most likely give us a platform that supports the logical grouping of services into things that can be coordinated together.  Microsoft's upcoming Indigo technologies will be part of that journey, as will more mature evolutions of coordinating engines such as BizTalk.  Again, we'll be getting closer to the real world problem/solution, by having diagrams of virtual "paper forms" flowing between "inboxes".  Personally, I see Indigo being to SOA what components were to OO - an extension that doesn't define a new generation, but does a good job of cosmetic surgery to prolong the shelf life.

    To give an analogy, I see the evolution of software systems a lot like the evolution of shopping.  Thankfully, others have independently come up with the same comparison (Pat Helland of Microsoft, for instance).  You start out with independent shops, which are like unrelated programs.  You can work your way through the growth of the 3-or-4 store group (un-connected product suite) that you get in the suburbs, up to the current shopping centres.  These are like current enterprises - everything fits into regulation sized slots, and can allow certain things to interoperate across them (services such as security, telecomms, cleaning, etc.), but they're not really fundamentally integrated.  The goal is the hypermarket where everything can be bought from micro-stores within the one big store - a single trolley, a single bill, a single payment.  (If you're in the UK, think of the Market Street we have at Morrisons).  To get to this point, we'd need common definitions for a customer, a bill, and actions such as DebitCard.

    That's where I hope the next generation will get us to - the operator and data standardisation talked about.

    Having re-read the bold claims made in this about the potential of SOA, I think an entry on what it really means is necessary over the next few days.


  • Virtual Evolution via TDD

    Whilst falling asleep last night I had a thought - the suite of tests in a Test Driven Development project actually forms a fitness function, one that determines how fit-for-purpose the system they're associated with is.

    That, in itself, isn't really a revelation.  But one of my other interests is anthropology and evolution.  And as soon as I thought of the term "fitness function", I started considering the implications.  In the real world, a fitness function can determine whether or not any organism can survive.  In principle, the same is true of a full acceptance-test suite for a computer system.  So, if generations of organisms can mutate of their own accord until they meet the requirements of a fitness function (or become extinct), then that rule should also apply to computer systems, shouldn't it?

    The idea of evolutionary software isn't a new one - I did a little work on genetic algorithms at university, and there's a good book called Virtual Organisms I once read on the subject - those are both going back a few years now.  But the thought of us already being in a place where it's feasible to do this, and with the current development practices, is new to me at least.

    As always, there's a problem... you can pretty much guarantee that you don't have 100% test coverage of your system.  Basically, not all of your requirements are captured in tests.  Aesthetics are the first things to get missed, followed by other areas such as security and performance.  Without tests for all requirements, an ideal (or even acceptable) solution will never be met.  For instance, imagine the fitness function for surviving on the African Savanna.  It would consist of the ability to withstand prolonged heat, ultraviolet exposure, periods without food and water, and so on.  What about the requirements for co-existence, though?  The ability to outrun a cheetah, or duck from an eagle.  If just one requirement isn't captured, the virtual function won't define the real-world success of an organism.  In the same way, the programmatic tests won't define the real-world success of a system if a business requirement is omitted.
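
    The idea and its catch can be shown in a toy form.  This is only a minimal sketch under stated assumptions: a "system" is a list of eight on/off features, the real world needs all eight, and the hypothetical test suite covers only the first six.  The names (REQUIREMENTS, TESTED, run_tests, mutate, evolve) are invented for illustration and the search is plain random mutation with selection, not a full genetic algorithm.

```python
import random

# Hypothetical: the real world needs all eight features switched on.
REQUIREMENTS = 8
# Assumption for illustration: the test suite only covers the first six.
TESTED = range(6)


def run_tests(candidate):
    """Return how many tests pass; this count is the fitness score."""
    return sum(1 for i in TESTED if candidate[i])


def mutate(candidate, rng):
    """Copy the candidate and flip one randomly chosen feature."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] = not child[i]
    return child


def evolve(rng, max_generations=10_000):
    """Mutate until every test passes, keeping children that score no worse."""
    current = [False] * REQUIREMENTS
    for generation in range(max_generations):
        if run_tests(current) == len(TESTED):
            return current, generation
        child = mutate(current, rng)
        if run_tests(child) >= run_tests(current):
            current = child
    return current, max_generations


survivor, generations = evolve(random.Random(2004))
print("generations needed:", generations)
print("passes every test:", run_tests(survivor) == len(TESTED))
print("meets every real requirement:", all(survivor))
```

    The survivor always ends up passing the whole suite, but the two untested features are left to chance, which is exactly the cheetah-and-eagle problem: the fitness function only selects for what it measures.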

    So, will this ever be possible?  With some technologies that have been stretched beyond their initial purposes, probably not.  HTML is a lousy way of creating dynamic interfaces, so the effort required to be able to capture 100% of UI requirements in tests is unlikely to pay dividends, meaning manual development.  But with the new Avalon technologies in Windows Longhorn and other similar approaches, programmatic testing of the UI is more feasible.  Similarly, security tests for buffer overflows, etc. are becoming less of a requirement as platform support is built in.

    It may not happen for many years, if ever, but just imagine the day where you define precisely what you want your system to do, go and make a coffee, and come back to find it's evolved for you and has interacted with other virtual organisms to deploy itself to a live environment.


  • Good Architecture == Bad Architecture

    Today, I spent some time questioning what makes companies decide to migrate to new architectures.  The conclusion that I came to was based upon the "architectural change is a tax to pay/avoid" mantra that I think Business has.  Basically, change is often only approved where necessary or with zero cost/risk; if the current architecture is seen as failing now, can be shown to fail in the near future, or change has no real cost/risk associated with it then there is a good reason to change or little reason to avoid change.  If the risk of staying on the current architecture vs. migrating can't be shown to be a really clear call, then change is less likely to be approved.  This is where I've found a problem to exist:

    If your current architecture is well designed, you may not be able to justify moving off it.

    Obviously, in companies where a more equal technology-business partnership exists, this isn't as much of a problem.  For others it can be; in the long term, a migration might prove really valuable - a change to SOA may help you become more agile by being able to switch partners more readily.  But if that's not a current requirement, how do you justify change now if, say, you've got quite a robust component oriented DNA architecture?

    Hence a relatively good architecture is sometimes actually a bad architecture.

    I'll qualify that a bit; it's only actually bad in terms of migration.  But surely a good architecture is one that supports change at low cost, not just one that's technically impeccable?  If you bear the need for future change in mind whilst developing on an existing system, this can then be supported much more easily than through retrofits, as is always the case.

    How migration support is built into a system has to be approached on a case-by-case basis.  Several standard approaches exist, though, such as  using data/transport formats that are common to both the old and new platform at certain shearing points.  But these are really just the same techniques as used for supporting internal change.

    If you think about what's just been written, there are two things to note:

    1. It's actually shifting the cost of transformation - rather than incurring it at the migration point from Generation.N to Generation.N+1 architecture, it's spreading it across the development of Generation.N as well.  
    2. It's altering the types of migration that are available.  With common data/transport formats, encapsulated logic, etc. you can implement a piece-meal migration, rather than a big-bang approach necessary with systems that consist of, for instance, one giant object model.

    So, the title of this entry is possibly a bit misleading.  There's an important principle, though: systems as a whole are iterative, and creating an internally perfect Generation.X of a system won't necessarily help, and can hinder the creation of Generation.X+1 - if a system is architected to be good for the current situation, that may well be all it's good for.  In the worst case, if a large part of the Generation.X+1 budget is spent on migration issues, your architecture could degenerate over time rather than improve.  Personally, I think this is part of the reason that the "shiny new architecture" that's promised always appears a little tarnished by the time it's introduced.  I'll look at the other reasons for that in another entry...


  • Transactions and Services

    I haven't got a definitive answer on this, but I came up with a question today.

    Q: Are transactions and services mutually exclusive?

    This query is based upon the fact that a service is inherently self contained - that's why it's a service.  On the other hand, a transaction in many ways implies containment.  Now, I'm sure that technically we can implement a transaction object and mechanism that would allow Commits and Rollbacks across multiple services in a singular fashion.  It's clearly achievable (Turing machines and all).

    But, does that mean that we should do that?  If we're calling a service, it's up to the service to figure out how long to take over executing; we can't assume anything about how much load it's under, what its internal state is, etc.  So how can we instruct one service to lock resources based upon the presumed state of another?  With short-running (synchronous) calls this is a problem, but with long-running (asynchronous) calls it's a nightmare.  It kind of implies that a service should have to take into consideration the context in which it's being used, and what other service-chain a consumer might be plumbing it into.  Again, this would turn the typically introspective service on its head.

    I toyed with the idea of a separate transaction service that everything that's "enlisted" in a transaction calls down into to try and tip the balance back to the main service being in control again.  It still doesn't solve this issue of containment, though.
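
    To make the containment problem concrete, here is a minimal sketch of a separate transaction service, written in the style of a two-phase commit.  That protocol is my substitution for illustration; the entry doesn't say how such a service would work.  The Service and TransactionService classes and their methods are hypothetical, and everything runs in one process with no network or timeouts.

```python
class Service:
    """A hypothetical service that can tentatively hold a change."""

    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.locked = False
        self.committed = False

    def prepare(self):
        # The service has to lock its own resources and keep them locked
        # until an outsider tells it the outcome, however long that takes.
        if self.will_succeed:
            self.locked = True
        return self.will_succeed

    def commit(self):
        self.committed = True
        self.locked = False

    def rollback(self):
        self.locked = False


class TransactionService:
    """Coordinates every service that has been enlisted."""

    def __init__(self):
        self.enlisted = []

    def enlist(self, service):
        self.enlisted.append(service)

    def complete(self):
        prepared = []
        for service in self.enlisted:
            if service.prepare():
                prepared.append(service)
            else:
                # One refusal undoes everyone who had already prepared.
                for earlier in prepared:
                    earlier.rollback()
                return False
        for service in prepared:
            service.commit()
        return True


billing = Service("billing")
shipping = Service("shipping", will_succeed=False)
transaction = TransactionService()
transaction.enlist(billing)
transaction.enlist(shipping)
print("committed:", transaction.complete())
print("billing committed:", billing.committed)
```

    Even in this toy, billing's locks are held for exactly as long as shipping takes to answer, so each service's behaviour now depends on neighbours it knows nothing about.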

    I'm going to leave this topic for a while and see what others come up with.  My personal opinion is that although SOA is the current buzz-word architecture, it's merely a step down the road, and that when certain things start getting shoe-horned into a technology, it's maybe time for a new TLA and a new paradigm.  The generation N+1 replacement for SOA will probably get around the issue of data being disconnected and out of date.  At this point the transaction problem will go away.  And I'll be posting on that shortly... (I wonder how long it'll be before the term "serviceable object" comes into common use)


  • XML - Just another TLA?

    XML.  Possibly the most highly vaunted TLA of the 21st century thus far.  Religious wars continue on value-vs-attribute - even I'll join in on that argument given half a chance.  Systems everywhere are burning CSV logs on pyres in the race for XML as an output format.  Relational databases are being contorted to support the storage and querying of this data.  At this rate, the angle bracket will outstrip the full-stop as the most commonly used punctuation mark by the end of 2006.  (OK, so I made the last one up, but it wouldn't surprise me)

    There are two key aspects to XML:

    • It can represent arbitrary data
    • It can be interpreted by almost any system.

    I'm sure that other people would argue that the self describing nature of it via schemas is equally/more important.  But this is my Blog, and I don't think that's half as important. Even if it is self describing, you still need to do something with it to get the data you want out - you still have to have your applications understand what's in it, and that still means coding.

    The key point is that the angle brackets are irrelevant.  In fact, the whole structure of XML is irrelevant - XML, in itself, is meaningless.  Any format that allows the encoding of arbitrarily complex data in a uniform manner would suffice.  The fact that it has nice bells and whistles like 7-bit encoding helps, of course, but it's not the key.

    What XML gives us is a means to transmit hierarchical data between locations and query it:

    • It saves us from having to write object models.
    • It saves us from having to create database schemas to represent everything.
    • It saves us from having to write custom ways of interpreting data from external systems

    But if you look at that list above, you'll see that there are times when the inverse of each of those points may be really important.  You may want an object model as it allows methods to be associated with data.  You may want a database schema to allow efficient querying across large data-sets.  You may want a more efficient means of communication with another system.
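
    As a small illustration of both sides, the sketch below first queries a made-up order document directly, then wraps the same data in a tiny object model to get behaviour attached to it.  The document, the OrderLine class and its total method are all invented for the example; only the standard library's xml.etree.ElementTree and dataclasses modules are used.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample document, not taken from any real system.
ORDER_XML = """
<order id="42">
  <line sku="A1" quantity="2" price="3.50"/>
  <line sku="B7" quantity="1" price="10.00"/>
</order>
"""

root = ET.fromstring(ORDER_XML)

# 1. No object model: query the hierarchy directly.
skus = [line.get("sku") for line in root.findall("line")]
print("skus:", skus)


# 2. An object model: more code up front, but methods live with the data.
@dataclass
class OrderLine:
    sku: str
    quantity: int
    price: float

    def total(self) -> float:
        return self.quantity * self.price


lines = [
    OrderLine(line.get("sku"), int(line.get("quantity")), float(line.get("price")))
    for line in root.findall("line")
]
order_total = sum(line.total() for line in lines)
print("order total:", order_total)
```

    The first approach is quicker to write; the second costs a class definition but gives the calculation a natural home.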

    One of the problems with the XML - objects - DataSets - whatever-else relationship is that it's quite difficult to change a system oriented towards one approach to another.  SOA will help if the service interfaces are carefully defined and shield the consumer from the internals of the system.  Newer technologies such as SQL Server 2005 and other databases that have decent native support for XML blur the lines, as does XML-serialization of objects.  The price to pay if the wrong option is chosen - the cost of change - is still very real, though.

    The point of all of this?  XML has earned its place in the world: it both saves us time when we don't need the benefits that additional overhead may bring (databases, object models) around what we're doing, and it allows for great machine-readable external communication.  This is what it really stands for.  But anyone that says they've created an entire enterprise infrastructure based entirely and unrelentingly around XML would give me cause for concern (and I've seen systems like this).  It's a technology to be used judiciously, just like any other.  There is no one technology that's good for everything - in the majority of cases the very benefits that each option offers us are also its drawbacks in another form.



  • XML Explicit

    On various MS code help forums I've seen dozens of posts along the lines of "How do I get a root tag in a FOR XML EXPLICIT statement?"  There have been numerous responses:

    • Do three SELECTs, with a "SELECT '<root>' FROM [Foo]" as the first one
    • Read the data back into one string, and append the opening/closing tag
    • Etc. etc.

    These all feel like "hacks" to get round the real issue - SQL Server creates structured XML by converting flat, tabular data into a hierarchy using level-indices (i.e. an integer from 0..N).  All we actually need to do is create a single element at a level higher.  So, given the following SQL (Run against Northwind):

      SELECT
        1            AS Tag,
        NULL         AS Parent,
        [CustomerID] AS [customer!1!customerid]
      FROM [Customers]
      FOR XML EXPLICIT

    We can add a root element of "customers" by adding the following preceding SQL:

      SELECT
        1    AS Tag,
        0    AS Parent,
        NULL AS [customers!1!foo],
        NULL AS [customer!2!customerid]
      UNION ALL
      SELECT
        2            AS Tag,
        1            AS Parent,
        NULL         AS [customers!1!foo],
        [CustomerID] AS [customer!2!customerid]
      FROM [Customers]
      FOR XML EXPLICIT


    To see how this tabular -> hierarchical conversion takes place, simply omit the FOR XML EXPLICIT clause from the end of the query when run in Query Analyzer.
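
    The conversion itself can be imitated outside the database.  The following is a rough sketch, not SQL Server's actual algorithm: it assumes each row hangs off the most recent element whose Tag equals the row's Parent, and it only handles attribute columns named element!tag!attribute.  The function name rows_to_tree and the sample rows (two Northwind-style customer IDs) are my own choices for illustration.

```python
import xml.etree.ElementTree as ET


def rows_to_tree(rows):
    """Build nested elements from Tag/Parent rows (simplified model)."""
    latest = {}  # tag number -> most recently created element at that level
    root = None
    for row in rows:
        tag = row["Tag"]
        element = None
        for column, value in row.items():
            if column in ("Tag", "Parent"):
                continue
            name, column_tag, attribute = column.split("!")
            if int(column_tag) != tag:
                continue  # this column describes a different level
            if element is None:
                element = ET.Element(name)
            if value is not None:  # NULL values produce no attribute
                element.set(attribute, str(value))
        if element is None:
            raise ValueError(f"no column describes tag {tag}")
        parent = latest.get(row["Parent"])
        if parent is None:
            root = element  # Parent of 0 or NULL: top level
        else:
            parent.append(element)
        latest[tag] = element
    return root


# What the query would return as a flat table without FOR XML EXPLICIT.
rows = [
    {"Tag": 1, "Parent": 0, "customers!1!foo": None, "customer!2!customerid": None},
    {"Tag": 2, "Parent": 1, "customers!1!foo": None, "customer!2!customerid": "ALFKI"},
    {"Tag": 2, "Parent": 1, "customers!1!foo": None, "customer!2!customerid": "ANATR"},
]

tree = rows_to_tree(rows)
print(ET.tostring(tree, encoding="unicode"))
```

    The first row creates the empty customers element, and every later row with a Parent of 1 is nested inside it, which is why one extra SELECT is enough to produce a root.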


  • Metaphor and Analogy

    In general, humans learn from experience.  Either directly, or indirectly through the experience of others.  So, we refer to previous examples of similar patterns and behaviours to explain new ones that occur.  The benefits of this are clear - it allows us to more readily understand and convey concepts without the need for a new frame of reference.  This is the use of metaphor.

    However, there are also drawbacks to this.  Firstly, the word CONCEPTS was used above.  That's important as DETAILS aren't conveyed well with the use of metaphors.  If the details were the same, it wouldn't be a metaphor, it would be a recurrence of the exact same event.    As the details ARE different, that means there is a point (a level of detail) at which any analogy (metaphor) becomes redundant.

    How do we know when the metaphor stops being applicable?
    To deal with inconsistencies between the new situation and the metaphor used, tweaks are generally added to allow the continuing usage of the point of reference.  This is worth noting as at a certain level of detail, the effort involved in maintaining the metaphor will outweigh the value that using the metaphor at all adds.

    Predicting the future based on the past
    One of the most commonly used reasons for metaphor, besides giving a commonly understood point of reference, is to predict what will happen going forwards based upon events that occurred in the realm of the metaphor.  But these predictions may be affected and invalidated as the metaphor, by definition, has differences in the detail.  These differences can be of one of three types:

    • The metaphor will have extra detail that isn't present in the current situation
    • The metaphor will be missing detail that is present in the current situation
    • External factors will have different weightings between the two situations (a discrepancy within a similar detail)

    Why does this matter?  And how do these items link together?
    By determining where the metaphor doesn't match the current situation, we can find the factors that will negatively impact the reliability of predictions, and take them into account when judging their merit.

    The second thing to take away from this is that time should be spent judging at what level of detail any given metaphor is appropriate.  Top-down diagramming of a system and then finding the appropriate level up-front is a useful tool, as it allows you to see at what point you are going to be over-investing in the analogy, and where you are going to start drawing false conclusions.

    Remember: metaphor and analogy are great at giving people a high level understanding of a new problem.  If they want more detail than that, then there is a price they have to pay in their grasp of the situation.


  • Archiconjecture

    Having experienced the phenomenon for quite some time, today I've christened it: "Archiconjecture".

    This is the endless contemplation, discussion, postulation, and high-level design that goes on between architects designing systems out of context of the real world.  It is a situation that arises when there is an imbalance in the relationship between the technology and the business it "serves".  The choice of the word "serves" is specific - if a partnership exists between the two sides of the company, then this imbalance is far less likely.  However, when the business feels that technology investment is a tax to be paid if necessary and avoided where possible, as is often the case in this post .COM IT-sceptic marketplace, the ability to introduce technology change is restricted.

    Whilst it is easy to blame the business for a rigid stance, the blame really lies with the lack of a common understanding and focus of effort.  The business is concerned with only one thing - maximising profit at minimum risk.  This is where their effort is expended.  Some "architects" expend their effort designing the most complete, technically immaculate systems they can.  This is where the disparity occurs.  Whilst the design of a system may be immaculate - presenting no risks in terms of implementation costs, running costs, or security flaws - this somewhat misses the point: the risk may well be in whether the cost of implementation is outweighed by the value it will add once completed.

    To avoid endless archiconjecture, there are two things to understand:

    • The business must understand that an architecture needs maintenance to remain agile - it can't be left to stagnate or the cost of projects in the long term will rise.
    • The architects must understand that technology is only used in companies as a means to an end, and that any cost systems incur is taken directly off the company's bottom line.  The "best technology" and the "best technology for the business" are different things

    And there are two tasks to undertake:

    • The business must define precisely what value any given system will provide, and consider the value of any system to the company as a whole, not just their specific project.
    • The architects must design a solution that costs less than this value, with any technically unnecessary but desirable architectural change defined in terms of cost and value.

    The day that you spot archiconjecture in a company is a good day to organise a meeting between architects and the Business, where they can discuss change in terms of risk, value, and agility.


  • Standardisation of Web Services

    Now that the Service Oriented Architecture (SOA) bandwagon is in full swing, and real-world implementations are starting to proliferate, a physical crack is starting to appear in the veneer...  the (lack of) homogeneity of data.  It is a key area of focus in many systems and enterprises.  How many different definitions of a customer do you have?  How does that differ from your supplier's definition of a customer?  How do you reconcile these differences without having mapping functions at every level that create lossy-systems?

    Those aren't questions I'm going to talk about here.  That's not to say they're not important - they're critical - but in the scramble to deal with those issues, something's been forgotten...

    In an SOA, data can be thought of as OPERANDS - the values passed into functions.  These functions are the OPERATORS; "create customer", "debit account", etc.  It's these operators that have largely been ignored; everyone's focussed on data rather than operations.

    QUESTION: When you submit the details of a new customer to a service, what should it return?  Is it a customer instance?  A customer ID?  A result code designating the success/failure of the operation?  Or is it a void function?  This is the first area where heterogeneity of services becomes apparent... there's no commonly agreed standard on what data is returned from a standard type of operation.  This is visible/external or "interface heterogeneity" (my terminology).

    QUESTION: When you submit invalid details of a new customer to a service, what should it do?  Should it throw an exception?  Should it create a record but flag it as invalid?  Or should it just not do anything?  This is an example of invisible/internal or "implementation heterogeneity" (again, my terminology).
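
    Both kinds of heterogeneity show up in a few lines.  In this sketch the two supplier classes, their return values and their error handling are all hypothetical; the only point is that the call looks identical from the outside.

```python
class SupplierA:
    """Hypothetical service: returns just an ID and rejects bad data."""

    def create_customer(self, name):
        if not name:
            raise ValueError("name is required")
        return 1001


class SupplierB:
    """Hypothetical service: returns a whole record and flags bad data."""

    def create_customer(self, name):
        return {"id": 2001, "name": name, "valid": bool(name)}


# Interface heterogeneity: same operation, different kind of result.
for supplier in (SupplierA(), SupplierB()):
    print(type(supplier).__name__, "returned", supplier.create_customer("Ann"))

# Implementation heterogeneity: same invalid input, different behaviour.
flagged = SupplierB().create_customer("")
print("SupplierB stored an invalid record:", flagged["valid"] is False)
try:
    SupplierA().create_customer("")
except ValueError as error:
    print("SupplierA refused:", error)
```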

    Once the world is in a euphoric state because of operand homogeneity - that magical day where we all have a single definition of a customer, and we all know how many lines to store in an address, the ugly head of operator heterogeneity will rear up to burst our bubble.  Why does it matter?  It matters because without operator homogeneity services aren't interchangeable.  SOA gives us two main benefits: compartmentalisation of functionality & data and standardisation.  This standardisation gives us commonality in:

    • Transport protocols (HTTP)
    • Locations of resources (URIs/URLs)
    • Data-format (XML)
    • Meta-data (security, etc. via SOAP-headers and standards like WS-Security)

    When you list all these benefits, it looks like a state of plug-and-play Nirvana... if you don't like one partner for billing a customer or booking a cargo container, you can just point at a new one, and away you go.  But, unless the way in which the new service works is comparable to the old one, a great deal of redevelopment may be required.  This is especially important as it may LOOK as though no changes are needed at all - you may have interface homogeneity, allowing code to compile and execute seemingly without the need for alteration, but that's not to say there aren't fundamental discrepancies between what's going on under the hood.

    How can this be fixed?
    Internally within a company the benefits of SOA are largely around compartmentalising the systems to allow change.  Interface and implementation heterogeneity will cause pain, as it'll be difficult to deal with boundary conditions without a common set of guidelines around when SoapExceptions are used, what's returned from a method, what happens when bad data is submitted, etc.  But with all of this in place, the problems won't be insurmountable.

    Externally, the problem is much greater.  Without a great deal of effort being expended on communication, and impartial standardisation being introduced, this issue isn't going to go away.  For some systems that only have minor hooks to the outside world this will again be solvable on a case-by-case basis.  For those businesses that rely on the agility afforded by the ability to change suppliers/partners this is a major issue, though.  And over time the problem will become greater as more and more systems are linked together.

    Does this mean that, for now, you should make every service call configurable to deal with potential differences between services consumed if agility is a key requirement?  In my opinion, a general-case answer is "no".  There are so many variations on implementations, the cost will never be reclaimed.  I would simply isolate the external call - encapsulate it - so that if/when it does have to change, the scope of the impact will be both measurable and minimized.
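
    One way to picture that isolation is a thin gateway per partner that turns each partner's quirks into a single internal result.  This is a sketch only: the partner clients, their method names and their failure conventions are invented, as is the DebitResult type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebitResult:
    """The one shape the rest of the system ever sees."""
    success: bool
    reference: Optional[str]


class PartnerOneClient:
    """Hypothetical partner: raises on failure, returns a reference."""

    def debit(self, account, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return f"P1-{account}-{amount}"


class PartnerTwoClient:
    """Hypothetical partner: returns a (code, reference) pair, 0 meaning OK."""

    def debit_account(self, account, amount):
        if amount <= 0:
            return (1, None)
        return (0, f"P2-{account}-{amount}")


class PartnerOneGateway:
    def __init__(self, client):
        self._client = client

    def debit(self, account, amount):
        try:
            return DebitResult(True, self._client.debit(account, amount))
        except ValueError:
            return DebitResult(False, None)


class PartnerTwoGateway:
    def __init__(self, client):
        self._client = client

    def debit(self, account, amount):
        code, reference = self._client.debit_account(account, amount)
        return DebitResult(code == 0, reference)


def take_payment(gateway, account, amount):
    """Business code: knows about gateways, never about partners."""
    return gateway.debit(account, amount).success


print(take_payment(PartnerOneGateway(PartnerOneClient()), "ACC1", 10))
print(take_payment(PartnerTwoGateway(PartnerTwoClient()), "ACC1", -5))
```

    Swapping partners then means writing one new gateway class, so the impact of the change is confined to a place you can point at.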


  • Worst Case Architecture

    I've spotted a (worrying) trend amongst architects over the last couple of years.  Whenever designing a new system, there's a tendency to spot the potential for a change in requirements in the future that would negatively impact the system, then design that in from the start.  This is what I've termed "worst case architecture": rather than designing a system that is either inherently flexible or cheap enough that throwing it away isn't a great loss, the architect becomes a seer, predicting what s/he thinks would have the most costly impact, and designing it in from the start.  This is the cause of several problems:

    • It inherently makes the system cost more as features are only ever added, never taken away
    • It designs undesirable features into a system from the start
    • It forces long term strategic decisions to be made based on presumptions, rather than with all the facts
    • It regularly forces the company down that undesired path as ROI is needed on the expenditure

    There's a great example of this I've seen in the course of my employment.  The company in question's long term strategy was to migrate everything to .NET.  However, there was a supposition that the current Java container would not suffice if ever the decision was made to bring in out-of-the-box Java based solutions for certain parts of the enterprise.  This led to the decision to replace the NAS layer that was currently in place (and which was to be disposed of) with a full WebSphere implementation.  Obviously, once that was implemented there would be a need to get a positive ROI on it, leading us down the route of buying in/developing Java systems, de-railing the long-term strategy.

    I've spotted occurrences of worst case architecture at all levels - small facets within applications as well as enterprise wide strategic decisions; it can crop up in the design of any system.  This has led me to the following mantra whenever I'm architecting a system:

    • Positive features should be designed in
    • Negative features should be designed out

    So, what's the alternative?  As discussed in another 'Blog entry I'll be posting soon, I'm a believer in JIT (Just-In-Time) Architecture or "Architecture on Demand"... where you purposely don't design something until you hit the point of needing it, choosing rather to compartmentalise and encapsulate functionality.  After all, that's one of the benefits of SOA, right?  To me, a large part of the reason for having an architect is that they should have "been there and done that", or at least be able to relate to a similar situation, stopping such undesirable end-points from ever being reached, rather than expediting them!