Fluid Services
September 16, 2010
Introduction
As organizations continually build their software integration architecture based on the SOA paradigm, more and more services are being developed and reused to build other services. Just as OOD and CBD paradigms introduced code reuse in applications and component reuse across applications, SOA has brought the advantage of enabling reuse across distributed applications and platforms with flexibility and agility.
However, as systematic reuse of such services becomes more and more widespread, performance is becoming a real concern: latencies introduced at each back-end call accumulate, large units of work hinder the use of parallelism, and chained service calls waste large amounts of resources, degrading scalability. SOA has to address these problems to advance to the next level of maturity. This article analyzes some of the important bottlenecks and proposes a new approach for rethinking and redesigning existing services to use stream-oriented rather than message-oriented communication. The aim is to make services more responsive, which will in turn encourage more service reuse, increase composability, and provide better development agility without the performance concerns.
Revisiting the Goal of SOA (Service-oriented Architecture)
SOA emerged mainly with the prospect of solving the agile software integration needs of growing organizations. SOA borrows the contract-based approach of component software and applies similar principles to enable independently developed systems to communicate. Once contracts are established and agreed upon, it is possible to develop systems in parallel, integrate them easily, and evolve them over time without always having to be in lock-step. It also ensures that future consumers or even future providers of data can continue to deliver the services using the same contract without breaking other parties in the conversation. Thus, SOA manages the asynchrony between naturally independently developed systems and their emergent complexity. To achieve this goal, SOA’s strategy is to lay down the guidelines for establishing the necessary agreement or contractual platform that allows systems to co-evolve over time even if they are not in sync.
This strategy is actually just a scaling up of a principle in software engineering that has been addressed by various methodologies: managing complexity. By acknowledging the inherent characteristics of such distributed development, its asynchrony and its independence, SOA was able to address the issue very wisely. SOA’s success can be observed in the fact that more and more systems are written as services and built on top of other services. Large-scale distributed computing has thus been rendered accessible, and the complexity that plagued old monolithic systems has been reduced to a manageable level.
Problems of SOA
On the flip side, chaining service calls to reuse existing services leads to performance problems that must be addressed, and unfortunately SOA does not address those problems directly. If you think about it, none of the four tenets of SOA suggests a solution. So let’s analyze the performance-related problems that organizations face, especially at a stage where SOA is just about to take off:
Scalability
Services are usually designed to handle a known or estimated number of simultaneous clients. However, SOA’s goal is to make services reusable, which means there will necessarily be more and more clients over time, either because of the increase in the number of end users or because of the increase in the number of back-end service calls caused by chaining. When new services are built to use the existing services, the existing services quickly start failing to meet the originally intended and promised SLAs. Throughput drops, and latencies and downtime increase.
Latency
Chaining services to others or inserting intermediaries that process messages only adds to the latency. For example, if a service-oriented architecture includes an orchestration layer, responsiveness of services becomes a high priority. Latency is the worst kind of performance bottleneck, and it is hard to address with the currently used methods. Intermediary hops multiply the latency caused by large data transfers. Simply adding more servers does not solve the latency problem, because each call is an atomic unit of work which is usually not designed to be parallelizable. In an era in which parallel computing is being resurrected, SOA does not provide the necessary level of guidance for future services to be developed to meet increasing load and data size.
Composability / Reuse
A direct consequence of cumulative latency is that services exhibit poor composability and reuse characteristics. When new services are designed, reuse of existing services is avoided because of performance concerns. This means services are not as valuable an asset to an organization as they should be. This issue is actually more apparent in organizations that have a sufficient level of maturity to start systematic service reuse. Reimplementing new services each time new integration needs arise means there is essentially no reuse at the service level. It also leads to direct app-to-app integration, which somewhat defeats the purpose of ‘service-orientation’ because in such a world applications take priority over services: the exact opposite of what is preached by the SOA methodology.
Parallel Computing
Since SOA-style services tend to have coarse-grained contracts, more callers mean more data transfer and more processing, and the work cannot be easily parallelized. The end of Moore’s Law as we know it is shifting the ever-increasing performance demands to multi-core hardware architectures. Although new languages are emerging to simplify writing parallel code, it is still not clear how this effort will be able to address SOA-specific concerns.
Current Efforts
There are several approaches to solving the above problems of today’s SOA-based systems. Some of the currently used solutions are: load balancing, caching, queuing, asynchronous execution, parallel service calls, dynamic scaling, Comet, HTML5 web sockets, and HTTP streaming. None of these solutions is sufficient for every possible scenario, and they are usually used in combination with others.
There are also other innovative approaches to bring parallelism to service development, like the Software Pipelines methodology or Software Pipelines Optimization Cycle (SPOC). The idea is to increase throughput by distributing the embarrassingly parallel business logic to multiple threads or servers while also keeping the order of processing where necessary. This approach requires a sufficient workload to be available in the input to make use of such parallel power. It also doesn’t specifically solve the latency problem that was mentioned above especially where multiple service calls are chained due to service reuse.
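The general idea of spreading independent units of business logic over several workers while keeping the output order can be sketched as follows. This is only a minimal illustration in Python, not the Software Pipelines methodology itself; `process_in_parallel` and `price_with_tax` are hypothetical names invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def process_in_parallel(items: Iterable[int],
                        work: Callable[[int], int],
                        workers: int = 4) -> List[int]:
    """Run `work` on every item using a pool of threads.

    Executor.map returns results in the order of the inputs, even when
    individual calls finish out of order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, items))


def price_with_tax(amount: int) -> int:
    # Hypothetical, independent unit of business logic.
    return amount + amount // 10


totals = process_in_parallel([100, 250, 40], price_with_tax)
```

Note that the whole input has to be available for the pool to be kept busy, which is the "sufficient workload" requirement mentioned above, and that the caller still waits for the complete list.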
Another Possible Solution
In this article, I will try to propose a solution which will solve primarily the latency and throughput issues in the context of growing service reuse, data size, and workload.
It may be possible to address both the throughput and latency issues by rethinking the way services are designed in the following ways:
- Use the inherent parallelism between clients and servers to overlap processing of messages.
- Redesign the service contracts to be more explicit about multiplicity of elements in the input or output and allow consumption of partial data.
- Use streamed transfer of service request/response data, instead of buffered transfer.
- Write services to produce data as it becomes available or as it is computed, rather than waiting till the end of processing.
- Write clients to start consuming data as it is received from the server, rather than waiting for the whole message to complete.
- Allow and encourage clients to cancel ongoing operations as soon as possible, thus releasing server resources earlier.
- Allow and encourage chaining of such service calls to reuse existing assets without performance concerns.
- Allow and encourage the service code to parallelize the processing of incoming data elements.
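The difference between a buffered and a streamed response can be sketched in a few lines. This is an in-process illustration only, using a Python generator as a stand-in for a streamed service response; the function names are hypothetical and no real network transfer is involved.

```python
from typing import Iterator, List


def buffered_service(count: int, log: List[int]) -> List[int]:
    """Request/reply style: the full result is built before returning."""
    result = []
    for n in range(count):
        log.append(n)          # records that element n was computed
        result.append(n * n)
    return result


def streamed_service(count: int, log: List[int]) -> Iterator[int]:
    """Stream style: each element is handed over as soon as it exists."""
    for n in range(count):
        log.append(n)
        yield n * n


def work_done_before_first_element(streamed: bool, count: int = 5) -> int:
    """Count how many elements the service computed before the client
    could look at the first one."""
    log: List[int] = []
    if streamed:
        stream = streamed_service(count, log)
        next(stream)           # client receives the first element
        stream.close()         # client cancels; nothing further is computed
    else:
        buffered_service(count, log)[0]
    return len(log)
```

With the buffered variant the service computes every element before the client sees anything; with the streamed variant only the first element has been computed when the client starts working, and cancelling stops the rest.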
Use of Inherent Parallelism Between Clients and Servers
As obvious an opportunity as it sounds, the inherent parallelism between clients and servers is not taken into account when developing business services. A client has to wait for a service to complete its response before starting to use it. This may sound like a natural way to go for business applications, but in reality it is just wasted time that adds to latency at every node. Service reuse makes the problem even more dramatic: each service that is reused means additional wait time for the end user. The reason such an obvious issue has not yet become the number one killer of SOA is that services have mostly been called by their direct consumers, or through at most two levels of call chaining despite the latency penalty. Unfortunately, more and more people who adopt SOA develop services in the hope of reusing them in the future and end up hitting a brick wall. Organizations’ SOA infrastructure will inevitably have to deal with this problem sooner or later.
Fortunately, SOA developers aren’t the only ones who have the latency problem. Developers of media streaming, online gaming, and other real-time systems already know the problem and have practically solved it. The trick is to make use of the fact that client and server can run in parallel and process partially transferred data. Simply let the client start using the results as soon as the server starts generating them. This technique makes it possible to start watching a YouTube video almost as soon as you hit the play button. If YouTube worked like today’s web services, you would have to wait for minutes to download a video that you will probably not even watch till the end. Just think about what the implications would be:
- You would waste your time just waiting for the download.
- The part that is already downloaded on your computer would be idle and take space in the memory or on your hard disk.
- You would waste your network bandwidth.
- You would waste server resources.
- People would be able to watch fewer videos while consuming more system resources.
Instead, we get all the benefits by simply changing the design to use an asynchronous mode of operation. The client and the server, and every other device in between are already running in parallel. So why not make use of it before we start parallelizing our business code on the server side? It sounds like we are really missing something here. Doesn’t it?
Reality is a bit more complicated than this picture. Our serial way of designing and coding business applications prevents services from behaving this way. A common pattern in SOA is the request/reply pattern. It makes designing and implementing services very simple. It does not dictate what the request and reply should be; it only assumes that the server accepts fully formed request data and that the client accepts fully formed response data. Building security, logging, auditing, routing, and many other facilities based on this idea is pretty straightforward. It’s just document processing. The problem is that when the document size gets big, the operations required to generate the document grow, or the steps that work on such documents are chained, the idea of passing full documents around starts to crumble.
If you think about the nature of many business services, passing full documents around is actually not necessary at all. Many services just pass some query criteria and get a list of elements. If only the response data elements could be pushed by the server as they are generated, and clients could consume those elements as they receive them, the perceived latency would be reduced dramatically. Just as in media streaming, business data elements would behave like a stream that flows back to the client, and the client wouldn’t have to wait for the whole service data transfer to be complete. Notice that this is different from a client-side asynchronous service call, because it doesn’t just run the service call in a background thread and notify the client when the full response is available. It actually lets you start consuming the results even if the whole response has not been received, and perhaps even if the server-side processing is not complete. This idea has other implied benefits that are not obvious at first sight:
- Clients get quick response and start consuming the results almost at the same time as service processing starts.
- Clients can cancel an operation in the middle and free server resources earlier.
- Server resources are used only as requested, rather than always requiring large atomic transactions at every single call.
- If back-end services called also behave the same way, latencies are increased by only the processing time for a single element, not for the whole data.
- Multiple levels of service reuse become feasible as latencies are affected minimally.
- When back-end transactions are initiated based on a decision or previous data, cancelling an operation saves a lot of back-end resources too. (Of course this is just an idealization. In reality there is usually a transfer buffer that introduces some latency which is still much less than the full response time of a service)
- Since response time is low, and cancellation is possible, it becomes feasible to run business logic queries that rely on back-end calls.
- Since the processing time for many calls becomes shorter, throughput may increase proportionally.
- If the same async design is used for request data (list of input elements), then the service could start processing as soon as it receives the first elements from the client.
- An incoming stream of elements is much more easily parallelizable using something like the software pipelines approach.
- Even when there seem to be insufficient requests to justify a software pipelines approach, splitting data into elements or smaller units of work creates an opportunity for better parallelism, thus feeding a pipeline better.
This list can go on and on as we start thinking about the possibilities that are opened up. What about disadvantages?
- When a client’s consumption rate (not just data reception) is slower than the service’s production rate, the service connection will have to be kept open until the client decides to give up. The usual web services do not leave this kind of decision to the clients, which makes them immune to this issue. But for such streaming services, it’s a weakness that is easy to abuse. Setting reasonable timeouts and consumption rate requirements for the service could provide some defense against such abuse. Remember that buffered services could also be abused in different ways.
- If the service is connecting to a database and implementing a similar asynchronous processing pattern, then the database connection will also be kept open until the client(s) are done processing. This may reduce database scalability. To address this issue, database access could be executed in the background within the service, with the results buffered in memory and pushed to the client as they become available.
- Today’s message-based web services security works on the full request/reply document. With streaming services, this is not possible due to the nature of the partial data that flows through each node of processing. A new approach is needed. One solution is to create chunks of data that are separately signed/encrypted. Another solution could be to sign just the header of a stream of elements and encrypt the rest using a session key passed in the header.
- Intermediaries that do not acknowledge the nature of such streaming services could cause buffering of data and kill the advantage altogether. All intermediaries have to be also designed to allow flow of partial data rather than relying on full message content.
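The mitigation suggested above for the database case, draining the query in the background and buffering the rows in memory while the client consumes at its own pace, might look roughly like this. It is a sketch under stated assumptions: `rows` stands in for a database cursor, `on_release` is a hypothetical callback that returns the connection to its pool, and the buffer is unbounded, so memory use grows if the client is slow.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator

_DONE = object()  # sentinel marking the end of the result set


def stream_via_buffer(rows: Iterable[int],
                      on_release: Callable[[], None]) -> Iterator[int]:
    """Yield rows to the client while a background thread drains the source."""
    buffer: "queue.Queue[object]" = queue.Queue()

    def drain() -> None:
        for row in rows:
            buffer.put(row)
        on_release()        # the database connection is no longer needed
        buffer.put(_DONE)

    threading.Thread(target=drain, daemon=True).start()

    while True:
        item = buffer.get()  # blocks until the next row is buffered
        if item is _DONE:
            return
        yield item           # type: ignore[misc]
```

The connection is held only for as long as the query itself takes, independent of how quickly the client consumes; the price is the memory for the buffered rows.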
The Role of Call Cancellation
Why is service call cancellation important? Truly reusable services do not know their ultimate consumers and cannot even make assumptions about their consumption patterns. For example, a client could be requesting some data but using only a very small portion of it. The obvious and efficient solution to this issue is to filter and return only a restricted amount of data to the client, rather than returning an unlimited number of results and then letting the client decide when to cancel the call. The question is how realistic it is to assume that we can modify services each time there is a new client. Is it even compatible with the SOA mindset? Another solution is to create more granular services and let the clients decide how to compose them. This is the approach of OOD, but service calls will never be as cheap as object method calls, so this approach has a bad impact on scalability. In some sense, this is why we treat services specially rather than as just remote object calls, isn’t it?
Therefore, cancellation is a good thing to have for these kinds of asynchronous streaming services, so that we can at least give clients some control over how much of the processing should be done and how much of the results should be pulled for consumption. Here is a list of reasons to have cancellation for streaming services:
- Existing services may not have the filtering capability.
- Filtering capability on the server side may not be feasible to implement.
- Some filtering cannot be implemented on the server side due to client specific knowledge. Since the business logic tends to become more and more distributed, this problem becomes more common.
- When new services are built on top of existing ones, it may not be a safe practice to change the existing service behavior. This would also mean that, each time something new is built, the older services have to know about the new concerns. In small or closely working teams this might not be an issue, but in large organizations this is definitely a major concern.
- As the level of reuse (call depth) increases, the abovementioned issue becomes even more dramatic. You cannot write a service that knows and satisfies the concerns of all types of direct or indirect clients.
- End users that initiate calls to services may actually prefer cancelling, or reinvoking the service. Imagine what it would be like if you could not stop a download that you just started. This problem will be more and more common as different types of clients (mobile as well as desktop) become widespread. The responsiveness on mobile devices is a top priority and no one would be patient enough to waste time waiting for a long service call to finish execution.
- Rules engine type of clients could invoke such services more aggressively, consume data as much as they want and disconnect. Since rules are flexible and configurable at the rules engine level, it is not feasible to expect back-end services to be optimized for the new concerns each time new rules are added or existing rules are modified. Sometimes, this might even require creation of new services because a single method call is no longer sufficient to implement all the rules at hand. It is much more straightforward and more maintainable to implement rules that rely on existing services. If services can provide a stream of data that is partially usable and also cancellable, introduction of such rules on top of existing services will become feasible with the least possible impact on performance.
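In-process, the cancellation behaviour argued for above maps naturally onto closing an iterator. The following sketch assumes a Python generator as a stand-in for a streaming service; `customer_stream` and `first_matching` are hypothetical names, and the "acquire"/"release" log entries stand in for real server resources.

```python
from contextlib import closing
from typing import Iterator, List


def customer_stream(log: List[str]) -> Iterator[int]:
    """An open-ended result set that holds a resource while it is consumed."""
    log.append("acquire")
    try:
        customer_id = 0
        while True:
            yield customer_id
            customer_id += 1
    finally:
        # Runs when the stream is exhausted or when the client closes it.
        log.append("release")


def first_matching(limit: int, log: List[str]) -> List[int]:
    """Client that pulls only as many elements as it needs, then cancels."""
    taken: List[int] = []
    with closing(customer_stream(log)) as stream:
        for customer_id in stream:
            taken.append(customer_id)
            if len(taken) == limit:
                break        # the client has what it needs
    return taken
```

The service never needs to know the client's limit: the client decides when to stop, and `closing` guarantees that the generator's `finally` block releases the resource at that point.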
What are Fluid Services?
The idea described above can be understood as a natural extension of the concept of iterators in OO programming languages. An iterator provides sequential access to a list of objects without exposing the internal details of how the objects are generated or where they come from. An iterator could just spit out elements of an existing array or list object. But here’s where it really shines: when an iterator spits out objects as they are computed, without using any underlying storage, a whole new world of possibilities opens up. Now you can independently write code that produces datasets and code that consumes them, without having to store any intermediary data or even all of the data. You can chain functions that take iterators to create even more complex processing that operates on a set rather than on single objects. What’s more, you can even run them in parallel and overlap operations that work on different elements. This is similar to a multistage pipeline in hardware engineering. As function A (stage A) is operating on element N, the next stage B (function B) could be operating on element N-1 at the same time. Multiple such steps/stages (functions) could be chained in this way without multiplying the response time. When new steps are added, the total latency increases only by the processing time of a single element in each additional step, not by the total time for all the elements in all the steps.
Imagine a similar implementation for a web service using the above-mentioned asynchronous producer-consumer model. The response of a service could behave like a remote iterator over a computed result set. A client that is inherently parallel to the server just starts iterating over the asynchronously produced elements as they become available. The client could be smart enough to wait for the server if results are not available yet, and to notify the server when it no longer wants to continue processing. A metaphorical, although more ambitious, analogy is the flow of fluid in a pipe. The request and/or response of such a service is similar to fluid flowing into and out of a pipe. A pipe could be split on the server side, joined again, connected to other pipes on other servers, and so on. The good thing about this architecture is that the client decides when to let data flow into the pipe and when to stop it, and when it does so it sees a very quick response at the outflow. Also, this architecture is compatible with and supportive of the software pipelines approach, although it is more focused on the latency aspect. So I think the ‘fluid services’ metaphor is appropriate enough to position the idea and its intent. Another similar paradigm is ‘stream processing’, which is devised to exploit parallelism on a stream of data elements, usually primitive data types like floating point or integer. The idea is similar, but it is mostly about low-level programming concerns within the restrictions of a given target hardware (e.g. GPUs, DSPs). Applying those ideas to business services development requires a different point of view, although the approach is essentially similar. To make the distinction clear, a new name is more appropriate, so I prefer to use the term 'fluid services'.
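Chained iterator stages can be sketched with Python generators. The stage names and the event log are hypothetical, added only to make the ordering visible. Note one simplification: generators on a single thread interleave the stages element by element rather than running them truly in parallel, but the latency property is the same, since the first result needs only one element to pass through each stage.

```python
from typing import Iterable, Iterator, List


def source(count: int, log: List[str]) -> Iterator[int]:
    """Produces elements one at a time, without storing the dataset."""
    for i in range(count):
        log.append(f"produce {i}")
        yield i


def stage_a(items: Iterable[int], log: List[str]) -> Iterator[int]:
    for x in items:
        log.append(f"A {x}")
        yield x + 1


def stage_b(items: Iterable[int], log: List[str]) -> Iterator[int]:
    for x in items:
        log.append(f"B {x}")
        yield x * 10


events: List[str] = []
pipeline = stage_b(stage_a(source(3, events), events), events)
first = next(pipeline)   # only element 0 has travelled through the stages
```

After the single `next` call, the log contains one entry per stage for element 0 and nothing for the remaining elements, which are produced only if the consumer keeps iterating.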
How to Design Fluid Services?
Contract
A fluid service contract should be designed in such a way to allow transfer of multiple atomic data elements rather than a single atomic data element.
A request/reply style service’s contract normally contains operations like:
- DataResult GetData(DataRequest request)
A fluid service contract should on the other hand look something like:
- IEnumerable GetData(DataRequest request)
This makes it clear and known to clients and all other intermediaries that the service is able to generate an open ended stream of data elements.
The same approach could also be used to design the request data contract:
- SubmitData(IEnumerable dataElements)
A complete fluid service operation contract would look something like:
- IEnumerable ProcessData(IEnumerable dataElements)
Just by changing the request/reply pattern into a requests/replies pattern, it becomes possible to overlap production, processing, transfer, and consumption of partial data, reducing the perceived response time, and to terminate processing early to save precious resources.
The size of data elements must be granular enough to keep elemental latencies low. This would make sure that adding intermediary processing stages would not increase the latency significantly.
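For readers who want to experiment with such a contract outside .NET, a rough Python counterpart of the `ProcessData` operation is sketched below. `DataElement`, `FluidService`, and `UppercaseService` are hypothetical names introduced for the illustration; the iterator types play the role of the enumerable in the signatures above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class DataElement:
    """Hypothetical atomic element; kept small to keep per-element latency low."""
    value: str


class FluidService(Protocol):
    """Requests/replies contract: a stream of elements in, a stream out."""

    def process_data(
        self, data_elements: Iterable[DataElement]
    ) -> Iterator[DataElement]:
        ...


class UppercaseService:
    """Toy implementation that emits each reply as soon as its request arrives."""

    def process_data(
        self, data_elements: Iterable[DataElement]
    ) -> Iterator[DataElement]:
        for element in data_elements:
            yield DataElement(element.value.upper())
```

Because neither side of the contract promises a complete collection, an implementation like this can serve an input stream of unknown or even unbounded length, and the caller can stop after any element.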
Transport
A fluid service should use streamed transfer mode rather than buffered. In reality, even streamed mode uses a transfer buffer to send the data over the wire. However, buffered transfer usually refers to a buffer that holds the full message document rather than a portion of it. The streamed transfer mode makes it possible to start transferring the data as it is written out. The same streamed transfer mode should also be set on the client so that it starts consuming the response as it is received. In reality there will always be some latency caused by the physical network and the transfer buffer, but this latency is far shorter than the time needed to receive a full message that takes seconds to transfer. This makes it possible for a fluid service to respond almost immediately even when services are chained to call one another.
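What "consuming the response as it is received" means at the byte level can be illustrated with a simple framing scheme. The sketch below assumes newline-delimited JSON as the wire format, which is an assumption of this example and not something the article prescribes; `encode_stream` and `decode_stream` are hypothetical helpers.

```python
import json
from typing import Iterable, Iterator


def encode_stream(elements: Iterable[dict]) -> Iterator[bytes]:
    """Serialize each element on its own line, one element at a time."""
    for element in elements:
        yield (json.dumps(element) + "\n").encode("utf-8")


def decode_stream(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield elements as soon as a complete line has arrived.

    `chunks` may split the data at arbitrary byte boundaries, as a
    transfer buffer on a real network connection would.
    """
    pending = b""
    for chunk in chunks:
        pending += chunk
        while b"\n" in pending:
            line, pending = pending.split(b"\n", 1)
            if line:
                yield json.loads(line.decode("utf-8"))
```

The decoder never needs the full message: every element becomes available to the client as soon as its last byte has been received, regardless of how the transfer buffer happened to cut the stream.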
Applying Fluid Services in Real World SOA Scenario
Figure 1 shows a real-world service that aggregates data from multiple back-end sources, some of which are direct data sources, some existing services, and some legacy systems. Such a system incurs high latency costs and sometimes becomes unacceptably unresponsive. The users of such an aggregation service may not know which back-end services are involved, but they will certainly notice the slow experience.
Figure 1
This is already a problematic case, but what if we want to actually reuse the aggregation service in another service? This could be another aggregation or orchestration service. The red box will automatically become a hot node and may easily cause congestion. But even when there is not much traffic, the latency problem will still be there and will resist all attempts to fix it. Simply put, this service is practically not really reusable anymore, because if you reuse it no one will use it.
The sequence diagram in Figure 2 shows the dramatic impact of accumulating latencies due to careless sequential design.
Figure 2
Server triggers all back end transactions one by one without utilizing any kind of parallelism at all. The client has to wait for all of the serially executing steps to complete. The client also has to wait for the full message to be transferred before starting to process it. This includes serialization, buffering, network transfer time and deserialization of the entire response.
This design is unfortunately what most people gravitate towards because of its simplicity. Considering all the effort and investment that goes into enabling better and more manageable parallel processing for multi-core and distributed systems development, isn’t it ironic to keep designing systems that are bogged down by sequential logic and miss the simplest and most obvious parallelism opportunity?
One possible and obvious solution is to start back end transactions almost at the same time to run them simultaneously as illustrated in Figure 3, wait for all the transactions to complete and then respond to the client. This design actually reduces the response time to a great extent and is probably already used by some systems if not all.
Figure 3
However, this approach still has several deficiencies:
- The server eagerly triggers all back-end transactions to give a quick response.
- Client has to wait for the longest running transaction and network transfer time.
- Client has to wait for the entire message to be processed, the full response to be generated and received before starting to use it.
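The parallel design of Figure 3 can be sketched as follows, assuming each back end is a plain callable that returns its full result list (the `aggregate_parallel` name and the callables are hypothetical). The sketch makes the deficiencies listed above visible in code: every back end is triggered eagerly, and nothing is returned until the slowest one has finished.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Backend = Callable[[str], List[str]]


def aggregate_parallel(backends: Sequence[Backend], request: str) -> List[str]:
    """Start all back-end calls at once, then wait for every one of them."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = [pool.submit(backend, request) for backend in backends]
        combined: List[str] = []
        for future in futures:
            # Blocks until this back end is done; the total wait is
            # bounded by the longest-running call.
            combined.extend(future.result())
    return combined
```

The response time drops from the sum of the back-end times to roughly the longest one, but the client still receives a single buffered message.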
Now let’s look at the proposed ‘fluid service’ design, shown in Figure 4, in which servers at all layers produce and consume data in parallel.
Figure 4
Our service first hits the database, and just returns the results as they are received from the database. So the perceived response time from the client’s perspective is incredibly short. The service keeps executing while the client processes the results simultaneously. At some point the database transaction is finished and the back-end service is called. The client will start getting results almost immediately after the service hits the back-end service. The client can cancel at any time, which could be before some of the service calls are made. This means unused calls are never actually made. Of course, the service could behave more eagerly and process the data before the client actually consumes it. This would improve the response time for the services even more, and would not hold on to back-end resources for a long time. However, that would also mean that other incoming requests would have to compete for resources for their own eager utilization. Therefore, triggering back-end transactions only when the actual consumption occurs will probably be much better for scalability. On the other hand, when a database transaction is triggered, it is much better to consume all the query results as soon as possible rather than waiting for the client to catch up and request more results.
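The lazy variant described above, where a back-end transaction is triggered only when consumption actually reaches it, can be sketched with chained generators. `fluid_aggregate` and `make_backend` are hypothetical helpers, and the `calls` list stands in for real back-end transactions being started.

```python
from typing import Callable, Iterable, Iterator, List

StreamingBackend = Callable[[str], Iterator[str]]


def fluid_aggregate(backends: Iterable[StreamingBackend],
                    request: str) -> Iterator[str]:
    """Relay each back end's elements as they arrive, one back end at a time."""
    for backend in backends:
        # The next back end is contacted only after the previous stream
        # is exhausted and the client asks for more.
        yield from backend(request)


def make_backend(name: str, rows: int, calls: List[str]) -> StreamingBackend:
    """Build a toy streaming back end that records when it is first used."""
    def backend(request: str) -> Iterator[str]:
        calls.append(name)               # the transaction starts here
        for i in range(rows):
            yield f"{name}:{request}:{i}"
    return backend
```

If a client takes two elements from a three-row database back end and then closes the stream, the second back end in the list is never contacted at all.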
One of the most important benefits of such fluid service design is its great reuse/composability characteristics thanks to the minimal impact on perceived response time when calls are chained. Immunity to latency barrier creates new opportunities and shifts the mindset to embrace the reuse culture just like in OOD.
Comparing Different Types of Service Reuse
Today’s service architectures lead to services that talk directly to data sources and avoid service reuse. This can be achieved either by replicating business rules into each and every isolated service, or by reusing behavior at the component level (Figure 5).
Figure 5 - Services with No Service Reuse
While this design is acceptable in small organizations, it suffers from complexity introduced by deployment and management of reusable components. This is the approach of most component and object oriented methodologies. Data is processed by in memory reusable objects. This design also suffers from increasing response time due to increasing processing.
Figure 6 shows SOA’s service reuse approach which basically removes the deployment and version management issues of component oriented and object oriented approaches. However, it has a chronic and terminal illness caused by the accumulating latencies at every service reuse.
Figure 6 - Services with Service Reuse
Figure 7 is basically what Fluid Services approach presents as a solution:
Figure 7 - Fluid Services with Performant Service Reuse
Services are still reused but response times aren’t affected significantly, and resources are utilized gracefully. A fluid service call triggers a chain reaction that reaches the leaf nodes more quickly and gets a response more quickly. All involved parties keep processing as long as the client opts to continue consuming the response. When the client decides to stop consuming, the services stop as well, again in a chain reaction.
Figure 8 illustrates just one scenario where the Fluid Services would really shine:
Figure 8 - Fluid Services with Rules Engine
A rules engine is used to get real-time data from multiple services and process the results. But the back-end services may not have been designed for the kinds of rules that are currently being executed by the rules engine. According to the principle of separation of concerns, this is exactly how it’s supposed to be. The back-end service should have no knowledge of the current rule repository, and the rules engine should have no knowledge (other than the contract) of the internal details of the existing services. The rules can be stored in a central rule repository and can be changed flexibly and much more quickly than the services themselves. Since the rules engine can trigger transactions, start consuming the responses quickly, and stop as soon as it meets a certain criterion, it is much more feasible to make this scenario work and scale well under growing load. The same argument can be made for an orchestration service, for it is almost the same idea, except that the rules may not be as flexible.
Services as a Processing Pipeline
The fluid services concept can also be analyzed in terms of the ‘pipeline pattern’. When multiple service calls are chained and each service is able to work on partial data during the call, the operations performed at each service (pipeline stage) overlap. A physical analogy is a ‘serial assembly line’. Each stage in an assembly line is designed for a specific purpose but is always kept busy by feeding one stage’s output into the next stage. This makes sure the assembly throughput is maxed out because no intermediary stage is kept idle until the processing of a single service call is complete.
Figure 9
In Figure 9, it takes only 3 stages to complete a single product, and we don’t even have to wait for the whole production run to finish, so the latency is also low. If a serial assembly line were designed the way we design web services today, it would have to work on a large batch of items at each stage. This would increase the stage response time, stretch the overall completion time dramatically, and reduce throughput. The system could still rely on the arrival of new batches to keep the stages busy, but if there are not enough batches to feed the stages, the assembly line would stand idle for long intervals. Even if enough batches arrive, that would still not change the total time for a single batch to complete.
Similarly, a service can be thought of as an assembly stage that is specialized to perform a certain type of task. If we can get all the stages (services) busy as soon as possible, even for a single call, we will be able to use the parallelism between them to increase the throughput and reduce the total response time, just like in an assembly line.
Figure 10
Figure 10 illustrates a simplified view of the overlapped processing steps in which a client triggers a chain of 3 levels of service calls. Despite the call depth, the client is able to start consuming results quickly. Even if the number of elements that the services actually return is large, the perceived response time is still low. This makes such service architectures largely immune to the data size growth problem. Most operations do not really require consumption of all the data returned from services. Rather, the data is used by a client selectively, perhaps just to display a limited amount of data, or to find data that matches given criteria.
Even if we pull the entire dataset, we are still better off overlapping the produce and consume times, because the total time is not multiplied by the number of stages (services). Today’s web services, on the other hand, have to buffer the data at each service and return the full dataset, which means the total time is multiplied by the number of services, assuming each service has the same processing time.
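A back-of-the-envelope model makes the difference concrete. The sketch below is an assumption-laden simplification, not a measurement: every service is assumed to spend the same time `t` per element, there are `n` elements and `depth` chained services, and network and serialization costs are ignored. The function names are illustrative only.

```python
def buffered_total_time(depth, n, t):
    """Each service processes the full dataset before passing it on."""
    return depth * n * t


def streamed_total_time(depth, n, t):
    """Stages overlap: the first element crosses every stage,
    then the remaining n - 1 elements follow one per time unit t."""
    return (depth + n - 1) * t


def buffered_first_result(depth, n, t):
    """The client sees nothing until every stage has finished the whole batch."""
    return depth * n * t


def streamed_first_result(depth, n, t):
    """The client sees the first element after it has crossed each stage once."""
    return depth * t


# Example: 3 chained services, 1000 elements, 1 ms per element per service.
print(buffered_total_time(3, 1000, 1))    # 3000 ms
print(streamed_total_time(3, 1000, 1))    # 1002 ms
print(buffered_first_result(3, 1000, 1))  # 3000 ms
print(streamed_first_result(3, 1000, 1))  # 3 ms
```

Under these assumptions the buffered total grows with the product of depth and data size, while the streamed total grows with their sum, and the streamed time to first result does not depend on the data size at all.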
Scalability Characteristics of Fluid Services
Fluid services thus give services good scalability characteristics in the following dimensions:
Figure 11 - Throughput is kept constant as the call depth grows
Figure 12 - Latency is kept constant as the call depth grows
Figure 13 - Throughput decreases linearly as the data size grows, but it has lower slope
Figure 14 - Initial latency is kept constant as the data size grows
Figure 15 - Total latency increases linearly as the data size grows, but it has lower slope
Buffered services incur latency cost at every stage (service) whereas the streamed services incur latency cost only once because almost all processing time is overlapped.
These are not actual test results but only projections plotted from a qualitative analysis. The real results may be affected by many properties of a system, including the channel and protocol used, network topology, development platform, operating system and others. The purpose of this qualitative analysis is just to give some sense of what we can expect to gain by adopting the proposed design style.
Some Challenges Facing Fluid Services
Here are some of the important problems that must be addressed before adopting the ‘Fluid Services’ style design for a wide range of future services:
Streaming
Streaming capability is an absolute requirement for fluid services to be of any use. If the platform does not support it, any workaround that is devised may end up creating more problems than it solves. Streaming or a similar feature is supported by HTTP, TCP, Named Pipes, and ADO.NET Async Command Execution. On the other hand, queue-based message-oriented systems like IBM WebSphere MQ, JMS or MSMQ are less likely to support such overlapped processing directly. A message splitter pattern may provide a workaround for this shortcoming, but real-life performance characteristics must be analyzed before anything conclusive can be said.
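As a rough illustration of the message splitter workaround, the sketch below splits one large reply into several small messages so that a consumer can start working before the whole reply has been sent. Python's in-process `queue.Queue` stands in for a real message queue, and the `END` marker and function names are assumptions made for this example; a real queueing product would need its own correlation and end-of-sequence conventions.

```python
import queue

END = object()  # hypothetical end-of-sequence marker


def split_and_send(batch, channel, chunk_size):
    """Splitter: send the batch as several small messages instead of one large one."""
    for start in range(0, len(batch), chunk_size):
        channel.put(batch[start:start + chunk_size])
    channel.put(END)


def receive(channel):
    """Consumer: yield elements chunk by chunk until the end marker arrives."""
    while True:
        message = channel.get()
        if message is END:
            return
        yield from message


channel = queue.Queue()
split_and_send(list(range(7)), channel, chunk_size=3)
message_count = channel.qsize()   # three data chunks plus the end marker
received = list(receive(channel))
print(message_count, received)
```

The chunk size is the tuning knob here: smaller chunks lower the time to first result but add per-message overhead, which is exactly the trade-off that would need to be measured on a real platform.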
Overlapped IO and processing
Overlapping service processing with serialization and data transfer is the most important first step in designing a fluid service platform. But this covers only the service response, which is half of the problem. The same design must be implemented by the clients of such services. Namely, they have to overlap the client’s processing with the reception and deserialization of the service response. Serialization and deserialization must also be designed to be aware of the multiplicity of data elements in the request or reply. A fluid service should return almost immediately from the service method but keep processing and pushing data into the reply channel as the results are computed or received from another source. A fluid service client should execute its consumption code as it receives results from the reply channel. Although this design sounds much more complex than the usual serial style of service programming, language-level syntactic sugar (such as iterators, the yield keyword, or the continuation pattern) can greatly reduce the complexity.
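The two halves of this design can be sketched in a few lines. The example below is a minimal in-process illustration, assuming a thread-safe queue as the reply channel and hypothetical names (`fluid_service`, `consume`, `_DONE`); serialization and the network are left out entirely.

```python
import queue
import threading

_DONE = object()  # hypothetical marker that closes the reply channel


def fluid_service(n):
    """Service side: return the reply channel at once and keep filling it."""
    reply = queue.Queue()

    def work():
        for i in range(n):
            reply.put(i * i)  # stand-in for computing or fetching one result
        reply.put(_DONE)

    threading.Thread(target=work, daemon=True).start()
    return reply  # returned before the work has finished


def consume(reply):
    """Client side: yield each result as soon as it is available."""
    while True:
        item = reply.get()
        if item is _DONE:
            return
        yield item


results = list(consume(fluid_service(5)))
print(results)
```

The `yield` in `consume` is the kind of syntactic sugar mentioned above: the client code reads like an ordinary loop even though production and consumption overlap.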
Call Cancellation
Although not an absolute requirement, call cancellation is one of the neatest features that fluid services can benefit from. Depending on the channel type and protocol, call cancellation may or may not be possible. A fluid services design strategy with robust support for call cancellation will pay for itself by enabling better scalability, composition, distributed processing, rules engine processing and orchestration, as well as better end-client response characteristics.
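Within a single process, Python generators give a feel for how cancellation can propagate down a call chain. In this sketch the names `backend` and `intermediary` are hypothetical, and closing a generator stands in for whatever cancel message or connection close a real channel would provide, if it provides one at all.

```python
events = []  # records the order in which the stages shut down


def backend():
    """Hypothetical back-end service producing an unbounded stream."""
    try:
        i = 0
        while True:
            yield i
            i += 1
    finally:
        events.append("backend stopped")  # release resources here


def intermediary():
    """Hypothetical intermediate service that forwards cancellation upstream."""
    source = backend()
    try:
        for item in source:
            yield item + 100
    finally:
        source.close()  # propagate the cancellation to the back end
        events.append("intermediary stopped")


stream = intermediary()
got = [next(stream), next(stream)]
stream.close()  # the client cancels the call
print(got, events)
```

Each stage gets a chance to release its resources, and the back end stops first because the intermediary cancels it before finishing its own cleanup.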
Holistic Aggregators, Holistic Rules
Fluid services require that each node in the system be capable of processing partial data. This is usually something like a domain entity or other complex type that is returned as a sequence of elements in a streamed fashion. However, there are certain types of holistic computation that require the entire dataset to be available. Some examples are:
Sorting, filtering, vertical calculations like averaging, summing etc, per element processing, processing that has persistent side effects and should not be interrupted in the middle.
Solutions may be developed that allow these types of operations to be performed while still keeping the fluid services style of design. Since these operations require some or all of the data to be buffered before executing, it might be good practice to defer the buffering to the client, so that intermediary and back-end services all work in parallel using streamed communications.
Sorting is a particularly challenging problem. If multiple back-end services are called and aggregated, a resequencer pattern may be employed to buffer some of the data and send it on only after the correct order has been established. If each back-end service returns its own data in the same sort order, then the resequencer does not have to wait for the whole data set to complete.
If the problem cannot be solved by partial buffering and full buffering is inevitable, then the architecture could be designed so that the buffering occurs only once (or a minimal constant number of times) in the service call chain. This type of processing would be best handled as close to the end client as possible, and the back-end service call chain could just work with unordered data. This still ensures that the full buffering cost is incurred once (or at least a minimal constant number of times), so that adding new services keeps the response time constant.
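When every back-end stream is already sorted, the resequencer reduces to a k-way merge that only ever holds one pending element per source. The sketch below uses Python's standard `heapq.merge` for this; the two services are hypothetical and deliberately unbounded, to show that the merge never waits for a source to finish.

```python
import heapq
from itertools import count, islice


def evens_service():
    """Hypothetical back-end service returning an endless sorted stream."""
    return count(0, 2)


def odds_service():
    """Hypothetical second back-end service, sorted in the same order."""
    return count(1, 2)


# heapq.merge is lazy: it pulls from each source only as results are consumed.
merged = heapq.merge(evens_service(), odds_service())
first_five = list(islice(merged, 5))
print(first_five)
```

If the sources were not sorted in the same order, no such merge would be possible and some form of buffering would be unavoidable.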
Vertical operations like average, sum, count, max and min also require special handling. But these operations do not really require buffering, because they can be computed incrementally: sum, count, max and min are commutative and associative, and an average can be derived from a running sum and count. All such operations can be processed as the elements pass through a pipeline stage. The only requirement is that all elements are taken into the calculation. Therefore, this type of computation eliminates the possibility of call cancellation, but it is still possible to keep response time constant against call chaining.
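A pipeline stage can keep such running aggregates while passing every element through untouched. The sketch below is one possible shape for this, with a hypothetical `with_running_stats` stage and a plain dictionary as the accumulator; the aggregates are only final once the whole stream has been consumed.

```python
def with_running_stats(stream, stats):
    """Pass each element through unchanged while updating the aggregates."""
    for value in stream:
        stats["count"] += 1
        stats["sum"] += value
        stats["min"] = value if stats["min"] is None else min(stats["min"], value)
        stats["max"] = value if stats["max"] is None else max(stats["max"], value)
        yield value


stats = {"count": 0, "sum": 0, "min": None, "max": None}
passed = list(with_running_stats(iter([4, 8, 6]), stats))
average = stats["sum"] / stats["count"]
print(passed, stats, average)
```

The stage needs constant memory regardless of the stream length, which is why no buffering is required.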
Fortunately, most service calls do not require such complex processing and will be able to process data as it is available and will also allow cancellation. But still, a real world fluid services implementation will need to address this issue intelligently taking into account all other technical considerations of the platform(s) that it targets.
Conclusion
By solving the call latency, throughput, and call cancellation issues, the reuse and composability characteristics of services will be greatly improved. It will become much more feasible to reuse services to create other services, which will in turn reduce development and maintenance costs significantly. Evolving service-oriented systems will be able to keep up with growing workload, data size and functionality without impairing performance, which will greatly increase the value of existing services.
