Monday, May 27, 2013

SOA Architecture - The future is the Any-Any-Any Architecture

The Any-Any-Any Architecture (A4) is the consequence of current IT realities, new innovations and developments. I came across the term “Any-Any-Any Architecture” in a replay of a conference keynote by Anne Thomas Manes (VP & Distinguished Analyst, Gartner) on “New Paradigms for Application Architecture: from Applications to IT Services”. It means that applications can run on Any Device, using Any Service, working with Any Data source. A4 must be deployable on the Cloud in order to achieve a faster time-to-market and to reduce costs by utilizing a uniform elastic platform.

In my opinion all signs are increasingly pointing in the A4 direction, because more and more smart connected devices are running applications which request any services and utilize all kinds of data sources. Modern interactive applications will be designed in the A4 architecture style.

image

The “application” is now on the device, using a collaborative service mashup from different service-enabled backends. The application should work on Any Device with any resolution (and any up-to-date browser). The dominant model of server-side applications will change to client-side applications talking to server-side services running on premise and/or in the cloud. These services will use a variety of data storage technologies for different kinds of data. Big Data will play an increasingly important role as the data volume grows. NoSQL databases provide alternatives with better scalability and performance to the predominant relational databases. Martin Fowler talks about Polyglot Persistence in this context. So polyglot-implemented services which use Polyglot Persistence will become reality.

The change to Any Device is happening today, whereas the change to Any Data is slow-moving because companies are naturally conservative when it comes to their data storage (to quote Martin Fowler). The change to Any Service should already have taken place.

Any Device

Mobile computing is currently the driving force behind Any Device. Smartphones and tablets are pushing the mobility growth. Additionally, new innovative devices like Google Glass are coming soon. Google Glass belongs to the category of wearable computing devices, which share the same challenges as mobile computing, including the software architecture. All these connected devices need services to run applications. The service delivery of functionality and data has to be safe and guaranteed at any time.

Browser-based applications using HTML 5 with UI-centric JavaScript (JS) will succeed. Multi-page web applications with a “Thick Server Architecture” are turning into Single-page Applications (SPA) with a “Thin Server Architecture” (TSA). jQuery is currently one of the most widely used JS libraries. Newer touch-optimized libraries like jQuery Mobile will address the requirements of mobile devices. But there is a whole bunch of new JS libraries to enable Service-Oriented Front-End Architectures (SOFEA). SOFEA will expect RESTful Web Service interfaces.

I guess GWT and GWT-based products (like Vaadin) or JSF frameworks (like Oracle ADF Mobile) will not win out. JavaFX is pushed by Oracle and could also run on Android and iOS devices (demonstrated at JavaOne 2011). But will these technologies break the HTML 5 with JS hype?

Any Service

What about SOA? SOA is Any Service on an A4! Some IT practitioners think SOA is a buzzword that is history now. But in my opinion the reverse is true. SOA is definitely a prerequisite of the A4 (to quote Anne Thomas Manes). Besides the cost aspect, monolithic environments with silo-based applications do not have the agility and flexibility to serve multi-channel consumers (Desktop, Web, Mobile, Social, B2C, B2B and B2E).

The IT industry already has mature technologies today to build business-critical services running on fail-safe, scalable and highly available systems. One of the most promising solutions is the Service Component Architecture (SCA), which is an independent OASIS specification. Pure JEE runtime containers or the SpringSource Application Platform are also reasonable alternatives. But in my opinion these solutions look pale compared with middleware suites that support SCA. However, a certain investment of time and cost is required to bring these middleware suites into enterprise reality.

SCA is a technology for creating services implemented and assembled as composites. SCA does not specify a presentation-tier or persistence technology; instead, SCA focuses on service implementation and integrates with other technologies. SCA services themselves may be implemented using different technologies and programming languages. Therefore you could add one more “Any” for Any Language of service implementation. Services are more and more developed in a polyglot way. Multiple General-Purpose Languages (GPL) are mixed with multiple Domain-Specific Languages (DSL). As I already stated in my last blog post, in order to produce polyglot implementations you need environments where polyglot-ism is encouraged.

SCA addresses the polyglot complexity and encourages reuse. Therefore the SCA technology has strong supporters like TIBCO, IBM or Oracle. For example, the Oracle SOA Suite 11g is based on an SCA platform with polyglot support (including the SCA Spring components as part of the composites) and comes with well-integrated ESB, BAM and CEP products.

Any Data

Data Services are Any Data. The dominance of relational database management systems (RDBMS) will be broken in the future. NoSQL (Not only SQL) databases will be given equal priority alongside SQL databases. Different kinds of databases, like Neo4j as a graph database, MongoDB as a document-oriented database and Cassandra as a distributed key-value database, will become the norm (just to name a few). Newer architectural patterns like CQRS (Command and Query Responsibility Segregation) will drive the co-existence of SQL and NoSQL databases, because view data does not need to be placed in heavyweight SQL databases. In-memory technologies will also gain more and more importance and show us a way out of the “database is always the performance bottleneck” issue (“memory is the new disk, and disk will become the new tape”). Today’s high-speed performance requirements need in-memory databases (IMDB) like Redis or SAP HANA and/or in-memory data grids (IMDG) like GemFire or Coherence. High data volumes of any type of data will be addressed by Big Data concepts (Volume, Velocity, Variety) and the de facto standard open-source Hadoop computing framework. Even new integrity models without ACID transactions and with eventual consistency are emerging.

A4 One Step Closer

Let’s take a closer look at the A4. What do we have today and how does it map to current technologies? Here I essentially follow the reasoning of Adrian Colyer (CTO Cloud Application Platform, VMware) given at the SpringOne 2GX conference in 2012 (keynote: “The New Application Architectures”). In my opinion this excellent keynote confirms A4 in more detail.

The presentation layer logic implemented with well-known MVC frameworks like Apache Struts will disappear from the server side. The service layer is moving up and is now the front line on the server side (TSA/SOFEA principle). But the advantages of the MVC pattern are currently being rediscovered by JavaScript developers in order to have the right level of abstraction and a maintainable code structure. JavaScript MVC frameworks are taking control on the client device side.

Client developers can also create browser-based web applications which are able to run offline. Google Gears was the first step towards client-side persistence and is now continued with the standardized W3C Web Storage specification. Web Storage (also called DOM Storage) supports durable local storage and temporary session storage.

The server will also be able to initiate communication with the client. Server-side asynchronous events and notifications are pushed to the client. The W3C WebSocket API specification enables persistent two-way TCP communication between clients and the server. WebSockets support reliable “real-time” communication with minimal latency.

Today we have several JavaScript libraries which are used in conjunction to do DOM manipulation, templating, modular loading, UI component handling, MVC structuring, etc. So Any Device development will come at a cost - but it will come because the benefits are worth it.

Business domain services will provide functionality and data through RESTful APIs. Private service-to-service communication will still use “SOAP over HTTP” Web Services or other binding types (like JMS, RMI or Socket). But lightweight REST Web Service communication to the client device has become accepted and has prevailed in web-oriented architectures. Most commonly, REST services return JSON or XML.

“SOA Products” offered by mature middleware suites like TIBCO ActiveMatrix BusinessWorks or Oracle BPM/SOA Suite will help to build Any Service connectable to Any Device. These suites already absorb a lot of the complexity that comes with SOA, but these products are not sufficient to “do SOA”. In the end, a SOA implementation needs its own practical architecture approach. As already mentioned, SCA technology will help to build business domain services, but a dedicated “internal” design is necessary. And in my opinion this design is not platform agnostic. So you will need a more platform-specific, idiomatic design based on the selected middleware product.

Access to Any Data will happen on SQL and NoSQL databases, not forgetting Content Repositories. The most common ORM tools already support NoSQL databases (like Hibernate OGM and EclipseLink NoSQL). But other forms and types of persistence stores like HDFS (Hadoop Distributed File System) are also expected to increasingly become the standard case.

image

More and more solutions will move towards a Cloud platform because of budget pressure and time-to-market considerations. On Hybrid Clouds, parts of the solution run on a hosted Private Cloud and parts on the Public Cloud. Hybrid Clouds will become the usual solution in the future: sensitive services on-premises, others on shared and rented off-premises solutions. So fluid IT boundaries become reality.

Application services should be deployable to wherever it is optimal (a decision mainly based on security and compliance concerns of mission-critical solutions). PaaS Clouds will play an important role on the application journey to the Cloud. PaaS is about middleware running safeguarded on-premises or in the off-site Cloud. In my opinion it is desirable to choose a PaaS Cloud solution where application services can work without modification both on-site and off-site at a Cloud provider.

For example, Oracle provides a Virtual Assembly Builder (OVAB) (as part of the Cloud Application Foundation 12c) which captures the application topology and abstracts it from the environment. Afterwards the deployment can happen on conventional or engineered systems, but also on hosted Private or Public Clouds. So on-premises and off-site (remote) solutions are supported equally. Services developed with BPM and SOA Suite could benefit, but pure Java solutions would also be deployable on Oracle MWaaS-enabled WebLogic application servers (MWaaS = Middleware-as-a-Service).

Conclusion

The A4 objective is to create solutions that allow your company or organization to be innovative and competitive (while meeting time, budget and quality constraints).

Gartner already warned last year: “Applications created in 2012 using traditional architecture models will be an IT-constraining legacy by 2016”. Jeff Bezos (Chairman and CEO of Amazon) quickly realized the benefits of a Service-Oriented Architecture as early as 10 years ago. He therefore formalized simple rules to transform Amazon internally into a SOA. Data and functionality have to be exposed through service interfaces, which are the only way of communication. A4 also strongly demands independent and autonomous components. Client-side applications must be decoupled from services and data. Companies and organizations without a SOA approach have to realize that it’s late, but the longer they wait, the closer they get to being Too Late.

Healthy companies or organizations always seek to improve their ability to create software and general conditions that meet their needs at reasonable cost. A4 provides competitive advantages, so think about what A4 means for your company or organization.

Friday, May 10, 2013

SOA Suite Knowledge – Polyglot Service Implementation with Groovy

Polyglot programming, defined most simply, is the use of multiple languages in software development. Implementing services on the SCA Container is already an intentionally polyglot development approach. Oracle SOA Suite has the SCA service implementation types BPEL, Human Workflow, Mediator, Rule and Spring Components. These components mix the General-Purpose Language (GPL) Java with Domain-Specific Languages (DSL) like XSD, WSDL or Schematron. But Spring Components also enable service implementations with other JVM GPL languages besides Java.

In my opinion Neal Ford was absolutely right when he coined the term “Polyglot Programming” and predicted, already in 2006, “Applications of the future will take advantage of the polyglot nature of the language world … The times of writing an application in a single general purpose language is over”. In order to produce polyglot implementations you need environments where polyglot-ism is required or at least encouraged. Oracle SOA Suite is such a polyglot-supporting environment and you are doing polyglot development all the time (e.g. XML, WSDL, SQL, Java). But on the GPL side, too, developers are not limited to Java. SCA Spring Components have been supported since Patch Set 2, and the Spring Framework has supported dynamic languages since version 2 (BeanShell, JRuby and Groovy).

Groovy is a general-purpose dynamic language and has the best and most seamless integration with Java (besides Spring integration, Groovy also supports direct and JSR 223 integration). Using Groovy therefore has a low threshold for Java developers; it is easy because Groovy has a Java-like syntax. Most Java code is also syntactically valid Groovy, so Java developers can switch to Groovy iteratively. My first contact with Groovy was at the JAX conference in 2008. It was a Groovy newbie session by Dierk König (well-known author of "Groovy in Action"). And for the first time in a long while of Java programming I had an awakening, and an addictive interest in this language that still lasts.

The Groovy language is meant to complement Java and adds a wide range of features that are sadly lacking in Java (for example Closures, Dynamic Typing and Metaprogramming – just to name a few). Concurrency and parallelism are increasingly important in the era of multi-core CPUs with a growing number of CPU cores. GPars is part of Groovy and offers an intuitive and safe way to handle tasks concurrently. Groovy also makes it easy to create DSLs (to simplify the “solution”). Optional parentheses, optional semicolons, method pointers and Metaprogramming let the code read like “executable pseudo code” (easily readable by non-programmers). One of my personal favorites in Groovy is the easy XML handling, both for consuming and producing XML content, with its XML builders and parsers. Therefore I encourage you to take a closer look at Groovy.

In my opinion Groovy also has a bright future, because the Spring development team announced a strong focus on Groovy in the upcoming Spring Framework 4.0. Spring 4 will be about properly supporting Groovy 2 as a first-class implementation language for Spring-based application architectures - not just for selected scripted beans but rather for an entire application codebase, as a direct alternative to the standard Java language (quoting Jürgen Höller from SpringSource, who is the Spring Framework co-founder and project leader). The Spring Framework is the most popular application development framework for enterprise Java and will therefore become the driving force for getting more Java developers in touch with Groovy (and letting them feel what the “right language for the job” is).

The latest version of Oracle SOA Suite (v11.1.1.7) comes with Spring v2.5.6 and also has the Groovy v1.6.3 libraries on board. These versions are outdated: Spring v2.5.6 was released in November 2008 and Spring is already at version 3.x; Groovy v1.6.3 was released in May 2009 and Groovy is at version 2.x today. Anyway, since Oracle SOA Suite PS5 (11.1.1.6) it has been possible to implement Spring Beans using Groovy. Oracle itself is also using Groovy; for example, the Rule Reporter is written in Groovy and Oracle ADF uses Groovy as well. But the official documentation on how to write SOA Suite Spring Beans with Groovy should be improved, because you need more details to get a polyglot implementation running. Motivating you to use Groovy and showing the details are the reasons for this blog post. Now let’s go to the details.

First you have to do some configuration steps. I show the steps with the Groovy library v1.6.3 that comes out of the box. But you could also download the latest Groovy version and make use of the newest Groovy features (I did a successful test with Groovy v2.1.3).

1.) Copy the $MW_HOME/oracle_common/modules/groovy-all-1.6.3.jar library to the $ORACLE_HOME/soa/modules/oracle.soa.ext_11.1.1 folder

You can add JAR files and custom classes to a SOA composite application. A SOA extension library for adding extension JARs and classes to a SOA composite application is available in the $ORACLE_HOME/soa/modules/oracle.soa.ext_11.1.1 directory. This step makes the Groovy library known to SOA composite applications.

2.) Run ANT at the $ORACLE_HOME/soa/modules/oracle.soa.ext_11.1.1 folder

Running ANT on the "soa.ext" folder will update the oracle.soa.ext.jar library with a changed MANIFEST.MF file, adding a reference to Groovy.

image
Warning: This procedure is not cluster-friendly. If you're running a cluster, you must copy the file and run ANT on each physical server that forms the domain.

3.) Add the Groovy library to the WebLogic system classpath. Open the setDomainEnv configuration file and put the Groovy library on the POST_CLASSPATH setting.

image

4.) Restart the WebLogic application server

Once the configuration is done, the Spring Bean Groovy coding can start. I’m reusing the Calculator example from an older blog post about Spring Bean Contract-First. I copy the Calculator WSDL into the project folder and create a Web Service Adapter pointing to the Calculator WSDL file. Afterwards I create a Spring Bean component and wire the Spring Bean to the Web Service Adapter (contract-first approach). The JDeveloper wizard creates the necessary Java files for the Spring Bean.

image

I’m making only one change to the setter methods of the generated type classes for the response (AddResponseType, SubtractResponseType, MultiplyResponseType and DivideResponseType): the setter should return “this”. This approach is commonly called a builder pattern or a fluent interface. You don’t need to care about the returned object if you don't want to, so it doesn't change the "normal" setter usage syntax. But this approach also allows you to return the changed object (I’m using it together with the Groovy end return feature) and to chain setters together.
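To make the setter change concrete, here is a minimal Java sketch. It is not the generated code: the class below is only a hypothetical stand-in for AddResponseType, the field name "result" is an assumption (the real names come from the Calculator WSDL), and FluentSetterDemo is an invented helper.

```java
// Hypothetical stand-in for a generated response type; the real field
// names come from the Calculator WSDL and may differ.
class AddResponseType {
    private double result;

    public double getResult() {
        return result;
    }

    // Returns "this" instead of void, so the call can be chained
    // or used directly as a return value.
    public AddResponseType setResult(double result) {
        this.result = result;
        return this;
    }
}

// Invented helper that shows both usage styles.
class FluentSetterDemo {
    static AddResponseType add(double a, double b) {
        // The setter call itself yields the populated response object.
        return new AddResponseType().setResult(a + b);
    }
}
```

Calling r.setResult(1.5) and ignoring the return value still works exactly like a void setter. One thing to keep in mind as a general caveat: tooling that relies on strict JavaBeans introspection may expect setters to return void.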

image

The next step is the Groovy implementation of the generated Java ICalculatorService interface. I’m placing the GroovyCalculator class in the same package as the Java interface, but in the <project>\SCA-INF\src folder instead of the <project>\src folder (details below).

image

The implementation comes in Groovy style. Semicolons and the default public visibility declaration have disappeared. Variables are dynamically typed (keyword def). The mathematical operations are Closures (support for Closures in Java is planned for Java 8; recently postponed to March 2014). The result of the last expression is returned (so-called end return), therefore the keyword return is optional in Groovy. In general, Groovy code is less verbose, and more expressive and compact.

You have to make sure that the Groovy class ends up in the SCA-INF/classes folder of the SOA archive (SAR) JAR file. Therefore you have to place the Groovy class file in the SCA-INF/src sub-folder and instruct JDeveloper to also copy files with the groovy extension to the Output Directory (Project Properties –> Compiler –> Copy File Types to Output Directory).

ProjectProperties

The final step involves defining dynamic-language-backed bean definitions in the Spring Bean Context file (the wiring XML file), one for each bean that you want to configure (this is no different from normal Java bean configuration). However, instead of specifying the fully qualified classname of the class that is to be instantiated and configured by the container, you use the <lang:language/> element to define the dynamic-language-backed bean. For Groovy you have to use the <lang:groovy/> element (which instructs Spring to use the specific Groovy factory).

image

Afterwards the SCA Composite can be deployed and tested. The Groovy script will be compiled automatically during deployment on the application server. It is worth mentioning that Spring also supports refreshable beans and inline scripting.

Refreshable beans allow code changes without the need to shut down a running service or redeploy the service. The dynamic-language-backed bean amended in this way will pick up the new state and logic from the changed dynamic language source file. Be careful: in my opinion it’s a powerful but dangerous feature. One reason for inline scripting could be a quick Mock Bean implementation (take a look at the Mock example below).

MockBean

The Mock implementation always returns 42 as the result.

Groovy coding errors will result in a ScriptCompilationException giving you more details about the issue that occurred (reason, line, column). For example …

There was an error deploying the composite on SOAServer: Error occurred during deployment of component: CalculatorSB to service engine: implementation.spring, for composite: Calculator05Groovy: SCA Engine deployment failure.: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'GroovyCalculatorBean': BeanPostProcessor before instantiation of bean failed; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'GroovyCalculatorBean': Could not determine scripted object type for GroovyScriptFactory: script source locator [classpath:calculator/example/suchier/GroovyCalculator.groovy]; nested exception is org.springframework.scripting.ScriptCompilationException: Could not compile script; nested exception is org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed, GroovyCalculator: 17: unexpected token: this @ line 17, column 84.

I often use SoapUI besides Enterprise Manager Fusion Control for first tests. Here is the SoapUI test output for a valid division.

image

This is the output for a division by zero. The exception is thrown in the Groovy class, but it arrives as a standards-based SOAP fault.

image

The source code is available as 7-zip here.

That’s it. This is not a bash-the-other-languages blog post. But my personal favorite language besides Java is Groovy, and there are many reasons to take a look at Groovy as a SOA Suite developer too. Now you have the details to start Groovy-based polyglot service implementation on Oracle SOA Suite. If you are a Groovy beginner, I would recommend taking a look at the excellent books “Groovy in Action” (by Dierk Koenig) and “Groovy Recipes” (by Scott Davis).

Monday, March 4, 2013

SOA Suite Knowledge – MTOM enabled Web Services

You might have heard about MTOM/XOP. Possibly you have a broad idea of what MTOM/XOP does. Then you are starting from the same situation I was in some months ago. Let me give you a jump start on what MTOM/XOP means in general and on the Oracle platform in particular.

MTOM is the W3C-standardized Message Transmission Optimization Mechanism, which is used in combination with the concrete referencing mechanism XOP (XML-binary Optimized Packaging). MTOM/XOP is a way to handle opaque data with XML, such as the binary data of images, audio or video. MTOM/XOP does not use any compression or decompression technique. There is another standard called Fast Infoset that addresses the compression issue. But large application data which needs to be compressed for efficiency can also be handled by MTOM/XOP (more about this in the example below).

XML parsers will fail when you place binary data in a text node of an XML document. The binary data could contain reserved XML characters like ‘<’, ‘>’ or ‘&’ which will break the parser on the “other side”. CDATA wrapping allows these reserved XML characters, but the parser looks for a ‘]]>’ (which marks the end of a CDATA section) and there is the risk that the binary data contains such a byte sequence. Base64 encoding (xsd:base64Binary) is a solution because every group of three bytes is mapped to four characters from a 64-character “alphabet”, which guarantees that no byte is misinterpreted as anything that would break an XML parser. Unfortunately, base64-encoded binary data is about 33% larger (in number of bytes) than the raw data. The alternative of hexadecimal encoding (xsd:hexBinary) even expands the binary data by a factor of 2.
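The size overhead of both encodings is easy to check. The following is a minimal sketch, assuming Java 8 or later for java.util.Base64; the class and method names are made up for illustration.

```java
import java.util.Base64;

// Illustrative helper (hypothetical name) for the size overhead of
// xsd:base64Binary and xsd:hexBinary compared with the raw bytes.
class EncodingOverhead {

    // Base64 writes 4 characters for every started group of 3 bytes
    // (the last group is padded with '=').
    static long base64Length(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    // Hexadecimal encoding writes 2 characters per byte.
    static long hexLength(long rawBytes) {
        return 2 * rawBytes;
    }

    // Encodes with the JDK encoder so the formula above can be cross-checked.
    static int encodedLength(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw).length();
    }
}
```

For a payload of 646 bytes, base64Length gives 864 characters and hexLength gives 1292, which is the roughly one-third and twofold growth described above.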

NoticeBox

Another idea for handling opaque data is to place it outside of the SOAP message. This technique is well known in the Java world as “SOAP with Attachments” (SwA). The SOAP message contains a reference to the binary data, which is placed in a Multipurpose Internet Mail Extension (MIME) attachment.

SOAPMessagePackage

But the binary attachment is not part of the SOAP Envelope at all. It is left to the message processor to retrieve the actual binary data. Standards like WS-Security no longer work, because WS-Security can’t be used to secure attachments. Furthermore, the Microsoft world was using DIME (Direct Internet Message Encapsulation) encoded attachments. It’s not a good idea that Web Service providers have to support both attachment technologies, SwA and DIME.

Since 2005 the agreed solution is MTOM/XOP. It has the efficiency of SwA, but achieves it without having to move the binary data out of the SOAP Envelope. XOP allows binary data to be included as part of the XML Infoset. When data is within the XML Infoset, it operates within the constraints of the SOAP processing model. In other words, you can control the processing of the XML Infoset using SOAP headers with policies.

MTOM/XOP uses an xop:Include element to tell the processor to replace the content with the referenced binary data, which is “stored” outside of the XML document in the MIME attachment part. The logic of discrete storage and retrieval of binary data becomes inherent to the XML parser. The SOAP parser can treat the binary data in a uniform way and as part of the XML document, with no special retrieval logic. Fortunately, sender and receiver are not aware that any MIME artifacts were used to transport the base64Binary or hexBinary data.

Let’s take a look at how to enable MTOM/XOP on SOA Suite 11g. The real-world requirement is to build a cache warm-up dictionary service for the Web Presentation Tier, which has no database access. Therefore the SCA Container running on the Service Tier provides a Dictionary Entity Service which should be able to transport large datasets to the Web Presentation Tier, mediated by an OSB.

clip_image005

A first top-down WSDL-first implementation would cause the client to send a request for Dictionary ID ranges (request example); afterwards the service retrieves the requested dictionary data from the database and sends the response back to the client as normal dictionary entry structure (response example).

Request Example:

clip_image007

Response Example:

clip_image009

It’s obvious that a large range of Dictionary IDs would cause a large SOAP Response message. But in our case we have a Spring Bean implementation using the simple JdbcTemplate approach. The result of the database request is a list of maps (line 11 and 12) which is mapped to the service response (line 32 to 38).

clip_image011

How about having the service return the raw list instead of the collection of dictionary entry types? Isn’t it a technically motivated private service where the known Java client on the Web Presentation Tier has to navigate through the list anyway? Yes, it is.

The service should become MTOM enabled in order to handle the binary data as part of the message. The same request structure should return a byte array with the data list returned from the database. Two pieces of metadata should be returned in addition: how many dictionary entries are returned and the byte size of the array.

The XSD structure is simple. The elements Size and RecNum are optional and of type long. And here we use the base64Binary type for the list of dictionary entries (element return).

clip_image013

MTOM is enabled with a right mouse click on the Web Service Adapter placed in the Exposed Services swim lane. The menu option “Configure WS Policies …” opens the Web Service Policies configuration wizard. You only have to enable the MTOM policy (oracle/wsmtom_policy).

clip_image015

The Web Service Adapter in the graphical SCA Editor will have a lock icon in the upper right corner after adding the policy. Behind the scenes the wizard adds only one additional configuration line to the Web Service binding (line 4) in the composite.xml file.

clip_image017

When you deploy the MTOM enabled service and take a look at the WSDL URL, you will notice that Oracle SOA Suite added an MTOM policy to the WSDL (line 6 to 8). Additionally, the WSDL binding has a policy reference to the MTOM policy (line 100). The wsdl:required="false" declaration means that clients which are not MTOM enabled should work as well. Tools such as soapUI or SoapSonar should be able to invoke the service, which makes sense because the service offers operations without MTOM involvement. In my experience with soapUI it does not work, because both MTOM request properties (“Enable MTOM” and “Force MTOM”) always have to be enabled. Otherwise the service responds with an internal server error (“Non-MTOM message has been received when MTOM message was expected.”).

clip_image019

Oracle SOA Suite creates a byte[] equivalent for the XSD type base64Binary. So byte[] is the object representation of the XSD types base64Binary and hexBinary. All that is needed is to convert the Spring database result list to a byte array (by using the classes ObjectOutputStream and ByteArrayOutputStream inside a DataMTOMUtil helper class at line 34). The two metadata values are quickly and easily determined (line 37 and 38).
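The conversion itself can be sketched as follows. This is not the original DataMTOMUtil source, just a minimal reconstruction under some assumptions: the class name DataMtomUtilSketch and the method byteArrayToList are invented, the result list and everything in it must implement Serializable, and I/O errors are wrapped in unchecked exceptions to keep the example short (UncheckedIOException needs Java 8).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;

// Hypothetical reconstruction of a helper in the spirit of DataMTOMUtil.
class DataMtomUtilSketch {

    // Serializes any Serializable object graph (e.g. the list of row maps)
    // into the byte[] that is bound to the base64Binary element.
    static byte[] objToByteArray(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reverse direction for the client: byte[] back to the list of row maps.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> byteArrayToList(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<Map<String, Object>>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this in place, the two metadata values would simply be the length of the returned byte array (Size) and the size() of the list (RecNum).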

clip_image021

The same request now results in a more interesting response containing the xop:Include element.

clip_image023

So the two metadata values, size and number of records, are as expected. But the dictionary result is now placed with content type “application/octet-stream” (= binary attachment) in the MIME attachment part with a given content ID. soapUI shows you a raw view of the complete SOAP message.

clip_image025

As you can see, the HTTP message has the content type “multipart/related”, which is used for compound documents. The type is “application/xop+xml”. The dictionary list is sent as a binary part of the MIME message with the content type “application/octet-stream” and the same content ID which is placed in the href attribute (cid:6df6245c6aff457d8d51db92b6707170).

You will notice an interesting effect when the binary data is below a size of 1024 bytes. In this case the Web Service framework will not create a separate attachment. Instead the binary data is placed directly in the XML message.

clip_image027

The MTOM threshold of 1024 bytes is a default value. When the potential attachment is smaller than that threshold, the data will be inlined. The MTOM threshold determines the break-even point between sending data inline or as an attachment.

I already mentioned that MTOM/XOP does not do any kind of data compression. But data compression obviously benefits our solution considerably. The objToByteArray method uses standard OutputStream classes, so it is a quick change to add java.util.zip.GZIPOutputStream for additional compression (line 10).
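A minimal sketch of such a compressing helper, assuming Java 8 or later (try-with-resources and UncheckedIOException; on the JDK 6 used in this project the streams would be closed in a finally block instead). The class and method names are hypothetical and are not the ones from the original DataMTOMUtil class:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: serializes an object and GZIP-compresses the result. */
final class GzipSerializer {

    private GzipSerializer() { }

    static byte[] objToCompressedByteArray(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream also closes the GZIP stream,
        // which writes the GZIP trailer into the byte buffer.
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(obj);
        } catch (IOException e) {
            // Not expected for purely in-memory streams and serializable content.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The returned array is what would be assigned to the base64Binary element; the Web Service framework then decides, based on the MTOM threshold, whether it travels inline or as an attachment.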

clip_image029

The standard package java.util.zip provides classes for reading and writing the standard GZIP format, which is a popular compression format. The client side is created simply with the JDeveloper Web Service Proxy wizard. The wizard creates a Web Service client skeleton which is also MTOM/XOP enabled.

clip_image031

Regarding MTOM/XOP you only have to enable the “oracle/wsmtom_policy” during the Web Service client creation.

clip_image033

Finally you place your code snippets in the generated Web Service client skeleton: in our case the conversion of a byte array back to a List object (line 31) and the mapping to the dictionary structure (lines 33 to 38). The code also has to handle the GZIP decompression (line 7).
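The client-side conversion can be sketched as follows. This is only an illustration with hypothetical names, assuming Java 8 or later and assuming the bytes were produced by Java serialization wrapped in GZIP; the compressing method is included only to make the round trip self-contained. Keep in mind that Java deserialization should only be applied to data from a trusted service.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical round-trip helper for GZIP-compressed, serialized objects. */
final class GzipRoundTrip {

    private GzipRoundTrip() { }

    /** Server side: serialize and compress. */
    static byte[] toCompressedBytes(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("Could not serialize object", e);
        }
        return bytes.toByteArray();
    }

    /** Client side: decompress and deserialize; the caller casts the result. */
    static Object fromCompressedBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not restore object", e);
        }
    }
}
```

On the client the result would be cast to the expected List type and then mapped to the dictionary structure.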

clip_image035

What is the effect of data compression? I executed the service with several dictionary ranges and put the values on a spreadsheet. Additionally I added the size for base64 encoding (which would be used without MTOM/XOP).

| Dictionary Rec Num | Dictionary List Size Uncompressed (byte) | Dictionary List Size GZip Compressed (byte) | Dictionary List Size Uncompressed Base64 (byte) | Dictionary List Size Compressed Base64 (byte) |
|---|---|---|---|---|
| 1 | 646 | 493 | 864 | 660 |
| 14 | 2706 | 740 | 3608 | 988 |
| 194 | 36383 | 3737 | 48512 | 4984 |
| 1783 | 307619 | 13830 | 410160 | 18440 |
| 11574 | 2188494 | 89687 | 2917992 | 119584 |

The compression rate is significant. Take a look at the last line. If I transported 11574 records without MTOM in the base64Binary element, the data would have to be base64 encoded. In that case I would need 2.78 MB (2917992 bytes), which is about 33% more than the original dictionary list size of 2.09 MB (2188494 bytes). GZIP compression brings the original dictionary list size down to 0.09 MB (89687 bytes). This is the final size that travels in the MTOM/XOP MIME attachment, which is around 4.1% of the original size.

Therefore the service is able to transport much more data in a given time with GZIP compression. Base64 encoding adds size and is not a good option (even when you compress the dictionary list, because the encoded data is still around 33% larger than the compressed bytes). So MTOM/XOP with GZIP compression is the best solution.
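The base64 columns of the table can be reproduced with a small calculation: base64 emits four characters for every started group of three input bytes. The following sketch (hypothetical class name) applies that rule to the sizes from the last table row:

```java
/** Hypothetical helper that computes the padded base64 length of a byte count. */
final class Base64Size {

    private Base64Size() { }

    /** Four output characters for every started group of three input bytes. */
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        long uncompressed = 2188494L; // 11574 records, uncompressed
        long compressed = 89687L;     // same records after GZIP
        System.out.println(uncompressed + " -> " + encodedLength(uncompressed));
        System.out.println(compressed + " -> " + encodedLength(compressed));
    }
}
```

The overhead factor is always close to 4/3, independent of whether the payload was compressed before encoding.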

The solution shown works in memory. The Oracle documentation also describes a streaming approach, which may be interesting for very large binary data. But the documentation also mentions a limitation: message encryption does not work with streaming MTOM.

Thursday, November 15, 2012

SOA Suite Performance – Right Garbage Collection Tuning - Part 2

My last blog post pointed out the importance of machine power and that it’s impossible to regain performance on the higher architectural layers when the machine power isn’t adequate. This time I want to focus on another very important performance lever: the JVM and especially Garbage Collection (GC).

GC tuning is in most cases an indispensable prerequisite for good performance on non-trivial projects above a certain size (and Oracle SOA Suite projects fall into this category). Unfortunately, for SOA Suite projects it's also not an easy, one-off task. Most likely it will take several optimization iterations in which you measure your GC behavior, tune the GC settings and measure again. And even when you have found the right GC settings for the moment, you have to monitor GC behavior over time, because a rising number of SCA Composites, more SCA Service calls or higher data volumes will change the GC behavior. It’s also safe to keep basic GC logging enabled in the production system.

The good thing about GC tuning is that there are plenty of good articles and blogs describing how to do meaningful GC tuning. I neither want to repeat all the available material nor go through all 50+ parameters of the Sun HotSpot JVM. Instead I want to give some helpful GC hints that were important during our own JVM tuning for SOA Suite and that are rarely mentioned or difficult to find. So make yourself familiar with the JVM basics if you haven’t done so yet.

First of all, when you want to do JVM tuning you need a GC analysis tool like HPjmeter to visualize GC behavior. Some tools can perform real-time monitoring, but it’s sufficient to analyze the GC log files offline. Analyzing raw GC log files without a tool is time-consuming and requires a certain level of experience.

Basic GC logging parameters: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:<file>.

The -XX:+PrintGCDetails option provides the details needed for GC tuning. The -XX:+PrintGCTimeStamps setting causes a time stamp to be printed for each GC event. The time stamp indicates the time elapsed since the JVM was launched (approximately matching the Weblogic Server Activation Time which is shown on the Weblogic Console). When you would rather know the exact wall clock time than the time since JVM launch, you can use the -XX:+PrintGCDateStamps setting, which prints a localized date and time stamp with the current date and time.

The most obvious and important parameter is the right JVM memory sizing, which has to be aligned with the physical memory. Make sure the JVM always has 100% of the memory it needs and do not over-commit memory (memory over-committing means allocating more JVM memory than there is physical memory on the system). The JVM memory is an active space where objects are constantly being created and garbage collected, and such an active memory space requires its memory to be available all the time. I have read a recommendation not to use more than 80% of the available RAM that is not taken by the operating system or other processes. A JVM Heap that is too small will in the worst case lead to an out-of-memory error during startup or to many Full GCs right from the beginning. A JVM Heap that is too large can pause your application for several tens of seconds. As a general rule of thumb, as the JVM Heap size grows, so does the GC overhead for the JVM process. To give you a ballpark figure: if you have fewer than 50 SCA Composites, I would recommend starting with 4 GB JVM memory (-Xmx4g) on your Weblogic Managed Servers. During your optimization, try to reduce the JVM Heap size if possible to get shorter GC times and to avoid wasting memory. If the JVM Heap always settles at 85% free, you might set the Heap size smaller.

Note: A common misunderstanding is to assume that the -Xmx value equals the memory needed by the Java process. The JVM memory (or Java process memory) is greater than the JVM Max Heap (-Xmx) because additional JVM areas outside of the JVM Heap make up the memory space of the total Java process: the JVM Perm Size (-XX:MaxPermSize), [number of concurrent threads] * (-Xss), and the “Other JVM Memory” section. The -Xss option is the Java thread stack size setting. Each thread has its own stack, its own program counter, and its own local variables. The default size depends on the operating system and the JVM, and it can range from 256k to 1024k. Each thread is given exactly the amount specified; there is no starting or maximum stack size. Since the default -Xss setting is often too high, tuning it down can help to save memory that is then given back to the operating system. Tune it down to a value that doesn’t cause StackOverflow errors. The “Other JVM Memory” is additional memory required for the JIT Code Cache, NIO buffers, classloaders, socket buffers (receive/send), JNI and GC internal information.

clip_image002

Therefore the final JVM Memory calculation has to happen with this formula:

JVM Memory = JVM Max Heap (-Xmx) + JVM Perm Size (-XX:MaxPermSize) + [number of concurrent threads] * (-Xss) + “Other JVM Memory”

Typically “Other JVM Memory” is not significant, but it can be quite large if the application uses lots of NIO buffers and socket buffers. Otherwise it’s safe to assume about 5% of the total JVM process memory.
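As a rough illustration of the formula, the following sketch estimates the total process memory from the three known parts and treats “Other JVM Memory” as 5% of the total. The class name is hypothetical, and the thread count of 200 in the usage example is purely an assumption; heap, Perm and stack sizes are the values used later in this post.

```java
/** Hypothetical estimator for the total JVM process memory. */
final class JvmMemoryEstimate {

    private JvmMemoryEstimate() { }

    /**
     * Heap + Perm + thread stacks make up 95% of the process;
     * the remaining 5% is assumed to be "Other JVM Memory".
     */
    static long estimateTotalMb(long maxHeapMb, long maxPermMb,
                                int threads, long stackSizeKb) {
        double stacksMb = threads * stackSizeKb / 1024.0;
        double knownMb = maxHeapMb + maxPermMb + stacksMb;
        return (long) Math.ceil(knownMb / 0.95);
    }

    public static void main(String[] args) {
        // -Xmx4g, -XX:MaxPermSize=768m, -Xss512k, 200 threads (assumed)
        System.out.println(estimateTotalMb(4096, 768, 200, 512) + " MB");
    }
}
```

The result is only a planning figure; the real footprint should be checked with operating system tools while the server is under load.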

Check that the "-server" HotSpot JIT compiler is activated. It delivers the best performance for servers running SOA Suite after a fair amount of warm-up time (keep in mind that the Domain configuration wizard configures “-client” when you create a Domain in development mode). The server mode differs in its compiler and memory defaults, which are tuned to maximize peak operating speed for long-running server applications (by doing more code analysis and complex optimizations). The old rule for Server JVMs to set the initial Heap size (-Xms) equal to the maximum Heap size (-Xmx) is still valid. Otherwise the Heap is increased on the fly, which always requires a Stop-The-World (STW) GC, even if the resizing is very small.

Using equal values is also a good approach for the Permanent Space, which is allocated outside of the Java Heap. Permanent Space is used for storing meta-data describing classes that are not part of the Java language. The -XX:PermSize setting specifies the initial size that will be allocated during startup of the JVM. If necessary, the JVM will allocate up to the -XX:MaxPermSize setting. JVM efficiency is improved by setting PermSize equal to MaxPermSize. According to my observations, this Non-Heap memory area is pretty stable. Observe the PermSpace usage with tools like JConsole or VisualVM and adjust accordingly.

Always keep in mind that you should consider scaling out an application to multiple JVMs (= Weblogic Managed Servers), even on the same host (so-called vertical clustering). Horizontal clustering is clustering across hardware boundaries, with load balancing and failover as the primary objectives. On a 64 bit system there is theoretically no upper memory limit other than the available physical memory. But again, heap sizes that are too large can certainly cause GC STW problems. The solution is smaller JVM heap sizes running on more JVMs, implemented with vertical and horizontal clustering. There is no “golden rule”: the optimal JVM heap size and number of JVMs (= Managed Servers) can only be found through performance testing that simulates average and peak load scenarios (use tools like the supplied Oracle Enterprise Manager Fusion Control, professional tools like HP LoadRunner or open source tools like Apache JMeter and soapUI).

Most important besides the right JVM memory sizing is the choice of the right GC strategy, also called the GC scheme. You have to decide between the optimization goals of maximum throughput and minimum pause time. You can’t have both. So if you have a domain doing online processing where users are waiting for a quick response, you will want to optimize for minimum pause time. On the other hand, if you have a domain doing batch processing and your SCA Services are not involved in an interactive application, you can afford longer pause times and will want to optimize for maximum throughput (that’s also a reason why I recommended a domain split in my last blog post for scenarios where you have to do both online processing and batch processing).

The acceptable rate for Garbage Collection is for sure application-specific but the Oracle documentation mentions that a best practice is to tune the time spent doing GC to within 5% of execution time (which is already a high value in my opinion). Full Garbage Collection should in general not take longer than 3 to 5 seconds.
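For a quick check of that 5% guideline inside a running JVM, the standard java.lang.management API exposes accumulated collection times. The following is a minimal sketch with a hypothetical class name; the reported times are approximations and, for concurrent collectors, not necessarily pure pause times, so the GC log files remain the more precise source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: accumulated GC time as a share of JVM uptime. */
final class GcOverhead {

    private GcOverhead() { }

    static double gcTimePercent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 if not supported
            if (time > 0) {
                gcMillis += time;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %.2f%% of uptime%n", gcTimePercent());
    }
}
```

The same beans are available remotely through JMX, which is how tools like JConsole obtain their GC figures.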

Let’s take a look at the GC scheme settings. Today most environments operate on (several) multi-core CPUs. I assume that your SOA Suite machines have multi-core CPUs, so we can neglect the special settings for single-core environments. Here are the two important Sun JVM flags to choose the right GC strategy:

 

|  | Maximum Throughput (pause times are not an issue) | Minimum Pause Time (pause times are minimized) |
|---|---|---|
| Sun JVM Flag | -XX:+UseParallelGC | -XX:+UseConcMarkSweepGC |
| Young Generation (Eden Space + Survivor Spaces) (Scavenges) | Parallel Young GC (stop-the-world parallel mark-and-copy) | Parallel Young GC (stop-the-world parallel mark-and-copy) |
| Old Generation (Tenured Space) | Parallel Old GC (stop-the-world parallel mark-and-compact) | CMS GC (concurrent mark-and-sweep) |

The Parallel GC scheme stops the application execution and uses as many threads as possible (and therefore all available CPU resources) to clean up memory. A GC STW happens. The Concurrent GC scheme attempts to minimize the complete halting of the application execution as much as possible by performing the GC logic in threads that run concurrently with the application logic threads. Even Concurrent Mark-and-Sweep (CMS) will cause GC STW pauses, but fewer and shorter ones than a Parallel GC. Take a look at the GC log file: only the “CMS-initial-mark” and “CMS-remark” phases cause a GC STW. The marking and remarking pauses are directly proportional to the number of objects in the Old Generation (Tenured Space). Longer pauses indicate a lot of tiny objects.

The JVM also offers two settings to control how many GC threads are used. The -XX:ParallelGCThreads setting controls the number of threads used in the Parallel GC. The -XX:ConcGCThreads setting lets you control the number of threads the Concurrent Garbage Collector will use.

On smaller multiprocessor machines with 8 or fewer CPU Cores you configure the number of parallel threads equal to the number of CPU Cores.

-XX:ParallelGCThreads = [number of CPU Cores]

For example, if there are two Dual-Core processors you will have a setting of 4 threads. If there are 4 Dual-Core processors or 2 Quad-Core processors you will have a setting of 8 threads.

On medium to large multiprocessor machines don't set the number of GC threads to be the same as the CPU Cores (there are diminishing returns). This is the formula to do the thread configuration for machines with more than 8 CPU Cores.

-XX:ParallelGCThreads = 8 + (([number of CPU Cores] - 8) * 5)/8

You get 1 parallel GC thread per CPU Core for up to 8 CPU Cores. With more CPU Cores the formula reduces the number of threads. For example for 16 CPU Cores you get: 8+((16-8)*5)/8 = 13 GC threads.

The number of threads for the CMS process is dependent on the number of the threads for the parallel GC.

-XX:ConcGCThreads = ([number of ParallelGCThreads] + 3) / 4

But be rather conservative and not too aggressive with the thread setting especially when you are doing vertical clustering. In a virtual environment the calculation is based on the number of CPU Cores assigned to the Guest OS.
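The two thread formulas above can be written down as a small calculator. This is only a sketch with a hypothetical class name; it uses integer division, as the 16-core example in the text does.

```java
/** Hypothetical calculator for the GC thread settings described above. */
final class GcThreads {

    private GcThreads() { }

    /** One thread per core up to 8 cores, 5/8 of a thread per additional core. */
    static int parallelGcThreads(int cpuCores) {
        if (cpuCores <= 8) {
            return cpuCores;
        }
        return 8 + ((cpuCores - 8) * 5) / 8;
    }

    /** The CMS thread count is derived from the parallel GC thread count. */
    static int concGcThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;
    }

    public static void main(String[] args) {
        for (int cores : new int[] {4, 8, 16, 32}) {
            int parallel = parallelGcThreads(cores);
            System.out.println(cores + " cores: -XX:ParallelGCThreads=" + parallel
                    + " -XX:ConcGCThreads=" + concGcThreads(parallel));
        }
    }
}
```

In a virtual environment the input would be the number of cores assigned to the guest, and with vertical clustering the result should be reduced rather than taken at face value.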

Also be aware that CMS leads over time to some Heap fragmentation, which will cause the JVM to switch to a Mark-and-Compact collector. A mix of small and large objects fragments the Heap sooner. The JVM needs to find a block of contiguous space for the size of an object, and this slows the JVM down. There is a JVM parameter that can be used to detect fragmentation (-XX:PrintFLSStatistics=2), but it slows down the GC significantly. Consider that a SOA Suite Batch Domain most likely has to handle larger objects than an Online Application Processing Domain.

The new Garbage-First (G1) Garbage Collector (testable since Java SE 6 update 14 Early Access Package and officially available since Java SE 7 Update 4) will be the long-term replacement for CMS and targets medium to large multiprocessor machines and large Heap sizes. Unlike CMS, G1 compacts to battle fragmentation and to achieve more-consistent long-term operation. But the first Weblogic Server version which supports JDK 7 is 12c.

When you do the JVM sizing you should know how large the JVM Heap sections are and when GC is triggered. Only with this knowledge can you do meaningful sizing and react to the numbers given in the GC log files.

These formulas are helpful to calculate the JVM Heap memory sizes:

| Space | Calculation |
|---|---|
| Eden Space | NewSize - ((NewSize / (SurvivorRatio + 2)) * 2) |
| Survivor Space (To) | (NewSize - Eden) / 2 |
| Survivor Space (From) | (NewSize - Eden) / 2 |
| Tenured Space | HeapSize - Young Generation |

These formulas give you the real sizes of the generational JVM spaces. A Survivor Space serves as the destination of the next copying collection for any living objects in the Eden Space and the other Survivor Space. Keep in mind that one Survivor Space is empty at any given time. Now it’s up to you to monitor the GC behavior, choose the right GC scheme, calculate the optimized heap sizes and number of threads, and find the best non-heap size settings.
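The formulas translate directly into code. The sketch below uses a hypothetical class name and works in MB; the sample values in main (-Xmx4g, -Xmn1280m, -XX:SurvivorRatio=3) are the ones from the final configuration discussed later in this post.

```java
/** Hypothetical calculator for the generational heap space sizes (in MB). */
final class HeapSpaces {

    final double edenMb;
    final double survivorMb; // size of each of the two Survivor Spaces
    final double tenuredMb;

    private HeapSpaces(double edenMb, double survivorMb, double tenuredMb) {
        this.edenMb = edenMb;
        this.survivorMb = survivorMb;
        this.tenuredMb = tenuredMb;
    }

    static HeapSpaces of(double heapMb, double newSizeMb, int survivorRatio) {
        double eden = newSizeMb - (newSizeMb / (survivorRatio + 2)) * 2;
        double survivor = (newSizeMb - eden) / 2;
        double tenured = heapMb - newSizeMb;
        return new HeapSpaces(eden, survivor, tenured);
    }

    public static void main(String[] args) {
        HeapSpaces spaces = HeapSpaces.of(4096, 1280, 3);
        System.out.println("Eden:     " + spaces.edenMb + " MB");
        System.out.println("Survivor: " + spaces.survivorMb + " MB (each)");
        System.out.println("Tenured:  " + spaces.tenuredMb + " MB");
    }
}
```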

Let me show you an example on how successful GC optimization could improve your overall performance on SOA Suite Services running at a Weblogic Domain with two Managed Servers (SOA Suite 11g PS4, WLS 10.3.5, JDK 1.6.0_27 running on Solaris).

The story started with a default ParallelGC scheme setting (default GC values for JDK 6: -XX:+UseParallelGC, ParallelGCThreads=#CPU, SurvivorRatio=32, PermSize=64m, no GC logging) and a JVM heap size of 4 GB.

-server -Xms4g -Xmx4g -Xss512k -XX:PermSize=768m -XX:MaxPermSize=768m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/gc.out

The initial 4 GB JVM Heap size setting was discussed with Oracle, and the Permanent Space setting comes from initial observations. The 64bit JDK6 stack size on Solaris is 1024k. We reduced the thread stack size to 512k.

After running some HP LoadRunner Average Load Tests, the HPjmeter GC STW diagram showed GC STW activities between 23 and 35 seconds.

clip_image004

This was unacceptable for an Online Processing Domain where a user is waiting for a response. SCA Services shouldn’t be blocked by a 35 second freeze of the execution. In order to optimize for minimal pause time, the GC scheme was changed to CMS (-XX:+UseConcMarkSweepGC).

-server -Xms4g -Xmx4g -Xmn2g -Xss512k -XX:PermSize=768m -XX:MaxPermSize=768m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=55 -XX:+CMSClassUnloadingEnabled -XX:ParallelGCThreads=8 -XX:SurvivorRatio=4 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/gc.out

The -Xmn2g setting configures the Young Generation Heap size. A parallel Young Generation collector (-XX:+UseParNewGC) is best used with the CMS low pause collector that collects the Tenured Space. The -XX:+CMSParallelRemarkEnabled setting enables multiple parallel threads to participate in the remark phase of CMS. Since this is a STW phase, performance is improved if the collector uses multiple CPU Cores while collecting the Tenured Space. Setting -XX:CMSInitiatingOccupancyFraction to 55 means that the CMS GC starts at 55% memory allocation (default is 68%). The -XX:+UseCMSInitiatingOccupancyOnly setting forces CMS to accept the -XX:CMSInitiatingOccupancyFraction setting and not to start the CMS collection before the threshold is reached (it disables the internal JVM heuristics; without this setting the JVM may not obey the CMS initiating occupancy fraction setting). The -XX:+CMSClassUnloadingEnabled setting activates the class unloading option. It helps to decrease the probability of “java.lang.OutOfMemoryError: PermGen space” errors.

Here is the calculation of the JVM Heap memory sizes with the given parameters.

| Space | Calculation | MB | KB |
|---|---|---|---|
| Eden Space | 2048m - ((2048m / (4 + 2)) * 2) | 1365.33 | 1398101.33 |
| Survivor Space (To) | (2048m - 1365.33m) / 2 | 341.33 | 349525.33 |
| Survivor Space (From) | (2048m - 1365.33m) / 2 | 341.33 | 349525.33 |
| Tenured Space | 4096m - 2048m | 2048 | 2097152 |

So the Tenured Space (Old Generation) cleanup starts at a filling size of 1153433.6 KB (1126.4 MB).

Further average and performance load tests reported that the GC STW activities went down to a maximum of around 10 seconds for most CMS GC STW pauses, but 10 seconds is still too long and STW activities happened much too often.

clip_image006

We analyzed the GC logs which reported Concurrent Mode Failures like the following GC entry.

238276.333: [GC 238276.334: [ParNew: 1709331K->1709331K(1747648K), 0.0001286 secs]238276.334: [CMS238276.640: [CMS-concurrent-mark: 13.637/14.048 secs]

(concurrent mode failure): 1663134K->1619082K(2097152K), 53.1504117 secs] 3372466K->1619082K(3844800K)

Concurrent Mode Failure means that a requested ParNew collection didn’t run because the GC perceives that the CMS collector will fail to free up enough memory space in the Tenured Space in time. In the worst case, surviving Young Generation objects therefore couldn’t be promoted to the Tenured Space. Due to this fact, the concurrent mode of CMS is interrupted and a time-consuming Mark-Sweep-Compact Full GC STW is invoked.

In the GC log file entry shown above, the ParNew request happens at JVM Clock Time 238276.333, when the Young Generation had a fill level of 1709331 KB (1669.27 MB) out of 1747648 KB (1706.69 MB). This means a fill level of 97.8% of 1747648 KB (rounded: 1706 MB = 1365 MB Eden Space + 341 MB for one Survivor Space). The GC STW happens at Clock Time 238276.334 and took 53.15 seconds. The Tenured Space occupancy dropped from 1663134 KB (1624 MB) to 1619082 KB (1581 MB). This means that around 97.4% of all objects survived the Tenured Space clean-up. Only 43 MB of the Tenured Space was freed.
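The figures in this paragraph can be recomputed from the log entry itself. The following sketch (hypothetical class name) extracts every “before->after(capacity)” occupancy triple from a GC log line with a regular expression; it assumes the JDK 6 log format shown above, with all sizes in KB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for occupancy transitions in a HotSpot GC log line. */
final class GcLogTransitions {

    // Matches e.g. "1663134K->1619082K(2097152K)"
    private static final Pattern TRANSITION =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    private GcLogTransitions() { }

    /** Returns {beforeKb, afterKb, capacityKb} for each transition found. */
    static List<long[]> parse(String logLine) {
        List<long[]> result = new ArrayList<>();
        Matcher m = TRANSITION.matcher(logLine);
        while (m.find()) {
            result.add(new long[] {
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3))
            });
        }
        return result;
    }

    static double percent(long part, long whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        String entry = "238276.333: [GC 238276.334: [ParNew: "
                + "1709331K->1709331K(1747648K), 0.0001286 secs]238276.334: "
                + "[CMS238276.640: [CMS-concurrent-mark: 13.637/14.048 secs] "
                + "(concurrent mode failure): 1663134K->1619082K(2097152K), "
                + "53.1504117 secs] 3372466K->1619082K(3844800K)";
        List<long[]> t = parse(entry);
        long[] young = t.get(0);   // ParNew
        long[] tenured = t.get(1); // CMS / Tenured Space
        System.out.printf("Young fill level:  %.1f%%%n", percent(young[0], young[2]));
        System.out.printf("Tenured survivors: %.1f%%%n", percent(tenured[1], tenured[0]));
        System.out.printf("Tenured freed:     %.0f MB%n", (tenured[0] - tenured[1]) / 1024.0);
    }
}
```

For a complete log file a dedicated analysis tool is the better choice; a snippet like this is mainly useful for double-checking individual entries.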

So the Young Generation had around 1669 MB before GC which could only free 43 MB on the Old Generation. The Tenured Space seems not big enough to keep all the promoted objects. There is an Oracle recommendation when changing from a Parallel GC scheme to a Concurrent GC scheme. Oracle recommends increasing the Tenured Space by at least 20% to 30% in order to accommodate fragmentation.

We decided to keep the overall JVM Heap size stable and instead to decrease the Young Generation in order to give the Old Generation more space (-Xmn2g to -Xmn1g).

-server -Xms4g -Xmx4g -Xmn1g -Xss512k -XX:PermSize=768m -XX:MaxPermSize=768m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=55 -XX:+CMSClassUnloadingEnabled -XX:ParallelGCThreads=8 -XX:SurvivorRatio=4 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/gc.out

Here is the new calculation of the JVM Heap memory sizes after changing the Young Generation to 1 GB.

| Space | Calculation | MB | KB |
|---|---|---|---|
| Eden Space | 1024m - ((1024m / (4 + 2)) * 2) | 682.67 | 699050.67 |
| Survivor Space (To) | (1024m - 682.67m) / 2 | 170.67 | 174762.67 |
| Survivor Space (From) | (1024m - 682.67m) / 2 | 170.67 | 174762.67 |
| Tenured Space | 4096m - 1024m | 3072 | 3145728 |

Afterwards we triggered a new Average Load Test. The GC STW activities went down to a maximum of 4.3 seconds, but much more important is the fact that GC STW pauses happen significantly less frequently.

clip_image008

This was good for the Average Load Tests, but a Peak Performance Test was showing an accumulation of CMS GC STW activities during the Performance Test (around Clock Time 200000).

clip_image010

We decided to do a resizing and slightly increased Young Generation (from –Xmn1g to –Xmn1280m) in order to allow objects to be held longer in Young Generation with the hope that they will be collected there and are not promoted to Old Generation. As mentioned by Oracle Doc ID 748473.1, most of the BPEL engine’s objects are short lived. Therefore the Young Generation shouldn’t be too small.

The Survivor Spaces allow the JVM to copy live objects back and forth between the two spaces up to 15 times to give them a chance to die young. The -XX:MaxTenuringThreshold setting governs how many times the objects are copied between the Survivor Spaces (the default value is 15 for the parallel collector and 4 for CMS). Afterwards the objects are old enough to be tenured (copied to the Tenured Space). So we also increased the Survivor Spaces (from -XX:SurvivorRatio=4 to -XX:SurvivorRatio=3, see calculation below). Additionally we increased the -XX:CMSInitiatingOccupancyFraction setting to 80% in order to make use of the large Old Generation capacity.

-server -Xms4g -Xmx4g -Xmn1280m -Xss512k -XX:PermSize=768m -XX:MaxPermSize=768m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80 -XX:+CMSClassUnloadingEnabled -XX:ParallelGCThreads=8 -XX:SurvivorRatio=3 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/gc.out

This means the following new JVM Heap sizes:

| Space | Calculation | MB | KB |
|---|---|---|---|
| Eden Space | 1280m - ((1280m / (3 + 2)) * 2) | 768 | 786432 |
| Survivor Space (To) | (1280m - 768m) / 2 | 256 | 262144 |
| Survivor Space (From) | (1280m - 768m) / 2 | 256 | 262144 |
| Tenured Space | 4096m - 1280m | 2816 | 2883584 |

Now the Old Generation cleanup starts at a filling size of 2252.8 MB. Take a look at the following GC STW diagram which confirms the GC tuning effort. The diagram reports a well-tuned JVM with relatively large number of short Parallel Scavenges with less frequent, but more expensive, full CMS GC.

clip_image012

More GC fine tuning and better GC behavior is for sure possible. But we are quite satisfied with the reached performance results. The SCA Services have much better response times on Average and Peak Performance Tests.

Finally you have to keep a close eye on the Code Cache of the JVMs running your Weblogic Servers with SOA Suite. It has nothing to do with GC, but here is the explanation why it’s important. As you all know, Java code is compiled to bytecode for the JVM. Bytecode has to be converted to native instructions and library calls for the target platform. The interpreted mode always converts bytecode “as it is used”, which slows down execution. Just-In-Time (JIT) compilation, in contrast, keeps compiled code segments (the performance-critical “Hotspots”) in a Code Cache, and the cached native code is reused later without needing to be recompiled (for example in loops). That’s the reason why the JVM obtains near-native execution speed over time, after the code has been run a few times.

So it’s performance critical when you have warnings like …

Java HotSpot(TM) Client VM warning: CodeCache is full. Compiler has been disabled.

Java HotSpot(TM) Client VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=

You can imagine how bad it would be if JIT compilation is switched off because the Code Cache is full (the default Code Cache size with the -server option on 64bit is 48 MB). The Non-Heap Code Cache is allocated during JVM startup and, once allocated, it cannot grow or shrink. Fortunately the Code Cache size can be changed with the -XX:ReservedCodeCacheSize setting. But increasing the Code Cache size will only delay its inevitable overflow. Therefore it is more important to avoid the performance-breaking interpreted-only mode. The -XX:+UseCodeCacheFlushing setting enables the compiler thread to cycle (optimize, throw away, optimize, throw away), which is much better than disabled compilation. So if you see that Code Cache warning, I would recommend slightly increasing the Code Cache size and enabling Code Cache Flushing. The -XX:+PrintCompilation setting gives you more details (or watch the Code Cache behavior in JConsole).

I just want to leave you with three more tips for your own JVM GC tuning. The first is the JVM setting -XX:+PrintFlagsFinal, which shows you all JVM settings during the startup phase. The second tip is to suppress Full GC STW pauses that are caused programmatically through the System.gc() and Runtime.getRuntime().gc() methods. With the -XX:+DisableExplicitGC setting the JVM ignores these method calls, which undermine the GC tuning efforts. Look for “Full GC (System)” entries in your GC log files. The third tip is to take a look at the Standard Performance Evaluation Corporation and how they configure the JVM for Weblogic in order to get the best performance (go to the result page, look for a Weblogic test on your platform and take a look at the JVM settings in the Notes/Tuning Information section).
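Besides JConsole, the Code Cache usage can also be read programmatically through the standard memory pool MXBeans. The sketch below uses a hypothetical class name, and the pool names it matches are an assumption about HotSpot: “Code Cache” on older JVMs such as the JDK 6 used here, and “CodeHeap ...” or “CodeCache” on newer ones.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper that sums the used bytes of the JIT code cache pools. */
final class CodeCacheUsage {

    private CodeCacheUsage() { }

    static long usedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Pool names are JVM specific; these are the HotSpot names assumed here.
            boolean codePool = pool.getType() == MemoryType.NON_HEAP
                    && (name.contains("Code Cache")
                        || name.contains("CodeCache")
                        || name.contains("CodeHeap"));
            MemoryUsage usage = codePool ? pool.getUsage() : null;
            if (usage != null) {
                used += usage.getUsed();
            }
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("Code cache used: " + usedBytes() / 1024 + " KB");
    }
}
```

Comparing this value with the configured -XX:ReservedCodeCacheSize gives an early indication of whether the cache is about to fill up.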

Tuesday, August 28, 2012

SOA Suite Performance – Don’t fight against the machine - Part 1

Performance optimization on SOA Suite applications is a challenge! After about one and a half years of performance tuning activities on a large SOA Suite project I’m fully confident in making such an assertion. Why is it so difficult to tune the performance? Because the performance depends on several system architecture layers.

SOA Suite Layers
At the top of the stack, applications/services developed with JDeveloper run on the SCA Container. Design and configuration on the SCA Composite and SCA Component level have a critical influence on the success of performance optimization activities. Below the SCA Application layer we have the FMW Common Infrastructure (e.g., JRF, OWSM, EM Fusion Control) together with the SOA Suite Service Engines (e.g., BPEL, Mediator) as the SCA Runtime environment, SOA Infra. There are many performance-relevant tweaks you can apply on the SOA Infra layer. The Oracle Weblogic Application Server is the foundation on which all Java EE-based Oracle FMW components are built. Classic techniques of JEE performance optimization are used together with the possible utilization of the Coherence Data Grid. JVM optimization is also well known, but it is essential for performance to have the right GC strategy with the right parameterization. Below the JVM we have the Operating System layer, which may run on a Hypervisor. Both are highly critical for resource allocation and therefore important for performance tuning activities. As an Application Architect you have to trust the Infrastructure Experts to do their best on performance tuning, because in most cases you are leaving your knowledge domain.

At the bottom you have the machine hardware with network and storage. Here you will find your “physical” limitations. The machine comes with a certain number of CPUs/cores, a certain clock speed and a certain size of the internal memory cache of the system CPU(s). You will have a certain speed and width of the system’s address and data buses. And you will be limited by the amount of memory, the width of its data path and the memory access time, which causes the CPU(s) to wait for memory. And you will be highly dependent on the speed of the I/O devices such as network and disk controllers.

These hardware factors have the greatest influence on application performance. For example, Oracle SOA Suite architectures have CPU-intensive operations that happen in a single thread for components like rule evaluation and XML processing. Therefore the CPU clock speed is an important factor to consider. There will come a time during your performance optimization activities when, despite all tuning activities on the upper architecture layers, you have to add more powerful hardware or upgrade the existing hardware. If you miss the right time to scale up and to scale out, the application architecture will become unbalanced with regard to the non-functional system qualities because you focus too much on the performance aspect of the SCA application layer. And because you have the design freedom, ugly compromises are made to reach the required performance. Clarity and comprehensibility of the architecture, extensibility, reusability and maintainability are system qualities which will suffer as a result. System architects also talk of increasing system entropy (disorder). Furthermore, at a certain point in time the ratio between the costs of the performance tuning activities and the costs of purchasing hardware is no longer reasonable.

When you start to reduce your SCA Composite modularization for performance reasons, when you realize that your BPEL processes are getting monolithic and you are doing BPEL programming in the small, when you don’t mediate on inbound and outbound interfaces because of performance reasons, you have an indicator that it’s time to scale up and to scale out.

Note: Don’t fight for the last possible bit of performance on the SCA application layer. You should realize when the right time has come to throw in the towel. Don’t be naïve and believe that all performance problems can be solved on the layers above the machine.

If you have online and batch processing you should also consider having a dedicated Weblogic Cluster running SOA Suite for online processing and a dedicated Weblogic Cluster for batch processing. This is an important consideration because the configuration on the layers will differ between an online and a batch related setup. That’s something we will examine in the next blog post when we take a detailed look at the JVM.