EMC Consulting Blogs

Manjunatha Subbarya's Blog

  • Transaction design of vFabric GemFire

     

    As described in my previous blog post on the solution design (http://consultingblogs.emc.com/manjunathasubbarya/archive/2011/11/27/solution-design-using-vfabric-gemfire-spring-aop-and-hibernate.aspx), here I show the sequence diagrams that describe the interaction between the user interface and vFabric GemFire.

    Transactional – Create or Update

    The following describes the approach for caching the objects that the application creates or modifies through Hibernate.

    The ProfileService implementation is responsible for providing the service methods to create or update the user profile.

    AccountRepository: Provides CRUD methods to handle the account object on the database side.

    [Figure: create/update sequence diagram]

    Hibernate Listeners: For every CRUD operation, Hibernate event listeners are used to optimize the application. They provide a mechanism in which one can implement the logic for storing the model object in the cache, depending on the operation performed.

    GemFire Region Service: The GemFire region service provides the implementation for performing CRUD operations on the cache regions.

    The best part of this approach is that we delegate to Hibernate the job of propagating CRUD operations to GemFire. Otherwise it would be extremely difficult to implement at the application layer.
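    A minimal plain-Java sketch of the listener idea follows. `Operation`, `CacheSyncListener` and the map-backed `RegionService` are hypothetical stand-ins, not Hibernate's event API or GemFire's region API. The point is only that the listener, not the application code, decides what happens in the cache for each CRUD operation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical CRUD operations a persistence listener is notified about.
enum Operation { INSERT, UPDATE, DELETE }

// Hypothetical stand-in for the GemFire region service: one map per region name.
class RegionService {
    private final Map<String, Map<Object, Object>> regions = new ConcurrentHashMap<>();

    Map<Object, Object> region(String name) {
        return regions.computeIfAbsent(name, n -> new ConcurrentHashMap<>());
    }
}

// Hypothetical listener, called by the persistence layer after each CRUD operation.
class CacheSyncListener {
    private final RegionService regionService;

    CacheSyncListener(RegionService regionService) {
        this.regionService = regionService;
    }

    void onEvent(Operation op, String regionName, Object id, Object entity) {
        Map<Object, Object> region = regionService.region(regionName);
        switch (op) {
            case INSERT:
            case UPDATE:
                region.put(id, entity); // keep the cache in step with the database
                break;
            case DELETE:
                region.remove(id);      // drop the stale entry
                break;
        }
    }
}
```

    In the real design the listener would be registered with Hibernate, so application code never calls the region service directly.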

    Transactional – Read-only

    The following describes the approach for retrieving the cached object.

    The ProfileService implementation is responsible for providing the service methods to find the user profile.

    AccountRepository: The profile service calls this repository method to retrieve a user profile.

    AccountRepositoryAspect: This Spring AOP aspect intercepts the repository call and routes the request to the GemFire region service. If the object is found in the cache, it is returned. Otherwise it is retrieved from the database through the repository method.
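    To illustrate the routing, here is a sketch that uses a JDK dynamic proxy in place of the Spring AOP aspect. The interception idea is the same, but this is not Spring's API. `ProfileRepository`, `DbProfileRepository` and `ReadThroughHandler` are hypothetical names, and a `Map` stands in for the GemFire region. On a miss, the loaded object is stored before it is returned, as the generic framework describes.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical repository interface; the real one would be a JPA repository.
interface ProfileRepository {
    String findById(Long id);
}

// Hypothetical database-backed implementation that counts real lookups.
class DbProfileRepository implements ProfileRepository {
    final AtomicInteger dbHits = new AtomicInteger();

    public String findById(Long id) {
        dbHits.incrementAndGet();
        return "profile-" + id;
    }
}

// Plays the role of AccountRepositoryAspect: check the cache first, then fall back to the repository.
class ReadThroughHandler implements InvocationHandler {
    private final Object target;
    private final Map<Object, Object> region; // stand-in for a GemFire region

    ReadThroughHandler(Object target, Map<Object, Object> region) {
        this.target = target;
        this.region = region;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (!method.getName().equals("findById")) {
            return method.invoke(target, args);      // only finder calls are cached
        }
        Object key = args[0];
        Object cached = region.get(key);
        if (cached != null) {
            return cached;                           // cache hit
        }
        Object loaded = method.invoke(target, args); // cache miss: go to the database
        if (loaded != null) {
            region.put(key, loaded);                 // store before returning
        }
        return loaded;
    }

    static <T> T wrap(Class<T> type, T target, Map<Object, Object> region) {
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type},
                new ReadThroughHandler(target, region)));
    }
}
```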

    The following sequence diagram describes the read-only transaction.

    [Figure: read-only transaction sequence diagram]

  • Solution design using vFabric GemFire, Spring AOP and Hibernate

    The following are the main requirements for a vFabric GemFire implementation for a large financial institution here in the Bay Area.

    1) Build a framework using Spring AOP to optimize the application with vFabric GemFire. The application might cache data from the database or from any other source.

    2) Use a topology that provides vertical scaling.

    3) Support lazy initialization of domain objects: Hibernate objects might be loaded lazily. Develop an approach for handling transient objects in the application.

    4) Apply a plugin architecture that allows GemFire to be disabled at any time. The cache implementation requires this ON/OFF switch, which uses the AOP interceptor mechanism.

    5) Cache servers have to be configured without application code. This requirement led us to use the latest PDX (Portable Data eXchange) serialization of GemFire.

    Generic Framework

    The following is the general approach for a cache implementation. I will blog about the complete design in my next post.

    The user interface interacts with the data layer through JPA repository methods. These methods are intercepted using Spring AOP, which routes the calls to the GemFire APIs. If the data is found in the cache, it is returned to the user interface. Otherwise the data is fetched from the database and stored in the cache before it is returned to the user interface.

    Client-Server topology

    Please read my blog post for more details: http://consultingblogs.emc.com/manjunathasubbarya/archive/2011/11/27/topologies-options-with-vfabric-gemfire.aspx

    Handling Lazy Interception of Repository method calls using Spring AOP

    In our example we have Account and Address domain objects with a OneToMany relationship and a LAZY fetch strategy.

    [Figure: Account and Address domain objects]

    In our example, Account has business logic in its getAddresses method. We can't intercept getAddresses, because then we would have to implement the business logic in the interceptor. The solution is to create an additional method, getAllAddresses, that simply returns the collection, and to intercept that method instead. We also need to change getAddresses so that it never uses the addresses field directly and always calls getAllAddresses.
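    The sketch below shows why getAddresses must always go through getAllAddresses. `CachedAccountProxy` is a hypothetical hand-written subclass, similar in spirit to the class-based proxies that AOP frameworks generate. It overrides only the plain accessor, and the business rule in getAddresses (skipping addresses without a city) is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical child entity.
class Address {
    final String city;
    Address(String city) { this.city = city; }
}

class Account {
    private final List<Address> addresses = new ArrayList<>(); // lazily loaded by Hibernate in the real model

    // Simple accessor with no logic: this is the method the proxy intercepts.
    public List<Address> getAllAddresses() {
        return addresses;
    }

    // Business logic never reads the field directly; it always calls getAllAddresses().
    public List<Address> getAddresses() {
        List<Address> result = new ArrayList<>();
        for (Address a : getAllAddresses()) {
            if (a.city != null) {          // example business rule: skip incomplete addresses
                result.add(a);
            }
        }
        return Collections.unmodifiableList(result);
    }
}

// Hypothetical proxy that supplies the children from the cache instead of the field.
class CachedAccountProxy extends Account {
    private final List<Address> fromCache;

    CachedAccountProxy(List<Address> fromCache) { this.fromCache = fromCache; }

    @Override
    public List<Address> getAllAddresses() {
        return fromCache; // would come from the addresses region in the real design
    }
}
```

    If getAddresses read the addresses field directly, the override would have no effect and the cached children would never be seen.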

    The second option is to intercept getAddresses and use the Reflection API to access the addresses field directly. In this case the logic is:

    1) Check whether the collection is a PersistentBag

    2) Load the child objects from the database

    3) Set the addresses field via the Reflection API
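    The three steps can be sketched with the JDK Reflection API. `LazyBag` is a hypothetical marker class standing in for Hibernate's uninitialized PersistentBag. `LazyFieldLoader.loadChildren` and its `childLoader` function (which plays the role of the database query) are illustrative names, not part of any library.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical marker standing in for Hibernate's uninitialized PersistentBag.
class LazyBag<E> extends ArrayList<E> {}

class Account {
    final Long id;
    private List<String> addresses = new LazyBag<>();

    Account(Long id) { this.id = id; }

    public List<String> getAddresses() { return addresses; }
}

final class LazyFieldLoader {
    // Replaces a still-lazy collection field with children loaded by childLoader.
    static void loadChildren(Object owner, String fieldName, Function<Object, List<?>> childLoader) {
        try {
            Field field = owner.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            if (field.get(owner) instanceof LazyBag) {          // 1) is it still a lazy placeholder?
                List<?> children = childLoader.apply(owner);    // 2) load the children from the database
                field.set(owner, new ArrayList<Object>(children)); // 3) set the field via reflection
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + fieldName, e);
        }
    }
}
```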

    The getAddresses method of the Account object returns a list of child objects, and the getAccount method of the Address object returns the Account object. With LAZY fetch, Hibernate creates objects with javassist proxies that automatically load the referenced objects from the Hibernate session. In this case we additionally need to store the IDs for the Account and the back references to the Addresses, according to Limitation 3.

    GemFire objects:

    We create two GemFire cache regions to store parent and child objects separately:

    1) accounts

    2) addresses

    <region name="accounts">
      <region-attributes refid="CACHING_PROXY" scope="distributed-ack">
      </region-attributes>
    </region>

    <region name="addresses">
      <region-attributes refid="CACHING_PROXY" scope="distributed-ack">
      </region-attributes>
    </region>

    We store them in separate regions so that updates to addresses and all other child objects can be managed separately. This way we can make the child collections read-only and manage all updates in the children's own repositories. We intercept all methods that find and save objects in the child repositories and simply update those objects in the cache. We also have Limitation 4, which requires lookup by childId.

    Each region has a CacheLoader on the client. We do not use queries or OQL to retrieve objects from the cache, and we don't need to guarantee that all objects are in the cache, so all requests are gets and puts by key. GemFire invokes the CacheLoader if an object is not in the cache, and the CacheLoader loads the object from the database.
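    The get/put-by-key contract can be sketched with a map plus a loader function. `LoaderBackedRegion` is a hypothetical stand-in, not GemFire's Region or CacheLoader interface. The loader function plays the CacheLoader's role and is only called on a miss. If it returns null, nothing is stored, which matches the point that not every object has to be in the cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a client region with a CacheLoader: all access is get/put by key.
class LoaderBackedRegion<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // plays the role of the CacheLoader (e.g. a database read)

    LoaderBackedRegion(Function<K, V> loader) {
        this.loader = loader;
    }

    // On a miss the loader is called once and its non-null result is stored.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    void put(K key, V value) {
        entries.put(key, value);
    }

    boolean containsKey(K key) {
        return entries.containsKey(key);
    }
}
```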

    Each time somebody calls getAddresses() on an Account, we need to look up the addresses by Id (we can't use queries, because we want to avoid preloading data into the cache). This means we need to store the Ids of the addresses. The solution is to store the list of AddressIds in the AccountCache object. We need to maintain this list and load it into the cache lazily: the first time getAddresses is invoked, we go to the database and retrieve all addresses for the Account.

    Predefined JPA queries could improve productivity. Because we use AOP to intercept getAddresses, the Account object we return to the application has to be an AOP proxy. We intercept the AccountRepository findById method and create the account proxy in that method.

    If the object is not found in the cache, GemFire invokes AccountCacheLoader, which automatically loads the object into the cache. The Account object is loaded with lazily initialized collections, so it is very light.

    The proxy Account object intercepts methods that return child objects, for example getAddresses().

    The implementation of this method loads the AccountCache object and checks its AddressIds collection (the back index). If the collection is null, we load all the data from the EntityManager and update both the AddressIds collection in the AccountCache object and the addresses region.
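    The back-index logic can be sketched as follows, with plain maps standing in for the accounts and addresses regions. `AccountCache`, `Address` and `AddressLookup` are hypothetical, and `loadFromDatabase` stands in for the EntityManager query. The first call fills the back index and the addresses region; later calls are key lookups only. The returned list is unmodifiable, in line with the read-only child collections described below.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cached view of an Account: addressIds is the back index, null until first use.
class AccountCache {
    final Long accountId;
    List<Long> addressIds; // null means "not loaded yet"

    AccountCache(Long accountId) { this.accountId = accountId; }
}

// Hypothetical address record.
class Address {
    final Long id;
    final String street;
    Address(Long id, String street) { this.id = id; this.street = street; }
}

class AddressLookup {
    private final Map<Long, AccountCache> accountsRegion = new HashMap<>();
    private final Map<Long, Address> addressesRegion = new HashMap<>();
    private final Function<Long, List<Address>> loadFromDatabase; // stands in for an EntityManager query

    AddressLookup(Function<Long, List<Address>> loadFromDatabase) {
        this.loadFromDatabase = loadFromDatabase;
    }

    List<Address> getAddresses(Long accountId) {
        AccountCache cache = accountsRegion.computeIfAbsent(accountId, AccountCache::new);
        if (cache.addressIds == null) {
            // First call: load all addresses once, then fill the back index and the addresses region.
            List<Long> ids = new ArrayList<>();
            for (Address a : loadFromDatabase.apply(accountId)) {
                addressesRegion.put(a.id, a);
                ids.add(a.id);
            }
            cache.addressIds = ids;
        }
        // Later calls: plain get-by-key lookups, no queries.
        List<Address> result = new ArrayList<>();
        for (Long id : cache.addressIds) {
            Address a = addressesRegion.get(id);
            if (a != null) {           // skipped here; a CacheLoader would reload it in the real design
                result.add(a);
            }
        }
        return Collections.unmodifiableList(result); // child collections are read-only
    }
}
```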

    When we save an Account, we need to check whether it is a proxy object. If it is, we get the target object and save that in Hibernate. The empty child collections guarantee that we do not change the child objects in the database.

    Child updates are done by intercepting the ChildRepository save method. In this method we update the Ids collection in the AccountCache object and update the child region along with the database.

    We make all child collections, such as the one returned by getAddresses, unmodifiable.

    Plugin architecture

    Because the application is designed around interception, commenting out a few lines of configuration turns off the whole cache implementation.
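    In the post the switch is a few lines of Spring configuration. As a plain-Java stand-in, the sketch below uses a hypothetical `cacheEnabled` flag and a JDK dynamic proxy in place of the Spring AOP interceptor. `AccountRepository` and `RepositoryWiring` are illustrative names. Callers depend only on the interface, so turning caching on or off never touches their code.

```java
import java.lang.reflect.Proxy;
import java.util.Map;

// Hypothetical repository interface used by the application.
interface AccountRepository {
    String findById(Long id);
}

// Hypothetical wiring: the only place that knows whether caching is on.
final class RepositoryWiring {
    static AccountRepository create(AccountRepository dbRepository, boolean cacheEnabled,
                                    Map<Long, String> region) {
        if (!cacheEnabled) {
            return dbRepository; // switch OFF: callers talk to the database repository directly
        }
        // Switch ON: callers get an intercepting proxy with the same interface.
        return (AccountRepository) Proxy.newProxyInstance(
                AccountRepository.class.getClassLoader(),
                new Class<?>[] {AccountRepository.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("findById")) {
                        Long id = (Long) args[0];
                        return region.computeIfAbsent(id, dbRepository::findById); // read-through
                    }
                    return method.invoke(dbRepository, args); // everything else passes through
                });
    }
}
```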

    PDX Serialization

    vFabric GemFire's Portable Data eXchange (PDX) is a cross-language data format that can reduce the cost of distributing and serializing your objects. PDX stores data in named fields that you can access individually, to avoid the cost of deserializing the entire data object. PDX also allows you to mix versions of objects where you have added or removed fields.

  • Topology options with vFabric GemFire

    So far I have blogged about the different approaches for caching objects with GemFire. In this post I briefly describe the available topology options, before blogging about the design implemented for a large financial institution here in the Bay Area.

    VMware vFabric™ GemFire® provides a variety of cache topologies to meet different enterprise needs.

    • At the core of all systems is the single, peer-to-peer distributed system.
    • For horizontal and vertical scaling, one can combine individual systems into client/server and multi-site (WAN) topologies:
      • In client/server systems, a small number of server processes manage data and event processing for a much larger client group.
      • In multi-site systems, several geographically disparate systems are loosely coupled into a single, cohesive processing unit.

    Peer-to-Peer Configuration:
    The peer-to-peer distributed system is the building block for all GemFire installations. Peer-to-peer alone is the simplest topology. Each cache instance, or member, communicates directly with every other member in the distributed system. This configuration is primarily designed for applications that want to embed a cache within the application process space and participate in a cluster. A typical example is an application server cluster where the application and the cache are co-located and share the same heap.

    [Figure: peer-to-peer topology]

    Client/Server Configuration:
    The client/server topology is the model for vertical scaling. Clients typically host a small subset of the data in the application process space and delegate to the server system for the rest. Compared to peer-to-peer by itself, the client/server architecture provides better data isolation, high fetch performance and more scalability. If the application is expected to put a very heavy data-distribution load on the network, a client/server architecture usually gives better performance. In any client/server installation, the server system is itself a peer-to-peer system, with data distributed between servers. Each client system has a connection pool, which it uses to communicate with servers and other GemFire members. A client may also contain a local cache.

     

    [Figure: client/server topology]

     

    Multi-site Configuration
    For horizontal scaling, we can use a loosely coupled multi-site topology. With multi-site, multiple peer-to-peer systems are loosely coupled, generally across geographical distances with slower connections, such as with a WAN. This topology provides better performance than the tight coupling of peer-to-peer, and greater independence between locations, so that each site can function on its own should the connection or remote site become unavailable. In a multi-site installation, each individual site is a peer-to-peer system.

     

    [Figure: multi-site topology]

     

    Customer decision: After careful evaluation, the customer chose the client/server topology without a multi-site configuration. The deciding factor was that the application servers must host minimal data, with most of the cached data held in the data management nodes (GemFire cache servers). At the moment the product does not need a multi-site configuration. One may be required at a later date, depending on the product's popularity in the coming months.

  • When the GemFire Hibernate L2 cache plug-in is not an option

    Previously I blogged about the different data-caching options provided by the ORM layer. This post describes the reasons for opting out of the GemFire L2 cache plug-in for Hibernate.

    Reason 

    We have consistently tried to advise the customer on how best to use GemFire to meet the business requirements and both short- and medium-term technical requirements (with emphasis on the former). Several requirements along the way, combined with the best practices we recommend today, drove the decision NOT to suggest our Hibernate L2 cache plug-in. Among these are stated use cases such as server-side GemFire disk persistence for certain classes of data that will NOT be stored in the relational database at all. Another shortcoming is that the L2 cache plug-in provides no direct access to any GemFire APIs.

    At a broader level, GemFire in a customer-facing, publicly available site is a key enabling technology at a strategic level. It makes possible capabilities such as dynamic/elastic scalability, continuous availability, ultra-high throughput potential with sub-millisecond average and sub-100 ms outlier response times, and operational efficiency via support for a 1:1 VM-to-cache-server ratio (supporting large Java heaps without GC constraints). It also offers (quite frankly) a straightforward and intuitive data access API and object lifecycle management.

    The decision to give Hibernate (a commodity component with no real strategic value to the project's success) the greatest weight in integration pattern choices is something we just have to live with for now. This decision is at the heart of the current issue we face. Hibernate has an opaque and inflexible paradigm for object lifecycle management that is at direct cross-purposes with the goal of leveraging GemFire's CacheServer performance and general architecture benefits. Consider the situation:

    (a) the customer wants to leverage server-side caching (allowing client-side cache entries to expire and thus precisely controlling application server memory requirements);
    (b) the customer wants to use Hibernate to lazy-load child objects of these same cached and sometimes locally evicted/invalidated entities;
    (c) Hibernate provides no mechanism (via standard usage) for propagating the keys needed to load child objects beyond a single session;
    (d) even when you manually piggy-back these keys on the cached object somehow, Hibernate cannot use them for transparent lazy-loading anyway.

    Given all that, something's got to give.

    A relatively low-effort solution to this quagmire is to model the DB and/or the Hibernate O/R mapping in such a way that Hibernate ignores the parent/child relationship in question. With this approach, the previously lazy-loaded child objects are instead loaded via direct requests to Hibernate, so child objects are loaded in the same way as parent objects. Since the child objects described last week are read-only, we don't need to consider the update scenarios. Even those would be easily and efficiently manageable by enforcing FK relationship rules in the application, again a practice we recommend in this scenario from years of experience.

  • Cache using Hibernate

    Hibernate comes with three different caching mechanisms: the first level cache, the second level cache and the query cache. Truly understanding how the Hibernate caches work and interact with each other is important when you need to increase performance. Just enabling caching on an entity with an annotation (or in a classic .hbm.xml mapping file) is easy. Understanding what happens behind the scenes, and how, is not. We might even end up with a less performant system if we do not know what we are doing.

    The purpose of the Hibernate SessionFactory (the EntityManagerFactory in JPA) is to create Sessions, initialize JDBC connections and pool them (using a pluggable provider). A SessionFactory is immutable and is built from a Configuration holding mapping information, cache information and a lot of other information. This is usually provided by means of a hibernate.cfg.xml file or through a Spring bean configuration.

    A Session is a unit of work at its lowest level, representing a transaction in database parlance. When a Session is created and operations are done on Hibernate entities (e.g. setting an attribute of an entity), Hibernate does not go off and update the underlying table immediately. Instead Hibernate keeps track of the state of each entity, including whether it is dirty, and flushes the updates at the end of the unit of work (typically when the transaction commits). This is what Hibernate calls the first level cache.

    The 1st level cache
    Definition: The first level cache is where Hibernate keeps track of the possible dirty states of the entities loaded and touched in the ongoing Session. The ongoing Session represents a unit of work. The first level cache is always used and cannot be turned off. Its purpose is to avoid sending too many SQL queries or updates to the database, and instead batch them together at the end of the Session. When we think about the 1st level cache, we think about the Session.

    The 2nd level cache
    The 2nd level cache is a process scoped cache that is associated with one SessionFactory. It will survive Sessions and can be reused in new Session by same SessionFactory (which usually is one per application). By default the 2nd level cache is not enabled.
    The 2nd level cache does not store instances of an entity. Instead Hibernate uses something called dehydrated state. A dehydrated state can be thought of as a disassembled entity: the dehydrated state is like an array of strings, integers etc., and the id of the entity is the pointer to the dehydrated entity. Conceptually you can think of it as a Map with the id as key and an array as value, or something like the following for a cache region:

    { id -> { attribute1, attribute2, attribute3 } }
    { 1 -> { "a name", 20, null } }
    { 2 -> { "another name", 30, 4 } }
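    To make the "Map of id to array" picture concrete, here is a conceptual sketch that mirrors the two entries above. It is not Hibernate's internal code. `Employee` (with name, age and managerId attributes) and `DehydratedRegion` are hypothetical. Putting an entity keeps only its attribute values. Getting one builds a fresh instance, which is why the cache never hands back the original object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for illustration.
class Employee {
    final long id;
    final String name;
    final Integer age;
    final Long managerId;

    Employee(long id, String name, Integer age, Long managerId) {
        this.id = id;
        this.name = name;
        this.age = age;
        this.managerId = managerId;
    }
}

// Conceptual model of a 2nd level cache region: id -> array of attribute values.
class DehydratedRegion {
    private final Map<Long, Object[]> entries = new HashMap<>();

    void put(Employee e) {
        // "Dehydrate": keep only the attribute values, not the entity instance.
        entries.put(e.id, new Object[] {e.name, e.age, e.managerId});
    }

    Employee get(long id) {
        Object[] state = entries.get(id);
        if (state == null) {
            return null;
        }
        // "Hydrate": build a new instance from the stored values.
        return new Employee(id, (String) state[0], (Integer) state[1], (Long) state[2]);
    }

    Object[] rawState(long id) {
        return entries.get(id);
    }
}
```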

    Enabling the 2nd level cache requires a cache strategy definition and a cache provider. Strategies include read-only, read-write and transactional. Providers include VMware GemFire, Ehcache, etc.

    The Query cache
    The Query cache of Hibernate is not on by default. It uses two cache regions, org.hibernate.cache.StandardQueryCache and org.hibernate.cache.UpdateTimestampsCache. The first stores the query, along with its parameters, as a key into the cache. The second keeps track of stale query results. If an entity that is part of a cached query is updated, the query cache evicts the query and its cached result. Of course, to use the Query cache, the returned and used entities must be set up with a cache strategy as discussed previously. A simple load( id ) will not use the query cache, but a query like the following will:

    Query query = session.createQuery("from Employee as r where r.created = :creationDate");
    query.setParameter("creationDate", new Date());
    query.setCacheable(true);   // opt this query in to the query cache
    List results = query.list(); // suppose this returns one instance, with id 4321

    Hibernate caches the ids of the resulting entities, using the query and its parameters as the key:
    { query, {parameters} } ---> { ids of cached entities }
    { "from Employee as r where r.created = :creationDate", [ new Date() ] } ---> [ 4321 ]
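    The key/value shape and the staleness check can be modeled in a few lines. This is a simplified conceptual model, not Hibernate's implementation: `QueryCacheModel` is hypothetical, staleness is tracked per entity name rather than per table, and a counter replaces real timestamps. The key is the query string plus its parameter values, and the value is the list of matching ids. A cached result is reused only if the entity has not been updated since the result was cached.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Conceptual model only: key = query + parameters, value = ids of the matching entities.
class QueryCacheModel {
    private final Map<List<Object>, List<Long>> results = new HashMap<>(); // role of StandardQueryCache
    private final Map<List<Object>, Long> cachedAt = new HashMap<>();
    private final Map<String, Long> lastUpdate = new HashMap<>();          // role of UpdateTimestampsCache
    private long clock = 0;                                                // stands in for timestamps

    List<Long> list(String query, String entity, List<Object> params, Supplier<List<Long>> runSql) {
        List<Object> key = new ArrayList<>();
        key.add(query);
        key.addAll(params);
        Long stamp = cachedAt.get(key);
        Long updated = lastUpdate.get(entity);
        boolean fresh = stamp != null && (updated == null || updated < stamp);
        if (!fresh) {
            results.put(key, runSql.get()); // miss or stale result: run the SQL again
            cachedAt.put(key, ++clock);
        }
        return results.get(key);
    }

    void entityUpdated(String entity) {
        lastUpdate.put(entity, ++clock);    // results cached before this point are now stale
    }
}
```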

  • Data Encryption Options

    You may have read my previous article on handling the cross-cutting concern of encryption with Spring AOP. The following is an analysis of the options for encrypting the data.

    Solution: DB Encryption
    Description: The DB encrypts the hard drives.
    Pros:
    • Transparent to the application
    Cons:
    • Any application that can access the DB using SQL can get the unencrypted data
    • It solves the easy problem (encryption), not the hard problem (key management)
    • Data is still in DB memory, accessible to DB administrators with privileges
    • The approach requires the Oracle Advanced Security Suite and doesn't scale to non-Oracle DBs

    Solution: Crypto1
    Description: Software-only, with a manually entered seed for deriving the master key
    Pros:
    • Easily scalable
    • Low cost
    Cons:
    • Need to manage keys
    • Need to extend the crypto library (internally developed code)
    • Need to create a process to manually store/enter the master key
    • Some banks may not be OK with this approach
    • PCI-DSS requires dual key control when manual processes are involved, which will require more complicated key management procedures to be compliant

    Solution: Crypto2
    Description: Crypto1 + key management using the protected private key of a signed X.509 certificate as the seed to derive the master key
    Pros:
    • Easily scalable
    • Low cost
    • Solves the key management problem; may meet PCI-DSS
    Cons:
    • Need to extend the crypto library
    • Some companies may insist on the use of HSMs for processing their clients'/consumers' data

    Solution: HSM for key management
    Description: Software-based encryption with an HSM for key management
    Pros:
    • Secure key management
    Cons:
    • Need to interface with the HSM (leverage Proximity code)
    • Need to create a proxy in front of the HSM

    Solution: HSM for encryption
    Description: Hardware-based encryption + key management
    Pros:
    • Most secure; some banks may insist on this approach
    Cons:
    • Need to interface with the HSM
    • An array of HSMs is needed, depending on traffic

    Please let me know if any option is missing.

  • Another use case for Aspect Oriented Programming: Data encryption

    So far I have written about caching techniques and tools, and about using Spring AOP to handle the cross-cutting concern of caching. Another cross-cutting concern we have is data encryption. As the customer needs to encrypt PAI and PII data, I was asked to come up with a high-level design. The cross-functional diagram below gives an overview of one way to implement data encryption using Spring AOP.

    [Figure: cross-functional diagram of data encryption using Spring AOP]

    Domain object definition: The attributes of domain objects (persistence objects) are annotated with @Encrypt, @Hash, etc. These annotations provide the join points for Spring AOP.

    Service Implementation: This layer receives messages from the UI, converts them to domain objects, and converts domain objects back to UI messages.

    Encryption Annotation Framework: This is the heart of the encryption implementation. It intercepts the invoked method, identifies the attribute annotations, and decides whether to call encryption or decryption. The framework is designed so that a JCE-compliant cryptography provider can be configured, and that provider is used for encryption and decryption. The converted data is stored back in the domain object.
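    A minimal sketch of the annotation-driven part, using only the JDK's JCE classes. The `@Encrypt` annotation, `Customer` and `FieldCrypto` are hypothetical names. AES/GCM with a freshly generated key is an assumption made for illustration. Key management (the hard problem in the options analysis) and the Spring AOP interception that would call `apply` are deliberately left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical marker for fields that must be stored encrypted.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Encrypt {}

// Hypothetical domain object.
class Customer {
    String name;
    @Encrypt String ssn;
    Customer(String name, String ssn) { this.name = name; this.ssn = ssn; }
}

final class FieldCrypto {
    private static final SecureRandom RANDOM = new SecureRandom();
    private final SecretKey key;

    FieldCrypto(SecretKey key) { this.key = key; }

    // Illustration only: a real deployment would obtain the key from its key management solution.
    static FieldCrypto withNewKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return new FieldCrypto(kg.generateKey());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts (encrypt == true) or decrypts every @Encrypt String field of the entity in place.
    void apply(Object entity, boolean encrypt) {
        try {
            for (Field f : entity.getClass().getDeclaredFields()) {
                if (!f.isAnnotationPresent(Encrypt.class) || f.getType() != String.class) continue;
                f.setAccessible(true);
                String value = (String) f.get(entity);
                if (value != null) f.set(entity, encrypt ? encrypt(value) : decrypt(value));
            }
        } catch (GeneralSecurityException | IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    private String encrypt(String plain) throws GeneralSecurityException {
        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(plain.getBytes(StandardCharsets.UTF_8));
        // Store IV + ciphertext together so that decryption can recover the IV.
        return Base64.getEncoder().encodeToString(
                ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array());
    }

    private String decrypt(String encoded) throws GeneralSecurityException {
        byte[] all = Base64.getDecoder().decode(encoded);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, all, 0, 12));
        return new String(c.doFinal(all, 12, all.length - 12), StandardCharsets.UTF_8);
    }
}
```

    In the design above, the Spring AOP advice would call `apply(entity, true)` before a save and `apply(entity, false)` after a find. Unannotated fields such as `name` are left untouched.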

    Cache Manager: When the Save/Find method is invoked on the domain object, the data is persisted in the database as well as in the cache. I explained this in detail in a previous blog post (http://consultingblogs.emc.com/manjunathasubbarya/archive/2011/09/18/aspect-oriented-design-separation-of-concern.aspx).

    Data persistence: The encrypted data is persisted/retrieved in the database.

    Please post to this thread if you have any better ideas to handle this persistence concern.

  • Aspect Oriented design–Continued…

    This is the continuation of my previous article on Spring AOP. As with most technologies, AOP comes with its own specific set of concepts and terms. In this article I explain how I solved the cross-cutting concern of caching the required objects, locally and remotely, using Spring AOP concepts. The following list explains the core concepts of AOP:


    Joinpoint: A joinpoint is a well-defined point during the execution of your application. Typical examples of joinpoints include a call to a method, the method invocation itself, class initialization, and object instantiation. Joinpoints are a core concept of AOP and define the points in your application at which you can insert additional logic using AOP.

    >>>>The place where my cross-cutting concern needs to be handled is Repository.Save(account), where account is the user account that needs caching. One can call this the trigger.

    Advice: The code that is executed at a particular joinpoint is called the advice. There are many different types of advice, including before advice, which executes before the joinpoint, and after advice, which executes after it.

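    >>>>This is the method where I write the logic for my caching cross-cutting concern. Expressed as a Spring AOP around advice, it could look like this (Account and the Cache helper are the application's own types, and the pointcut expression is illustrative):

    @Around("execution(* *..AccountRepository.save(..)) && args(account)")
    public Object cacheAdvice(ProceedingJoinPoint joinPoint, Account account) throws Throwable {
        Object saved = joinPoint.proceed(); // Repository.save(account): save it to the database
        Cache.put(account);                 // put that object in the level 1/level 2 cache
        return saved;
    }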

    Pointcuts: A pointcut is a collection of joinpoints that you use to define when advice should be executed. By creating pointcuts, you gain fine-grained control over how you apply advice to the components in your application. As mentioned previously, a typical joinpoint is a method invocation. A typical pointcut is the collection of all method invocations in a particular class.
    Often, you can compose pointcuts in complex relationships to further constrain when advice is executed. We discuss pointcut composition in more detail in the next chapter.

    >>>>We can specify several triggers that use the same advice, for example calling the same advice when saving Account data or Sales data.

    Aspects: An aspect is the combination of advice and pointcuts. This combination results in a definition of the logic that should be included in the application and where it should execute.

    >>>>As part of my cache implementation I developed an aspect, “CacheAspect”, to centralize all my pointcuts and advice.

    Weaving: This is the process of actually inserting aspects into the application code at the appropriate point.

    Compile-time weaving: For compile-time AOP solutions, weaving is, unsurprisingly, done at compile time, usually as an extra step in the build process.

    Runtime weaving: For runtime AOP solutions, the weaving process is executed dynamically at runtime.

    Target: An object whose execution flow is modified by some AOP process is referred to as the target object. Often, you see the target object referred to as the advised object.

    >>>>In our case the target object is AccountRepository.

    Introduction: Introduction is the process by which you can modify the structure of an object by introducing additional methods or fields to it. You can use introduction to make any object implement a specific interface without needing the object’s class to implement that interface explicitly.

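    >>>>Strictly speaking, the Cache.put(account) call in the advice above is advice code rather than an introduction. An introduction would, for example, make AccountRepository implement an additional interface that its class does not declare.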

     

  • Aspect oriented design–Separation of concern

    In previous posts I have been writing about cache technology, the selection process, and how VMware's data fabric GemFire turned out to be well suited to today's data-intensive applications. In this post I describe how to integrate the cache without affecting the existing code.

    Use case:

    The customer decided on the cache technology much later than on the database technology, so much of the functionality was developed without knowing that a cache would be introduced at a later date.

    What are the implications?

    The persistence layer needs to be aware of the cache technology being introduced. Is this statement correct?

    The answer is YES and NO.

    How can this be? It depends on the framework on which the application is being developed.

    YES, because if we were using only object-oriented programming, we would have modified the code as below:

    { …..
        RepositoryUtil.save(Account);
    ……. }

    would be modified to

    { …..
        RepositoryUtil.save(Account);
        CacheService.save(Account);
    ……. }

    NO, because of aspect-oriented design.

    What is this?

    Aspect-oriented programming entails breaking down program logic into distinct parts (so-called concerns, cohesive areas of functionality). Nearly all programming paradigms support some level of grouping and encapsulation of concerns into separate, independent entities by providing abstractions (e.g., procedures, modules, classes, methods) that can be used for implementing, abstracting and composing these concerns. But some concerns defy these forms of implementation and are called crosscutting concerns because they "cut across" multiple abstractions in a program.

    From the above, you will have correctly guessed that persisting the account object to the database and persisting it to the cache are separate concerns. Wherever the RepositoryUtil.save() method is called, the code also needs to call CacheService.save(), which makes caching a crosscutting concern.

    It would be nice to call just RepositoryUtil.save() and have CacheService.save() called automatically under the covers. This is exactly what aspect-oriented programming does: it lets us address crosscutting concerns effectively in object-oriented programs.
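    As a small illustration of "call just RepositoryUtil.save() and have CacheService.save() called automatically", here is a sketch that uses a JDK dynamic proxy as a stand-in for Spring AOP. Here RepositoryUtil and CacheService are modeled as hypothetical interfaces, and accounts are plain strings for brevity. The calling code only ever sees RepositoryUtil.

```java
import java.lang.reflect.Proxy;

// Hypothetical types named after the ones used in the post; accounts are plain strings here.
interface RepositoryUtil { void save(String account); }
interface CacheService { void save(String account); }

final class CachingAspectSketch {
    // Returns a RepositoryUtil whose save() also calls CacheService.save() after the database save.
    static RepositoryUtil advise(RepositoryUtil target, CacheService cache) {
        return (RepositoryUtil) Proxy.newProxyInstance(
                RepositoryUtil.class.getClassLoader(),
                new Class<?>[] {RepositoryUtil.class},
                (proxy, method, args) -> {
                    Object result = method.invoke(target, args); // primary concern: persistence
                    if (method.getName().equals("save")) {
                        cache.save((String) args[0]);            // crosscutting concern: caching
                    }
                    return result;
                });
    }
}
```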

    A properly architected and implemented AOP application benefits from better modularity and simpler maintenance.

    Examples of concerns that tend to be crosscutting include:

    • Synchronization
    • Real-time constraints
    • Error detection and correction
    • Product features
    • Memory management
    • Data validation
    • Persistence
    • Transaction processing
    • Globalization, which includes language localization
    • Information security
    • Logging
    • Monitoring
    • Business rules
    • Code mobility
    • Internationalization and localization
    • Domain-specific optimizations

    In my next blog I will provide the details on how one can achieve this with Spring AOP.

  • VMware Enterprise vFabric GemFire

    When it comes to improving the performance of a web application, the first thing one usually thinks of is caching the web data/content for a certain period. In the past, customers thought only of level-1 caching (caching at the app server). Over the past few years it has become increasingly common for customers to consider level-2 remote caching as well.

    What is Gemfire?

    1. Distributed Operational Data Infrastructure

    • Not just a distributed cache
    • Key semantics of a database
    • Key semantics of a Message bus

    2. Enable data sharing and event notifications

    • At memory speeds

    3. Enable apps to continuously analyze and react to very fast moving data.

    Gemfire value proposition

    • Enable notification of data changes to users and other applications
    • Create framework for high performance data access
    • Scale applications to meet unpredictable demands for information
    • Boost performance across applications without increasing other hardware/software requirements
    • Reduces network load and can work over low bandwidth networks
    • Enhance the performance, scalability, and network characteristics of other software.
    • Single product with multiple uses, easy to implement and little to no management requirements
    • Standards compliant interfaces

    Typical application

     

    [Figure: typical GemFire application]

    Who provides L2 caching products?

    1. VMWare – Gemfire

    2. Oracle – Coherence

    3. Terracotta – Ehcache

    Please send me an email if your application is not scaling due to a bottleneck at the database, or respond to this blog if you find this information useful.
