Oct 15, 2010

Unit Testing EJB using EJB3UNIT


Unit Testing EJB is something that not everyone needs. EJB is so much container managed, and has been around for a long time. But no one can skip the Unit Testing.

Let me discuss a scenario(/ an architecture), where unit testing is nightmare.

In my recent project, its SOA based architecture, and each EJB going into its own .ear file. Right now, around 150 .ear files. On a new requirement, there are 10-12 new EJB's and all having huge amount of changes, and lot of cuntionality. Complication is they are very much interdependent, and all are going to be exposed as Web Services, and there is an Integration layer over these EJB's.

Now, Imagine you are working on a new component( an EJB session bean), which is dependent on other 8-9 EJB's, which are in different EAR's, of which most are not developed. Some are EJB 2.1, and some are EJB 3.0, many new tables( and entities). And similar case is with all other developers.

With EJB3Unit, you can design a framework for such circumstances.

We all know that Unit test cases are written befor the actual development starts.

So, if there are 8 developers woking on 8 new components, and all are interdependent. They can start using the EJB3Unit framework.

Each developer starts with wriring the EJB interface, and a implementation class with hard coded response(which makes sense), and write a Unit test class for the EJB, in a common project.
And anyone developing its own component first can test its own components.
Because there are mocks for all component, which are not developed yet.
Best thing is we dont need Application Server. And we dont need to alter the common build prematurely.
Apart from that what we can do using the EJB3Unit:
  • Dependency Injection(@Resource, @PersistenceContext, etc.)
  • Entities and Queries over entities( EJB3UNIT comes with In memory database).  
  • Component Interdependancy( EJB injection)
  •  Automated JUNIT test cases.

Starting with EJB3UNIT is very simple. As with most of the open source project, EJB3UNIT also comes with a sample project which can be download and imported into IDE.
For new ones to Java/J2EE, it might be difficult to understand the sample project.
There can be another way, I am going to explain how to start working with EJB3UNIT, without using Sample project. 
I am using Eclipse 3.6.1, and I also tested on RAD 7.5. 

I am going to create a Sample project named: EJB3UnitUsage.
I have create few source folders:
 src/main/java - to hold my java classes, i.e. session and entity beans.
src/test/java    - to hold the Unit Test Classes
src/test/resources - to hold the properties file.
If you are wondering, what is the contet of properties file.
Actually I just copied them from the downloaded Sample project from the EJB3UNIT site.
And we only need two properties file right now:
ejb3unit.properties, and log4j.properties
It would be better if you follow the same or similar structure.
Below is the screen shot of the package structure and the classes:


 MyEntity is my Entity bean
MyInterface is my Session bean Interface.
(Local Interface, It wont make any difference if you make it Remote also).
MySessionBean is my Session Bean.
Below is the coe for MySessionBean:

I have injected the EntityManager and I can use it also.
We would test if at runtime it is null or not, and also what the operation "getName()" returns at runtime.
Below is my Test class:



Advanced:
 There would be cases, when there is already a project with entities.
We can import those entities, and persistence.xml, in our framework.
For eg, I created a sample project with three entities, and persistence.xml.

I exported the java classes as jar file, and imported the jar file into my project, i.e. "EJB3UnitUsage".
I added the persistence.xml to the "EJB3UnitUsage" project.
Now, I am going to perform some operations over the imported entities.
Inside my session bean, I write code to persist the Book entity, and then find it later.
Our framework is using In mememory database, and so entity would be persisted there.
Also, we need to add following properties in the "ejb3unit.properties":
#Configuration for Persistence classes

ejb3unit.loadPersistenceXML=true
ejb3unit.persistenceUnit.name=mgr
Remember, "mgr" is the name of persistence-unit defiend in the persistence.xml.
Now, when we would run our test case, the persistence.xml, would be read, and also the persistence-unit, and entities would be loaded in memory.
Note the path of persistence.xml, it should in class path. I placed it inside the META-INF.
I have added following code in my Session Bean:
And inside my the test class I add code for testing the entity operation:
I have also tested Entities with @OneToMany, and @ManyToOne relations.
They also work.
I dont know why ArrayList doesn't work in the case of @OneToMany. List works, and Collection works.
For eg., below is the raltion between two entities:

And my code of Lesson.java looks like:

And in my Session EJB I added code to perform some operations over these related entities:
Share:

May 31, 2010

Caching Framework - Sample Cache Implementation



I was asked in an interview to design a cache framework.
The problem statement was:
Develop a caching framework that provides the following:
1) Load data from file, database or any other source
2) Refreshes the data after a configurable interval if it is not a one time load
3) Invalidates the cached objects after a configurable interval.

Few questions that come to mind while designing is what should be cache.
And If I assume it is going to be Map, then what would be key and value.
Second critical thing is how to expose the Cache to clients.
What operations to support.

I decided for using ConcurrentHashMap as my cache.
There are several benefits of using this Map over others like:
Operations are thread safe supporting full concurrency and highly efficient.

Clients would access the Cache through a CacheProvider.
I created an interface named CacheProvider.
This is the starting point of my Cache Implementation.
Below is the code for CacheProvider:

public interface CacheProvider {
        Object getAndSet( String key);
Object setAndGet( String key );
void refresh();
void invalidateObj();
void cancel();
}


I had an initial requirement where I can invalidate objects in Cache after certain interval if there is no access.
I also decided that,
If a particular data is retrieved then its there should be fresh start of invalidation time, i.e. if I am going to configure the invalidation time as 5 min.
and If some object is not accessed for 5 mins, it would be removed from Cache.
And if some object is not accessed for 4 mins, and some client asks for this object, then there should be fresh start, means it would be invalidated after 5 mins from now, if not accessed in next 5 mins.
I would also be configuring an refresh policy, and refreshing all objects in the cache at particular interval.

Regarding data. I used Derby for testing.
I designed Cache to support data from Database, or from file, or to test simple data can be provided as command line arguments.
I know this is too early to talk about this.
But this is going to be initial requirement of where data would be stored and how MyCacheProvider would load data from data provider.

I created three classes for handling data.
CollectionManager - for managing data from command line
FileManager - for managing data from File
DBManager - for managing data from Database.

I am providing the code for all the data managers.
But if one is not interested, then one can simple ignore the below three classes.


CollectonManager:


FileManager:

DBManager:


DBManager is a class which manages data in database.  I have used Derby.  Using derby is very easy, and its useful in applications like this.  You don't need database, its just one jar. Evrything inside the jar, database, and drivers.


My DBManager class is below:


DBmanager has code for initialization, where it reads config values from a file and intializes a DB connection. 
Then, DBManager has code to create the table, populate the table with data. 
Remember, this data would be used to cache. 
Then code to retrieve all data from the database, this method would be called to refresh the whole cache with data from database. 
And method to retrieve a particular data from database. 




I started my design with a CacheProvider interface. 
Now I would extend that clas and create MyCacheProvider, also I would create a CustomKey to store in Cache.
MyCacheProvider would internally use a ConcurrentHashMap for storing data.


Below is the code of CustomCacheKey, instance of this class would act as key in the Cache, or ConcurrentHashMap inside the MyCacheProvider.

CustomCacheKey:


MyCacheProvider:
MyCacheProvider extends CacheProvider, implements getAndSet() method retrieves the value from Cache, put the same key again in the cahce with updated timestamp.
Whenever a key is retrieved from Cache, its timestamp should be updated, so we either update the timestamp of the key(CustomKey), or put a new key with same value and new timestamp in the Cache.
We have to maintain the timestamp, in order to invalidate inactive objects in Cache.

We also have methods to refresh the cache and append a user provided hashmap to cache.

MyCacheProvider should also contain methods to invalidate a particular object, and provide method to cancel or clear whole cache.


Now, I decided that clients should not directly use the MyCacheProvider, because in future I would be creating some more implementation of CacheProvider, and making it configurable.

I made a CacheHandler, and clients would use this handler, with limited operations exposed.

CacheHandler:


public class CacheHandler {
static MyCacheProvider myCacheProvider;

public static void start(){
myCacheProvider = MyCacheProvider.getCacheProvider();

}
public static Object get(String key){
Object myCachedObject = myCacheProvider.getAndSet(key);
if (myCachedObject == null) {
myCachedObject = myCacheProvider.setAndGet(key);
}
return myCachedObject;
}

public static void refreshCache(){
myCacheProvider.refresh();
}

public static void invalidateObj(){
myCacheProvider.invalidateObj();
}

public static void cancel(){
myCacheProvider.cancel();
myCacheProvider = null;
}

public static Map mapStatus(){
return myCacheProvider.mapStatus();
}
}

For simplicity I made above class a static class.

Now I am done with initial design of Cache with three classes.
MyCacheProvider, CustomCacheKey, and CacheHandler.
CacheProvider is a super interface.
And I created three classes for handling persistent data.


I have to run two tasks periodically:
Invalidate objects after specific interval
Refresh the whole cache.
I decided of one more task, which prints the Cache status.

SO, I am going to create three tasks:
CacheInvalidateObjTask - for invalidating the Cache objects.
CacheMapStatus - for printing the Cache status.
CacheRefreshTask - for refreshing the cache.

CacheInvalidateObjTask:

public class CacheInvalidateObjTask implements Runnable{

public void run(){
CacheHandler.invalidateObj();
}
}


CacheMapStatus:

public class CacheMapStatus implements Runnable{
Map map;

public void run(){
map = CacheHandler.mapStatus();
long time = System.currentTimeMillis();
System.out.println("Map Status.. at.." + time/1000 + " secs..");
System.out.println(map);
}

}


CacheRefreshTask:

public class CacheRefreshTask implements Runnable{

public void run(){
CacheHandler.refreshCache();
}

}


So I created three tasks.

I also created two classes, which can be skipped:

CacheConstants:

public interface CacheConstants {

// Resource provider Constants
String RESOURCE_PROVIDER_COLLECTION = "collection";
String RESOURCE_PROVIDER_DATABASE = "database";
String RESOURCE_PROVIDER_FILE = "file";
}


CacheProperties:

public class CacheProperties {

public static int CACHE_SIZE = 0;

public static long INVALIDATE_OBJ_TIMER_INTERVAL = 1000L;
public static long REFRESH_TIMER_INTERVAL =1000L;
public static long PRINT_MAP_TIMER= 1000L;

public static String RESOURCE_PROVIDER = "";

public static String RESOURCE_FILE_PATH= "";

public static String DATABASE_DRIVER = "";
public static String DATABASE_PROTOCOL = "";
public static String DATABASE_NAME = "";
public static String DATABASE_USERID = "";
public static String DATABASE_PWD = "";
public static String DATABASE_TABLE_NAME = "";
public static Boolean DATABASE_CREATE = true;
public static Boolean DATABASE_CREATE_TABLE = true;
}


To start the application, I have created a class CacheService.
CacheService uses CacheServiceHelper, and delegates the call to CacheServiceHelper.


CacheService:

CacheServiceHelper:


CacheServiceHelper contains methods to schedule the three tasks. And method to set the Env.
And a cleanup method to clear the cache:

void cleanUp(){
CacheHandler.cancel();
}



And our cache is ready.


Summary:

CacheService is the main class to start the program.
CahceServiceHelper is for handling all the startup tasks.
CacheService calls the CacheServiceHelper for initializing the environment, and starting the service.
In initialization, CacheServiceHelper sets the properties, and the resource provider.
Resource provider will provide data to store in the cache, it can be either database, or file, or command line arguments.
CacheServiceHelper, starts the service by creating three tasks and submitting them to a ScheduledthreadPool.
Tasks are:
1. Task to refresh the cache map, periodically, CacheRefreshTask
2. Task to invalidate objects inside cache map after specified time, CacheInvalidateObjTask
3. Task to print the map status to console, CacheMapStatus
Each of these tasks call CacheHandler, which is a fa├žade for all Cache related operations.
CacheHandler, depends on MyCacheProvider for all its operations.
MyCacheProvider is the core class for handling the cache.
MyCacheProvider has a map to hold data. It holds data in a ConcurrentHashMap.

Application needs a cache.properties file with following configurable properties:
intial.cache.size - Initial cache Size.
invalidate.object.interval - Interval to invalidate the object, or remove o0bject from cache.
refresh.cache.interval - Interval for Refreshing the cache.
print.map.timer - Interval to print the contents of cache map.
source - source of the data provider.
Valied values are :
collection – if data is provided via
command Line arguments
database - if data provider is database
file - if data provider is file.


#Database properties
#if database as resource, below properties are mandatory
db.driver=
db.userId=
db.pwd=
db.protocol=
db.dbname=
#optional Database properties
db.createdatabase=
db.createTable=

# File properties.. if “file” is data provider, i.e. source property is file
file.path=C:\\test.txt

Below is the package Structure and UML class diagrams:
Classes:












!!!Any Comments would be really appreciated!!!
Share:

May 30, 2010

Interview Questions when 5 or 6 years of Java/J2EE experience


I recommend reading eBook "Java Notes: Effective Programming & Interview" available on Amazon, and can be read using Kindle or Kindle app.


Generally there were some basic questions:
1. java Objective SCJP level
2. J2EE objective
3. JMS Objective
4. IBM MQ Objective
5. Complicated Multithreading
6. Serialization, File I/O
Below are some extract of questions which I faced while giving Interview of Sapient, Nomura, Gemstone, Mobility, ION, HP, RBS, TIBCO, BOA, CSC. 

Generally I have seen if the interview is for an Investment Bank or any product based organization, then core concepts are very important. And since 2008and just adding this line in 2013 I would say most important concept stills remains Multi threading. And so Executer framework remains hot.

CSC, Chennai
1. 50 numbers ( 1 to 50 ) are distributed in random order in an array of size 50. One number is missing. How to find.
2. Inner Classes, Anonymous Classes.
3. Find manager name who has max. number of employees working under him, when table has employeedID, Name, and ManagerID as columns. and Manager can also be employee.
4. Serialization.
5. How to design a web application, medium size.
6. Write logic to print alternate elements in a ArrayList, even some are removed.
7. How to implement a Hospital scenario, where doctor is examining patients, and emergency cases, and informing doctor of emergency cases, scheduling, etc.
 8. How to select unique elements from a table.
9. Calculate number of legs, and give formula:
7 persons, each person having 7 bags, each bag having 7 dogs, and each dog having 7 puppies.
Bank Of America, Mumbai
1. How is semaphore implemented.
2. How to ensure only one instance in Singleton class.
3. How synchronization works, inside JVM.
4. Master JVM is holding objects. Slave JVM's are reading and writing the object.
What problems can come and what is solution.
5. How Garbage Collection works in JVM.
6. When are class garbage collected, when are classes unloaded.
7. How to ensure that instance is never garbage collected.
8. Different protocols for Web Services.
9. Managing Session in Web Services
10. RMI, how EJB works
11. passing objects between two JVM's.
12. role of WSDL, stubs, etc. in Web Services.
13. Garbage Collection Tuning parameters.
14. LAZY Loading in ORM.
15. Why there is Stack and heap in JVM. Why this structure.
16. how much memory a program consumes, when no objects created inside program.
1. JCA - Tntro
2. JPA - Intro
3. EAGER/ LAZY loading
4. JPA relations
5. IBM MQ Objective
6. JMS session, Thread Safety In JMS
7. JMS providers
8. JMS implementation
9. AQ Provider and implementation using JNDI
10. Garbage Collection algorithms.
11. SOAP
12. JTA.
RBS, Gurgaon
1. Hashing Algorithm.
2. Search Algorithm
3. MVC design Pattern.
4. Multi threading Objectives.
5. Design Application which handles a particular operation for millions of trades, operation like trade validation. Application communicates external application for data.
6. Several Designing Questions.
7. Scala, Open Source high performing JMS implementations, Dynamic languages
8. Cache Implementation.
9. How Oracle Coherence works.
Gemstone, Pune
1. Why Use EJB 3.0
2. Why use JPA
3. Why use JPA with ORM, Standard ORM products
4. Writing Complex Query while using JPA and ORM
5. JTA, how it works
6. JTA, how to use
7. Where to use JDBC and where ORM
8. JSF Architecture, who is model, view and controller
9. Optimistic Lock in JPA - Detailed
10. Multithreaded program to read large file/ code and design
HP Labs, Banglore
1. Latch, Barrier, synchronized, Lock, ThreadPoolExecuter, Garbage Collection Algo.
2. Which forums do you follow,  Cloud Computing, How to improve programming and Coding skills.
3.  Design horse racing application.
4. Struts, how to extend struts. Why to use Struts.
5.  Where to use messaging
1. Core Java SCJP level
2. Current Project Architecture
3. Message Flow in current project.
4. Fault Tolerance in JMS
5. Hashing Algo in Java, HashMap
6. Aggregation, Composition, Encapsulation, Abstraction.
7. How to implement Thread Pool
8. Complete Garbage Colection Algo.
9. DataSource, DB connection pool - how it works.
10. Exception- Logic, how to handle and design Exception, Error.
11. NIO-concept, Socket handling using NIO.
12. Adapter Design pattern
13. Stored Proc, Trigger
14. JDK1.5 Thread pool
15. Serialization, Externalization
16. Immutable, FInal, Usage of Final and Immutable
17. Exception handling in the spawned threads
18. RESTful Web Services, JEEVAN
19. Types of Parsers
1. deep copy and shallow copy
2. hash code
3. graceful shutdown of threads
4. NaN, 0/0
5. wrapper classes.
6. BootStrap Class Loader
7. Oracle OCI.
8. Oracle JDBC drivers
9. Oracle Thin and Thick Clients.
10. Oracle Rak
11. MQ API, MQ classes for Java.
12. MessageId, Priority for MQ message.
13. Garbage Collection
14. Java Green Threads
15. Coorelated Sub Queries.
16. Steps to make connection to MQ using Java.
17. Options to set for browsing message in MQ Queue.
18. Client Mode/Bindings Mode.
19. Log4j
1. How would you implement your own Thread Pool.
2. What is NIO
3. Best way to read data from socket using traditional I/O
4. Beat way to read file.
5. How to read characters from InputStream.
6. How Serialization works
7. Synchronization
8. how to implement counter in servlet.
9. how hashtable is thread safe.
10. CharSet, Enocder, Decoder
11. AtomicInteger, volatile, transient.
12. JTA, XA, how XA works, two phase commit.
13. Role of transaction manager in XA transaction.
14. XADataSource, and closing XADataSource.
15. UserTransaction
16. JCA Architecture
1. How would you implement cache in java.
2. How would you implement auto complete using java, jsp.
3. How would you implement a chache with auto complete feature.
4. How to make class immutable.
5. How to serialize immutable class.
6. How to gaurantee message delivery in JMS, fault tolerance
7. Message recovery techniques and redelivery.
8. DROP table.
9. Oracle Stored Procedure or function
10. Struts architecture
11. JSP Exception Handling
12. java Objective SCJP Level
13. java Design Pattern
14. J2EE objective
15. Implementing references
16. UNIX Commands - grep, softlink, hardlink
17. Serializable, Externalizable, Cloneable
JDk 1.5 Collections classes.
Multintreading Concepts - Barrier, Latch, etc.
JMS functioning
SQl Query using JOINS
Java Design pattern - detail
Overloading, Overriding, Comparator, Comparable, hashCode(), equals(), Serializable, Externalizable, RMI, garbage Collection.
Please read my more posts in the same category - Answers Java/J2EE interview questions
Share:

Mar 21, 2010

How JTA(Java transaction API) works


I am taking a scenario, where an Application thread starts a transaction, perform operation on two different resources( for eg., JDBC, and JMS), and finnaly commits the transaction.

Scenario:
A thread (within Application) wants to start a global transaction, perform operations on several resources, and depending on final status, would commit or rollback, and all within one transaction.

Thread starts the Global Transaction.
TransactionManager.begin( ) is called, a global transaction associates the Transaction Context with the global transaction.

Thread calls TransactionManager.getTransaction( ).
Transaction object is returned. Transaction object represents the Transaction Context associated with the transaction.

Thread associates a resource (assume JDBC resource) with Transaction Object.
TransactionManager.enlistResource( XAResource) would be called. XAResource identifies the resource.
XAResource.start(Xid, flag ) is called. This method is called by Transaction Manager, this method associates the global transaction with the resource.
If the XAResource object represents a resource manager who has not previously seen the global transaction, the TM establishes a different transaction branch ID and ensures that this new resource manager is informed about the transaction completion with proper prepare-commit calls. XID, is the transaction branch identifier. XID is given from TM ro RM. All the RM’s work in support of this global transaction, would be part of only one branch, which has been started right now.

Thread associates another resource (assume JMS resource) with Transaction Object.
Note: for each resource in use by the application thread, the application server invokes the enlistResource( ) method and specifies the XAResource object that identifies the resource in use. The enlistResource( ) request results in the TransactionManager informing the resource manager to start associating the transaction with the work performed through the corresponding resource – by invoking the XAResource.start( ) method.
This Transaction Context represented by Transaction object, already has another XAResource object participating in the transaction, the TransactionManager invokes the XAResource.isSameRM( ) method to determine the specified XAResource represents the same resource manager instance. This returns false, in our case because, the new request is for new JMS resource, and one already registered is JDBC resource.
So, the TM establishes a different branch transaction ID and invokes XAResource.start( ) method, passing the new XID.
The pseudo-code:
public boolean enlistResource(XAResource xares){
…….
// Assume there is already xid1, and a xaRes1 associated with TM.
….
boolean sameRM = xares.isSameRM(xaRes1);
if(sameRM){
xares.start(xid1, TMJOIN);
} else {
xid1NewBranch = makeNewBranch(xid1);
xares.start(xidNewBranch, TMNOFLAGS);
}
Also, Application Server invokes getConnection( ) using Resource Adapter, to make connection, and returns the Connection object reference to the application.

Thread performs some JDBC operations and JMS operations.
Application Thread performs one or more operations on the Connection’s.

Application ends the Transaction, either invoking either commit or rollback.
Application closes the connections, JDBC connection and the JMS connection.
Application Server delist the resource, when notified by the resource adapter about the connection close.
The Transaction manager invokes the XAResource.end to dissociate the transaction from the XAResource.
The application server asks the Transaction Manager to commit the transaction.
Synchronization.beforeCompletion is called by Transaction Manager to notify the Application Server of start of Transaction.
The transaction manager invokes the XAResource.prepare to inform the resource maanger to prepare the transaction work for commit.
The transaction manager invokes the XAResource.commit to commit the transaction.
Synchronization.afterCompletion is called by Transaction Manager to notify the Application Server of start of Transaction.

Note:
The resource manager once opened, is kept open until the resource is released(closed) explicitly. When the application ninvokes the connection;s close( ) method, the resource adapter invalidates the connection object reference that was held by the application and notifies the application server about the close. The Transaction manager should invoke the XAResource.end method to disassociate the transaction from the connection.
The resource manager initialization is done implicitly by the resource adapter when the resource(connection) is acquired.
There is no xa_open equivalent in the XAResource interface.
Error return values that are caused by the transaction manager's improper handling of the XAResource object are mapped to Java exceptions via the XAException class.
The X/Open XA interface specifies that the transaction manager must initialize a resource manager(xa_open) prior to any xa_ calls.
Knowledge of initializing a resource manager is embedded within the resource adapter that represents the resource manager.

While Coding:

When we are writing an application to use the JTA, we should not be concerned about the details of the distributed transaction management. This is the job of distributed transaction infrastructure - application server, transaction manager, and the resource adapter.
For eg.
While performing JDBC operations, we have application server, transaction manager, and JDBC drivers as distributed transaction infrastructure.
UserTransaction—The javax.transaction.UserTransaction interface provides the application the ability to control transaction boundaries programmatically. The javax.transaction.UserTransaction method starts a global transaction and associates the transaction with the calling thread.
We should not call any method directly on the connection, like: commit(), and rollback() on connection, while performing operation within JTA transaction boundaries.

!!!Any Comments would be really appreciated!!!
Share:

Feb 27, 2010

Garbage Collection Algorithm - JDK Implementation


There are several basic strategies for garbage collection: reference counting, mark-sweep, mark-compact, and copying. In addition, some algorithms can do their job incrementally (the entire heap need not be collected at once, resulting in shorter collection pauses), and some can run while the user program runs (concurrent collectors). Others must perform an entire collection at once while the user program is suspended (so-called stop-the-world collectors). Finally, there are hybrid collectors, such as the generational collector employed by the 1.2 and later JDKs, which use different collection algorithms on different areas of the heap.

Old objects, young objects

In any application heap, some objects become garbage shortly after their creation, some survive for a long time and then become garbage, and others can remain live for the entirety of the program's run. Empirical studies have shown that for most object-oriented languages, the Java language included, the vast majority of objects -- as much as 98 percent, depending on your metric for object youth -- die young. One can measure an object's age in wall-clock seconds, in total bytes allocated by the memory management subsystem since the object was allocated, or the number of garbage collections since the object was allocated. But no matter how you measure, the studies show the same thing -- most objects die young. The fact that most objects die young has significance for the choice of collector. In particular, copying collectors perform quite well when the majority of objects die young, since copying collectors do not visit dead objects at all; they simply copy the live objects to another heap region, then reclaim the whole of the remaining space in one fell swoop.

Of the objects that survive past their first collection, a significant portion of those will become long-lived or permanent. The various garbage collection strategies perform very differently depending on the mix of short-lived and long-lived objects. Copying collectors work very well when most objects die young, because objects that die young never need to be copied at all. However, the copying collector deals poorly with long-lived objects, repeatedly copying them back and forth from one semi-space to another. Conversely, mark-compact collectors do very well with long-lived objects, because long-lived objects tend to accumulate at the bottom of the heap and then do not need to be copied again. Mark-sweep and mark-compact collectors, however, expend considerably more effort examining dead objects, because they must examine every object in the heap during the sweep phase.

algorithms

Reference counting


The most straightforward garbage collection strategy is reference counting. Reference counting is simple, but requires significant assistance from the compiler and imposes overhead on the mutator (the term for the user program, from the perspective of the garbage collector). Each object has an associated reference count -- the number of active references to that object. If an object's reference count is zero, it is garbage (unreachable from the user program) and can be recycled. Every time a pointer reference is modified, such as through an assignment statement, or when a reference goes out of scope, the compiler must generate code to update the referenced object's reference count. If an object's reference count goes to zero, the runtime can reclaim the block immediately (and decrement the reference counts of any blocks that the reclaimed block references), or place it on a queue for deferred collection.

Reference counting is simple, lends itself well to incremental collection, and the collection process tends to have good locality of reference, but it is rarely used in production garbage collectors for a number of reasons, such as its inability to reclaim unreachable cyclic structures (objects that reference each other directly or indirectly, like a circularly linked list or a tree that contains back-pointers to the parent node).

Tracing collector

None of the standard garbage collectors in the JDK uses reference counting; instead, they all use some form of tracing collector. A tracing collector stops the world (although not necessarily for the entire duration of the collection) and starts tracing objects, starting at the root set and following references until all reachable objects have been examined. Roots can be found in program registers, in local (stack-based) variables in each thread's stack, and in static variables.

Mark-sweep collectors

The most basic form of tracing collector, in which the world is stopped and the collector visits each live node, starting from the roots, and marks each node it visits. When there are no more references to follow, collection is complete, and then the heap is swept (that is, every object in the heap is examined), and any object not marked is reclaimed as garbage and returned to the free list.

The big problem with mark-sweep is that every active (that is, allocated) object, whether reachable or not, is visited during the sweep phase. Because a significant percentage of objects are likely to be garbage, this means that the collector is spending considerable effort examining and handling garbage. Mark-sweep collectors also tend to leave the heap fragmented, which can cause locality issues and can also cause allocation failures even when sufficient free memory appears to be available.

Copying collectors

In a copying collector, another form of tracing collector, the heap is divided into two equally sized semi-spaces, one of which contains active data and the other is unused. When the active space fills up, the world is stopped and live objects are copied from the active space into the inactive space. The roles of the spaces are then flipped, with the old inactive space becoming the new active space.

Copying collection has the advantage of only visiting live objects, which means garbage objects will not be examined, nor will they need to be paged into memory or brought into the cache. The duration of collection cycles in a copying collector is driven by the number of live objects. However, copying collectors have the added cost of copying the data from one space to another, adjusting all references to point to the new copy. In particular, long-lived objects will be copied back and forth on every collection.

Mark-compact collectors

The copying algorithm has excellent performance characteristics, but it has the drawback of requiring twice as much memory as a mark-sweep collector. The mark-compact algorithm combines mark-sweep and copying in a way that avoids this problem, at the cost of some increased collection complexity. Like mark-sweep, mark-compact is a two-phase process, where each live object is visited and marked in the marking phase. Then, marked objects are copied such that all the live objects are compacted at the bottom of the heap. If a complete compaction is performed at every collection, the resulting heap is similar to the result of a copying collector -- there is a clear demarcation between the active portion of the heap and the free area, so that allocation costs are comparable to a copying collector. Long-lived objects tend to accumulate at the bottom of the heap, so they are not copied repeatedly as they are in a copying collector.

Generational collection

A generational collector divides the heap into multiple generations. Objects are created in the young generation, and objects that meet some promotion criteria, such as having survived a certain number of collections, are then promoted to the next older generation. A generational collector is free to use a different collection strategy for different generations and perform garbage collection on the generations separately.

Minor collections

One of the advantages of generational collection is that it can make garbage collection pauses shorter by not collecting all generations at once. When the allocator is unable to fulfill an allocation request, it first triggers a minor collection, which only collects the youngest generation. Since many of the objects in the young generation will already be dead and the copying collector does not need to examine dead objects at all, minor collection pauses can be quite short and can often reclaim significant heap space. If the minor collection frees enough heap space, the user program can resume immediately. If it does not free enough heap space, it proceeds to collect higher generations until enough memory has been reclaimed. (In the event the garbage collector cannot reclaim enough memory after a full collection, it will either expand the heap or it will throw anOutOfMemoryError.)

The JDK 1.4.1 default collector

By default, the 1.4.1 JDK divides the heap into two sections, a young generation and an old generation. (Actually, there is also a third section, the permanent space, which is used for storing loaded class and method objects.) The young generation is divided into a creation space, often called Eden, and two survivor semi-spaces, using a copying collector.

The old generation uses a mark-compact collector. Objects are promoted from the young generation to the old generation after they have survived copying a certain number of times. A minor collection will copy live objects from Eden and one of the survivor semi-spaces into the other survivor space, potentially promoting some objects to the older generation. A major collection will collect both the young and old generation. The System.gc() method always triggers a major collection, which is one of the reasons you should use System.gc() sparingly, if at all, because major collections can take much longer than a minor collection. There is no way to programmatically trigger a minor collection.

Different algorithms can be used to perform garbage collection in the different generations, each algorithm
optimized based on commonly observed characteristics for that particular generation. Generational garbage
collection exploits the following observations, known as the weak generational hypothesis, regarding
applications written in several programming languages, including the Java programming language:
• Most allocated objects are not referenced (considered live) for long, that is, they die young.
• Few references from older to younger objects exist.

Below is the table which summarizes the algorithm used by different types of Garbage Collectors in J2SE 5.0:




Parallel Compacting collector - Old Generation:
(Mark - Summary - Compact)


Marking Phase:
Several threads would run.
Heap would be divided into regions.
One thread per region would run.
Each region has data. Data would be updated by thread with information about the size & location of the live objects.

Summary Phase:
(One thread)

Summary phase examines the density of each region & calculates a point after which, space recovered after compacting would be worth. Region left of that point is referred as "Dense Prefix".
Assume, our heap is in following stage(several cycles of GC has already run).

Our heap reached this state after several GC cycles, so, the left side of heap is dense & contains most live objects. If compacting is performed in the left most region, then very little amount of space would be recovered, and which is not worth.
Summary Phase examines the density of regions, starting from the left most, until it reaches a point where the space that could be recovered from a region and those to right of it, is worth. Worth the cost of compactng those regions.

Compaction phase:
(Multiple threads)

Summary phase calculates and stores the new location of the first byte of the live objects, in data associated with each region. Regions to the right have to be compacted. And we have to move all objects within a region to left side of that region.
Compaction phase uses the data prepared by the summary phase to identify, which regions are to be compacted, and which objects to move, and new location of the objects(in case they are to be moved). Multiple threads run independently, one in each region.

Thus, finally a single large empty block would be created at the right end.

CMS Collector - Old Generation



Initial Mark
Identifies the initial set of live objects, directly reachable form application code.

Concurrent Mark
Marks all live objects that are transitively reachable from the set(prepared by initial mark)

Remark
Revisit any object that was modified during the concurrent mark.
(Objects created while Concurrent mark was running, would be marked now)

At the end of markings, it is guaranteed all live objects on the heap have been marked.
Now, Concurrent Sweep Phase, reclaims all the garbage that has been identified.

CMS Collector is not compacting, which means it does not move the live objects to one end of the old generation.
Since the free space is not contigous, the collector can no longer use a simple pointer indicating the next free location into which the next objext can be allocated.
It employs free lists. Lists linking together unallocated regions of memory.
Each time an object needs to be allocated, the appropriate list(based on the amount of memory needed) must be searched for a region large enough to hold the object.

CMS has advantage in term of pause time, in the figure we can see that total pause time is Pause1 + Pause2. When multiple threads( i.e. n) are running, then pause time is reduced by n.


New JDK7 Garbage Collector:

The Garbage First(G1) Garbage Collecor
Like Other HotSpot GC's, G1 is generational means it has concept of old objects and new objects. GC cycles visit new objects very frequently, and old objects less frequently.
Big difference between G1 and other Garbage collector is that now there is no physical difference between young generation and old generation.
Whole heap is divided into multiple regions.
Each region is of same size.
Few set of regions belong to young generation and few set to old generation.
SO, young generation is a non-contiguous set of regions, and similar is old generation. G1 is flexible enough to move objects from new to old and vice versa any time.
Initially, a few st of regions(belonging to young generation) are considered as part of collection set. so all live objects are marked in the regions belonging to collection set. Now there is a evacuation pause, and during this pause, survivor objects from the collection set are evacuated to another set of regions(the to-spaces), and then collection set region is reclaimed.
Evacuation pauses collect the young generation, but sometimes old generations are also collected during the pause time.
In regions belonging to old generation, there is Concurrent marking(similar to CMS old generation collector). Profitable regions are marked, i.e. where performing collection is worth. There is no concurrent sweeping. The most profitable old regions identified by marking phase are collected during the evacuation pause.
Unlike CMS, G1 performs compaction over time. Compaction eliminates fragmentation, which was a big issue with CMS collector.

Reference: http://www.ibm.com/developerworks/java/library/j-jtp10283/

Can also read very good explanation -
http://www.azulsystems.com/sites/default/files/images/Understanding_Java_Garbage_Collection_v3.pdf

A more updated post on memory management and garbage collection is below -
http://shekup.blogspot.co.uk/2011/11/java-runtime-memory-management.html


Share:
© Shift, ShEkUP, Shape, and Surprise | All rights reserved.
Blogger Template Crafted by pipdig