analytics

Wednesday, October 14, 2009

Java Collections Framework

Collection

A collection is an object that represents a group or collection of objects.

A collections framework is a unified architecture for representation and manipulation of such collections.

The collections framework consists of collection interfaces and some general purpose implementations of these interfaces.


Overview of Collection framework

Generically speaking, a collection is a group of objects. The objects that are stored in a collection object are called as the elements of that collection. Usually all collection objects provide following basic functionality. You can…

1. Add objects to the collection.

2. Remove objects from the collection.

3. Search an object in the collection.

4. Retrieve an object from the collection (without removing it).

5. Iterate through the collection, looking at each element (object) one after another (to do some operation on each one of them).

Beside this basic functionality, some collections have specialized roles. The collection framework is made up of several such classes as well as standard interfaces. Let us divide this framework in three main parts for sake of simplicity as:

1. Six core interfaces

2. Classes implementing the core interfaces.

3. Utility classes with lots of static utility methods




Figure 14.1 shows the collection framework with these three main parts. The entire collection framework lies in the package java.util.

Figure 14.1 The collection framework in java.util package with its three main parts

Note that only prominent classes and interfaces are shown in the figure. There are other classes in the collection framework. But we will learn about only about the most prominent ones in this chapter. Also note that java.util package has other useful classes which are not part of collection framework such as Date, Calendar etc. But we will not study them.


Collection Interfaces

There are six collection interfaces. The most basic interface is Collection. Three interfaces that extend Collection are Set, List, and SortedSet. The other two collection interfaces, Map and SortedMap, do not extend Collection, as they represent mappings rather than true collections.

Collection interfaces can be categorized as ...

• Set is an interface, which models mathematical abstraction. Set provides an API for a collection that contains no duplicate elements. SortedSet interface extends this interface for ordered collection of unique elements.

• List is an interface for ordered collection. Unlike sets, lists typically allow duplicate elements. ArrayList, Vector, LinkedList are the some of the classes which implement this interface.

• Map is an interface for a collection of key-value pairs. It is not an ordered collection. Repetitions are not allowed. It also provides three collection views, which allow a map's contents to be viewed as a set of keys, collection of values, or set of key-value mappings.Interface SortedMAp extends this interface for ordered key- value pairs. Classes HashMap, Hashtable are some implementations of this interface.


Six core interfaces

The collection framework has six interfaces that generically describe the behavior of different types of collection. Figure 14.2 shows the six interfaces and their relationship among each other.



Figure 14.2 The six core interfaces in the collection framework

Figure shows two interface hierarchies. The Collection interface is extended to define specialized interfaces List and Set. Set is further extended to defineSortedSet interface. The Map interface is independent of Collection interface. It is extended to define SortedMap interface. Let us quickly review these six interfaces and their purpose.

java.util.Collection

This is the generalized interface for maintaining collections. It describes the generic behavior for adding, removing elements in the collection. This interface is the super interface for more specialized interfaces such as Set and List. No concrete top-level class in java.util directly implements this interface.

java.util.Set

The Set interface extends the Collection interface. Mathematically a set is a group of all unique elements. Similarly, the Set interface describes a collection that has all unique elements. In other words, no duplicate elements are allowed in a collection that implements this interface.

java.util.List

The List interface also extends Collection interface. Unlike set, a list defines a collection that need not have all unique elements. In other words, there can be duplicate elements (that means same object can occur multiple times) in a list. Also the List interface describes a collection that can maintain order in which the elements were inserted. In simple words, a list is an ordered collection that can have duplicate elements.

java.util.SortedSet

The SortedSet interface extends from the Set interface to add the sorting functionality. This interface describes behavior of collection in which no duplicates are allowed and elements are kept in sorted order. For instance, if you are storing strings in a collection that implements SortedSet, they are stored in alphabetically ascending order.

java.util.Map

A Map describes behavior of collection of elements in which each element is identifiable by unique key. In other words, it describes a group of mappings in the form of key-and-value pairs. Conceptually a Map is not a collection as it stores the mappings (or associations) instead of actual elements. For that reason, this interface is not extended from the Collection interface.

java.util.SortedMap

A SortedMap describes behavior of collection of key-and-value pairs where these pairs are sorted as per the key. This interface extends from the Map interface.

Collections and hashcode() method

While managing the objects in collections, a collection often compares the objects. This usually done with equals() method. Therefore, if you wish that the logically similar objects of your class should behave properly while stored in collection, it is sometimes necessary to override the equals() method in your class.

In such case equals() method should be overridden such that it will do "Logical Equality" test between two objects.

You must override hashCode in every class that overrides equals(). Equal objects must have equal hash codes. In other words, If two objects are equal according to the equals(Object)method, then calling the hashCode method on each of the two objects must produce the same integer result.


The hashCode() method

The hashCode() method defined in Object class. This method returns a hashcode value for the object. The hash code value is an integer value which is (somehow) derived from the object. The default implementation in Object class provides a unique integer for each object. Overriding this method is important in the context of collection framework. This especially true with the collections that use the hash code of an object to decide where (at what position) that object is stored. These types of collections use hash table data structure and use the hashing algorithm to store and retrieve the elements. Let us quickly review how hashing algorithm works before we learn more about the hashCode() method.


The Hashing algorithm


The ability to quickly search an element in collection of many elements is very important characteristic for collections. The hashing algorithm work keeping essentially this thing in mind. For quick search it is very important to see how the elements are stored in the collection. When elements are stored in the order they are received, you need to search an element sequentially going through all elements one-by-one until the desired one is found. But hashing uses a different approach, the elements are stored sequentially. Let us imagine the storage units as buckets and we want to store flowers in those buckets. When you want to store a flower, you need to do two things.

1. Calculate in which bucket the flower will be stored. In other words, find the position of an flower (using some hashing function)

2. Store the flower in the bucket at that position.

For instance, if you wish to store the flowers “Daisy” “Sunflower” “Orchids” in that collection of buckets. For simplicity, assume that the hash function is as simple as the length of the flower names. Therefore “Daisy” will be stored in the fifth bucket, “Sunflower” at ninth bucket and “Orchid” at sixth bucket. Figure 14.4 illustrates the working of simple hash algorithm. First it shows how the hash code if calculated using the length of flower name.



Figure 14.4 A simple hashing algorithm showing how flowers are placed in empty buckets based on their hashcode

The figure also shows the empty buckets and how the hash code is then used to place flowers in a particular bucket.

The obvious advantage of storing elements like this is while searching. For instance, you want to search “Rose” in this collection. The hashing algorithm again does two things:

1. Calculate the position of “Rose” using a hashing function. It will be 4 in this case.

2. See if the flower in the bucket at that position is indeed a “Rose”.

Thus the searching is performed very quickly. If you had stored the flowers sequentially, you would need to compare all elements in that collection before finally declaring that “Rose” is not in that collection.

Another valid concern you may have is what if two flowers have the same position as per the hash function. For example, if we want to store say “Lotus” in the same collection, it would have same position as “Daisy” because our hash function calculates the position by calculating length. This specific situation is perfectly valid and is called as ‘Collision’ in hashing algorithm. Various algorithms have different ways to tackle the collisions such as storing the object in immediately next bucket etc. We need not go into those details as you need to know the hashing algorithm just enough to understand the concept behind hashing. The position calculated in this example is the hash code for flower object. Remember that a good hashing algorithm produces a hash code for objects in such a way that they are evenly distributed across the buckets in the collection.


The hashCode() method and classes in collection framework


Similar to the example of flowers being stored in buckets, objects can be stored in collections that uses hashing. The hash codes of objects are used to decide their position in such collections. In other words, when you put an object in a collection, the collection uses the hash code of that object to decide in which bucket (position in collection) the object should land. Similarly when you want to retrieve that object, you need give the collection a reference to an object to be searched. Now the object you’re trying to search can be found only when some object in the collection has the same hash code as the object you’re searching. This is precisely why when two objects are equal (according to the equals() method) then their hash code must be equal. If not then the object can never be found because you cannot locate the correct bucket to look for it.

The classes in collection framework that uses hashing use the hashCode() method of an object to get that object’s hash code. Since the default implementation ofhashCode() in Object class always produce a unique number, this method is not useful (efficient) for hashing as it will store two equal objects in two separate buckets even if they should be stored in a single bucket. Therefore whenever you override equals() method (so that you can state what makes two object equal), you should override the hashCode method as well so that it gives same hash code value for two equal objects (even if they are two distinct objects in memory).


Benefits and constraints of using different data structures


Array

Benefits

• Data access is fast.

• Good for ordered data, which is not changed or searched frequently.

Constraints

• Inefficient if number of elements grow.

• Inefficient if an element to be inserted in middle of collection.

• Provides no special search mechanism.


Linked List

Benefits

• Allows efficient inserts/delete at any location

• Allows arbitrary growth of collection.

• Applying order to the elements is easy.

Constraints

• Slower while accessing elements by index.

• No special search mechanism is provided.



Tree

Benefits

• Easy addition/deletion in middle.

• Allows arbitrary growth.

• A better and efficient search mechanism.


Constraints

• Ordering is peculiar and some comparison mechanism is required.

• Searching is not efficient for unevenly distributed data.



Hashtable

Benefits

• Efficient searching.

• Good access mechanism.

• Allows arbitrary growth of collection.


Constraints

• Not good for small data set because of overheads.

• Overhead for storing keys for the values in collection.

• Overhead of hashing scheme.


Few Important Interfaces which are very useful in Collection Framework

java.lang
Interface Comparable

This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to as the class's natural ordering, and the class'scompareTo method is referred to as its natural comparison method.

Lists (and arrays) of objects that implement this interface can be sorted automatically by Collections.sort (and Arrays.sort). Objects that implement this interface can be used as keys in a sorted map or elements in a sorted set, without the need to specify a comparator.

The natural ordering for a class C is said to be consistent with equals if and only if (e1.compareTo((Object)e2) == 0) has the same boolean value ase1.equals((Object)e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null)NullPointerException even though e.equals(null) returns false. should throw a

Method Detail :

public int compareTo(Object o)

Compares this object with the specified object for order. Returns a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.

In the foregoing description, the notation sgn(expression) designates the mathematical signum function, which is defined to return one of -1, 0, or 1according to whether the value of expression is negative, zero or positive. The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y. (This implies that x.compareTo(y) must throw an exception iff y.compareTo(x) throws an exception.)

The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.

Finally, the implementer must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.

It is strongly recommended, but not strictly required that (x.compareTo(y)==0) == (x.equals(y)). Generally speaking, any class that implements theComparable interface and violates this condition should clearly indicate this fact. The recommended language is "Note: this class has a natural ordering that is inconsistent with equals."

Parameters:

o - the Object to be compared.

Returns:

a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.

Throws:

ClassCastException - if the specified object's type prevents it from being compared to this Object.



java.util
Interface Comparator

A comparison function, which imposes a total ordering on some collection of objects. Comparators can be passed to a sort method (such as Collections.sort) to allow precise control over the sort order. Comparators can also be used to control the order of certain data structures (such as TreeSet or TreeMap).

The ordering imposed by a Comparator c on a set of elements S is said to be consistent with equals if and only if (compare((Object)e1, (Object)e2)==0)has the same boolean value as e1.equals((Object)e2) for every e1 and e2 in S.

Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map). Suppose a sorted set (or sorted map) with an explicit Comparator c is used with elements (or keys) drawn from a set S. If the ordering imposed by c on S is inconsistent with equals, the sorted set (or sorted map) will behave "strangely." In particular the sorted set (or sorted map) will violate the general contract for set (or map), which is defined in terms of equals.

Method Detail :

public int compare(Object o1, Object o2)

Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.

The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y. (This implies that compare(x, y) must throw an exception if and only if compare(y, x) throws an exception.)

The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.

Finally, the implementer must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z.

It is generally the case, but not strictly required that (compare(x, y)==0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is "Note: this comparator imposes orderings that are inconsistent with equals."

Parameters:

Returns:

Throws:

ClassCastException - if the arguments' types prevent them from being compared by this Comparator.

a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.

o1 - the first object to be compared.

o2 - the second object to be compared.

public boolean equals(Object obj)

Indicates whether some other object is "equal to" this Comparator. This method must obey the general contract of Object.equals(Object). Additionally, this method can return true only if the specified Object is also a comparator and it imposes the same ordering as this comparator. Thus,comp1.equals(comp2) implies that sgn(comp1.compare(o1, o2))==sgn(comp2.compare(o1, o2)) for every object reference o1 and o2.

Note that it is always safe not to override Object.equals(Object). However, overriding this method may, in some cases, improve performance by allowing programs to determine that two distinct Comparators impose the same order.

Overrides:

equals in class Object



Parameters:

Returns:

true only if the specified object is also a comparator and it imposes the same ordering as this comparator.

obj - the reference object with which to compare.


Utility Classes for Collection Framework

java.util
Class Collections


public class Collections
extends Object

This class consists exclusively of static methods that operate on or return collections. It contains polymorphic algorithms that operate on collections, "wrappers", which return a new collection backed by a specified collection, and a few other odds and ends.

The methods of this class all throw a NullPointerException if the collections provided to them are null.

The documentation for the polymorphic algorithms contained in this class generally includes a brief description of the implementation. Such descriptions should be regarded as implementation notes, rather than parts of the specification. Implementors should feel free to substitute other algorithms, so long as the specification itself is adhered to. (For example, the algorithm used by sort does not have to be a mergesort, but it does have to be stable.)

The "destructive" algorithms contained in this class, that is, the algorithms that modify the collection on which they operate, are specified to throwUnsupportedOperationException if the collection does not support the appropriate mutation primitive(s), such as the set method. These algorithms may, but are not required to, throw this exception if an invocation would have no effect on the collection. For example, invoking the sort method on an unmodifiable list that is already sorted may or may not throw UnsupportedOperationException.

This class is a member of the Java Collections Framework.



java.util
Class Arrays

public class Arrays
extends Object

This class contains various methods for manipulating arrays (such as sorting and searching). This class also contains a static factory that allows arrays to be viewed as lists.

The methods in this class all throw a NullPointerException if the specified array reference is null.

The documentation for the methods contained in this class includes briefs description of the implementations. Such descriptions should be regarded asimplementation notes, rather than parts of the specification. Implementors should feel free to substitute other algorithms, so long as the specification itself is adhered to. (For example, the algorithm used by sort(Object[]) does not have to be a mergesort, but it does have to be stable.)

This class is a member of the Java Collections Framework.

No comments:

Post a Comment