This tutorial is about Sets (java.util.Set interface). I assume that the audience have mathematical background of set theory.
Sets cannot contain duplicate elements and you cannot add elements at a certain position unlike List. The classes implementing the Set interface are:AbstractSet, HashSet, LinkedHashSet, TreeSet
Set has 3 general purpose implementations:
- HashSet
- TreeSet
- LinkedHashSet
HashSet does not offer any ordering of elements. So HashSet is not the right choice if you value oriented operations are required. For that, TreeSet is proffered. But it is true to say that HashSet is used more often than TreeSet because HashSet is more faster and the reason is obvious i.e no ordering of elements. LinkedHashSet is something in between HashSet and TreeSet. It is as fast as HashSet and is implements using HashSet and LinkedList. The benefit of LinkedHashSet is that it provides ordering feature but the cost of ordering is not even close to what provided by TreeSet.
HashSet
HashSet implement Cloneable, Collection, Serializable and Set interface.
Since HashSet implements Cloneable interface, it means that HashSet can use clone() method to make a copy (field to field) of its object. HashSet is also serializable, so it can be saved on disk and then retrieved later for use.
JobStateReasons and LinkedHashSet are direct subclasses of HashSet.
HashSet has 4 constructors:
HashSet()
HashSet(Collection c)
HashSet(int initialCapacity)
HashSet(int initialCapacity, float loadFactor)
If you use default HastSet constructor, it will create an empty HashSet of default initial capacity (16) and load factor (0.75).
Add() method of the HashSet add the object into the storage if it is not already present. Duplicates are not allowed as it’s a set.
Lets review the code below. We will just add unique values to a HashSet and will display the contents of HashSet.
Output:
As you know, HashSet does not allow duplicates, so lets try this in the following example.
Output:
We tried to add duplicates into the HastSet which is not allowed. Thing
to note is, that no exception will be raised in case of duplicates.
Also note the order of entries in HashSet. The order of elements is not
maintained.
Creating a HashSet through a Collection
HashSet(Collection c) constructor is used to store a collection as a HastSet. It is very useful in scenarios where you have some collection (it can be a List (e.g. Vector/ArrayList) and you want to store it into a HashSet. Lets review the following example where a HashSet is created using a Vector.
Writing/Retriving HashSet to disk
Since HashSet implements Serializable interface, its objects can be written and retrieved from the disk. In other words, we can save the state of a HashSet and can retrieve it when needed.
Union and Intersection Example
Following is an interesting example of union and intersection. I assume that you are familiar with basics of set theory and know what union and intersection are.
Output:
TreeSet
Treeset implements Cloneable, Collection, Serializable, Set and SortedSet interfaces. This class maintains order of the elements in it according to the natural order of the elements which was missing in the HashSet.
TreeSet is not synchronized so if a TreeSet is concurrently accessed by threads then thread modifying the contents must be synchronized externally.
TreeSet provides following constructors:
TreeSet()
TreeSet(Collection c)
TreeSet(Comparator c)
TreeSet(SortedSet s)
Lets dig deep into TreeSet using example. In the following example, we created an empty TreeSet using default constructor. Then we added some values into it.
Output:
We used iterator to print the contents of the TreeSet. Thing to note is
the order of storage. Order of storage is ascending one.
We cannot insert element at a particular index and also cannot fetch element from a particular index. This is done in List. Also duplicates are not allowed in TreeSet. This is a property of Sets. If you try to add duplicate value, no exception will be thrown. Lets do this with an example:
Output:
Output suggests two things. The objects are stored in sorted order and duplicates are not allowed.
Contents of TreeList can be cleared using clear() method. Following example creates a TreeSet from a Vector and then clears the contents.
Output:
Creating a TreeSet through a Collection
A TreeSet can be created using any collection. TreeSet(Collection c) is the constructor to use in such condition. Any collection for example: Vector, ArrayList, HashSet or any can be used to create a TreeSet. Following is an interesting example:
Output:
Vector is List that can contain duplicates and its not ordered. In the
example above, we created a TreeSet from a Vector that contained an
unordered list of objects along with duplicates. Some of you might thing
that its not possible to create a TreeSet from it but it is possible.
The output of the above example shows that a TreeList is created from
the Vector with all the duplicated removed. Furthermore, contents of
newly created TreeList are ordered.
Can we create a TreeList from an exiting HashSet? Since HashSet is also a collection, so answer is yes. The following example first creates a HashSet, populates it and then creates a TreeSet using already created HashSet.
Output:
TreeSet with generics
Java 5 introduced generics which adds more power to Java. TreeSet stores objects in it, but that object can be of any type. It can be a String, Integer, Float or even a custom object. With generics, you can specify the type of objects that are allowed to be stored in a TreeSet. In the following example, I tried to store an integer into a Vector that is bounded by String.
Output:
TreeSet Example
You now will have some understanding of TreeSet. Review the example below. It has some interesting lines of code.
Output:
As we already know, ordering is important in TreeList. This is shown in
the example below. We also see methods like first, last, headset,
tailSet which produced interesting results.
Method headSet() returns a view of the portion of this set whose elements are strictly less than toElement.
Method tailSet() returns a view of the portion of this set whose elements are greater than or equal to fromElement.
Differences between HashSet and TreeSet
Many developer are confused about sets and ask when to use HashSet and when to use TreeSet. Keep following points in mind for deciding the right set.
- HashSet is much faster than TreeSet.
- HashSet offers no ordering guarantees.
- TreeSet is preffered when performing operations in the SortedSet.
- HashSet can be tuned by specifying the right initial capacity using the constructor HashSet(int initialCapacity).
- HashSet can also be tuned used load factor parameter. If you choose right value for this, it can boost the performance a bit but there there won't be a big difference.
- TreeSet has no tuning parameters.
LinkedHashSet
LinkedHashSet handles collisions by using hash-bucket approach to avoid linear probing and rehashing. It uses doubly-linked list to maintain the insertion order. LinkedHashSet can accept the null entry and it is not synchronized.
LinkedHashSet implements Cloneable, Collection, Serializable and Set interface. It’s a set to no duplicates are allowed. LinkedHashSet has following 4 constructors:
LinkedHashSet()
LinkedHashSet(Collection c)
LinkedHashSet(int initialCapacity)
LinkedHashSet(int initialCapacity, float loadFactor
Thing will become obvious when you start using LinkedHashSet. Lets take an example to make things easier to understand.
Output:
We declared a LinkedHashSet and populated it. Then we used iterator to
print the contents of LinkedHashSet. The output suggests that the order
of insertion is maintained and duplicates are not allowed.
Set Example
Now we know what HashSet, TreeSet and LinkedHashSet are and when to use these. Lets review the following example that will use all of the 3 sets and will clear a lot of confusions:
Output:
The output suggests that no duplicate is added in any set. The order of
elements is all of the 3 sets is different. LinkedList maintained the
order of insertion. TreeSet maintained alphabetic order whereas HashSet
does not kept any order.
Sets cannot contain duplicate elements and you cannot add elements at a certain position unlike List. The classes implementing the Set interface are:AbstractSet, HashSet, LinkedHashSet, TreeSet
Set has 3 general purpose implementations:
- HashSet
- TreeSet
- LinkedHashSet
HashSet does not offer any ordering of elements. So HashSet is not the right choice if you value oriented operations are required. For that, TreeSet is proffered. But it is true to say that HashSet is used more often than TreeSet because HashSet is more faster and the reason is obvious i.e no ordering of elements. LinkedHashSet is something in between HashSet and TreeSet. It is as fast as HashSet and is implements using HashSet and LinkedList. The benefit of LinkedHashSet is that it provides ordering feature but the cost of ordering is not even close to what provided by TreeSet.
HashSet
HashSet implement Cloneable, Collection, Serializable and Set interface.
Since HashSet implements Cloneable interface, it means that HashSet can use clone() method to make a copy (field to field) of its object. HashSet is also serializable, so it can be saved on disk and then retrieved later for use.
JobStateReasons and LinkedHashSet are direct subclasses of HashSet.
HashSet has 4 constructors:
HashSet()
HashSet(Collection c)
HashSet(int initialCapacity)
HashSet(int initialCapacity, float loadFactor)
If you use default HastSet constructor, it will create an empty HashSet of default initial capacity (16) and load factor (0.75).
Add() method of the HashSet add the object into the storage if it is not already present. Duplicates are not allowed as it’s a set.
Lets review the code below. We will just add unique values to a HashSet and will display the contents of HashSet.
Java Code:
String [] strArray = new String[ 10 ]; for ( int i= 0 ;i<strArray.length;i++) strArray[i] = "Element" + i; HashSet s = new HashSet(); for ( int i= 0 ; i<strArray.length;i++){ if (!s.add(strArray[i])) System.out.println( "Duplicate found : " + strArray[i]); } System.out.println(s.size() + " distinct words detected : " + s ); } |
Java Code:
10 distinct words detected : [Element6, Element7, Element1, Element9, Element3, Element8, Element2, Element0, Element5, Element4] |
Java Code:
String [] strArray = new String[ 10 ]; for ( int i= 0 ;i<strArray.length;i++) strArray[i] = "Element" + i; strArray[ 1 ] = "Element2" ; strArray[ 7 ] = "Element3" ; strArray[ 8 ] = "Element4" ; HashSet s = new HashSet(); for ( int i= 0 ; i<strArray.length;i++){ if (!s.add(strArray[i])) System.out.println( "Duplicate found : " + strArray[i]); } System.out.println(s.size() + " distinct words detected : " + s ); } |
Java Code:
Duplicate found : Element2 Duplicate found : Element3 Duplicate found : Element4 7 distinct words detected : [Element6, Element9, Element3, Element2, Element0, Element5, Element4] |
Creating a HashSet through a Collection
HashSet(Collection c) constructor is used to store a collection as a HastSet. It is very useful in scenarios where you have some collection (it can be a List (e.g. Vector/ArrayList) and you want to store it into a HashSet. Lets review the following example where a HashSet is created using a Vector.
Java Code:
Vector vec = new Vector(); vec.add( "String1" ); vec.add( "String2" ); vec.add( "String3" ); HashSet hs = new HashSet(vec); |
Since HashSet implements Serializable interface, its objects can be written and retrieved from the disk. In other words, we can save the state of a HashSet and can retrieve it when needed.
Java Code:
HashSet hs = new HashSet(); hs.add( "String1" ); hs.add( "String2" ); hs.add( "String3" ); FileOutputStream f_out = new FileOutputStream( "C:\\hashset.data" ); ObjectOutputStream obj_out = new ObjectOutputStream (f_out); obj_out.writeObject (hs); System.out.println( "HashSet written on the disk." ); FileInputStream f_in = new FileInputStream( "C:\\hashset.data" ); ObjectInputStream obj_in = new ObjectInputStream (f_in); HashSet <String>hashset = new HashSet<String>((HashSet)obj_in.readObject()); System.out.println( "HashSet fetched from the disk." ); Iterator it = hashset.iterator(); while (it.hasNext()) System.out.println(it.next().toString()); |
Following is an interesting example of union and intersection. I assume that you are familiar with basics of set theory and know what union and intersection are.
Java Code:
public static void main(String[] args) { Set s1 = new HashSet(); s1.add( "Australia" ); s1.add( "Sweden" ); s1.add( "Germany" ); Set s2 = new HashSet(); s2.add( "Sweden" ); s2.add( "France" ); Set union = new TreeSet(s1); union.addAll(s2); print( "union" , union); Set intersect = new TreeSet(s1); intersect.retainAll(s2); print( "intersection" , intersect); } protected static void print(String label, Collection c) { System.out.println( "--------------" + label + "--------------" ); Iterator it = c.iterator(); while (it.hasNext()) { System.out.println(it.next()); } } |
Java Code:
--------------union-------------- Australia France Germany Sweden --------------intersection-------------- Sweden |
Treeset implements Cloneable, Collection, Serializable, Set and SortedSet interfaces. This class maintains order of the elements in it according to the natural order of the elements which was missing in the HashSet.
TreeSet is not synchronized so if a TreeSet is concurrently accessed by threads then thread modifying the contents must be synchronized externally.
TreeSet provides following constructors:
TreeSet()
TreeSet(Collection c)
TreeSet(Comparator c)
TreeSet(SortedSet s)
Lets dig deep into TreeSet using example. In the following example, we created an empty TreeSet using default constructor. Then we added some values into it.
Java Code:
TreeSet ts = new TreeSet(); ts.add( "String1" ); ts.add( "String3" ); ts.add( "String2" ); Iterator it = ts.iterator(); while (it.hasNext()) System.out.println(it.next().toString()); |
Java Code:
String1 String2 String3 |
We cannot insert element at a particular index and also cannot fetch element from a particular index. This is done in List. Also duplicates are not allowed in TreeSet. This is a property of Sets. If you try to add duplicate value, no exception will be thrown. Lets do this with an example:
Java Code:
TreeSet ts = new TreeSet(); ts.add( "Australia" ); ts.add( "Germany" ); ts.add( "Zambia" ); ts.add( "Australia" ); ts.add( "Zambia" ); Iterator it = ts.iterator(); while (it.hasNext()) System.out.println(it.next().toString()); |
Java Code:
Australia Germany Zambia |
Contents of TreeList can be cleared using clear() method. Following example creates a TreeSet from a Vector and then clears the contents.
Java Code:
Vector vec = new Vector(); vec.add( "String3" ); vec.add( "String1" ); vec.add( "String3" ); TreeSet ts = new TreeSet(vec); Iterator it = ts.iterator(); while (it.hasNext()) System.out.println( "TreeSet: " + it.next().toString()); ts.clear(); while (it.hasNext()) System.out.println( "TreeSet_Cleared: " + it.next().toString()); |
Java Code:
TreeSet: String1 TreeSet: String3 |
A TreeSet can be created using any collection. TreeSet(Collection c) is the constructor to use in such condition. Any collection for example: Vector, ArrayList, HashSet or any can be used to create a TreeSet. Following is an interesting example:
Java Code:
Vector vec = new Vector(); vec.add( "String3" ); vec.add( "String1" ); vec.add( "String3" ); Iterator it = vec.iterator(); while (it.hasNext()) System.out.println( "Vector: " + it.next().toString()); TreeSet ts = new TreeSet(vec); it = ts.iterator(); while (it.hasNext()) System.out.println( "TreeSet: " + it.next().toString()); |
Java Code:
Vector: String3 Vector: String1 Vector: String3 TreeSet: String1 TreeSet: String3 |
Can we create a TreeList from an exiting HashSet? Since HashSet is also a collection, so answer is yes. The following example first creates a HashSet, populates it and then creates a TreeSet using already created HashSet.
Java Code:
HashSet hs = new HashSet(); hs.add( "String2" ); hs.add( "String3" ); hs.add( "String1" ); hs.add( "String2" ); TreeSet ts = new TreeSet(hs); Iterator it = ts.iterator(); while (it.hasNext()) System.out.println( "TreeSet: " + it.next().toString()); |
Java Code:
TreeSet: String1 TreeSet: String2 TreeSet: String3 |
Java 5 introduced generics which adds more power to Java. TreeSet stores objects in it, but that object can be of any type. It can be a String, Integer, Float or even a custom object. With generics, you can specify the type of objects that are allowed to be stored in a TreeSet. In the following example, I tried to store an integer into a Vector that is bounded by String.
Java Code:
TreeSet <Integer>ts = new TreeSet<Integer>(); ts.add( 1 ); ts.add( 2 ); ts.add( 3 ); ts.add( "String2" ); |
Java Code:
1
| The method add(Integer) in the type TreeSet<Integer> is not applicable for the arguments (String) |
You now will have some understanding of TreeSet. Review the example below. It has some interesting lines of code.
Java Code:
SortedSet sortedSet = new TreeSet(Arrays. asList( "one two three four five six seven eight" .split( " " ))); System.out.println(sortedSet); Object low = sortedSet.first(); Object high = sortedSet.last(); System.out.println(low); System.out.println(high); Iterator it = sortedSet.iterator(); System.out.println( "low: " + low); System.out.println( "high: " + high); System.out.println(sortedSet.subSet(low, high)); System.out.println(sortedSet.headSet(high)); System.out.println(sortedSet.headSet(low)); System.out.println(sortedSet.tailSet(low)); System.out.println(sortedSet.tailSet(high)); |
Java Code:
[eight, five, four, one, seven, six, three, two] eight two low: eight high: two [eight, five, four, one, seven, six, three] [eight, five, four, one, seven, six, three] [] [eight, five, four, one, seven, six, three, two] [two] |
Method headSet() returns a view of the portion of this set whose elements are strictly less than toElement.
Method tailSet() returns a view of the portion of this set whose elements are greater than or equal to fromElement.
Differences between HashSet and TreeSet
Many developer are confused about sets and ask when to use HashSet and when to use TreeSet. Keep following points in mind for deciding the right set.
- HashSet is much faster than TreeSet.
- HashSet offers no ordering guarantees.
- TreeSet is preffered when performing operations in the SortedSet.
- HashSet can be tuned by specifying the right initial capacity using the constructor HashSet(int initialCapacity).
- HashSet can also be tuned used load factor parameter. If you choose right value for this, it can boost the performance a bit but there there won't be a big difference.
- TreeSet has no tuning parameters.
LinkedHashSet
LinkedHashSet handles collisions by using hash-bucket approach to avoid linear probing and rehashing. It uses doubly-linked list to maintain the insertion order. LinkedHashSet can accept the null entry and it is not synchronized.
LinkedHashSet implements Cloneable, Collection, Serializable and Set interface. It’s a set to no duplicates are allowed. LinkedHashSet has following 4 constructors:
LinkedHashSet()
LinkedHashSet(Collection c)
LinkedHashSet(int initialCapacity)
LinkedHashSet(int initialCapacity, float loadFactor
Thing will become obvious when you start using LinkedHashSet. Lets take an example to make things easier to understand.
Java Code:
LinkedHashSet lhs = new LinkedHashSet(); lhs.add( "String1" ); lhs.add( "String9" ); lhs.add( "String7" ); lhs.add( "String6" ); lhs.add( "String7" ); Iterator it = lhs.iterator(); while (it.hasNext()) System.out.println( "LinkedHashSet: " + it.next().toString()); |
Java Code:
LinkedHashSet: String1 LinkedHashSet: String9 LinkedHashSet: String7 LinkedHashSet: String6 |
Set Example
Now we know what HashSet, TreeSet and LinkedHashSet are and when to use these. Lets review the following example that will use all of the 3 sets and will clear a lot of confusions:
Java Code:
package set; import java.util.Arrays; import java.util.HashSet; import java.util.LinkedHashSet; import java.util.Set; import java.util.TreeSet; public class HastSetTest { static void fillSet(Set set) { set.addAll(Arrays.asList( "Zambia Australia Germany France" .split( " " ))); } public static void testSet(Set set) { System.out.println(set.getClass().getName().replaceAll( "\\w+\\." , "" )); fillSet(set); fillSet(set); fillSet(set); System.out.println(set); set.addAll(set); set.add( "Germany" ); set.add( "Germany" ); System.out.println(set); System.out.println( "s.contains(\"Zambia\"): " + set.contains( "Zambia" )); } public static void main(String[] args) { testSet( new HashSet()); testSet( new TreeSet()); testSet( new LinkedHashSet()); } } |
Java Code:
HashSet [Germany, France, Zambia, Australia] [Germany, France, Zambia, Australia] s.contains( "Zambia" ): true TreeSet [Australia, France, Germany, Zambia] [Australia, France, Germany, Zambia] s.contains( "Zambia" ): true LinkedHashSet [Zambia, Australia, Germany, France] [Zambia, Australia, Germany, France] s.contains( "Zambia" ): true |
nice tutorials
ReplyDelete