As you can see, each key is mapped to a bucket of the hash table by the hash function h(key) = key % table_size, and keys that hash to the same bucket are kept together in a linked list.
For example, with a table of size 6, the keys 7 and 13 both have a hash value of 1 (7 % 6 = 1 and 13 % 6 = 1), so they are stored in the same bucket.
The keys 9 and 15 both have a hash value of 3, so they also share a bucket, while the keys 5 and 8 hash to 5 and 2 respectively and occupy buckets of their own.
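The mapping can be sketched directly in code. This is a minimal illustration; the table size of 6 is chosen for the sketch:

```java
public class HashDemo {
    // h(key) = key % tableSize maps each key to a bucket index
    static int h(int key, int tableSize) {
        return key % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 6; // illustrative size, chosen for this sketch
        int[] keys = { 7, 13, 9, 15, 5, 8 };
        for (int key : keys) {
            System.out.println("h(" + key + ") = " + h(key, tableSize));
        }
    }
}
```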
Difference between processing using hash function and hashtable (using separate chaining)
Processing using hash function: In this approach, the keys are hashed with the hash function and stored directly in the table at the index given by their hash value, with no collision handling. This approach is simple and fast, but two or more keys can map to the same index — a collision — and a plain array has no way to hold both.
Hashtable (using separate chaining): In this approach, the keys are hashed with the same hash function, but each slot of the table holds a linked list, and every key is appended to the list at its index. Collisions still occur, but chaining resolves them: colliding keys simply share a list. The cost is extra pointer overhead and, when lists grow long, extra traversal time.
The differences between the two approaches:

Feature | Processing using hash function | Hashtable (using separate chaining)
------- | -------- | --------
Collisions | Can occur and go unresolved | Occur but are resolved by chaining
Complexity | Simple | More complex
Efficiency | Fast while collisions are rare | Slightly slower per access, but degrades gracefully
Hashing is a technique for storing and retrieving data in a way that is efficient and fast.
Hash tables are data structures that use hashing to store data.
Hash functions are functions that take a key as input and return an index into the hash table.
Collisions occur when two or more keys have the same hash value.
There are two main ways to deal with collisions:
Separate chaining: This approach stores all keys with the same hash value in a linked list.
Linear probing: This approach searches for the next available slot in the hash table when a collision occurs.
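The separate-chaining strategy can be sketched in code. The class name and interface below are illustrative, not from the text: each bucket is a linked list, and colliding keys are appended to the same list.

```java
import java.util.LinkedList;

public class ChainedHashTable {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    private int h(int key) { return key % buckets.length; }

    // Separate chaining: colliding keys simply share a bucket's list
    public void insert(int key) { buckets[h(key)].add(key); }

    public boolean contains(int key) { return buckets[h(key)].contains(key); }
}
```

Linear probing would instead keep one flat array and, on a collision, step to the next slot ((index + 1) % size) until a free one is found.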
Here are some of the advantages of using hashing data structures:
Speed: with a good hash function, insertion, lookup, and deletion run in O(1) time on average.
Space: the table stores only the keys themselves (plus modest bucket overhead), so hashing is space-efficient even for large data sets.
Here are some of the disadvantages of using hashing data structures:
Collisions: collisions can occur, and when many keys pile into the same bucket, performance degrades toward linear search.
Heterogeneous data: hashing works best when the keys are homogeneous and have a cheap, well-distributed hash; mixed key types with uneven distributions hash poorly.
Hash functions: a good hash function can be difficult to design, and a poor one is sensitive to the distribution of the input data.
Overall, hashing data structures are a powerful tool for storing and retrieving data. They are efficient, fast, and space-efficient. However, they can be sensitive to the distribution of the data and collisions can occur.
Huffman coding is a lossless data compression algorithm that assigns variable-length codes to input characters based on their frequency of occurrence.
The most frequent characters are assigned the shortest codes, and the least frequent characters are assigned the longest codes.
Huffman coding works by building a binary tree of the characters, where the leaves of the tree represent the characters and each internal node carries the sum of the frequencies of its children.
The codes for the characters are then generated by traversing the tree from the root to the leaf node for the corresponding character.
Huffman coding is a very efficient data compression algorithm, and it is often used to compress text files.
Here is an example of how Huffman coding works:
Let's say we have the following string of characters: "abcabc".
The frequencies of the characters in the string are: "a" = 2, "b" = 2, and "c" = 2.
The binary tree for the characters would be as follows (leaves show character:frequency, internal nodes show combined frequency):

     (6)
    /   \
  a:2   (4)
        /  \
      b:2  c:2

Reading 0 for each left edge and 1 for each right edge, the codes for the characters are: "a" = "0", "b" = "10", and "c" = "11".
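With these codes, the saving can be checked by hand: "abcabc" encodes as 0 10 11 0 10 11, so the total length is

```latex
\underbrace{2 \cdot 1}_{\text{a}} + \underbrace{2 \cdot 2}_{\text{b}} + \underbrace{2 \cdot 2}_{\text{c}} = 10 \text{ bits,}
\quad \text{versus} \quad 6 \cdot 8 = 48 \text{ bits uncompressed.}
```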
===============
How Huffman Coding works?
Suppose a 15-character string over the alphabet {A, B, C, D}, with the frequencies shown in the table below, is to be sent over a network.
Each character occupies 8 bits, so a total of 8 * 15 = 120 bits are required to send this string uncompressed.
Using the Huffman Coding technique, we can compress the string to a smaller size.
Huffman coding first creates a tree using the frequencies of the characters and then generates a code for each character.
Once the data is encoded, it has to be decoded. Decoding is done using the same tree.
Huffman Coding prevents any ambiguity in the decoding process through the concept of prefix codes, i.e., the code assigned to one character is never a prefix of the code of any other character. Because characters appear only at the leaves of the tree, the tree created above maintains this property.
Huffman coding is done with the help of the following steps:
1. Calculate the frequency of each character in the string.
2. Sort the characters in increasing order of frequency. These are stored in a priority queue Q.
3. Make each unique character a leaf node.
4. Create an empty node z. Assign the minimum-frequency node as the left child of z and the second-minimum as the right child of z. Set the value of z to the sum of the two frequencies.
5. Remove these two minimum frequencies from Q and add the sum into the list of frequencies (* denotes the internal nodes in the figure above).
6. Insert node z into the tree.
7. Repeat steps 4 to 6 until only one node remains in Q.
8. For each non-leaf node, assign 0 to the left edge and 1 to the right edge.
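For the frequencies used in the table below (A = 5, B = 1, C = 6, D = 3), the merge order can be worked out by hand:

```latex
\begin{aligned}
(B{:}1) + (D{:}3) &\;\to\; z_1 : 4 \\
(z_1{:}4) + (A{:}5) &\;\to\; z_2 : 9 \\
(C{:}6) + (z_2{:}9) &\;\to\; \text{root} : 15
\end{aligned}
```

Reading edges from the root (0 for left, 1 for right) gives C = 0, A = 11, B = 100, and D = 101, matching the codes in the table.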
For sending the above string over a network, we have to send the tree as well as the above compressed-code. The total size is given by the table below.
Character | Frequency | Code | Size
------- | -------- | ---- | ----
A | 5 | 11 | 5*2 = 10
B | 1 | 100 | 1*3 = 3
C | 6 | 0 | 6*1 = 6
D | 3 | 101 | 3*3 = 9
4 * 8 = 32 bits (characters) | 15 bits (frequencies) | | 28 bits (encoded string)
Without encoding, the total size of the string was 120 bits. After encoding, the size is reduced to 32 + 15 + 28 = 75 bits.
Decoding the code
For decoding the code, we take the encoded bits and traverse the tree from the root: a 0 moves to the left child, a 1 to the right child, and reaching a leaf emits that character.
Suppose 101 is to be decoded. We traverse from the root as in the figure below: 1 goes right, 0 goes left, 1 goes right, arriving at the leaf for D.
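Equivalently, decoding can be done with the code table alone: because of the prefix property, a greedy left-to-right match is unambiguous. A minimal sketch (the class and method names are illustrative):

```java
import java.util.Map;

public class HuffmanDecode {
    // Greedily match prefix codes; unambiguous because no code
    // is a prefix of any other code
    static String decode(String bits, Map<String, Character> codes) {
        StringBuilder out = new StringBuilder();
        StringBuilder cur = new StringBuilder();
        for (char b : bits.toCharArray()) {
            cur.append(b);
            Character c = codes.get(cur.toString());
            if (c != null) {
                out.append(c);
                cur.setLength(0);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Character> codes =
            Map.of("11", 'A', "100", 'B', "0", 'C', "101", 'D');
        System.out.println(decode("101", codes)); // prints D
    }
}
```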
Java Example
// Huffman Coding in Java
import java.util.PriorityQueue;
import java.util.Comparator;

class HuffmanNode {
  int item;    // frequency
  char c;      // character; '-' marks an internal node
  HuffmanNode left, right;
}

// For comparing the nodes by frequency
class ImplementComparator implements Comparator<HuffmanNode> {
  public int compare(HuffmanNode x, HuffmanNode y) {
    return x.item - y.item;
  }
}

// Implementing the Huffman algorithm
public class Huffman {
  // 0 is appended on each left edge, 1 on each right edge;
  // leaves print their character and code
  public static void printCode(HuffmanNode root, String s) {
    if (root.left == null && root.right == null) {
      System.out.println(root.c + " | " + s);
      return;
    }
    printCode(root.left, s + "0");
    printCode(root.right, s + "1");
  }

  public static void main(String[] args) {
    char[] chars = { 'A', 'B', 'C', 'D' };
    int[] freqs = { 5, 1, 6, 3 };
    PriorityQueue<HuffmanNode> q = new PriorityQueue<>(4, new ImplementComparator());
    for (int i = 0; i < chars.length; i++) {
      HuffmanNode n = new HuffmanNode();
      n.c = chars[i];
      n.item = freqs[i];
      q.add(n);
    }
    // Repeatedly merge the two lowest-frequency nodes
    while (q.size() > 1) {
      HuffmanNode z = new HuffmanNode();
      z.left = q.poll();
      z.right = q.poll();
      z.item = z.left.item + z.right.item;
      z.c = '-';
      q.add(z);
    }
    printCode(q.peek(), "");
  }
}
===============
Dijkstra's Algorithm
Consider a weighted graph with vertices A through F (figure omitted). The weight of each edge is shown next to the edge. The goal is to find the shortest path from A to all other vertices.
Here are the steps on how to apply Dijkstra's algorithm on this graph:
Initialize the distance of all vertices to infinity except A, which is initialized to 0.
Vertex | Distance from A
------- | --------
A | 0
B | INF
C | INF
D | INF
E | INF
F | INF
Create a set of visited vertices and initialize it to be empty.
Visited vertices = {}
Add A to the visited vertices set.
Visited vertices = {A}
For each unvisited neighbor of A, calculate the tentative distance to that neighbor as the distance to A plus the weight of the edge between A and that neighbor.
Vertex | Distance from A
------- | --------
B | 2 (0 + 2)
C | 3 (0 + 3)
If the tentative distance is less than the current distance to that neighbor, update the current distance to that neighbor.
Vertex | Distance from A
------- | --------
B | 2
C | 3
Select the unvisited vertex with the smallest tentative distance — here B, at distance 2 — mark it visited, and repeat steps 4 and 5 for its unvisited neighbors: D's tentative distance becomes the distance to B plus the weight of the edge B–D.
Vertex | Distance from A
------- | --------
B | 2
C | 3
D | 4 (2 + 2)
Visited vertices = {A, B}
Repeat steps 3 to 7 until all vertices have been visited.
Vertex | Distance from A
------- | --------
B | 2
C | 3
D | 4
E | 5 (4 + 1)
F | 6 (4 + 2)
The shortest distance from A to each vertex is as follows:
Vertex | Distance from A
------- | --------
A | 0
B | 2
C | 3
D | 4
E | 5
F | 6
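The walkthrough above can be sketched in code. The edge list here is an assumption reconstructed from the distances in the tables (A–B 2, A–C 3, B–D 2, D–E 1, D–F 2), since the original figure is not shown:

```java
import java.util.Arrays;

public class DijkstraDemo {
    static final int INF = Integer.MAX_VALUE;

    // Plain O(V^2) Dijkstra over an adjacency matrix (0 = no edge)
    static int[] shortestDistances(int[][] w, int src) {
        int n = w.length;
        int[] dist = new int[n];
        boolean[] visited = new boolean[n];
        Arrays.fill(dist, INF);
        dist[src] = 0;
        for (int it = 0; it < n; it++) {
            // pick the unvisited vertex with the smallest tentative distance
            int u = -1;
            for (int v = 0; v < n; v++)
                if (!visited[v] && (u == -1 || dist[v] < dist[u])) u = v;
            if (dist[u] == INF) break;  // remaining vertices unreachable
            visited[u] = true;
            // relax u's neighbors
            for (int v = 0; v < n; v++)
                if (w[u][v] > 0 && dist[u] + w[u][v] < dist[v])
                    dist[v] = dist[u] + w[u][v];
        }
        return dist;
    }

    public static void main(String[] args) {
        // vertices A..F = 0..5; assumed edges: A-B 2, A-C 3, B-D 2, D-E 1, D-F 2
        int[][] w = new int[6][6];
        int[][] edges = { {0, 1, 2}, {0, 2, 3}, {1, 3, 2}, {3, 4, 1}, {3, 5, 2} };
        for (int[] e : edges) { w[e[0]][e[1]] = e[2]; w[e[1]][e[0]] = e[2]; }
        int[] d = shortestDistances(w, 0);
        for (int v = 0; v < 6; v++)
            System.out.println((char) ('A' + v) + " | " + d[v]);
    }
}
```

With these assumed edges the program reproduces the distances in the final table (A 0, B 2, C 3, D 4, E 5, F 6).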