The following were my learnings from the video linked below.
I had encountered entropy in information theory (for encoding bits efficiently) and in building decision trees from training data (supervised learning).
I had never connected entropy with a probability distribution. To some extent I could recall the relation between Huffman encoding and probability, but never its relation to a change in distribution.
I had mostly considered a probability distribution to be static, but in reality it is dynamic except in rare cases.
Cross entropy is the entropy measured when encoding data from the actual distribution using the predicted distribution, and KL divergence is the extra entropy incurred because the predicted distribution differs from the actual one.
Cross Entropy = Entropy + KL Divergence, i.e. H(p, q) = H(p) + D_KL(p || q)
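A minimal sketch of my own (not from the video), with made-up example distributions, to check the relation numerically: entropy of the actual distribution p, cross entropy against the predicted distribution q, and KL divergence, verifying H(p, q) = H(p) + D_KL(p || q).

```python
import math

def entropy(p):
    # H(p) = -sum p(x) * log2 p(x)
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum p(x) * log2 q(x)
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    # D_KL(p || q) = sum p(x) * log2(p(x) / q(x))
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]   # actual distribution (example values)
q = [0.4, 0.4, 0.2]     # predicted distribution (example values)

h = entropy(p)
ce = cross_entropy(p, q)
kl = kl_divergence(p, q)

print(f"H(p)        = {h:.4f} bits")
print(f"H(p, q)     = {ce:.4f} bits")
print(f"D_KL(p||q)  = {kl:.4f} bits")
print(f"H(p) + D_KL = {h + kl:.4f} bits")  # matches H(p, q)
```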
https://www.youtube.com/watch?v=ErfnhcEV1O8