Rectified Linear Unit (ReLU)
ReLU stands for rectified linear unit, and is a type of activation function. Mathematically, it is defined as y = max(0, x).

[Figure: plot of the ReLU function, which is zero for negative inputs and the identity line for positive inputs.]

ReLU is the most commonly used activation function in neural networks, especially in CNNs. If you are unsure which activation function to use in your network, ReLU is usually a good first choice.

How does ReLU compare?

ReLU is linear (identity) for all positive values, and zero for all negative values. This means that:

- It's cheap to compute, as there is no complicated math. The model can therefore take less time to train or run.
- It converges faster. Linearity means that the slope doesn't plateau, or "saturate," when x gets large, so it doesn't have the vanishing gradient problem suffered by other activation functions like sigmoid or tanh.
- It's sparsely activated. Since ReLU is zero for all negative inputs, it's likely for any given unit to not activate at all. This is often desirable.
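To make the definition and the points above concrete, here is a minimal NumPy sketch. The names relu and relu_grad are illustrative helpers, not part of any particular library; the example just shows the element-wise max(0, x) computation, the sparsity it produces, and the non-saturating gradient for positive inputs.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: y = max(0, x), applied element-wise."""
    return np.maximum(0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(x.dtype)

# Pre-activations from a layer, some negative and some positive.
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

print(relu(z))                  # [0.  0.  0.  0.5 2. ]

# Sparsity: every negative input maps to exactly zero, so a large
# fraction of units can be inactive for a given example.
print(np.mean(relu(z) == 0))    # 0.6 -> 60% of units are "off"

# The slope is constant (1) for positive x, so it never saturates
# the way sigmoid or tanh do for large inputs.
print(relu_grad(z))             # [0. 0. 0. 1. 1.]
```

Running the sketch on a small batch of pre-activations makes the "cheap to compute" and "sparsely activated" claims easy to see: the forward pass is a single element-wise maximum, and most of the outputs here are exactly zero.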