Rectified Linear Unit (ReLU)

ReLU stands for rectified linear unit and is a type of activation function. Mathematically, it is defined as y = max(0, x). Its graph is flat at zero for all negative x and rises as a straight line with slope 1 for all positive x.
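
As a minimal sketch of this definition in plain Python (the function name relu is chosen here for illustration and is not taken from any particular library):

```python
def relu(x: float) -> float:
    """Return x for positive inputs and 0 otherwise: y = max(0, x)."""
    return max(0.0, x)


# Evaluate the function on a few sample inputs.
samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu(x) for x in samples]
print(outputs)  # [0.0, 0.0, 0.0, 0.5, 2.0]
```

Every negative sample maps to zero, and every positive sample passes through unchanged.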

ReLU is one of the most commonly used activation functions in neural networks, especially in CNNs. If you are unsure which activation function to use in your network, ReLU is usually a good first choice.

How does ReLU compare?

ReLU is linear (identity) for all positive values, and zero for all negative values. This means that:

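It’s cheap to compute: evaluating it takes only a comparison with zero, with no exponentials or divisions. The model can therefore take less time to train or run.
It tends to converge faster. For positive inputs the slope is a constant 1, so it doesn’t plateau, or “saturate,” when x gets large. This largely avoids the vanishing gradient problem suffered by saturating activation functions like sigmoid or tanh, although a unit whose input is negative passes back no gradient at all.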
It’s sparsely activated. Since ReLU is zero for all negative inputs, it’s likely for any given unit to not activate at all. This is often desirable (see below).
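
The saturation point in the list above can be checked numerically. The sketch below compares the slopes of ReLU, sigmoid, and tanh at a few positive inputs. The helper names are illustrative, and the ReLU slope at exactly x = 0 is set to 0 by convention, which is an assumption of this sketch.

```python
import math


def relu_grad(x: float) -> float:
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x: float) -> float:
    """Slope of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    """Slope of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


# Print the three slopes side by side for increasingly large inputs.
for x in [0.5, 2.0, 5.0, 10.0]:
    print(f"x={x:>4}: relu={relu_grad(x):.4f}  "
          f"sigmoid={sigmoid_grad(x):.4f}  tanh={tanh_grad(x):.4f}")
```

As x grows, the sigmoid and tanh slopes shrink toward zero while the ReLU slope stays at 1. For negative inputs the ReLU slope is 0, which is the same property that makes its activations sparse.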

Sparsity

Note: We are discussing model sparsity here. Data sparsity (missing information) is different and usually bad.

Why is sparsity good? It makes intuitive sense if we think about the biological neural network, which artificial ones try to imitate. While we have billions of neurons in our bodies, not all of them fire all the time for everything we do. Instead, they have different roles and are activated by different signals.

Sparsity results in concise models that often have better predictive power and less overfitting/noise. In a sparse network, it’s more likely that neurons are actually processing meaningful aspects of the problem. For example, in a model detecting cats in images, there may be a neuron that can identify ears, which obviously shouldn’t be activated if the image is about a building.

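Finally, a sparse network can be faster than a dense network, as there are fewer things to compute: a unit that outputs zero adds nothing to the next layer, so an implementation that skips zeros does less work.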
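
To make the idea of sparse activation concrete, here is a rough sketch that counts how many units in a single layer output zero. The layer is hypothetical: the weights are random and untrained, there is no bias, and the sizes are arbitrary. The fraction it prints says nothing about any trained model.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

n_inputs, n_units = 20, 1000

# Hypothetical layer: zero-mean Gaussian weights, no bias.
weights = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_units)]
x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]

# The pre-activation of each unit is the weighted sum of the inputs.
pre_activations = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
activations = [max(0.0, z) for z in pre_activations]

inactive_fraction = sum(1 for a in activations if a == 0.0) / n_units
print(f"fraction of inactive units: {inactive_fraction:.2f}")

# A later layer only needs the nonzero activations,
# since zeros add nothing to a weighted sum.
active = [(i, a) for i, a in enumerate(activations) if a > 0.0]
print(f"{len(active)} of {n_units} units contribute to the next layer")
```

With symmetric random weights, roughly half of the units end up inactive. Whether this turns into a real speedup depends on whether the software and hardware actually skip the zeros.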

In a neural network, the activation function is responsible for transforming the summed weighted input from the node into the activation of the node or output for that input.
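
A single node can be sketched as follows. The input values, weights, and bias are made up for illustration, and neuron_output is a hypothetical helper name rather than a library function.

```python
def relu(z: float) -> float:
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias):
    """Summed weighted input followed by the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)


# Illustrative values only.
inputs = [1.0, 2.0, 3.0]
print(neuron_output(inputs, weights=[0.5, -1.0, 0.5], bias=0.5))  # 0.5
print(neuron_output(inputs, weights=[0.5, -1.0, 0.0], bias=0.5))  # 0.0
```

In the first call the weighted sum plus bias is 0.5, so the node passes it through. In the second it is -1.0, so the node outputs zero.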

The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs the input directly if it is positive and outputs zero otherwise. It has become the default activation function for many types of neural networks because a model that uses it is easier to train and often achieves better performance.

The rectified linear unit (ReLU) activation function is one of the common activation functions used in artificial neural networks (ANNs). ReLU is a nonlinear function that returns the input value if it is positive and converts it to 0 if it is negative. Mathematically, ReLU is represented as follows:

ReLU(x) = max(0, x), where x is the input value.
ReLU is a piecewise linear function that is simple to compute, which makes it a computationally efficient activation function. It is also known to work well in deep neural networks and has proven itself in a wide range of applications, including image recognition, speech recognition, natural language processing, and others.
One of the main advantages of ReLU is that it helps mitigate the vanishing gradient problem that can occur with other activation functions such as the sigmoid and the hyperbolic tangent. The vanishing gradient problem occurs when the derivative of the activation function approaches zero, which makes it difficult to backpropagate gradients through the many layers of a deep neural network during training. ReLU largely avoids this problem because its derivative is exactly 1 for every positive input and 0 otherwise, which also simplifies gradient computation.
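
The effect of depth can be illustrated with some simple arithmetic. The sketch below is a deliberate simplification: it multiplies only the activation derivatives along one path and ignores the weights, which also scale the gradient in a real network. The function and constant names are illustrative.

```python
def gradient_scale(per_layer_derivative: float, n_layers: int) -> float:
    """Product of identical activation derivatives across n_layers (weights ignored)."""
    return per_layer_derivative ** n_layers


SIGMOID_MAX_DERIVATIVE = 0.25  # reached at x = 0; smaller everywhere else
RELU_ACTIVE_DERIVATIVE = 1.0   # for any positive input

# Compare the best case for sigmoid against an active ReLU path.
for n in [1, 5, 10, 20]:
    print(f"{n:>2} layers: sigmoid <= {gradient_scale(SIGMOID_MAX_DERIVATIVE, n):.2e}, "
          f"relu (active path) = {gradient_scale(RELU_ACTIVE_DERIVATIVE, n):.0f}")
```

Even in the best case for sigmoid, the factor shrinks geometrically with depth, while along a path of active ReLU units it stays at 1. A ReLU unit with a negative input contributes a factor of 0, so no gradient flows back through that particular unit.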

Shehab Artificial Intelligence (AI)