StudentsEducators

Transformer Self-Attention Scaling

In Transformer-Architekturen spielt die Self-Attention eine zentrale Rolle, um die Beziehungen zwischen verschiedenen Eingabeworten zu erfassen. Um die Berechnung der Aufmerksamkeitswerte zu stabilisieren und zu verbessern, wird ein Scaling-Mechanismus verwendet. Dieser besteht darin, die Dot-Products der Query- und Key-Vektoren durch die Quadratwurzel der Dimension dkd_kdk​ der Key-Vektoren zu teilen, was mathematisch wie folgt dargestellt wird:

Scaled Attention=QKTdk\text{Scaled Attention} = \frac{QK^T}{\sqrt{d_k}}Scaled Attention=dk​​QKT​

Hierbei sind QQQ die Query-Vektoren und KKK die Key-Vektoren. Durch diese Skalierung wird sichergestellt, dass die Werte für die Softmax-Funktion nicht zu extrem werden, was zu einer besseren Differenzierung zwischen den Aufmerksamkeitsgewichten führt. Dies trägt dazu bei, das Problem der Gradientenexplosion zu vermeiden und ermöglicht eine stabilere und effektivere Trainingsdynamik im Modell. In der Praxis führt das Scaling zu einer besseren Leistung und schnelleren Konvergenz beim Training von Transformer-Modellen.

Other related terms

contact us

Let's get started

Start your personalized study experience with acemate today. Sign up for free and find summaries and mock exams for your university.

logoTurn your courses into an interactive learning experience.
Antong Yin

Antong Yin

Co-Founder & CEO

Jan Tiegges

Jan Tiegges

Co-Founder & CTO

Paul Herman

Paul Herman

Co-Founder & CPO

© 2025 acemate UG (haftungsbeschränkt)  |   Terms and Conditions  |   Privacy Policy  |   Imprint  |   Careers   |  
iconlogo
Log in

Hotelling’S Law

Hotelling's Law is a principle in economics that explains how competing firms tend to locate themselves in close proximity to each other in a given market. This phenomenon occurs because businesses aim to maximize their market share by positioning themselves where they can attract the largest number of customers. For example, if two ice cream vendors set up their stalls at opposite ends of a beach, they would each capture a portion of the customers. However, if one vendor moves closer to the other, they can capture more customers, leading the other vendor to follow suit. This results in both vendors clustering together at a central location, minimizing the distance customers must travel, which can be expressed mathematically as:

Distance=1n∑i=1ndi\text{Distance} = \frac{1}{n} \sum_{i=1}^{n} d_iDistance=n1​i=1∑n​di​

where did_idi​ represents the distance each customer travels to the vendors. In essence, Hotelling's Law illustrates the balance between competition and consumer convenience, highlighting how spatial competition can lead to a concentration of firms in certain areas.

Nyquist Sampling Theorem

The Nyquist Sampling Theorem, named after Harry Nyquist, is a fundamental principle in signal processing and communications that establishes the conditions under which a continuous signal can be accurately reconstructed from its samples. The theorem states that in order to avoid aliasing and to perfectly reconstruct a band-limited signal, it must be sampled at a rate that is at least twice the maximum frequency present in the signal. This minimum sampling rate is referred to as the Nyquist rate.

Mathematically, if a signal contains no frequencies higher than fmaxf_{\text{max}}fmax​, it should be sampled at a rate fsf_sfs​ such that:

fs≥2fmaxf_s \geq 2 f_{\text{max}}fs​≥2fmax​

If the sampling rate is below this threshold, higher frequency components can misrepresent themselves as lower frequencies, leading to distortion known as aliasing. Therefore, adhering to the Nyquist Sampling Theorem is crucial for accurate digital representation and transmission of analog signals.

Cation Exchange Resins

Cation exchange resins are polymers that are used to remove positively charged ions (cations) from solutions, primarily in water treatment and purification processes. These resins contain functional groups that can exchange cations, such as sodium, calcium, and magnesium, with those present in the solution. The cation exchange process occurs when cations in the solution replace the cations attached to the resin, effectively purifying the water. The efficiency of this exchange can be affected by factors such as temperature, pH, and the concentration of competing ions.

In practical applications, cation exchange resins are crucial in processes like water softening, where hard water ions (like Ca²⁺ and Mg²⁺) are exchanged for sodium ions (Na⁺), thus reducing scale formation in plumbing and appliances. Additionally, these resins are utilized in various industries, including pharmaceuticals and food processing, to ensure the quality and safety of products by removing unwanted cations.

Three-Phase Rectifier

A three-phase rectifier is an electrical device that converts three-phase alternating current (AC) into direct current (DC). This type of rectifier utilizes multiple diodes (typically six) to effectively manage the conversion process, allowing it to take advantage of the continuous power flow inherent in three-phase systems. The main benefits of a three-phase rectifier include improved efficiency, reduced ripple voltage, and enhanced output stability compared to single-phase rectifiers.

In a three-phase rectifier circuit, the output voltage can be calculated using the formula:

VDC=33πVLV_{DC} = \frac{3 \sqrt{3}}{\pi} V_{L}VDC​=π33​​VL​

where VLV_{L}VL​ is the line-to-line voltage of the AC supply. This characteristic makes three-phase rectifiers particularly suitable for industrial applications where high power and reliability are essential.

Hyperbolic Discounting

Hyperbolic Discounting is a behavioral economic theory that describes how people value rewards and outcomes over time. Unlike the traditional exponential discounting model, which assumes that the value of future rewards decreases steadily over time, hyperbolic discounting suggests that individuals tend to prefer smaller, more immediate rewards over larger, delayed ones in a non-linear fashion. This leads to a preference reversal, where people may choose a smaller reward now over a larger reward later, but might later regret this choice as the delayed reward becomes more appealing as the time to receive it decreases.

Mathematically, hyperbolic discounting can be represented by the formula:

V(t)=V01+k⋅tV(t) = \frac{V_0}{1 + k \cdot t}V(t)=1+k⋅tV0​​

where V(t)V(t)V(t) is the present value of a reward at time ttt, V0V_0V0​ is the reward's value, and kkk is a discount rate. This model helps to explain why individuals often struggle with self-control, leading to procrastination and impulsive decision-making.

Transcendence Of Pi And E

The transcendence of the numbers π\piπ and eee refers to their property of not being the root of any non-zero polynomial equation with rational coefficients. This means that they cannot be expressed as solutions to algebraic equations like axn+bxn−1+...+k=0ax^n + bx^{n-1} + ... + k = 0axn+bxn−1+...+k=0, where a,b,...,ka, b, ..., ka,b,...,k are rational numbers. Both π\piπ and eee are classified as transcendental numbers, which places them in a special category of real numbers that also includes other numbers like eπe^{\pi}eπ and ln⁡(2)\ln(2)ln(2). The transcendence of these numbers has profound implications in mathematics, particularly in fields like geometry, calculus, and number theory, as it implies that certain constructions, such as squaring the circle or duplicating the cube using just a compass and straightedge, are impossible. Thus, the transcendence of π\piπ and eee not only highlights their unique properties but also serves to deepen our understanding of the limitations of classical geometric constructions.