Tweak programming environment

Tweak programming environment

Tweak is a graphical user interface (GUI) layer written by Andreas Raab for the Squeak development environment, which in turn is an integrated development environment based on the Smalltalk-80 computer programming language. Tweak is an alternative to an earlier graphic user interface layer called Morphic. Development began in 2001. Applications that use the Tweak software include Sophie (version 1), a multimedia and e-book authoring system, and a family of virtual world systems: Open Cobalt, Teleplace, OpenQwaq, 3d ICC's Immersive Terf and the Croquet Project. == Influences == An experimental version of Etoys, a programming environment for children, used Tweak instead of Morphic. Etoys was a major influence on a similar Squeak-based programming environment known as Scratch.

Conduit (company)

Conduit Ltd. is an international software company. From its founding in 2005 to 2013, its most well-known product was the Conduit toolbar, which was widely-described as malware. In 2013, it spun off its toolbar business; today, its main product is a mobile development platform that allows users to create native and web mobile applications for smartphones. == Products == From 2005 to 2013, the company's most well-known product was the Conduit toolbar, which is flagged by most antivirus software as potentially unwanted and adware. Conduit's toolbar software is often downloaded by malware packages from other publishers. The company spun off the toolbar division that manages the Conduit toolbar in 2013. Today, the company's main product is a mobile development platform that allows users to create native and web mobile applications for smartphones. App creation for its App Gallery is free, but it charges a monthly subscription fee to place apps on the App Store or Google Play. == History == Conduit was founded in 2005 by Shilo, Dror Erez, and Gaby Bilcyzk. Between years 2005 and 2013, it ran a successful but controversial toolbar platform business. Conduit was part of the so-called Download Valley companies monetizing free software and downloads by bundling adware. The toolbars were criticized by some as being very difficult to uninstall. The toolbar software was referred to as a "potentially unwanted program" by some in the computer industry because it could be used to change browser settings. The company had more than 400 employees in 2013. In September same year, Conduit spun off its entire website toolbar business division, which combined with Perion Network. After the deal, Conduit shareholders owned 81% of Perion's existing shares and both Perion and Conduit remained independent companies. The substantial size of the Conduit user base allowed Perion to immediately surpass AOL in U.S. searches. In 2015, Conduit announced it would purchase Keeprz, a mobile customer loyalty platform, for $45 million.

Optimal discriminant analysis and classification tree analysis

Optimal Discriminant Analysis (ODA) and the related classification tree analysis (CTA) are exact statistical methods that maximize predictive accuracy. For any specific sample and exploratory or confirmatory hypothesis, optimal discriminant analysis (ODA) identifies the statistical model that yields maximum predictive accuracy, assesses the exact Type I error rate, and evaluates potential cross-generalizability. Optimal discriminant analysis may be applied to > 0 dimensions, with the one-dimensional case being referred to as UniODA and the multidimensional case being referred to as MultiODA. Optimal discriminant analysis is an alternative to ANOVA (analysis of variance) and regression analysis.

Arabic Speech Corpus

The Arabic Speech Corpus is a Modern Standard Arabic (MSA) speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions of more than 3.7 hours of MSA speech aligned with recorded speech on the phoneme level. The annotations include word stress marks on the individual phonemes. The Arabic Speech Corpus was built as part of a doctoral project by Nawar Halabi at the University of Southampton funded by MicroLinkPC who own an exclusive license to commercialise the corpus, but the corpus is available for strictly non-commercial purposes through the official Arabic Speech Corpus website. It is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. == Purpose == The corpus was mainly built for speech synthesis purposes, specifically Speech Synthesis, but the corpus has been used for building HMM based voices in Arabic. It was also used to automatically align other speech corpora with their phonetic transcript and could be used as part of a larger corpus for training speech recognition systems. == Contents == The package contains the following: 1813 .wav files containing spoken utterances. 1813 .lab files containing text utterances. 1813 .TextGrid files containing the phoneme labels with time stamps of the boundaries where these occur in the .wav files. phonetic-transcript.txt which has the form "[wav_filename]" "[Phoneme Sequence]" in every line. orthographic-transcript.txt which has the form "[wav_filename]" "[Orthographic Transcript]" in every line. Orthography is in Buckwalter Format which is friendlier where there is software that does not read Arabic script. It can be easily converted back to Arabic. There is an extra 18 minutes of fully annotated corpus (separate from above but with the same structure as above) which was used to evaluated the corpus (see PhD thesis). The corpus was also used to prove that using automatically extracted, orthography-based stress marks improve the quality of speech synthesis in MSA.

Mathematics of neural networks in machine learning

An artificial neural network (ANN) or neural network combines biological principles with advanced statistics to solve problems in domains such as pattern recognition and game-play. ANNs adopt the basic model of neuron analogues connected to each other in a variety of ways. == Structure == === Neuron === A neuron with label j {\displaystyle j} receiving an input p j ( t ) {\displaystyle p_{j}(t)} from predecessor neurons consists of the following components: an activation a j ( t ) {\displaystyle a_{j}(t)} , the neuron's state, depending on a discrete time parameter, an optional threshold θ j {\displaystyle \theta _{j}} , which stays fixed unless changed by learning, an activation function f {\displaystyle f} that computes the new activation at a given time t + 1 {\displaystyle t+1} from a j ( t ) {\displaystyle a_{j}(t)} , θ j {\displaystyle \theta _{j}} and the net input p j ( t ) {\displaystyle p_{j}(t)} giving rise to the relation a j ( t + 1 ) = f ( a j ( t ) , p j ( t ) , θ j ) , {\displaystyle a_{j}(t+1)=f(a_{j}(t),p_{j}(t),\theta _{j}),} and an output function f out {\displaystyle f_{\text{out}}} computing the output from the activation o j ( t ) = f out ( a j ( t ) ) . {\displaystyle o_{j}(t)=f_{\text{out}}(a_{j}(t)).} Often the output function is simply the identity function. An input neuron has no predecessor but serves as input interface for the whole network. Similarly an output neuron has no successor and thus serves as output interface of the whole network. === Propagation function === The propagation function computes the input p j ( t ) {\displaystyle p_{j}(t)} to the neuron j {\displaystyle j} from the outputs o i ( t ) {\displaystyle o_{i}(t)} and typically has the form p j ( t ) = ∑ i o i ( t ) w i j . {\displaystyle p_{j}(t)=\sum _{i}o_{i}(t)w_{ij}.} === Bias === A bias term can be added, changing the form to the following: p j ( t ) = ∑ i o i ( t ) w i j + w 0 j , {\displaystyle p_{j}(t)=\sum _{i}o_{i}(t)w_{ij}+w_{0j},} where w 0 j {\displaystyle w_{0j}} is a bias. == Neural networks as functions == Neural network models can be viewed as defining a function that takes an input (observation) and produces an output (decision) f : X → Y {\displaystyle \textstyle f:X\rightarrow Y} or a distribution over X {\displaystyle \textstyle X} or both X {\displaystyle \textstyle X} and Y {\displaystyle \textstyle Y} . Sometimes models are intimately associated with a particular learning rule. A common use of the phrase "ANN model" is really the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons, number of layers or their connectivity). Mathematically, a neuron's network function f ( x ) {\displaystyle \textstyle f(x)} is defined as a composition of other functions g i ( x ) {\displaystyle \textstyle g_{i}(x)} , that can further be decomposed into other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between functions. A widely used type of composition is the nonlinear weighted sum, where f ( x ) = K ( ∑ i w i g i ( x ) ) {\displaystyle \textstyle f(x)=K\left(\sum _{i}w_{i}g_{i}(x)\right)} , where K {\displaystyle \textstyle K} (commonly referred to as the activation function) is some predefined function, such as the hyperbolic tangent, sigmoid function, softmax function, or rectifier function. The important characteristic of the activation function is that it provides a smooth transition as input values change, i.e. a small change in input produces a small change in output. The following refers to a collection of functions g i {\displaystyle \textstyle g_{i}} as a vector g = ( g 1 , g 2 , … , g n ) {\displaystyle \textstyle g=(g_{1},g_{2},\ldots ,g_{n})} . This figure depicts such a decomposition of f {\displaystyle \textstyle f} , with dependencies between variables indicated by arrows. These can be interpreted in two ways. The first view is the functional view: the input x {\displaystyle \textstyle x} is transformed into a 3-dimensional vector h {\displaystyle \textstyle h} , which is then transformed into a 2-dimensional vector g {\displaystyle \textstyle g} , which is finally transformed into f {\displaystyle \textstyle f} . This view is most commonly encountered in the context of optimization. The second view is the probabilistic view: the random variable F = f ( G ) {\displaystyle \textstyle F=f(G)} depends upon the random variable G = g ( H ) {\displaystyle \textstyle G=g(H)} , which depends upon H = h ( X ) {\displaystyle \textstyle H=h(X)} , which depends upon the random variable X {\displaystyle \textstyle X} . This view is most commonly encountered in the context of graphical models. The two views are largely equivalent. In either case, for this particular architecture, the components of individual layers are independent of each other (e.g., the components of g {\displaystyle \textstyle g} are independent of each other given their input h {\displaystyle \textstyle h} ). This naturally enables a degree of parallelism in the implementation. Networks such as the previous one are commonly called feedforward, because their graph is a directed acyclic graph. Networks with cycles are commonly called recurrent. Such networks are commonly depicted in the manner shown at the top of the figure, where f {\displaystyle \textstyle f} is shown as dependent upon itself. However, an implied temporal dependence is not shown. == Backpropagation == Backpropagation training algorithms fall into three categories: steepest descent (with variable learning rate and momentum, resilient backpropagation); quasi-Newton (Broyden–Fletcher–Goldfarb–Shanno, one step secant); Levenberg–Marquardt and conjugate gradient (Fletcher–Reeves update, Polak–Ribiére update, Powell–Beale restart, scaled conjugate gradient). === Algorithm === Let N {\displaystyle N} be a network with e {\displaystyle e} connections, m {\displaystyle m} inputs and n {\displaystyle n} outputs. Below, x 1 , x 2 , … {\displaystyle x_{1},x_{2},\dots } denote vectors in R m {\displaystyle \mathbb {R} ^{m}} , y 1 , y 2 , … {\displaystyle y_{1},y_{2},\dots } vectors in R n {\displaystyle \mathbb {R} ^{n}} , and w 0 , w 1 , w 2 , … {\displaystyle w_{0},w_{1},w_{2},\ldots } vectors in R e {\displaystyle \mathbb {R} ^{e}} . These are called inputs, outputs and weights, respectively. The network corresponds to a function y = f N ( w , x ) {\displaystyle y=f_{N}(w,x)} which, given a weight w {\displaystyle w} , maps an input x {\displaystyle x} to an output y {\displaystyle y} . In supervised learning, a sequence of training examples ( x 1 , y 1 ) , … , ( x p , y p ) {\displaystyle (x_{1},y_{1}),\dots ,(x_{p},y_{p})} produces a sequence of weights w 0 , w 1 , … , w p {\displaystyle w_{0},w_{1},\dots ,w_{p}} starting from some initial weight w 0 {\displaystyle w_{0}} , usually chosen at random. These weights are computed in turn: first compute w i {\displaystyle w_{i}} using only ( x i , y i , w i − 1 ) {\displaystyle (x_{i},y_{i},w_{i-1})} for i = 1 , … , p {\displaystyle i=1,\dots ,p} . The output of the algorithm is then w p {\displaystyle w_{p}} , giving a new function x ↦ f N ( w p , x ) {\displaystyle x\mapsto f_{N}(w_{p},x)} . The computation is the same in each step, hence only the case i = 1 {\displaystyle i=1} is described. w 1 {\displaystyle w_{1}} is calculated from ( x 1 , y 1 , w 0 ) {\displaystyle (x_{1},y_{1},w_{0})} by considering a variable weight w {\displaystyle w} and applying gradient descent to the function w ↦ E ( f N ( w , x 1 ) , y 1 ) {\displaystyle w\mapsto E(f_{N}(w,x_{1}),y_{1})} to find a local minimum, starting at w = w 0 {\displaystyle w=w_{0}} . This makes w 1 {\displaystyle w_{1}} the minimizing weight found by gradient descent. == Learning pseudocode == To implement the algorithm above, explicit formulas are required for the gradient of the function w ↦ E ( f N ( w , x ) , y ) {\displaystyle w\mapsto E(f_{N}(w,x),y)} where the function is E ( y , y ′ ) = | y − y ′ | 2 {\displaystyle E(y,y')=|y-y'|^{2}} . The learning algorithm can be divided into two phases: propagation and weight update. === Propagation === Propagation involves the following steps: Propagation forward through the network to generate the output value(s) Calculation of the cost (error term) Propagation of the output activations back through the network using the training pattern target to generate the deltas (the difference between the targeted and actual output values) of all output and hidden neurons. === Weight update === For each weight: Multiply the weight's output delta and input activation to find the gradient of the weight. Subtract the ratio (percentage) of the weight's gradient from the weight. The learning rate is the ratio (percentage) that influences the speed and quality of learning. The greater the ratio, the faster the neuron trains, but the lower the ratio, the more accurat

GoodRx

GoodRx Holdings, Inc. is an American healthcare company that operates a telemedicine platform and free-to-use website and mobile app that track prescription drug prices in the United States and provide drug coupons for discounts on medications. GoodRx compares prescription drug prices at more than 75,000 pharmacies in the United States. The platform allows users to consult a doctor online and obtain a prescription for certain types of medications. == History == === Financial performance === GoodRx was founded in Santa Monica, California in 2011. GoodRx experienced substantial growth in net income in 2017 ($9 million), 2018 ($44 million), and 2019 ($66 million), but recorded a loss of $293.6 million in 2020 due to IPO-related expenses. In September 2020, GoodRx went public on the Nasdaq under the ticker symbol GDRX. The company priced its initial public offering at $33 per share, above the expected range of $24 to $28, raising more than $1.1 billion at an initial valuation of approximately $12.7 billion. In the first half of 2020, the company reported revenues of $257 million and net income of $55 million. GoodRx generated $745.4 million in revenue for the full year 2021, a 35.36% increase over 2020. During the first half of 2021, the company’s share price declined by 10.7%. The decline was attributed to increased competition in online pharmacy services and slower user growth. GoodRx reported full-year revenue of $766.6 million, with adjusted EBITDA reaching $213.5 million, exceeding guidance in the fourth quarter. GoodRx reported that 41% of prescriptions filled using its coupons were newly adherent, meaning they would not have been filled without the service. GoodRx reported a full-year 2023 revenue of $750.3 million, a decrease of 2.1% from 2022. However, its fourth-quarter revenue increased by 7% year-over-year. GoodRx achieved an Adjusted EBITDA of $217.4 million for the year and an Adjusted EBITDA Margin of 28.6%. In 2024, GoodRx achieved 6% revenue growth with $792.3 million for the full year and turned a net loss into a positive net income of $16.4 million. The company also demonstrated strong operational efficiency, with a 32.8% increase in full-year Adjusted EBITDA. In Q2 2025, GoodRx reported revenue of $203.1 million, a 1.2% increase from the previous year, and a net income of $12.8 million, a significant 92% jump, which resulted in a 6.3% net income margin. However, prescription transaction revenue declined by 3% due to a decrease in monthly active consumers, but this was offset by strong 32% growth in its Pharma Manufacturer Solutions business. GoodRx also saw a 7% decrease in subscription revenue. === Mergers and acquisitions === In 2019, GoodRx acquired HeyDoctor, a telemedicine company, to integrate virtual healthcare services into the platform. In 2021, a health video content producer, HealthiNation was acquired by GoodRx, which helped provide consumers with health information and offered pharmaceutical manufacturers new ways to reach relevant audiences. In April 2022, GoodRx acquired VitaCare Prescription Services from TherapeuticsMD to strengthen its pharma manufacturer solutions business. === Partnerships === In 2017, the company announced partnerships with major pharmaceutical companies to negotiate lower prescription drug costs. GoodRx has deep relationships with major pharmacy chains, including Walgreens, Walmart, CVS Caremark, and Publix, to allow customers to use GoodRx discounts and Gold benefits. GoodRx began its partnership with CVS Caremark in July 2023 to automatically apply coupons to insured CVS customers purchasing generic prescriptions at certain locations. In April 2024, GoodRx added Publix into its network, allowing GoodRx Gold members to use their cards at Publix Pharmacies. GoodRx partners with Pharmacy Benefit Management like Caremark, Express Scripts, and MedImpact to apply their savings directly to eligible insurance plans and members. GoodRx partners with companies like Affirm, Benefitfocus, and DoorDash to integrate their services that offer members discounts and financial flexibility for prescriptions. GoodRx also partners with organizations like the American Academy of Family Physicians Foundation to support broader access to care. In October 2022, GoodRx launched Provider Mode, which allows healthcare providers to use the app to compare costs of drugs for patients based on different payment methods and drug alternatives. In 2025, GoodRx partnered with Novo Nordisk to offer discounted cash-pay access to semaglutide products like Ozempic and Wegovy through its platform and participating pharmacies. == Products and services == GoodRx started its telemedicine service GoodRx Care in September 2019. It lets people talk to a licensed provider online for common issues and get prescriptions even if they don't have insurance. They also run condition-specific subscription plans that bundle online doctor visits, FDA-approved meds, and home delivery into one monthly payment. On the weight management side, GoodRx offers prescriptions for GLP-1 drugs like semaglutide through their telemedicine platform. This got a boost when the oral version of Wegovy became widely available in the US in early 2026. GoodRx works with drug makers like Novo Nordisk to make some medications (including semaglutide options) more affordable for people paying cash. The telemedicine part took off after GoodRx bought HeyDoctor in 2019 and brought their virtual care tools into the main platform. == Key people == The Santa Monica-based startup was founded in September 2011 by Trevor Bezdek and former Facebook executives Doug Hirsch and Scott Marlette. Marlette was one of the first 20 employees at Facebook and built Facebook's photo application. In 2005, Hirsch was the Vice President of Product at Facebook, working closely with Mark Zuckerberg. Bezdek and Hirsch served as co-chief executive officers until April 2023, when they stepped down from those roles and technology executive Scott Wagner was appointed interim chief executive officer. Bezdek became chair of the board, while Hirsch took on the role of chief mission officer. In December 2024, GoodRx announced that healthcare executive Wendy Barnes would become president and chief executive officer effective January 1, 2025. As of 2025, Barnes serves as the company’s CEO, while Trevor Bezdek and Scott Wagner serve as co-chairs of the board, and Doug Hirsch remains involved as a co-founder and senior executive. == Controversy == On February 25, 2020, Consumer Reports published an article stating that GoodRx shared user data—specifically, pseudonymized advertising ID numbers that companies use to track the behavior of web users across websites, the names of the drugs that users browsed, and the pharmacies where users sought to fill prescriptions—with Google, Facebook, and around twenty other Internet-based companies. A few days later, GoodRx released a statement saying that it had made changes to prevent user search data on medical conditions and pharmaceuticals from being shared with Facebook. In March 2020, GoodRx stopped sending data about user prescriptions to Facebook. On February 1, 2023, the Federal Trade Commission fined GoodRx US$1.5 million for violations of the Breach Notification Rule and the Federal Trade Commission Act for allegedly failing to obtain specific, informed, and unambiguous consent from users before disclosing health-related information to Facebook and Google. In November 2024, independent pharmacies filed at least three class action lawsuits against GoodRx and major pharmacy benefit managers. The cases, brought by independent pharmacies in California, Michigan, Pennsylvania, and Rhode Island, allege that GoodRx and the PBMs collaborated to suppress reimbursements for generic prescription drugs. They allege that agreements using GoodRx’s software suppressed reimbursements for generic drugs and violated the Sherman Antitrust Act. The suits claim the practices amount to price fixing which harms small pharmacies while benefiting PBMs and their affiliates. GoodRx settled both the 2023 FTC action and the 2025 class action lawsuit without admitting wrongdoing.

Triplet loss

Triplet loss is a machine learning loss function widely used in one-shot learning, a setting where models are trained to generalize effectively from limited examples. It was conceived by Google researchers for their prominent FaceNet algorithm for face detection. Triplet loss is designed to support metric learning. Namely, to assist training models to learn an embedding (mapping to a feature space) where similar data points are closer together and dissimilar ones are farther apart, enabling robust discrimination across varied conditions. In the context of face detection, data points correspond to images. == Definition == The loss function is defined using triplets of training points of the form ( A , P , N ) {\displaystyle (A,P,N)} . In each triplet, A {\displaystyle A} (called an "anchor point") denotes a reference point of a particular identity, P {\displaystyle P} (called a "positive point") denotes another point of the same identity in point A {\displaystyle A} , and N {\displaystyle N} (called a "negative point") denotes a point of an identity different from the identity in point A {\displaystyle A} and P {\displaystyle P} . Let x {\displaystyle x} be some point and let f ( x ) {\displaystyle f(x)} be the embedding of x {\displaystyle x} in the finite-dimensional Euclidean space. It shall be assumed that the L2-norm of f ( x ) {\displaystyle f(x)} is unity (the L2 norm of a vector X {\displaystyle X} in a finite dimensional Euclidean space is denoted by ‖ X ‖ {\displaystyle \Vert X\Vert } .) We assemble m {\displaystyle m} triplets of points from the training dataset. The goal of training here is to ensure that, after learning, the following condition (called the "triplet constraint") is satisfied by all triplets ( A ( i ) , P ( i ) , N ( i ) ) {\displaystyle (A^{(i)},P^{(i)},N^{(i)})} in the training data set: ‖ f ( A ( i ) ) − f ( P ( i ) ) ‖ 2 2 + α < ‖ f ( A ( i ) ) − f ( N ( i ) ) ‖ 2 2 {\displaystyle \Vert f(A^{(i)})-f(P^{(i)})\Vert _{2}^{2}+\alpha <\Vert f(A^{(i)})-f(N^{(i)})\Vert _{2}^{2}} The variable α {\displaystyle \alpha } is a hyperparameter called the margin, and its value must be set manually. In the FaceNet system, its value was set as 0.2. Thus, the full form of the function to be minimized is the following: L = ∑ i = 1 m max ( ‖ f ( A ( i ) ) − f ( P ( i ) ) ‖ 2 2 − ‖ f ( A ( i ) ) − f ( N ( i ) ) ‖ 2 2 + α , 0 ) {\displaystyle L=\sum _{i=1}^{m}\max {\Big (}\Vert f(A^{(i)})-f(P^{(i)})\Vert _{2}^{2}-\Vert f(A^{(i)})-f(N^{(i)})\Vert _{2}^{2}+\alpha ,0{\Big )}} == Intuition == A baseline for understanding the effectiveness of triplet loss is the contrastive loss, which operates on pairs of samples (rather than triplets). Training with the contrastive loss pulls embeddings of similar pairs closer together, and pushes dissimilar pairs apart. Its pairwise approach is greedy, as it considers each pair in isolation. Triplet loss innovates by considering relative distances. Its goal is that the embedding of an anchor (query) point be closer to positive points than to negative points (also accounting for the margin). It does not try to further optimize the distances once this requirement is met. This is approximated by simultaneously considering two pairs (anchor-positive and anchor-negative), rather than each pair in isolation. == Triplet "mining" == One crucial implementation detail when training with triplet loss is triplet "mining", which focuses on the smart selection of triplets for optimization. This process adds an additional layer of complexity compared to contrastive loss. A naive approach to preparing training data for the triplet loss involves randomly selecting triplets from the dataset. In general, the set of valid triplets of the form ( A ( i ) , P ( i ) , N ( i ) ) {\displaystyle (A^{(i)},P^{(i)},N^{(i)})} is very large. To speed-up training convergence, it is essential to focus on challenging triplets. In the FaceNet paper, several options were explored, eventually arriving at the following. For each anchor-positive pair, the algorithm considers only semi-hard negatives. These are negatives that violate the triplet requirement (i.e, are "hard"), but lie farther from the anchor than the positive (not too hard). Restated, for each A ( i ) {\displaystyle A^{(i)}} and P ( i ) {\displaystyle P^{(i)}} , they seek N ( i ) {\displaystyle N^{(i)}} such that: The rationale for this design choice is heuristic. It may appear puzzling that the mining process neglects "very hard" negatives (i.e., closer to the anchor than the positive). Experiments conducted by the FaceNet designers found that this often leads to a convergence to degenerate local minima. Triplet mining is performed at each training step, from within the sample points contained in the training batch (this is known as online mining), after embeddings were computed for all points in the batch. While ideally the entire dataset could be used, this is impractical in general. To support a large search space for triplets, the FaceNet authors used very large batches (1800 samples). Batches are constructed by selecting a large number of same-category sample points (40), and randomly selected negatives for them. == Extensions == Triplet loss has been extended to simultaneously maintain a series of distance orders by optimizing a continuous relevance degree with a chain (i.e., ladder) of distance inequalities. This leads to the Ladder Loss, which has been demonstrated to offer performance enhancements of visual-semantic embedding in learning to rank tasks. In Natural Language Processing, triplet loss is one of the loss functions considered for BERT fine-tuning in the SBERT architecture. Other extensions involve specifying multiple negatives (multiple negatives ranking loss).