Singular values of matrices show up in many different places, and studying them has significant importance in mathematics. As per usual, the object has a fairly simple definition but requires quite difficult tools to analyze comprehensively. In this article I aim to demonstrate how the norm of a Gaussian random matrix $A$ behaves and to answer questions such as “How does its mean behave?” or “Is it concentrated around its mean?”. A crucial assumption employed on the entries of $A$ is independence. Indeed, the results can be extended easily to subgaussian entries as well, however stating theorems with Gaussians evaporates the annoying constants! Let’s set up some notation, shall we?
Definition. For any matrix $A \in \mathbb{R}^{m \times n}$ we define the spectral norm (i.e. largest singular value) as:

$$\|A\| = \max_{\substack{u \in \mathbb{R}^m,\ \|u\|_2 = 1 \\ v \in \mathbb{R}^n,\ \|v\|_2 = 1}} u^\top A v = \max_{\|v\|_2 = 1} \|Av\|_2.$$
It is worth mentioning that the reason I wrote a maximum instead of a supremum is merely that the supremum of a continuous function is attained on a compact set (here, the unit sphere). This norm is a measure of how much a vector can be dilated by the linear transform in the worst case.
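As a quick sanity check, the variational formula above can be compared numerically against the largest singular value returned by an SVD. This is a minimal sketch in Python/NumPy; the helper name `spectral_norm_via_max` is mine, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_norm_via_max(A, n_trials=20000):
    """Approximate max_{||u||=||v||=1} u^T A v by random search over the spheres."""
    m, n = A.shape
    u = rng.standard_normal((n_trials, m))
    v = rng.standard_normal((n_trials, n))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.max(np.einsum("ti,ij,tj->t", u, A, v))

A = rng.standard_normal((5, 3))
print(spectral_norm_via_max(A))   # crude lower approximation of ||A||
print(np.linalg.norm(A, ord=2))   # largest singular value via SVD
```

Random search over the spheres only gives a lower approximation of the maximum, but in such small dimensions it lands quite close to the SVD value.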
Definition. A centered random variable $X$ is called $\sigma$-subgaussian if one of the following holds. Moreover, all the statements are equivalent up to constants (a sketch of one implication is given after the list):
- (Moment Generating Bound) $\mathbb{E}\, e^{\lambda X} \le e^{\lambda^2 \sigma^2 / 2}$ for all $\lambda \in \mathbb{R}$.
- (Tail Bound) $\mathbb{P}(|X| \ge t) \le 2 e^{-t^2/(2\sigma^2)}$ for all $t \ge 0$.
- (Moment Bound) $\big(\mathbb{E}|X|^p\big)^{1/p} \le C \sigma \sqrt{p}$ for all $p \ge 1$.
- (Another MGF Bound) $\mathbb{E}\, e^{X^2/(C\sigma^2)} \le 2$.
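As an example of how these conditions relate, the tail bound follows from the moment generating bound via the standard Chernoff argument: for any $\lambda > 0$,

$$\mathbb{P}(X \ge t) \le e^{-\lambda t}\,\mathbb{E}\, e^{\lambda X} \le \exp\Big(-\lambda t + \frac{\lambda^2 \sigma^2}{2}\Big),$$

and optimizing at $\lambda = t/\sigma^2$ gives $\mathbb{P}(X \ge t) \le e^{-t^2/(2\sigma^2)}$; applying the same bound to $-X$ yields the two-sided tail with the extra factor of $2$.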
In a nutshell, this property roughly says that the random variable behaves like a normal, and it is desirable since it is basically equivalent to concentration around the mean (or median). For fixed unit vectors $u, v$ one can easily check that $X_{u,v} := u^\top A v$ is also sub-Gaussian with the same parameter as the entries of $A$, and is thus concentrated around its mean. However, it is not clear what can be said about the supremum of this collection of random variables, which is the quantity of interest. Note that these random variables are locally correlated, i.e. for a pair $(u, v)$ close to $(u', v')$ we expect $X_{u,v}$ to be close to $X_{u',v'}$. More precisely, one can write:

$$\mathbb{E}\big(X_{u,v} - X_{u',v'}\big)^2 = \big\|uv^\top - u'v'^\top\big\|_F^2.$$
Surprisingly, the converse is also true: if one takes two pairs far apart from each other, the corresponding random variables will be nearly uncorrelated, and thus behave as though they were independent, since uncorrelatedness is equivalent to independence in Gaussian fields. This sort of argument motivates proofs based on $\varepsilon$-net methods, which I do not intend to go through here, although the intuition is very insightful. In order to make this statement more precise one may use the following comparison inequality for Gaussian random variables.
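To make the decorrelation picture concrete, here is a small Monte Carlo check (my own illustration, not part of any proof) that the covariance of $X_{u,v}$ and $X_{u',v'}$ equals $\langle u, u'\rangle\langle v, v'\rangle$, so nearly orthogonal pairs are nearly uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, reps = 6, 4, 200_000

def unit(x):
    return x / np.linalg.norm(x)

u, up = unit(rng.standard_normal(m)), unit(rng.standard_normal(m))
v, vp = unit(rng.standard_normal(n)), unit(rng.standard_normal(n))

A = rng.standard_normal((reps, m, n))      # i.i.d. standard normal matrices
X  = np.einsum("i,rij,j->r", u,  A, v)     # X_{u,v}
Xp = np.einsum("i,rij,j->r", up, A, vp)    # X_{u',v'}

print(np.cov(X, Xp)[0, 1])                 # empirical covariance
print((u @ up) * (v @ vp))                 # theoretical value <u,u'> <v,v'>
```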
Theorem (Sudakov-Fernique Inequality). Suppose $(X_t)_{t \in T}$ and $(Y_t)_{t \in T}$ are two centered Gaussian vectors indexed by a finite set $T$ such that $\mathbb{E}(X_s - X_t)^2 \le \mathbb{E}(Y_s - Y_t)^2$ for all $s, t \in T$. Then:

$$\mathbb{E}\max_{t \in T} X_t \le \mathbb{E}\max_{t \in T} Y_t.$$
As a special case, if we assume that $X$ and $Y$ have the same variances coordinatewise, then the condition in the theorem verbally translates into “$X$ is more correlated than $Y$”, and thus $X$ should possess a lower expected maximum, which aligns with intuition. In addition, it is easy to extend this to the infinite case using a simple truncation and then applying the monotone convergence theorem, provided that the index set is separable. I am not going to provide a proof of this theorem here since the ideas overlap with the theorems mentioned below, but it has a direct application to upper bounding the spectral norm of a matrix. Take $X_{u,v} = u^\top A v$ as defined above and assume the entries of the matrix $A$ are i.i.d. standard normal; a short computation with the Frobenius norm above gives, for unit vectors:

$$\mathbb{E}\big(X_{u,v} - X_{u',v'}\big)^2 \le \|u - u'\|_2^2 + \|v - v'\|_2^2.$$
Now, one can define the following comparison process for independent random variables $g \sim N(0, I_m)$ and $h \sim N(0, I_n)$ as:

$$Y_{u,v} = \langle g, u \rangle + \langle h, v \rangle, \qquad \mathbb{E}\big(Y_{u,v} - Y_{u',v'}\big)^2 = \|u - u'\|_2^2 + \|v - v'\|_2^2.$$
Therefore, by the Sudakov-Fernique inequality we obtain:

$$\mathbb{E}\|A\| = \mathbb{E}\max_{\|u\|_2 = \|v\|_2 = 1} X_{u,v} \le \mathbb{E}\max_{\|u\|_2 = \|v\|_2 = 1} Y_{u,v} = \mathbb{E}\|g\|_2 + \mathbb{E}\|h\|_2 \le \sqrt{m} + \sqrt{n}.$$
This is a sharp upper bound (at least when $m \gg n$) since we can lower bound the spectral norm by the norm of its first column:

$$\mathbb{E}\|A\| \ge \mathbb{E}\|A e_1\|_2 = \frac{\sqrt{2}\,\Gamma\!\big(\tfrac{m+1}{2}\big)}{\Gamma\!\big(\tfrac{m}{2}\big)} = \sqrt{m}\,\big(1 - o(1)\big).$$
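The bound is also easy to probe numerically. Below is a minimal sketch (the dimensions and replication count are arbitrary choices of mine) that estimates $\mathbb{E}\|A\|$ by Monte Carlo and compares it with $\sqrt{m} + \sqrt{n}$:

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_spectral_norm(m, n, reps=200):
    """Monte Carlo estimate of E||A|| for an m x n standard Gaussian matrix."""
    norms = [np.linalg.norm(rng.standard_normal((m, n)), ord=2) for _ in range(reps)]
    return np.mean(norms)

for m, n in [(100, 10), (100, 100), (400, 100)]:
    est = mean_spectral_norm(m, n)
    print(f"m={m:4d} n={n:4d}  E||A|| ~ {est:7.2f}   sqrt(m)+sqrt(n) = {np.sqrt(m)+np.sqrt(n):7.2f}")
```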
Now that we have established the sharp behaviour of the largest singular value in expectation, we move on to the next question on the list: how well is this quantity concentrated around its mean?
A Hammer (Lipschitz Concentration)
Before we continue, we need to introduce a powerful tool which is widely used in the literature on measure concentration. Suppose $\gamma_n$ is the $n$-dimensional standard Gaussian measure, where the expectation of $f : \mathbb{R}^n \to \mathbb{R}$ is denoted by $\mathbb{E}f = \int f \, d\gamma_n$. Then the Gaussian Poincaré inequality says that functions with small local variations should have small variances:

$$\mathrm{Var}(f) \le \mathbb{E}\,\|\nabla f\|_2^2.$$

In other words, some sort of gradient of the function, as an indication of local fluctuation, can be translated into a bound on global fluctuation.
This mainly suggests that one should expect dimension-independent concentration bounds for Lipschitz functions.
Theorem. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is an $L$-Lipschitz function and $X \sim N(0, I_n)$. Then, we have the following concentration inequality:

$$\mathbb{P}\big(|f(X) - \mathbb{E}f(X)| \ge t\big) \le 2\exp\Big(-\frac{t^2}{2L^2}\Big), \qquad t \ge 0.$$
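As an illustration (my own numerical example, not from the proof below), take the 1-Lipschitz function $f(x) = \|x\|_2$ and compare its empirical tails with the Gaussian bound above; note that the bound does not depend on the dimension $n$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 20_000

# f(x) = ||x||_2 is 1-Lipschitz, so f(X) - E f(X) should have dimension-free sub-Gaussian tails
samples = np.linalg.norm(rng.standard_normal((reps, n)), axis=1)
centered = samples - samples.mean()

for t in [1.0, 2.0, 3.0]:
    empirical = np.mean(np.abs(centered) >= t)
    bound = 2 * np.exp(-t**2 / 2)   # theorem with L = 1
    print(f"t={t}: empirical tail {empirical:.5f}  vs  bound {bound:.5f}")
```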
There are in fact many ways to approach this problem, namely the Interpolation Method due to Pisier-Maurey, the Smart Path method due to Talagrand, Isoperimetry due to Borell and Sudakov-Tsirelson, the Transportation Method due to Marton, etc. However, I will provide a neat proof using stochastic calculus which is mathematically beautiful but may seem like a magical juggling act performed by the very Brownian motion! Honestly, I get super excited whenever Brownian motion comes up, and it almost never disappoints.
Proof (requires some familiarity with SDEs).
The idea is to think of our Gaussian vector as the endpoint $B_1$ of an $n$-dimensional Brownian path at time $1$ and define the martingale $M_t = \mathbb{E}[f(B_1) \mid \mathcal{F}_t]$, which interpolates continuously between $M_0 = \mathbb{E}f(B_1)$ and $M_1 = f(B_1)$. Since Brownian motion is also a Markov process, we can rewrite $M_t$ as $M_t = g(t, B_t)$ where $g(t, x) = \mathbb{E}f(x + B_1 - B_t)$ is a smooth function and in fact inherits the smoothness properties of $f$. Assuming it is valid to exchange integration and differentiation, one can write:

$$\|\nabla_x g(t, x)\|_2 = \big\|\mathbb{E}\,\nabla f(x + B_1 - B_t)\big\|_2 \le \mathbb{E}\,\big\|\nabla f(x + B_1 - B_t)\big\|_2 \le L,$$
where the middle inequality is Jensen's inequality (applied to the norm) and the last one holds because $f$ is $L$-Lipschitz (hence almost everywhere differentiable with $\|\nabla f\|_2 \le L$). Thus, we can now use the fundamental theorem of calculus and write the difference $M_1 - M_0$ as an integral of $dM_t$, and this is the part where Itô's formula kicks in:

$$M_1 - M_0 = \int_0^1 \partial_t g(t, B_t)\,dt + \int_0^1 \nabla_x g(t, B_t)\cdot dB_t + \frac{1}{2}\sum_{i,j=1}^{n} \int_0^1 \partial_{x_i}\partial_{x_j} g(t, B_t)\,d[B^i, B^j]_t,$$
where $[X, Y]_t$ is the quadratic covariation of two martingales $X$ and $Y$, i.e. the unique finite-variation process such that $X_t Y_t - [X, Y]_t$ is again a martingale. Now the quadratic covariation between coordinates $i$ and $j$ is zero for distinct $i \ne j$ due to independence. However, by the uniqueness of the quadratic variation, for $i = j$ we have:

$$[B^i, B^i]_t = t.$$
Thus, $d[B^i, B^j]_t = \delta_{ij}\,dt$ and we obtain:

$$M_1 - M_0 = \int_0^1 \Big(\partial_t g + \tfrac{1}{2}\Delta_x g\Big)(t, B_t)\,dt + \int_0^1 \nabla_x g(t, B_t)\cdot dB_t.$$
Since $M_t$ is a martingale, the first term (which has finite variation) must be zero, which implies that $g$ satisfies the backward heat equation $\partial_t g + \tfrac{1}{2}\Delta_x g = 0$, and again we can rewrite:

$$f(B_1) - \mathbb{E}f(B_1) = M_1 - M_0 = \int_0^1 \nabla_x g(t, B_t)\cdot dB_t.$$
Now, by bilinearity of quadratic variations:

$$[M, M]_1 = \int_0^1 \big\|\nabla_x g(t, B_t)\big\|_2^2\,dt \le L^2.$$
This might not seem significant, however a lot of information is buried deep within it. Again using Itô's formula, one can prove that the following process is a martingale:

$$Z_t = \exp\Big(\lambda(M_t - M_0) - \frac{\lambda^2}{2}[M, M]_t\Big), \qquad \lambda \in \mathbb{R}.$$
Therefore:

$$\mathbb{E}\exp\Big(\lambda(M_1 - M_0) - \frac{\lambda^2}{2}[M, M]_1\Big) = \mathbb{E}Z_1 = Z_0 = 1.$$
Now, by rearranging terms and using $[M, M]_1 \le L^2$:

$$\mathbb{E}\exp\big(\lambda(f(B_1) - \mathbb{E}f(B_1))\big) \le \exp\Big(\frac{\lambda^2 L^2}{2}\Big)\,\mathbb{E}\exp\Big(\lambda(M_1 - M_0) - \frac{\lambda^2}{2}[M, M]_1\Big) = \exp\Big(\frac{\lambda^2 L^2}{2}\Big) \quad \text{for all } \lambda \in \mathbb{R},$$
which proves that $f(X)$ is $L$-subgaussian, and the claimed tail bound follows by the Chernoff argument from earlier.
This theorem allows us to derive dimension-independent concentration for the spectral norm, based on the fact that it is a 1-Lipschitz function of the entries. Note that this inequality is in fact sharp for the special case of linear functions, in the sense that the RHS matches the asymptotic tail of a $N(0, L^2)$ random variable, though this sharpness may be lost for other classes of Lipschitz functions. An immediate implication of this theorem is that the variance of $\|A\|$ is bounded above by $1$ (for standard normal entries), but what happens as we change the dimensions? One way of testing sharpness is to find the true order of the variance and see whether it is bounded below by a constant or shrinks toward zero as the dimension increases. In most cases, the variance actually vanishes! This phenomenon is usually referred to as superconcentration, which comes with novel methods for computing the true order of the variance with respect to the dimension. All of this raises the question: how does the variance of the spectral norm behave? In the following I intend to state some results surrounding this question without providing proofs. As an illuminating example, we may first take a look at the maximum of a Gaussian vector, which is a special case of the problem at hand.
Finite Maximum Over a Gaussian Vector
Consider $X = (X_1, \dots, X_n)$ to be a centered Gaussian vector with covariance matrix $\Sigma$, and set $\sigma^2 = \max_i \Sigma_{ii}$. It is a well-established result that the mean of $\max_i X_i$ behaves roughly as $\sigma\sqrt{2\log n}$ and the variance is bounded by $\sigma^2$. In order to obtain the former, one can simply use Jensen's inequality and optimize over $\lambda$ afterwards:

$$\exp\big(\lambda\,\mathbb{E}\max_i X_i\big) \le \mathbb{E}\exp\big(\lambda\max_i X_i\big) \le \sum_{i=1}^n \mathbb{E}\,e^{\lambda X_i} \le n\,e^{\lambda^2\sigma^2/2},$$

so $\mathbb{E}\max_i X_i \le \frac{\log n}{\lambda} + \frac{\lambda\sigma^2}{2}$, and taking $\lambda = \sqrt{2\log n}/\sigma$ yields $\mathbb{E}\max_i X_i \le \sigma\sqrt{2\log n}$.
However, the variance bound requires extra work and can be proved via the Poincaré inequality mentioned earlier. Note that $\max_i X_i = f(Z)$ for $Z \sim N(0, I_n)$, where $f(z) = \max_i (\Sigma^{1/2} z)_i$ is almost surely differentiable. Therefore, one can prove via the chain rule that $f$ is $\sigma$-Lipschitz, since $\|\nabla f(z)\|_2^2 = e_{i^*}^\top \Sigma\, e_{i^*} = \Sigma_{i^* i^*} \le \sigma^2$ where $i^*$ denotes the maximizing index, and obtain the bound:

$$\mathrm{Var}\big(\max_i X_i\big) \le \mathbb{E}\,\|\nabla f(Z)\|_2^2 \le \max_i \Sigma_{ii} = \sigma^2.$$
Even though the concentration theorem suggests that $\max_i X_i$ is $\sigma$-subgaussian, its sharpness is merely contingent upon Poincaré's inequality being unbeatable. In fact, Talagrand proved an extension of Poincaré's inequality which provides a sufficient condition for beating it. In particular, the case of the maximum over independent standard Gaussians satisfies this condition, which yields the following bound:

$$\mathrm{Var}\big(\max_i X_i\big) \le \frac{C}{\log n}.$$
Below is a simple simulation study with 1000 replications for each $n$ (between 10 and 100) which verifies the theoretical results.
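A minimal sketch of such a simulation might look as follows (the exact grid of $n$ values and the reported comparisons against $\sqrt{2\log n}$ and $1/\log n$ are my own choices):

```python
import numpy as np

rng = np.random.default_rng(4)
reps = 1000

for n in [10, 25, 50, 75, 100]:
    samples = rng.standard_normal((reps, n)).max(axis=1)   # max of n i.i.d. N(0,1)
    print(f"n={n:3d}  mean ~ {samples.mean():.3f} (sqrt(2 log n) = {np.sqrt(2*np.log(n)):.3f})  "
          f"var ~ {samples.var():.4f} (1/log n = {1/np.log(n):.4f})")
```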

Variance Bound for Spectral Norm
Thus far we have proved that $\|A\|$ is 1-subgaussian; moreover, its variance is bounded by one in the case of standard normal entries, due to the fact that it is 1-Lipschitz. However, the following simulation indicates that the spectral norm enjoys superconcentration and its variance behaves roughly as $n^{-1/3}$, thus Lipschitz concentration has once again failed to capture the underlying truth! In this simulation $m = n$ and the variance was estimated over 1000 replications. Note that the slope of the line (on a log-log scale) should be close to $-1/3$.
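A sketch of that simulation in Python (the dimensions, replication count, and the least-squares slope estimate are my own choices) could look like this:

```python
import numpy as np

rng = np.random.default_rng(5)
reps = 1000
ns = [20, 40, 80, 160]

variances = []
for n in ns:
    norms = [np.linalg.norm(rng.standard_normal((n, n)), ord=2) for _ in range(reps)]
    variances.append(np.var(norms))
    print(f"n={n:4d}  Var(||A||) ~ {variances[-1]:.4f}")

# Fit a line to log(Var) vs log(n); the slope should be close to -1/3.
slope = np.polyfit(np.log(ns), np.log(variances), 1)[0]
print(f"estimated slope ~ {slope:.3f}")
```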
Proving this variance bound demands much more theory, and I personally believe this is one of those cases where the effort that has to be invested in the proof does not pay off proportionally!