I recently completed my Ph.D. in Electrical and Computer Engineering at New York University, advised by Prof. Brandon Reagen. My thesis, “Nonlinear Representation Dynamics: Spectral Scaling Laws and Applications to Private AI,” studies how nonlinear transformations, architectural choices, and optimization dynamics shape representation geometry, spectral scaling behavior, and efficient private inference.
My research studies representation learning and high-dimensional learning dynamics in language models. I am interested in internal structure that is not visible from aggregate metrics alone: how latent geometry, spectra, entropy, and data movement determine what a model can represent and how efficiently it can be executed.
This agenda has led to three connected lines of work. NerVE and Spectral Scaling Laws quantify nonlinear feed-forward transformations and realized capacity in LLMs; recent work on optimizer-induced spectral scaling laws studies how optimizers change capacity allocation across token regimes. AERO studies entropy dynamics in attention and uses entropy-guided regularization to make private LLM inference more stable and efficient. Earlier, through the DARPA DPRIVE program, DeepReDuce and DeepReShape redesigned neural networks for efficient encrypted inference.
Selected papers, talks, and media coverage are listed below.
Ph.D. in Electrical and Computer Engineering, 2020–2026
New York University
M.Tech. (Research) in Computer Science and Engineering, 2017–2020
Indian Institute of Technology Hyderabad
B.Tech. in Electronics and Communication Engineering, 2009–2013
National Institute of Technology Surat
I study representation learning, scaling laws, and high-dimensional learning dynamics in foundation models. My work focuses on how optimization, architecture, nonlinearities, and systems constraints shape representation geometry, entropy and eigenspectrum dynamics, and realized capacity. Across these settings, I am interested in internal structures that aggregate metrics such as validation loss, latency, and compute do not reveal, yet that strongly shape model behavior, efficiency, and adaptability.
I develop frameworks for understanding how language models transform and allocate representational capacity across layers, token regimes, optimizers, and model scale. While classical scaling laws relate loss to compute, my work studies how internal capacity itself scales through nonlinear eigenspectrum dynamics, spectral scaling laws, and optimizer-induced spectral capacity. A key finding is that the optimizer, not only the architecture, determines how much nominal capacity a model actually realizes: models with nearly identical validation loss can still differ sharply in their internal representation geometry. This points toward architecture–optimizer co-design, where architectural choices define the available representational degrees of freedom, while optimizer geometry shapes which directions become active, variance-carrying, and ultimately realized during training.
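As a rough illustration of what "realized capacity" can mean in practice, the sketch below computes the participation ratio of a covariance eigenspectrum, a common effective-rank proxy. Treating it as the capacity measure is an assumption made for illustration; the exact metrics used in the papers are not specified here, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def participation_ratio(activations: np.ndarray) -> float:
    """Effective number of variance-carrying directions.

    activations: (num_tokens, hidden_dim) matrix of hidden states.
    Returns (sum of eigenvalues)^2 / (sum of squared eigenvalues)
    of the sample covariance, a value between 1 and hidden_dim.
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (centered.shape[0] - 1)
    # eigvalsh is for symmetric matrices; clip tiny negative round-off.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eigvals.sum() ** 2 / np.square(eigvals).sum())

# Synthetic stand-ins for two models with the same nominal width (64).
rng = np.random.default_rng(0)
num_tokens, hidden_dim = 4096, 64
isotropic = rng.standard_normal((num_tokens, hidden_dim))

# Concentrate almost all variance in 4 of the 64 dimensions.
scales = np.full(hidden_dim, 0.01)
scales[:4] = 1.0
anisotropic = isotropic * scales

pr_iso = participation_ratio(isotropic)      # close to the nominal width
pr_aniso = participation_ratio(anisotropic)  # close to 4
print(f"isotropic: {pr_iso:.1f}, anisotropic: {pr_aniso:.1f}")
```

Both synthetic matrices have the same nominal width, yet the second uses only a handful of directions, which is the kind of gap between nominal and realized capacity that a loss value alone would not expose.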
I design neural architectures, training methods, and regularization techniques for efficient private inference, that is, inference performed directly on encrypted data. This work studies the latency and communication overhead incurred by nonlinear operations and develops inference-efficient substitutes for these components without sacrificing model quality. In language models, I show that removing nonlinearities can destabilize attention dynamics, producing entropy collapse in deeper layers and entropic overload in earlier layers. I address these failures through static normalization alternatives and hierarchical entropy regularization, which restore attention stability and preserve head diversity under private-inference constraints. Taken together, this work uses private inference as a lens to understand how nonlinearities regulate information flow and stabilize attention dynamics.
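To make the entropy terminology concrete, here is a minimal sketch of how per-head attention entropy can be measured. It uses plain Shannon entropy over attention rows; the function names, tensor shapes, and toy inputs are assumptions for illustration and are not taken from the AERO code.

```python
import numpy as np

def softmax(scores: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def attention_entropy(attn: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Mean Shannon entropy (in nats) of each head's attention rows.

    attn: (num_heads, num_queries, num_keys), each row summing to 1.
    Returns an array of shape (num_heads,).
    """
    row_entropy = -(attn * np.log(attn + eps)).sum(axis=-1)
    return row_entropy.mean(axis=-1)

num_queries, num_keys = 4, 16

# Head 0: every query spreads attention uniformly (maximum entropy).
uniform = np.full((1, num_queries, num_keys), 1.0 / num_keys)

# Head 1: every query puts nearly all mass on one key (near-zero entropy).
peaked_scores = np.where(np.eye(num_keys)[:num_queries] > 0, 50.0, 0.0)
peaked = softmax(peaked_scores)[None]

entropies = attention_entropy(np.concatenate([uniform, peaked], axis=0))
# Fraction of the maximum possible entropy, log(num_keys), per head.
normalized = entropies / np.log(num_keys)
print(normalized)
```

Under this reading, a head whose normalized entropy stays near 0 has collapsed onto single tokens, while one that stays near 1 attends almost uniformly. Tracking these values per layer during training is one way to see such failures when the loss curve looks healthy.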
My earlier work studied hardware-aware co-design for DNNs through roofline performance modeling and data-reuse-aware compact architectures. I showed that conventional arithmetic-intensity metrics can obscure the data-movement structure that determines inference efficiency, since weights and activations have different reuse patterns across architectures and accelerator memory hierarchies. To capture these effects more faithfully, I proposed data-reuse-aware arithmetic intensity. This systems background shapes how I think about compute bottlenecks, data movement, and the interaction between algorithms, model structure, and hardware.
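For readers unfamiliar with the baseline being critiqued, the sketch below shows the conventional arithmetic-intensity metric and the standard roofline bound. It does not implement the data-reuse-aware variant, whose definition is not given here. The accelerator numbers and layer sizes are hypothetical.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Conventional arithmetic intensity: FLOPs per byte of memory traffic."""
    return flops / bytes_moved

def roofline_bound(intensity: float, peak_flops: float, mem_bandwidth: float) -> float:
    """Attainable FLOP/s under the roofline model:
    the lower of the compute roof and the memory roof."""
    return min(peak_flops, intensity * mem_bandwidth)

# Hypothetical accelerator: 100 TFLOP/s peak, 1 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
MEM_BANDWIDTH = 1e12

# Two hypothetical layers with equal FLOPs but different memory traffic.
layer_a = arithmetic_intensity(flops=2e9, bytes_moved=1e8)  # 20 FLOP/byte
layer_b = arithmetic_intensity(flops=2e9, bytes_moved=1e7)  # 200 FLOP/byte

bound_a = roofline_bound(layer_a, PEAK_FLOPS, MEM_BANDWIDTH)  # memory-bound
bound_b = roofline_bound(layer_b, PEAK_FLOPS, MEM_BANDWIDTH)  # compute-bound
print(bound_a, bound_b)
```

Note that `bytes_moved` is a single aggregate here. It does not distinguish weight traffic from activation traffic, which is the structure the paragraph above says the conventional metric can obscure.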
I am interested in pursuing a broader research program around representation integrity in foundation models: understanding when learned representations preserve usable degrees of freedom, stable information flow, and adaptability as models scale, train, and are deployed under real constraints. The term borrows from signal integrity in chip design and hardware verification: it is not enough for a signal to be transmitted; it must stay reliable under noise, timing, routing, and physical constraints.
Similarly, in foundation models, parameters, compute, and low loss do not guarantee usable internal structure. In my work, this appears in failure modes such as entropy collapse, which is visible through attention-entropy dynamics rather than loss curves, and in matched-loss models that nevertheless differ sharply in realized spectral capacity. My goal is to develop measurements and training methods that diagnose and improve this integrity across optimization, architecture, data regimes, privacy constraints, and continual adaptation.
Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws
Nandan Kumar Jha, Brandon Reagen
Under review, 2026
arXiv · Project · Code · Blog
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks
Nandan Kumar Jha, Brandon Reagen
ICLR 2026
arXiv · Project · Code
Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?
Nandan Kumar Jha, Brandon Reagen
EMNLP 2025, Main Conference
arXiv · Related code
A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention
Nandan Kumar Jha, Brandon Reagen
HiLD Workshop at ICML 2025
arXiv · News
AERO: Entropy-Guided Attention for Private LLM Inference
Nandan Kumar Jha, Brandon Reagen
Under review, 2026; earlier version at AAAI PPAI 2025
Earlier arXiv · Code · Video · Press release
DeepReShape: Redesigning Neural Networks for Efficient Private Inference
Nandan Kumar Jha, Brandon Reagen
TMLR 2024
arXiv · Slides
DeepReDuce: ReLU Reduction for Fast Private Inference
Nandan Kumar Jha, Zahra Ghodsi, Siddharth Garg, Brandon Reagen
ICML 2021, Spotlight
arXiv · Slides · ICML video · Press release
Circa: Stochastic ReLUs for Private Deep Learning
Zahra Ghodsi, Nandan Kumar Jha, Brandon Reagen, Siddharth Garg
NeurIPS 2021
arXiv · Poster
Characterizing and Optimizing End-to-End Systems for Private Inference
Karthik Garimella, Zahra Ghodsi, Nandan Kumar Jha, Siddharth Garg, Brandon Reagen
ASPLOS 2023
arXiv · Code
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks
Rajat Saini*, Nandan Kumar Jha*, Bedanta Das, Sparsh Mittal, C. Krishna Mohan (*equal contribution)
WACV 2020
Paper · Code · Video
Modeling Data Reuse in Deep Neural Networks by Taking Data-Types into Cognizance
Nandan Kumar Jha, Sparsh Mittal
IEEE Transactions on Computers 2020
arXiv
DRACO: Co-Optimizing Hardware Utilization and Performance of DNNs on Systolic Accelerator
Nandan Kumar Jha, Shreyas Ravishankar, Sparsh Mittal, Arvind Kaushik, Dipan Mandal, Mahesh Chandra
ISVLSI 2020
arXiv · Slides
For the complete publication list, see Google Scholar.
Random Matrix Analysis Reveals Capacity Bottlenecks in Transformer Multi-Head Attention
Quantum Zeitgeist · July 2025
Cracking the code of private AI: The role of entropy in secure language models
NYU Tandon School of Engineering · March 2025
Team streamlines neural networks to be more adept at computing on encrypted data
NYU Tandon · TechXplore · ScienceDaily · 2021
Making Private AI Practical: A Review of “Entropy-Guided Attention for Private LLM”
by Roma Shusterman, CTO at Brain Electrophysiological Laboratory (BEL) · March 2025
NYU Tandon graduate students bring a wealth of experience to Brooklyn
NYU Tandon School of Engineering · March 2025
Conferences: NeurIPS (2023–2026), ICLR (2024–2026), ICML (2024–2026), CVPR 2024, ICCV 2025, AISTATS 2025, AAAI 2025
Journals: TMLR (2025–2026), TIFS 2025, JETC 2020