Nandan Kumar Jha

Ph.D., Electrical and Computer Engineering

New York University

Available for Research Scientist roles

Foundation-model pre-training, representation learning, scaling laws

About me

I recently completed my Ph.D. in Electrical and Computer Engineering at New York University, advised by Prof. Brandon Reagen. My thesis, Nonlinear Representation Dynamics: Spectral Scaling Laws and Applications to Private AI, examines how architecture and learning dynamics jointly shape representation geometry, and how to preserve it under privacy constraints.

My research broadly focuses on how foundation models create, realize, and preserve representational capacity as they scale. I investigate how architectures and nonlinearities shape latent capacity, how optimizers realize it across token regimes, and how to preserve it under deployment constraints. In LLMs, NerVE (ICLR 2026) and Spectral Scaling Laws (EMNLP 2025) characterize how nonlinear transformations within FFNs govern realized capacity, while recent work on optimizer-induced spectral scaling laws reveals how optimizer choice reallocates that capacity across token regimes. On the efficiency side, AERO brings attention-entropy dynamics to private LLM inference, extending earlier work on ReLU-efficient architecture design for private inference, including DeepReDuce (ICML 2021, Spotlight) and DeepReShape (TMLR 2024).

Before my Ph.D., I spent two years at Seagate working on hardware design and signal-integrity verification for solid-state drives and NAND/DRAM memory. That engineering background now informs how I study representation integrity in foundation models.

Education

Ph.D. in Electrical and Computer Engineering, 2020–2026
New York University
M.Tech. (Research) in Computer Science and Engineering, 2017–2020
Indian Institute of Technology Hyderabad
B.Tech. in Electronics and Communication Engineering, 2009–2013
National Institute of Technology Surat

Research Themes

My research investigates how foundation models learn, scale, and adapt beyond what aggregate metrics alone can reveal. I analyze model behavior and dynamics through representation geometry, focusing on how architecture, learning methods, nonlinear transformations, and systems constraints jointly determine what capacity becomes usable in practice. This perspective connects scaling, efficiency, and adaptability by treating learned representations as the medium in which capacity is created, realized, and preserved.

Representation Learning and Scaling Laws for LLMs

I develop frameworks for understanding how language models build representational capacity across layers, token regimes, optimizers, and model scale. While classical scaling laws relate loss to compute, my work characterizes how internal capacity itself scales through nonlinear eigenspectrum dynamics, spectral scaling laws, and optimizer-induced capacity allocation. A key finding is that optimizer choice, not only architecture, determines how much nominal capacity a model actually realizes: models with nearly identical validation loss can still differ sharply in their internal representation geometry. This points toward architecture-optimizer co-design, where architectural choices define the available representational degrees of freedom, while optimizer geometry determines which ones become active and carry variance during training.

For a detailed discussion, see my blog post, Architecture Creates Capacity, Optimizers Realize It.
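As a rough illustration of what "realized capacity" can mean in spectral terms, the sketch below computes two generic effective-rank measures from the eigenspectrum of an activation covariance matrix. It is a minimal example under stated assumptions, not the exact metrics or pipeline from the papers: the function names, the synthetic activations, and the choice of participation ratio and spectral-entropy rank are all illustrative.

```python
import numpy as np

def eigenspectrum(acts):
    """Eigenvalues of the feature covariance of `acts` (rows = tokens, cols = features)."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (acts.shape[0] - 1)
    eig = np.linalg.eigvalsh(cov)      # covariance is symmetric, so eigvalsh applies
    return np.clip(eig, 0.0, None)     # clip tiny negative values from round-off

def participation_ratio(eig):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues); ranges from 1 to len(eig)."""
    return float(eig.sum() ** 2 / np.sum(eig ** 2))

def spectral_entropy_rank(eig):
    """exp of the Shannon entropy of the normalized eigenvalue distribution."""
    p = eig / eig.sum()
    p = p[p > 0]                       # drop exact zeros so log is defined
    return float(np.exp(-np.sum(p * np.log(p))))

# Hypothetical activations: one spreads variance over all 8 directions,
# the other concentrates it in a single direction.
rng = np.random.default_rng(0)
spread = rng.standard_normal((4096, 8))
collapsed = np.outer(rng.standard_normal(4096), rng.standard_normal(8))

for name, acts in [("spread", spread), ("collapsed", collapsed)]:
    eig = eigenspectrum(acts)
    print(name, participation_ratio(eig), spectral_entropy_rank(eig))
```

Both arrays have the same nominal width of 8 features, yet the second uses roughly one effective direction. That gap between nominal and effective dimensionality is the kind of difference that aggregate loss alone does not expose.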

Cryptographically Secure and Efficient Private Inference

I design neural architectures, training methods, and regularization techniques for efficient private inference, where models perform inference directly on encrypted data. This work targets the latency and communication overhead introduced by nonlinear operations, and develops inference-efficient substitutes for these components without sacrificing model quality. In language models, removing nonlinearities can distort attention dynamics: deeper layers can undergo entropy collapse, destabilizing training, while earlier layers can exhibit entropic overload, leaving attention heads under-utilized. I address these failures through normalization alternatives and hierarchical entropy regularization, which restore attention stability and preserve head diversity under private-inference constraints. This line of work uses encrypted inference as a lens to understand how nonlinearities regulate information flow and stabilize attention dynamics.
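The two failure modes above can be read off a simple diagnostic: the Shannon entropy of each attention row, averaged per head. The sketch below is a minimal illustration, assuming pre-softmax scores of shape (heads, queries, keys). The function names and toy inputs are hypothetical, and it shows only the measurement, not the regularizer or normalization alternatives used in my work.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_entropy(scores):
    """Mean row entropy (in nats) per head for scores of shape (heads, queries, keys)."""
    p = softmax(scores, axis=-1)
    row_entropy = -np.sum(p * np.log(p + 1e-12), axis=-1)  # small epsilon guards log(0)
    return row_entropy.mean(axis=-1)

n_tokens = 16
max_entropy = np.log(n_tokens)               # entropy of a uniform attention row

# Hypothetical heads: one attends uniformly, one attends to a single key.
uniform_head = np.zeros((1, n_tokens, n_tokens))
peaked_head = 50.0 * np.eye(n_tokens)[None, :, :]

print(attention_entropy(uniform_head) / max_entropy)  # fraction of the maximum entropy
print(attention_entropy(peaked_head) / max_entropy)
```

A head whose entropy sits near the maximum is attending almost uniformly (the overload regime), while one near zero has collapsed onto single tokens. Tracking this ratio per layer during training gives a view that the loss curve does not.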

Hardware-Aware and Efficient ML

My earlier work explored hardware-aware co-design for DNNs through roofline performance modeling and data-reuse-aware compact architectures. I found that conventional arithmetic-intensity metrics can obscure the data-movement structure that determines inference efficiency, since weights and activations have different reuse patterns across architectures and accelerator memory hierarchies. To capture these effects more faithfully, I proposed data-reuse-aware arithmetic intensity. This systems background guides how I think about compute bottlenecks, data movement, and the interaction between algorithms, model structure, and hardware.
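To make the point about reuse patterns concrete, the sketch below computes classical arithmetic intensity and a roofline bound for a convolution layer, and also reports weight reuse and activation reuse separately. It is a simplified illustration rather than the data-reuse-aware metric itself: it assumes stride 1, same padding, each tensor moved once from memory, and hypothetical hardware numbers.

```python
def conv_layer_counts(h, w, c_in, c_out, k, bytes_per_elem=4):
    """MACs and byte traffic for a stride-1, same-padded k x k convolution.

    Assumes weights, input activations, and output activations are each moved once.
    """
    macs = h * w * c_in * c_out * k * k
    weight_bytes = k * k * c_in * c_out * bytes_per_elem
    act_bytes = (h * w * c_in + h * w * c_out) * bytes_per_elem
    return macs, weight_bytes, act_bytes

def arithmetic_intensity(macs, weight_bytes, act_bytes):
    """Classical arithmetic intensity in FLOPs per byte (1 MAC = 2 FLOPs)."""
    return 2 * macs / (weight_bytes + act_bytes)

def roofline(ai, peak_flops, mem_bw):
    """Attainable FLOP/s: limited by peak compute or by memory bandwidth."""
    return min(peak_flops, ai * mem_bw)

# Hypothetical accelerator: 10 TFLOP/s peak compute, 500 GB/s memory bandwidth.
PEAK_FLOPS, MEM_BW = 10e12, 500e9

for name, k in [("1x1 conv", 1), ("3x3 conv", 3)]:
    macs, wb, ab = conv_layer_counts(56, 56, 64, 64, k)
    ai = arithmetic_intensity(macs, wb, ab)
    print(name,
          "AI:", ai,
          "MACs per weight byte:", macs / wb,
          "MACs per activation byte:", macs / ab,
          "attainable FLOP/s:", roofline(ai, PEAK_FLOPS, MEM_BW))
```

The two layers have the same per-weight reuse but very different per-activation reuse, which a single aggregated intensity number folds together.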

Current Direction: Representation Integrity for Foundation Models

My current research direction is a broader program around representation integrity in foundation models: understanding when learned representations preserve usable degrees of freedom, stable information flow, and adaptability as models scale, train, and operate under real deployment constraints. The analogy draws on signal integrity in chip and hardware design verification: transmitting a signal is not enough; it must remain reliable under noise, timing, routing, and physical constraints.

Similarly, in foundation models, parameters, compute, and low loss do not guarantee usable internal structure. In my work, this appears in failure modes such as entropic overload, which becomes visible through attention-entropy dynamics rather than loss curves, and in matched-loss models that nevertheless differ sharply in realized spectral capacity. My goal is to develop measurements and training methods that diagnose and improve this integrity across optimization, architecture, token regimes, privacy constraints, and continual adaptation.

Selected Papers

Representation Learning and Scaling Laws for LLMs

Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws
Nandan Kumar Jha, Brandon Reagen
Under review, 2026
arXiv · Project · Code · Blog

NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks
Nandan Kumar Jha, Brandon Reagen
ICLR 2026
arXiv · Project · Code

Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?
Nandan Kumar Jha, Brandon Reagen
EMNLP 2025, Main Conference
arXiv · Related code

A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention
Nandan Kumar Jha, Brandon Reagen
HiLD Workshop at ICML 2025
arXiv · News

Cryptographically Secure and Efficient Private Inference

AERO: Entropy-Guided Attention for Private LLM Inference
Nandan Kumar Jha, Brandon Reagen
Under review, 2026; earlier version at AAAI PPAI 2025
Earlier arXiv · Code · Video · Press release

DeepReShape: Redesigning Neural Networks for Efficient Private Inference
Nandan Kumar Jha, Brandon Reagen
TMLR 2024
arXiv · Slides

DeepReDuce: ReLU Reduction for Fast Private Inference
Nandan Kumar Jha, Zahra Ghodsi, Siddharth Garg, Brandon Reagen
ICML 2021, Spotlight
arXiv · Slides · ICML video · Press release

Circa: Stochastic ReLUs for Private Deep Learning
Zahra Ghodsi, Nandan Kumar Jha, Brandon Reagen, Siddharth Garg
NeurIPS 2021
arXiv · Poster

Characterizing and Optimizing End-to-End Systems for Private Inference
Karthik Garimella, Zahra Ghodsi, Nandan Kumar Jha, Siddharth Garg, Brandon Reagen
ASPLOS 2023
arXiv · Code

Hardware-Aware and Efficient ML

ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks
Rajat Saini*, Nandan Kumar Jha*, Bedanta Das, Sparsh Mittal, C. Krishna Mohan (*equal contribution)
WACV 2020
Paper · Code · Video

Modeling Data Reuse in Deep Neural Networks by Taking Data-Types into Cognizance
Nandan Kumar Jha, Sparsh Mittal
IEEE Transactions on Computers 2020
arXiv

DRACO: Co-Optimizing Hardware Utilization and Performance of DNNs on Systolic Accelerator
Nandan Kumar Jha, Shreyas Ravishankar, Sparsh Mittal, Arvind Kaushik, Dipan Mandal, Mahesh Chandra
ISVLSI 2020
arXiv · Slides

For the complete publication list, see Google Scholar.

Recent Highlights

2026

Jun 2026

My Ph.D. thesis, Nonlinear Representation Dynamics: Spectral Scaling Laws and Applications to Private AI, is now online. Thesis

Jun 2026

Released Optimizer-Induced Spectral Scaling Laws with a preprint, project page, code, and blog post. arXiv · Project · Code · Blog

Apr 2026

Successfully defended my Ph.D. thesis at New York University.

Apr 2026

Presented NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks at ICLR 2026 (Rio, Brazil). arXiv · Project

Feb 2026

Served as a panelist on Under the Hood of AI, an expert panel on AI infrastructure at NYU School of Law. Event

2025

Dec 2025

Presented Regularizing the Entropy Landscape of Self-Attention at the OPT Workshop, NeurIPS 2025 (San Diego). Workshop

Nov 2025

Presented Spectral Scaling Laws in Language Models at EMNLP 2025 (Suzhou, China). arXiv

Jul 2025

Presented two works at ICML 2025 workshops (Vancouver): Spectral Scaling Laws at AIW, and RMT analysis of Multi-head Latent Attention at HiLD. AIW · arXiv

May 2025

Received the ECE Student Research Poster Day Award at New York University, including a $1,000 cash prize.

Apr 2025

Gave the CILVR seminar Entropy and Private Language Models at the NYU Center for Data Science. Seminar · Video

Mar 2025

Presented Entropy-Guided Attention for Private LLMs at the AAAI PPAI Workshop. arXiv

Selected Talks

2026

Under the Hood of AI

Panelist, Expert Panel on AI Infrastructure, NYU School of Law

Event page

2025

Entropy and Private Language Models

CILVR Seminar Series, NYU Center for Data Science

Seminar · Video · Slides

2025

Entropy-Guided Attention for Private LLMs

Ploutos AI Fireside Chat

Video

2021

DeepReDuce: ReLU Reduction for Fast Private Inference

ICML Spotlight Talk

Video · Slides

Press & Media

Research Coverage

Random Matrix Analysis Reveals Capacity Bottlenecks in Transformer Multi-Head Attention
Quantum Zeitgeist · July 2025

Cracking the code of private AI: The role of entropy in secure language models
NYU Tandon School of Engineering · March 2025

Team streamlines neural networks to be more adept at computing on encrypted data
NYU Tandon · TechXplore · ScienceDaily · 2021

Article

Making Private AI Practical: A Review of “Entropy-Guided Attention for Private LLM”
by Roma Shusterman, CTO at Brain Electrophysiological Laboratory (BEL) · March 2025

Interview

NYU Tandon graduate students bring a wealth of experience to Brooklyn
NYU Tandon School of Engineering · March 2025

Service

Invited Reviewer

Conferences — NeurIPS (2023–2026), ICLR (2024–2026), ICML (2024–2026), CVPR 2024, ICCV 2025, AISTATS 2025, AAAI 2025
Journals — TMLR (2025–2026), TIFS 2025, JETC 2020

Teaching and Outreach

2024

Guest Instructor, K12 Machine Learning Summer School, New York University

2023

Lead Instructor, K12 Machine Learning Summer School, New York University

2019

Mentor, Artificial Intelligence Summer School, IIT Hyderabad

Contact

nj2049@nyu.edu
1032-3, 10th Floor, 370 Jay Street, Brooklyn, NY 11201
Book an appointment
DM Me