Mo Shakiba · NeuroAI researcher



Dated: 3 April 2025 · Essay

Novel Statistical Learning Methods in Neural Coding


Introduction

Statistical learning is about finding the right frame for the picture we have taken (our data) through the lens of statistical models. The chosen frame can later be used to capture a similar picture (new data), helping us generalize and make inferences about unseen information. The study of neural coding aims to capture neural activity across various settings to uncover the underlying structure of the brain’s activity in cognition, memory, language processing, and more. The challenge arises when the pictures we capture exceed our framing power in number or complexity. In this essay, I first justify the use of novel approaches to study neural activity and explain why they are necessary. Second, I discuss how novel statistical learning methods such as machine learning algorithms, manifold theory, and probabilistic models provide a robust framework for understanding neural coding in modern neuroscience, where data is often vast and complex. Finally, I argue that before rushing to set up new experiments to collect more data, we should first use existing data to benchmark and validate these novel methods.


A Shift In Paradigm: The Neural Population Doctrine

Most brain functions rely on the activity and interactions of many neurons. Recent advances allow neuroscientists to record from a vast number of neurons, and “there are settings in which data fundamentally cannot be understood on a single neuron” ‹a›. For instance, in macaques, individual neurons exhibit highly variable and often inconsistent responses during motor learning tasks, whereas the entire neural population reveals a coherent pattern of activity ‹b›. In the 2019 review paper Towards the Neural Population Doctrine ‹c›, the authors propose “a future where the central scientific theme is not the neuron doctrine, but the neural population doctrine.” That future has arrived, but the caveat is that the data we have is high-dimensional (thousands of neurons), complex, and poorly suited to traditional statistical methods. If each of just 20 recorded neurons is treated as having two states, ON and OFF, then the number of possible population states exceeds one million (2^20 = 1,048,576). Although the number of state patterns actually observed is smaller, because it is constrained by neuronal connectivity, we still require novel statistical frameworks to extract meaningful insights and study brain functions from the perspective of neural populations.
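
The count for the 20-neuron example can be checked directly. The short Python sketch below computes the number of binary population states and, purely for illustration, compares it with the number of time bins in a hypothetical recording. The one-hour duration and 10 ms bin width are assumptions of mine, not figures from the cited papers.

```python
def n_binary_states(n_neurons: int) -> int:
    """Number of ON/OFF patterns a population of n_neurons can take."""
    return 2 ** n_neurons


n_states = n_binary_states(20)
print(n_states)  # 1048576

# Hypothetical recording (assumed values): 1 hour at 10 ms per time bin.
n_bins = 60 * 60 * 100

# Each time bin shows exactly one pattern, so the number of bins is an
# upper bound on how many distinct patterns the recording can contain.
max_fraction_observed = min(1.0, n_bins / n_states)
print(f"at most {max_fraction_observed:.1%} of all patterns can be observed")
```

Even under these generous assumptions, the recording cannot visit most of the possible patterns, and the count doubles with every additional neuron.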


Neural Manifolds

Novel statistical learning methods hold promise in unraveling the information contained within high-dimensional data. Machine learning algorithms can learn the underlying patterns present in the data and become more powerful as the amount of available data increases. For example, in neural decoding, neural networks and ensemble methods outperform conventional approaches ‹d›. Another example is the study of neural population activity in lower-dimensional spaces. Manifold theory in neuroscience posits that neural activity often resides in a much lower-dimensional (latent) space than the number of recorded neurons suggests. If we could project the activity onto this space, then the geometry and structure of the resulting manifold would be informative about how the brain represents information. This idea, initially confirmed in the fruit fly head direction system, has also been successfully applied to studies of decision-making, the motor system, and the olfactory system, as well as working memory, visual attention, the auditory system, rule learning, speech, and more ‹a›. A novel deep learning algorithm, CEBRA, has been introduced to neuroscience; it jointly uses behavioral and neural data “to produce both consistent and high-performance latent spaces” ‹e›. CEBRA's benchmark performance is reported to be significantly higher than that of traditional methods such as Principal Component Analysis (PCA), demonstrating the power of new approaches to answer long-standing questions. Probabilistic models are also essential in neuroscience since they provide computationally tractable approximations for analyzing high-dimensional population activity ‹f›. Together, these examples show why these novel approaches are becoming increasingly common in neuroscience and why they should be adopted when navigating the vast complexity of neural data.
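
The manifold idea can be illustrated with a toy simulation. The sketch below is not CEBRA and does not use real recordings: it simulates a hypothetical population of 100 cosine-tuned neurons driven by a single heading angle (loosely inspired by the head direction example), then applies plain PCA through NumPy's SVD. The tuning model, noise level, and population size are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_samples = 100, 2000

# Hypothetical latent variable: a heading angle drawn uniformly from the circle.
theta = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)

# Assumed tuning model: each neuron has a preferred direction and cosine tuning.
preferred = np.linspace(0.0, 2.0 * np.pi, n_neurons, endpoint=False)
rates = np.cos(theta[:, None] - preferred[None, :])        # (samples, neurons)
activity = rates + 0.1 * rng.standard_normal(rates.shape)  # add Gaussian noise

# PCA via SVD of the mean-centered activity matrix.
centered = activity - activity.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Project the 100-dimensional activity onto the first two principal components.
latent = centered @ components[:2].T                        # (samples, 2)
radius = np.linalg.norm(latent, axis=1)

print(f"variance explained by 2 PCs: {explained[:2].sum():.2f}")
print(f"latent radius: mean {radius.mean():.2f}, std {radius.std():.2f}")
```

In this simulation, two components capture nearly all of the variance, and the projected points lie at a nearly constant radius, so the 100-dimensional activity traces out a ring. A linear method suffices here only because the simulated tuning is linear in the cosine and sine of the angle; nonlinear methods target cases where that assumption fails.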


DATA, DATA everywhere

The modern neuroscience landscape is filled with an overwhelming number of digital repositories. DANDI and BossDB are just two examples, collectively providing thousands of terabytes of data. Most contain high-quality data from experiments involving various types of recordings. One could argue that before rushing to design new experiments to test the aforementioned novel statistical methods, we should first explore these datasets, benchmark our tools, and reevaluate our approaches. Collecting more data is warranted only when existing high-quality data cannot answer our questions. We know very little about the data already available, yet we are overwhelmed with tools and techniques. I therefore believe in a future where scientists ensure continuity in the knowledge they acquire and the tools they use.


Conclusion

In conclusion, novel statistical methods provide an exciting avenue for understanding, hypothesizing, and testing models of the brain. The shift towards understanding brain function through population activity rather than single neurons, in tandem with large-scale neural recording data, offers an unprecedented opportunity for neuroscientists to push the boundaries further. In this essay, I first expanded on the emerging paradigm shift from studying single neurons to studying populations of neurons, as framed by the neural population doctrine. I then presented machine learning, manifold theory, and probabilistic models as emerging techniques for addressing the challenge of high-dimensional data. Lastly, I argued that, rather than setting up new experiments to test novel statistical methods, we should first focus on the data already available. The path to deciphering the brain’s complexities lies not in reinventing the experimental wheel, but in applying our statistical tools to the vast neural datasets already before us.

“Data is not information, information is not knowledge and knowledge is not understanding” — Clifford Stoll


a. Dimensionality reduction for large-scale neural recordings
b. A Neural Population Mechanism for Rapid Learning
c. Towards the neural population doctrine
d. Machine Learning for Neural Decoding
e. Learnable latent embeddings for joint behavioural and neural analysis
f. Building population models for large-scale neural recordings