Using classification as a tool for discovery

Brent R. Stockwell, Ph.D.
7 min read · Dec 5, 2022


Classification and pattern recognition are crucial for seeing similarities among seemingly unrelated phenomena and hidden differences among seemingly identical phenomena. Classification is a powerful tool for discovery.

As I described in a recent article, classifying the types of cell death caused by lethal compounds enabled me to discover and name a new type of cell death that I termed ferroptosis.

Indeed, as I described recently, our work on this new type of cell death is largely what put me in the top 1% of cited researchers.

Classification was key to this discovery, because my students and I needed to know that the cell death that we were observing was different from the well-studied types of cell death. By knowing the existing classes of cell death, we could recognize something new.

Classification was also crucial in the centuries-long search to identify the cause of movement disorders. The quest began when a young monk, Peter of Herental, in the German town of Aachen observed a strange phenomenon of unrestrained dancing around the time of St. John’s Day in the summer of 1374. Hundreds of men and women held hands and danced and writhed for hours in their homes, in churches, and in the streets. Some dancers were cured at a monastery named after St. Vitus, a Christian saint from Sicily and the patron saint of dancers, but the non-stop “dancing” spread to other towns. The phenomenon became known as St. John’s Dance, or St. Vitus’s Dance. The causes considered at the time included demonic possession, failed baptism, and spiritual illness.

St John’s Dance in 1374 (created with Jasper.ai)

The scientist Paracelsus was born in 1493 in what is now Switzerland. He humbly chose this name for himself in the expectation of surpassing the influence of Celsus, the 1st-century writer who summarized all of known medicine in a comprehensive encyclopedia. Paracelsus coined the term chorea, from the ancient Greek word for dance, to refer to uncontrolled dancing, and specifically chorea Sancti Viti to refer to St. Vitus’s Dance. Naming a phenomenon is often a first step to understanding it.

Paracelsus recognized different types of chorea, including one that he named chorea naturalis, which he assumed to arise from natural causes. This was a key insight: a similar phenomenon can have different causes and can therefore be categorized on the basis of root causes. Paracelsus introduced us to the power of organizing data by classification, allowing us to find patterns, trends, and unexpected outliers that we would otherwise miss.

Over a century and a half later, another physician would make use of classification to advance our knowledge of movement disorders. Thomas Sydenham, born to a well-off family in England in 1624, described a form of chorea in children that offers a possible explanation for the uncontrolled dancing of the Middle Ages. Now known as Sydenham’s chorea, it is caused by antibodies, raised in response to a bacterial infection, that damage the brain and leave children unable to control their movements. Indeed, the outbreaks of St. Vitus’s Dance in the Middle Ages may have been due to bacterial infections driving this type of chorea.

In the field of data science, machine learning is used to classify data and identify hidden patterns that are too subtle for humans to detect. This approach has been used to recognize faces automatically in photos and videos, to build machines that can drive cars, to translate foreign languages, and, more trivially, to suggest new movies, TV shows, and songs that you will probably enjoy but haven’t heard about yet. Efforts are underway to see whether machine learning can outperform physicians at interpreting clinical imaging data, providing superior diagnoses by seeing patterns in an x-ray or MRI image that are too subtle for human doctors to detect.

Photo by Ali Shah Lakhani on Unsplash

In 2018, Alphabet’s DeepMind unit reported a deep learning artificial intelligence named AlphaZero. More effective than the earlier IBM AI tool named Watson (of Jeopardy fame), AlphaZero taught itself how to win at chess, Go, and the Japanese chess game shogi by playing millions of practice games against itself. AlphaZero beat Stockfish, a specialized computer program that was the reigning world champion of computer chess and was itself capable of beating human competitors.

However, unlike Stockfish, which had no real understanding of chess, AlphaZero developed complex strategies and an intuitive ability to master complex chess maneuvers. This ability came from reinforcement learning on the games it played against itself, in which it learned to recognize the patterns and future possibilities in each board position. As a result, it mastered these games, and could perhaps master many others, at a previously unseen level of performance within a few hours of being introduced to each game. Thus, for certain applications, machine learning can be very effective at spotting subtle and critical patterns.

Photo by Haroon Ameer on Unsplash

Human physicians are also excellent at spotting patterns related to medical issues. Inherited chorea was first recognized as a distinct disease entity in East Hampton, New York. A young man named George, born in 1850 to a family of physicians, observed his first case of inherited chorea when he was eight years old:

“Driving with my father through a wooded road leading from East Hampton to Amagansett, we suddenly came upon two women, mother and daughter, both tall, thin, almost cadaverous, both bowing, twisting, grimacing. I stared in wonderment, almost in fear. What could it mean? My father paused to speak with them and we passed on. Then my Gamaliel-like instruction began; my medical education had its inception. From this point on, my interest in this disease has never wholly ceased.”

Mother and daughter in the woods (created with Jasper.ai)

Here was a medical mystery. What was the cause of this inherited chorea, which George knew was well established in East Hampton even prior to 1797, when George’s grandfather arrived in the town? Efforts to trace the origin of the disease suggested it may have originated in England, in the county of Suffolk, and arrived with the emigration of 700 people to Salem, Massachusetts, under the guidance of John Winthrop. It has been debated whether the uncontrolled movements of individuals with this chorea were taken as signs of witchcraft during the witch trials in England and Salem.

In 1872, George published his first paper, an analysis of chorea that included hereditary cases. The 22-year-old physician worked with his father and grandfather in their medical practice, and although he would publish just three papers in his entire career, his first paper was a major breakthrough. The paper offered a precise and clear description of inherited chorea, which he and his family had observed in the local population. He suggested that this inherited chorea was a distinct disease entity, despite its superficial similarity to other movement disorders. In time, this would be known as Huntington’s chorea, and then Huntington’s Disease or Huntington Disease, in his honor. George Huntington and his father and grandfather saw this pattern and classified these patients as having a disease distinct from other movement disorders. This was the first step to understanding what caused their uncontrolled movements.

Discovering the genetic change that causes Huntington’s Chorea would require more than a century of research. In 1993, an international consortium used samples from Venezuelan patients to discover the single gene that is altered in patients with Huntington’s Disease, now known as the huntingtin gene. This provided a diagnostic test to unambiguously identify patients with the disease, as well as those who might develop the disease as they age. Moreover, hope still abounds that with the identity of the affected gene in hand, experimental drugs being developed will eventually yield a cure that reverses the effects of the mutant gene.

The insights of Paracelsus, Sydenham, and Huntington resulted in major advances in our understanding of the cause of uncontrolled movements. Classification and pattern recognition were essential for these discoveries. By recognizing different forms of movement disorders, we are able to see that some, but not all, of them share a common cause. This ability to correctly classify objects of study is essential to making many breakthroughs.

Classifying movement disorders (created with Jasper.ai)

Nobel Laureate Herbert Simon argued that human intuition is a type of subconscious pattern recognition. Simon noted that chess grandmasters often say that they play by intuition, which may be their subconscious analyzing and recognizing patterns on the chess board that point to an advantageous move. According to this view, the way to get better at recognizing patterns in a specific field of interest is to practice, paying particular attention to the patterns that occur in each situation and to the outcomes that follow.

When doing research, generating art, or even creating a business strategy, it is helpful to classify your observations, to see if they fit existing models. If not, you may have found a new model that breaks the mold.
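For readers who work with data, the idea of checking whether an observation fits an existing class can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-feature measurements, the class names, and the distance threshold are all made up, and the nearest-centroid rule is one simple choice among many, not the method used in any of the work described here.

```python
import math

def centroid(points):
    """Average position of a group of points (one tuple per observation)."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))

def classify(observation, known_classes, max_distance):
    """Return the label of the nearest known class, or None when the
    observation is farther than max_distance from every class centroid.
    A None result flags a candidate for a new, unrecognized class."""
    centroids = {label: centroid(pts) for label, pts in known_classes.items()}
    label, distance = min(
        ((lbl, math.dist(observation, c)) for lbl, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= max_distance else None

# Hypothetical data: two known classes described by two made-up features.
known_classes = {
    "class_a": [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1)],
    "class_b": [(5.0, 5.0), (5.1, 4.8), (4.9, 5.2)],
}

fits_existing = classify((1.1, 1.0), known_classes, max_distance=1.0)
possibly_new = classify((3.0, 9.0), known_classes, max_distance=1.0)
```

Here the first observation falls into an existing class, while the second fits neither and comes back as None. In practice, the hard part is choosing features and a threshold that make "does not fit" meaningful, which is where knowing the existing classes well matters most.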

Dancers and a breakthrough idea (created with Jasper.ai)


Written by Brent R. Stockwell, Ph.D.

Chair and Professor of Biological Sciences at Columbia University. Top Medium writer in Science, Creativity, Health, and Ideas
