About this course
Data science is playing an ever-increasing role in physics. In this two-day tutorial, we will introduce data science as it applies to a variety of fields in physics. The first day of the course introduces the fields of data science and machine learning (ML) as they apply to physics data. The machine learning session will cover both regression and classification algorithms, explain why neural networks work, and describe the practical steps needed to train a model, such as feature engineering, hyperparameter tuning, and validation. We will conclude the first day with an introduction to unsupervised learning techniques (including clustering and random forests), followed by a session introducing both neural networks (NNs) and convolutional neural networks (CNNs).
The second day of the course will provide sessions on advanced topics in data science and machine learning. The first three sessions will cover graph neural networks (GNNs) and large language models (LLMs), introducing the topics and then focusing on their applications to physics.
The final four sessions of the tutorial will cover a range of applications of both machine learning and data science. The session “Assessing Training Data: Material Data APIs” will cover accessing large online databases of materials data to use as training data for machine learning algorithms. The session “Introduction to neural-network quantum states (NQS)” aims to provide a clear understanding of NQS and their broader applications in quantum many-body physics by introducing the theoretical and computational background necessary for constructing NQS, focusing on the quantum harmonic oscillator. The third session of the afternoon, “Using Data Science to Understand Complexity in Soft Matter Systems”, will discuss recent applications of data science and machine learning to understanding complexity in soft matter systems. Finally, the session “Applications of Machine Learning to Biology” will focus on using AI to build “mechanistic foundation models” capable of physics simulations of the brain and the body of the fruit fly.
Who should attend?
Graduate students, postdocs, and other scientists interested in learning how to apply data science to their research should attend this tutorial. The lectures will provide an introduction to data science and its applications in physics. The sessions will also include a large hands-on component in which participants will be given real physics code to manipulate and use throughout the course. We assume that participants will have some experience with Python, NumPy, and Matplotlib at the level of a Software Carpentry course, and we will provide a link to learning materials before the tutorial.
APS Members
Grad students: $200.00
EC: $250.00
APS members/Retired: $250.00
Non-APS members
Grad students: $200.00
EC: $250.00
Other non-members/Retired: $250.00
Topics
- Data visualization and exploratory data analysis
- Regression and classification models
- Unsupervised machine learning
- Neural networks
- Convolutional neural networks
- Graph neural networks
- Large language models
- Databases and APIs
Co-Sponsors
DBIO; DCOMP; DPF; DSOFT
Speakers
- Julie Butler (University of Mount Union), Data Exploration and Visualization
- Jim Pivarski (Princeton University), Introduction to Machine Learning
- Trevor Rhone (Rensselaer Polytechnic Institute), Introduction to Unsupervised Learning
- William Ratcliff (NIST), Introduction to Neural Networks and Convolutional Neural Networks
- Savannah Thais (Columbia University), Introduction to Graph Neural Networks
- John McNally (Wolfram Research), Introduction to Large Language Models and Retrieval-Augmented Generation
- Benjamin Nachman (Lawrence Berkeley National Laboratory), Physics Applications of Large Language Models
- Cormac Toher (The University of Texas at Dallas), Assessing Training Data: Material Data APIs
- Jane Kim (Ohio University), Introduction to neural-network quantum states
- Helen Ansell (Northwestern University), Using Data Science to Understand Complexity in Soft Matter Systems
- Srinivasa Turaga (HHMI Janelia Research Campus), Applications of Machine Learning to Biology
Organizers
- Julie Butler, University of Mount Union
- William Ratcliff, NIST