One of NCAR’s Strategic Imperatives is the training and preparation of the next generation of scientists to continue NCAR’s work. We believe that this training needs to start early, and one of the key elements is preparing students still in high school for the challenges of working with the huge amounts of data required to make progress in our field. Science of all flavors is destined to be based on ever-larger datasets as our technologies and methods improve, so programming skills, the ability to manipulate these datasets proficiently, and a strong understanding of statistics (collectively known as “data analytics”) are all vital for tomorrow’s scientists.
This past summer, NCAR’s Computational and Information Systems Laboratory (CISL) ran the “Data Analytics Bootcamp for High School Students” – an opportunity for 10 Boulder Valley School District high school sophomores and juniors to gain a hands-on introduction to being a data scientist. The five-day workshop was held 22–26 June 2015 at NCAR’s Mesa Lab facility in Boulder.
Demand for data scientists continues to increase as the Big Data era produces data in varieties and volumes far exceeding anything scientists and engineers have ever had to manage before. Effective data analysis – using data to answer practical questions – underpins decision making in many fields and is the power behind many of the most successful web enterprises, including Google, Facebook, Amazon, and Orbitz. For NCAR researchers, effective data analysis also promises to unlock more scientific information from observations and numerical simulations in the geosciences. This bootcamp introduced data analysis concepts through exercises that applied real data to real-life situations. Some of the examples covered concepts in climate, while others were just for fun, such as analyzing the performance of basketball players and pricing used cars.
The workshop was sponsored by CISL’s Institute for Mathematics Applied to the Geosciences (IMAGe) and was provided at no cost to the students. Organizer Dorit Hammerling (IMAGe Project Scientist II) and sponsor/co-organizer Doug Nychka (IMAGe Director) designed the curriculum to be a hands-on and engaging experience for the students. Supported by a team of instructors and programming coaches from NCAR, UCAR, CU Boulder, Colorado School of Mines, and Columbia University, the students were presented with a sequence of 15-minute lessons: 5 minutes of teaching followed by 10 minutes of hands-on exercises for students to apply their new knowledge. Students used a research-level software package called R to carry out the data analysis, while covering six fundamental concepts in data analytics:
By connecting with self-motivated young people as early in their lives as possible, Hammerling and Nychka aim to stimulate their interest and build their skills in using data analysis to solve real problems and prepare them for future careers in science. Hammerling summarized this new workshop’s outcome: “All the students learned about data analysis and developed skills using the R statistical programming environment to solve problems. They left the workshop with R skills that they can readily apply in internships or other employment opportunities. And IMAGe hopes to hire some of these freshly trained people as student assistants to help advance our current research projects.”