Training Seminars

PharmaSUG 2022 U.S.

Enhance your PharmaSUG experience by attending optional pre- and post-conference training seminars taught by seasoned experts. Half-day courses are only $175 with a conference registration, or $225 without a conference registration. You can sign up for classes through the conference registration system. Space is limited!

Saturday, May 21, 2022

	Course Title	Instructor(s)	Time	Room *
#12	Everything is Better with Friends: Using SAS in Python Applications with SASPy and Open-Source Tooling (Beyond the Basics)	Matthew Slaughter & Isaiah Lankham	1:00 PM - 5:00 PM	205

Sunday, May 22, 2022

	Course Title	Instructor(s)	Time	Room *
#21	Oncology Study Seminar for Programmers and Biostatisticians	Kevin Lee	8:00 AM - 12:00 PM	202
#22	Hands-On Data-Driven Design: Developing More Flexible, Reusable, Configurable SAS Software	Troy Hughes	8:00 AM - 12:00 PM	203
#23	Deep Dive into Electronic Submission Components for Regulatory Submission of Clinical Study Data	Prafulla Girase	8:00 AM - 12:00 PM	204
#31	Python Programming Seminar – Advanced with Machine Learning	Kevin Lee	1:00 PM - 5:00 PM	202
#32	Introduction to R for the Statistical Programmer	Mike Stackhouse	1:00 PM - 5:00 PM	203
#33	Clinical Tables in R with GT	Phil Bowsher	1:00 PM - 5:00 PM	204
#34	FDA & PMDA Submission Data Requirements	David Izard	1:00 PM - 5:00 PM	205

Wednesday, May 25, 2022

	Course Title	Instructor(s)	Time	Room *
#41	Driving Miss Data: Data-Driven Techniques	Richann Watson	1:00 PM - 5:00 PM	205

* All rooms are located near Griffin Hall

Seminar Registration, Attendance, and Cancellation Policy

You must register for a seminar via the PharmaSUG 2022 conference registration form online.
You may cancel a seminar on or before May 13, 2022, and receive a full refund minus a $25 administration fee per cancelled seminar.
You may add a seminar on or before May 13, 2022, for no additional fee. To sign up for an additional seminar after you have already registered for the conference, please contact the [email address protected].
On or before May 13, 2022, you may swap one seminar for another; however, this is considered a change in conference registration and will incur a $25 administration fee.
After May 13, 2022, you MAY NOT SWAP seminars; however, a new seminar may be added depending on space and availability.
There will be NO REFUNDS after May 13, 2022. However, if you are unable to attend, the seminar material will be provided to you (either by postal mail or email) without additional charge.
Should a seminar be cancelled at any time for any reason, the sole liability of PharmaSUG and the instructor is a refund of the seminar fee, and they are NOT liable for any special or consequential damages arising from the cancellation of the seminar.
On-site registration will be permitted based on space and availability and is payable by major credit card (MC, VISA, Discover, AMEX). Seminar materials may not be available on-site, but they will be provided later to paid attendees.
You may sign up for seminars occurring at the same time, i.e., you can attend one class and ask for material for another class, bearing in mind that tuition must be paid for both seminars.

For questions about the above seminar policy and availability, please contact Cindy Song and Natalie Martinez, Seminar Coordinators, at [email address protected].

Course Descriptions

Using SAS in Python Applications with SASPy and Open-Source Tooling
Matthew Slaughter, Isaiah Lankham
Saturday, May 21, 2022, 1:00 PM - 5:00 PM

Are you familiar with Python syntax? Want to go beyond the basics, and use SAS and Python together like a pro?

In this hands-on class, we'll practice writing Python scripts in Google Colab (an online implementation of JupyterLab). These Python scripts will link to SAS OnDemand for Academics using the Python package SASPy developed by SAS Institute. We'll also practice using the popular Python package pandas, whose DataFrame objects are the Python equivalent of SAS datasets.

Along the way, we'll work through common data-analysis tasks using both regular SAS code and Python together with the SASPy package, highlighting important tradeoffs for each and emphasizing the value of being a polyglot programmer fluent in multiple languages. Specific examples include advanced data-manipulation techniques, using SASPy as an interface for SAS/STAT, rectangularizing complex JSON-formatted data returned by web APIs, and creating simple Python web applications incorporating SAS analytics.
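
As a rough illustration of what "rectangularizing" nested JSON can mean, the sketch below flattens a nested record into one row per subject using only the Python standard library. The payload, field names, and the `rectangularize` helper are hypothetical and are not taken from the class materials; with pandas, `pandas.json_normalize` performs a comparable flattening.

```python
import json

# Hypothetical payload shaped like a typical web API response
# (illustrative only; not from the class materials).
payload = json.loads("""
{
  "study": "ABC-001",
  "sites": [
    {"site_id": "01", "subjects": [{"id": "01-001", "age": 54},
                                   {"id": "01-002", "age": 61}]},
    {"site_id": "02", "subjects": [{"id": "02-001", "age": 47}]}
  ]
}
""")


def rectangularize(doc):
    """Return one flat dict (row) per subject, repeating parent-level fields."""
    rows = []
    for site in doc["sites"]:
        for subject in site["subjects"]:
            rows.append({
                "study": doc["study"],
                "site_id": site["site_id"],
                "subject_id": subject["id"],
                "age": subject["age"],
            })
    return rows


rows = rectangularize(payload)
for row in rows:
    print(row)
```

Each resulting row has the same four keys, which is the tabular shape that a DataFrame or a SAS dataset expects.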

This class is aimed at intermediate to advanced SAS programmers and assumes only basic familiarity with Python syntax and pandas DataFrames. No knowledge of JupyterLab is assumed. Accounts for Google and SAS OnDemand for Academics will be needed to interact with code examples. All class materials, including complete setup instructions, will be made available through https://github.com/saspy-bffs/pharmasug-2022-class.

Oncology Study Seminar for Programmers and Biostatisticians
Kevin Lee
Sunday, May 22, 2022, 8:00 AM - 12:00 PM

Compared with studies in other therapeutic areas, oncology studies are generally more complex and more difficult for programmers and statisticians. There is more to understand, such as the different clinical study types, specific data collection points, and analyses. In this seminar, programmers and statisticians will learn oncology-specific knowledge of clinical studies and gain a holistic view of oncology studies, from data collection through CDISC datasets to analysis. They will also find out what makes oncology studies unique and learn how to lead oncology study projects effectively.

The seminar will cover four different sub-types and their response criteria guidelines. The first sub-type, solid tumor studies, usually follows RECIST (Response Evaluation Criteria in Solid Tumors). The second sub-type, immunotherapy studies, usually follows irRC (immune-related Response Criteria). The third sub-type, lymphoma studies, usually follows the Cheson criteria. Lastly, leukemia studies follow study-specific guidelines (e.g., IWCLL for chronic lymphocytic leukemia). The seminar will show how to use response criteria guidelines for data collection and response evaluation.

Programmers and statisticians will learn how to create SDTM tumor-specific datasets (RS, TU, TR), which SDTM domains are used for which data collection, and which Controlled Terminology (e.g., CR, PR, SD, PD, NE) will be applied. They will also learn how to create time-to-event ADaM datasets from SDTM domains and how to use ADaM datasets to derive efficacy analyses (e.g., OS, PFS, TTP, ORR, DFS) and Kaplan-Meier curves using SAS procedures such as PROC LIFETEST and PROC PHREG.
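
For readers who want to see the arithmetic behind a Kaplan-Meier curve, here is a minimal sketch of the product-limit estimator in plain Python. It is not the seminar's code and is no substitute for PROC LIFETEST; the function name and the sample data are hypothetical. Note that the `events` flag here is 1 for an observed event, which is the reverse of the censoring flag (CNSR) conventionally used in ADaM time-to-event datasets.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each distinct event time.

    times  -- follow-up time per subject
    events -- 1 if the event was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Subjects censored at time t still count as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical data: months of follow-up and event flags.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

The survival estimate drops only at times where an event occurs; censored subjects simply shrink the risk set for later time points.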

Finally, programmers and statisticians will understand how to build end-to-end standards-driven oncology studies from protocol, study sub-types, response criteria, data collection, SDTM and ADaM to analysis.

Hands-On Data-Driven Design: Developing More Flexible, Reusable, Configurable SAS Software
Troy Hughes
Sunday, May 22, 2022, 8:00 AM - 12:00 PM

Attend and receive a FREE copy of the author's 550-page book, SAS® Data-Driven Development: From Abstract Design to Dynamic Functionality, Second Edition, released in May 2022. Students will receive the physical book at the course, and can run all code during the course using SAS Display Manager, SAS Enterprise Guide, SAS University Edition, or SAS OnDemand for Academics.

This HANDS-ON workshop installs the student as the new SAS consultant within Scranton, Pennsylvania’s most infamous paper supply company — charged with improving software functionality and performance through data-driven software design. Navigate office intrigue and antics to gather software requirements, analyze hardcoded legacy SAS programs, and refactor and improve software through data-driven design. Students can run all examples. Help Jim, Dwight, Phyllis, and Stanley sell more paper through higher quality data-driven software!

Data-driven design describes software in which configuration items, business rules, data validation rules, data models, data dictionaries, report style, and other dynamic elements are maintained in external data structures – NOT in underlying code. Benefits include increased software flexibility, reusability, maintainability, modularity, readability, interoperability, extensibility, and configurability.
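
The idea can be sketched in a few lines. The example below is written in Python rather than SAS and is not taken from the course or the book: validation rules live in a small control table, and the code only interprets them. The variable names, limits, and helper functions are hypothetical, and the control table is embedded as a string only to keep the sketch self-contained; in practice it would be an external CSV file.

```python
import csv
import io

# Hypothetical control table. In practice this would be an external CSV file
# that can be edited without touching the code below.
CONTROL_TABLE = """variable,min,max
age,18,90
weight_kg,30,250
"""


def load_rules(text):
    """Parse the control table into {variable: (min, max)}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["variable"]: (float(row["min"]), float(row["max"])) for row in reader}


def validate(record, rules):
    """Return the names of variables that are missing or outside their allowed range."""
    failures = []
    for variable, (low, high) in rules.items():
        value = record.get(variable)
        if value is None or not low <= value <= high:
            failures.append(variable)
    return failures


rules = load_rules(CONTROL_TABLE)
print(validate({"age": 45, "weight_kg": 72.5}, rules))  # []
print(validate({"age": 17, "weight_kg": 72.5}, rules))  # ['age']
```

Changing a limit or adding a new variable then means editing the control table only, which is the flexibility and configurability the paragraph above describes.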

Topics include:

Compare preferred data-driven design with undesirable hardcoded design
Build reusable procedures, functions, and call routines (subroutines) using SAS macros and PROC FCMP (the SAS function compiler)
Demonstrate built-in and user-defined data structures (e.g., parameters, macro lists, arrays, hash objects, control tables, configuration files, data sets, Excel, CSV, CSS)
Use SAS components that support data-driven development (e.g., CALL EXECUTE, CNTLIN option in PROC FORMAT, SYSPARM option, SAS dictionary tables, SAS arrays, CSSSTYLE option in PROC REPORT)
Ingest positional flat files, CSV files, SAS data sets, and other transactional files, and dynamically identify altered file format/structure through prescriptive data dictionaries
Create color-coded, “traffic light” quality control reports that automatically identify bad data while standardizing good data
Configure the style (e.g., format, font, color scheme, graphics) of data products using user-defined SAS formats and CSS files

Deep Dive into Electronic Submission Components for Regulatory Submission of Clinical Study Data
Prafulla Girase
Sunday, May 22, 2022, 8:00 AM - 12:00 PM

A regulatory submission of clinical study data must be accompanied by various other electronic submission (eSUB) components, such as Define-XML, the annotated CRF, the study data reviewer’s guide, and the analysis data reviewer’s guide. This seminar will take a deep dive into each of these components and educate attendees about key contents, best practices, and global considerations (e.g., FDA and PMDA) when preparing them. For example, attendees will learn the characteristics of a submission-ready annotated CRF (e.g., annotations, validated bookmarks/links, and document properties). The seminar will also go over key considerations in preparing a whole eSUB package for a submission, such as folder structure, PDF validation practices, the final package checklist, and regulatory hand-off. The author also plans to share general insights from his practical experience of attending a face-to-face data format consultation meeting with PMDA.

Python Programming Seminar – Advanced with Machine Learning
Kevin Lee
Sunday, May 22, 2022, 1:00 PM - 5:00 PM

This seminar covers advanced Python programming topics. It is recommended for those who took last year’s Python course, or for those who have some Python knowledge and want to learn more advanced programming. The seminar will also cover Machine Learning implementation using Python.

Agenda for advanced Python programming seminar

Simple review of basic Python Programming seminar
Metadata analysis (PROC CONTENTS)
Advanced programming – transpose, remove duplicate records, group-by
Statistical analysis – paired t-test, Fisher's Exact Test, survival analysis
Data visualization – scatterplot, histogram, Kaplan-Meier curves
Machine learning introduction – concepts and theory
Machine learning algorithms – regression, logistic regression, decision trees, random forest, XGBoost, K-means clustering, KNN
Deep learning algorithms – Deep Neural Network (DNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN)
Python machine learning modules – Sklearn, Tensorflow, Keras
Python machine learning workshop using image data

Through this seminar, programmers and statisticians will learn the following:

Deeper understanding of Python programming
Jupyter Notebook download and experience
Real time Python coding exercise
Differences and similarities with SAS programming
Data manipulation and analysis in Python
Machine learning programming in Python

Introduction to R for the Statistical Programmer
Mike Stackhouse
Sunday, May 22, 2022, 1:00 PM - 5:00 PM

Clinical Tables in R with GT
Phil Bowsher
Sunday, May 22, 2022, 1:00 PM - 5:00 PM

FDA & PMDA Submission Data Requirements
David Izard
Sunday, May 22, 2022, 1:00 PM - 5:00 PM

Driving Miss Data: Data-Driven Techniques
Richann Watson
Wednesday, May 25, 2022, 1:00 PM - 5:00 PM

Instructor Biographies

Phil Bowsher
Phil is the Director of Healthcare and Life Sciences at RStudio and founder of the R in Pharma gathering at Harvard University. Phil is a published author and award-winning speaker, having given over 100 R talks and workshops in 4 countries to an estimated 20,000 people. His work focuses on innovation in the pharmaceutical industry, with an emphasis on interactive web applications, reproducible research and open-source education. He is interested in the use of R with applications in drug development and is a contributor to conferences promoting science through open data and software. Phil (RStudio Shiny Train-the-Trainer certified) has been one of the foremost promoters of Shiny, R Markdown, and the Tidyverse in the drug development process, documenting and explaining each in detail. He has experience at a number of technology and consulting corporations working in data science teams and delivering innovative data products. Phil has over 15 years’ experience implementing analytical programs, specializing in interactive web application initiatives and reporting needs for life science companies.

Prafulla Girase
Prafulla Girase has 20+ years of experience in the biotech industry, including experience in the statistical programming and data standards space. He has worked as an electronic submission (eSUB) lead or co-lead on five NDA/BLA clinical data submission packages for therapies that are now approved and on the market. Prafulla has experience attending meetings with regulatory agencies (FDA/PMDA) regarding data standards, including attending a face-to-face data format consultation meeting with PMDA. He currently works as a Director, Data Standards and Governance at Alexion AstraZeneca Rare Disease, where he is responsible for leading data standards and governance within Statistical Programming.

Troy Hughes
Troy Martin Hughes has been a SAS practitioner for more than 20 years, has managed SAS projects in support of federal, state, and local government initiatives, and is a SAS Certified Advanced Programmer, SAS Certified Base Programmer, SAS Certified Clinical Trials Programmer, and SAS Professional V8. Since 2013, he has given more than 100 presentations, trainings, and hands-on workshops at SAS conferences, including at SAS Global Forum, SAS Analytics Experience, WUSS, SCSUG, SESUG, MWSUG, PharmaSUG, BASAS, and BASUG. He has authored two groundbreaking books that model software design and development best practices:

David Izard
Dave Izard frequently finds himself at the intersection of clinical data standards, regulatory expectations and sponsor organization needs and desires. A pharmaceutical professional since 1997, he currently serves as Programming Director at GlaxoSmithKline, supporting Infectious Disease clinical asset development and GSK’s efforts to expand their regulatory submission capabilities. Earlier opportunities include serving as Senior Director of Clinical Data Standards at Chiltern (Covance), Clinical Data Consulting Lead at Accenture, Head of Octagon Research Solutions' SDTM practice, and a variety of Clinical Programming leadership roles at both GSK and Shire.

Isaiah Lankham
Isaiah Lankham is an Advanced SAS Certified Programmer and a polyglot data analyst for the University of California's systemwide office in Oakland, CA, specializing in data management/warehousing using Salesforce and data analysis/visualization using Tableau, SAS, and Python. Initially trained as a mathematician and educator, Isaiah is also an adjunct faculty member for the Statistics Department at California State University, East Bay, and enjoys regularly teaching graduate SAS programming courses.

Kevin Lee
Kevin Lee is a Data Scientist, statistician, Machine Learning working group lead, corporate/university trainer and evangelist in new technology. Kevin supports the pharmaceutical industry as AVP of AI/Machine Learning Consultant at Genpact. Among all the therapeutic areas, Kevin always loves oncology studies, and he is an active supporter of oncology-specific standards such as CDISC tumor datasets, controlled terminology and response criteria on each study type. Kevin wants to innovate the pharmaceutical industry with AI/Machine Learning technology, and he currently leads the PHUSE AI/Machine Learning Working Group. He also teaches Machine Learning and Python programming at universities and corporations. Kevin has presented about 100 papers at various conferences, including many oncology-related and Machine Learning based papers. Kevin earned an M.S. in Applied Statistics at Villanova University following a B.S. from University of Pennsylvania. Kevin is a life-time learner who loves to learn and share.

Matthew Slaughter
Matthew Slaughter, MSBA, is an Advanced SAS Certified Programmer and a Data Scientist at the Kaiser Permanente Center for Health Research in Portland, Oregon. With a focus on clinical prediction modeling, Matthew provides data management, programming, and analytical support to research projects in various topic areas.

Mike Stackhouse
Mike Stackhouse, Chief Innovation Officer of Atorus Research, is at the cutting edge of data technology within the pharmaceutical industry. He has extensive CDISC experience, working with both Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM) standards, and serving as a subject matter expert for Define-XML. He holds a bachelor’s degree from Arcadia University, where he studied business administration, economics, and statistics. He is a 2020 UC Berkeley School of Information Master of Information and Data Science (MIDS) program graduate, where he worked on projects involving computer vision, natural language processing, cluster computing, and deep learning. Currently, Mike serves as the co-lead of the PHUSE working group Data Visualization and Open-source Technology. Mike and his team at Atorus have developed several open-source R packages, including the Atorus packages Tplyr and pharmaRTF.

Richann Watson
Richann Watson is an independent statistical programmer and CDISC consultant based in Ohio. She has been using SAS since 1996 with most of her experience being in the life sciences industry. She specializes in analyzing clinical trial data and implementing CDISC standards. Additionally, she is a member of the CDISC ADaM team and various sub-teams. Richann loves to code and is an active participant and leader in the SAS User Group community. She has presented numerous papers, posters, and training seminars at SAS Global Forum, PharmaSUG, and various regional and local SAS user group meetings. Richann holds a bachelor’s degree in mathematics and computer science from Northern Kentucky University and a master’s degree in statistics from Miami University.