Enhance your PharmaSUG experience by attending optional pre- and post-conference training seminars taught by seasoned experts. Half-day courses are only $175 with a conference registration, or $225 without a conference registration. You can sign up for classes through the conference registration system. Space is limited!

Saturday, May 21, 2022

Course Title Instructor(s) Time Room *
#12 Everything is Better with Friends: Using SAS in Python Applications with SASPy and Open-Source Tooling (Beyond the Basics) Matthew Slaughter & Isaiah Lankham 1:00 PM - 5:00 PM 205

Sunday, May 22, 2022

Course Title Instructor(s) Time Room *
#21 Oncology Study Seminar for Programmers and Biostatisticians Kevin Lee 8:00 AM - 12:00 PM 202
#22 Hands-On Data-Driven Design: Developing More Flexible, Reusable, Configurable SAS Software Troy Hughes 8:00 AM - 12:00 PM 203
#23 Deep Dive into Electronic Submission Components for Regulatory Submission of Clinical Study Data Prafulla Girase 8:00 AM - 12:00 PM 204
#31 Python Programming Seminar – Advanced with Machine Learning Kevin Lee 1:00 PM - 5:00 PM 202
#32 Introduction to R for the Statistical Programmer Mike Stackhouse 1:00 PM - 5:00 PM 203
#33 Clinical Tables in R with GT Phil Bowsher 1:00 PM - 5:00 PM 204
#34 FDA & PMDA Submission Data Requirements David Izard 1:00 PM - 5:00 PM 205

Wednesday, May 25, 2022

Course Title Instructor(s) Time Room *
#41 Driving Miss Data: Data-Driven Techniques Richann Watson 1:00 PM - 5:00 PM 205

* All rooms are located near Griffin Hall



Seminar Registration, Attendance, and Cancellation Policy

  1. You must register for a seminar via the PharmaSUG 2022 conference registration form online.
  2. You may cancel a seminar on or before May 13, 2022, and receive a full refund minus a $25 administration fee per cancelled seminar.
  3. You may add a seminar on or before May 13, 2022 for no additional fee. To sign up for an additional seminar after you have already registered for the conference, please contact the Seminar Coordinators.
  4. On or before May 13, 2022, you may swap one seminar for another; however, this is considered a change in conference registration and will incur a $25 administration fee.
  5. After May 13, 2022, you MAY NOT SWAP seminars; however, a new seminar may be added depending on space and availability.
  6. There will be NO REFUNDS after May 13, 2022. However, if you are unable to attend, the seminar material will be provided to you (either by postal mail or email) without additional charge.
  7. Should a seminar be cancelled at any time for any reason, the sole liability of PharmaSUG and the instructor is a refund of the seminar fee, and they are NOT liable for any special or consequential damages arising from the cancellation of the seminar.
  8. On-site registration will be permitted based on space and availability, and payable by major credit card (MC, VISA, Discover, AMEX). However, seminar materials may not be available on-site but will be provided later to paid attendees.
  9. You may sign up for seminars occurring at the same time, i.e., you can attend one class and ask for material for another class, bearing in mind that tuition must be paid for both seminars.

For questions about the above seminar policy and availability, please contact Cindy Song and Natalie Martinez, Seminar Coordinators.




Course Descriptions

Using SAS in Python Applications with SASPy and Open-Source Tooling
Matthew Slaughter, Isaiah Lankham
Saturday, May 21, 2022, 1:00 PM - 5:00 PM


Are you familiar with Python syntax? Want to go beyond the basics, and use SAS and Python together like a pro?

In this hands-on class, we'll practice writing Python scripts in Google Colab (an online implementation of JupyterLab). These Python scripts will link to SAS OnDemand for Academics using the Python package SASPy developed by SAS Institute. We'll also practice using the popular Python package pandas, whose DataFrame objects are the Python equivalent of SAS datasets.

Along the way, we'll work through common data-analysis tasks using both regular SAS code and Python together with the SASPy package, highlighting important tradeoffs for each and emphasizing the value of being a polyglot programmer fluent in multiple languages. Specific examples include advanced data-manipulation techniques, using SASPy as an interface for SAS/STAT, rectangularizing complex JSON-formatted data returned by web APIs, and creating simple Python web applications incorporating SAS analytics.

This class is aimed at intermediate to advanced SAS programmers, but assumes only basic familiarity with Python syntax and pandas DataFrames. However, no knowledge of JupyterLab is assumed. Accounts for Google and SAS OnDemand for Academics will be needed to interact with code examples. All class materials, including complete setup instructions, will be made available through https://github.com/saspy-bffs/pharmasug-2022-class.
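As a rough illustration of what "rectangularizing" nested JSON can mean, the sketch below turns a small nested payload into flat rows, one per visit, using only the Python standard library. The payload, its field names, and the helper functions `flatten` and `rectangularize` are invented for illustration and are not taken from the class materials; in pandas, `json_normalize` covers similar ground.

```python
import json


def flatten(record, prefix=""):
    """Collapse nested dicts into one flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def rectangularize(records, list_key):
    """Return one flat row per item in record[list_key], repeating parent fields."""
    rows = []
    for record in records:
        parent = flatten({k: v for k, v in record.items() if k != list_key})
        for item in record.get(list_key, []):
            rows.append({**parent, **flatten(item)})
    return rows


# Hypothetical API response (invented data, not from the class materials).
response_text = """
[
  {"id": 1, "subject": {"site": "A", "age": 34},
   "visits": [{"day": 1, "score": 7}, {"day": 8, "score": 5}]},
  {"id": 2, "subject": {"site": "B", "age": 51},
   "visits": [{"day": 1, "score": 9}]}
]
"""

rows = rectangularize(json.loads(response_text), list_key="visits")
for row in rows:
    print(row)
```

Each resulting row has the same keys (`id`, `subject.site`, `subject.age`, `day`, `score`), which is the rectangular shape a SAS data set or a pandas DataFrame expects.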
Back to top


Oncology Study Seminar for Programmers and Biostatisticians
Kevin Lee
Sunday, May 22, 2022, 8:00 AM - 12:00 PM


Compared to other therapeutic studies, oncology studies are generally complex and difficult for programmers and statisticians. There is more to understand, such as different clinical study types, specific data collection points, and analyses. In this seminar, programmers and statisticians will learn oncology-specific knowledge in clinical studies and will gain a holistic view of oncology studies, from data collection through CDISC datasets to analysis. Programmers and statisticians will also find out what makes oncology studies unique and learn how to lead oncology study projects effectively.

The seminar will cover four different sub-types and their response criteria guidelines. The first sub-type, solid tumor studies, usually follows RECIST (Response Evaluation Criteria in Solid Tumors). The second sub-type, immunotherapy studies, usually follows irRC (immune-related Response Criteria). The third sub-type, lymphoma studies, usually follows Cheson. Lastly, leukemia studies follow study-specific guidelines (e.g., IWCLL for chronic lymphocytic leukemia). The seminar will show how to use response criteria guidelines for data collection and response evaluation.

Programmers and statisticians will learn how to create SDTM tumor-specific datasets (RS, TU, TR), what SDTM domains are used for certain data collection, and what Controlled Terminology (e.g., CR, PR, SD, PD, NE) will be applied. They will also learn how to create time-to-event ADaM datasets from SDTM domains and how to use ADaM datasets to derive efficacy analysis (e.g., OS, PFS, TTP, ORR, DFS) and Kaplan-Meier curves using SAS procedures such as PROC LIFETEST and PROC PHREG.
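For readers who want a feel for what sits behind a Kaplan-Meier curve, here is a minimal Python sketch of the product-limit estimate, S(t) = Π (1 − d_i / n_i) over event times t_i ≤ t, where d_i is the number of events at t_i and n_i the number still at risk. The function name `kaplan_meier` and the toy data are invented for illustration; this is not seminar material and not a substitute for a validated procedure such as PROC LIFETEST.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times  -- follow-up time per subject
    events -- 1 if the event was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Subjects censored at t still count as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Invented toy data: months of follow-up and event flags (1 = event, 0 = censored).
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.4f}")
```

The sketch treats a subject censored at time t as still at risk for events at t, which is the usual tie-handling convention.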

Finally, programmers and statisticians will understand how to build end-to-end standards-driven oncology studies from protocol, study sub-types, response criteria, data collection, SDTM and ADaM to analysis.


Hands-On Data-Driven Design: Developing More Flexible, Reusable, Configurable SAS Software
Troy Hughes
Sunday, May 22, 2022, 8:00 AM - 12:00 PM


Attend and receive a FREE copy of the author's 550-page book, SAS® Data-Driven Development: From Abstract Design to Dynamic Functionality, Second Edition, released in May 2022. Students will receive the physical book at the course, and can run all code during the course using SAS Display Manager, SAS Enterprise Guide, SAS University Edition, or SAS OnDemand for Academics.

This HANDS-ON workshop installs the student as the new SAS consultant within Scranton, Pennsylvania’s most infamous paper supply company — charged with improving software functionality and performance through data-driven software design. Navigate office intrigue and antics to gather software requirements, analyze hardcoded legacy SAS programs, and refactor and improve software through data-driven design. Students can run all examples. Help Jim, Dwight, Phyllis, and Stanley sell more paper through higher quality data-driven software!

Data-driven design describes software in which configuration items, business rules, data validation rules, data models, data dictionaries, report style, and other dynamic elements are maintained in external data structures – NOT in underlying code. Benefits include increased software flexibility, reusability, maintainability, modularity, readability, interoperability, extensibility, and configurability.

Topics include:
  • Compare preferred data-driven design with undesirable hardcoded design
  • Build reusable procedures, functions, and call routines (subroutines) using SAS macros and PROC FCMP (the SAS function compiler)
  • Demonstrate built-in and user-defined data structures (e.g., parameters, macro lists, arrays, hash objects, control tables, configuration files, data sets, Excel, CSV, CSS)
  • Use SAS components that support data-driven development (e.g., CALL EXECUTE, CNTLIN option in PROC FORMAT, SYSPARM option, SAS dictionary tables, SAS arrays, CSSSTYLE option in PROC REPORT)
  • Ingest positional flat files, CSV files, SAS data sets, and other transactional files, and dynamically identify altered file format/structure through prescriptive data dictionaries
  • Create color-coded, “traffic light” quality control reports that automatically identify bad data while standardizing good data
  • Configure the style (e.g., format, font, color scheme, graphics) of data products using user-defined SAS formats and CSS files
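To make the data-driven idea concrete outside of SAS, here is a small Python sketch in which validation limits live in a control table rather than in the code, so changing a limit means editing data, not the program. The control table contents, variable names, and the helpers `load_rules` and `validate` are hypothetical and are not drawn from the course or the book; the course itself works in SAS.

```python
import csv
import io

# Hypothetical control table; in practice this would live in an external CSV file.
RULES_CSV = """variable,min,max
age,18,90
weight_kg,30,250
"""


def load_rules(text):
    """Read validation limits from CSV text into {variable: (min, max)}."""
    reader = csv.DictReader(io.StringIO(text))
    return {r["variable"]: (float(r["min"]), float(r["max"])) for r in reader}


def validate(record, rules):
    """Return the names of variables that are missing or outside their limits."""
    bad = []
    for name, (low, high) in rules.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            bad.append(name)
    return bad


rules = load_rules(RULES_CSV)
print(validate({"age": 45, "weight_kg": 400}, rules))  # ['weight_kg']
print(validate({"age": 17, "weight_kg": 70}, rules))   # ['age']
```

Adding a new rule is a one-line change to the control table; the `validate` function never needs to know which variables exist.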



Deep Dive into Electronic Submission Components for Regulatory Submission of Clinical Study Data
Prafulla Girase
Sunday, May 22, 2022, 8:00 AM - 12:00 PM


A regulatory submission of clinical study data must be accompanied by various other electronic submission (eSUB) components, such as Define-XML, the annotated CRF, the study data reviewer’s guide, and the analysis data reviewer’s guide. This seminar will take a deep dive into each of these components and educate attendees about key contents, best practices, and global considerations (i.e., FDA and PMDA) during preparation of these components. For example, attendees will learn the characteristics of a submission-ready annotated CRF (e.g., annotations, validated bookmarks/links, document properties). The seminar will also cover key considerations in preparing a whole eSUB package for a submission, such as folder structure, PDF validation practices, the final package checklist, and regulatory hand-off. The instructor also plans to share general insights from his practical experience of attending a face-to-face data format consultation meeting with PMDA.


Python Programming Seminar – Advanced with Machine Learning
Kevin Lee
Sunday, May 22, 2022, 1:00 PM - 5:00 PM


This seminar covers advanced Python programming. It is recommended for those who took last year’s Python course, or for those who have some Python knowledge and want to learn more advanced techniques. The seminar will also cover Machine Learning implementation using Python.

Agenda for the advanced Python programming seminar:
  • Simple review of the basic Python Programming seminar
  • Metadata analysis (PROC CONTENTS)
  • Advanced programming – transpose, remove duplicate records, group-by
  • Statistical analysis – paired t-test, Fisher's Exact Test, survival analysis
  • Data visualization – scatterplot, histogram, Kaplan-Meier curves
  • Machine learning introduction – concepts and theory
  • Machine learning algorithms – regression, logistic regression, decision trees, random forest, XGBoost, K-means clustering, KNN
  • Deep learning algorithms – Deep Neural Network (DNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN)
  • Python machine learning modules – scikit-learn (sklearn), TensorFlow, Keras
  • Python machine learning workshop using image data
Through the seminar, programmers and statisticians will gain the following:
  • Deeper understanding of Python programming
  • Jupyter Notebook download and experience
  • Real time Python coding exercise
  • Differences and similarities with SAS programming
  • Data manipulation and analysis in Python
  • Machine learning programming in Python



    Introduction to R for the Statistical Programmer
    Mike Stackhouse
    Sunday, May 22, 2022, 1:00 PM - 5:00 PM


    In this workshop, statistical programmers will be introduced to the R programming language and the tidyverse, using familiar clinical examples. Attendees will leave with a basic understanding of what R is, what the tidyverse is and why it’s important, and what the open-source landscape has to offer us in the world of clinical statistical programming. Hands-on programming examples will be offered to give attendees some basic knowledge of the tools available in R to support common clinical workflows, such as SDTM, ADaM and clinical TFLs. If you’ve never worked in R before, but want to see how it can be used in your day-to-day tasks, come join us and see what this powerful open-source language has to offer!


    Clinical Tables in R with GT
    Phil Bowsher
    Sunday, May 22, 2022, 1:00 PM - 5:00 PM


    RStudio will be presenting an overview of the gt R package for the R user community at PharmaSUG. This is a great opportunity to learn and get inspired about new capabilities for generating TFLs (Tables, Figures, and Listings) for inclusion in Clinical Study Reports created in R. In this workshop, we will review and reproduce a subset of common table outputs used in clinical reporting containing descriptive statistics, counts, and/or percentages.

    No prior knowledge of R or RStudio is needed. This short talk will provide an introduction to gt as a flexible and powerful package for generating tables as part of your research and reporting TFL programming. The talk will provide an introduction to TFL-producing R programs and include an overview of the gt R package with applications in drug development such as safety analysis and Adverse Events. A live environment will be available for attendees to explore the tables in real time.


    FDA & PMDA Submission Data Requirements
    David Izard
    Sunday, May 22, 2022, 1:00 PM - 5:00 PM


    The binding guidance documents requiring you to provide data and related documentation based on US FDA endorsed data standards as part of your electronic submission are in effect for both clinical and non-clinical assets. These documents have moved the needle with respect to Sponsor and CRO organization obligations in terms of how they plan and execute studies as well as prepare study assets for inclusion in a regulatory submission. But it is not just the US FDA when it comes to including data in a submission; Japan's PMDA has moved beyond the pilot phase into the voluntary phase with an eye on requiring submissions based on their endorsed data standards in 2020.

    This highly interactive seminar will review each asset, its role in the submission, and the impact that these final guidance documents have on how the asset is handled as it weaves its way through the drug development lifecycle on its way to regulators. Along the way, we will review the similarities and key differences in executing these same tasks when interacting with Japan's PMDA. A portion of the seminar will be dedicated to a discussion of "hot off the press" topics, including a review of FDA & PMDA behavior since these documents were finalized, including Sponsor feedback during the review period. We will also explore how other global regulatory bodies are embracing standards, with a focus on Canada, Europe and China.

    Audience Level: Beginner to Intermediate - individuals who are new to the Pharmaceutical industry would benefit greatly from the opportunity to put their hard work creating analysis datasets and TLFs into the context of a regulatory submission. Experienced professionals who have created submission assets in the past and are looking for a refresher on recent changes to FDA & PMDA requirements, CDISC standards, and the outlook on submission data requirements for other global regulatory bodies would also enjoy this seminar.


    Driving Miss Data: Data-Driven Techniques
    Richann Watson
    Wednesday, May 25, 2022, 1:00 PM - 5:00 PM


    We have all been there. We write a program based on the data we have. Then, we get new data and we must update the program. Making these updates can be time consuming. Not only must you update the production version of the program, but someone must also update any associated validation or QC programs. Wouldn’t it be nice if there were ways around this? This is where data-driven techniques come in handy. Using detailed examples, you will learn how to write robust code that is ready to handle an unexpected bend in the road! This half-day course will cover advanced techniques such as: discovering and using information about data sets and variables even if it's not known in advance; generating dynamic formats that are based on the data instead of hard-coded into your program; using complex looping structures to control your program flow based on the data; building code on the fly, even from within a DATA step; and much more!





    Instructor Biographies

    Phil Bowsher

    Phil is the Director of Healthcare and Life Sciences at RStudio and founder of the R in Pharma gathering at Harvard University. Phil is a published author and award-winning speaker, having given over 100 R talks and workshops in 4 countries to an estimated 20,000 people. His work focuses on innovation in the pharmaceutical industry, with an emphasis on interactive web applications, reproducible research and open-source education. He is interested in the use of R with applications in drug development and is a contributor to conferences promoting science through open data and software. Phil (RStudio Shiny Train-the-Trainer certified) has been one of the foremost promoters of Shiny, R Markdown, and the Tidyverse in the drug development process, documenting and explaining each in detail. He has experience at a number of technology and consulting corporations working in data science teams and delivering innovative data products. Phil has over 15 years’ experience implementing analytical programs, specializing in interactive web application initiatives and reporting needs for life science companies.


    Prafulla Girase

    Prafulla Girase has 20+ years of experience in the biotech industry, including experience in statistical programming and the data standards space. He has worked as an electronic submission (eSUB) lead or co-lead on five NDA/BLA clinical data submission packages for therapies that are now approved and on the market. Prafulla has experience attending meetings with regulatory agencies (FDA/PMDA) regarding data standards, including a face-to-face data format consultation meeting with PMDA. He currently works as Director, Data Standards and Governance at Alexion AstraZeneca Rare Disease, where he is responsible for leading data standards and governance within Statistical Programming.


    Troy Hughes

    Troy Martin Hughes has been a SAS practitioner for more than 20 years, has managed SAS projects in support of federal, state, and local government initiatives, and is a SAS Certified Advanced Programmer, SAS Certified Base Programmer, SAS Certified Clinical Trials Programmer, and SAS Professional V8. Since 2013, he has given more than 100 presentations, trainings, and hands-on workshops at SAS conferences, including at SAS Global Forum, SAS Analytics Experience, WUSS, SCSUG, SESUG, MWSUG, PharmaSUG, BASAS, and BASUG. He has authored two groundbreaking books that model software design and development best practices:


    David Izard

    Dave Izard frequently finds himself at the intersection of clinical data standards, regulatory expectations and sponsor organization needs and desires. A pharmaceutical professional since 1997, he currently serves as Programming Director at GlaxoSmithKline, supporting Infectious Disease clinical asset development and GSK’s efforts to expand their regulatory submission capabilities. Earlier opportunities include serving as Senior Director of Clinical Data Standards at Chiltern (Covance), Clinical Data Consulting Lead at Accenture, Head of Octagon Research Solutions' SDTM practice, and a variety of Clinical Programming leadership roles at both GSK and Shire.


    Isaiah Lankham

    Isaiah Lankham is an Advanced SAS Certified Programmer and a polyglot data analyst for the University of California's systemwide office in Oakland, CA, specializing in data management/warehousing using Salesforce and data analysis/visualization using Tableau, SAS, and Python. Initially trained as a mathematician and educator, Isaiah is also an adjunct faculty member for the Statistics Department at California State University, East Bay, and enjoys regularly teaching graduate SAS programming courses.


    Kevin Lee

    Kevin Lee is a Data Scientist, statistician, Machine Learning working group lead, corporate/university trainer and evangelist in new technology. Kevin supports the pharmaceutical industry as AVP of AI/Machine Learning Consultant at Genpact. Among all the therapeutic areas, Kevin has always loved oncology studies, and he is an active supporter of oncology-specific standards such as CDISC tumor datasets, controlled terminology and response criteria for each study type. Kevin wants to innovate the pharmaceutical industry with AI/Machine Learning technology, and he currently leads the PHUSE AI/Machine Learning Working Group. He also teaches Machine Learning and Python programming at universities and corporations. Kevin has presented about 100 papers at various conferences, including many oncology-related and Machine Learning-based papers. Kevin earned an M.S. in Applied Statistics at Villanova University following a B.S. from University of Pennsylvania. Kevin is a lifetime learner who loves to learn and share.


    Matthew Slaughter

    Matthew Slaughter, MSBA is an Advanced SAS Certified Programmer and a Data Scientist at the Kaiser Permanente Center for Health Research in Portland, Oregon. With a focus on clinical prediction modeling, Matthew provides data management, programming, and analytical support to research projects in various topic areas.


    Mike Stackhouse

    Mike Stackhouse, Chief Innovation Officer of Atorus Research, is at the cutting edge of data technology within the pharmaceutical industry. He has extensive CDISC experience, working with both Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM) standards, and serving as a subject matter expert for Define.xml. He holds a bachelor’s degree from Arcadia University, where he studied business administration, economics, and statistics. He is a 2020 UC Berkeley School of Information Master of Information and Data Science (MIDS) program graduate, where he worked on projects involving computer vision, natural language processing, cluster computing, and deep learning. Currently, Mike serves as the co-lead of the PHUSE working group Data Visualization and Open-source Technology. Mike and his team at Atorus have developed several open-source R packages, including the Atorus packages Tplyr and pharmaRTF.


    Richann Watson

    Richann Watson is an independent statistical programmer and CDISC consultant based in Ohio. She has been using SAS since 1996, with most of her experience in the life sciences industry. She specializes in analyzing clinical trial data and implementing CDISC standards. Additionally, she is a member of the CDISC ADaM team and various sub-teams. Richann loves to code and is an active participant and leader in the SAS User Group community. She has presented numerous papers, posters, and training seminars at SAS Global Forum, PharmaSUG, and various regional and local SAS user group meetings. Richann holds a bachelor’s degree in mathematics and computer science from Northern Kentucky University and a master’s degree in statistics from Miami University.