PharmaSUG Single-Day Event
Thursday, September 28, 2023
Standards, Automation, Technology, AI, and Beyond

PharmaSUG's 7th NC SDE was held on September 28, 2023 at the NC Biotechnology Center at RTP. We had over 100 attendees for the in-person event and virtual seminars. A big "Thank you" to all our attendees and sponsors for supporting this one-day event. Our next NC SDE is scheduled for the fall of 2025. Please feel free to email us your comments and ideas.

Check out the picture gallery from the SDE!



Thursday, September 28, 2023 Single-Day Event Presentations

Presentation Title | Speaker(s) | Slides
What is Machine Learning, Anyway? | Jim Box, SAS | Slides (PDF, 3.6 MB)
BIMO: Ups, Downs, and FDA Crowns - Lessons Learned from BIMO Submission Adventures | Sushant Thakare, AstraZeneca | Slides (PDF, 1.2 MB)
A Paradigm Shift in Clinical Data Preparation: The Power of Graphical Data Flow | Vineet Jain, Nimble Clinical Research | Slides (PDF, 619 KB)
Handling Anti-Drug Antibody (ADA) Data for Efficient Analysis | Sabarinath Sundaram, Seagen Inc. | Slides (PDF, 1.5 MB)
AI-Automation for Post-text to In-text Table Conversion | Mark Pittman, Symbiance | Slides (PDF, 3.9 MB)
Multiple Subject Id/ Rescreened Subjects Challenges in ADaM Programming? | Pradeep Bangalore, Cesta, Inc. | Slides (PDF, 830 KB)
Standardized MedDRA Queries (SMQs): Beyond the Basics; Weighing Your Options | Richann Watson, DataRich Consulting | Paper
Enforcing Standards in an Organization: A Practical 6 Step-Approach | Dany Guerendo Christian, STATProg Inc.; Priscilla Gathoni, Wakanyi Enterprises Inc. | Slides (PDF, 1.6 MB)
Reconstruction of Individual Patient Data (IPD) from Published Kaplan-Meier Curves Using Guyot’s Algorithm: Step-by-Step Programming in R | Ajay Gupta, Daiichi Sankyo Inc.; Natalie Dennis, Daiichi Sankyo Inc. | Slides (PDF, 1.1 MB)
Submission Standards for Real World Data: Gaps, Limitations and Recommendations | Jeff Abolafia, Pinnacle 21 | Slides (PDF, 1.9 MB)
Implementing SDTM More Easily and More Broadly | Diane Wold, CDISC | Slides (PDF, 518 KB)
Experiences of CDER Statistical Analysts with NDA/BLA Reviews: Some Helpful Tips for Sponsors | Liping Sun, FDA CDER | Slides (PDF, 610 KB)

[Photo: Pallavi Sadhab and Pradeep Bangalore]

[Photo: Matt Becker and Priscilla Gathoni]

[Photo: Panel discussion participants]


Training Class and Presentation Descriptions


CDISC ADaM – Principles, Rules, and Complex Examples
Richann Watson, DataRich Consulting
Wednesday, September 27, 2023, 8:30 AM - 12:30 PM


This course will provide a high-level overview of some of the basic ADaM concepts; however, it is assumed that the attendee will be familiar with the different ADaM structures and principles. The course will delve into what is meant by traceability and analysis-ready, and will look at some rules and best practices. However, the primary focus is to illustrate the implementation of some of the more difficult or less common concepts found in both the ADaM Implementation Guide (ADaMIG) and the ADaM Structure for Occurrence Data (OCCDS) documents. The course includes an illustration of the use of criterion variables (CRITy and MCRITy) and record-level and parameter-level population flags (-RFL and -PFL), as well as a demonstration of how to set up time-to-event and questionnaire/rating/scales analysis data sets. In addition, it will go into depth about AEs of special interest and the use of Standardized MedDRA Queries (SMQs) and provide an illustration of how the OCCDS can be used to handle non-typical analyses of events data.

Reasons to Attend:
  1. You have an understanding of basic ADaM structures and principles, but the nuances tend to trip you up, or maybe you just need a refresher on standards and best practices.
  2. You have been asked to implement some less common or more difficult concepts, such as criterion variables (CRITy/MCRITy) or record-/parameter-level population flags (-RFL/-PFL).
  3. You are tasked with creating a data set that deals with adverse events of special interest (AESI) or a non-typical analysis of events.
  4. You need to set up time-to-event, questionnaire, rating, and/or scales analysis data sets, and would like to use the most effective techniques.


Python Programming Masterclass with Comparison to SAS®
Kirk Paul Lafler
Wednesday, September 27, 2023, 1:30 PM - 5:30 PM ET


As a general-purpose programming language used by millions of users and developers around the world, Python offers clear syntax, scalability, versatility, and powerful libraries that add tremendous value for anyone to incorporate into their skill sets. Python’s use cases include programming, analytics, data science, web development, web scraping, text processing, image recognition, game development, artificial intelligence, machine learning, and Internet of Things. What seems to be propelling Python’s dominance is that it is relatively easy to learn, has a large and growing user community, and is freely available as open source. This course is designed for beginners looking to learn Python and/or SAS programming, as well as those seeking to enhance their skill set and career opportunities by learning Python and/or SAS software programming techniques.

Course topics include:
  • What Python is used for.
  • How to develop powerful Python applications using Spyder – an Integrated Development Environment (IDE).
  • Essential Python keywords, operators, statements, and expressions to fully understand what you’re coding and why.
  • Understand the building blocks of basic programs, how to use Python as a programming language, and how to debug programs for accuracy and correctness.
  • Introduction to program control flow using statements, conditions, operators, and loops (if, elif, else, in, not in, for, while, step through, break, continue, search).
  • Call and define functions, specify parameters and arguments, and debug with functions.
  • Open, read, and write different data file types (text, CSV, XLSX, SAS7BDAT, JSON, binary, image), process data files, search a file, print data to a text file, and debug a data file.
  • Introduction to strings, determine the length of strings, traverse strings, compare strings, parse strings, slice strings, and format strings.
  • Introduction to lists and tuples, iterate over a list, create a list, add items to a list, sort a list, delete items from a list, and nested lists.
  • Introduction to Dictionaries and Sets, add items to a dictionary and set, change values in a dictionary and set, remove items from a dictionary and set, and construct subsets.
  • Introduction to dates, times, and time zones; and Python functions.
  • Introduction to Object-oriented Programming, classes as types, object lifecycle, instances, and inheritance.
  • Python program code to access and open data files, conduct exploratory data analysis, perform data cleaning and data transformation, apply logic with comparison and logical operators, use functions as powerful building blocks, identify the most frequent value in a list, produce descriptive statistics, create subsets, sort data, append / concatenate data, transpose data structures, merge / join data, create output files, and produce reports.
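A few of the topics above (identifying the most frequent value in a list, producing descriptive statistics, creating subsets, and sorting) can be illustrated with a minimal sketch. This is not taken from the course materials; the sample values and the function names `most_frequent` and `describe` are hypothetical, and only the Python standard library is assumed.

```python
from collections import Counter
import statistics

# Hypothetical sample values for illustration; not data from the course.
ages = [34, 45, 29, 45, 52, 38, 45, 29]


def most_frequent(values):
    """Return the most common value in a list."""
    return Counter(values).most_common(1)[0][0]


def describe(values):
    """Return basic descriptive statistics as a dictionary."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


summary = describe(ages)

# Create a subset with a comparison operator, then sort it.
over_40 = sorted(v for v in ages if v > 40)

print(most_frequent(ages))
print(summary)
print(over_40)
```

Roughly comparable results in SAS would typically come from procedures such as PROC FREQ, PROC MEANS, and PROC SORT, though the exact mapping depends on the task.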


Submission Standards for Real World Data: Gaps, Limitations and Recommendations
Jeff Abolafia, Pinnacle 21


There has been a rapid increase in the use of RWD to support marketing applications. In 2020 over 75% of NDAs and BLAs submitted to FDA included a RWD study to support safety and/or efficacy claims. Use of such data poses several challenges in terms of bias, traceability, transparency, and compliance of clinical data with current submission standards. While submission standards for data generated from controlled clinical trials are mature and well understood, submission standards for RWD are still in development and not well established. This has the potential to make both regulatory submission and review more difficult. As a result, sponsors must develop an appropriate strategy for submitting RWD.

This presentation provides a high-level summary of the various types of RWD, an environmental scan of the current and possible standards for submitting RWD, and the current standards (CDISC) required for submitting RWD in a marketing application to FDA. Next, given that CDISC standards were designed to represent clinical trial data and are not optimized for RWD, we examine the gaps, issues, and limitations with using CDISC standards to represent and submit RWD. The bulk of the presentation will focus on providing recommendations and solutions, based on FDA guidance and recent marketing applications containing RWD, for submitting studies containing RWD. These recommendations are intended to help sponsors submit RWD that are more compliant with current submission standards and enhance the traceability and documentation required for regulatory review.


Multiple Subject Id/ Rescreened Subjects Challenges in ADaM Programming?
Pradeep Bangalore, Cesta, Inc.


Enrolling patients in clinical trials for rare diseases is quite a challenge. There are often only a small number of patients with a particular rare disease, making it difficult to find enough patients to enroll in a clinical trial. The symptoms and progression of rare diseases can also vary widely from patient to patient, making it challenging to define eligibility criteria for clinical trials. As a result of these challenges, the development of new treatments for these diseases slows and can leave patients without access to the latest therapies. To overcome this challenge, sometimes a patient is rolled over from another similar study if the eligibility criteria are met, or in some cases, the same patient is rescreened after failing the first time. Per the norm, screen-failed subjects’ participation should be collected in the database. In subsequent attempts, the same patient might pass all the criteria and enroll in the study. The SDTM programming team added these repeated/rolled-over patients in Demog Qualifiers (DQ) to be compliant with the SDTMIG. However, it is difficult to bring “this multiple participation of the same subject” into ADSL and summarize it in tables. This presentation provides guidance on handling such scenarios.


What is Machine Learning, Anyway?
Jim Box, SAS


Heard lots of talk about Machine Learning and Artificial Intelligence, but not really sure what it all means and how it is different from the statistics you learned once? In this presentation, we'll look at what machine learning is, look at the basics of the approach and give an overview of some of the more popular algorithms and when they might be used.


Enforcing Standards in an Organization: A Practical 6 Step-Approach
Dany Guerendo Christian, STATProg Inc.
Priscilla Gathoni, Wakanyi Enterprises Inc.


Is your organization struggling to enforce standards? Are complacency and siloed programming within functional units haunting your organization? This paper will explore a 6-step practical approach for organizations to assess standards using CDISC and FDA guidance, show the importance of standards, the possible repercussions to institute for lack of standards adherence, show the importance of a gatekeeper for capturing standards adherence metrics, and finally present a generic macro adherence utility for checking the usage of standards in a study folder. Clear communication of the location and type of standards available within an organization will help eliminate excuses for not using standards. Additionally, an explicit message on the value that standardization brings in increasing efficiency, reducing the need for mundane tasks, and utilizing resources efficiently is explored. Further, creative methods for encouraging the use of standards are mentioned in the paper. Who is the best-suited person to be the gatekeeper in your organization? We investigate the role of a gatekeeper, which is crucial in bridging the gap between the data acquisition stage and the practical implementation of standards.

Several utilities whose main purpose is to check adherence to standards are available. We present a generic utility that can be adopted, with a few adjustments, to complement your organization's platform. Furthermore, we propose that organizations periodically assess the effectiveness of the standards, tools, and utilities in use. In conclusion, we recommend that organizations utilize this 6-step approach and build on it to suit their standards enforcement needs.


Reconstruction of Individual Patient Data (IPD) from Published Kaplan-Meier Curves Using Guyot’s Algorithm: Step-by-Step Programming in R
Ajay Gupta, Daiichi Sankyo Inc.
Natalie Dennis, Daiichi Sankyo Inc.


Secondary analysis may require the use of reconstructed patient-level data from published Kaplan-Meier (KM) curves to support a number of different objectives, including indirect treatment comparisons within the context of economic evaluations. Guyot (2012) developed an algorithm that reconstructs individual patient data (IPD) for time-to-event endpoints using published KM curves. This presentation will provide step-by-step instructions and a use case for executing the Guyot (2012) algorithm to reconstruct IPD from published KM curves in R.
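For readers unfamiliar with the curves involved, the sketch below shows only the forward direction: the standard Kaplan-Meier product-limit estimate computed from patient-level times and event indicators. It is not the Guyot (2012) algorithm, which works in the opposite direction (from published curve coordinates back to approximate patient-level data), and it is written in Python rather than the R used in the presentation. The function name `kaplan_meier` and the sample data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))  # order subjects by follow-up time
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


# Hypothetical data: 7 subjects, 4 events, 3 censored.
times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 1, 0, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

A reconstruction can be checked by running an estimator like this on the reconstructed data and comparing the resulting curve with the published one.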


A Paradigm Shift in Clinical Data Preparation: The Power of Graphical Data Flow
Vineet Jain, Nimble Clinical Research


This presentation explores the paradigm shift from manual statistical programming to a graphical data flow approach in data wrangling & generating SDTM & ADaM datasets, while adhering to CDISC standards. Harnessing visual interfaces, data functions are intuitively connected, streamlining the process and improving efficiency. This innovative method fosters better transparency, traceability, and accuracy, thereby enabling real-time analysis & significantly reducing time spent on dataset generation in clinical trials. Our positive experience indicates the benefits and potential of this graphical approach.


AI-Automation for Post-text to In-text Table Conversion
Mark Pittman, Symbiance


The flurry of interest around AI can bring both excitement and fear to the future of Medical Writing. As in other industries, the reality of automation is usually a mechanism that helps a human accomplish their job faster. We are here to talk about how AI can help bridge the gap between Biostats and Medical Writing.

We'll discuss the gap between the hope (& fear) of AI and the realities that Medical Writing faces, as well as the challenges of open-source resources. With specific AI capabilities, it is possible to take multiple formats for post-text table input (.doc, .pdf, .rtf, etc.) and provide an option to the user to select font style or custom styles, edit titles and footnotes of post text tables, merge multiple relevant post-text tables into a single in-text table, and choose footnotes from any one of the merged tables or from all of the tables used for merging. Both Medical Writing and Biostatistics will achieve time savings and budget reductions using AI-Automation.


Handling Anti-Drug Antibody (ADA) Data for Efficient Analysis
Sabarinath Sundaram, Seagen Inc.


Large molecules have revolutionized the pharmaceutical industry. Because of their complex nature, these therapeutics can be mistaken by the human body for foreign substances, and their interactions with various endogenous proteins may induce an immunogenic response that produces anti-drug antibodies (ADAs). Based on their interaction with antigen binding sites, ADAs are classified as non-neutralizing antibodies (non-Nabs) and neutralizing antibodies (Nabs). These could impair the functionality of the drug by interfering with PK performance, decrease drug efficacy, and trigger serious hypersensitivity reactions. Monitoring ADA is key to evaluating safety, post-marketing surveillance, and defining risk mitigation strategies. High-quality programming support with a solid understanding of ADA data is critical for programmers to map it to relevant CDISC standard tests that serve as a base for efficient and impactful ADA analysis. This paper will illustrate the mapping of unique raw data such as ADA Screening, ADA Confirmation, Nabs data, and titer results from various sources into the Immunogenicity Specimen Assessments (IS) SDTM domain, derive relevant ADA variables at the ADaM level, and share highlights of standard ADA reporting. Moreover, a few unique scenarios, such as how to handle baseline positive and post-baseline positive results in relation to their titer values in a summary report, will be demonstrated with oncology example data. Additionally, this paper briefly touches upon the foundational mechanics of ADA, its impact in clinical trials, and relevant regulatory guidelines.


Experiences of CDER Statistical Analysts with NDA/BLA Reviews: Some Helpful Tips for Sponsors
Liping Sun, FDA CDER


Overview of Division of Analytics and Informatics (DAI); challenges to review the submissions; examples from our real work; summary


BIMO: Ups, Downs, and FDA Crowns - Lessons Learned from BIMO Submission Adventures
Sushant Thakare, AstraZeneca


Hold onto your SAS expert coats and buckle up for a wild ride! In this presentation, we dive headfirst into the thrilling world of the Bioresearch Monitoring (BIMO) package submission to the FDA. Brace yourself for a rollercoaster of emotions as we share the lessons we learned, the pitfalls we encountered, and the triumphs that made it all worthwhile. Starting with a lightning-fast overview, we introduce BIMO like an enigmatic superhero: a formidable force that monitors and safeguards the integrity of clinical research. But what happens when mere mortals, like us, attempt to conquer the bureaucratic maze of FDA regulations? Chaos, confusion, and a pinch of panic, of course!

Our experiences will both entertain and enlighten, ensuring that you're better prepared for the challenges ahead. So, fasten your seatbelts and prepare for a ride through the amusing, frustrating, and ultimately rewarding realm of BIMO!


Standardized MedDRA Queries (SMQs): Beyond the Basics; Weighing Your Options
Richann Watson, DataRich Consulting


Ordinarily, Standardized MedDRA Queries (SMQs) aim to group specific MedDRA terms for a defined medical condition or area of interest at the Preferred Term (PT) level, which most would consider to be the basic use of SMQs. However, what if your study looks to implement SMQs in ways that go beyond the basic use? Whether grouping using algorithmic searching, using weighted terms or not, or through the use of hierarchical relationships, this paper looks to cover advanced searches that will take you beyond the basics of working with SMQs. Gaining insight into this process will help you become more familiar with all types of SMQs and will put you in a position to become the "go-to" person for helping others within your company.


Implementing SDTM More Easily and More Broadly
Diane Wold, CDISC


This talk will cover long-awaited upcoming additions to the SDTMIG, such as representing multiple subject participations, a horizontal format for non-standard variables, a dataset for event adjudication, and changing variable metadata. A number of current projects are aimed at making implementation easier. These include executable conformance rules (CORE), Biomedical Concepts (COSMOS), documents to help researchers in academia (CDISC Basic), and considerations for representing observational study data. Finally, it will discuss the Tobacco Implementation Guide, which has tackled in vitro data and product information as well as clinical study data. For programmers in the pharmaceutical industry, this talk will provide previews of what’s coming in SDTM in the traditional area of human clinical pharmaceutical trials and also review documents intended to help users represent observational and real world data. It will provide an overview of the Tobacco Implementation Guide, which includes approaches to kinds of data that programmers may encounter in the future.



Presenter Biographies

Jeff Abolafia

Jeff Abolafia is currently Director of Product Innovation at Pinnacle 21. Previously Jeff held the position of Chief Strategist of Data Standards and was a member of the faculty in the Department of Biostatistics at the University of North Carolina. Jeff has been involved with public health research and data standards for over thirty years. Jeff co-founded the RTP CDISC User's Group and is a member of the CDISC ADaM and Analysis Results teams and several PHUSE Real World Evidence working groups. His areas of interest include real world evidence, mobile health, data standards, and regulatory submissions.


Pradeep Bangalore

Pradeep Bangalore is the Co-founder of Cesta Inc., which provides Software Solutions and IT Consultancy services to the Pharmaceutical, Biotech, and CRO industries. Pradeep’s passion for data analysis and his commitment to providing clients with the best possible service have made him a highly sought-after consultant. He is known for his ability to translate complex data into actionable insights that can help clients improve their business performance. He is also a highly effective communicator, and he is able to explain complex concepts in a clear and concise way. He is a frequent speaker at industry conferences, and he is always eager to share his knowledge and expertise with others.


Jim Box

Jim Box is a data scientist with the Life Sciences Industry Consultants at SAS. Prior to that, he spent 20 years in the CRO industry, primarily as a study statistician. He holds master's degrees in Statistics from Duke University and Analytics from North Carolina State University and is a frequent presenter at PharmaSUG and other industry conferences.


Dany Guerendo Christian

Dany Guerendo Christian, M.A., is an accomplished Data Analyst (20+ years) and Programmer. She holds a master’s in mathematics from Indiana University and a Graduate Certificate in Applied Statistics from Texas A&M. She is a certified project management professional, PMP®. She programs in SAS, R, and Python. She has extensive leadership experience in clinical data standards (CDISC) for regulatory submission.


Priscilla Gathoni

Priscilla Gathoni is an executive leadership coach, facilitator, author, inspirational and motivational speaker, educator, and voice artist. Dr. Gathoni has delivered inspirational and unforgettable talks at global conferences and community events. As a coach, Dr. Gathoni is charismatic and energetic, using a non-judgmental approach and proven coaching tools, techniques, and frameworks to enable clients to advance in their thinking, ideas, goals, and aspirations and unlock their true potential. As an educator, Dr. Gathoni is a data science professor at the University of Maryland Global Campus. Dr. Gathoni is the Academic Chair for PharmaSUG 2024.


Ajay Gupta

Ajay Gupta is an Associate Director at Daiichi Sankyo. He has around 17 years of experience as data standard lead, project lead, technical lead, and system developer in CRO/Pharmaceutical industry. He received his master's degree in Biomedical Engineering from Louisiana Tech University in 2006. Since 2010, he has been a regular presenter at SAS conferences. He is interested in topics related to CDISC, RWD/RWE, Spotfire, R, Python, Pinnacle21, Visual Basic for Applications, SAS Grid and SAS Application development.


Vineet Jain

Vineet Jain is a seasoned expert in the field of clinical research, with over 17 years of experience supporting statistical programming, statistics, and data management. He is the Founder and CEO of Nimble Clinical Research, a company that provides statistical services to the pharmaceutical, biotechnology, and medical device industries. Vineet's background in computer science, statistics, and machine learning gives him a unique perspective on the use of technology in clinical research. Vineet has established Nimble Clinical Research as a leading provider of clinical research solutions, leveraging cutting-edge technology and industry expertise to deliver high-quality results to clients.


Kirk Paul Lafler

Kirk Paul Lafler is a SAS, SQL, and Python consultant, application developer, programmer, and educator; an adjunct professor at San Diego State University; and an advisor and adjunct professor at the University of California San Diego Extension. He teaches SAS, SQL, Python, Excel, and cloud-based courses, workshops, and webinars around the world. Kirk has been a SAS consultant, application developer, and programmer since 1979; an SQL user since 1985; and a Python programmer since 2017. He is the author of several books, including Exploratory Data Analysis (EDA) By Example (PB&J Press, 2023) and PROC SQL: Beyond the Basics Using SAS, Third Edition (SAS Press, 2019), along with numerous papers and articles on a variety of SAS, SQL, and Python topics. Kirk has served as an invited speaker, educator, keynote, and section leader at SAS conferences, and is the recipient of 27 “Best” contributed paper, hands-on workshop (HOW), and poster awards.


Mark Pittman

For the past three decades, Mark Pittman has been focused on bringing efficient processes together with effective communication. Now, working with the cutting-edge AI-driven Medical Writing tool, ZYLiQ, Mark is able to share how AI is effectively impacting the completion of CSRs and getting FDA submissions across the finish line weeks ahead of schedule.


Liping Sun

Liping Sun is a Statistical Analyst at the FDA, working in the Division of Analytics and Informatics within the Office of Biostatistics in CDER (Center for Drug Evaluation and Research). She has a background in both medicine and statistics. She is interested in enhancing data quality and implementing data standards to improve the review experience for statistical reviewers and analysts. She is involved in efforts to improve the implementation and utilization of data standards within the agency, and she participates in collaborations outside the agency. Before joining the FDA, she performed data analysis for academic epidemiology research at the NIH (National Institutes of Health).


Sabarinath Sundaram

Sabarinath Sundaram has over 10 years of statistical programming experience working in exploratory research studies to Phase III studies, CDISC standards, handling PK/PD/ADA data, and across multiple therapeutic areas. He has a Ph.D. degree in Life Sciences (Biochemistry). He is the Principal Statistical Programmer at Seagen, Inc. as well as PK/PD center of excellence lead.


Sushant Thakare

Sushant Thakare is an accomplished pharmaceutical professional with more than 16 years of experience in the industry. Currently serving as an Associate Director at AstraZeneca, he has made significant contributions in the vaccine and immune therapy unit. Notably, Sushant led the successful submission of a Biologics License Application (BLA) to the FDA, showcasing his expertise in regulatory compliance. He has actively participated in regulatory responses and served as a valuable member of the FDA Ad-comm AZ team. Sushant is responsible for overseeing the Statistical Programming team, ensuring the efficient execution of projects under his purview.


Richann Watson

Richann Watson is an independent statistical programmer and CDISC consultant based in Ohio. She has been using SAS since 1996 and specializes in analyzing clinical trial data and implementing CDISC standards. She is a member of the CDISC ADaM team and various sub-teams. She has presented numerous papers, posters, and training seminars at various conferences. Richann holds a bachelor’s degree in mathematics and computer science and master’s degree in statistics.


Diane Wold

Diane Wold received her Ph.D. in Statistics from the University of North Carolina at Chapel Hill. She worked for Burroughs Wellcome/Glaxo Wellcome/GlaxoSmithKline in a variety of roles for over 30 years. At the GlaxoSmithKline merger, she joined the data standards group, and in 2002 she joined the CDISC SDS team. She was also involved in other CDISC teams, including the Protocol Representation Group and SHARE. In 2012 she became involved in the CFAST initiative to develop therapeutic area standards. In 2015 she joined CDISC as an employee. Among other activities, she is currently working on SDTM variable definitions, SDTM variable roles, and the CDISC Knowledge Base.


Single-Day Event Co-Chairs


Pradeep Bangalore
Cesta, Inc.

Pallavi Sadhab
AstraZeneca

Conference Committee: Matt Becker (SAS), Margaret Hung (MLW Consulting), Eric Larson (IQVIA)
Social Media: Inka Leprince, Emily Hansel
