PharmaSUG Hybrid Single Day Event
October 20-21, 2022
Exploring the Next Phase of Data Analytics

Our 2022 SDE, the first hybrid conference, was a blast! Thank you to our sponsors for their financial support, our presenters for their insightful talks, our volunteers for helping out, and most of all, our attendees for supporting this hybrid event. All registrants have been provided with secured access to the live recording of the presentations. If you have not received the email with instructions for access, please contact the event organizers. Links to all of the slide presentations are provided below. You can also view our Photo Gallery!

Friday, October 21, 2022 Single-Day Event Presentations

Presentation | Presenter(s) | Slides
Federated Learning and Virtual Data Lake | Sanjay S. Jaiswal, Accenture | Slides (PDF, 1.8 MB)
ADaM Automation Roadblocks | Gustav Bernard, IQVIA | Slides (PDF, 2.1 MB)
Using Git with Your SAS Projects | Chris Hemedinger, SAS | Slides (PDF, 1.7 MB)
Technical Rejection Criteria for Study Data (TRC) and Beyond | Lina Cong, FDA/CDER | Slides (PDF, 748 KB)
Clinical Tables with the Latest in Tplyr | Michael Stackhouse, Atorus Research | Slides (PDF, 579 KB)
A Quick Look at Fuzzy Matching Programming Techniques Using SAS Software | Stephen Sloan, Accenture | Slides (PDF, 1.2 MB)
CDISC SDTM IG v3.4: Subject Visits | Ajay Gupta, Daiichi Sankyo | Slides (PDF, 1.1 MB)
SAS and Open Source Working Together | Jim Box, SAS | Slides (PDF, 1.6 MB)
An Expeditious Approach for Handling Pinnacle 21 Messages | Malini Narreddy, Sanofi | Slides (PDF, 1.2 MB)
CDISC Update: Enhancing Metadata, Documenting Relationships | Diane Wold, CDISC | Slides (PDF, 547 KB)
CDISC Analysis Results Standard | Jeff Abolafia, Pinnacle 21 | Slides (PDF, 1.9 MB)

SDE Committee

SDE Committee (L to R): Matt Becker, Pradeep Bangalore, Margaret Hung, Pallavi Sadhab

Presentation and Seminar Descriptions

ADaM Automation Roadblocks
Gustav Bernard, IQVIA


One of the main roadblocks for ADaM automation is the ADaM standard itself, which gives teams a lot of flexibility and multiple ways to produce the same results while still applying the standard correctly.

ADSL can be very study-specific, but about 65% of it can be automated. Occurrence datasets such as ADAE, ADMH and ADCM are a lot more standard from study to study, and more than 80% can be automated. BDS datasets can be a bit more study-specific, but breaking down the contents of a BDS dataset by complexity helps identify which datasets can be automated. For example, a BDS dataset that only retains parameters from SDTM to ADaM can have more than 80% of its content automated.


CDISC SDTM IG v3.4: Subject Visits
Ajay Gupta, Daiichi Sankyo


The Study Data Tabulation Model Implementation Guide for Human Clinical Trials (SDTMIG) Version 3.4 has been prepared by the Submissions Data Standards (SDS) team of the Clinical Data Interchange Standards Consortium (CDISC). Like its predecessors, v3.4 is intended to guide the organization, structure, and format of standard clinical trial tabulation datasets submitted to a regulatory authority. Version 3.4 supersedes all prior versions of the SDTMIG. In this presentation, I will do a quick walk-through of the updates in SDTMIG v3.4 relative to its predecessor. I will then go over the updated Subject Visits (SV) domain with examples, e.g., the newly proposed mapping to include missed visits and how to use the additional variables in SV.


Federated Learning and Virtual Data Lake
Sanjay S. Jaiswal, Accenture


Federated learning is a relatively new approach that leverages edge computing and data analytics, enabling collaboration across platforms, technologies, data standards, and territories. A virtual data lake provides data access without the requirement to physically share or transfer data. It is platform- and cloud-agnostic, designed as a plug-in component for existing infrastructures. The combination of virtual data lakes and federated learning allows in-situ data access and analytics. This approach enables Life Sciences and Healthcare organizations to offer personalized insights and services by providing access to holistic individual data at scale, and helps identify high-value applications in the real world. It supports innovation: with access to a wider variety of data, organizations can develop new AI-based modules and services and more advanced solutions while preserving data privacy. Finally, it enhances competitiveness by improving current AI algorithms, which now have access to a larger volume of (training) data, and by enhancing services based on improved insights. Business, functional, and technical details will be discussed, along with actual client case studies in early R&D, clinical development, regulatory approval, drug launch, pharmacovigilance, disease prevention, diagnosis and treatment, and long-term disease management, demonstrating the wide efficacy of this data analytics approach.


A Quick Look at Fuzzy Matching Programming Techniques Using SAS Software
Stephen Sloan, Accenture


Data comes in all forms, shapes, sizes and complexities. SAS users across industries recognize that data stored in files and datasets can be, and often is, problematic and plagued with a variety of issues. Data files can be joined without problem when each file contains identifiers, or “keys”, with unique values. However, many files do not have unique identifiers and need to be joined by character values, like names or email addresses. These identifiers might be spelled differently, or use different abbreviation or capitalization protocols. This paper illustrates datasets containing a sampling of data issues; popular data cleaning and user-defined validation techniques; data transformation techniques; traditional merge and join techniques; an introduction to applying SAS character-handling functions for phonetic and fuzzy matching, including SOUNDEX, SPEDIS, COMPLEV, and COMPGED; and an assortment of SAS programming techniques to resolve key identifier issues and to successfully merge, join and match less than perfect, or “messy”, data. Although the programming techniques are illustrated using SAS code, many, if not most, of the techniques can be applied to any software platform that supports character-handling.
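
As the abstract notes, these techniques carry over to other platforms. The following is a minimal Python sketch of edit-distance matching, in the spirit of what COMPLEV computes in SAS (Levenshtein distance). It is not the presenter's code and not SAS's implementation: the function names `levenshtein`, `normalize`, and `fuzzy_match`, the sample names, and the threshold of 2 are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts,
    deletes, or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete
                               current[j - 1] + 1,       # insert
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so capitalization and
    spacing differences do not count as edits."""
    return " ".join(value.lower().split())


def fuzzy_match(left, right, max_distance=2):
    """For each value in left, return the closest value in right
    if it is within max_distance edits (hypothetical threshold)."""
    if not right:
        return []
    matches = []
    for name in left:
        scored = [(levenshtein(normalize(name), normalize(c)), c) for c in right]
        distance, best = min(scored)
        if distance <= max_distance:
            matches.append((name, best, distance))
    return matches


# Invented sample data: two files with no shared unique key.
matches = fuzzy_match(["Jon Smith", "ACME Corp."],
                      ["John Smith", "Acme Corp", "Zeta LLC"])
```

A threshold that is too generous will pair unrelated values, so in practice the cutoff is usually tuned against a manually reviewed sample.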


Clinical Tables with the Latest in Tplyr
Michael Stackhouse, Atorus Research


A lot of the work that goes into creating clinical safety tables is redundant. At its core, many summaries can be broken down into the basics of creating descriptive statistics or counting events and looking at the proportion against some denominator. Tplyr is an R package created to make this process simple, from summarizing results to formatting the data to be presentation ready for output. This presentation will walk through what Tplyr is, what it does, and how it can be used in an organization's clinical reporting process. A special focus will be given to the latest features in Tplyr, looking at the newest tools that have been added and how Tplyr is built to work with Shiny applications, allowing you to not just look at summary results but dive deeper into the source.
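
The "count events and divide by a denominator" pattern described above can be sketched in a few lines. This is plain Python, not Tplyr and not its API; the function name `summarize_counts`, the toy records, and the denominators are invented for illustration, and the column names simply follow common ADaM conventions.

```python
from collections import defaultdict


def summarize_counts(events, denominators, group_key, term_key, subject_key):
    """Count distinct subjects per (term, group) and format as 'n (pct%)'.

    denominators maps each group to its population size, which is
    supplied separately (for example, from subject-level data).
    """
    subjects = defaultdict(set)
    for row in events:
        # A subject with repeated events of the same term counts once.
        subjects[(row[term_key], row[group_key])].add(row[subject_key])

    table = {}
    for (term, group), ids in sorted(subjects.items()):
        n = len(ids)
        pct = 100.0 * n / denominators[group]
        table[(term, group)] = f"{n} ({pct:.1f}%)"
    return table


# Invented toy data: one row per adverse event record.
events = [
    {"USUBJID": "01", "TRTA": "Placebo", "AEDECOD": "HEADACHE"},
    {"USUBJID": "01", "TRTA": "Placebo", "AEDECOD": "HEADACHE"},
    {"USUBJID": "02", "TRTA": "Drug", "AEDECOD": "HEADACHE"},
    {"USUBJID": "03", "TRTA": "Drug", "AEDECOD": "NAUSEA"},
]
denominators = {"Placebo": 4, "Drug": 5}  # assumed population sizes

table = summarize_counts(events, denominators, "TRTA", "AEDECOD", "USUBJID")
```

Keeping the denominator as an explicit input reflects the fact that it usually comes from the full population rather than from the event records themselves.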


CDISC Update: Enhancing Metadata, Documenting Relationships
Diane Wold, CDISC


SDTM metadata was enhanced in v2.0, and further enhancements are planned for the next version of the SDTM. Those metadata changes will be carried into the next version of the SDTMIG. That version will be SDTMIG v4.0, the first major version since SDTMIG 1.0 in 2004.


CDISC Analysis Results Standard
Jeff Abolafia, Pinnacle 21


The CDISC Analysis Results Standard (ARS) Team is charged with enhancing Analysis Results Metadata, automating the generation of analysis results, and providing better traceability and understandability of analysis results and reporting. In this presentation we report on the progress of the CDISC ARS Team. We will provide an update of the proposed data model for storing most analysis results generated from the ADaM ADSL, BDS, and OCCDS data structures and the enhanced metadata model required to generate and understand these results.


An Expeditious Approach for Handling Pinnacle 21 Messages
Malini Narreddy, Sanofi


We, as a sponsor, understand that a Pinnacle 21 report can sometimes be a mountain to climb during study conduct. As we all know, Pinnacle 21 validation checks play a vital role in confirming that regulatory submission data complies with CDISC standards, FDA requirements, etc. Through these checks, we can identify data issues and compliance issues, which can then be evaluated and communicated to the respective departments for resolution. However, the checks triggered are sometimes not easy to understand, either because the Pinnacle 21 rules and explanations lack clarity or because the programmer is inexperienced with the specific scenario. As a result, excess time and effort are spent finding a solution, which is an inefficient use of budget and resources. To alleviate this tedious process, our company has implemented a "Pinnacle 21 Message Action Plan", which serves as an informative guide for programmers on how to tackle each validation check. Our company's experienced programmers collaborated to produce this detailed guide, in which each Pinnacle 21 message is accompanied by additional information; this will be explored in depth with examples in the presentation. In addition to the action plan, we have created a robust tool that appends these columns to a Pinnacle 21 report when it is generated during the conduct of the study. A walk-through of a report and the described additions will be showcased in the presentation.


Using Git with Your SAS Projects
Chris Hemedinger, SAS


Few technologies have done more to advance code collaboration and automation than Git. GitHub's popularity has drawn the attention of all types of programmers including SAS programmers. Many SAS products have direct integration with Git – extending to GitHub. In this session we will cover:
  • What is Git and why do I care?
  • Using Git with SAS Enterprise Guide
  • Using Git with SAS Studio
  • Git functions in Base SAS
  • Where to learn more



SAS and Open Source Working Together
Jim Box, SAS


Open source languages like R and Python are immensely popular and quite useful. Did you know you could write code blocks of Python and R inside of SAS programs? You can also invoke SAS analytics from open source programs. In this presentation, we will summarize all the ways SAS and open source can be used together to solve problems.


Technical Rejection Criteria for Study Data (TRC) and Beyond
Lina Cong, FDA/CDER


Study data is the most important part of the drug application submission. Submitting standardized study data can accelerate the drug review process and make the review more efficient. In order to comply with FDA Study Data Guidance and enforce the CDISC data submission standards, the FDA developed Technical Rejection Criteria for Study Data (TRC) to help industry understand how FDA is using eCTD validations to check conformance. On Sept 15, 2021, eCTD validations for study data in TRC took effect. If a submission fails eCTD validations for study data in TRC, the submission will be rejected. This presentation covers the TRC background, SEND requirements for TRC, an update on the rejection trend for eCTD validations for study data, and TRC rejections and top error reasons. Beyond TRC, Frequently Asked Questions related to study data submissions and study data standards from the eData mailbox will be included.


Introduction to R for the Statistical Programmer
Michael Stackhouse and Jessica Higgins, Atorus Research


In this workshop, statistical programmers will be introduced to the R programming language and the tidyverse, using familiar clinical examples. Attendees will leave with a basic understanding of what R is, what the tidyverse is and why it’s important, and what the open-source landscape has to offer us in the world of clinical statistical programming. Hands-on programming examples will be offered to give attendees some basic knowledge of the tools available in R to support common clinical workflows, such as SDTM, ADaM and clinical TFLs. If you’ve never worked in R before, but want to see how it can be used in your day-to-day tasks, come join us and see what this powerful open-source language has to offer!

Benefits of taking this class
If you’re scared of R or of learning a new language, it shouldn’t be scary anymore after this class. We’ll draw parallels to help make R feel more familiar and help you understand how this all fits in with your everyday work.

Level and pre-requisites
If you’re a SAS programmer who has never touched R or RStudio, this workshop is for you. This is a ground-floor workshop for first contact with R.



Understanding Electronic Submission Components for Regulatory Submission of Clinical Study Data
Prafulla Girase, Alexion AstraZeneca


A regulatory submission of clinical study data also needs to be accompanied by various other electronic submission (eSUB) components such as Define-XML, annotated CRF, study data reviewer’s guide, analysis data reviewer’s guide, etc. This seminar will take a deep dive into each of these components and educate attendees about key contents, best practices, and global considerations (i.e., FDA and PMDA) during preparation of these components. For example, attendees will learn the characteristics of a submission-ready annotated CRF (i.e., annotations, validated bookmarks/links, document properties, etc.). It will also go over key considerations related to preparation of a whole eSUB package for a submission, such as folder structure considerations, final package checklist, regulatory hand-off, etc. The author also plans to share his understanding of EMA's upcoming raw data pilot based on the latest publicly available guidance at the time of this seminar.





Presenter Biographies

Jeff Abolafia

Jeff Abolafia is currently Director of Product Innovation at Pinnacle 21. Previously Jeff held the position of Chief Strategist of Data Standards and was a member of the faculty in the Department of Biostatistics at the University of North Carolina. Jeff has been involved with public health research and data standards for over thirty years. Jeff is a frequent contributor and presenter at PhUSE, SAS Global Forum, PharmaSUG, and CDISC conferences. Jeff co-founded the RTP CDISC User's Group and is a member of the CDISC ADaM Team and several PHUSE Real World Evidence working groups. His areas of interest include real world evidence, mobile health, data standards, regulatory submissions, and bioinformatics.


Gustav Bernard

Gustav Bernard is an Associate Director at IQVIA who has been with the company for 17 years. His work focuses on the implementation of CDISC Standards (SDTM, ADaM and Define-XML) within the IQVIA Global Biostatistics department. He is currently leading the ADaM Innovation Initiatives. He has also created the Define-XML 2.0 automation process within IQVIA and the ADaM Designer application. Gustav earned a Bachelor of Business in computer science in 2004 from the University of the Orange Free State in South Africa.


Jim Box

Jim Box is a Principal Data Scientist at the SAS Institute, where he has been supporting customers' implementation of machine learning for the past seven years. Prior to that he spent 18 years in Clinical Research Organizations, primarily as a statistician and programming director. He has master's degrees in Statistics and in Analytics.


Lina Cong

Lina Cong is a Senior Health Informatics Officer in FDA CDER, Office of Business Informatics, eData team. She has both a medical and computer background, with more than ten years' experience in study data standards, study data submission, and relevant guidance and policy in federal agencies, and ten years' experience in clinical trial data analysis and clinical data management in the pharmaceutical industry.


Prafulla Girase

Prafulla Girase has 20+ years of experience in the biotech industry, including experience in the statistical programming and data standards space. He has worked as an electronic submission (eSUB) lead or co-lead on five NDA/BLA clinical data submission packages for therapies that are currently approved and on the market. Prafulla has experience attending meetings with regulatory agencies (FDA/PMDA) regarding data standards, including attendance at a face-to-face data format consultation meeting with PMDA. He currently works as a Director, Data Standards and Governance at Alexion AstraZeneca Rare Disease, where he is responsible for leading data standards and governance within Statistical Programming. He is a member of CDISC working groups and was also a co-lead of PhUSE's Define-XML Completion Guideline working group. Most recently, he was selected as a member of EMA's focus group for the raw data pilot.


Ajay Gupta

Ajay Gupta is currently at Daiichi Sankyo. He received his master’s degree in Biomedical Engineering from Louisiana Tech University in 2006. Since 2010, he has been a regular presenter at SAS conferences, especially PharmaSUG. He has also been a member of the PharmaSUG conference committee for the past two years, and is interested in topics related to CDISC, Spotfire, Visual Basic for Applications, SAS Grid and SAS Application development.


Chris Hemedinger

Chris Hemedinger is the Director of SAS User Engagement. His talented team looks after SAS online communities, SAS user groups, developer experience and GitHub, tech newsletters, expert webinars and tutorials. Chris is a recovering software developer who helped build popular SAS products such as SAS Enterprise Guide. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies. You can follow Chris on Twitter as @cjdinger.


Jessica Higgins

Jessica Higgins is an experienced R user with over 15 years of experience in Statistics, Bioinformatics, Clinical, and Pharmacokinetic Programming. She has a PhD in biology from the University of North Carolina at Chapel Hill where she was an evolutionary biologist. She has successfully managed Programming and Biostatistical teams and worked to develop and create R programming processes for use in clinical environments. She has extensive experience in the validation of development and production R environments for use in the clinical world.


Sanjay S. Jaiswal

Sanjay S. Jaiswal is a Senior Executive in Accenture’s Applied Intelligence practice focusing on Clinical, Research, PV Analytics and AI/ML-based insights, Safety Signal Detection and RWD/RWE-driven data analytics. He is the global lead for R&D Applied Intelligence and helps clients drive increased impact with data and AI/ML/analytics. He has led teams across major clients to accelerate various parts of Clinical Development, ranging from more targeted site identification to a better understanding of which patients to target, by unlocking the power of traditional and non-traditional clinical data. He has deep expertise in Cloud-based Solutions, Life Sciences R&D Digital Transformation, Data and Technology Architecture and Analytics. Sanjay has a Medical Doctor (MD) / Doctorate of Philosophy (PhD) degree from Northwestern University.


Malini Narreddy

Malini Narreddy is a Senior Data Standards Leader at Sanofi with over 12 years of experience in the clinical industry. She has strong expertise in applying CDISC CDASH and SDTM standards and has worked across numerous therapeutic areas. She is currently leading multiple SDTM projects at Sanofi and serving as a subject matter expert for SDTM. She received her master's degree in Public Administration from Sri Venkateswara University, Tirupati, India.


Stephen Sloan

Stephen Sloan has worked at Accenture in the Services, Consulting, and Digital groups and is currently a senior manager in the SAS Analytics area. He has worked in a variety of functional areas including Project Management, Data Management, and Statistical Analysis. Stephen has had the good fortune to have worked with many talented people at SAS Institute. Stephen has presented at over 20 SAS conferences and been published in professional journals. Stephen has a B.A. cum laude with Honor in Mathematics from Brandeis University, M.S. degrees in Mathematics and Computer Science from Northern Illinois University, and an MBA from Stern Business School at New York University. Stephen graduated 1st in his class with a graduate certificate in Financial Analytics from NYU's Stern Business School.


Michael Stackhouse

Michael Stackhouse is at the cutting edge of data technology within the pharmaceutical industry. He has extensive CDISC experience, working with both Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM) standards, and serving as a subject matter expert for Define.xml. He holds a bachelor’s degree from Arcadia University, where he studied business administration, economics, and statistics. He is a 2020 UC Berkeley School of Information Master of Information and Data Science (MIDS) program graduate, where he worked on projects involving computer vision, natural language processing, cluster computing, and deep learning. Currently, Mike serves as the co-lead of the PHUSE working group Data Visualization and Open-source Technology. Mike and his team at Atorus have developed several open-source R packages, including the Atorus packages Tplyr and pharmaRTF.


Diane Wold

Diane Wold received her Ph.D. in Statistics from the University of North Carolina at Chapel Hill. She worked for Burroughs Wellcome/Glaxo Wellcome/Glaxo Smith Kline in a variety of roles for over 30 years. At the Glaxo Smith Kline merger, she joined the data standards group, and in 2002 she joined the CDISC SDS team. She was also involved in other CDISC teams, including the Protocol Representation Group and SHARE. In 2012 she became involved in the CFAST initiative to develop therapeutic area standards. In 2015 she joined CDISC as an employee.