PharmaSUG Single Day Event
Is Now a Virtual Meeting!
October 28-29, 2021
Analytic Evolution: Exploring the Next Phase of Drug Development & Submission

The virtual North Carolina Single-Day-Event (SDE) 2021 was a great success, with 128 attendees from 10 different countries for the October 29 event and 28 students for the previous day's seminar. Thank you to all of our attendees and sponsors for supporting this single-day event. All registrants have been provided with secured access to the live recording of the ten presentations. If you have not received the email with instructions for access, please contact the event organizers. Links to all of the slide presentations are provided below.

Our North Carolina SDE 2022 is scheduled for October 21, with pre-event seminars on October 20. We hope to see you in person next year at RTP, NC!

Sponsored by:
Catalyst Pinnacle21
Clinovo MaxisIT

Friday, October 29, 2021 Single-Day Event Presentations

Presentation | Presenter | Slides
The 80/20 Rule: How AI-Driven Automation Can Improve Efficiency and Quality in Clinical Data Management | Michael Roberson, MaxisIT | Slides (PDF, 321 KB)
Knowledge Graphs as a Foundation for the Analytics (R)Evolution | Tim Williams, UCB | Slides (PDF, 2.8 MB)
Using AI to Understand the Patient Voice | Michael Durwin, ICON | Slides (PDF, 2.7 MB)
Identifying Sources of Bias in Machine Learning Models | Jim Box, SAS | Slides (PDF, 2.5 MB)
Python-izing the SAS Programmer: A Brief Introduction to the World of Objects | Mike Molter, LabCorp | Slides (PDF, 476 KB)
Building a Bigger Analytics Tent at CDER | Rachel Dlugash and Stephen Wilson, FDA/CDER | Slides (PDF, 881 KB)
Avoiding ADaM Sinkholes | Richann Watson, DataRich Consulting; Karl Miller, Clinical Solutions Group, Inc. |
The Analysis Result Standard Project | Diane Wold, CDISC; Jeff Abolafia, Pinnacle 21 | Slides (PDF, 2.1 MB)
Generating .xpt Files with SAS, R and Python | Todd Case and YuTing Tian, Vertex Pharmaceuticals | Slides (PDF, 837 KB)
Industry Projects on the Validation of Open-Source | Michael Stackhouse, Atorus | Slides (PDF, 706 KB)

Event Co-Chairs

Matt Becker

Margaret Hung
MLW Consulting

Seminar and Presentation Abstracts

The 80/20 Rule: How AI-Driven Automation Can Improve Efficiency and Quality in Clinical Data Management
Michael Roberson, MaxisIT

MaxisIT will present a case study of how a mid-size sponsor adopted a simple AI-driven approach to ingesting clinical data across multiple applications, domains, and trials. An established data mapping and standardization strategy enables the system to map the majority of data items based on the system’s past experience. This approach is particularly valuable as the shift to decentralized trials introduces many new data sources and types. The presentation will describe the underlying metadata model, the AI-driven mapping, the process of verifying new mappings, and the automated data refresh process.

By attending this presentation, the audience will learn how AI-driven data ingestion saves time, improves data quality, and enables faster insight into ongoing clinical trials, and how structured metadata and data standards improve the onboarding of new data sources, reducing cost and saving time for current and future data integrations.
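As a rough illustration of mapping data items from past experience and routing new ones to verification, here is a minimal Python sketch. It is not MaxisIT's system: a plain lookup of previously verified mappings stands in for the AI component, and the class name, column names, and target variable names (such as `VISITDT`) are hypothetical.

```python
from dataclasses import dataclass, field


def normalize(name: str) -> str:
    """Reduce a source column name to a comparison key (case and punctuation ignored)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


@dataclass
class MappingMemory:
    """Hypothetical store of verified source-to-standard mappings from earlier loads."""
    known: dict = field(default_factory=dict)

    def learn(self, source_name: str, standard_name: str) -> None:
        """Record a mapping once a reviewer has verified it."""
        self.known[normalize(source_name)] = standard_name

    def map_columns(self, source_names):
        """Split incoming columns into auto-mapped ones and ones needing review."""
        mapped, needs_review = {}, []
        for name in source_names:
            target = self.known.get(normalize(name))
            if target is None:
                needs_review.append(name)
            else:
                mapped[name] = target
        return mapped, needs_review


# Mappings verified on earlier trials (illustrative names only).
memory = MappingMemory()
for src, std in [("Subject ID", "USUBJID"), ("visit_date", "VISITDT"), ("Sys BP", "SYSBP")]:
    memory.learn(src, std)

# A new data source arrives with slightly different column spellings.
mapped, needs_review = memory.map_columns(
    ["SUBJECT_ID", "Visit Date", "sys-bp", "wearable_steps"]
)
print(mapped)
print(needs_review)
```

In this sketch, most columns are mapped automatically and only the unfamiliar one is queued for human review. Once a reviewer confirms it, calling `learn` makes the mapping available to later loads.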

Knowledge Graphs as a Foundation for the Analytics (R)Evolution
Tim Williams, UCB

Adoption of Graph technology is growing within the Pharmaceutical industry at a time when Machine Learning and Artificial Intelligence are poised to revolutionize analytics. Knowledge Graphs provide a strong foundation for both ML and AI. At a more rudimentary level, graph data improves data quality and data integration. The R platform provides many tools for working with graph data, from data conversion to analytics and visualization. This talk includes examples of how graph approaches are being used in the healthcare industry and beyond, including support for FAIR data principles and how the technology is both data-centric and patient-centric at the same time.

Using AI to Understand the Patient Voice
Michael Durwin, ICON

One of the greatest challenges to healthcare in general and to clinical trials specifically is to understand the Voice of the Patient: the direct narrative of their daily challenges, their sentiment regarding their illnesses and treatments, their behavior regarding medication and even the words and phrases patients use to discuss their symptoms. The increasing accuracy of artificial intelligence in social listening tools has allowed social intelligence scientists to understand the Patient Voice, not as filtered through caregivers or healthcare providers but directly from patients. AI turns qualitative patient statements into massive amounts of quantitative data. This data in turn allows healthcare providers to monitor symptoms and even predict disease prior to diagnosis, pharmaceutical companies to recruit for clinical trials, and health organizations to be alerted to disease outbreaks via self-published public dialogue.

During this talk we will look at how AI is helping health organizations to sift through social and digital content to collect legacy and real-time data in order to develop strategic solutions to deal with the many challenges faced by patients and organizations regarding healthcare, specifically clinical trials.

I don’t work in the healthcare industry. Why should I attend?

Even if you don’t work in the healthcare industry, “patients” translates to “customers”. What social intelligence does for clinical trial recruitment can be directly applied to consumer conversion. Treatment perceptions are the same as product perceptions. And marketers have the same need to understand their customers’ behavior as healthcare providers and brands have for understanding what drives their patients’ decisions.

Identifying Sources of Bias in Machine Learning Models
Jim Box, SAS

Artificial Intelligence systems and Machine Learning models are having a dramatic impact on many industries. However, with every story of success, we are seeing instances of biased results doing real harm. In this session, we will look at some of the sources of bias and unexpected results, and explore ways to mitigate the negative impact of these models.

Python-izing the SAS Programmer: A Brief Introduction to the World of Objects
Mike Molter, LabCorp

As the industry looks more and more toward broadening its technological horizons, programmers accustomed to SAS are asking more questions about and experimenting more with open source languages such as Python. As with any other journey into an expansive wilderness, the question of where to start can be daunting. Wherever one does start, it doesn't usually take long to realize that when it comes to object-oriented languages, the comparison to SAS goes beyond superficial syntax differences into something much more fundamental. In this presentation we'll look for commonalities between two apparently very different worlds. The intention is to demonstrate that with a better understanding of objects, maybe the difference between these two worlds isn't so vast after all.

Building a Bigger Analytics Tent at CDER
Rachel Dlugash, FDA/CDER
Stephen Wilson, FDA/CDER

The Analytics and Informatics Staff (AIS) works within the Office of Biostatistics at FDA's Center for Drug Evaluation and Research (CDER) to help push for improvements and efficiencies associated with the regulation of drugs and biologics. The AIS, though relatively new to the Agency, is deeply involved in a number of important projects to promote operational efficiency, support standardization and enhance/expand regulatory review processes at CDER. These activities include the creation of an AIS CDISC Data Standards Study Group for OB, the close collaborative development of high-priority COA/QRS supplements for the Agency, a pilot project to assess natural language processing (NLP) for information base development and the promotion of open source tools. We view this session as an opportunity for all of us to learn about each other and continue to work together to improve.

Avoiding ADaM Sinkholes
Richann Watson, DataRich Consulting
Karl Miller, Clinical Solutions Group, Inc.

The ADaM Implementation Guide was created to help maintain consistency in the development of analysis data sets in the pharmaceutical industry. However, since its inception we have seen issues with guideline non-conformance, which can impede this development process and carry impacts that are felt downstream in subsequent processes. When working with ADaM data sets, non-compliance and other related issues are likely the number one source of numerous hours of re-work, creating unnecessary additional work not only for the data sets themselves but also for reports, compliance checks, the Analysis Data Reviewer's Guide (ADRG), etc., all the way down to the ISS/ISE processes. Considering this breadth of impact, one can see how devastating these sinkholes can be. As with any sinkhole, there is a way out, but it is a long, tedious process that will consume a lot of resources, and it is always better to avoid the sinkhole entirely. This presentation will assist you in creating compliant ADaM data sets and explain why you should avoid these sinkholes, all of which will help minimize re-work and likely eliminate the need for additional work.

The Analysis Result Standard Project
Diane Wold, CDISC
Jeff Abolafia, Pinnacle 21

The presentation will start with a high-level update on current CDISC activities, including the development of extended Analysis Results Metadata (ARM). It will continue with detail on the progress of the ARM project. The aims of the project are to 1) extend the current Analysis Results metadata to improve traceability and facilitate TFL automation and 2) develop an Analysis Results Data Model for storing analysis results.

Generating .xpt Files with SAS, R and Python
Todd Case, Vertex Pharmaceuticals
YuTing Tian, Vertex Pharmaceuticals

The primary purpose of this paper is to lay out a process for generating a simplified Transport (.xpt) file with RStudio and Python to meet the study electronic data submission requirements of the Food & Drug Administration (FDA). The second purpose is to compare the .xpt files created with three different languages: R, Python and SAS. The paper is an expansion of the original FDA guideline document “CREATING SIMPLIFIED TS.XPT FILES”, published in November 2019. Transport files can be created with SAS as well as with open source software; according to the FDA guideline document mentioned above, this includes R and Python. This may allow pharmaceutical companies to expand their use of R and Python beyond the data visualization and statistical analysis for which these two languages are currently used. Hopefully, readers can use the process shown in the paper as a template to create .xpt files.

Industry Projects on the Validation of Open-Source
Michael Stackhouse, Atorus

The world of open-source in the pharmaceutical industry has rapidly evolved over the last few years. Greater focus on the enablement of open-source languages for regulatory submissions has led to exciting new developments. These developments include industry-specific working groups, focusing on different challenges of open-source, and open-source packages focused on clinical submission activities. A key area of focus for these efforts has been the topic of validation. Languages like R and Python bring new challenges in the area of validation, different from those to which the industry is accustomed. This presentation will provide an overview of several different ongoing efforts tackling these challenges in industry across organizations such as PHUSE and R Consortium.

CDISC ADaM – Principles, Rules and Complex Examples
Richann Watson, DataRich Consulting

This course will provide a high-level overview of some of the basic ADaM concepts; however, it is assumed that the attendee will be familiar with the different ADaM structures and principles. The course will delve into what is meant by traceability and analysis ready as well as look at some rules and best practices. However, the primary focus is to illustrate the implementation of some of the more difficult or less common concepts found in both the ADaM Implementation Guide (ADaMIG) and the ADaM Structure for Occurrence Data (OCCDS) documents. The course includes an illustration of the use of criterion variables (CRITy and MCRITy) and record-level and parameter-level population flags (-RFL and -PFL), as well as a demonstration of how to set up time-to-event and questionnaire/rating/scales analysis data sets. In addition, it will go into depth about AEs of special interest and the use of Standardised MedDRA Queries (SMQs) and provide an illustration of how the OCCDS can be used to handle the non-typical analysis for events data.

Why you should take this seminar if you work with ADaM:

  1. You have an understanding of basic ADaM structures and principles, but the nuances have a tendency to trip you up, or maybe you just need a refresher on standards and best practices.
  2. You have been asked to implement some less common or more difficult concepts, such as criterion variables (CRITy/MCRITy) or record-/parameter-level population flags (-RFL/-PFL).
  3. You are tasked with creating a data set that deals with adverse events of special interest (AESI) or a non-typical analysis of events.
  4. You need to set up a time-to-event, questionnaire, rating and/or scales analysis data set, and would like to use the most effective techniques.
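
To make the criterion-variable idea mentioned above a little more concrete, here is a minimal Python sketch of deriving a CRITy/CRITyFL pair for one parameter. It is not taken from the seminar material: the records, the threshold, the helper name `add_criterion_flag`, and the choice of a Y/N flag (rather than Y/null) are all illustrative assumptions.

```python
def add_criterion_flag(records, paramcd, description, predicate, y=1):
    """Return copies of the records with CRITy and CRITyFL added.

    Rows for the given parameter with a non-missing AVAL get the criterion
    text and a "Y"/"N" flag; all other rows get None for both variables.
    """
    crit, flag = f"CRIT{y}", f"CRIT{y}FL"
    out = []
    for rec in records:
        row = dict(rec)  # copy so the input rows are left unchanged
        if row["PARAMCD"] == paramcd and row["AVAL"] is not None:
            row[crit] = description
            row[flag] = "Y" if predicate(row["AVAL"]) else "N"
        else:
            row[crit] = None
            row[flag] = None
        out.append(row)
    return out


# Made-up vital signs rows in a BDS-like layout (hypothetical subjects).
records = [
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVAL": 152.0},
    {"USUBJID": "002", "PARAMCD": "SYSBP", "AVAL": 128.0},
    {"USUBJID": "002", "PARAMCD": "DIABP", "AVAL": 95.0},
    {"USUBJID": "003", "PARAMCD": "SYSBP", "AVAL": None},
]

flagged = add_criterion_flag(
    records, "SYSBP", "Systolic BP > 140 mmHg", lambda v: v > 140
)
for row in flagged:
    print(row["USUBJID"], row["PARAMCD"], row["CRIT1"], row["CRIT1FL"])
```

Keeping the criterion text and its flag together on the row is what lets a reviewer trace which rule produced the flag; whether unmet criteria are stored as "N" or left missing is a study-level decision that should be checked against the ADaMIG.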

Presenter Biographies

Jeff Abolafia

Jeff Abolafia is currently Director of Product Innovation at Pinnacle 21. Previously Jeff held the position of Chief Strategist of Data Standards and was a member of the faculty in the Department of Biostatistics at the University of North Carolina. Jeff has been involved with public health research and data standards for over thirty years. Jeff is a frequent contributor and presenter at PhUSE, SAS Global Forum, PharmaSUG, and CDISC conferences. Jeff co-founded the RTP CDISC User's Group and is a member of the CDISC ADaM Team and several PHUSE Real World Evidence working groups. His areas of interest include real world evidence, mobile health, data standards, regulatory submissions, and bioinformatics.

Jim Box

Jim Box is a Principal Data Scientist at the SAS Institute, where he has been supporting customers' implementation of machine learning for the past seven years. Prior to that he spent 18 years in Clinical Research Organizations, primarily as a statistician and programming director. He has master's degrees in Statistics and in Analytics.

Todd Case

Todd Case has worked in roles of varied and increased responsibility in the biotech/pharmaceutical industries for over 17 years. Won team awards for leading and managing Statistical Programming teams to multiple successful FDA (NDA/BLA), EMA (EU), PMDA (Japan) and Rest of World (ROW) filings. Currently leading Data Standards, Strategic Outsourcing and Innovation teams within Biometrics. Previously led Therapeutic Area and Resourcing teams. Initiated and led Team and Departmental Meetings, x-Company Meetings and participated in PhUSE and other Industry Working Groups. Requested presenter and panelist as well as author of numerous papers and presentations at conferences in the US and internationally, including PhUSE, PhUSE SDE, PharmaSUG, PharmaSUG China, PharmaSUG SDE, NESUG, SAS Global Forum, JSM (Joint Statistical Meetings) and the Women's Innovation Network.

Rachel Dlugash

Rachel Dlugash is a Statistical Analyst at the FDA, in the Analytics and Informatics Staff Division of the Office of Biostatistics in CDER. She is involved in efforts to improve the implementation and use of data standards within the agency, as well as participating in outside agency collaborations. Prior to the FDA, she performed data analysis and data monitoring for academic clinical trials at Johns Hopkins. She has also worked with patients and collected clinical trial data at the University of Pennsylvania after starting her career at Octagon Research Solutions (now Accenture) where she prepared pharmaceutical clinical trial data to be SDTM compliant.

Michael Durwin

Michael Durwin is a 26-year veteran of social media strategy and research. With experience across a dozen industries both on the agency and brand side, he brings a holistic view of the many use cases for social intelligence. He has worked with brands ranging from HBO to the islands of Bermuda, helped launch the first Kindle and the first brand on YouTube, predicted the 2016 U.S. election in 2015, and tracked “a new flu-like virus in China” that didn’t yet have a name. He currently works as the Director of Social Intelligence & Communities at ICON out of their Boston office.

Karl Miller

Karl Miller is a Senior Data Standards Engineer with Clinical Solutions Group, an IQVIA company, and has over 19 years of industry experience through strategic partnerships with pharmaceutical companies, focused primarily within the clinical research environment. He has been actively working with CDISC standards for more than 15 years, specializing in development and implementation of CDISC submission data standards for clinical trials. He is currently a member of the CDISC ADaM team and Integration Sub-team.

Mike Molter

Mike Molter is the Associate Director of Statistical Programming and Technology Initiatives at LabCorp in Raleigh, NC. In this position, Mike works with FSP clients as well as internal study teams on the development of technical tools both inside and outside of SAS. He is also involved with the development of open source initiatives and processes as well as staff training. Mike has been involved in SAS programming since 1999, in clinical trials since 2003, and in industry data standards since 2005. He spent several years as a member of the CDISC XML Technologies team, and is a certified CDISC instructor for the define.xml class. Professional interests are centered around the use of cutting edge technologies to optimize the use of metadata throughout the lifecycle of a clinical trial. Personal interests include cycling, swimming, running, and reading.

Michael Roberson

Michael Roberson is VP, Business Development at MaxisIT. Michael has worked with clinical research and development companies as they plan and implement new systems across the life-cycle of clinical trial execution and reporting. Michael enjoys connecting with business leaders and subject matter experts to improve the process of planning, executing, documenting and analyzing clinical trials, with the ultimate goal of bringing new therapies to patients. Michael has worked with and helped build a number of successful teams with some of the best companies working in the clinical research domain.

Michael Stackhouse

Michael Stackhouse is at the cutting edge of data technology within the pharmaceutical industry. He has extensive CDISC experience, working with both Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM) standards, and serving as a subject matter expert for Define.xml. He holds a bachelor’s degree from Arcadia University, where he studied business administration, economics, and statistics. He is a 2020 UC Berkeley School of Information Master of Information and Data Science (MIDS) program graduate, where he worked on projects involving computer vision, natural language processing, cluster computing, and deep learning. Currently, Mike serves as the co-lead of the PHUSE working group Data Visualization and Open-source Technology. Mike and his team at Atorus have developed several open-source R packages, including the Atorus packages Tplyr and pharmaRTF.

YuTing Tian

YuTing Tian is interested in exploring research in statistics. She is currently working as a senior statistical programmer at Vertex Pharmaceuticals Incorporated in the Boston area. She presented papers on SAS and R at SAS Global Forum 2021 and JSM 2020, as well as at local SAS Users Group conferences.

Richann Watson

Richann Watson is an independent statistical programmer and CDISC consultant based in Ohio. She has been using SAS since 1996 with most of her experience being in the life sciences industry. She specializes in analyzing clinical trial data and implementing CDISC standards. Additionally, she is a member of the CDISC ADaM team and various sub-teams. Richann loves to code and is an active participant and leader in the SAS User Group community. She has presented numerous papers, posters, and training seminars at SAS Global Forum, PharmaSUG, and various regional and local SAS user group meetings. Richann holds a bachelor’s degree in mathematics and computer science from Northern Kentucky University and a master’s degree in statistics from Miami University.

Tim Williams

Tim Williams has led multiple initiatives within the PHUSE organization, including "Clinical Trials Data as RDF", "RDF Data Cubes for Clinical Trials Data", and an interactive Knowledge Graph Workshop. His focus is on practical applications of graph technology in the pharmaceutical industry. Tim is a Statistical Solutions Lead at UCB Biosciences, based in Raleigh, North Carolina.

Stephen Wilson

Dr. Wilson has worked as a mathematical statistician at FDA for more than 32 years. He is currently a Senior Staff Fellow with the Office of Biostatistics at the Center for Drug Evaluation and Research. Steve received his doctorate in Biostatistics from the University of North Carolina, Chapel Hill, in 1984. His professional interests are centered on working collaboratively to continuously improve the science and practice of clinical trials and the regulatory review of drugs and biologics.

Diane Wold

Diane Wold received her Ph.D. in Statistics from the University of North Carolina at Chapel Hill. She worked for Burroughs Wellcome/Glaxo Wellcome/Glaxo Smith Kline in a variety of roles for over 30 years. At the Glaxo Smith Kline merger, she joined the data standards group, and in 2002 she joined the CDISC SDS team. She was also involved in other CDISC teams, including the Protocol Representation Group and SHARE. In 2012 she became involved in the CFAST initiative to develop therapeutic area standards. In 2015 she joined CDISC as an employee.