Paper presentations are the heart of a PharmaSUG conference. PharmaSUG 2022 will feature over 150 paper presentations, posters, and hands-on workshops. Papers are organized into 14 academic sections and cover a variety of topics and experience levels.

Note: This information is subject to change. Last updated 11-May-2022.

Sections



Advanced Programming

Paper No. Author(s) Paper Title
AP-019 Jayanth Iyengar From %let To %local; Methods, Use, And Scope Of Macro Variables In Sas Programming
AP-025 Stephen Sloan Twenty Ways to Run Your SAS® Program Faster and Use Less Space
AP-030 Stephen Sloan & Kirk Paul Lafler A Quick Look at Fuzzy Matching Programming Techniques Using SAS® Software
AP-038 Bartosz Szymecki Fast hashes for complex joins
AP-041 Jayanth Iyengar & Josh Horstman Look Up Not Down: Advanced Table Lookups in Base SAS
AP-072 Navneet Agnihotri & Sumit Pradhan & Rachel Brown SAS Standard Programming Practices – Handy Tips for the Savvy Programmer
AP-081 Divya Vemulapalli A Programmer’s Perspective on Patient reported Outcomes in Oncology Trials
AP-083 Nikita Sathish Pharmacokinetics Concentration Data Deciphered – Presenting an Oncology Case Study
AP-107 Pradip Muhuri Macro for Working with Zip Transport Files from the Web: The Case of the Medical Expenditure Panel Survey
AP-108 Nancy Brucken From Codelists to Format Library
AP-111 Louise Hadden & Richann Watson Functions (and More!) on CALL!
AP-125 Derek Morgan ISO 8601 and SAS®: A Practical Approach
AP-146 Durga Kalyani Paturi Merging Pharmacokinetic (PK) Sample Collection and Dosing Information for PK Analysis based on Study Design and lessons learned
AP-171 Chary Akmyradov Self Modifying Macro to Automatically Repeat a SAS Macro Function
AP-180 Louise Hadden Putting the Meta into the Data: Managing Data Processing for a Large Scale CDC Surveillance Project with SAS®
AP-183 Troy Hughes Calling for Backup When Your One-Alarm Becomes a Two-Alarm Fire: Developing SAS® Data-Driven Concurrent Processing Models through Control Tables and Dynamic Fuzzy Logic
AP-188 Phil Bowsher Pins for Sharing Clinical R Assets
AP-189 Phil Bowsher & Ryan Johnson Quarto for Creating Scientific & Technical Documents
AP-193 Max Cherny R syntax for SAS programmers


Applications Development

Paper No. Author(s) Paper Title
AD-001 Chengxin Li & Tingting Tian & Toshio Kimura ADCM Macro Design and Implementation
AD-027 Stephen Sloan Running Parts of a SAS Program while Preserving the Entire Program
AD-036 Girish Kankipati & Jai Deep Mittapalli Reconstruction of survival curves: An approach of comparing study drug curve with the standard of care digital survival curve
AD-045 Kevin Lee How to create SDTM SAS transport datasets using Python
AD-059 Timothy Harrington Evaluating and Correcting Nominal and Actual Relative Time Measurements in Clinical Trials PK data
AD-062 Aiming Yang & Jinchun Zhang & Yiwen Luo & Nan Xiao & Yilong Zhang A Multilingual Shiny App for Drug Labelling in Worldwide Submission
AD-069 Simiao Ye & Jeff Xia A Shiny App to Spell Check an ADaM Specification
AD-076 Uday Preetham Palukuru & Runcheng Li & Nileshkumar Patel & Changhong Shi Exploring R as a validation tool for statistical programming in clinical trials
AD-084 Jeff Xia A SAS Macro to Aid Assembly Review of M5 Module in Submission
AD-115 Ballari Sen Checking Compliance of CDISC SDTM Datasets Utilizing SAS Procedures.
AD-122 Lijun Chen & Jin Shi & Tong Zhao Automatic CRF Annotations Using Python
AD-131 Inka Leprince & Elizabeth Li & Carl Chesbrough SAS Macro to Colorize and Track Changes Between Data Transfers in Subject-Level Listings
AD-145 Jake Adler & Assir Abushouk Optimizing TLF Generation: Titles and Footnotes Applying a New Idea to a Basic Approach
AD-150 Lex Jansen Working with Dataset-JSON using SAS©
AD-155 Greg Weber & Steve Hege & Siddharth Kumar Connecting SAS® and Smartsheet® to Track Clinical Deliverables
AD-167 Kevin Viel Using the SAS System® with the National Center for Biotechnology Information resources and Immune Epitope Database to explore the realm of potential SARS-CoV-2 variants.
AD-170 Kevin Viel Using the SAS System® to interface with the Entrez Programming Utilities (E-utilities) of the National Center for Biotechnology Information (NCBI).
AD-184 Troy Hughes SAS® Spontaneous Combustion: Securing Software Portability through Self-Extracting Code
AD-196 Prakasam R & Sujatha Kandukuri Using R-Shiny Capabilities in Clinical Programming to create Applications
AD-200 Charu Shankar Intro to Coding in SAS Viya


Artificial Intelligence (Machine Learning)

Paper No. Author(s) Paper Title
AI-010 Ilan Carmeli Data Mining of Tables: The Barrier for Automation
AI-106 Jim Box What's your model really doing? Human bias in Machine Learning
AI-116 Yida Bao & Philippe Gaillard Potential features analysis affect Covid-19 spread
AI-130 Farha Feroze & Ilango Ramanujam Generating statistical Tables, Listings, and figures using machine learning, deep learning, and NLP techniques that provide more efficiency to the statistical programming process
AI-151 Madhusudhan Nagaram & Syam Prasad Chandrala Leveraging AI to process unstructured Clinical data in real-time


Data Standards

Paper No. Author(s) Paper Title
DS-002 Peng Du & Hongli Wang Subjects Enrolled in Multiple Cohorts within a Study- Challenges and Ideas
DS-054 Xiaoyin Zhong & Jerry Xu From the Laboratory Toxicity Data Standardization to CTCAE Implementation
DS-061 Lindsey Xie & Richann Watson & Jinlin Wang & Lauren Xiao ADaM Child Data Set: An Alternative Approach for Analysis of Occurrence and Occurrence of Special Interest
DS-064 Trevor Mankus Upgrading from Define-XML 2.0 to 2.1
DS-066 Raghava Pamulapati Above and Beyond Lesion Analysis Dataset (ADTL)
DS-095 Sandra Minjoe Making an ADaM Dataset Analysis-Ready
DS-112 Richann Watson & Karl Miller Standardised MedDRA Queries (SMQs): Beyond the Basics; Weighing Your Options
DS-121 Varsha Korrapati & Johnny Maruthavanan How to utilize the EC domain to handle complex exposure data
DS-134 Chad Fewell & Jesse Beck Redefining Industry Standards: Making CDASH and SDTM Work Together from the Database Level
DS-139 Christine McNichol & Mike Lozano A Tale of Two CDISC Support Groups - Supporting CDISC Standards is Anything but Standard
DS-197 Soumya Rajesh & Michael Wise Clinical Classifications: Getting to know the new kid on the QRS block!


Data Visualization and Reporting

Paper No. Author(s) Paper Title
DV-040 Maryna Aksaniuk The power of data visualization: Seaborn vs SGPLOT.
DV-056 Daniel Mattei Rita: Automated Transformations, Normality Testing, and Reporting
DV-071 Akshita Gurram & Mamatha Mada Introduction to Annual Reporting for Beginners!
DV-073 Luwei Pang & Aiming Yang SAS® ODS Application of Interactive and Informative Clinical Trial Reports
DV-074 Jeremy Gratt & Qiuhong Jia & Girish Kankipati Metadata Driven Approach for Creation of Clinical Trial Figures
DV-082 John saida Shaik & Sreeram Kundoor Building Dashboards for Data Review and Visualization using R Shiny
DV-088 Girija Javvaji & Sivasankar Konda Generating Forest and Kaplan Meier graphs for Regulatory Submission: Comparison of SAS and R
DV-089 Girija Javvaji & Alistair D'Souza Generating Bar Chat and Mutipanel graphs using SAS vs R for Regulatory Submission
DV-113 Hong Zhang & Yixin Ren & Huei-Ling Chen A Dynamic Data Visualization Report for Efficacy Endpoints of Early Oncology Study
DV-120 Fan Lin R And SAS in Analysis Data Reviewer’s Guide and Data Visualization
DV-172 Louise Hadden SAS® PROC GEOCODE and PROC SGMAP: The Perfect Pairing for COVID-19 Analyses
DV-173 Joseph Cooney Copy comments: A Dynamic solution to solve the age-old need
DV-181 Louise Hadden Designing and Implementing Reporting Meeting Federal Government Accessibility Standards with SAS®
DV-185 Troy Hughes GIS Challenges of Cataloging Catastrophes: Serving up GeoWaffles with a Side of Hash Tables to Conquer Big Data Point-in-Polygon Determination and Supplant SAS® PROC GINSIDE


Hands-On Training

Paper No. Author(s) Paper Title
HT-103 Bill Coar ODS Document & Item Stores: A New Beginning
HT-186 Phil Bowsher & Cole Arendt Team Code Collaboration in RStudio with Git
HT-201 Troy Hughes What’s black and white and sheds all over? The Python Pandas DataFrame, the Open-Source Data Structure Supplanting the SAS Data Set
HT-202 Louise Hadden Learn to Visualize Market Analysis Data Using SAS® PROC GEOCODE, PROC GINSIDE, and Prescriber Characteristics
HT-203 Nancy Brucken & Karl Miller Following Your Footsteps: Maintaining Traceability in ADaM Datasets
HT-204 Chris Hemedinger How to use Git with your SAS projects
HT-205 Matthew Slaughter & Isaiah Lankham Everything is Better with Friends: Using SAS in Python Applications with SASPy and Open-Source Tooling (Getting Started)
HT-206 Josh Horstman Getting Started Using SAS Proc SGPLOT for Clinical Graphics
HT-207 Mike Stackhouse & Jessica Higgins Using Tplyr to Create Clinical Tables


Leadership Skills

Paper No. Author(s) Paper Title
LS-029 Stephen Sloan Developing and running an in-house SAS Users Group
LS-033 Stephen Sloan & Lindsey Puryear Advanced Project Management beyond Microsoft Project, Using PROC CPM, PROC GANTT, and Advanced Graphics
LS-047 Kevin Lee Leading into the Unknown? Yes, we need Change Management Leadership
LS-058 Zhouming(Victor) Sun Visualization of Programming Activities and Deliveries for Multiple Clinical Studies
LS-098 Priscilla Gathoni Schoveing Series 5: Your Guide to a Quality NO
LS-101 Corey Evans & Samantha Kennedy How Clear Communication & Expectations Improve the CRO – Sponsor Relationship: the CRO Perspective
LS-109 Eunice Ndungu & Rohit Alluri & Radha Railkar Diversity, Equity, and Inclusion in the Workplace: Challenges and Opportunities
LS-136 Rajinder Kumar Exciting Opportunities for Fresher
LS-137 Carol Matthews Empower Your Programmers: Get the Most from Your Programming Team
LS-141 Scott Burroughs Rethinking Programming Assignments
LS-147 Aman Bahl & Hrideep Antony Agile innovation – an adaptive approach to transformation in Clinical Research Organizations
LS-209 David D’Attilio & Nithiya Ananthakrishnan & Praveen Garg & Priscilla Gathoni & Dilip Raghunathan In the world of decentralized trials, do we still need to do double programming?


Medical Devices

Paper No. Author(s) Paper Title
MD-140 Kalyan Chakravarthy Buddavarapu & Sherry Ye & Vijay Vaidya Digital Health Technology in Clinical Trials: Opportunities, Challenges and Biometrics Future Re-imagined
MD-174 Douglas Milikien Measuring Reproducibility and Repeatability of an AI-based Quantitative Clinical Decision Support Tool Having a Medical Decision Point


Quick Tips

Paper No. Author(s) Paper Title
QT-005 Dave Hall You Don't Know Where That's Been! Pitfalls of Cutting-and-Pasting Directly into the Programming Work Flow
QT-008 Abhinav Srivastva Programmed Value-Level Metadata for Define XML 2.0
QT-016 Yogesh Pande How Simple Table of Content can make CSR Table, Listings, and Figures Easily Accessible?
QT-017 Jianli Ping & Krishna Sivakumar Standardizing Procedures for Generating Dose Proportionality Figures to Improve Programming Efficiency
QT-021 Noory Kim Using Boolean Functions to Excel at Validation Tracking
QT-091 Xingshu Zhu & Li Ma & Bo Zheng Auto Extraction of the Title Description in Table Report
QT-093 Himanshu Patel & Chintan Pandya SAS® Studio: Creating, Analyzing, and Reporting Data with Built-in & Custom Tasks
QT-102 Kamlesh Patel & Jigar Patel Getting odds ratio at Preferred Term (PT) level for safety analysis in R
QT-114 Chintan Pandya & Himanshu Patel Avoid Common Mistakes in Preparing the BIMO Deliverables
QT-135 Sumit Pratap Pradhan & Navneet Agnihotri & Rachel Brown Data Masking
QT-143 Mrityunjay Kumar & Dnyaneshwari Nighot & Nitish Kumar How to create BARDA Safety report for Periodic review
QT-144 Wayne Zhong Smart Batch Run your SAS Programs
QT-169 Aakar Shah & Tracy Sherman A Beginner’s Guide to Create Series Plots Using SGPLOT Procedure: From Basic to Amazing
QT-177 Louise Hadden Form(at) or Function? A Celebratory Exploration of Encoding and Symbology
QT-195 Naga Madeti Excel the Data
QT-199 Sunil Gupta Monitoring SDTM Compliance in Data Transfers from CROs


Real World Evidence and Big Data

Paper No. Author(s) Paper Title
RW-020 Jayanth Iyengar Understanding Administrative Healthcare Datasets using SAS programming tools.
RW-092 Xingshu Zhu & Bo Zheng & Li Ma Automation of Variable Name and Label Mapping from Raw Data to Analysis Data Standards
RW-160 Ashwini Yermal Shanbhogue RWD (OMOP) to SDTM (CDISC): A primer for your ETL journey
RW-179 Sherrine Eid & Samiul Haque & Robert Collins Generating Real World Evidence On The Likelihood Of Metastatic Cancer In Patients Through Machine Learning In Observational Research: Insights For Prevention
RW-192 Venkat Rajagopal & Syamala Schoemperlen & Abhinav Jain Introduction to Synthetic Control Arm


Statistics and Analytics

Paper No. Author(s) Paper Title
SA-003 Jun Ke & Kelly Chao & Corey Evans Estimating Differences in Probabilities (Marginal Effects) with Confidence Interval
SA-004 Natasha Oza & Ben Wang & Jesse Canchola A SAS macro wrapper for an efficient Deming regression algorithm via PROC IML: The Wicklin method
SA-009 Ilan Carmeli & Keren Mayorov & Hugh Donovan The Emerging Use of Automation to Address the Challenge of Cross-Table Consistency Checking of Output Used in the Reporting of Clinical Trial Data
SA-011 Ilan Carmeli & Yoran Bar Validation of Statistical Outputs Using Automation
SA-015 Ilya Krivelevich & Lei Gao & Richard Kennan Derivation of Efficacy Endpoints by iRECIST Criteria: A Practical Approach
SA-090 Kenneth Liu & Ziqiang Chen Two Methods to Collapse Two Treatment Groups into One Group
SA-104 Lisa Mendez How RANK are your deciles? Using PROC RANK and PROC MEANS to create deciles based on observations and numeric values.
SA-118 Lili Li & Roger Chang & Benjamin Fine & Yu-Chen Su Win Ratio Simulation For Power Calculation Made Easy
SA-123 Mikhail Melikov & Alex Karanevich An Expanded Set of SAS Macros for Calculating Confidence Limits and P-values Under Simon’s Two-Stage Design Accounting for Actual and Planned Sample Sizes
SA-142 Igor Goldfarb & Ritu Karwal & Xiaohua Shu Analysis of sample size calculations in clinical trial – errors, pitfalls and conclusions
SA-149 Matthew Slaughter & Isaiah Lankham Simple and Efficient Bootstrap Validation of Predictive Models Using SAS/STAT® Software


Strategic Implementation

Paper No. Author(s) Paper Title
SI-006 Srinivasa Rao Mandava New Digital Trends and Technologies in Clinical Trials and Clinical Data Management
SI-014 Bill Coar Quality Control With External Code
SI-024 Gina Boccuzzi Applying Agile Methodology to Statistical Programming
SI-028 Stephen Sloan Getting a Handle on All of Your SAS® 9.4 Usage
SI-042 Steve Black Lessons learned while switching over to SAS Studio from SAS Desktop.
SI-044 Kevin Lee Enterprise-level Transition from SAS to Open-source Programming for the whole department
SI-055 Jeff Cheng & Abhilash Chimbirithy & Amy Gillespie & Yilong Zhang Implementing and Achieving Organizational R Training Objectives
SI-057 Jane Liao & Fansen Kong & Yilong Zhang External R Package Qualification Process in Regulated Environment
SI-065 Danfeng Fu & Ellie Norris & Suhas Sanjee & Susan Kramlik PROC FUTURE PROOF 1.2 - Linked Data
SI-067 Frank Menius & Monali Khanna & Vincent Allibone Finding Accord: Developing an eCOA Data Transfer Specification (DTS) All Can Agree On.
SI-097 Ben Bocchicchio & Sandeep Juneja Ensuring Distributed Data custody on Cloud Platforms
SI-099 Susan Kramlik & Eunice Ndungu & Hemu Shere Cyber Resiliency – Computing Platform Modernization
SI-162 Mercidita Navarro & Nancy Brucken & Aiming Yang & Greg Ball Just Say No to Data Listings!
SI-164 Nagadip Rao & Pavan Jupelli Quality Gates: An Overview from Clinical SAS® Programming Perspective
SI-168 Parag Shiralkar & Nagadip Rao Vendor Audit: What it is and What to Expect from it
SI-187 Phil Bowsher & Cole Arendt Creating an Internal Repository of Validated R Packages
SI-191 Neharika Sharma Orchestrating Clinical Sequels with a Strategic Wand - Challenges of introducing Rolling CSRs in a Master Protocol


Submission Standards

Paper No. Author(s) Paper Title
SS-023 Yizhuo Zhong & Christine Teng & Majdoub Haloui Lessons Learned from Using P21 to Create ADaM define.xml with Examples
SS-034 Julie Ann Hood & Sarah Angelo Damn*t, The Define!
SS-035 Phil Hall FDA Advisory Committee Meetings– A Statistical Programmer’s Survival Guide
SS-039 Binal Mehta & Patel Mukesh Real Time Oncology Review Readiness from Programming Perspective
SS-079 Vincent Allibone & Frank Menius & Monali Khanna eCOA and SDTM: Bring Order to the Wild West
SS-087 Hong Qi & Majdoub Haloui & Guowei Wu Including Population Information in ADaM define.xml for Better Understanding of Datasets
SS-110 Jamuna Purma & Bindya Bindya Vaswani A SAS Macro to Support the Supplemental Qualifier Section in cSDRG
SS-152 Kristin Kelly Explanations in the Study Data Reviewer’s Guide: How’s It Going?
SS-154 Srinivas Kovvuri & Kiran Kundarapu & Satheesh Avvaru & Randi McFarland A Standardized Reviewer’s Guide for Clinical Integrated Analysis Data
SS-158 Jennifer McGrogan & Rex Tungala & Mario Widel A Practical Approach to Preparing Documentation for Clinical Registries


e-Posters

Paper No. Author(s) Paper Title
EP-007 Abhinav Srivastva Clean Messy Clinical Data Using Python
EP-032 Stephen Sloan Using SAS® to Create a Build Combinations Tool to Support Modularity
EP-037 Shefalica Chand Sponsor-defined Controlled Terminologies: A Tiny Key to a Big Door
EP-063 Sabarinath Sundaram & Lyma Faroz Unravel the mystery around NONMEM data sets for statistical programmers
EP-068 Frank Menius & Monali Khanna & Vincent Allibone Begin with the End in Mind: eCOA data from collection to analysis
EP-096 Noory Kim Bookmarking CRFs More Efficiently
EP-156 Syam Prasad Chandrala & Madhusudhan Nagaram & Chaitanya Chowdagam & Jegan Pillaiyar & Kunal Chattopadhyay Automation of Clinical Data extracts using Cloud Applications
EP-176 Wenjun He Automation using SAS Makes it Easy to Monitor Dynamic Data in Clinical Trial




Abstracts

Advanced Programming

AP-019 : From %let To %local; Methods, Use, And Scope Of Macro Variables In Sas Programming
Jayanth Iyengar, Data Systems Consultants LLC
Mon, 8:00 AM - 8:20 AM, Location: Room 201-202

Macro variables are one of the most powerful capabilities of the SAS system. Utilizing them makes your SAS code more dynamic. There are multiple ways to define and reference macro variables in your SAS code, from %LET and CALL SYMPUT to PROC SQL INTO. There are also several kinds of macro variables, distinguished by scope and by other characteristics. Not every SAS programmer is knowledgeable about the nuances of macro variables. In this paper, I explore the methods for defining and using macro variables. I also discuss the nuances of macro variable scope, and the kinds of macro variables from user-defined to automatic.


AP-025 : Twenty Ways to Run Your SAS® Program Faster and Use Less Space
Stephen Sloan, Accenture
Mon, 8:30 AM - 8:50 AM, Location: Room 201-202

When we run SAS® programs that use large amounts of data or have complicated algorithms, we often are frustrated by the amount of time it takes for the programs to run and by the large amount of space required for the program to run to completion. Even experienced SAS programmers sometimes run into this situation, perhaps through the need to produce results quickly, through a change in the data source, through inheriting someone else’s programs, or for some other reason. This paper outlines twenty techniques that can reduce the time and space required for a program without requiring an extended period of time for the modifications. The twenty techniques are a mixture of space-saving and time-saving techniques, and many are a combination of the two approaches. They do not require advanced knowledge of SAS, only a reasonable familiarity with Base SAS® and a willingness to delve into the details of the programs. By applying some or all of these techniques, people can gain significant reductions in the space used by their programs and the time it takes them to run. The two concerns are often linked, as programs that require large amounts of space often require more paging to use the available space, and that increases the run time for these programs.


AP-030 : A Quick Look at Fuzzy Matching Programming Techniques Using SAS® Software
Stephen Sloan, Accenture
Kirk Paul Lafler, sasNerd
Tues, 1:30 PM - 1:50 PM, Location: Room 201-202

Data comes in all forms, shapes, sizes and complexities. SAS® users across industries know all too well that data stored in files and data sets can be, and often is, problematic and plagued with a variety of issues. Two data files can be joined without a problem when they have identifiers with unique values. However, many files do not have unique identifiers, or “keys”, and need to be joined by character values, like names or e-mail addresses. These identifiers might be spelled differently, or use different abbreviation or capitalization protocols. This paper illustrates data sets containing a sampling of data issues; popular data cleaning and user-defined validation techniques; data transformation techniques; traditional merge and join techniques; an introduction to applying different SAS character-handling functions for phonetic matching, including SOUNDEX, SPEDIS, COMPLEV, and COMPGED; and an assortment of SAS programming techniques to resolve key identifier issues and to successfully merge, join and match less than perfect, or “messy”, data. Although the programming techniques are illustrated using SAS code, many, if not most, of the techniques can be applied to any software platform that supports character-handling.
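As the abstract notes, these matching techniques are not tied to one platform. The following is a minimal Python sketch of edit-distance matching, the general idea behind a Levenshtein function such as COMPLEV. It is not code from the paper; the function names `edit_distance` and `best_match` and the `max_distance` threshold are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(name, candidates, max_distance=2):
    """Return the candidate closest to name (ignoring case and outer
    whitespace), or None if nothing is within max_distance edits."""
    if not candidates:
        return None
    key = name.strip().lower()
    distance, candidate = min(
        (edit_distance(key, c.strip().lower()), c) for c in candidates
    )
    return candidate if distance <= max_distance else None
```

Normalizing case and whitespace before comparing handles the capitalization differences the abstract mentions; the threshold controls how "messy" a value may be before it is treated as unmatched.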


AP-038 : Fast hashes for complex joins
Bartosz Szymecki, Syneos Health
Mon, 9:30 AM - 9:50 AM, Location: Room 201-202

Even though SAS® hash objects have been around for a long time, their application within the industry is still rather limited. It is well known that they can successfully replace data step MERGE statements with improved efficiency, but what about more complex types of joins that statistical programmers can come across in their day-to-day work? PROC SQL would be a natural choice, especially for small datasets. However, as data volumes increase, along with the number of runs on that data, the advantage of using SAS hash objects over PROC SQL becomes clearly visible. If, in addition, a SAS hash approach is implemented within an easy-to-use macro, one may start to wonder why it shouldn’t be used as the default for the majority of joins we do. This paper presents such a macro in comparison to PROC SQL, using real-world examples that can arise whilst working with ADaM/SDTM standards.
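The hash-lookup idea behind this kind of join can be sketched in a few lines. The example below is a Python analogue using a dictionary, not the paper's SAS macro; the function name `hash_lookup_join` and the sample variable names are assumptions made for illustration.

```python
def hash_lookup_join(left_rows, right_rows, key):
    """Inner join of two lists of dict rows on one key column.

    right_rows is loaded once into an in-memory hash table, so each
    left row is matched by a constant-time lookup instead of a sort
    or a scan. On a column-name clash the left row's value is kept.
    """
    lookup = {}
    for row in right_rows:
        # One key may map to several rows (one-to-many join).
        lookup.setdefault(row[key], []).append(row)

    joined = []
    for row in left_rows:
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical subject-level and event-level records.
subjects = [{"USUBJID": "01", "ARM": "A"}, {"USUBJID": "02", "ARM": "B"}]
events = [
    {"USUBJID": "01", "AETERM": "HEADACHE"},
    {"USUBJID": "01", "AETERM": "NAUSEA"},
    {"USUBJID": "03", "AETERM": "RASH"},
]
events_with_arm = hash_lookup_join(events, subjects, "USUBJID")
```

Neither input needs to be sorted, which is the usual argument for this pattern when the smaller table fits in memory.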


AP-041 : Look Up Not Down: Advanced Table Lookups in Base SAS
Jayanth Iyengar, Data Systems Consultants LLC
Josh Horstman, Nested Loop Consulting
Mon, 10:30 AM - 11:20 AM, Location: Room 201-202

One of the most common data manipulation tasks SAS programmers perform is combining tables through table lookups. In the SAS programmer’s toolkit many constructs are available for performing table lookups. Traditional methods for performing table lookups include conditional logic, match-merging and SQL joins. In this paper we concentrate on advanced table lookup methods such as formats, multiple SET statements, and HASH objects. We conceptually examine what advantages they provide the SAS programmer over basic methods. We also discuss and assess performance and efficiency considerations through practical examples.


AP-072 : SAS Standard Programming Practices – Handy Tips for the Savvy Programmer
Navneet Agnihotri, Syneos Health
Sumit Pradhan, Syneos Health
Rachel Brown, Syneos Health
Mon, 11:30 AM - 11:50 AM, Location: Room 201-202

As a programmer, have you ever paused for a moment and pondered whether your program code could stand up to review and re-use by your colleagues (beginners, intermediate or experts)? Ever wondered if the algorithm used in your program and your programming style would pass the "white glove" assessment? We’ve all been there! Every programmer at some point in their career encounters a scenario in which some critical delivery to a client is planned. It is late in the day and, to meet the client deadline, you’re unfortunately required to pick up another programmer’s code. As you open the program and scroll through it, your heart sinks. You quickly realize that resolving the issue is not a fast fix but will take hours of deciphering. What can we do now? We grumble, sigh and either rewrite the code or struggle to clean it up to make it better for the next programmer and future deliverables. Inconsistent and unstructured code affects us all, including the client. Had the previous programmer followed a few key rules, the fix might have been a quick and simple modification. Code is an intellectual tangible asset and it’s essential that SAS® users possess the necessary skills to implement “best practice” programming techniques to ensure greater code readability, maintainability and longevity while ensuring code reusability. This paper summarizes strategies that should be used in your daily programming to effectively adhere to good programming practices and ensure that code is structured, readable, understandable, portable, and maintainable.


AP-081 : A Programmer’s Perspective on Patient reported Outcomes in Oncology Trials
Divya Vemulapalli, Statistical Programmer
Mon, 2:30 PM - 2:50 PM, Location: Room 201-202

Patient-reported outcomes (PRO) are used in clinical trials to assess the quality of life from the patient’s perspective. Patients are asked to respond to questionnaires related to physical functioning, adverse effects of treatment, and the overall burden of such adverse effects. In addition to overall quality of life, PROs also measure treatment benefit or risk and are helpful in supporting the labelling claims and reimbursement positioning for the drug. Since PROs are directly reported by the patient, they have gained increased utility for determining the quality-of-life endpoints of a clinical trial. This paper takes a closer look at the data collection process and the issues we may come across in PRO data in the context of clinical trials in an oncology setting. It also discusses some of the common PRO instruments in such trials and statistical analysis methods for those instruments.


AP-083 : Pharmacokinetics Concentration Data Deciphered – Presenting an Oncology Case Study
Nikita Sathish, Seattle Genetics
Tues, 8:30 AM - 8:50 AM, Location: Room 201-202

Pharmacokinetics (PK) analyses in clinical trials aim to help researchers understand to what degree the given drug is absorbed, distributed, metabolized, and excreted over time. The ability to view concentration as a function of time yields an efficient way to assess the safety of the drug, which is the first key step in any clinical trial. This paper will walk through a sample case study in an oncology setting to give a high-level overview of how statistical programming teams can transform lab-collected PK concentration data into an interim PK concentration (PKC) file, which is then used by clinical pharmacologists as a source of information for deriving PK parameters. We explain in detail how several key variables are derived in the process and the challenges surrounding that effort. We will also clarify how clinical pharmacologists use the PKC file to generate the PK profile of the drug being studied. This will help statistical programmers understand the basic relationship between PK concentration and PK parameter data, which in turn may boost the quality and efficiency of specifying and programming data sets in this area.


AP-107 : Macro for Working with Zip Transport Files from the Web: The Case of the Medical Expenditure Panel Survey
Pradip Muhuri, DHHS/AHRQ
Tues, 10:30 AM - 10:50 AM, Location: Room 201-202

Downloading zipped SAS® transport files (one at a time) from the web, unzipping the files outside of SAS®, and then restoring them into SAS® data sets can be time-consuming and cumbersome, mainly when it involves multiple files. This paper presents a SAS® macro that automates the process of downloading Medical Expenditure Panel Survey (MEPS)-based zipped SAS transport files for data years 2009-2018 from the Agency for Healthcare Research and Quality (AHRQ) website and restoring them into SAS data sets. When executed, the program dynamically saves all desired files (i.e., downloaded zip transport files, unzipped files, and restored SAS data sets) in pre-created folders on a local computer.


AP-108 : From Codelists to Format Library
Nancy Brucken, IQVIA
Tues, 11:00 AM - 11:20 AM, Location: Room 201-202

Codelist files in the pharmaceutical industry are designed to enumerate all possible values for a given variable. One common method for capturing and storing codelist information is to use the Microsoft® Excel® template generated by Pinnacle 21 Enterprise or Community, since that file is often used as the basis for analysis dataset specifications. From there, it’s a short task to read the codelists from the file and convert them to SAS® formats and informats.


AP-111 : Functions (and More!) on CALL!
Louise Hadden, Abt Associates Inc.
Richann Watson, DataRich Consulting
Tues, 9:00 AM - 9:50 AM, Location: Room 201-202

SAS® Functions have deservedly been the focus of many excellent SAS papers. SAS CALL Routines, which rely on and collaborate with SAS Functions, are less well known, although many SAS programmers use these routines frequently. This paper and presentation will look at numerous SAS Functions and CALL Routines, and explain how both SAS Functions and CALL Routines work in practice. SAS CALL Routines cover many areas, including CAS (Cloud Analytic Services) specific functions, character functions, character string matching, combinatorial functions, date and time functions, external routines, macro functions, mathematical functions, sort functions, random number functions, special functions, variable control functions, and variable information functions. While there are myriad SAS CALL Routines and SAS Functions, we plan to drill down on character function SAS CALL Routines including string matching; macro, external and special routines; sort routines; random number generation routines; and variable control and information routines. We could go on and on about SAS CALL Routines, but we are going to limit the SAS CALL Routines discussed in this paper, excluding any environment specific SAS CALL Routines such as those designated for use with CAS and TSO, as well as other redundant examples. We hope to demystify SAS CALL Routines by demonstrating real world applications of specific SAS CALL Routines, bringing some amazing capabilities to light.


AP-125 : ISO 8601 and SAS®: A Practical Approach
Derek Morgan, Bristol Myers Squibb
Tues, 2:30 PM - 2:50 PM, Location: Room 201-202

The ISO 8601 standard for dates and times has long been adopted by regulatory agencies around the world for clinical data. While there are many homemade solutions for working in this standard, SAS has many built-in solutions, from formats and informats that even take care of time zone specification, to the IS8601_CONVERT routine, which painlessly handles durations and intervals. These built-in capabilities, available in SAS 9.2 and above, will streamline your code and improve efficiency and accuracy.
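For readers who want a concrete picture of the standard's extended date-time form with a time zone offset, here is a minimal Python sketch. It uses the Python standard library rather than the SAS formats, informats, or the IS8601_CONVERT routine named above, and the helper names `to_iso8601` and `from_iso8601` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(moment: datetime) -> str:
    """Format a datetime in ISO 8601 extended form, to the second."""
    return moment.isoformat(timespec="seconds")


def from_iso8601(text: str) -> datetime:
    """Parse an ISO 8601 extended date-time such as
    2022-05-11T08:30:00-05:00 (a numeric UTC offset is assumed;
    older Python versions do not accept the 'Z' suffix here)."""
    return datetime.fromisoformat(text)


# A timestamp five hours behind UTC, chosen only as an example value.
collected = datetime(2022, 5, 11, 8, 30,
                     tzinfo=timezone(timedelta(hours=-5)))
as_text = to_iso8601(collected)
round_trip = from_iso8601(as_text)
```

Keeping the offset in the stored text is what lets two sites in different time zones compare timestamps without ambiguity.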


AP-146 : Merging Pharmacokinetic (PK) Sample Collection and Dosing Information for PK Analysis based on Study Design and lessons learned
Durga Kalyani Paturi, Senior Programmer Analyst
Tues, 3:30 PM - 3:50 PM, Location: Room 201-202

The PK merge dataset is a vital source dataset for PK statistical analysis. PK concentrations, sample collection, and dosing variables are merged and subsequently used as input for creating PK parameters. Because of the increasing complexity of study designs, data issues, and the possible need to add exclusion flags, one must carefully understand the study design, Statistical Analysis Plan (SAP), and specifications before building the merge dataset. This paper explains the merge dataset structure and presents examples of merges in different study designs and various scenarios of data exclusions. Finally, we discuss the interesting lessons learned from a programmer’s point of view.


AP-171 : Self Modifying Macro to Automatically Repeat a SAS Macro Function
Chary Akmyradov, Arkansas Children's Research Institute
Wed, 8:30 AM - 8:50 AM, Location: Room 201-202

The SAS macro language makes workflows easier. Writing your own macro functions to shorten your code and obtain the analysis is a time saver. However, if you have hundreds of variables to be used with a macro function, or if you need to use the same macro function for each level of a variable, then repeating the same macro function while changing its argument(s) is a time-consuming process. In this presentation, you will learn how to develop a SAS macro function using %do and %while loops with %scan and %eval functions to run your macro function automatically for each different argument.
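
The looping pattern described here can be sketched in Python for illustration; this is an analogy, not the author's macro. The `scan` function below imitates the word-picking behavior of %scan, and `summarize` is a hypothetical stand-in for whatever analysis macro is being repeated.

```python
def scan(text, n, sep=" "):
    """Return the n-th word (1-based) of text, or "" when there is none."""
    words = [word for word in text.split(sep) if word]
    return words[n - 1] if 1 <= n <= len(words) else ""

def repeat_for_each(func, arg_list, sep=" "):
    """Call func once per word in arg_list, stopping when scan finds no more words."""
    results = []
    i = 1
    while scan(arg_list, i, sep) != "":
        results.append(func(scan(arg_list, i, sep)))
        i += 1  # plays the role of %eval(&i + 1)
    return results

def summarize(variable):
    # Hypothetical stand-in for the macro that would be run per variable.
    return f"summary of {variable}"

print(repeat_for_each(summarize, "age weight height"))
```

The loop ends when the scan returns an empty string, so the caller only maintains the delimited list of arguments.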


AP-180 : Putting the Meta into the Data: Managing Data Processing for a Large Scale CDC Surveillance Project with SAS®
Louise Hadden, Abt Associates Inc.
Wed, 9:00 AM - 9:20 AM, Location: Room 201-202

There are myriad epidemiological and surveillance studies ongoing due to the pervasive COVID-19 pandemic, often embodying the definition of “big data” with thousands of participants, variables, and lab samples. Data can come from many different streams in a given study, for example REDCap software, electronic medical records (EMR), chart abstraction, and laboratory records. Different contractors can be managing different aspects of the same project, the data is changing minute to minute, and the deliveries are required at a fast and furious pace. Wrangling all the different data sources requires robust data management routines, and SAS® can help, with tools to obtain data via APIs and PROC HTTP, metadata resources, and programming techniques. This paper and presentation will outline best practices for managing multiple aspects of large scale CDC surveillance projects, using SAS.


AP-183 : Calling for Backup When Your One-Alarm Becomes a Two-Alarm Fire: Developing SAS® Data-Driven Concurrent Processing Models through Control Tables and Dynamic Fuzzy Logic
Troy Hughes, Datmesis Analytics
Wed, 9:30 AM - 9:50 AM, Location: Room 201-202

In the fire and rescue service, a box alarm (or, simply, “alarm”) describes the severity of a fire. As an alarm is elevated from a “one-alarm” fire to a multi-alarm fire, additional, predetermined resources (i.e., personnel and apparatuses) are summoned to combat the blaze more aggressively. Thus, a five-alarm fire—or its equivalent “five-alarm” Cincinnati chili—represents something extremely hot and dangerous. After extinguishment, and as smoke and embers recede and overhaul begins, fire and rescue resources are released “back into service” so they can be utilized elsewhere if necessary. Essential to managing complex fireground operations, this load balancing paradigm is also common in grid and cloud computing environments in which additional computational resources can be shifted temporarily to an application or process to maximize its performance and throughput. This text instead demonstrates a programmatic approach in which SAS® extract-transform-load (ETL) operations are decomposed and modularized and subsequently directed (for execution) by control tables. If increased throughput is required, additional instances of the ETL program can be invoked concurrently, with each software instance performing various operations on different data sets, thus decreasing overall runtime. A configuration file allows end users to specify prerequisite processes (that must be completed before an operation can commence), thus facilitating the dynamic fuzzy logic that autonomously selects the specific ETL operations to be executed. This data-driven design approach ensures that the execution of operations can be prioritized, optimized, and, to the extent possible, run in parallel to maximize performance and throughput.
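
The prerequisite-driven selection of operations can be illustrated with a small Python sketch. The control table layout, the operation names, and the `runnable_operations` function are all assumptions made for this example; the paper's actual SAS control tables and configuration file format are not shown in the abstract.

```python
def runnable_operations(control_table):
    """Return pending operations whose prerequisites have all completed."""
    done = {name for name, row in control_table.items() if row["status"] == "done"}
    return [
        name
        for name, row in control_table.items()
        if row["status"] == "pending" and set(row["requires"]) <= done
    ]

# Hypothetical control table: each operation lists its prerequisites and status.
control_table = {
    "extract_dm": {"requires": [], "status": "done"},
    "extract_ae": {"requires": [], "status": "pending"},
    "transform_dm": {"requires": ["extract_dm"], "status": "pending"},
    "load_all": {"requires": ["transform_dm", "extract_ae"], "status": "pending"},
}

print(runnable_operations(control_table))  # ['extract_ae', 'transform_dm']
```

In a concurrent setting, each program instance would also need to claim an operation atomically (for example, by locking the control table) before running it, which this sketch leaves out.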


AP-188 : Pins for Sharing Clinical R Assets
Phil Bowsher, RStudio Inc.
Tues, 4:30 PM - 4:50 PM, Location: Room 201-202

RStudio will be presenting an overview of Pins for the R user community at PharmaSUG. This talk will review how to share data, models, and other R objects. Pins help make it easy to share objects across projects and with your colleagues. This is a great opportunity to learn about best practices when sharing data across your organization. No prior knowledge of R/RStudio is needed. This short talk will provide an introduction to Pins with examples using clinical data. This presentation will highlight how to attach an ETL job that outputs data as a Pin and how to then feed the data into a Shiny app.


AP-189 : Quarto for Creating Scientific & Technical Documents
Phil Bowsher, RStudio Inc.
Ryan Johnson, RStudio
Tues, 2:00 PM - 2:20 PM, Location: Room 201-202

RStudio will be presenting an overview of the new scientific and technical publishing system called Quarto, for the R user community at PharmaSUG. This talk will review Quarto as an open-source tool that is language agnostic for creating reports. This is a great opportunity to learn about a new tool to use for creating clinical reports that incorporate languages like R, JavaScript, and Python. No prior knowledge of R/RStudio is needed. This short talk will provide an introduction to the current landscape of Quarto as well as recent developments and examples. This presentation will highlight Quarto and how it can be used to render plain markdown, Jupyter Notebooks, and Knitr documents.


AP-193 : R syntax for SAS programmers
Max Cherny, GlaxoSmithKline
Mon, 4:30 PM - 4:50 PM, Location: Room 201-202

Making the transition from SAS programming to R programming can be quite challenging, especially for SAS programmers who have never coded in an object-oriented language. However, learning to code in R does not have to be difficult for those already fluent in SAS. Both languages provide similar functionality, with the main difference being the syntax required to accomplish the same objective. This paper provides a side-by-side comparison of R and SAS syntax, using examples of a clinical programmer’s typical SAS code adjacent to the corresponding R syntax. These easy and reusable examples can function either as stand-alone programs or as starting points for further acquisition of R fluency for the SAS programmer.


Applications Development

AD-001 : ADCM Macro Design and Implementation
Chengxin Li, Regeneron
Tingting Tian, Regeneron Pharmaceuticals Inc.
Toshio Kimura, Regeneron Pharmaceuticals Inc.
Mon, 8:30 AM - 8:50 AM, Location: Room 203-204

Previous work explained in detail a set of macros that automated the development of safety ADaM datasets. As an extension to that previous work, this paper delves into the design and implementation of a macro that will generate the ADCM dataset. This paper will broaden the focus to include the end-to-end streamlined data processing perspective from data collection to analysis and thus will discuss some of the upstream and downstream components including data collection and SDTM considerations as well as the analysis need that ultimately drives the ADCM requirements. Since the ADCM macro incorporates analysis need, the resulting ADCM will be analysis-ready in supporting common concomitant medication analyses while allowing for traceability back to SDTM. The GPS (Global, Project, Study) method was applied to the macro development. Global variables are derived without user input, whereas project and study specific variables are controlled by macro parameters. The macro employs logically structured sequential modules with high prioritization on user experience. The macro parameters and core concepts are described in detail in the paper.


AD-027 : Running Parts of a SAS Program while Preserving the Entire Program
Stephen Sloan, Accenture
Mon, 10:30 AM - 10:50 AM, Location: Room 203-204

The Challenge: We have long SAS® programs that accomplish a number of different objectives. We often only want to run parts of the programs while preserving the entire programs for documentation or future use. Some of the reasons for selectively running parts of a program are:
• Part of it has run already and the program timed out or encountered an unexpected error. It takes a long time to run so we don’t want to re-run the parts that ran successfully.
• We don’t want to recreate data sets that were already created. This can take a considerable amount of time and resources and can also occupy additional space while the data sets are being created.
• We only need some of the results from the program currently, but we want to preserve the entire program.
• We want to test new scenarios that only require subsets of the program.


AD-036 : Reconstruction of survival curves: An approach of comparing study drug curve with the standard of care digital survival curve
Girish Kankipati, Seagen Inc
Jai Deep Mittapalli, Seagen Inc.
Mon, 11:00 AM - 11:20 AM, Location: Room 203-204

Survival analysis corresponds to a set of statistical approaches used to investigate the time it takes for an event of interest to occur. Most of the survival procedures involve KM plots, log-rank tests, etc. Survival analysis focuses on two important aspects: occurrence or non-occurrence of an event, and follow-up time. We are often interested in assessing whether there are differences in survival (or cumulative incidence of event) among different treatments. For instance, in first-in-human (FIH) clinical trials we want to compare survival curves between the investigational drug and comparative drugs on the market (or the standard of care). For exploratory purposes, there is often an interest in seeing study vs. comparison drug survival curves on the same graph. To plot the survival curve for standard of care, we may need to construct the data from a digital image file in .png or .jpg format. How do we do it? To address these challenges, this paper introduces an innovative approach for reconstruction of data and comparison of survival curves. It will cover how to extract data points from survival curve digital images, using the trialdesign.org site as an example. The data points extracted from the image are saved to an Excel file which is then converted into a SAS® data set and combined with study drug survival data to generate the plot.


AD-045 : How to create SDTM SAS transport datasets using Python
Kevin Lee, Genpact
Mon, 9:00 AM - 9:20 AM, Location: Room 203-204

The paper is written for those who want to use Python to create SDTM SAS transport files from raw SAS datasets. The paper will show the similarities and differences between SAS and Python in terms of SDTM dataset development, along with actual Python code to create an SDTM SAS transport file. The paper will start with Python packages that can read and write SAS datasets, such as xport, sas7bdat, and pyreadstat. The paper will introduce how Python reads raw SAS datasets, such as demographic, exposure, randomization, and disposition datasets, from the local drive. The paper will also show how Python creates variables from raw data, such as SEX, USUBJID, RACE, RFSTDTC, and RFENDTC; how Python merges datasets using outer and inner joins; and how programmers use the Python DataFrame for data manipulation such as renaming, dropping, and replacing variables. Finally, the paper will show how Python can create an SDTM SAS transport file on the local drive. The paper also includes the actual Python code that reads raw SAS datasets, then merges, manipulates, and writes the SDTM DM SAS xport file.


AD-059 : Evaluating and Correcting Nominal and Actual Relative Time Measurements in Clinical Trials PK data
Timothy Harrington, Navitas Data Sciences
Mon, 11:30 AM - 11:50 AM, Location: Room 203-204

Pharmacokinetic (PK) data has two types of observations: Dosing and Sampling. Amongst the data items recorded in both of these observation types are the nominal and actual dose administration or sample collection times. The nominal times are the named times specified on the study case report forms such as VISIT, CYCLE and DAY. The actual times are the chronological times (local to the location of the site) as recorded by the investigating staff when the dose was administered or the sample, such as a blood draw, was collected. To analyze the PK data there is a need to know the actual and relative differences between these various times. Examples are the nominal and actual time a sample was taken after the first dose or the most recent dose, or how soon a pre-dose sample was collected before a dose was administered. This paper is intended for persons with knowledge of base SAS and a basic understanding of clinical trial pharmacokinetic data. It identifies the different timing variables and discusses, using SAS® code examples, how these variables are evaluated. It also discusses common problems caused by incomplete or missing dates and times and suggests imputation methods for their correction.
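
As a small illustration of the nominal versus actual comparison, the Python sketch below computes the actual elapsed time between a dose and a sample and its deviation from the nominal time. The timestamps, the 2-hour nominal value, and the function name `hours_between` are made up for the example and do not come from the paper.

```python
from datetime import datetime

def hours_between(earlier, later, fmt="%Y-%m-%dT%H:%M"):
    """Elapsed hours from one timestamp string to another (negative if reversed)."""
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds() / 3600

# Hypothetical record: a sample nominally due 2 hours after the dose.
dose_time = "2022-03-01T08:00"
sample_time = "2022-03-01T10:12"
nominal_hours = 2.0

actual_hours = hours_between(dose_time, sample_time)
deviation = actual_hours - nominal_hours
print(round(actual_hours, 2), round(deviation, 2))  # 2.2 0.2
```

A pre-dose sample would yield a negative value from the same function, which is one way to express how soon before dosing it was collected. Incomplete dates would need imputation before this calculation, which the sketch does not attempt.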


AD-062 : A Multilingual Shiny App for Drug Labelling in Worldwide Submission
Aiming Yang, Merck & Co., Inc
Jinchun Zhang, Merck & Co., Inc
Yiwen Luo, Merck & Co., Inc
Nan Xiao, Merck & Co., Inc
Yilong Zhang, Merck & Co., Inc
Mon, 2:30 PM - 2:50 PM, Location: Room 203-204

In preparing a successful worldwide submission for a drug or vaccine, one of the critical steps is to provide drug labeling in different languages and different formatting requirements to worldwide regulatory agencies. If a drug label contains figures, there is a challenge to create and maintain drug labels in different languages programmatically. For example, in oncology studies, Kaplan-Meier Plot (K-M plot) is frequently required for drug labelling worldwide. An application that can generate the figure in the required language based on the sole data source is highly desired, because it ensures accuracy, consistency, and security of the data. Shiny is an open-source R package for building web applications straight from R. In this paper, we demonstrate a user-friendly Shiny app to simplify the labeling process in collaboration with local regulatory teams who need to update drug labels in local languages. The Shiny app simplifies the manual steps to re-create a K-M plot in different languages and different formatting requirements. We explain the implementation of the Shiny app with code examples.


AD-069 : A Shiny App to Spell Check an ADaM Specification
Simiao Ye, Merck & Co., Inc.
Jeff Xia, Merck
Mon, 1:30 PM - 1:50 PM, Location: Room 203-204

The ADaM define.xml is a required file that describes the structure and content (datasets, variables, value level metadata, and controlled terminologies) of the data in a submission package. To produce a high quality define.xml file, Pinnacle 21 Enterprise (P21E) is often a preferred tool for the compliance check and validation of CDISC standards. However, the features to automatically check spelling errors, SAS syntax, extra space, and traceability of SDTM variables are not implemented in P21E and may demand excessive attention and efforts from the ADaM define.xml authors. This paper describes a Shiny application that can read through various input files including the SDTM define.xml, ADaM specification spreadsheet, and user-defined dictionaries and perform an automatic spell check to ensure document readability and eliminate manual efforts in identifying the spelling errors. Implementation of the app, relevant package and function details, and examples are provided in this paper.


AD-076 : Exploring R as a validation tool for statistical programming in clinical trials
Uday Preetham Palukuru, MERCK
Runcheng Li, MERCK
Nileshkumar Patel, MERCK
Changhong Shi, MERCK
Tues, 8:30 AM - 8:50 AM, Location: Room 203-204

SAS® has long been used as the go-to, and in some cases the only, software for statistical programming in clinical trials. In recent years, R has emerged as an alternative software for use in clinical trial data analysis. However, the widespread use of R is still hindered by some practical considerations, such as the learning curve for SAS®-only users and limited knowledge of the capabilities of R for data analysis. In this paper, we will discuss the unique strengths of R in the validation of clinical trial data analysis deliverables. Exploratory results from the efforts to validate SAS®-generated outputs using R will be discussed. We mainly focus on the validation of the most used deliverables in clinical trial data analysis, i.e., Tables, Listings and Figures (TLFs). We will showcase the interactive features available in R that help in the validation of figures. We will also demonstrate the validation capability of R by comparing the mockup table package and final output tables, so any discrepancy in titles, subtitles or footnotes can be identified efficiently. The ability of R to perform complicated statistical procedures with reduced programming code complexity and easier execution will also be discussed in this paper. The improvement in validation efficiency gained from the easier user interface of integrated R Shiny applications is also discussed. We hope the discussed features and capabilities will help promote the use of R as an efficient validation tool in clinical trial data analysis.


AD-084 : A SAS Macro to Aid Assembly Review of M5 Module in Submission
Jeff Xia, Merck
Tues, 9:00 AM - 9:20 AM, Location: Room 203-204

Assembly Review is the final check of the published electronic regulatory submission performed by the submission’s content authors/owners prior to release to health authorities and/or subsidiaries. Normally the SAS programming group is the content owner of all data components in the m5 module for all submitted studies, which include SAS datasets in XPT format, annotated case report forms, define documents, and reviewer’s guides. Therefore, the programming group is responsible for performing a thorough assembly review to ensure the correctness and completeness of each data component in the m5 module. Traditionally this was done by manually comparing the list of file names in each subfolder of the m5 module in a published sequence with the corresponding folder in the final drop-off location, and catching any differences to avoid mistakes or oversights in the regulatory publishing process. This paper introduces a SAS macro that compares the list of file names in two folders (i.e., folder “New” and folder “Old”) including their subfolders, and flags any files not matching, i.e., a file that exists in the folder “New” but not in the folder “Old”, or vice versa. Additionally, if a file exists in both folders “New” and “Old”, then it compares the file time stamp and flags if there is any difference. Lastly, this macro generates a report to list the name of each file and details of comparison by folder/subfolder.
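
The comparison logic described above can be sketched in Python as follows. This is an illustration of the idea, not the paper's SAS macro; the function names and the wording of the flags are assumptions.

```python
import os

def list_files(root):
    """Map each file's path relative to root to its modification time."""
    found = {}
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            full_path = os.path.join(folder, name)
            found[os.path.relpath(full_path, root)] = os.path.getmtime(full_path)
    return found

def compare_folders(new_root, old_root):
    """Return (relative_path, flag) pairs covering every file in either folder."""
    new, old = list_files(new_root), list_files(old_root)
    report = []
    for path in sorted(set(new) | set(old)):
        if path not in old:
            report.append((path, "only in New"))
        elif path not in new:
            report.append((path, "only in Old"))
        elif new[path] != old[path]:
            report.append((path, "time stamp differs"))
        else:
            report.append((path, "match"))
    return report
```

Calling `compare_folders` with the published sequence folder and the drop-off folder returns one row per file, sorted by relative path so that files in the same subfolder appear together.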


AD-115 : Checking Compliance of CDISC SDTM Datasets Utilizing SAS Procedures.
Ballari Sen, Bristol Myers Squibb
Tues, 9:30 AM - 9:50 AM, Location: Room 203-204

CDISC (Clinical Data Interchange Standards Consortium) SDTM (Study Data Tabulation Model) is a required standard data structure for submitting clinical trial tabulation data to the US FDA, PMDA, and NMPA. An approach for implementing such data standards should be quick and effective and use few resources. Compliance measures are needed to check compliance and streamline the preparation of data for FDA, PMDA, and NMPA submission. Many good validation tools exist for checking CDISC SDTM datasets, such as Pinnacle 21 Enterprise [1][2]. However, the reusability, efficiency, and automation of SAS® provide an effective and customizable tool for validating CDISC SDTM datasets to be used in FDA submissions. This paper will discuss how to efficiently produce validation reports using base SAS® capabilities. Validation is an essential part of the development process for CDISC SDTM datasets, summary tables, and listings, and this paper elaborates on the diverse ways SAS programs can be applied to catch errors in data standard compliance or to identify inconsistencies that would otherwise be missed by general-purpose tools [2]. This paper also addresses unclean raw data issues, which can disrupt programming logic, generate warning messages, and ultimately lead to wrong or inaccurate conclusions about the drug’s safety and efficacy, which could in turn put patients’ health in jeopardy and have a significant negative impact on the pharmaceutical company [8].


AD-122 : Automatic CRF Annotations Using Python
Lijun Chen, LLX Solutions, LLC
Jin Shi, LLX Solutions, LLC
Tong Zhao, LLX Solutions, LLC
Tues, 3:30 PM - 3:50 PM, Location: Room 203-204

The annotated CRF is a required component in the submission package sent to FDA. However, CRF annotation is usually a monotonous and time-consuming manual process, and some existing automatic packages still have limitations. Taking advantage of the powerful functions Python offers, we have developed a Python package to automatically annotate CRF pages. The package is designed to work on a blank CRF that is for a version update or that comes from a new study. The whole process consists of four steps: optional information extraction from an old annotated CRF, reading and mapping the new CRF to provide input for the next step, annotating the CRF, and optionally adding bookmarks. At a few check points, Excel spreadsheets are used to allow manual controls to ensure accuracy. The current performance of the package is acceptable, and more explorations have been planned. This paper introduces some key points of the current stage of the package, and some relevant sample code and examples will be provided in the paper.


AD-131 : SAS Macro to Colorize and Track Changes Between Data Transfers in Subject-Level Listings
Inka Leprince, PharmaStat, LLC
Elizabeth Li, PharmaStat, LLC
Carl Chesbrough, PharmaStat, LLC
Tues, 1:30 PM - 1:50 PM, Location: Room 203-204

Most clinical trials have an independent data monitoring committee (DMC). In order to provide quality data to the DMC, sponsors conduct internal reviews of available data prior to each DMC meeting. Subject-level listings are used to aid the clinical data review of the ongoing studies. To ease the burden on the data reviewers, providing color coding for old/deleted records from the previous data transfer and highlighting new records from the current data transfer is increasingly requested by sponsors. We have developed a SAS macro, TrackCHG, to meet this request. This paper describes the SAS macro code of TrackCHG in detail. It also provides sample datasets and code for the reader to create an adverse event listing using the TrackCHG macro. After reading this paper, SAS users should be able to understand and implement the TrackCHG macro into their existing listing programs to color code their output listings.
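
The underlying comparison between two data transfers can be illustrated with a short Python sketch. It is not the TrackCHG macro; the function name, the flag labels, and the sample adverse event records are hypothetical, and the step that maps flags to listing colors is left out.

```python
def flag_records(previous, current):
    """Flag each record as 'deleted', 'new', or 'unchanged' between two transfers."""
    previous, current = set(previous), set(current)
    flags = {record: "deleted" for record in previous - current}
    flags.update({record: "new" for record in current - previous})
    flags.update({record: "unchanged" for record in previous & current})
    return flags

# Hypothetical adverse event records: (subject, term, severity).
previous_transfer = [("001", "Headache", "Mild"), ("002", "Nausea", "Mild")]
current_transfer = [("001", "Headache", "Mild"), ("002", "Nausea", "Moderate")]

for record, flag in sorted(flag_records(previous_transfer, current_transfer).items()):
    print(flag, record)
```

Because whole records are compared, a modified record shows up as one deleted row (the old values) and one new row (the updated values).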


AD-145 : Optimizing TLF Generation: Titles and Footnotes Applying a New Idea to a Basic Approach
Jake Adler, PROMETRIKA
Assir Abushouk, PROMETRIKA
Tues, 2:00 PM - 2:20 PM, Location: Room 203-204

When receiving feedback from a sponsor on tables/listings, one of the most frequent updates is title/footnote changes. These updates can be very time consuming, so what if there was a way to take the updated shells and carry the changes into the programs automatically? In this paper, Excel Visual Basic for Applications (VBA) will be used to take the contents of titles/footnotes from the shells of a Word document and import them into SAS so titles and footnotes can be read in automatically. To validate this process when the shells are updated, Apache Subversion (SVN) will be used to compare changes in old shells versus new shells.


AD-150 : Working with Dataset-JSON using SAS©
Lex Jansen, CDISC
Tues, 11:00 AM - 11:50 AM, Location: Room 203-204

The Operational Data Model (ODM) is a vendor neutral, platform independent data exchange format, intended primarily for interchange and archival of clinical study data pertaining to individual subjects, aggregated collections of subjects, and integrated research studies. ODM provides the foundation for most CDISC Data Exchange Standards, such as Define-XML. CDISC is in the late-stage development of the much-anticipated ODM v2.0 update. ODM v2.0 will include the specification of Dataset-JSON, an efficient and modern exchange format for data which addresses many of the limitations of SAS v5 XPT files. JSON representations for exchange standards are widely used in today’s architectures. In RESTful web services, JSON is often the preferred format for the service response, due to its compactness and ease of use in mobile applications. Other standards used in healthcare, such as HL7-FHIR support JSON as well as XML, together with other formats such as RDF. This paper will show how SAS can work with Dataset-JSON, both reading and writing Dataset-JSON. We will discuss the native SAS JSON engine, but also the use of PROC LUA with publicly available JSON modules.


AD-155 : Connecting SAS® and Smartsheet® to Track Clinical Deliverables
Greg Weber, Navitas Data Sciences
Steve Hege, Alexion
Siddharth Kumar, Navitas Life Sciences
Tues, 2:30 PM - 2:50 PM, Location: Room 203-204

Efficient monitoring and management of the clinical programming development and validation lifecycle is vital and can be a challenge in clinical study reporting. There are various methods used by companies to track the progress of clinical deliverables. Our current solution is tracking progress using Microsoft Excel spreadsheets. In our Statistical Computing Environment (SCE), this is not ideal as it requires checking out, downloading, editing the spreadsheet and then uploading and checking in the updated Excel file. This manual process is both time-consuming and prone to error. In addition, sharing and working collaboratively is problematic as only one user can update the file at a time, so a better solution is desired. In this paper, we discuss connecting Smartsheet® with our SAS® LSAF® environment to provide a more collaborative Clinical Programming Deliverable Tracker requiring less intervention from managers and programmers. Smartsheet® is an online service for work management and collaboration that uses a tabular interface and provides workflow capabilities. We demonstrate techniques that utilize HTTP and REST to interact with and update our Smartsheet® Tracker from SAS®, using Proc HTTP along with the LSAF® macro and Smartsheet® APIs. Smartsheet® has become an important part of our SCE ecosystem and, using the processes and techniques developed for our Clinical Tracker, we plan to automate other SCE processes.


AD-167 : Using the SAS System® with the National Center for Biotechnology Information resources and Immune Epitope Database to explore the realm of potential SARS-CoV-2 variants.
Kevin Viel, Histonis, Incorporated
Tues, 10:30 AM - 10:50 AM, Location: Room 203-204

SARS-CoV-2 emerged in late 2019 and quickly became a pandemic. Infection may result in Covid-19, which may cause acute respiratory distress syndrome (ARDS), a risk factor for death, and post-infection sequelae (Covid-19 long haulers). The public databases of the National Center for Biotechnology Information (NCBI) of the United States National Institutes of Health (NIH) include RNA sequences of SARS-CoV-2 and their classification by the Pango Nomenclature System. The number of viruses produced by an infected person and the poor fidelity of replication of viruses make variants of SARS-CoV-2 a practical certainty. The “epitope escape” model may predict which variants emerge and, specifically, does not require inter-human transmission. Even asymptomatic infections will generate variants. Generally, an MHC-II groove must bind a peptide with sufficient affinity to effectively present it to the adaptive immune system to generate antibodies. Tools such as the Immune Epitope Database (IEDB) estimate which peptides might bind to the MHC-II molecules and, therefore, might emerge as variants of concern, such as the “Delta” or “Omicron” variants, whether or not Covid-19 is asymptomatic. The goal of this paper is to introduce the NCBI repository and to demonstrate the use of the SAS System® to enumerate the realm of variants, i.e., the lineages, that may arise from a given SARS-CoV-2 sequence, given a single, random nucleotide substitution per codon, and then to estimate the binding affinity of the resulting peptides to the MHC-II genotypes available in IEDB. This paper demonstrates snippets of SAS code that are required given the sheer size of genomic data.
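
To make the enumeration step concrete, the Python sketch below lists every codon that differs from an original codon by exactly one nucleotide. It is an illustration under simple assumptions (an RNA alphabet of A, C, G, U and a reading frame starting at the first base), not the paper's SAS code; translation to peptides and the binding affinity estimate are not shown, and the function names are hypothetical.

```python
NUCLEOTIDES = "ACGU"

def single_substitutions(codon):
    """Return all codons that differ from codon at exactly one position."""
    variants = []
    for position, original in enumerate(codon):
        for base in NUCLEOTIDES:
            if base != original:
                variants.append(codon[:position] + base + codon[position + 1:])
    return variants

def codon_variants(sequence):
    """Yield (codon_index, original_codon, variant_codon) for each complete codon."""
    usable_length = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    for start in range(0, usable_length, 3):
        codon = sequence[start:start + 3]
        for variant in single_substitutions(codon):
            yield start // 3, codon, variant

print(single_substitutions("AUG"))
print(len(list(codon_variants("AUGGCU"))))  # 2 codons x 9 variants = 18
```

Each codon has 3 positions with 3 alternative bases each, so a sequence of n complete codons yields 9n single-substitution variants, which suggests why the size of the data grows quickly for a whole genome.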


AD-170 : Using the SAS System® to interface with the Entrez Programming Utilities (E-utilities) of the National Center for Biotechnology Information (NCBI).
Kevin Viel, Histonis, Incorporated
Wed, 9:00 AM - 9:20 AM, Location: Room 203-204

The National Center for Biotechnology Information (NCBI) of the United States National Institutes of Health has multiple databases that are generally accessed via a web interface. Popular databases include PubMed, which houses citations for biomedical literature, and PubChem, which holds chemical information, such as chemicals by name, molecular formula, or structure, among other data. NCBI provides a set of eight server-side programs, the Entrez Programming Utilities (E-utilities), that provide a stable interface into the Entrez query and database system. The goal of this paper is to introduce the E-utilities and describe a SAS macro that uses the HTTP procedure to call the E-utilities URL.
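
An E-utilities request is an ordinary URL with query parameters, which the following Python sketch assembles without sending it. The helper name `eutils_url` and the search term are assumptions for the example; the paper itself issues such requests from SAS through the HTTP procedure.

```python
from urllib.parse import urlencode

# Base address of the E-utilities; each utility is a .fcgi program under it.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def eutils_url(utility, **params):
    """Build the request URL for one E-utility (for example, esearch or efetch)."""
    return f"{EUTILS_BASE}{utility}.fcgi?{urlencode(params)}"

# Search PubMed for a term and ask for at most five identifiers.
print(eutils_url("esearch", db="pubmed", term="SARS-CoV-2", retmax=5))
```

The same string could be passed as the URL of an HTTP request from any client; `urlencode` takes care of escaping spaces and special characters in the search term.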


AD-184 : SAS® Spontaneous Combustion: Securing Software Portability through Self-Extracting Code
Troy Hughes, Datmesis Analytics
Wed, 8:30 AM - 8:50 AM, Location: Room 203-204

Spontaneous combustion describes combustion that occurs without an external ignition source. With the right combination of fire tetrahedron components—including fuel, oxidizer, heat, and chemical reaction—it can be a deadly yet awe-inspiring phenomenon, and differs from traditional combustion that requires a fire source, such as a match, flint, or spark plugs (in the case of combustion engines). SAS® code as well often requires a "spark" the first time it is run or run within a new environment. For example, SAS programs may operate correctly in an organization's original development environment, but may fail in its production environment until necessary folders are created, SAS libraries are assigned, control tables are constructed, or configuration files are built or modified. And, if software portability is problematic within a single organization, imagine the complexities that exist when SAS code is imported from a blog, white paper, textbook, or other external source into a new environment. The lack of software portability and the complexities of initializing new software often compel development teams to build software from scratch rather than attempting to reuse or rehabilitate existent code. One solution is to develop SAS programs that flexibly build and validate required environmental and other components during execution. To that end, this text describes techniques that increase the portability, reusability, modularity, and maintainability of SAS code and demonstrates self-extracting, spontaneously combustible code that requires no spark.
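
The idea of code that builds its own environment can be sketched in Python as follows. This is an analogy to the SAS techniques the abstract describes, not the author's implementation; the function name, the folder names in the usage note, and the default configuration content are hypothetical.

```python
import os

def ensure_environment(root, folders, config_name="config.txt", default_config="mode=dev\n"):
    """Create any missing folders and a default configuration file; return what was created."""
    os.makedirs(root, exist_ok=True)
    created = []
    for folder in folders:
        path = os.path.join(root, folder)
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(path)
    config_path = os.path.join(root, config_name)
    if not os.path.exists(config_path):
        # Only write defaults when no configuration exists, so user edits survive reruns.
        with open(config_path, "w") as handle:
            handle.write(default_config)
        created.append(config_path)
    return created
```

A program could call `ensure_environment("project", ["data", "logs"])` as its first step. On the first run it builds what is missing, and on later runs it returns an empty list and changes nothing.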


AD-196 : Using R-Shiny Capabilities in Clinical Programming to create Applications
Prakasam R, Ephicacy Lifescience Analytics
Sujatha Kandukuri, Ephicacy Lifescience Analytics
Wed, 9:30 AM - 9:50 AM, Location: Room 203-204

According to clinicaltrials.gov, the number of clinical studies through 2021 is 399,549. More than 30,000 studies have been registered every year for the last four years. Large volumes of data are generated, and to make full use of the underlying opportunities, big pharma companies have started initiatives to use open-source tools like R. In clinical programming, the potential of data can be unleashed by working with it efficiently and applying the latest technological advancements. Examples include automating some data-related activities and enriching decision-making aids through interactive visualizations. The current technology space makes it possible to integrate automation, web deployment, and GUI-based interactive visualization capabilities into a single front-end application using R-Shiny, which helps achieve efficiency with zero license cost. This paper presents a clinical programming use case: building a deployable, scalable, reusable application/utility tool on R-Shiny for automating patient profiles and visualising patient information. The resulting application is intended to show how an integrated solution can be built in a single application by making use of:
• Reactive programming available in R Shiny to create GUI-based interactive tools
• Web deployment capabilities available in the Shiny framework to deploy the developed app, accessible through web browsers
• Programming capabilities available in R for automating repetitive tasks
This would trigger ideas that can be applied in different clinical avenues and encourage the migration from traditional licensing models to open-source frameworks.


AD-200 : Intro to Coding in SAS Viya
Charu Shankar, SAS Institute
Tues, 4:00 PM - 4:50 PM, Location: Room 203-204

Can I use my existing SAS code in SAS Viya? What's a Cloud Analytic Server (CAS)? How is SAS 9 different from CAS? What is a Compute Server? What is the advantage of learning to speak in SAS Viya? These are some of the questions top of mind for the classic SAS 9 user who has a solid foundation in SAS 9 and isn't sure how things integrate with existing SAS code. Learn through the yummy analogies of Charu, a SAS instructor, yoga teacher, and chef, who draws on her experience of cooking at a yoga retreat in the Bahamas for over 300 guests to explain the distinction between SAS and CAS. You too can learn the concepts. If techie terminology has your head all wound up in circles, then this session is just for you. Learn how you can stretch your knowledge of SAS programming concepts to beautifully write CAS code with just a few simple tweaks. All this and much more in this session. All levels welcome. Some basic knowledge of SAS programming will help you get more value out of this session.


Artificial Intelligence (Machine Learning)

AI-010 : Data Mining of Tables: The Barrier for Automation
Ilan Carmeli, Beaconcure LTD
Tues, 11:00 AM - 11:20 AM, Location: Lone Star B

Currently, statistical analysis outputs are validated manually by SAS programmers and biostatisticians. There are two main reasons for that. First, the process of generating and validating statistical analysis outputs is not standardized: specifications and definitions are written in ‘Word’ documents and ‘Excel’ files. Second, the outputs of the statistical analysis are static and considered flat files (PDFs, RTFs, HTMLs, etc.). Metadata is missing, and the information regarding the data hierarchy within and between the outputs does not exist. The files are made for human review, not for software analysis. In order to develop any type of automation, an automated solution for converting static files into machine-readable formats should be developed. To achieve that, we must follow these steps: (1) parsing: converting raw files into an abstract structure; (2) utilizing information contained in the files, such as abbreviations, clinical terms, synonyms, etc.; (3) classification of headers: identifying column and row headers automatically; (4) cell characteristics: adding the metadata information to each cell. As a result, the information of every cell resides in a structured database, and we can finally benefit from it. Table mining process example: there is no information about the cells in the static output. In our presentation we will describe the challenges and the implications of implementing such a process.
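The parsing and header-classification steps can be pictured with a toy example. The sketch below is not the authors' tool; it assumes a simple pipe-delimited text table whose first row holds column headers and whose first column holds row headers, and the function name `mine_table` and the sample values are made up for illustration.

```python
def mine_table(lines, sep="|"):
    """Convert a flat text table into one record per data cell.

    Assumes the first line holds column headers and the first field of
    every following line is the row header.
    """
    rows = [[field.strip() for field in line.split(sep)] for line in lines]
    column_headers = rows[0][1:]
    cells = []
    for row in rows[1:]:
        row_header, values = row[0], row[1:]
        for column_header, value in zip(column_headers, values):
            # Each cell now carries its own metadata (row and column).
            cells.append(
                {"row": row_header, "column": column_header, "value": value}
            )
    return cells


# Made-up table content, for illustration only.
table = [
    "Parameter | Placebo | Active",
    "n | 10 | 12",
    "Mean | 5.1 | 6.3",
]
for cell in mine_table(table):
    print(cell)
```

Once every cell is stored with its row and column context, it can be loaded into a database and queried, which is the machine-readable form the abstract argues for. Real outputs such as PDFs and RTFs would need a far more involved parsing step.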


AI-106 : What's your model really doing? Human bias in Machine Learning
Jim Box, SAS Institute
Tues, 1:30 PM - 1:50 PM, Location: Lone Star B

Machine Learning is showing up everywhere in our industry, from performing diagnostic tasks to determining how patients are treated. People sometimes think that since a model came up with the process, it must be a fair one. However, there are multiple places where human biases affect the outcomes of these models. We'll explore some of the sources of bias, see the impact this can have on real people, and look at ways to mitigate the possible harmful side effects of using machine learning.


AI-116 : Potential features analysis affect Covid-19 spread
Yida Bao, Auburn University
Philippe Gaillard, FLORIDA STATE UNIVERSITY
Tues, 2:00 PM - 2:20 PM, Location: Lone Star B

The emergence of the COVID-19 virus has deeply affected the world for two years. No government measure or other means has been able to completely suppress its spread. On the other hand, we cannot deny that the spread of the COVID-19 virus is not uniform: there are always regions with fewer infections and deaths than others. In this paper, we will use different features to explore the reasons. We selected epidemic data from 2,352 counties in the United States and updated it with the latest infection and death numbers in 2022. Meanwhile, we introduced 28 objective variables, such as temperature, longitude and latitude, diabetes ratio, and smoker proportion, for statistical analysis. This is a very meaningful research project, which can help health institutions invest the right resources according to the different situations in different regions. Basic and graceful linear regression will be used in our project, telling us a lot of potential information. Besides, we use a series of machine learning algorithms, such as neural networks and decision trees, to make predictions based on the features in the dataset. SAS 9.4 and SAS Enterprise Miner will be the main platforms for this project.


AI-130 : Generating statistical Tables, Listings, and figures using machine learning, deep learning, and NLP techniques that provide more efficiency to the statistical programming process
Farha Feroze, Symbiance Inc
Ilango Ramanujam, Symbiance Inc
Tues, 2:30 PM - 2:50 PM, Location: Lone Star B

Our in-house tool is a seamless solution for generating statistical TLG reports in a structured and faster way with the help of AI, which reduces manual effort significantly. The tool is built on an efficient framework that accelerates the TLG generation process into just three simple steps. Reports are generated just by uploading the TLG mock shell and ADaM datasets. The AI engine automatically annotates about 70 to 80% of the mock shell with ADaM variables on first use, and accuracy can improve with subsequent study usage. Once the annotations are finalized by the user, the system generates the reports (tables, listings, and figures) in minutes and provides a stand-alone program for each TLG. The tool is easily configurable for any sponsor-related TLG shell and is trained on historical data. This paper will discuss one such Machine Learning algorithm that was implemented as a tool.


AI-151 : Leveraging AI to process unstructured Clinical data in real-time
Madhusudhan Nagaram, Allogene Therapeutics
Syam Prasad Chandrala, Allogene
Tues, 3:30 PM - 3:50 PM, Location: Lone Star B

Clinical trial data is usually posted as structured data in the form of SAS datasets or Excel or CSV files. This structured data can be easily read using various analytics tools such as SAS, R, Tibco Spotfire, etc. and can be presented as listings, summarized reports, visualizations, or dashboards. Sometimes translational data, which includes labs, pharmacokinetics, pharmacodynamics, and biomarker data, is obtained from external vendors in various file formats. Often these external vendors lack the capability to send real-time data in a structured file format. Instead, they send this data as PDF documents or scanned images of printed or handwritten documents, which are challenging to read and ingest into the tools for data listings and visualizations without human intervention. Our goal is to minimize or eliminate human intervention and to automate dataset acquisition, pre-processing, validation, and ingestion into an AWS data lake in real time, giving functional users an analytics platform. We have addressed this challenge by creating an automation process using various tools such as Microsoft Power Automate, AWS S3, AWS Lambda functions, AWS Athena, and MS AI models. As a result, the documents are processed in real time by serverless applications and made available on the data lake to analytics tools and dashboards for statistical analysis. In this paper, we demonstrate the step-by-step process of mapping the data from PDF files using Microsoft Power Automate, storing it in an AWS S3 bucket, querying the data using AWS Athena, and reading it into Tibco Spotfire using an information link.


Data Standards

DS-002 : Subjects Enrolled in Multiple Cohorts within a Study- Challenges and Ideas
Peng Du, INNOVENT BIOLOGICS INC
Hongli Wang, INNOVENT BIOLOGICS INC
Tues, 10:30 AM - 10:50 AM, Location: Room 303-304

Recently, an increasing number of study protocols have allowed patients to participate in more than one cohort within a study, or in more than one study within a product. One reason could be that recruiting patients, especially healthy subjects in Phase I clinical trials, has become much more challenging during the COVID pandemic. Fewer healthy volunteers and patients are willing to participate in clinical trials, which puts a lot of pressure on clinical timelines. On the other hand, FDA clearly defined the concepts of SUBJID and USUBJID in the Study Data Technical Conformance Guide (TCG, OCT 2021). However, handling each scenario is not always easy for our stats programmers, since it could impact the SDTM and ADaM datasets and bring a lot of potential risks to our programming activities. A previous CDISC presentation from the MSI team proposed the structure of the DC domain. However, there are still some remaining questions, such as how to handle each scenario at the ADaM level and what the impact is on other SDTM datasets. This paper will mainly focus on how to handle subjects enrolled in multiple cohorts or multiple studies at the programming level. We will give a couple of solid examples to illustrate each scenario. Suggestions for handling SDTM datasets such as TS, DM, DC, and findings domains, and ADaM datasets such as ADSL, will be given with examples as well.


DS-054 : From the Laboratory Toxicity Data Standardization to CTCAE Implementation
Xiaoyin Zhong, Genmab
Jerry Xu, Genmab
Tues, 11:00 AM - 11:20 AM, Location: Room 303-304

Oncology is one of the most prevalent indications for clinical trials today. One specific method of analyzing oncology data involves quantifying the magnitude of abnormalities in clinical laboratory results. However, laboratory data is often challenging to work with during analysis data set and output creation, especially when lab limits can be assessed bi-directionally. The lab data process becomes more complicated as the latest CTCAE version 5.0 incorporates grading criteria dependent on baseline measurements. In this paper, we will present our approach to ADLB data set creation, laboratory data presentation (grade shift table), and derivation of the toxicity grade based on CTCAE version 5.0, as well as our recommendations for some common issues.


DS-061 : ADaM Child Data Set: An Alternative Approach for Analysis of Occurrence and Occurrence of Special Interest
Lindsey Xie, Kite Pharma, Inc.
Richann Watson, DataRich Consulting
Jinlin Wang, Kite Pharma, Inc.
Lauren Xiao, Kite Pharma, Inc.
Tues, 11:30 AM - 11:50 AM, Location: Room 303-304

Due to the various data needed for safety occurrence analyses, the use of a child data set that contains all the data for a given data point aids in traceability and support of the analysis. Adverse Events of Special Interest (AESI) represent adverse events (AEs) that are of particular interest in the study. These AESI could potentially have symptoms associated with them. AESI could be captured as clinical events (CEs) in the CE domain while the associated symptoms for each CE are captured as AEs in the AE domain. The relationships between CEs and associated AE symptoms are an important part of the safety profile for a compound in clinical trials. These relationships are not always readily evident in the source data or in a typical AE analysis data set (ADAE). The use of a child data set can help demonstrate this relationship, which provides enhanced data traceability to help the review effort for both sponsor and regulatory agency reviewers. This paper provides examples of using a child data set to preserve data with the relationship between CEs and associated AE symptoms in an ADAE child data set (ADAE). In addition, the paper will show that ADAE and ADAE serve as analysis-ready data sets for the summary of AE and AESI.


DS-064 : Upgrading from Define-XML 2.0 to 2.1
Trevor Mankus, Pinnacle 21
Tues, 1:30 PM - 1:50 PM, Location: Room 303-304

The Define-XML standard has developed significantly since its original inception in 2005 when version 1.0 was released. Fast forward to this year and versions 2.0 and 2.1 are the industry standard now. However, about 6 years separate the publication of the two versions and much has changed. This presentation will focus on highlighting the new elements and attributes introduced in version 2.1 as well as discuss some best practices that creators should follow in order to upgrade their existing define 2.0 to the latest published version of the standard. In addition, the presentation will show how Pinnacle 21 Enterprise can be used to aid in the up-versioning process using our Excel-like editor and also cover the new validation rules that were implemented so that you can feel confident your upgraded define.xml file complies with the latest guidance.


DS-066 : Above and Beyond Lesion Analysis Dataset (ADTL)
Raghava Pamulapati, Merck
Tues, 2:00 PM - 2:20 PM, Location: Room 303-304

A subject’s lesion response is often a primary efficacy measure in Oncology studies. A lesion analysis dataset is derived from the Tumor Identification (TU) and Tumor Results (TR) SDTM domains. These SDTM datasets contain the subject's Target, Non-Target, and New Lesions information collected at numerous evaluation time points throughout the study. Each patient can have multiple target and/or non-target lesions. The image and measurement of those lesions are collected at each time point along with new lesion information, if any. To evaluate the tumor responses to the investigational treatment throughout the study, we develop an analysis dataset (ADTL) with a single record per subject and time point. Those records contain information such as the sum of diameters of target lesions and percentage change from baseline. Calculating the sum of all target lesions for each evaluation time point can be challenging since a subject may not complete all the lesion scans on the same day due to the subject's health issues or other challenges. When subjects have scans for one evaluation time point on multiple dates, it can be difficult to determine which evaluation time point the scans belong to, especially when records have different visits. We set up time windows around each evaluation time point to solve this problem. This paper will demonstrate the basic ADTL data structure and derivations. It will also cover challenging scenarios of lesion assessments collected on multiple days, missing lesion assessments for one or more lesions, merged or split lesions, and how lesion numbers are assigned or changed in the crossover period.
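The windowing idea can be illustrated with a small sketch. This is not the ADTL derivation from the paper: the window boundaries, field names, and measurements below are hypothetical, and cases such as missing, merged, or split lesions are deliberately ignored.

```python
def assign_timepoint(study_day, windows):
    """Return the label of the window containing study_day, or None."""
    for label, (start, end) in windows.items():
        if start <= study_day <= end:
            return label
    return None


def summarize_target_lesions(scans, windows, baseline_label="BASELINE"):
    """Sum target lesion diameters per time point and compare to baseline."""
    sums = {}
    for scan in scans:
        label = assign_timepoint(scan["day"], windows)
        if label is not None:
            sums[label] = sums.get(label, 0.0) + scan["diameter_mm"]
    baseline = sums.get(baseline_label)
    summary = {}
    for label, total in sums.items():
        pct_change = None
        if baseline:  # skip when baseline is missing or zero
            pct_change = 100.0 * (total - baseline) / baseline
        summary[label] = {"sum_mm": total, "pct_change": pct_change}
    return summary


# Hypothetical windows (study days) and scans taken on different dates.
windows = {"BASELINE": (-28, 1), "WEEK 6": (36, 49)}
scans = [
    {"day": -5, "diameter_mm": 20.0},
    {"day": -3, "diameter_mm": 30.0},
    {"day": 40, "diameter_mm": 15.0},
    {"day": 43, "diameter_mm": 25.0},
]
print(summarize_target_lesions(scans, windows))
```

Scans on days 40 and 43 fall in the same window, so they contribute to one sum for that time point even though they were collected on different dates.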


DS-095 : Making an ADaM Dataset Analysis-Ready
Sandra Minjoe, ICON PLC
Tues, 2:30 PM - 2:50 PM, Location: Room 303-304

One of the fundamental principles of ADaM is to be “analysis-ready”. But what does that mean, and how do you determine if your analysis dataset is indeed “analysis-ready”? This paper delves into what the ADaM documents say about being “analysis-ready”, including what type of dataset manipulation is allowed (and not allowed) to happen between the ADaM dataset and the statistical output. It describes how to choose the appropriate dataset structure and recommends variables that will help efficiently create different types of analysis output, such as tables and figures. It also describes situations where “analysis-ready” doesn’t apply. This paper includes examples of what you can do to ensure your dataset meets the ADaM “analysis-ready” fundamental principle.


DS-112 : Standardised MedDRA Queries (SMQs): Beyond the Basics; Weighing Your Options
Richann Watson, DataRich Consulting
Karl Miller, IQVIA
Tues, 3:30 PM - 3:50 PM, Location: Room 303-304

Ordinarily, the aim of Standardised MedDRA Queries (SMQs) is to group specific MedDRA terms for a defined medical condition or area of interest at the Preferred Term (PT) level, which most would consider to be the basic use of SMQs. However, what if your study looks to implement SMQs in a way that goes beyond the basic use? Whether grouping through algorithmic searching, with or without weighted terms, or through hierarchical relationships, this paper covers advanced searches that will take you beyond the basics of working with SMQs. Gaining insight into this process will help you become more familiar with all types of SMQs and will put you in a position to become the "go-to" person for helping others within your company.


DS-121 : How to utilize the EC domain to handle complex exposure data
Varsha Korrapati, Seagen Inc.
Johnny Maruthavanan, Seagen
Tues, 4:00 PM - 4:20 PM, Location: Room 303-304

Dose modifications including dose elimination, hold, delay, reduction and mid-cycle adjustments due to treatment-related toxicities are common in oncology clinical trials. In certain studies, the design of CRFs that capture treatment administration involves a high degree of complexity to account for such unplanned dose modifications that occur within or between treatment cycles. Under those circumstances, the creation of an SDTM EC (Exposure as Collected) data set helps capture all dose modifications that may occur to a protocol-specified treatment administration. The EC data set in turn facilitates a seamless functional path to derive the subsequent EX (Exposure) data set that contains information on actual drug administered within each study cycle; this separation makes reviewing and consuming these data a lot easier. This paper summarizes one approach of leveraging EC domain to capture dose modification scenarios, outlining the process flow and traceability of data from EC to EX with an example based on CDISC SDTMIG v3.3, and highlighting the benefits to reviewability and accessibility of constructing these SDTM data sets in this manner.


DS-134 : Redefining Industry Standards: Making CDASH and SDTM Work Together from the Database Level
Chad Fewell, Clinipace
Jesse Beck, Clinipace
Tues, 8:30 AM - 8:50 AM, Location: Room 303-304

In today's industry, there appears to be a disconnect between the purpose of designing a database and how the database can be utilized to represent a client’s needs efficiently. The database is not just a tool for capturing data. It is also a tool that can be standardized to provide SDTMs almost instantaneously. The industry has moved towards CDASH being the standard for database design; however, there appears to be a fundamental misunderstanding of why CDASH needs to be used. As a result, many of the efficiencies that CDASH can provide for SDTM creation never come to fruition. This inefficiency primarily comes from a lack of understanding of how CDASH and SDTMs work together. When a database is designed in RAVE, TrialMaster, or any other electronic database system, there needs to be a strong SDTM influence on variable design as well. CDASH variable naming offers a flexibility that cannot be duplicated within SDTMs, as well as the ability to preemptively check for data entry errors that typical checks do not look for. This flexibility can be leveraged into a standard process that could ultimately allow SDTMs to be written and Pinnacle 21 checks to be designed at the CDASH level, all while abiding by CDASH and SDTM standards and naming rules. However, this can only be done if the database design is created by an individual or team of colleagues who are experts in both CDASH and SDTM.


DS-139 : A Tale of Two CDISC Support Groups - Supporting CDISC Standards is Anything but Standard
Christine McNichol, Labcorp
Mike Lozano, Eli Lilly and Company
Tues, 9:00 AM - 9:20 AM, Location: Room 303-304

It was the cleanest compliance report, it was the worst of compliance reports, it was the team of subject matter experts, it was the team mapping their first SDTM study, it was the well tested standard process used for years, it was the project without a standard and needing development. Any CDISC support team has likely been faced with supporting many of these. Their overall goal is the same: to support teams in the implementation of CDISC standards such as SDTM and ADaM in order to produce a high quality compliant product following CDISC standards for submission. However, the challenges, structures and approaches to support can differ greatly between companies. As a result, CDISC standards support teams come in many different shapes and sizes with different processes and methods of support. This is the tale of two CDISC support groups, one pharma and one CRO, navigating these challenges with converging paths to one common goal.


DS-197 : Clinical Classifications: Getting to know the new kid on the QRS block!
Soumya Rajesh, CSG Llc. - an IQVIA Business
Michael Wise, Apellis Pharmaceuticals
Tues, 9:30 AM - 9:50 AM, Location: Room 303-304

Questionnaires, Ratings and Scales (QRS) encompasses 3 main concepts: Questionnaires (QS), Functional Tests (FT), and Clinical Classifications (RS) - the new kid on the QRS block. As with all things new, there may be some confusion in mapping Clinical Classifications, versus the other members of the QRS family. To add to this confusion, Clinical Classifications have recently been moved from QS to RS - a domain that was originally developed just for Oncology (Disease Response). Knowing the distinction between a Questionnaire and Functional Test, as well as what distinguishes them from a Clinical Classification is critical here. This presentation begins by introducing the members of the QRS family, and provides differences among them, by using a few published instruments for illustration. The Six Minute Walk and AUDIT-SR show how FT differs from QS. Then we introduce AIMS as a rather straightforward clinical classification, (one that was recently moved from QS to RS), and ATLAS is discussed to illustrate how clinical classifications utilize data from multiple domains. The paper also outlines the rules that should be followed to map a clinical classification that sources data from multiple domains. Finally, we present a sponsor created VAS as another example of an instrument that, despite appearances, does not conform to QRS standards and is not mapped to QS. General guidance for mapping, a decision tree map, and references that allow a more detailed understanding of clinical classifications, sum up this paper on the QRS family.


Data Visualization and Reporting

DV-040 : The power of data visualization: Seaborn vs SGPLOT.
Maryna Aksaniuk, Quartesian
Mon, 8:00 AM - 8:20 AM, Location: Lone Star B

Data visualization is the bridge between human and machine. It makes data easier for us to perceive and understand. Nowadays there are plenty of programming languages that allow you to create various data visualizations depending on your requirements and purposes. The million-dollar question is: which one is the best? Among the most popular are Python, which is widely used in the data science community, and SAS, which is generally applied in more specific areas. The purpose of this paper is to overview and compare the features of Seaborn, a Python library based on matplotlib, and the SAS SGPLOT procedure, and to demonstrate their strengths and weaknesses, attractiveness, versatility, and comprehensibility.


DV-056 : Rita: Automated Transformations, Normality Testing, and Reporting
Daniel Mattei, Tradecraft Clinical Research
Tues, 9:30 AM - 9:50 AM, Location: Lone Star B

R is an open-source programming language that allows for highly customizable analyses and visualization capabilities within a more traditional computing environment. Statistical programmers within the clinical trials industry have taken advantage of its flexibility to create specialized packages for the creation of SDTM and ADaM data structures, tables, figures, and listings (TFLs), as well as import/export capabilities to interface with SAS datasets and associated metadata. More general packages aimed at users transitioning from SAS, providing critical functions such as descriptive statistical reports, hypothesis-testing to assess parametric assumptions of normality, and nonlinear transformations are not yet widely available for use on tabular data, as is typically the case when working with clinical data structures. Rita, a software package providing these functions, uses feature-detection algorithms to select for numeric columns (converting the type, if necessary), conduct normality testing on each field, and perform all available transformations on each column, selecting for the best-performing transformation. Rita is presented here to facilitate adoption of the package for transitioning users. This paper provides a tutorial for the features presented above, as well as Rita’s several quality-of-life features, such as plotting capabilities that automatically select the most fitting way to visualize each column, detection and removal of null records, and the ability to customize which plots, normality tests, and transformations are applied if one wishes to designate these settings. Lastly, a Shiny app is provided to demonstrate Rita’s features with data from the CDISC Pilot 01 Study and used to explain core functions for interested readers.


DV-071 : Introduction to Annual Reporting for Beginners!
Akshita Gurram, Seagen
Mamatha Mada, Seagen
Mon, 9:30 AM - 9:50 AM, Location: Lone Star B

In clinical trials, the safety information of the investigational drug plays a critical role. Regular analysis of safety is crucial for the assessment of risk to trial participants and to understand the risk vs. benefit of the medicinal product. Summaries of safety information are submitted to the regulators and other committees at regular intervals to monitor the safety profile of an investigational drug. The Development Safety Update Report (DSUR) is the pre-marketing equivalent of the post-marketing Periodic Benefit-Risk Evaluation Report (PBRER). The DSUR aims to assess risk and any changes in risk since the previous DSUR, while the PBRER aims to present a comprehensive and critical analysis of new or emerging information on the risks of the drug. Both the DSUR and PBRER are produced once every six months, annually or less frequently, based on national or regulatory requirements. These reports evaluate adverse events that occur during the clinical trial, as tracking them is vital in determining if an event is related to the investigational drug. Also, the demographics of the participants, whether participants complete or discontinue the study, why they discontinue the study, and information related to death and exposure are captured in the tables and listings. This paper focuses on the approach to handling multiple studies across an indication and on the details of the outputs that are required for submission. It introduces a flexible reporting system with examples and programming solutions to generate the reports required for the regulatory submission.


DV-073 : SAS® ODS Application of Interactive and Informative Clinical Trial Reports
Luwei Pang, MSD
Aiming Yang, Merck & Co., Inc.
Mon, 10:30 AM - 10:50 AM, Location: Lone Star B

Graphs and charts are used extensively in clinical-trial efficacy and safety analyses. They transform large amounts of information and statistical results into easy-to-understand formats. Visualization methodologies have been developed in different languages and software. For clinical trials, it is desirable to use the analysis datasets directly in real time, without transferring data from one language/system such as SAS to another such as R. It is also desirable to produce visualizations programmatically rather than transferring data to an Excel spreadsheet and redrawing them manually. Meanwhile, compared with traditional graphs and charts, more interactive and informative visualization outputs are in demand to enhance reviewing efficiency and data traceability. This paper provides approaches for creating interactive outputs, using time-to-event Kaplan-Meier plots as an example of efficacy analysis in Oncology trials, along with safety analysis charts, by means of the SAS® ODS Graphics data tip technique. The technique displays “details-on-demand”: as the cursor hovers over the points in the graphs, more detailed information is displayed. We also provide a dashboard application of SAS® ODS Layout to arrange the text, graphs, and tables side by side on the same page for efficient reviewing. Both types of SAS ODS code are presented and explained.


DV-074 : Metadata Driven Approach for Creation of Clinical Trial Figures
Jeremy Gratt, Modular Informatics LLC
Qiuhong Jia, Seattle Genetics
Girish Kankipati, Seagen Inc
Mon, 2:30 PM - 2:50 PM, Location: Lone Star B

In clinical studies, a variety of figures is required to visually analyze the responses to study drug. Creation of these figures presents many programming challenges. For example, SAS® provides multiple figure creation methods and this makes it difficult for programmers to learn. Modern figures used for clinical trials often overlay multiple kinds of data within a single graph, and this complicates the input data structure used to generate the figures. Within SAS®, different kinds of figure elements (lines, bars, markers) have different rules for use, and these differences can also be tricky to learn. Creation of legends that combine and describe the various figure elements can present challenges that are not always natively handled within SAS®. Finally, clinical study teams may ask for study-specific customizations to describe specific trial designs. In the paper we describe these various challenges to figure generation and present our solution for creation of standard macros for spider plots, waterfall plots and swimmer plots that can be used across clinical programs. This solution includes a novel use of metadata provided by programmers at the study level that allows for customization of any number of figure elements (bars, lines, markers), flexibility towards handling of input data of varying structures, and custom creation of complex legends. We also discuss how use of metadata allows for standardization of the figure elements across studies and clinical programs.


DV-082 : Building Dashboards for Data Review and Visualization using R Shiny
John saida Shaik, Seagen Inc.
Sreeram Kundoor, Kite Pharma Inc.
Mon, 11:00 AM - 11:20 AM, Location: Lone Star B

In the pharmaceutical and biotech industry, SAS® is a widely used software for clinical trial data analysis and to prepare CDISC-compliant data sets for FDA submission. Though SAS® is a robust and feature-rich programming language, access to features that enable development of interactive data exploration tools may be beyond the scope of the license of some companies. In scenarios where ad-hoc or exploratory analyses are desired with quick access to data, R, as an open-source software, is suitable and freely available with large community support behind it. Such quick access by cross-functional teams can be especially beneficial for efficacy data in single-arm clinical trials, for which interactive dashboards can easily be built using the Shiny package in R. These dashboards, also known as Shiny web apps, can be tailored to the specific needs of the study team. In this paper, a brief introduction to R programming and building dashboards using the R Shiny package to explore and visualize efficacy data in oncology clinical trials – including summary statistics and graphs such as waterfall, KM, and box plots – will be discussed in more detail.


DV-088 : Generating Forest and Kaplan Meier graphs for Regulatory Submission: Comparison of SAS and R
Girija Javvaji, Covance Inc
Sivasankar Konda, Covance Inc
Mon, 11:30 AM - 11:50 AM, Location: Lone Star B

Statistical graphs are an essential component of clinical trials and are included in the submission package to the regulatory agencies. Currently most graphs are generated using SAS, and in some instances open-source technologies like R and Python are being used in CROs and pharma. In this paper we discuss some of the prominent clinical graphs, such as the Forest plot and Kaplan-Meier plot, using SAS and R. We highlight some of the differences and features, and compare side-by-side outputs generated in both languages. When we have the knowledge and software available for both SAS and R, we have the luxury of choosing between them based on the graphs that we need to plot. With the upgraded versions of Graph Template Language (GTL), SAS is becoming more user friendly for producing the desired high-quality clinical graphs, while R is open source and easy to adapt. The ggplot2 package from R is a robust library with features to plot most clinical graphs. Additionally, there are many dedicated R packages to create specific graphs, such as ‘survival’ for Kaplan-Meier and ‘forestplot’ for Forest plots, and there is scope to create more advanced customized plots in the future. We use GTL for SAS, which might be embedded in macros, and for R we cover all the graphs using the ggplot2 package. We will compare outputs on three factors: Time, Quality and Resources.


DV-089 : Generating Bar Chart and Multipanel graphs using SAS vs R for Regulatory Submission
Girija Javvaji, Covance Inc
Alistair D'Souza, Covance Inc
Mon, 2:00 PM - 2:20 PM, Location: Lone Star B

In the submission package to the regulatory agencies, graphs play a vital role in interpreting the clinical data in pictorial form. Until recently the most prominently used software for creating graphs was SAS, but given the growing importance of other tools (like R, Python, and Spotfire) in generating graphs, we compare them to understand the advantages and limitations of each. Among these alternatives, R has been adopted in pharmaceutical companies along with the traditional SAS language. In this paper we discuss some of the prominent clinical graphs, such as the Bar chart and Multipanel plot, using both SAS and R. A side-by-side comparison of both graphs in SAS and R, using their respective syntax, is presented, and the key differences and features available in both languages are highlighted. With both SAS and R software available, we have the option of choosing between them based on the graphs that we need to plot. SAS is becoming more user friendly for producing the desired high-quality clinical graphs with the upgraded versions of Graph Template Language (GTL), while R is open-source software and is easy to adapt. The ggplot2 package from R is a robust library with features to plot most clinical graphs. We use GTL for SAS, which might be embedded in macros, and for R we cover all the graphs using the ggplot2 package. We will compare outputs on three factors: Time, Quality and Resources.


DV-113 : A Dynamic Data Visualization Report for Efficacy Endpoints of Early Oncology Study
Hong Zhang, Merck & Co
Yixin Ren, Merck
Huei-Ling Chen, Merck & Co.
Mon, 3:30 PM - 3:50 PM, Location: Lone Star B

In a solid tumor early oncology study, efficacy analysis usually focuses on tumor size change and the best overall response to the drug treatment. Clinicians and statisticians analyze the drug effect based on summarized results and individual patient-level analyses. When they have questions about certain data points, they prefer more detailed individual patient data to better understand the data. Clinical programmers often work with the clinicians and statisticians in this data research effort by providing the requested data. This paper presents a macro tool that creates a dynamic visualization report specifically for solid tumor oncology studies, to avoid repeated programming efforts to answer clinicians’ and statisticians’ requests. The macro tool connects the patient’s solid tumor size, response, and any critical events in the study to produce by-patient-level listing worksheets in Excel file format. The report has been well received by clinicians, statisticians, and programmers because it significantly reduces communication and coding time. This paper will discuss the details of this tool and the key syntax.


DV-120 : R And SAS in Analysis Data Reviewer’s Guide and Data Visualization
Fan Lin, Gilead Sciences
Mon, 4:00 PM - 4:20 PM, Location: Lone Star B

Although SAS has been the preferred programming tool in clinical studies for decades, R has recently been gaining popularity due to its flexibility and advanced graphical capabilities in data visualization. The primary scope of this paper is to compare SAS and R in automating the preparation of Section 7.2, Analysis Output Programs, of the Analysis Data Reviewer’s Guide (ADRG). The secondary scope is the use of R for data visualization. The automation processes using SAS and R are described, and the advantages and disadvantages of each language are summarized. The ease of using R to create a complicated circular plot is also demonstrated.


DV-172 : SAS® PROC GEOCODE and PROC SGMAP: The Perfect Pairing for COVID-19 Analyses
Louise Hadden, Abt Associates Inc.
Tues, 10:30 AM - 10:50 AM, Location: Lone Star B

The new SAS® mapping procedure PROC SGMAP is adding capability with every release. PROC SGMAP was introduced in SAS 9.4M5 as an extension of (ODS) Graphics techniques to render maps and then overlay plots such as text, scatter, or bubble plots. It has contributed a lot of functionality which used to be reserved for SAS/GRAPH users to BASE SAS – including PROC GEOCODE. PROC GEOCODE has been available in SAS/GRAPH since Version 8.2, and recently became available in BASE SAS with a number of other tools in Version 9.4 Maintenance Release M5. SAS provides a link to files required for street level geocoding and more on SAS MAPSONLINE. The ongoing COVID-19 pandemic has produced massive amounts of epidemiological and surveillance data, much of which can be linked to geography as countries, including the United States, grapple with how to address the constantly transmuting contagion. The combination of PROC SGMAP and PROC GEOCODE is well positioned to help researchers address and visualize COVID-19 data. This paper and presentation will walk through the graphic representation of publicly available COVID data sources using PROC SGMAP and PROC GEOCODE.


DV-173 : Copy comments: A Dynamic solution to solve the age-old need
Joseph Cooney, OPKO Health Inc.
Tues, 9:00 AM - 9:20 AM, Location: Lone Star B

Over the years, I have come to know that just about every Data Management team needs Copy Comments functionality. Whether it is achieved through fancy applications, Excel macros, or SAS programs, one thing is always certain: DM needs their comments carried from one output to the next. With nearly 15 years of Clinical Data Programming experience spread across multiple employers and sponsors, I have seen what works and, simply, what does not work. I believe that I have come up with a solution that trumps each of the methods that I have seen used in the past and requires only Microsoft Excel and SAS access with limited programming experience. I have created a SAS Copy Comments macro that, when executed prior to exporting data, will dynamically import a listing’s previous output, compare it to the current final dataset, and copy DM comments. Since this macro is run at the end of each program, and the final dataset is therefore available, PROC CONTENTS can be used to dynamically generate the import statement for the previous output to ensure an exact match in variables and formatting. This ensures that the same records will always compare and, most importantly, that DM comments will be copied. Lastly, since the macro is dynamic, it can be implemented in your automated listings refresh process with zero need for manual manipulation.
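
The import-compare-copy logic described above can be illustrated outside SAS. The following Python sketch is not the author's macro; it only shows the general idea of matching records between a previous and a current listing and carrying the comment across. The function name, the `DM_COMMENT` field, and the sample column names are hypothetical, chosen for illustration.

```python
def copy_comments(previous_rows, current_rows, comment_field="DM_COMMENT"):
    """Carry comments from a previous listing onto matching current records.

    Rows are dictionaries; two rows match when every field other than the
    comment field is identical (the comment field name is an assumption).
    """
    def record_key(row):
        # Identify a record by all of its fields except the comment itself
        return tuple(sorted((k, v) for k, v in row.items() if k != comment_field))

    # Look-up of non-empty comments keyed by record content
    comments = {
        record_key(row): row[comment_field]
        for row in previous_rows
        if row.get(comment_field)
    }
    # New or changed records get an empty comment
    return [
        {**row, comment_field: comments.get(record_key(row), "")}
        for row in current_rows
    ]


previous = [
    {"SUBJID": "001", "AETERM": "Headache", "DM_COMMENT": "Queried site"},
    {"SUBJID": "002", "AETERM": "Nausea", "DM_COMMENT": ""},
]
current = [
    {"SUBJID": "001", "AETERM": "Headache"},
    {"SUBJID": "002", "AETERM": "Nausea"},
    {"SUBJID": "003", "AETERM": "Rash"},
]
refreshed = copy_comments(previous, current)
```

Matching on every non-comment field mirrors the abstract's point that an exact match in variables and formatting is what allows the same records to compare across refreshes.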


DV-181 : Designing and Implementing Reporting Meeting Federal Government Accessibility Standards with SAS®
Louise Hadden, Abt Associates Inc.
Mon, 9:00 AM - 9:20 AM, Location: Lone Star B

SAS® software provides a number of tools with which to build in and enhance accessibility / 508 compliance for data visualization and reporting, including ODS PDF options, ODS HTML5 options, the CALL SLEEP and CALL SOUND routines, and the SAS Graphics Accelerator. As amazing as these tools are, successfully implementing accessible reporting requires planning and design, and some creative use of additional SAS tools. This paper and presentation will demonstrate how to plan for success with accessible visualization design using the above mentioned tools, and more.


DV-185 : GIS Challenges of Cataloging Catastrophes: Serving up GeoWaffles with a Side of Hash Tables to Conquer Big Data Point-in-Polygon Determination and Supplant SAS® PROC GINSIDE
Troy Hughes, Datmesis Analytics
Tues, 8:30 AM - 8:50 AM, Location: Lone Star B

The GINSIDE procedure represents the SAS® solution for point-in-polygon determination—that is, given some point on earth, does it fall inside or outside of one or more bounded regions? Natural disasters typify geospatial data—the coordinates of a lightning strike, the epicenter of an earthquake, or the jagged boundary of an encroaching wildfire—yet observing nature seldom yields more than latitude and longitude coordinates. Thus, when the United States Forestry Service needs to determine in what zip code a fire is burning, or when the United States Geological Survey (USGS) must ascertain the state, county, and city in which an earthquake was centered, a point-in-polygon analysis is inherently required. It determines within what boundaries (e.g., nation, state, county, federal park, tribal lands) the event occurred, and confers boundary attributes (e.g., boundary name, area, population) to that event. Geographic information systems (GIS) that process raw geospatial data can struggle with this time-consuming yet necessary analytic endeavor—the attribution of points to regions. This text demonstrates the tremendous inefficiency of the GINSIDE procedure, and promotes GeoWaffles as a far faster alternative that comprises a mesh of rectangles draped over polygon boundaries. GeoWaffles debuted in the 2013 white paper Winning the War on Terror with Waffles: Maximizing GINSIDE Efficiency for Blue Force Tracking Big Data (Hughes, 2013), and this text represents an in-memory, hash-based refactoring. All examples showcase USGS tremor data as GeoWaffles tastefully blow GINSIDE off the breakfast buffet—processing coordinates more than 25 times faster than the out-of-the-box SAS solution!
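
As a rough illustration of the idea described above (a coarse mesh of rectangles used to narrow down candidate regions before an exact point-in-polygon test), here is a minimal Python sketch. It is not the GeoWaffles implementation and does not use SAS hash objects; it assumes planar coordinates, uses a standard ray-casting test, and all function names and sample regions are hypothetical.

```python
from math import floor


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon,
    given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_grid_index(regions, cell_size):
    """Map each grid cell to the regions whose bounding box overlaps it."""
    index = {}
    for name, polygon in regions.items():
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        for cx in range(floor(min(xs) / cell_size), floor(max(xs) / cell_size) + 1):
            for cy in range(floor(min(ys) / cell_size), floor(max(ys) / cell_size) + 1):
                index.setdefault((cx, cy), []).append(name)
    return index


def locate(x, y, regions, index, cell_size):
    """Return the name of the region containing (x, y), or None."""
    cell = (floor(x / cell_size), floor(y / cell_size))
    # Only regions registered for this cell need the exact test
    for name in index.get(cell, []):
        if point_in_polygon(x, y, regions[name]):
            return name
    return None


regions = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],  # square
    "B": [(5, 0), (9, 0), (5, 4)],          # right triangle
}
index = build_grid_index(regions, cell_size=2)
hit = locate(6, 1, regions, index, cell_size=2)
```

The dictionary look-up plays the role of the in-memory hash: most points are tested against only the few polygons that share their cell rather than against every boundary.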


Hands-On Training

HT-103 : ODS Document & Item Stores: A New Beginning
Bill Coar, Axio, a Cytel Company
Tues, 3:30 PM - 4:50 PM, Location: Brazos

Over the years, there seems to be a constant need to improve processes for creating a single file deliverable from multiple (tens or hundreds?) tables, listings, and figures. However, item stores, ODS Document, and Proc Document are available tools that often go unnoticed. Many options have been presented and thoroughly discussed, but relatively few discuss these techniques that are available with Base SAS. An item store is a SAS library member that consists of pieces of information (i.e., procedure output) that can be accessed independently. With item stores, procedure output can be created at one point in time and accessed at a later point in time. Item stores are created using ODS Document statements and accessed using Proc Document. Before diving into using item stores for combining tables, listings, and figures, we propose heading back to basics with an introduction to item stores, ODS Document, and Proc Document in a more general setting using basic procedures such as Proc Means, Proc Freq, and Proc Univariate. We then take these concepts and extend them to Proc Report, including the use of by-group processing. This Hands-on-Workshop will introduce the user to item stores, ODS Document, and Proc Document. By the end of the workshop, the user will have enough insight to use the technique to obtain a single file containing a set of TLFs. The use of ODS is required in this application using SAS 9.4 in a Windows environment.


HT-186 : Team Code Collaboration in RStudio with Git
Phil Bowsher, RStudio Inc.
Cole Arendt, RStudio PBC
Mon, 8:30 AM - 9:50 AM, Location: Brazos

RStudio will be presenting an overview of team collaboration via Git in RStudio for the R user community at PharmaSUG. This talk will review how to use RStudio to collaborate with your team on managing code in Git. This is a great opportunity to learn about best practices for managing code in a team-based environment where multiple people are working from the same code. No prior knowledge of R/RStudio is needed. This short talk will provide an introduction to the current landscape of versioning as well as recent developments using CI/CD. This presentation will break down steps to follow for clinical statisticians who are new to managing code in a versioning tool.


HT-201 : What’s black and white and sheds all over? The Python Pandas DataFrame, the Open-Source Data Structure Supplanting the SAS Data Set
Troy Hughes, Datmesis Analytics
Mon, 10:30 AM - 11:50 AM, Location: Brazos

Tired of paying for your SAS license and curious about exploring the most popular, freely downloadable open-source data analytic software?! This HANDS-ON training will introduce the Python “Pandas” library, the predominant data manipulation module within the Python language, and the “DataFrame” data structure. All examples will first be shown in BASE SAS, after which a functionally equivalent Python solution will be demonstrated and explored. Students desiring to run examples should install Python (version 3.10.0 or higher) and the Pandas library (version 1.4.1 or higher). No previous Python experience is required to attend!


HT-202 : Learn to Visualize Market Analysis Data Using SAS® PROC GEOCODE, PROC GINSIDE, and Prescriber Characteristics
Louise Hadden, Abt Associates Inc.
Mon, 1:00 PM - 1:50 PM, Location: Brazos

This “Hands On” workshop will walk attendees through the process of obtaining geographic data at various levels, address information, population characteristics, and prescriber characteristics for the state of Texas, and using BASE SAS procedures such as PROC GEOCODE and PROC GINSIDE to pinpoint geographies, perform market analyses, and produce data visualizations. Public Use Data files will be used to demonstrate concepts that are directly transferable to the pharmaceutical industry.


HT-203 : Following Your Footsteps: Maintaining Traceability in ADaM Datasets
Nancy Brucken, IQVIA
Karl Miller, IQVIA
Mon, 2:00 PM - 2:50 PM, Location: Brazos

Traceability is one of the fundamental principles of ADaM. The ability to follow a data point from the TFLs back to its location in an ADaM dataset, and then back to its origin in SDTM or other ADaM datasets is one of the requirements for a dataset to be considered conformant to the ADaM model. This workshop will cover several methods for incorporating traceability in your ADaM datasets using standard ADaM variables and metadata.


HT-204 : How to use Git with your SAS projects
Chris Hemedinger, SAS
Tues, 8:30 AM - 9:50 AM, Location: Brazos

Are you asked to use source management and DevOps tools to manage your work and integrate with production? Git – the most popular source-management system – is often central to this integration. In this Hands-on Tutorial, we’ll explore how to create a Git repository and add your SAS content. We will learn about the Git integration methods within SAS tools to manage and update your SAS content in a collaborative way. We will practice the use of Git commands, Git interfaces in SAS Studio and SAS Enterprise Guide, and Git functions within the SAS programming language.


HT-205 : Everything is Better with Friends: Using SAS in Python Applications with SASPy and Open-Source Tooling (Getting Started)
Matthew Slaughter, Kaiser Permanente Center for Health Research
Isaiah Lankham, University of California Office of the President
Mon, 3:30 PM - 4:50 PM, Location: Brazos

Interested in learning Python? How about learning to make Python and SAS work together? In this hands-on training, we'll practice writing Python scripts using Google Colab (https://colab.research.google.com/), which is a free online implementation of JupyterLab, and we'll link to SAS OnDemand for Academics (https://welcome.oda.sas.com/) to access the SAS analytical engine. We'll also learn to use the popular pandas package, whose DataFrame objects are the Python equivalent of SAS datasets. Along the way, we'll work through common data-analysis tasks using both regular SAS code and Python together with the SASPy package, highlighting important tradeoffs for each and emphasizing the value of being a polyglot programmer fluent in multiple languages. This will include a beginner-friendly overview of Python syntax and data structures. SASPy is a module developed by SAS Institute for the Python programming language, providing an alternative interface to the SAS system. With SASPy, SAS procedures can be executed in Python scripts using Python syntax, and data can be transferred between SAS datasets and their Python DataFrame equivalent. This allows SAS programmers to take advantage of the flexibility of Python for flow control, and Python programmers can incorporate SAS analytics into their scripts and applications. This class is aimed at SAS programmers of all skill levels, including those with no prior experience using Python or JupyterLab. Accounts for Google and SAS OnDemand for Academics will be needed to interact with code examples (see the instructions at https://tinyurl.com/SASPySetupTest).


HT-206 : Getting Started Using SAS Proc SGPLOT for Clinical Graphics
Josh Horstman, Nested Loop Consulting
Tues, 10:30 AM - 11:50 AM, Location: Brazos

Do you want to create highly-customizable, publication-ready clinical graphics in just minutes using SAS®? This hands-on training introduces the SGPLOT procedure, which is part of ODS Statistical Graphics, included in Base SAS®. We start with the basic building blocks and work through examples of several different types of graphs commonly used in clinical reporting as well as simple ways to customize each one.


HT-207 : Using Tplyr to Create Clinical Tables
Mike Stackhouse, Atorus Research
Jessica Higgins, Atorus
Tues, 1:00 PM - 2:20 PM, Location: Brazos

Getting to know the world of R can be difficult. With the huge variety of open source packages available on CRAN, it can be challenging to know the right tools to get the job done. This workshop will help make things clearer by giving you a hands-on walkthrough of using the R package Tplyr to create clinical safety summaries. By the end of the 90 minutes, you will have R code in hand to create demographics, adverse events, and lab summaries, and a basic understanding of how Tplyr can support these tables for you.


Leadership Skills

LS-029 : Developing and running an in-house SAS Users Group
Stephen Sloan, Accenture
Mon, 1:30 PM - 1:50 PM, Location: Lone Star C

Starting an in-house SAS® Users Group can pose a daunting challenge in a large worldwide organization. However, once formed, the SAS Users Group can also provide great value to the enterprise. SAS users (and those interested in becoming SAS users) are often scattered and unaware of the reservoirs of talent and innovation within their own organization. Sometimes they are Subject Matter Experts (SMEs); other times they are new to SAS but provide the only available expertise for a specific project in a specific location. In addition, there is a steady stream of new products and upgrades coming from SAS Institute and the users may be unaware of them or not have the time to explore and implement them, even when the products and upgrades have been thoroughly vetted and are already in use in other parts of the organization. There are often local artifacts like macros and dashboards that have been developed in corners of the enterprise that could be very useful to others so that they don’t have to “reinvent the wheel”.


LS-033 : Advanced Project Management beyond Microsoft Project, Using PROC CPM, PROC GANTT, and Advanced Graphics
Stephen Sloan, Accenture
Lindsey Puryear, SAS Institute
Mon, 2:30 PM - 2:50 PM, Location: Lone Star C

The Challenge: Instead of managing a single project, we had to craft a solution that would manage hundreds of higher- and lower-priority projects, taking place in different locations and different parts of a large organization, all competing for common pools of resources. Our Solution: Develop a Project Optimizer tool using the CPM procedure to schedule the projects and using the GANTT procedure to display the resulting schedule. The Project Optimizer harnesses the power of the delay analysis feature of PROC CPM and its coordination with PROC GANTT to resolve resource conflicts, improve throughput, clearly illustrate results and improvements, and more efficiently take advantage of available people and equipment.


LS-047 : Leading into the Unknown? Yes, we need Change Management Leadership
Kevin Lee, Genpact
Mon, 2:00 PM - 2:20 PM, Location: Lone Star C

This paper is written for those who want to lead new changes in the biometric department. Currently, the biometric department is going through big changes from traditional SAS programming to open-source programming, cloud computing, data science, and even machine learning, and managing and leading those changes becomes critical for leaders so that the changes can be achieved under budget and on schedule. Change management comprises the activities and processes that support the success of changes in the organization, and it is considered a leadership competency for enabling change within the organization. More importantly, the success rate of changes directly correlates with the leaders' change management. Leaders with excellent change management are six times more likely to succeed than those with poor change management. The paper will discuss major obstacles that leaders will face, such as programmer and middle-management resistance and insufficient support. It will also discuss success factors that leaders can implement in change management, such as detailed planning, dedicated resources and funds, experience in change, participation of programmers, frequent transparent communication, and clear goals. Finally, the paper will show examples of how change management effectively led to the success of an open-source programming migration from SAS for a department of more than 150 SAS programmers.


LS-058 : Visualization of Programming Activities and Deliveries for Multiple Clinical Studies
Zhouming (Victor) Sun, AstraZeneca
Mon, 8:30 AM - 8:50 AM, Location: Lone Star C

Managing programming activities is very challenging, especially when multiple studies are involved with competing deliveries for each study. For a project programming lead, ensuring that all programming deliveries are on track with high quality and optimized resource usage is critical. Therefore, efficient planning and monitoring are key to successful programming management. This paper presents one comprehensive visualization of the programming activities and deliveries for multiple studies, including key delivery timelines and major study information for tracking progress in real time. The graph is generated using SAS Graph Template Language (GTL), which enables you to create your own custom graphs or to modify graphs created by SAS analytical procedures. The example presented in this paper illustrates how assignments and deliveries from multiple studies can be dynamically visualized and managed in a powerful manner.


LS-098 : Schoveing Series 5: Your Guide to a Quality NO
Priscilla Gathoni, AstraZeneca, Statistical Programming
Mon, 9:30 AM - 9:50 AM, Location: Lone Star C

Have you ever said NO and felt guilty, ashamed, anxious, frustrated, or angry afterward? Or better still, have you said NO and walked away feeling calm, poised, balanced, and centered? The two-letter word NO can pose both danger and authority when proclaimed, and that is why understanding the various ways of saying NO is crucial. Leadership, management, coaching, mentoring, facilitating, parenting, marriage, teaching, recruiting, hiring, firing, and many other roles require the mastery of the word NO. The most valuable assets as you progress in your career and life are your time, the ability to think, doing things in the order of their importance, and relationship building. Therefore, protecting these valuable commodities requires a strategic and tactical delivery method of saying NO. Your time on earth is too precious to be wasted with the inability to say NO or have failed relationships. This paper will help you unlock seven ways of saying NO by deprogramming your habitual thinking patterns to achieve greater self-confidence, control, trust, and create healthy relationships. The art of saying NO in a quality way allows you to surround yourself with positive-minded people and eliminate toxic people, jobs, and relationships.


LS-101 : How Clear Communication & Expectations Improve the CRO – Sponsor Relationship: the CRO Perspective
Corey Evans, LLX Solutions, LLC
Samantha Kennedy, LLX Solutions, LLC
Tues, 9:30 AM - 9:50 AM, Location: Lone Star C

The Sponsor – CRO relationship is a critical one in the life cycle of clinical trials. Fostering a positive relationship through clear communication as well as clear expectations and timelines is critical in maintaining harmony. A relationship that lacks any one of these key characteristics, whether it is communication, expectations, or a clear timeline, can quickly lead to tension and frustration on both sides. Minimizing frustration and tension as much as possible is paramount in preserving a good relationship, which may reduce study team turnover, and will hopefully lead to a long-term relationship between the Sponsor and CRO.


LS-109 : Diversity, Equity, and Inclusion in the Workplace: Challenges and Opportunities
Eunice Ndungu, Merck & Co.
Rohit Alluri, Merck & Co.
Radha Railkar, Merck & Co.
Wed, 8:00 AM - 9:50 AM, Location: Lone Star B-C

The objective of this panel discussion is to start a dialogue on the topic of Diversity and Inclusion (D&I) in the workplace. Advancing workplace diversity is more important today than ever before. Benefits of a diverse and inclusive workplace include a deeper trust and commitment from employees, respect among employees of different backgrounds, and the ability to integrate different perspectives to drive innovation. The panel members will provide their thoughts on many aspects of D&I including what it means to them, recognizing and reducing unconscious bias, what steps should be taken to promote the understanding of D&I in the workplace, mentorship of students from under-represented groups to promote their understanding of careers in programming, data science and statistics, and how to recruit and retain a diverse workforce.


LS-136 : Exciting Opportunities for Fresher
Rajinder Kumar, Novartis Healthcare Private Limited
Tues, 8:30 AM - 8:50 AM, Location: Lone Star C

“Should we hire a fresher or an experienced associate?” is the corporate version of the chicken-or-egg question. Everyone wants experience in his/her team. Ironically, no one wants to groom a fresher into an experienced resource. The feeling that experience brings a richness of learnt best practices and better execution strategies keeps us blind to the many positive facets of fresh energy that new entrants can add to any industry. Being lowest in the hierarchy, with no earned badges to lose, gives freshers an innovative edge and a higher risk appetite. This keeps their minds free of fear, more open to new ideas, and able to think outside the box. We are living in a VUCA (Volatile, Uncertain, Complex and Ambiguous) world. Most of our experienced associates are not very comfortable with the VUCA world. They have evolved into a particular way of working which was part of their experience journey. While there is no doubt about their capability to learn the new way, they will always be behind the current generation, which has lived this way from the start. Freshers are still free to try different things before specializing in a particular domain. They can be groomed as per current needs with positive and inclusive mentoring and growth opportunities. The perks of learning and growth will always attract their loyalty, and the team will stay with you longer. Therefore, a balanced team needs, and always will need, an adequate mix of experienced associates and freshers.


LS-137 : Empower Your Programmers: Get the Most from Your Programming Team
Carol Matthews, Advance Research Associates, Inc.
Mon, 4:00 PM - 4:20 PM, Location: Lone Star C

Experts agree that empowering your workforce can provide a path to increased productivity, higher quality and greater employee retention. While the concept of empowerment seems like an obviously good idea, too often leaders behave in ways that do just the opposite. It is important for leaders to understand what empowerment really means, how to foster an environment that promotes employee engagement, and how that can be translated into a happier, more competent and more productive workforce. Particularly in the highly regulated technical world in which pharmaceutical programmers work, it can be challenging for leaders to trust that their staff will make effective decisions on their own. We will discuss what it means to empower your programming staff and practical ways to do so while reducing risk and realizing the benefits to be gained.


LS-141 : Rethinking Programming Assignments
Scott Burroughs, PAREXEL International
Tues, 9:00 AM - 9:20 AM, Location: Lone Star C

Programmers working on a project team in the pharmaceutical industry can reside in all regions of the world, often with little or no work hours overlapping. What happens when deadlines are approaching with much work to be done and the production programmer for a data set or display and the QC programmer work in different regions of the world? Often it means working odd and/or long/extra hours so questions can be answered immediately and more back-and-forth production and QC runs can be made at the same time to get your tasks done on time. But does it have to be this way? Can we re-think this process so people aren’t working so many odd/extra hours to get things done? This paper will examine potential fixes, including findings from an experiment trying to see what works and what doesn’t.


LS-147 : Agile innovation – an adaptive approach to transformation in Clinical Research Organizations
Aman Bahl, Syneos Health
Hrideep Antony, Syneos Health USA
Mon, 9:00 AM - 9:20 AM, Location: Lone Star C

Healthcare organizations today need to be agile in the age of digital transformation. Creativity, innovation, and sustainability are some of the key skills that organizations need in order to be continually adaptive. Agile helps an organization respond and adapt quickly to ever-changing customer needs and evolving research requirements. Agile relies on iterative improvement, encouraging adaptability and quick responses to validated feedback. This process can help integrate design with development and adaptation, making innovation processes significantly faster. In a clinical organization, the need for innovation depends on several factors, such as customer needs, changing roles, shifts in mindset, automation, and new tools and technologies. In this paper, we will discuss some of these Agile innovation areas, with examples of where organizations need to focus to create an agile and adaptive innovation culture. To be agile, organizations need to deeply understand the power of intentional social interactions in facilitating the flow of ideas, information, and insights. Adaptive space creates connections that serve to discover, develop, and diffuse new ideas into and across an organization. We can provide useful innovative solutions once we understand the needs of the customer. This process is facilitated by an agile approach of providing several prototype iterations with frequent feedback and revisions. This paper will also provide detailed insights into this agile iterative process. We will also discuss a few areas where clinical organizations could use an adaptive approach to drive innovation, along with barriers and challenges to adoption.


LS-209 : In the world of decentralized trials, do we still need to do double programming?
David D’Attilio, Sierra Data Solutions
Nithiya Ananthakrishnan, Algorics
Praveen Garg, AstraZeneca
Priscilla Gathoni, AstraZeneca, Statistical Programming
Dilip Raghunathan, Independent
Mon, 11:00 AM - 11:50 AM, Location: Lone Star C

Panel Discussion - Electronic data capture in clinical data management has been transformed by leaps and bounds. If we look back on how data management has evolved, we started with scanned paper-based forms, then data was entered twice (double data entry), then we transitioned to electronic data capture with query reconciliations, and finally to direct data entry using eSource and EMR integrations. However, in clinical data analysis, we have continued to carry out double programming in SAS for FDA submissions. While this is a time-tested method, it also adds inefficiency in a market where it is already difficult to hire good-quality statistical programmers and biostatisticians to carry out the tasks. Can our industry adopt automation, computer system validation, and risk-based submission approaches in order to dramatically increase efficiencies?


Medical Devices

MD-140 : Digital Health Technology in Clinical Trials: Opportunities, Challenges and Biometrics Future Re-imagined
Kalyan Chakravarthy Buddavarapu, AstraZeneca
Sherry Ye, AstraZeneca
Vijay Vaidya, AstraZeneca
Tues, 9:30 AM - 9:50 AM, Location: Lone Star A

There has been a significant increase in clinical trials using Digital Health (DH) technologies. It is estimated that by 2025, 70% of clinical trials will incorporate DH technology in one form or another. The COVID-19 pandemic severely impacted clinical trials, affecting subject participation, recruitment, and supply chains; it also accelerated adoption of DH technologies to protect patient safety and enable clinical trial continuity with minimal disruption. FDA has formed the Digital Health Center of Excellence within the Center for Devices and Radiological Health (CDRH) to provide technological advice and oversight for the use of DH technologies in clinical trials. With accumulating evidence, DH technologies will immensely benefit clinical trials through decentralized and hybrid designs with a patient-centric focus. There are proportionate benefits for society, healthcare systems, health care providers, insurance companies, and the pharmaceutical industry: site-less trials, availability of more follow-up data, patient retention, generation and supplementation of real-world evidence, and operational efficiency in the conduct of clinical trials. The rapid evolution of DH technologies makes it challenging to keep pace with the changing landscape and to evaluate these technologies in time to be of use in clinical trials. The digital access divide also poses a challenge: these trials may be out of reach for disadvantaged groups who lack internet access and for older populations unable to use digital technology. This paper examines trends, benefits, and challenges of using DH technologies in clinical trials and presents a case study from a recently conducted late-stage cardiovascular clinical trial at AstraZeneca to highlight the benefits and challenges of using DH technology from a programming perspective.


MD-174 : Measuring Reproducibility and Repeatability of an AI-based Quantitative Clinical Decision Support Tool Having a Medical Decision Point
Douglas Milikien, Accudata Solutions, Inc.
Tues, 10:30 AM - 10:50 AM, Location: Lone Star A

Artificial Intelligence and machine-learning based methods have led to a rapid expansion of software products used in medical diagnostics. These tools are most frequently intended not directly for diagnosis, but as supporting information for the clinician to arrive at a diagnosis, and are therefore referred to as Clinical Decision Support tools. Methods for quantifying the measurement agreement of these software tools with a gold standard measurement are well known. What is less well known are methods for quantifying the reproducibility and repeatability of quantitative assessments when those assessments involve measurement variation due to (1) case difficulty, (2) operator skill level or judgment, and (3) stability over time. Although generic guidelines exist (CLSI EP05-A3) for designing and measuring precision experiments, these guidelines assume that there are multiple replicates for each combination of experimental conditions. The practicalities of using these machine-learning platforms often prevent the collection of more than one replicate per experimental condition. This paper illustrates a method for designing and powering such reproducibility and repeatability studies through simulations and discusses how an acceptance criterion was determined. Furthermore, this paper illustrates visual methods for exploring the contributions of Case, Operator, and Time to measurement variability and estimates Variance Components and their standard errors using a Mixed Model. Of particular interest are values in the neighborhood of a medical decision point and the classification of cases by the software as positive or negative based on that medical decision point. The repeatability of those binary classifications will be examined as well.


Quick Tips

QT-005 : You Don't Know Where That's Been! Pitfalls of Cutting-and-Pasting Directly into the Programming Work Flow
Dave Hall, Quality Data Services
Mon, 8:30 AM - 8:40 AM, Location: Room 301-302

It is sometimes tempting to simply use a mouse to cut a value from a source and paste it into a SAS program, a specification, or any other element involved in the workflow of a project. Alternatives such as typing long lists or a wall of text, and implementing safeguards such as an extra programming step to detect and filter out problematic characters, are exactly the type of time-consuming activity that one may be trying to avoid, in favor of a few quick mouse clicks. But this method can be a risky proposition. This paper will describe such risks and their ramifications. Cutting and pasting this way, with no filter or precautions, is like finding a piece of candy on the street. Although it might look tasty, there’s no way to know where it came from or what’s in it. You don’t know where it’s been!


QT-008 : Programmed Value-Level Metadata for Define XML 2.0
Abhinav Srivastva, Exelixis Inc
Mon, 8:45 AM - 8:55 AM, Location: Room 301-302

Value-level metadata in Define-XML 2.0 has been greatly enhanced to list a unique set of values (typically for a result variable) in combination with one or more categorical parameters of interest using a WHERE subset clause. A given slice of data using a WHERE subset clause can also be based on multiple datasets to create complex subsets as per the intent of the sponsor. The paper introduces a SAS® macro that helps create value-level metadata in XML format, which can be used as part of the larger Define-XML file alongside other components such as Controlled Terminology (CT), Comments, etc. that go with the Define document.


QT-016 : How Simple Table of Content can make CSR Table, Listings, and Figures Easily Accessible?
Yogesh Pande, Merck Inc.
Mon, 9:00 AM - 9:10 AM, Location: Room 301-302

A Clinical Study Report (CSR) contains 100+ Tables, Listings, and Figures (TLFs). If a study SAS® programmer programmatically creates a Table of Contents (TOC) with titles and page numbers and includes all displays in the same document, the file becomes quite large, and navigating from one display to another becomes slow. The TLF folder can also get crowded with TLF files, making it difficult to know which filename belongs to which title. To overcome these challenges, this paper introduces a macro that generates a TOC containing the TLF titles and makes each display easily accessible via a hyperlink.


QT-017 : Standardizing Procedures for Generating Dose Proportionality Figures to Improve Programming Efficiency
Jianli Ping, Gilead Sciences Inc
Krishna Sivakumar, Gilead Sciences Inc
Mon, 9:15 AM - 9:25 AM, Location: Room 301-302

Dose proportionality figures with selected PK parameters provide an intuitive linearity assessment of PK parameters against dose levels for dose-escalation clinical trials. The nature of common PK parameters and dose amounts usually favors logarithmic scale figures. However, the programming procedures for such figures can be time-consuming and tedious. This paper presents generalized procedures to create natural logarithmic scale dose proportionality figures that include observed scatter plots, a reference line, a regression line (with prediction intervals), and legend annotation through macro calls using SAS. The described procedures can be applied to different studies by adjusting the macro parameters, increasing programming efficiency and accuracy. Some challenges and proposed solutions will be discussed.


QT-021 : Using Boolean Functions to Excel at Validation Tracking
Noory Kim, SDC
Mon, 9:30 AM - 9:40 AM, Location: Room 301-302

Our statistical programming teams use Excel to keep track of the completion and validation of project outputs. Large spreadsheets can provide detailed information on individual outputs but not a quick summary of overall progress. How can we enable all team members to be more mindful of the overall progress and thus take better ownership of meeting project deadlines? This paper will show how to combine Boolean formulas in Excel to count metrics such as (1) the number of topline outputs that passed validation and (2) the number of outputs that have passed validation on or after the primary completion date. The reader can use these formulas to create a dashboard in a separate tab of the validation tracker. This method does not require VBA or SAS, making it relatively simple to implement. By including the dashboard in the validation tracker itself, the reader can provide an easy way for all programmers working on a project to check its overall progress.
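The counting idea described above can be sketched outside Excel as well. The following is a minimal Python analogue, not the paper's Excel formulas: each condition evaluates to true or false per tracker row, and rows are counted only when every condition holds. The tracker rows, column layout, and cutoff date are hypothetical.

```python
from datetime import date

# Hypothetical tracker rows:
# (output name, is topline, validation status, date validation passed)
tracker = [
    ("t_demog", True, "Pass", date(2022, 3, 1)),
    ("t_ae", True, "Fail", None),
    ("l_conmed", False, "Pass", date(2022, 4, 15)),
    ("f_km", True, "Pass", date(2022, 4, 20)),
]

primary_completion = date(2022, 4, 1)  # assumed cutoff date


def count_where(rows, *conditions):
    """Count rows for which every condition is true (a Boolean AND)."""
    return sum(all(cond(row) for cond in conditions) for row in rows)


def is_topline(row):
    return row[1]


def passed(row):
    return row[2] == "Pass"


def passed_on_or_after_cutoff(row):
    return row[3] is not None and row[3] >= primary_completion


topline_passed = count_where(tracker, is_topline, passed)
passed_after_cutoff = count_where(tracker, passed, passed_on_or_after_cutoff)
print(topline_passed, passed_after_cutoff)
```

Each metric is one call that combines simple true/false tests, which is the same shape as multiplying Boolean arrays in a spreadsheet formula.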


QT-091 : Auto Extraction of the Title Description in Table Report
Xingshu Zhu, Merck
Li Ma, Merck
Bo Zheng, Merck
Mon, 9:45 AM - 9:55 AM, Location: Room 301-302

There are times during the completion phase of a study where programmers are asked to provide a full list of all table reports in Rich Text Format and their descriptive titles to help with preparing the Clinical Study Report or publications. When the number of total reports is small, this request is simple and can be quickly completed manually by opening each report and copying the title over. However, when there are hundreds of reports or more, it becomes a daunting task where a manual process is impractical and can dramatically increase the risk of mistakes. This paper shares an approach that utilizes RTF control words to programmatically identify the descriptive titles for a list of table reports one-by-one and outputs the results to an Excel file that contains the report names, titles, and folder location links.


QT-093 : SAS® Studio: Creating, Analyzing, and Reporting Data with Built-in & Custom Tasks
Himanshu Patel, Merck & Co.
Chintan Pandya, Merck & Co.
Mon, 10:30 AM - 10:40 AM, Location: Room 301-302

SAS Enterprise Guide or Display Manager is the default programming tool for SAS users in the clinical trial industry. SAS Studio is a relatively new tool within the SAS environment, with powerful enhancements compared to SAS Enterprise Guide and Display Manager. SAS Studio provides built-in tasks for generating and executing complex statistical methods, models, graphs, and SAS procedures using a simple point-and-click interface. SAS Studio also allows programmers to develop custom tasks for the unique requirements they receive from time to time. This paper helps readers understand some of the predefined point-and-click task features and how to create custom tasks for individual needs with the SAS Studio interface. It can be a beneficial tool for urgent requests, data analysis, and exploration. This paper will be helpful to those considering an alternative programming option to SAS EG or Display Manager.


QT-102 : Getting odds ratio at Preferred Term (PT) level for safety analysis in R
Kamlesh Patel, Rang Technologies
Jigar Patel, Rang Technologies
Mon, 10:45 AM - 10:55 AM, Location: Room 301-302

Safety analysis is a vital part of regulatory review. Various types of adverse event (AE) analyses are performed in industry to review safety by comparing statistical parameters at the preferred term (PT) level between two treatment groups. Hence, calculating various statistical parameters at the preferred term level becomes necessary. We will discuss how to obtain the odds ratio (OR) between two treatments for each preferred term. Calculating odds ratios in R for multiple records at once is tricky. The result can also be used in other safety analyses, such as volcano plot creation. We will present a simple way to calculate odds ratios for hundreds of preferred terms at once in R for various safety analyses.
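As an illustration of the calculation described above, here is a minimal sketch in Python rather than R; it is not the authors' method. For each preferred term it builds a 2x2 table (subjects with and without the event in each arm) and computes OR = (a*d)/(b*c). The records, arm names, and denominators are hypothetical, and adding 0.5 to every cell when a cell is zero is an assumed convention to avoid division by zero.

```python
from collections import defaultdict

# Hypothetical AE records: (subject id, treatment arm, preferred term)
ae_records = [
    ("S01", "DRUG", "Headache"),
    ("S02", "DRUG", "Headache"),
    ("S02", "DRUG", "Nausea"),
    ("S04", "PLACEBO", "Headache"),
    ("S05", "PLACEBO", "Nausea"),
    ("S06", "PLACEBO", "Nausea"),
]
# Assumed number of treated subjects per arm (the denominators).
arm_totals = {"DRUG": 10, "PLACEBO": 10}


def odds_ratios_by_pt(records, totals, test_arm, ref_arm):
    """Return {preferred term: odds ratio of test arm vs reference arm}."""
    subjects = defaultdict(set)  # (pt, arm) -> subjects with the event
    for subj, arm, pt in records:
        subjects[(pt, arm)].add(subj)  # count each subject once per PT

    results = {}
    for pt in sorted({pt for _, _, pt in records}):
        a = len(subjects[(pt, test_arm)])  # test arm, with event
        b = totals[test_arm] - a           # test arm, without event
        c = len(subjects[(pt, ref_arm)])   # reference arm, with event
        d = totals[ref_arm] - c            # reference arm, without event
        if 0 in (a, b, c, d):
            # Assumed convention: add 0.5 to every cell when any cell is zero.
            a, b, c, d = (x + 0.5 for x in (a, b, c, d))
        results[pt] = (a * d) / (b * c)
    return results


or_by_pt = odds_ratios_by_pt(ae_records, arm_totals, "DRUG", "PLACEBO")
for pt, value in or_by_pt.items():
    print(pt, round(value, 3))
```

All preferred terms are handled in a single pass, so the same function scales from two terms to hundreds.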


QT-114 : Avoid Common Mistakes in Preparing the BIMO Deliverables
Chintan Pandya, Merck & Co.
Himanshu Patel, Merck & Co.
Mon, 11:15 AM - 11:25 AM, Location: Room 301-302

The purpose of the BIMO package is to verify the integrity of the data submitted and confirm that all regulations for clinical trials are being met. The data package is required for pivotal studies and should include information to support your application’s safety and efficacy endpoints. It is important to pay special attention when creating a BIMO package and to avoid making mistakes. This paper will discuss common mistakes we encounter and how to avoid them in order to create a good-quality summary-level clinical site (CLINSITE) dataset, subject-level line listings by site, and data definition (define.xml). The paper also provides tips on cross-checking the efficacy and safety counts against CSR counts.


QT-135 : Data Masking
Sumit Pratap Pradhan, Syneos Health
Navneet Agnihotri, Syneos Health
Rachel Brown, Syneos Health
Mon, 1:30 PM - 1:40 PM, Location: Room 301-302

When clinical trials are conducted in a double-blinded manner and continuous analysis is needed by both blinded and unblinded teams, data should be handled with special attention. Ideally, unblinded data should not be seen by the blinded team until Database Lock is completed, but the blinded team still needs to analyze data to make important decisions for the study. This is where data masking comes in: it allows data to be shared with the blinded team without violating any rule. In the process of data masking, the original subject Id of a patient is replaced by a dummy Id. Since the study is ongoing, analysis must be performed on individual data cuts. Another important requirement is to keep the assigned dummy Id consistent across all data cuts, which is particularly helpful when comparing different data cuts.

Example: suppose the 1st data cut has 3 subjects and the masking scheme below is applied.

Subject Id   Dummy Id
101-33       DUM-01
101-45       DUM-02
101-75       DUM-03

Suppose the 2nd data cut has 4 subjects (1 additional subject, last row). Only the new subject is assigned a new dummy Id:

Subject Id   Dummy Id
101-33       DUM-01
101-45       DUM-02
101-75       DUM-03
102-65       DUM-04

This paper explains in detail how to achieve this result.
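The masking scheme described above could be sketched as follows. This is a minimal Python illustration, not the paper's SAS implementation: the function name `extend_masking` and the `DUM-` prefix handling are assumptions, and a real process would also need to store the mapping where only the unblinded team can read it.

```python
def extend_masking(existing_map, subject_ids, prefix="DUM-"):
    """Return a mapping in which known subjects keep their dummy Id
    and subjects not seen before receive the next unused number."""
    mapping = dict(existing_map)  # never modify the stored map in place
    used = [int(dummy[len(prefix):]) for dummy in mapping.values()]
    next_num = max(used, default=0) + 1
    for subj in sorted(subject_ids):
        if subj not in mapping:
            mapping[subj] = f"{prefix}{next_num:02d}"
            next_num += 1
    return mapping


# 1st data cut: three subjects, no previous mapping.
cut1_map = extend_masking({}, ["101-33", "101-45", "101-75"])

# 2nd data cut: one additional subject; earlier assignments are reused.
cut2_map = extend_masking(cut1_map, ["101-33", "101-45", "101-75", "102-65"])

for subject_id, dummy_id in cut2_map.items():
    print(subject_id, dummy_id)
```

Carrying the previous cut's mapping forward is what keeps dummy Ids stable: only subjects absent from that mapping are assigned a new value.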


QT-143 : How to create BARDA Safety report for Periodic review
Mrityunjay Kumar, Ephicacy Lifescience Analytics
Dnyaneshwari Nighot, Ephicacy Lifescience Analytics
Nitish Kumar, Ephicacy Lifescience Analytics
Mon, 1:45 PM - 1:55 PM, Location: Room 301-302

For an ongoing clinical study, we may need to submit a safety report for periodic review to BARDA (Biomedical Advanced Research and Development Authority). It is of utmost importance for statistical programmers to understand the contents of the report and how to organize data within it. Presenting safety data such as the number of enrolled subjects, screen failures, discontinued subjects, adverse events, and serious adverse events per site gives the BARDA committee a detailed and easy-to-understand safety report for reviewing an ongoing study. An illustration of the report contents, with customized Excel output using ODS HTML, may help programmers produce the report without hassle and may serve as a possible solution.


QT-144 : Smart Batch Run your SAS Programs
Wayne Zhong, Accretion Softworks
Mon, 2:00 PM - 2:10 PM, Location: Room 301-302

While batch running your SAS programs, you may need more features than simply running all programs. This paper shows how you can: redirect the log for each program, clean temporary files between runs, summarize errors, and pass environmental macro variables into each executing program.


QT-169 : A Beginner’s Guide to Create Series Plots Using SGPLOT Procedure: From Basic to Amazing
Aakar Shah, Neoleukin Therapeutics
Tracy Sherman, Ephicacy
Mon, 2:15 PM - 2:25 PM, Location: Room 301-302

As a beginner SAS programmer, it can be intimidating to work on graph assignments. SAS ODS Graphics provides an avenue for SAS users to create visually pleasing graphs very quickly. However, ODS Graphics documentation is extensive and cumbersome. This paper can help beginner SAS programmers on their journey from creating a basic series plot using a simple SGPLOT statement to amazing, formatted graphs which can be shared with upper management or in a conference presentation. Knowledge learned from this paper can be further applied to other types of graphs.


QT-177 : Form(at) or Function? A Celebratory Exploration of Encoding and Symbology
Louise Hadden, Abt Associates Inc.
Mon, 2:45 PM - 2:55 PM, Location: Room 301-302

The concept of encoding is built into SAS® software in a number of forms, including PROC FORMAT, which transforms values in variables for reporting and to create new variables. Similarly, symbol tables are built into SAS software, so as to communicate with different platforms and systems. This quick tip demonstrates how to use PROC FORMAT, and by extension PROC FCMP, to create a system that converts user-provided text into Morse Code and then converts that Morse Code word into sounds, all using SAS. This fun exploration is highly informative about the sort sequence used by SAS software on different platforms, and it demonstrates the use of PROC FORMAT, PROC FCMP, and sound generation in SAS.


QT-195 : Excel the Data
Naga Madeti, Statistical Programmer
Mon, 2:30 PM - 2:40 PM, Location: Room 301-302

In pharmaceutical research, converting SAS datasets to Excel and Excel data to SAS datasets is an unavoidable part of processing data. This paper explains coding tips and tricks for using ODS EXCEL with PROC REPORT to generate a well-formatted Excel sheet that can later be exported back to a SAS dataset, if needed, without issues in columns with long text. It also demonstrates generating an Excel workbook with multiple sheets using PROC PRINT. In addition, it describes an approach to finding and resolving special-character issues when merging imported Excel data with a SAS dataset.


QT-199 : Monitoring SDTM Compliance in Data Transfers from CROs
Sunil Gupta, Experis
Mon, 3:30 PM - 3:40 PM, Location: Room 301-302

Typically, CROs transfer SDTMs to sponsors on a regular basis and sponsors accept the new SDTMs. Instead of just blindly accepting SDTMs, it is better to monitor SDTM compliance after each SDTM transfer. The benefits include higher SDTM compliance, higher data quality, and error-free programs downstream. With each data transfer, smarter sponsors proactively build and compare metadata across the specifications and the original and new SDTMs. This method assures more consistency in SDTM data structure as well as in study-level demographic and safety statistics. Sponsors can get an early alert of any major changes in study baselines. This presentation shows how to create standard and custom SDTM metadata as well as methods to identify and track SDTM metadata differences.
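A comparison of metadata between two transfers, as described above, might look like the sketch below. It is a Python illustration under stated assumptions, not the presenter's SAS method: metadata is assumed to be held as a dictionary keyed by (dataset, variable) with attribute dictionaries as values, and the sample attributes are hypothetical.

```python
def diff_metadata(old, new):
    """Compare two metadata snapshots keyed by (dataset, variable).

    Returns (added, removed, changed), where `changed` maps each key to
    {attribute: (old value, new value)} for attributes that differ."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = {}
    for key in sorted(set(old) & set(new)):
        attrs = sorted(set(old[key]) | set(new[key]))
        delta = {
            a: (old[key].get(a), new[key].get(a))
            for a in attrs
            if old[key].get(a) != new[key].get(a)
        }
        if delta:
            changed[key] = delta
    return added, removed, changed


# Hypothetical metadata from the previous and the current transfer.
previous = {
    ("DM", "USUBJID"): {"type": "Char", "length": 20},
    ("DM", "AGE"): {"type": "Num", "length": 8},
    ("AE", "AETERM"): {"type": "Char", "length": 100},
}
current = {
    ("DM", "USUBJID"): {"type": "Char", "length": 30},
    ("DM", "AGE"): {"type": "Num", "length": 8},
    ("DM", "RACE"): {"type": "Char", "length": 40},
}

added, removed, changed = diff_metadata(previous, current)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

The same function can compare a specification against a transfer, since both reduce to the same keyed structure.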


Real World Evidence and Big Data

RW-020 : Understanding Administrative Healthcare Datasets using SAS programming tools.
Jayanth Iyengar, Data Systems Consultants LLC
Tues, 8:30 AM - 9:20 AM, Location: Room 301-302

Changes in the healthcare industry have highlighted the importance of healthcare data. The volume of healthcare data collected by healthcare institutions, such as providers and insurance companies is massive and growing exponentially. SAS programmers need to understand the nuances and complexities of healthcare data structures to perform their responsibilities. There are various types and sources of administrative healthcare data, which include Healthcare Claims (Medicare, Commercial Insurance, & Pharmacy), Hospital Inpatient, and Hospital Outpatient. This training seminar will give attendees an overview and detailed explanation of the different types of healthcare data, and the SAS programming constructs to work with them. The workshop will engage attendees with a series of SAS exercises involving healthcare datasets.


RW-092 : Automation of Variable Name and Label Mapping from Raw Data to Analysis Data Standards
Xingshu Zhu, Merck
Bo Zheng, Merck
Li Ma, Merck
Tues, 9:30 AM - 9:50 AM, Location: Room 301-302

As real-world data gains more relevance in the pharmaceutical industry, it becomes important to have a repeatable and consistent process to map raw variable names and labels into standardized versions for analysis. Raw data files with no standard variable naming and labeling conventions are a common occurrence in non-interventional primary data collection studies. This paper describes two methods for handling the import and variable mapping tasks in SAS. One method leverages the built-in features of SAS PROC IMPORT as a direct and simple approach that allows raw data to be imported and variable names mapped to predefined standards. The second method is more complex and relies on using SAS to extract the relevant metadata from a raw data file to create an Excel mapping template, which makes it easier for non-SAS users to access, collaborate, and offer input during the mapping process.


RW-160 : RWD (OMOP) to SDTM (CDISC): A primer for your ETL journey
Ashwini Yermal Shanbhogue, None
Tues, 11:00 AM - 11:20 AM, Location: Room 301-302

Real-World Data (RWD) is defined as data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources, such as electronic health records (EHRs), medical claims and billing data, data from product and disease registries, and patient-generated data. The increasing availability of this observational data and the evolution of tools to analyze it have piqued the interest of global regulatory agencies in using this data to support regulatory decision making. However, because this data and its collection were not designed for this purpose, it lacks uniform structure and vocabulary, making it difficult and time-consuming to compare or exchange between computer systems. To circumvent this problem, RWD can be put into a common format, or common data model (CDM), with a common representation (terminologies, vocabularies). OMOP (Observational Medical Outcomes Partnership), which is part of Observational Health Data Sciences and Informatics (OHDSI), one of the four largest RWD networks, is one such CDM. To be available for regulatory submissions, RWD in a CDM must then be extracted, transformed, and loaded (ETL) into an FDA-supported data standard such as the Clinical Data Interchange Standards Consortium’s (CDISC’s) Study Data Tabulation Model (SDTM). In this presentation, I will explore the mapping of an open-access RWD dataset in the OMOP CDM to appropriate variables and ontologies in CDISC SDTM, along with any associated challenges. This will enable sponsors to kick-start their ETL journey with a blueprint of the process and equip them to deal with forthcoming challenges.


RW-179 : Generating Real World Evidence On The Likelihood Of Metastatic Cancer In Patients Through Machine Learning In Observational Research: Insights For Prevention
Sherrine Eid, SAS Institute
Samiul Haque, SAS Institute, Inc
Robert Collins, SAS Institute, Inc
Tues, 11:30 AM - 11:50 AM, Location: Room 301-302

Properly identifying comorbidities in cancer patients that increase the likelihood of metastatic disease is critical to preventing disease progression and morbidity. Living with comorbidities can lead to an increased likelihood of negative outcomes in patients who are diagnosed with cancer. This research aims to identify opportunities to prevent or at least affect the likelihood of metastasis. METHODS: Over 2.3 million deidentified cancer patient medical claims records (www.compile.com) were analyzed to assess the likelihood of metastatic cancer. Cohorts were defined as any patient who was diagnosed with an ICD-10 diagnosis of C7.x, C78.x, C79.x, or C80.x (n=2,396,043). The outcome variable was metastatic cancer as defined by Elixhauser. Lasso logistic regression, decision trees, gradient boosting, and random forest were run and compared, adjusting for sex, age, and the top six Elixhauser comorbidity groups (alcohol abuse, congestive heart failure, coagulopathy, anemia, hypertension, and liver disease). Analyses were executed using SAS® Health. RESULTS: Liver disease, hypertension, and coagulopathy, respectively, showed magnitudes ranging from 8.3% to 2.6% in contributing to the likelihood of metastases. Decision tree, random forest, and neural network models (KS Youden 0.0793, 0.009, and 0, respectively) were the least fit models, while the gradient boosting model and logistic regression were the best models (KS Youden 0.112 and 0.167, respectively). CONCLUSIONS: Electronic medical records should identify patients who have a comorbidity of liver disease so they can be monitored more closely for metastatic disease. Intervention is crucial to improving metastasis and mortality outcomes in these patients.


RW-192 : Introduction to Synthetic Control Arm
Venkat Rajagopal, Ephicacy Consulting Group
Syamala Schoemperlen, Ephicacy Consulting Group
Abhinav Jain, Ephicacy Consulting Group
Tues, 1:30 PM - 2:20 PM, Location: Room 301-302

There has been a rapid expansion in the use of non-RCT based evidence in the regulatory approval of treatments. A methodology named Synthetic Control Arm (SCA) has emerged to analyze external control data. SCAs help augment or replace randomized controls in many cases. SCAs provide cross-sponsor regulatory grade historical trial data that can be used for comparative analysis in new trials. This helps accelerate development timelines, reduces costs and increases the probability of trial success. It also helps decrease recruitment and reduces patient burden. Appropriate statistical methodologies need to be used to address the difference between RCT and SCA data. SCA comparisons are based on the complexity associated with matching external data to the study. For a simple comparison, simple mean, median or fixed-effect pooling is enough. Imbalance adjustments are carried out with multivariate regression and propensity scoring. For more complex situations, Bayesian, random forests, and neural networks are used.


Statistics and Analytics

SA-003 : Estimating Differences in Probabilities (Marginal Effects) with Confidence Interval
Jun Ke, Independent Statistician
Kelly Chao, LLX Solutions, LLC
Corey Evans, LLX Solutions, LLC
Mon, 8:30 AM - 8:50 AM, Location: Lone Star A

Fitting logistic models using SAS procedures such as PROC LOGISTIC yields log odds for populations in the data. You can also obtain estimates of odds and odds ratios through options specified in the procedure (or by exponentiating the log odds). But to get the difference in population event probabilities (population means), we need to estimate a nonlinear function of the parameters of the logistic model. Alternatively, you could model the probabilities themselves rather than the log odds. This paper will present the statistical background as well as the modeling process in layman’s terms. SAS Institute has provided a series of macros that can assist with calculating the difference in probabilities; in this paper, however, we will introduce a more robust model that can accommodate the range of scenarios that may arise during a trial.


SA-004 : A SAS macro wrapper for an efficient Deming regression algorithm via PROC IML: The Wicklin method
Natasha Oza, Roche Molecular Systems, Inc.
Ben Wang, BioMarin Pharmaceutical, Inc.
Jesse Canchola, Roche Diagnostics Solutions
Mon, 9:00 AM - 9:20 AM, Location: Lone Star A

Deming regression is one tool in method comparison/correlation studies and is used to compare two or more quantitative measurements from two similar methods that produce like-measurements on the same subject/sample on the linear scale, where both methods can be deemed to have error. Contrast this with linear regression (viz., Y=mX+b+error), where only the dependent variable Y is deemed to have error and the X variable is fixed (i.e., without error; for example, an expected value). At least two SAS macros exist that perform Deming regression (Deal, Pate, and El Rouby in 2009 and Njoya and Hemyari in 2017). However, these macros can lengthen the time to final result, as they both use a macro loop to perform the jackknife or bootstrap. The method proposed by Wicklin (2019) is deemed substantially more efficient, as it uses SAS/IML with explicit formulas to compute the slope and intercept of the Deming regression line. We introduce a SAS macro wrapper that produces camera-ready graphs of the Deming regression line drawn over the scatterplot.
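To make the "explicit formulas" idea concrete, the sketch below computes a Deming fit from the textbook closed-form solution in Python. It is not Wicklin's SAS/IML code or the authors' macro. The parameter `delta` is assumed to be the ratio of the Y error variance to the X error variance (1.0 gives orthogonal regression), and the sample data are hypothetical.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Closed-form Deming regression.

    delta is the assumed ratio of error variances (Y errors / X errors).
    Returns (slope, intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("slope is undefined when the sample covariance is zero")
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = ybar - slope * xbar
    return slope, intercept


# Hypothetical paired measurements from two methods.
method_a = [1.0, 2.0, 3.0, 4.0, 5.0]
method_b = [3.0, 5.0, 7.0, 9.0, 11.0]  # exactly 2*x + 1, so the fit is known
slope, intercept = deming_fit(method_a, method_b)
print(round(slope, 6), round(intercept, 6))
```

Because the slope and intercept come from sample variances and the covariance alone, no iterative fitting is needed for the point estimates; resampling would only be required for interval estimates.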


SA-009 : The Emerging Use of Automation to Address the Challenge of Cross-Table Consistency Checking of Output Used in the Reporting of Clinical Trial Data
Ilan Carmeli, Beaconcure LTD
Keren Mayorov, Beaconcure LTD
Hugh Donovan, Beaconcure LTD
Mon, 9:30 AM - 9:50 AM, Location: Lone Star A

The current gold standard for validation of programs written to summarize data from clinical trials is 100% double programming of all analysis datasets, tables, listings, and figures. This approach, however, does not identify cross-table discrepancies and, we would argue, is not the best method of obtaining the highest quality results. As it is a manual exercise, it is labor intensive and subject to human error. Performing cross-table checks of outputs manually is a common practice among biostatisticians in the pharmaceutical industry. Manually validating multiple tables takes a significant amount of time and resources. This repetitive work can be done by dedicated software, allowing the biostatisticians to focus on the statistical aspects of the study. It also provides a consistent, specified approach, whereas the current checking process is often unspecified and therefore not replicable. An automation solution developed by Beaconcure cross-checks two or more outputs in exactly the same way as figures within and across tables are commonly compared today, but it does so in a comprehensive, consistent, and faster manner. This technology can be used multiple times as data accumulates, identifying programming errors that lead to discrepancies in the output. It can also be used for all reporting, not just tables for Clinical Study Reports: for example, Interim Analyses, Safety Updates, and output for Data Monitoring Committees. At the conference, we will present multiple examples of cross-table checks as well as the automated process for performing the checking.


SA-011 : Validation of Statistical Outputs Using Automation
Ilan Carmeli, Beaconcure LTD
Yoran Bar, Beaconcure LTD
Mon, 11:00 AM - 11:20 AM, Location: Lone Star A

How might the major challenges of statistical outputs validation be relieved? The most comprehensive and effective solution is automation supported by Machine Learning (ML). Both have already been proven across many industries, and they are considered a game-changer in the world of drug development. Applied to statistical outputs validation, ML and automation can be fundamental to rapidly ensuring sound interpretation of clinical data and, therefore, more efficient regulatory submissions. ‘Verify’ is Beaconcure’s automated output validation solution, supported by ML and created specifically for the pharmaceutical industry. It converts various clinical data formats in any layout into a semantic and dynamic database, to which any required segmentation rule, crosscheck, and analysis can be applied. All defined errors and anomalies are identified, with 99.7% accuracy, in a matter of hours, drastically reducing clinical data processing timelines and increasing the quality of the output. How does the technology work? ‘Verify’ validates statistical outputs automatically by applying various algorithms to the processed data. The verification algorithms use the base table processing information to identify groups and sub-groups in the data, with the capability of validating single-table and cross-table content. The system can then flag discrepancies and direct the user to the relevant table for follow-up action and resolution of identified discrepancies. In this presentation we will demonstrate how to configure and set up the software and how to create a new project. We will also present how Verify performs within-table checks, performs cross-table checks, manages the resolution of discrepancies, and maintains an audit trail of all resolutions.


SA-015 : Derivation of Efficacy Endpoints by iRECIST Criteria: A Practical Approach
Ilya Krivelevich, Eisai Inc.
Lei Gao, Eisai Inc.
Richard Kennan, Eisai Inc.
Mon, 10:30 AM - 10:50 AM, Location: Lone Star A

Immune-checkpoint inhibitors represent one of the most important therapy advancements in modern oncology. They are currently used for treatment of multiple malignant diseases, especially at advanced metastatic stages. A challenging aspect of these immunotherapies is that they may show atypical therapy response patterns such as pseudoprogression. In 2017, the RECIST working group published a modified set of response criteria for immunotherapy, iRECIST, based on RECIST 1.1, which was initially developed for cytotoxic therapies and adapted for targeted agents. This document provides rules and examples for deriving the most common endpoints (such as Best Overall Response or Progression-Free Survival Time) for clinical studies according to iRECIST criteria. When a Clinical Study Report is produced, it is developed from Analysis (ADaM) Data Sets created under CDISC guidelines. This article provides recommendations for creating ADaM data sets and deriving the needed endpoints by iRECIST criteria from these ADaM data sets.


SA-090 : Two Methods to Collapse Two Treatment Groups into One Group
Kenneth Liu, Merck & Co., Inc.
Ziqiang Chen, Merck & Co., Inc.
Mon, 11:30 AM - 11:50 AM, Location: Lone Star A

When studies include different vaccine regimens (e.g., 1 Dose, 2 Dose) or multiple doses of the same treatment (e.g., 10 mg, 20 mg), statistics for each regimen or dose are summarized in separate columns in a table. Sometimes, it is of interest to combine the multiple regimens or doses into 1 group. This presentation will discuss 2 methods to combine multiple columns of a table into 1 column. One way is to collapse the regimens or doses into 1 group and analyze the data as if they were 1 group. Another way is to keep the regimens or doses in separate groups and use the ESTIMATE statement in SAS PROC MIXED to combine them. These 2 methods will be compared and discussed with an example and a simulation.


SA-104 : How RANK are your deciles? Using PROC RANK and PROC MEANS to create deciles based on observations and numeric values.
Lisa Mendez, Emerge Solutions Group
Mon, 2:00 PM - 2:20 PM, Location: Lone Star A

In many cases, using PROC RANK to create deciles works sufficiently well, but occasionally it does not meet your needs. PROC RANK uses the number of observations to produce a rank; however, if you need to create ranks (deciles) across observations in a data set based on a measure of a numeric variable (e.g., its sum), then PROC RANK will not work. Instead, you can use PROC MEANS to successfully create ranks and percentiles. This paper will illustrate the basic usage of PROC RANK and how to use PROC MEANS as the alternative. The paper uses Base SAS 9.4 code and a fictional dataset that provides the total number of prescriptions written by providers for a specific year. All levels of SAS users may benefit from the information provided in this paper.
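
To make the distinction concrete, the following Python sketch shows one possible reading of "deciles based on a numeric value": providers are sorted from highest to lowest volume, and each is assigned to a decile according to the cumulative share of the grand total reached, so that each decile covers roughly 10% of all prescriptions rather than 10% of providers. This is an assumption about the intent, not the paper's PROC MEANS code; the function name `sum_based_deciles` and the sample counts are hypothetical.

```python
import math

def sum_based_deciles(values):
    """Assign deciles 1-10 by cumulative share of the total (illustrative sketch).

    Observations are processed from largest to smallest value; decile 1 holds
    the observations that make up the first 10% of the grand total, and so on.
    """
    if not values or any(v < 0 for v in values):
        raise ValueError("values must be a non-empty list of non-negative numbers")
    total = sum(values)
    if total <= 0:
        raise ValueError("the total must be positive")
    # Indices ordered by value, largest first (ties keep their input order)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    deciles = [0] * len(values)
    running = 0
    for i in order:
        running += values[i]
        # Decile reached once this observation's value has been added
        deciles[i] = min(10, max(1, math.ceil(10 * running / total)))
    return deciles

# Hypothetical prescription counts for six providers (total = 100)
counts = [50, 20, 10, 10, 5, 5]
print(sum_based_deciles(counts))
```

With skewed data such as this, a single high-volume provider can span several value-based deciles, so some deciles may be empty. A count-based ranking, by contrast, places the same number of providers in each group regardless of volume.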


SA-118 : Win Ratio Simulation For Power Calculation Made Easy
Lili Li, Inari Medical
Roger Chang, Inari Medical
Benjamin Fine, Inari Medical
Yu-Chen Su, Inari Medical
Mon, 2:30 PM - 2:50 PM, Location: Lone Star A

More often than not, treatment effects cannot be evaluated by a single event. Therefore, composite endpoints have been frequently used for decision-making in clinical trials. However, conventional analysis of composite endpoints still treats each component as separate and equal. With clinical relevance in mind, death, for instance, may be seen as more important than other events such as stroke; thus, a hierarchical ranking of the component endpoints is desired. In 2012, Pocock et al. proposed the win ratio approach for composite endpoint analysis. In this approach, each component of the composite endpoint is ranked by its clinical importance in the study and then analyzed one component at a time, in a hierarchical fashion, between any two subjects. Since then, applications of the win ratio approach have been steadily gaining momentum in clinical trial analysis. While more of those applications are observed at the hypothesis testing stage, fewer are focused on the early study design phase. This paper will present an easy-to-use SAS macro that has the flexibility to customize component event rates and rankings based on clinical relevance. It assists at the study design stage by simulating subject-level data with composite endpoints to derive the win ratio statistics and calculate power for a given sample size.
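
The hierarchical pairwise comparison described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the authors' SAS macro: it assumes every subject's outcome is a tuple of fully observed binary event flags ordered from most to least important (0 = no event, 1 = event), and it ignores the follow-up time and censoring that a real win ratio analysis must handle. The function names `compare_pair` and `win_ratio` are hypothetical.

```python
def compare_pair(treated, control):
    """Compare two subjects component by component, most important first.

    Returns 1 if the treated subject wins, -1 if the treated subject loses,
    and 0 if the pair is tied on every component.
    """
    for t_event, c_event in zip(treated, control):
        if t_event < c_event:   # treated avoided an event the control had
            return 1
        if t_event > c_event:   # treated had an event the control avoided
            return -1
    return 0

def win_ratio(treated_group, control_group):
    """Unmatched win ratio: total wins / total losses over all cross-arm pairs."""
    wins = losses = 0
    for t in treated_group:
        for c in control_group:
            result = compare_pair(t, c)
            if result > 0:
                wins += 1
            elif result < 0:
                losses += 1
    if losses == 0:
        raise ValueError("win ratio is undefined when there are no losses")
    return wins / losses

# Hypothetical subjects: (death, stroke) flags, death ranked first
treated = [(0, 0), (0, 1), (1, 0)]
control = [(0, 1), (1, 1), (1, 0)]
print(win_ratio(treated, control))
```

A power simulation along the lines the abstract describes would generate many such data sets under assumed event rates, apply a test to each, and report the proportion of rejections; that step is omitted here.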


SA-123 : An Expanded Set of SAS Macros for Calculating Confidence Limits and P-values Under Simon’s Two-Stage Design Accounting for Actual and Planned Sample Sizes
Mikhail Melikov, Cytel
Alex Karanevich, EMB Statistical Solutions
Mon, 3:30 PM - 3:50 PM, Location: Lone Star A

Simon’s two-stage designs are popular single-arm, binary-endpoint clinical trials that include a single interim analysis for futility. In the literature, it is widely acknowledged that the inferential statistics calculated (p-values, point estimates, and confidence limits) in Simon’s designs typically do not account for cases where the actual sample size differs from the planned sample size at stage two. Koyama and Chen (2008) provided methods to calculate proper inferential statistics for studies where the planned sample size at stage two is either the same as or different from the actual sample size. Our previously published SAS macro implemented these methods for the case where the planned sample size matched the actual sample size. We acknowledge that the actual sample size for stage two often differs from what was planned. Therefore, to expand the utility of our previous SAS macro for biostatisticians with an intermediate level of statistical skill, we have implemented methods for inference when the actual sample size deviates from the planned one. The expanded SAS program now has macros to calculate inferential statistics with and without deviations in the planned vs. actual sample size at stage two. It also calculates two types of point estimates of the response rate at stage two: the uniformly minimum variance unbiased estimator (UMVUE) discussed by Jung and Kim (2004) and the reasonable point estimate proposed by Koyama and Chen (2008). We do not expect the SAS macro to be dependent on operating system or SAS software version.


SA-142 : Analysis of sample size calculations in clinical trial – errors, pitfalls and conclusions
Igor Goldfarb, Accenture
Ritu Karwal, Accenture
Xiaohua Shu, Accenture
Mon, 4:30 PM - 4:50 PM, Location: Lone Star A

The goal of this work is to increase awareness among investigators, medical writers, and statisticians who develop, plan, and conduct randomized clinical trials (RCTs) of the real importance of the statistical sections of study protocols that describe the sample size calculations for the trial. Every RCT is carefully and thoroughly planned from its very early stage. This plan typically includes the main objectives of the trial, primary and secondary endpoints, the method of collecting the data, the sample size with scientific justification, methods of handling data, and statistical methods and assumptions. This plan is a natural part of the RCT protocol. One of its key aspects is an estimate of the required sample size, that is, the number of subjects to be enrolled in the RCT. Normally, every protocol developed for an upcoming RCT contains a special section describing the statistical considerations and medical justifications for the proposed number of patients/participants in the trial. The authors reviewed a large number of protocols available on the government website clinicaltrials.gov and came to the conclusion that in many cases sample size calculations are still inadequately reported, often erroneous, and based on assumptions that are frequently inaccurate. The paper illustrates how far apart the sample size stated in a protocol can be from the estimate obtained (to verify replicability) using the same assumptions that were carefully described in the protocol sections devoted to the sample size calculation. This situation raises questions about how sample size is calculated in RCTs and should draw more attention to the continuing need for correct sample size estimation.


SA-149 : Simple and Efficient Bootstrap Validation of Predictive Models Using SAS/STAT® Software
Matthew Slaughter, Kaiser Permanente Center for Health Research
Isaiah Lankham, University of California Office of the President
Tues, 8:30 AM - 8:50 AM, Location: Lone Star A

Validation is essential for assessing a predictive model's performance with respect to optimism or overfitting. While traditional sample-splitting techniques like cross validation require us to divide our data between model building and model assessment, bootstrap validation enables us to use the full sample for both. This paper demonstrates a simple method for efficiently calculating bootstrap-corrected measures of predictive model performance using SAS/STAT® procedures. While several SAS® procedures have options for automatic cross validation, bootstrap validation requires a more manual process. Examples focus on logistic regression using the LOGISTIC procedure, but these techniques can be readily extended to other procedures and statistical models.


Strategic Implementation

SI-006 : New Digital Trends and Technologies in Clinical Trials and Clinical Data Management
Srinivasa Rao Mandava, Merck
Mon, 8:30 AM - 8:50 AM, Location: Room 303-304

The internet is a huge repository of information that caters to the diverse needs of its users. Almost 5 billion people around the world now use the internet. The explosion of the internet, the Covid-19 pandemic, and the expansion of social media have brought new digital trends and technological evolution to clinical trial operations and clinical data management, in particular through decentralized clinical trial processes. We discuss some of the latest trends in the clinical trial industry, the challenges of these changing times, and the changing role from data programmer to data scientist. Further, we look forward to help from the industry in proposing a detailed curriculum for the current and future needs of career paths in clinical data operations and management, aimed at quick analysis and quality submission of clinical data to regulatory agencies for faster approvals using these technologies.


SI-014 : Quality Control With External Code
Bill Coar, Axio, a Cytel Company
Mon, 9:00 AM - 9:20 AM, Location: Room 303-304

In the CRO industry, we can create statistical reports in multiple ways: by writing our own source code or by receiving source code from sponsors. In projects where we receive source code, the sponsor provides us with SAS code written and verified on their systems. When there is a need for unblinding prior to the final analysis, such as in supporting a Data Monitoring Committee (DMC), some sponsors seek an external statistical data analysis center (SDAC) to perform this task. The SDAC is expected to run the sponsor-provided code in its own environment to produce an unblinded report used for interim monitoring of safety and efficacy. The primary assumption made about the SAS code is that it is tested/verified/validated (pick your favorite word) by the client. In fact, the SDAC has little control, or sometimes no control, over the SAS code that is received. Even though the SAS code is expected to meet the sponsor’s SOPs associated with quality control (QC), additional QC measures are essential, since the code was tested in a blinded manner on a different system. We identify two critical areas where QC is imperative. The first is to ensure that all programs run on the SDAC system as expected, given the potential for different operating systems, SAS licenses, and folder structures. The second is to ensure that the unblinding process has been verified by the SDAC. In this presentation we focus on QC in these two areas when supporting DMCs where the source SAS code is not under our management.


SI-024 : Applying Agile Methodology to Statistical Programming
Gina Boccuzzi, PROMETRIKA, LLC.
Mon, 9:30 AM - 9:50 AM, Location: Room 303-304

When working in a CRO environment, Waterfall project management methodologies are often the default and seem like the only way to guide a project. While a full transition to Agile-based project management may not always be practical or the right fit for every workplace, many Agile methodologies feature practices that can be easily implemented to optimize daily workflow. These tools include: Daily Standup, a daily team meeting to discuss progress and upcoming work; Sprint Planning/Retrospective, bi-weekly meetings to discuss the work needed in the coming weeks; Post-Mortem, a meeting at the end of a project to discuss how it went and what could have been done differently; and Cyclical Timelines, working on different components of a project at the same time rather than waiting to start one until the last is finished. Regardless of the overarching project management style, these tools can be implemented individually or in tandem, depending on the user’s needs. These tools are easy to learn and understand and can optimize programmers' time. Agile does not have to be a practice reserved only for the software engineering world; its tools are broad and can streamline communication and workflow.


SI-028 : Getting a Handle on All of Your SAS® 9.4 Usage
Stephen Sloan, Accenture
Tues, 11:30 AM - 11:50 AM, Location: Lone Star C

SAS is popular, versatile, and easy to use, so it proliferates rapidly through an organization. It can handle systems integration, data movement, and advanced statistics and AI, has links to a large number of file types (Oracle, Excel, text, and others), and has functionality for almost every need. In addition to SAS Base, there are a large number of specialized SAS products, some of the most popular of which are SAS STAT, SAS OR, SAS Graph, SAS Enterprise Miner, and SAS Forecast Server. SAS EG allows for quick creation of useful artifacts and facilitates saving the generated code for later use as part or all of another program. As a result, it is sometimes difficult to track all the SAS programs and artifacts being used across an organization, economies of scale can be overlooked, and repetition and “reinventing the wheel” sometimes take place. Programs and macros developed in one area can be useful in other areas, and they can be improved by this internal crowd-sourcing. Understanding all the places where SAS is used is also important when upgrading a system that makes heavy use of SAS, or when upgrading SAS itself to a new version like Viya. It can also help an organization identify which SAS products it is using and how much use these products are getting. To accomplish the above, we’ve developed a set of programs that search a Unix server or a Windows server or machine to find, catalog, and identify the SAS usage on the machine.


SI-042 : Lessons learned while switching over to SAS Studio from SAS Desktop.
Steve Black, Precision for Medicine
Mon, 10:30 AM - 10:50 AM, Location: Room 303-304

I’ve been working within the SAS desktop/server application for nearly 20 years and during that time have gotten to know my way around the block with SAS, including shortcuts, abbreviations, keyboard macros, x commands, and the normal SAS stuff, to do some pretty fancy things. So moving over to SAS Studio was going to be a transition that I was interested in but was not really looking forward to. Through this paper, I hope to create the paper that I wish I had had before I started working within SAS Studio, so the transition would have been a little easier. Even though I’ve been working in SAS for nearly 2 decades, I still find little things that I did not know or use very much that I now find super helpful, so this paper comes with a large caveat: there may be solutions to some issues or clever workarounds that others have found that I had not found at the time this paper was written. In this paper, I will discuss the similarities between the desktop version and SAS Studio and what workarounds SAS has created for options that are not available in SAS Studio. I’ll also point out some issues that I am having a tough time dealing with that I wish could be fixed. Finally, I’ll point out some things that I enjoy about SAS Studio that are a real benefit to me now that I've adjusted to the new environment.


SI-044 : Enterprise-level Transition from SAS to Open-source Programming for the whole department
Kevin Lee, Genpact
Mon, 11:30 AM - 11:50 AM, Location: Room 303-304

The paper is written for those who want to learn about an enterprise-level transition from SAS to open-source programming. The paper will introduce a transition project in which a whole department of 150+ SAS programmers moved completely from SAS to open-source programming. The paper will start with the scope of the project: the analytic platform switch from SAS Studio to R Pro Server, the conversion of existing SAS code to R/Python code, the move from Windows servers to an AWS cloud computing environment, and the transition of SAS programmers to R/Python programmers. It will also discuss the challenges of the project, such as inexperience in open-source programming, the new analytic platform, and change management. The paper will describe how the transition-support team, executive leadership, and SAS programmers overcame the challenges together during the project. The paper will also discuss the differences between SAS and open-source languages and programming, and it will show some examples of the conversion of SAS code to R/Python code. Finally, it will close with the benefits of the open-source programming transition and the lessons learned from the project.


SI-055 : Implementing and Achieving Organizational R Training Objectives
Jeff Cheng, Merck & Co., Inc.
Abhilash Chimbirithy, Merck & Co.
Amy Gillespie, Merck & Co., Inc.
Yilong Zhang, Merck & Co., Inc.
Mon, 1:30 PM - 1:50 PM, Location: Room 303-304

Clinical data analysis and reporting for a clinical study report, submission, and internal decision making are complex activities requiring large amounts of data to be processed, analyzed, and reported according to regulatory and departmental processes. As such, tools that help maintain process control, increase productivity, and sustain resourcing are in demand by statistical programming organizations. With a large open-source library of data manipulation, statistical calculation, and graph plotting functions, the R statistical programming language is one such tool that has gained popularity within the pharmaceutical industry. In this paper we will highlight how Merck implemented its training strategy for R with the goal of achieving introductory R proficiency in 75% of its statistical programming staff within one calendar year. We will share how the R curriculum was selected, planned, and implemented, which resulted in a robust training strategy that successfully achieved the training objective. We will also discuss the challenges and benefits of this R training strategy, which could be very helpful for other organizations that might be contemplating a similar training initiative for their own employees.


SI-057 : External R Package Qualification Process in Regulated Environment
Jane Liao, Merck & Co., Inc.
Fansen Kong, Merck & Co., Inc.
Yilong Zhang, Merck & Co., Inc.
Tues, 2:30 PM - 2:50 PM, Location: Lone Star C

Traceability, reproducibility, and quality are critical components in clinical trial Analysis and Reporting (A&R). It is critical to ensure reproducibility by providing a centralized library of qualified R packages. The centralized library also needs to provide a traceable and consistent environment for members of an organization to use R for both exploratory and regulatory deliverables. R packages are open-source software developed by a community of contributors around the world. R benefits from this de-centralized cohort of developers who have contributed many R packages to the Comprehensive R Archive Network (CRAN) and other code repositories. CRAN and other code repositories empower accelerated innovation but come at a cost of inconsistent quality of R packages. The lack of guaranteed accuracy or standardized testing may be a concern of using R in a regulated environment for clinical development. A centralized R library governed by a thorough qualification process is necessary to ensure process compliance, quality, traceability, and reproducibility for formal A&R deliverables using R for an organization. In this paper, we propose an end-to-end process to qualify and install both internally and externally developed R packages in a regulated R environment. The proposed process aligns with the white paper from the R validation hub by defining a set of pre-specified criteria for the external R package qualification review process.


SI-065 : PROC FUTURE PROOF 1.2 - Linked Data
Danfeng Fu, MSD
Ellie Norris, Merck
Suhas Sanjee, Merck
Susan Kramlik, Merck & Co., Inc.
Mon, 3:30 PM - 4:20 PM, Location: Room 303-304

Throughout the clinical study analysis and reporting (A&R) process, source data goes through several transformation steps in different phases. Although pharmaceutical companies have built streamlined processes to generate data and results in each phase, there is no automated, streamlined way to provide traceability as data moves from SDTM -> ADaM -> TLFs -> study reports. Data traceability is critical to ensure good quality and regulatory compliance. Previously we published a paper that evaluated recent advances in technology and the clinical trial programming skillset to identify opportunities for improved programming efficiencies. Last year, we published an overview of steps and challenges building a linked data proof of concept and a readout to provide data traceability from ADaM to SDTM. This year, we further evaluated table creation processes from ADaM datasets and expanded the use of linked data to enable automated traceability of analysis results in tables, listings, and figures (TLFs) back to ADaM datasets. We also devised a way to link analysis results/datapoints referenced in the study reports to TLFs which in turn can provide traceability back to ADaM & SDTM datasets. By representing all data from SDTM, ADaM and analysis results involved in the entire A&R process as linked data/graph database, we demonstrate end to end traceability from clinical study reports to SDTM data.


SI-067 : Finding Accord: Developing an eCOA Data Transfer Specification (DTS) All Can Agree On.
Frank Menius, YPrime
Monali Khanna, YPrime
Vincent Allibone, YPrime
Mon, 2:30 PM - 2:50 PM, Location: Room 303-304

When building a study where electronic Clinical Outcome Assessments (eCOA) are collected, the data transfer is often one of the last items considered, and because of this it can often be a source of headaches for data providers and consumers alike. Assumptions are often made by one or more parties about aspects of the collected data or their transfer that are easy to confuse or overlook. This paper will not only focus on bringing the topic of the data transfer closer to the beginning of the study design discussion but will also focus on the ins and outs of drafting a complete and concise transfer agreement that will work for all. We will discuss best practices for the industry, items to include or exclude, common sources of issues and errors, and real-world examples of the good and bad of a data transfer. Reusability, machine and human readability, and technical aspects will also be included. It is possible to make an eCOA data transfer clear, quick, easy, and painless. Let us show you how.


SI-097 : Ensuring Distributed Data custody on Cloud Platforms
Ben Bocchicchio, SAS Institute
Sandeep Juneja, colleague
Tues, 10:30 AM - 10:50 AM, Location: Lone Star C

Many companies, as part of their cloud strategies, have moved significant amounts of data into the storage of multiple cloud vendors. Each cloud platform has its own Identity and Access Management process. Generating a holistic view of data spread across multiple cloud platforms creates new challenges. In this paper we discuss different options for establishing secure connections through the various data exchange mechanisms used between cloud platforms. This is done in a controlled and audited way to generate data lakes and pools while maintaining data custody and data integrity.


SI-099 : Cyber Resiliency – Computing Platform Modernization
Susan Kramlik, Merck & Co., Inc.
Eunice Ndungu, Merck & Co Inc.
Hemu Shere, Merck
Tues, 11:00 AM - 11:20 AM, Location: Lone Star C

This paper describes selected cyber-resiliency steps taken, challenges encountered, and lessons learned when implementing a modernization program to upgrade the statistical computing environment at our company. The components of the transformation included moving from Unix to Linux, integration with a SAS Grid infrastructure system, and addition of web-based SAS Studio. The project’s scope became larger than initially anticipated. System Development Lifecycle (SDLC) practices were also required because regulatory content is created using the computing platform. Careful planning, collaboration, and alignment between the business users, IT, and Quality Assurance drove the success despite the challenges. Significant participation from the user community was also instrumental in the success of this project. Lessons learned were shared after the project deployment. The planning, scope change, adaptation, user participation, execution, and lessons learned may be of value to others undertaking similar endeavors.


SI-162 : Just Say No to Data Listings!
Mercidita Navarro, Genentech
Nancy Brucken, IQVIA
Aiming Yang, Merck & Co., Inc.
Greg Ball, Merck & Co., Inc.
Tues, 1:30 PM - 1:50 PM, Location: Lone Star C

Sponsor companies often create voluminous static listings for Clinical Study Reports (CSRs) and regulatory submissions, and possibly for internal use to review participant-level data.  This is likely due to the perception that they are required and/or lack of knowledge of various alternatives.  However, there are other ways of viewing clinical study data that can provide an improved user experience, and are made possible by standard data structures such as the Study Data Tabulation Model (SDTM). The purpose of this paper is to explore some alternatives to providing a complete set of static listings and make a case for sponsors to begin considering these alternatives. We will discuss the recommendations from the PHUSE white paper, “Data Listings in Clinical Study Reports.”


SI-164 : Quality Gates: An Overview from Clinical SAS® Programming Perspective
Nagadip Rao, Eliassen Group
Pavan Jupelli, Eliassen Group
Tues, 2:00 PM - 2:20 PM, Location: Lone Star C

Clinical trials are conducted to answer a clinical research question by generating relevant data and analyzing it in order to validate an initial hypothesis. Data quality, thorough data analysis, and reporting play a critical role in determining our confidence in clinical trial outcomes. In an endeavor like a clinical trial, which involves complex multi-disciplinary processes, with partner entities spanning the globe working together under stringent regulations to generate relevant data and perform data analysis, assuring quality is not a trivial task. Many techniques have been used over the years in an attempt to improve clinical trial data and analysis quality; one that has been used successfully across multiple industries is the quality gate (QG) approach. QGs are formal quality checklist evaluations of intermediate milestone deliverables and related documentation, performed at predetermined milestone checkpoints against pre-defined criteria in a clinical study process flow. The purpose of a QG is to get an intermediate assessment of quality and reduce or eliminate potential risks by facilitating early detection, discussion, and resolution of identified issues. Clinical SAS® programming plays a vital role in the execution of clinical trial projects. The quality of programming deliverables has considerable impact on the overall quality of clinical trial results, and hence is assessed during QGs when QGs are planned for a given clinical trial. This paper provides an overview of QGs in clinical trial projects from a clinical trial programming perspective and discusses key practical aspects and benefits of incorporating QGs into overall programming processes.


SI-168 : Vendor Audit: What it is and What to Expect from it
Parag Shiralkar, Sumptuous Data Sciences, LLC
Nagadip Rao, Eliassen Group
Mon, 2:00 PM - 2:20 PM, Location: Room 303-304

Audits are primary instruments to assess the processes, procedures, and practices of an organization. It is common in the pharmaceutical industry to utilize multiple contracted service providers, or ‘vendors’, to support various clinical trial activities for the sponsor. Likewise, the sponsors get audited by regulatory agencies. All audits in the pharmaceutical industry are geared towards ensuring that the organization follows good clinical practices (GCPs) and other regulatory requirements for data management and reporting. The primary objective of ‘vendor audits’ is to ensure that sufficient risk management and mitigation procedures are executed on the sponsor side to ensure the integrity of clinical data management and reporting processes. These audits can be conducted proactively to support the vendor selection process or reactively to investigate the practices of the sponsors and providers. Typically, the audit is conducted by assessing the provider’s quality management system, technological platform, and personnel records. This paper provides an overview of vendor audits from the sponsor as well as the vendor perspective. It also discusses the typical nature of audit findings and possible resolution approaches.


SI-187 : Creating an Internal Repository of Validated R Packages
Phil Bowsher, RStudio Inc.
Cole Arendt, RStudio
Tues, 3:30 PM - 3:50 PM, Location: Lone Star C

RStudio will be presenting an overview of current developments for creating an internal repository of R packages for the R user community at PharmaSUG. In this paper, RStudio will review the steps needed to create an internal repo for using R within the regulatory environment. This is a great opportunity to learn about best practices for surfacing controlled R packages privately within a company. No prior knowledge of R/RStudio is needed. This short talk will provide an introduction to the current landscape of validation as well as recent developments for sharing internal packages. RStudio will share insights and advice from the last 7 years of helping pharma organizations incorporate R into clinical environments.


SI-191 : Orchestrating Clinical Sequels with a Strategic Wand - Challenges of introducing Rolling CSRs in a Master Protocol
Neharika Sharma, GlaxoSmithKline Pharmaceuticals
Tues, 4:00 PM - 4:20 PM, Location: Lone Star C

The demand for fast-paced drug development often leads to more intricate trial designs and a surge of multifaceted questions to be answered from the same study. This in turn raises pivotal questions, such as how to maximize existing resources to strengthen the overall compound strategy and how to be submission ready. We faced a similar situation on a First Time in Human trial, a master protocol with an adaptive design, for a game changer tried out in multiple cancer indications at different dosages. Introducing the concept of Rolling CSRs for the very first time, as programming lead I drove the locking down of the ready cohorts based on frequent scrutiny of the ongoing subjects versus the targeted powered sample size, updated at regular intervals through futility analysis. In the talk, I will also highlight the various challenges involved, e.g., database soft lock/partial unblinding, driving cross-functional agreement on the cohort selection and the minimal outputs supporting the objective, and an automated tool to generate the reusable sections of the CSR body. The strategic scope of my role expanded further when a combination drug from another pharmaceutical company was introduced, demanding constant negotiations over decisions and safety reporting along with mitigating the unforeseen impacts of Covid-19 on the trial. I would also like to share the success story of striking a balance not only between supporting publications and future readiness but also between the existing technology (SAS) and new technology (R), especially when the majority of the team (including FSPs) is untrained in R.


Submission Standards

SS-023 : Lessons Learned from Using P21 to Create ADaM define.xml with Examples
Yizhuo Zhong, Merck
Christine Teng, Merck
Majdoub Haloui, Merck & Co. Inc.
Tues, 11:00 AM - 11:20 AM, Location: Lone Star A

Define.xml is a required component of FDA, PMDA and NMPA submission data packages. It provides metadata for collected data as well as derived analysis data. Pinnacle 21 Enterprise (P21E) is a web-based tool that is commonly used by many sponsors and CROs to generate the define.xml. P21E supports multiple versions of CDISC standards and regulatory agencies’ business rules. During validation of ADaM datasets and ADaM define.xml, P21E reports error and warning messages based on the configuration of the data package used. These issues must be addressed to comply with CDISC and agency business rules. This paper will focus on the lessons learned, illustrated through examples, when creating ADaM define.xml using P21E. A flow chart of the suggested process and checking steps will be presented. Some best practices for developing ADaM dataset specifications will be discussed as well.


SS-034 : Damn*t, The Define!
Julie Ann Hood, Pinnacle 21
Sarah Angelo, Pinnacle 21
Tues, 11:30 AM - 11:50 AM, Location: Lone Star A

Validation rules for define.xml have been evolving for over a decade, and sponsors are expected to use these rules to validate prior to submission. However, even after years of improvements to rule descriptions in order to provide further guidance, validation rules can still prove difficult to interpret. Since the define.xml is generally not completed until the end of the study, many users may not be familiar with the validation rules or sometimes even the define.xml file itself. In this paper, we will cover some of the more obscure define validation rules to help provide clarity on what the rules are actually checking for, how to investigate the issue, and the necessary steps to correct the define.xml. This guidance will hopefully empower others in the industry to feel more optimistic about tackling define validation issues and refrain from yelling, "Damn*t, the define!"


SS-035 : FDA Advisory Committee Meetings – A Statistical Programmer’s Survival Guide
Phil Hall, Edwards Lifesciences
Tues, 1:30 PM - 1:50 PM, Location: Lone Star A

Supporting your company’s presentation to an FDA Advisory Committee Meeting may be one of the most important and high-profile things you will do during your career. As a statistical programmer, you’ll have to do a lot of preparation and be able to provide analyses to answer any questions during the meeting. This paper explains what to do in the months leading up to the meeting in order to be fully prepared. Although this paper is aimed at statistical programmers, it is of use to all involved in the preparation for the committee meeting, including statisticians and project managers.


SS-039 : Real Time Oncology Review Readiness from Programming Perspective
Binal Mehta, Merck & Co INC
Patel Mukesh, Merck & Co INC
Tues, 2:00 PM - 2:20 PM, Location: Lone Star A

This paper will cover the steps for preparing a Real-Time Oncology Review (RTOR) submission from the programmer’s perspective. It will describe which components are submitted as part of the RTOR package and when the programmer should begin preparing them. It will highlight the timelines for the RTOR components versus the final submission, and explain how to plan for parallel activities and account for their impact on preparation for the final submission. Various challenges faced by the statistical programmer will be presented, along with steps to resolve them. Lastly, this paper will provide insight into coordinating activities between cross-functional teams.


SS-079 : eCOA and SDTM: Bring Order to the Wild West
Vincent Allibone, YPrime
Frank Menius, YPrime
Monali Khanna, YPrime
Tues, 2:30 PM - 2:50 PM, Location: Lone Star A

As the implementation of electronic Clinical Outcome Assessments (eCOA) continues to grow, the need for data standards is more apparent than ever. Data standards allow data to be processed and exported more efficiently, leaving us with high-quality, organized data. The Clinical Data Interchange Standards Consortium (CDISC) has developed a model for organizing and formatting data, the Study Data Tabulation Model (SDTM), which, when implemented, allows data to be submitted in an organized and clean way. In theory, the application of SDTM to eCOA data is straightforward; to the outsider or newcomer, however, it can seem a daunting and confusing process. With several different data classes, domains, variable roles, and core variables that data could fall under, annotating screen reports for eCOA data seems straightforward for the experienced and an almost impossible task for the inexperienced. This paper will share tips and techniques for learning standards and applying them specifically to eCOA data by providing real-life examples, while summarizing some of the core guidelines of SDTM.


SS-087 : Including Population Information in ADaM define.xml for Better Understanding of Datasets
Hong Qi, Merck & Co., Inc.
Majdoub Haloui, Merck & Co. Inc.
Guowei Wu, Merck & Co., Inc.
Tues, 3:30 PM - 3:50 PM, Location: Lone Star A

An important element in creating analysis datasets for clinical trials is the population selection criteria in the ADaM (Analysis Data Model) dataset specification, which define the study populations included in each dataset based on the Statistical Analysis Plan (SAP). Per current industry guidelines and practice, population selection criteria or population information are not required to be listed directly for each ADaM dataset in the submission documents, such as the analysis data definition document (ADaM define.xml) and the Analysis Data Reviewer’s Guide (ADRG). Therefore, this information is not described in ADaM define.xml and is often missing from the ADRG at the dataset level. Meanwhile, no software is available to automatically include the population selection criteria in define.xml. To make review easier for regulatory agencies, an effort can be made to communicate the study population included in each submitted analysis dataset. This paper discusses the importance of including population information in a submission document and the current gaps. It explains why the dataset-level metadata in ADaM define.xml is the ideal place to include this information, and it provides implementation approaches.


SS-110 : A SAS Macro to Support the Supplemental Qualifier Section in cSDRG
Jamuna Purma, Merck
Bindya Bindya Vaswani, Merck
Tues, 4:00 PM - 4:20 PM, Location: Lone Star A

The programming group is responsible for completing the Clinical Study Data Reviewer's Guide (cSDRG). The cSDRG should include a high-level overview of the SDTM data submitted for an individual study. The PHUSE template includes a supplemental qualifier section, section 3.4. Traditionally, this section was filled in by manually checking each dataset tab in the analysis dataset specification to identify the supplemental domain variables included in the statistical analysis. This paper introduces a SAS macro that identifies supplemental qualifier variables used to support key analyses. It creates a table with the required columns that can be easily transferred into cSDRG section 3.4. Additionally, the macro flags any supplemental qualifier variables that are specified in the ADaM programming specification but not used in generating analysis datasets, which improves the quality of both cSDRG section 3.4 and the ADaM programming specification.


SS-152 : Explanations in the Study Data Reviewer’s Guide: How’s It Going?
Kristin Kelly, Pinnacle 21
Wed, 8:30 AM - 8:50 AM, Location: Lone Star A

In 2018, the paper ‘Best Practice for Explaining Validation Results in the Study Data Reviewer’s Guide’ was presented at the PharmaSUG conference. Its focus was to provide sponsors with some best practices for explaining validation results in the Study Data Reviewer’s Guide (cSDRG) as well as guidance for interpreting some of the more confusing validation rules in Pinnacle 21. This presentation will focus on metrics on the quality of explanations, gathered across the industry since that time, as well as best practices that will help improve them further.


SS-154 : A Standardized Reviewer’s Guide for Clinical Integrated Analysis Data
Srinivas Kovvuri, ADC Therapeutics USA
Kiran Kundarapu, Merck & Co., Inc
Satheesh Avvaru, Alexion AstraZeneca Rare Disease
Randi McFarland, Independent Consultant
Wed, 9:00 AM - 9:20 AM, Location: Lone Star A

The Integrated Summary of Safety (ISS) and Integrated Summary of Efficacy (ISE) are vital components of a successful submission for regulatory review in the pharmaceutical and biotechnology industry. The submission package includes an integrated Data Reviewer’s Guide (DRG) along with the integrated database(s) (ISS and/or ISE), which allows reviewers to better understand the traceability of the data, pooling strategy, analysis considerations, subject-specific significant information, and conformance findings. The PHUSE Analysis Data Reviewer’s Guide (ADRG) applies to a single study. Sponsors have been adopting this template and amending it when creating a reviewer's guide for an integrated analysis data submission. PHUSE recognized the need for a new, standardized template and guidelines for an integrated analysis data reviewer’s guide. The integrated analysis data reviewer’s guide (iADRG) template was developed by PHUSE considering integration strategies, challenges, and regulatory recommendations. This paper shares the integration approaches that were considered and provides a structure for giving context to the integrated analysis data to assist the regulatory reviewer.


SS-158 : A Practical Approach to Preparing Documentation for Clinical Registries
Jennifer McGrogan, Reata Pharmaceuticals
Rex Tungala, Reata Pharmaceuticals
Mario Widel, Reata Pharmaceuticals Inc.
Wed, 9:30 AM - 9:50 AM, Location: Lone Star A

Conducting a clinical trial includes complying with legal requirements at certain times during the study. Among those requirements is registering and posting clinical trial information, such as select trial results. Depending on where the trial is conducted, different clinical registries are expected to be used. The processes are documented on the corresponding websites where the trials must be registered; however, navigating these websites may not be trivial. While it is possible to enter the numbers into the registry manually, some effort is needed to ensure that the information entered is accurate. Alternatively, with modifications to the programs that produce select trial results, it is possible to automate part of the uploading process, leading to an easier and quicker direct upload to the registry. In this paper, we will provide case studies and examples illustrating the process for efficiently creating and validating documents ready for upload to the clinical registries.


e-Posters

EP-007 : Clean Messy Clinical Data Using Python
Abhinav Srivastva, Exelixis Inc
Mon, 10:00 AM - 10:45 AM, Location: E-Poster

Data in clinical trials can be transmitted in various formats such as Excel, CSV, tab-delimited, or ASCII files, or via any Electronic Data Capture (EDC) tool. A potential problem arises when data contain embedded special characters or even non-printable (control) characters, which affect all downstream analysis and reporting, in addition to being non-compliant with CDISC standards. The paper explores various ways to handle these unwanted characters by leveraging the capabilities of the Python language. The discussion begins with correcting these characters using their ASCII codes, followed by Unicode, and then explores normalization techniques such as NFC, NFD, NFKC, and NFKD to normalize Unicode characters. Often, data cleaning will require a combination of these techniques to get the desired result.
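
As a rough illustration of combining the techniques named above, the following minimal sketch applies a Unicode normalization form and then drops control and format characters. It is not taken from the paper; the function name `clean_text` and the choice to remove every character in the Unicode categories Cc and Cf (including tabs and newlines) are assumptions made for this example.

```python
import unicodedata


def clean_text(value: str, form: str = "NFKC") -> str:
    """Normalize Unicode, then drop control (Cc) and format (Cf) characters.

    Hypothetical helper for illustration only. Note that Cc includes tab and
    newline, so this suits single-value fields rather than free-text blocks.
    """
    # Step 1: normalize. NFKC composes combining sequences and also folds
    # compatibility characters (ligatures, no-break spaces) to plain forms.
    normalized = unicodedata.normalize(form, value)
    # Step 2: filter out characters whose general category is Cc or Cf.
    return "".join(
        ch for ch in normalized if unicodedata.category(ch) not in ("Cc", "Cf")
    )


if __name__ == "__main__":
    raw = "Caf\u0065\u0301\x00 \ufb01ve"  # combining accent, NUL byte, "fi" ligature
    print(clean_text(raw))
```

Choosing NFC instead of NFKC would keep compatibility characters such as ligatures intact, which may matter when the original glyphs must be preserved.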


EP-032 : Using SAS® to Create a Build Combinations Tool to Support Modularity
Stephen Sloan, Accenture
Tues, 3:00 PM - 3:45 PM, Location: E-Poster

With SAS® PROC SQL we can use a combination of a manufacturing Bill of Materials and a sales specification document to calculate the total number of configurations of a product that are potentially available for sale. This will allow the organization to increase modularity and reduce cost while focusing on the most profitable configurations with maximum efficiency. Since some options might require or preclude other options, the result is more complex than a straight multiplication of the numbers of available options. Through judicious use of PROC SQL, we can maintain accuracy while reducing the time, space, and complexity involved in the calculations.
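
To show why "requires" and "precludes" rules make the count differ from a straight multiplication, here is a small brute-force sketch. It uses Python rather than the PROC SQL approach the paper describes, and the option groups, rules, and function name are all hypothetical values invented for this example.

```python
from itertools import product

# Hypothetical option groups: 2 x 2 x 2 = 8 unconstrained combinations.
OPTION_GROUPS = {
    "engine": ["base", "turbo"],
    "trim": ["standard", "sport"],
    "roof": ["fixed", "sunroof"],
}

# Hypothetical rules, each expressed over (group, option) pairs.
REQUIRES = [(("trim", "sport"), ("engine", "turbo"))]    # sport trim needs turbo
PRECLUDES = [(("engine", "base"), ("roof", "sunroof"))]  # base engine bars sunroof


def valid_configurations(groups, requires, precludes):
    """Enumerate every combination and keep those that satisfy all rules."""
    names = list(groups)
    valid = []
    for combo in product(*(groups[name] for name in names)):
        chosen = set(zip(names, combo))
        # A "requires" rule fails when the trigger is chosen without its partner.
        if any(a in chosen and b not in chosen for a, b in requires):
            continue
        # A "precludes" rule fails when both options are chosen together.
        if any(a in chosen and b in chosen for a, b in precludes):
            continue
        valid.append(dict(zip(names, combo)))
    return valid


if __name__ == "__main__":
    configs = valid_configurations(OPTION_GROUPS, REQUIRES, PRECLUDES)
    print(len(configs), "valid configurations out of 8 possible")
```

Full enumeration is only practical for small option sets; for a real Bill of Materials, a set-based approach of the kind the abstract describes would presumably avoid materializing every combination.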


EP-037 : Sponsor-defined Controlled Terminologies: A Tiny Key to a Big Door
Shefalica Chand, Seattle Genetics, Inc.
Mon, 10:00 AM - 10:45 AM, Location: E-Poster

It takes a village to bring a novel treatment to patients. One of the keys to getting faster drug approval is submitting a data package to the agencies that is easy to navigate and enables an efficient, high-quality review. Several aspects contribute to a robust submission data package, and Controlled Terminology (CT) is one of them. CT is the collection of code lists and values used within a data set. CDISC regularly shares industry-standard CT and code list values. Apart from these CDISC-compliant CT, sponsors are allowed to add sponsor-defined CT and additional code list values in extensible CDISC CT. Adding sponsor-defined CT for as many variables as possible gives FDA reviewers better insight into the submission data. These CT and code lists are summarized in submission data dictionary files such as define.xml. It can be a tedious task for sponsors to present sponsor-defined code list values, especially for larger studies and data integration projects. This paper will share an approach to collate sponsor-defined code list values and CT in a fast and easy manner, removing any manual intervention. A utility and instructions will be shared that can help not just summarize sponsor-defined CT and code lists, but also perform checks and validation that are typically not done by the Pinnacle 21 validator, or that can be done upstream to avoid late changes and updates. This efficiency adds value for programming teams and eventually our patients.


EP-063 : Unravel the mystery around NONMEM data sets for statistical programmers
Sabarinath Sundaram, Seagen Inc
Lyma Faroz, Seagen Inc.
Mon, 3:00 PM - 3:45 PM, Location: E-Poster

Having a deeper knowledge of the relationships between pharmacokinetics (PK) / pharmacodynamics (PD) and exposure-response effects is crucial to product development and drug registration. Nonlinear mixed-effects analysis is typically adopted for repeated-measurement or longitudinal data analyses such as PK and PKPD, using software called Nonlinear Mixed Effects Modeling (NONMEM). A high-quality NONMEM data set is essential for this important decision-making modeling. Statistical programmers must have a solid understanding of the PK and PD data associated with the study drug components and the study design. For a NONMEM data set, statistical programmers need to input dosing information, scheduled timepoints, PK data, and PD data (labs, response, biomarker, and adverse events) as assigned by the PK scientist, as well as covariate information such as demographics, baseline laboratory results, vital signs, and disease characteristics, among others. To perform data-driven and model-based updates, a thorough understanding of the PK/PD data is required along with a high degree of collaboration with PK scientists. This paper will discuss the basics of creating NONMEM data sets and offer some pointers on using information from various sources to create a NONMEM data set that is different from traditional CDISC-compliant data sets. Additionally, this paper introduces details about standard variables that are expected in such data sets.


EP-068 : Begin with the End in Mind: eCOA data from collection to analysis
Frank Menius, YPrime
Monali Khanna, YPrime
Vincent Allibone, YPrime
Tues, 10:00 AM - 10:45 AM, Location: E-Poster

Similar to Stephen Covey’s 7 Habits of Highly Effective People, highly effective clinical trials should “begin with the end in mind”. Rigorous analysis of data followed by successful submission to regulatory agencies should be the end goal of data collection. It sounds simple and straightforward, especially when dealing with questionnaire data often associated with electronic Clinical Outcome Assessments (eCOA), but in an industry with increasing knowledge silos and outsourcing, it is often quite the opposite. Simple ideas like collecting data that are usable for analysis can often get lost in the machinery of running a study. Often data management and statisticians are left trying to make sense of incomplete, dirty, or *gasp* free text data. Many times assumptions are made during study design that do not clearly translate to collected data usable in analysis. This paper will focus on steps that all stakeholders can take to make sure that the end goal is kept in mind, from eCOA design, to data collection, to analysis, and finally submission. We will touch on standardization like Study Data Tabulation Model plus or minus (SDTM+/-); controlled terminology; formatting; guiding a design team; asking the right questions; existing resources and libraries such as Questionnaires, Ratings, and Scales (QRS) supplements produced by the Clinical Data Interchange Standards Consortium (CDISC); and drafting a clear data transfer agreement. Most importantly, this paper will address getting the right people in the discussion at the right time: the beginning.


EP-096 : Bookmarking CRFs More Efficiently
Noory Kim, SDC
Tues, 10:00 AM - 10:45 AM, Location: E-Poster

Preparing an annotated CRF (aCRF) entails adding PDF bookmarks. If done manually, this can be very time-consuming. This paper will show how to automate aspects of this process, including the specification of bookmark properties required by the FDA or CDISC (namely, nested levels and inherit zoom magnification). This paper assumes the reader knows how to edit the Windows PATH environment variable and use a Windows command line.


EP-156 : Automation of Clinical Data extracts using Cloud Applications
Syam Prasad Chandrala, Allogene
Madhusudhan Nagaram, Allogene Therapeutics
Chaitanya Chowdagam, Ephicacy Lifescience Analytics
Jegan Pillaiyar, Ephicacy Lifescience Analytics
Kunal Chattopadhyay, ZS Associates
Mon, 3:00 PM - 3:45 PM, Location: E-Poster

Data managers and external vendors post data as zip files to various storage locations such as Box, SharePoint, or FTP. Some of these data extracts are scheduled periodically, and some are scheduled bimonthly or on an ad hoc basis. In addition, these data extracts can be in various formats, and most of the time the files are password protected. It is the programmer’s responsibility to make sure that current as well as historic data are stored in the appropriate folders and in a consistent format, such as SAS7BDAT, as a source for creating SDTM and ADaM datasets. Programmers spend a lot of time and effort managing this process manually. By leveraging cloud-based applications such as AWS Secrets Manager and AWS Lambda functions in combination with Python scripting, SAS programs, and scheduling shell scripts, we have made this process automated and serverless. The latest data on SFTP is identified, stored in an AWS S3 location, and unzipped using the correct password; the files are saved to an appropriate folder, and an email reporting the success or failure of the data extracts is sent to the team along with the logs. The data is then posted to a data lake to make it available to data analytics tools such as SAS, Spotfire, Tableau, and R with minimal or no programmer intervention. This paper demonstrates the flow of data through a multi-level, cloud-based serverless architecture combined with Python, SAS programs, and shell scripts.


EP-176 : Automation using SAS Makes it Easy to Monitor Dynamic Data in Clinical Trial
Wenjun He, EMMES
Tues, 3:00 PM - 3:45 PM, Location: E-Poster

A successful clinical trial is a team effort spanning study startup, conduct, data collection, monitoring, and data cleaning, in which it is critical that the study is implemented as specified in the protocol and that the data are clean and of good quality. As most electronic data capture (EDC) systems are set up to export data as SAS datasets, the clinical SAS programming team can leverage these datasets to provide metrics and tools that facilitate dynamic data monitoring, improving the quality of research study data and enhancing human subject protection, especially in risk-based monitoring. This paper presents example SAS code using SAS macros and other SAS facilities to automate enrollment progress tracking, site metrics reporting, and safety-based medical monitoring, which can benefit EDC clinical reporting and trial management processes.