Paper presentations are the heart of a PharmaSUG conference. PharmaSUG 2021 will feature nearly 200 paper presentations, posters, and hands-on workshops. Papers are organized into 14 academic sections and cover a variety of topics and experience levels.

Note: This information is subject to change. Last updated 02-Apr-2021.


Advanced Programming

Paper No. Author(s) Paper Title
AP-008 Steve Black Don't send the Macros, send the Catalog! The SAS Macro Catalog: a perfect way to send your macros to the client without sharing the code.
AP-018 Stephen Sloan
& Kirk Paul Lafler
A Quick Look at Fuzzy Matching Programming Techniques Using SAS Software
AP-020 Stephen Sloan Twenty Ways to Run Your SAS Program Faster and Use Less Space
AP-030 Richann Watson
& Louise Hadden
"Bored"-Room Buster Bingo - Create Bingo Cards Using SAS ODS Graphics
AP-034 Jayanth Iyengar From %Let to %Local; Methods, Use, and Scope of Macro Variables in SAS Programming
AP-041 Scott Horton Laboratory Results in Relation to Concomitant Medications (CMs)
AP-042 Yun (Julie) Zhuo Survival Statistics with PROC LIFETEST and PROC PHREG: Pitfall-Avoiding Survival Lessons for Programmers
AP-048 Tong Zhao
& Jin Shi
Using Stata for handling CDISC compliant datasets and outputs
AP-053 Jeff Xia A SAS Macro to Generate Slim Version of Define for RTOR
AP-066 Jagadish Katam Dynamic macro using the convergence dataset to generate the Odds Ratio (OR), Risk Ratio (RR) and Risk Difference (RD) per SOC/PT using Genmod procedure
AP-067 Shuqi Zhao Imputation for Missing Dosing Time in NONMEM PopPK Datasets
AP-071 Chen-Jung Wu An efficient method to review annotation and create bookmark for case report form (acrf.pdf) using SAS program, FDF files, excel and Javascript in Acrobat
AP-088 Jian-An Lu
& Jinwei Yuan
Using SAS for Forest Plots in AMNOG Meta-Analysis Reporting
AP-092 David Horvath PROC IMPORT and more. Or: when PROC IMPORT just doesn't do the job.
AP-095 David Horvath Lazy Programmers write Self-Modifying code OR Dealing with XML file Ordinals
AP-099 Rakesh Kanojia Unix: A Best Friend of SAS
AP-101 Jeremy Gratt
& Aditya Tella
Re-pagination of ODS RTF Outputs to Automate Page Breaks and Minimize Splits Across Pages
AP-103 Raghava Pamulapati
& Sandeep Meesala
Developing SAS Macro to Find Subject's Latest Available Date from Entire Database
AP-106 Benjamin Straub Bookdown and Blogdown: Deep dive of the R packages and components needed to create documentation and websites.
AP-117 Kirk Paul Lafler Under the Hood: The Mechanics of Various SQL Query Optimization Techniques
AP-120 Dan Xiao
& Jing Su
& Shuqi Zhao
& Xiang Liu
Programming Time Varying Concomitant Medication Covariates in NONMEM PopPK Dataset
AP-131 Abu Zafar Mohammad Sami An R Shiny Application: Validated Analysis Results Using R and SAS
AP-132 Sathish Saravanan
& Kameswari Pindiprolu
100% Menu Driven SAS Windows Batch Job Runner and Log Analyzer
AP-151 Keith Shusterman
& Mario Widel
Using PROC FCMP to Create Custom Functions in SAS
AP-156 Nagalakshmi Kudipudi "Packages" in R for Clinical Data Analysis, Let's Demystify
AP-174 Manasa Gangula Overlay Graphs: A Powerful Tool to Visualize and Simplify Complex Data
AP-184 David Franklin Thwarting Nasties
AP-195 Kirsty Lauderdale
& Kjersten Offenbecker
How to ingratiate yourself with the "old-timers"
AP-199 Troy Hughes
& Louise Hadden
Should I Wear Pants in the Portuguese Expanse? Automating Business Rules and Decision Rules Through Reusable Decision Table Data Structures that Leverage SAS Arrays
AP-204 Phil Bowsher
& Sean Lopp
R Reproducibility Best Practices for Pharma
AP-207 Liming Xie
& Wei Wei
GCIG Criteria, Programming Makes It Easy

Applications Development

Paper No. Author(s) Paper Title
AD-002 Chengxin Li
& Michael Pannucci
& Toshio Kimura
Macro Development from Design to Implementation Applied to Occurrence Data Frequency Analysis
AD-036 Noory Kim
& Bhagyashree Shivakumar
Annotating CRFs More Efficiently
AD-054 Jundong Ma
& Zhiping Yan
How to Translate RTF Documents
AD-060 Yang Gao
& Christine Teng
A Diagnostic Report for SAS Submission Programs
AD-064 Sandeep Juneja
& Ben Bocchicchio
Harness analytical strengths: SAS, R, Python Web Services
AD-079 Peikun Wu
& Uday Preetham Palukuru
& Sarad Nepal
& Yiwen Luo
& Yilong Zhang
Analysis and Reporting in Regulated Clinical Trial Environment using R
AD-100 Anne Petersen
& Cristina Boschini
& Tobias Kröpelin
A metadata driven approach to mock TFL generation
AD-109 Terek Peterson
& Hayley Jeffery
& Justin Jaeschke
Solution when Global Pandemic resurrects paper-based PROs creating collection headaches for Data Managers
AD-112 WenHsuan Wu An Application of Flexible Operation and Position Page Number for RTF Outputs
AD-133 Sumit Pratap Pradhan Archival Tool: Automation using PowerShell
AD-162 Jose Lacal Generating Synthetic CDISC Clinical Trial Data.
AD-164 Hrideep Antony
& Aman Bahl
R-Shiny based customized data viewer and search engine
AD-168 Lex Jansen Extracting Data Standards Metadata and Controlled Terminology from the CDISC Library using SAS with PROC LUA
AD-175 David Bosak The SASSY System: Making R Easier for SAS Programmers
AD-176 Mike Stackhouse
& Nathan Kosiba
Open-Source Development for Traditional Clinical Reporting
AD-179 Igor Goldfarb
& Ella Zelichonok
Macro to Compare Titles and Footnotes in Produced TLF and Corresponding Shells
AD-185 David Franklin Generating the Demographics and Adverse Event Tables Using Excel and VBA
AD-205 Phil Bowsher
& Sean Lopp
Productionalizing Shiny Deployments

Artificial Intelligence (Machine Learning)

Paper No. Author(s) Paper Title
AI-010 Shubhranshu Dutta Integrating Digital Images for Ophthalmology Trials using Machine Learning Techniques for Image Analysis.
AI-012 Farha Feroze
& Ilango Ramanujam
Application to Automate Clinical Study Report Generation using AI
AI-049 Kevin Lee How I became a Machine Learning Engineer from a Statistical Programmer
AI-140 Sumeet Subhedar Virtual human organs in clinical trials
AI-163 Hrideep Antony
& Aman Bahl
Random Forest Machine Learning Application using R Shiny for Project Timeline Forecasting

Data Standards

Paper No. Author(s) Paper Title
DS-014 Jennifer Fulton CDISC ADaM Phases, Periods, and Subperiods: A Case Study
DS-033 Madhusudhan Papasani
& Swathi Kotla
& Elisabeth Pyle
Neoadjuvant and Adjuvant Oncology Clinical Trials and Considerations for Designing the Subject Level Analysis Dataset (ADSL)
DS-037 Carey Smoak Creating Hyperlinks in SDTM Specifications
DS-080 Sandra Minjoe When Should I Break ADaM Rules?
DS-081 Sandra Minjoe Adding Rows to a BDS Dataset: When to Use DTYPE in Addition to Metadata
DS-102 Oksana Mykhailova
& Andrii Klekov
ADaM-like Dataset: how to do big things in a short time
DS-130 Hansjörg Frenzel Untangling the Knot - Implementing Analysis Results Standards (Using SAS)
DS-142 Kirsten Langendorf
& Johannes Ulander
Biomedical Concepts - An Emerging CDISC Data Standard: A Stepwise Approach for Building a Library of Data Definitions
DS-165 Hrideep Antony
& Aman Bahl
Validation of CDISC specifications using VALSPEC utility macro
DS-172 Vamshi Matta
& Savithri Jajam
& Lavanya Peddibhotla
Rescreened Subjects, Data Collection and Standard Domains Mapping

Data Visualization and Reporting

Paper No. Author(s) Paper Title
DV-040 Manohar Modem
& Bhavana Bommisetty
Graphs Made Easy Using SAS Graph Template Language
DV-061 Anilkumar Anksapur
& Raghava Pamulapati
Double Waterfall Plot Creation for Comparing Target Lesions Data
DV-078 Pavani Potuganti
& Sree Nalluri
Graphical Representation of Clinical Data using Heat Map Graph
DV-082 Ryan Yu Automated CONSORT Flow Diagram Generation by SAS Programming
DV-086 Ballari Sen Creating Patient Profiles Using SAS® Software: Proc Report & SAS ODS
DV-113 Swapna Ambati Data Visualization in Real World Data (RWD) and Health Economics Outcomes Research (HEOR) using (Statistical Analysis Software) SAS
DV-114 John Henderson
& Lili Li
Customize and Beautify Your Kaplan-Meier Curves in Simple Steps
DV-115 Kirk Paul Lafler Making Your SAS Results More Meaningful with Color
DV-187 Louise Hadden Dressing Up your SAS/GRAPH and SG Procedural Output with Templates, Attributes and Annotation
DV-188 Louise Hadden Visually Exploring Proximity Analyses Using SAS PROC GEOCODE and SGMAP and Public Use Data Sets
DV-192 Kriss Harris Super Static SG Graphs
DV-200 Soujanya Konda A Sassy Substitute to Represent the Longitudinal Data - The Lasagna Plot

Hands-On Training

Paper No. Author(s) Paper Title
HT-009 Jayanth Iyengar Understanding Administrative Healthcare Datasets using SAS programming tools
HT-118 Kirk Paul Lafler Essential Programming Techniques Every SAS User Should Learn
HT-127 Deanna Schreiber-Gregory Mitigating Multicollinearity: Using Regulation Techniques to Minimize the Effect of Collinearity
HT-197 Troy Hughes Yo Mama is Broke Cause Yo Daddy is Missing: Autonomously and Responsibly Responding to Missing or Invalid SAS Data Sets Through Exception Handling Routines
HT-206 Phil Bowsher
& Sean Lopp
R & Python for Drug Development
HT-210 Kevin Lee Machine Learning Programming Workshop - Natural Language Processing (NLP)
HT-212 Charu Shankar SAS Macro Efficiencies
HT-214 Vince DelGobbo Integrating SAS and Microsoft Excel: Exploring the Many Options Available to You

Leadership Skills

Paper No. Author(s) Paper Title
LS-093 Yura Suzuki
& Yuichi Koretaka
& Ryo Kiguchi
& Yoshitake Kitanishi
Why Data Scientists need leadership skills? Story of Cross-Value Chain Data Utilization Project
LS-116 Kirk Paul Lafler Differentiate Yourself
LS-148 Sarah McLaughlin How to handle decision making in the CRO / sponsor relationship in matters that require expertise
LS-152 Srinivasa Rao Mandava Virtual Recruitment and Onboarding of Mapping Programmers during Covid 19- Merck's RaM Mapping Experience
LS-193 Kriss Harris Leadership and Programming
LS-194 Kirsty Lauderdale Managing Transitions so Your Life is Easier

Medical Devices

Paper No. Author(s) Paper Title
MD-035 Julia Yang
& Silvia Faini
& Karin LaPann
ADaM Implementation Guide for Medical Devices
MD-038 Carey Smoak Types of Devices in Therapeutic Area User Guides
MD-044 Phil Hall
& Tikiri Karunasundera
& Subendra Maharjan
& Vijaya Vegesna
Multiple Successful Submissions of Medical Device Clinical Trial Data in the US and China using CDISC

Quick Tips

Paper No. Author(s) Paper Title
QT-007 Frederick Cieri How to Write a SAS Open Code Macro to Achieve Faster Updates
QT-021 Stephen Sloan Running Parts of a SAS Program while Preserving the Entire Program
QT-026 Hengwei Liu From SAS Dataset to Structured Document
QT-027 Richann Watson
& Louise Hadden
What Kind of WHICH Do You CHOOSE to be?
QT-065 Jagadish Katam Unnoticed individual and mean line plot issue while using the SGPLOT procedure and the solution to avoid
QT-069 Mindy Wang Writing Out Your SAS Program Without Manually Writing it Out- Let Macros Do the Busy Work
QT-075 Lasse Nielsen Shiny decisions - An R-shiny application for decision frameworks
QT-096 David Horvath NOBS for Noobs
QT-098 David Horvath Wildcarding in Where Clauses
QT-111 Xingxing Wu
& Bala Dhungana
A Unique Way to Identify the Errors in SDTM Annotations Using SAS
QT-129 Samadi Karunasundera
& Jun Wang
Easing the Process of Creating CRF Annotations
QT-134 Rowland Hale Using Regex to Parse Attribute-Value Pairs in a Macro Variable
QT-139 Teckla Akinyi Common Dating in R: With an example of partial date imputation
QT-147 Shikha Sreshtha Overcoming the Challenge of SAS Numeric Date Comparisons
QT-159 Yuping Wu Manage SAS Log in Clinical Programming
QT-161 Frank Menius
& Monali Khanna
& Terek Peterson
& Dennis Sweitzer
"Who wrote this!?" Challenges of replacing Black Box software with more dynamic supportable processes utilizing Open-Source Languages
QT-180 David Franklin The Mean, But Not as You Know It!
QT-183 David Franklin PROC TABULATE and the Percentage
QT-186 Derek Morgan Where's Waldo? An SDTM Crawler
QT-189 Louise Hadden Management of Metadata and Documentation When Your Data Base Structure is Fluid: What to Do if Your Data Dictionary has Varying Number of Variables

Real World Evidence and Big Data

Paper No. Author(s) Paper Title
RW-063 Jozef Aerts Generating FDA-ready Submission Datasets directly from EHRs
RW-126 Ginger Barlow A Programmer's Experience Researching Real-World Evidence (RWE) COVID-19 Data
RW-136 Lekitha Jk IOT or IOMT?
RW-160 Neha Srivastava
& Lavanya Peddibhotla
& Sivasankar Konda
Patient Registries - New Gold Standard for Real World Data
RW-198 Troy Hughes Better to Be Mocked Than Half-Cocked: Data Mocking Methods to Support Functional and Performance Testing of SAS Software
RW-202 Jayanth Iyengar NHANES Dietary Supplement component: a parallel programming project
RW-208 Cara Lacson
& Carol Matthews
Life After Drug Approval... What Programmers Need to Know About REMS
RW-209 Irene Cosmatos
& Michael Bulgrien
Standardizing Laboratory Data From Diverse RWD to Enable Meaningful Assessments of Drug Safety and Effectiveness

Statistics and Analytics

Paper No. Author(s) Paper Title
SA-032 Deanna Schreiber-Gregory Back to Basics: Running an Analysis from Data to Refinement in SAS
SA-043 Ilya Krivelevich
& Simon Lin
Visualization of Sparse PK Concentration Sampling Data, Step by Step (Improvement by Improvement)
SA-047 Girish Kankipati
& Jai Deep Mittapalli
Build a model: Introducing a methodology to develop multiple regression models using R in oncology trial analyses
SA-062 Janaki Devi Manthena
& Varsha Korrapati
& Chiyu Zhang
SAS Proc Mixed: A Statistical Programmer's Best Friend in QoL Analyses
SA-138 Abhyuday Chanda Willing to Play with William's Design: An Exemplification of Randomization Schedule Generation using SAS
SA-141 Vidya Srinivas
& Ankita Sharma
BAYESIAN Approach on COVID-19 test results
SA-145 Chaithanya Velupam
& Kaushik Sundaram
Preserving the Privacy of Participants' Data by Data Anonymization
SA-153 Ritu Karwal Statistical Considerations for Decentralised Trials
SA-154 Sakshi Kaushik
& Sakthivel Sivam
Leveraging Generalized Additive Models in Assessing the Disease Progression through Digital Wearable Devices on Patients with Parkinson's Disease with Dementia
SA-155 Snehal Sanas
& Pradeep Umapathi
Evaluating anthropometric growth endpoints with Z-Scores and Percentiles
SA-211 Lois Wright
& Allison Sealy
& Bahar Biller
& Shawn Tedman
& Pritesh Desai
How simulation will impact the future of Healthcare & Life Sciences.

Strategic Implementation

Paper No. Author(s) Paper Title
SI-019 Stephen Sloan
& Lindsey Puryear
Advanced Project Management beyond Microsoft Project, Using PROC CPM, PROC GANTT, and Advanced Graphics
SI-028 Vincent Amoruccio
& Min Lee
& Daniel Woodie
A TransCelerate Initiative - how can you modernize your statistical environment?
SI-046 Amy Gillespie
& Susan Kramlik
& Suhas Sanjee
PROC FUTURE PROOF v1.1-- Linked Data
SI-068 Weiwei Guo
& Chintan Pandya
& Christine Teng
Lesson learned from a successful FDA Oncology Drug Advisory Committee (ODAC) meeting
SI-074 Aiming Yang
& Yalin Zhu
& Yilong Zhang
A Strategy to Develop Specification for R Functions in Regulated Clinical Trial Environments
SI-076 Susan Kramlik
& Eunice Ndungu
& Hemu Shere
The Search for a Statistical Computing Platform
SI-083 Sarad Nepal
& Uday Preetham Palukuru
& Peikun Wu
& Madhusudhan Ginnaram
& Ruchitbhai Patel
& Abhilash Chimbirithy
& Changhong Shi
& Yilong Zhang
Agile Project Management in Analysis and Reporting of Late Stage Clinical Trial
SI-084 Madhusudhan Ginnaram
& Amin Shirazi
& Simiao Ye
& Yalin Zhu
& Yilong Zhang
A Process to Validate Internal Developed R Package under Regulatory Environment
SI-108 Benjamin Straub Bookdown and Blogdown: Using R packages to document and communicate new processes to Clinical Programming
SI-166 Aman Bahl
& Steve Benjamin
& Hrideep Antony
Delight the Customer using agile transformation in Clinical Research
SI-173 Alyssa Wittle The Tortoise and the Hare - Lessons Learned in Developing ADaM Training
SI-196 Troy Hughes Chasing Master Data Interoperability: Facilitating Master Data Management (MDM) Through CSV Control Tables that Contain Data Rules that Support SAS and Python Data-Driven Software Design
SI-203 Phil Bowsher
& Sean Lopp
R Validation: Approaches and Considerations
SI-213 Stijn Rogiers
& Jean Marc Ferran
Manage TFLs Development in LSAF 5.3 using SAS and R code

Submission Standards

Paper No. Author(s) Paper Title
SS-016 Xiang Wang
& Daniel Huang
Design ADaM Specification Template that Simplifies ADaM Programming and Creation of Define XML in CDISC Era
SS-055 Mi Young Kwon
& Rohit Kamath
BIMO SAS Macros and Programming Tools
SS-058 Simiao Ye
& Yilong Zhang
Shiny App for Generating Data Dependency Flowchart in ADRG
SS-072 Siru Tang
& Yanhong Li
Challenges and Solutions for an ISM Development
SS-073 Yuan Yuan Dong
& Wang Zhang
& Lili Ling
New Requirements of Clinical Trial Data Submission for China Filing: An Implementation
SS-110 Shefalica Chand Assembling Reviewer-friendly eSubmission Packages
SS-137 Ranvir Singh Effective Approach for ADaM Submission to FDA and PMDA
SS-171 Pranav Soanker
& Santosh Shivakavi
A Lead Programmer's Guide to a Successful Submission


Paper No. Author(s) Paper Title
EP-004 Hengwei Liu Migration of SAS Data from Unix server to Linux server
EP-006 Pareshkumar Paghdar
& Jayesh Patel
Handling Patient/Safety Narrative Header by SAS Programming
EP-013 Corey Evans
& Ting Su
& Qingshi Zhou
Reduce Review Time: Using VB to Catch Spelling Errors & Blank Pages in TLFs
EP-029 Hengwei Liu A Macro for Point Estimate and Confidence Interval of the Survival Time Percentiles
EP-039 Rui Huang
& Toshio Kimura
& Jennifer McGinniss
Data Visualization Using GANNO, GMAP and GREMOVE to Map Data onto the Human Body
EP-057 YuTing Tian
& Todd Case
Generating .xpt files with SAS, R and Python
EP-070 Lyma Faroz First Time Creating a Submission Package? Don't Worry, We Got You Covered!
EP-077 Alan Meier Introducing and Increasing Compliance of Good Programming Habits
EP-085 Huei-Ling Chen
& Heng Zhou
& Madhusudhan Ginnaram
& Yilong Zhang
Making Customized RTF Output with R Package r2rtf
EP-087 Raj Sharma Make Your Life Easy! How SpotFire and SAS together can help with better Data Review and Data Quality
EP-105 Kalyani Telu Restriction of Macro Variable Length - Dynamic Approach to Overcome
EP-122 Syam Prasad Chandrala
& Chaitanya Chowdagam
Interactive Clinical Dashboards using R Studio
EP-123 Sidian Liu
& Toshio Kimura
Intelligent Axis Scaling in SAS plots: %IAS
EP-124 Dennis Sweitzer No Black Boxes: An Utterly Transparent Batch Randomization System in R
EP-125 Yaling Teng Intermediate Dataset for Oncology Efficacy Endpoint ADaM Data - Here Come Some Details
EP-143 Steffen Müller Customizing define.xml files
EP-146 Giri Prasad
& Hareeshkumar Gurrala
e-Submissions key difference of FDA and PMDA
EP-157 Aditya Gadiko
& Christopher Hurley
& Anish Iyer
Fast, Efficient, and Automated Real-time Analytics for Clinical Trials
EP-158 Ravichandra Hugar
& Sandeep Lakkol
NONMEM: An Intricate Dataset Programming defined in a simpler way !!
EP-167 Frank Menius
& Monali Khanna
Forward Planning: How to get the most of your eCOA data
EP-169 Savithri Jajam
& Lavanya Peddibhotla
Split Character Variables into Meaningful Text
EP-178 Eli Miller Containers you can Count On: A Framework for Qualifying Community Container Images
EP-182 David Franklin Mashing Two Datasets Together
EP-190 Louise Hadden Looking for the Missing(ness) Piece
EP-191 Wenjun He Only Get What You Need - To Simplify Analysis Data Validation Report from PROC COMPARE Output


Advanced Programming

AP-008 : Don't send the Macros, send the Catalog! The SAS Macro Catalog: a perfect way to send your macros to the client without sharing the code.
Steve Black, Precision for Medicine

If you've ever gotten squeamish about sending a macro or a lot of macros to a client, SAS has created a way to ease the tension. A SAS macro catalog allows you to send your macros with the option of hiding the code or not. Saved as a single file, the catalog is easy to transfer and apply to any study. A simple call activates all the stored macros, and they can then be used just as they are set up in your system. In this paper I will show how to create a SAS macro catalog, add macros to the catalog, and manage the catalog once it has been set up. I'll then demonstrate how to create a macro that adds a large number of already-written macros to the catalog. With this newfound skill you can create and send over one file and rest easy knowing that your super fancy macro logic is usable, safe and secure.
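As a flavor of what the paper covers, a minimal stored-compiled macro sketch might look like the following (the directory path, libref, and macro name are made up for illustration, not taken from the paper):

```sas
/* Session 1 (your site): compile the macro into a stored-compiled
   catalog; SECURE hides the source from users of the catalog. */
libname mymacs 'C:\macrolib';
options mstored sasmstore=mymacs;

%macro hello(name) / store secure des='Greets the caller';
    %put NOTE: Hello, &name..;
%mend hello;

/* Session 2 (the client): point SASMSTORE= at the transferred
   catalog file and the stored macros are callable immediately. */
libname mymacs 'C:\macrolib';
options mstored sasmstore=mymacs;
%hello(PharmaSUG);
```

The single file to ship is the sasmacr catalog written to the SASMSTORE library.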

AP-018 : A Quick Look at Fuzzy Matching Programming Techniques Using SAS Software
Stephen Sloan, Accenture
Kirk Paul Lafler, sasNerd

Data comes in all forms, shapes, sizes and complexities. Stored in files and data sets, SAS® users across industries know all too well that data can be, and often is, problematic and plagued with a variety of issues. Two data files can be joined without a problem when they have identifiers with unique values. However, many files do not have unique identifiers, or "keys", and need to be joined by character values, like names or E-mail addresses. These identifiers might be spelled differently, or use different abbreviation or capitalization protocols. This paper illustrates data sets containing a sampling of data issues; popular data cleaning and user-defined validation techniques; data transformation techniques; traditional merge and join techniques; an introduction to SAS character-handling functions for phonetic matching, including SOUNDEX, SPEDIS, COMPLEV, and COMPGED; and an assortment of SAS programming techniques to resolve key identifier issues and to successfully merge, join and match less than perfect, or "messy", data. Although the programming techniques are illustrated using SAS code, many, if not most, of the techniques can be applied to any software platform that supports character handling.
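The four character-handling functions named in the abstract can be tried out in a few lines; the data set and name pairs below are invented for illustration:

```sas
/* Compare two name columns with the phonetic and edit-distance
   functions mentioned above; lower distances = closer matches. */
data matches;
    length name1 name2 $20;
    input name1 $ name2 $;
    same_sound = (soundex(name1) = soundex(name2)); /* phonetic match   */
    spedis_d   = spedis(name1, name2);              /* spelling distance */
    complev_d  = complev(name1, name2);             /* Levenshtein       */
    compged_d  = compged(name1, name2);             /* generalized edit  */
    datalines;
Smith Smyth
Johnson Jonson
;
run;

proc print data=matches; run;
```

In practice each distance is compared against a threshold tuned to the data at hand.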

AP-020 : Twenty Ways to Run Your SAS Program Faster and Use Less Space
Stephen Sloan, Accenture

When we run SAS® programs that use large amounts of data or have complicated algorithms, we often are frustrated by the amount of time it takes for the programs to run and by the large amount of space required for the program to run to completion. Even experienced SAS programmers sometimes run into this situation, perhaps through the need to produce results quickly, through a change in the data source, through inheriting someone else's programs, or for some other reason. This paper outlines twenty techniques that can reduce the time and space required for a program without requiring an extended period of time for the modifications. The twenty techniques are a mixture of space-saving and time-saving techniques, and many are a combination of the two approaches. They do not require advanced knowledge of SAS, only a reasonable familiarity with Base SAS® and a willingness to delve into the details of the programs. By applying some or all of these techniques, people can gain significant reductions in the space used by their programs and the time it takes them to run. The two concerns are often linked, as programs that require large amounts of space often require more paging to use the available space, and that increases the run time for these programs.
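As one example of the kind of space- and time-saving technique such a list typically includes (not necessarily one of the paper's own twenty; the libref and variable names are hypothetical), subsetting rows and columns while the data are read avoids building a large intermediate data set:

```sas
/* Store large data sets compressed to save disk space. */
options compress=yes;

/* KEEP= and WHERE= data set options subset columns and rows as the
   data are read, cutting both I/O and intermediate storage. */
data subset;
    set big.claims(keep=patid cost svcdate
                   where=(svcdate >= '01JAN2020'd));
run;
```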

AP-030 : "Bored"-Room Buster Bingo - Create Bingo Cards Using SAS ODS Graphics
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.

Let’s admit it! We have all been on a conference call that just … well, to be honest, it was just bad. Your misery could be caused by any number of reasons – or multiple reasons! The audio quality was bad, the conversation got sidetracked and the meeting lost its intended focus, there was too much background noise, someone hadn’t muted their laptop and was breathing heavily – the list goes on ad nauseam. Regardless of why the conference call is less than satisfactory, you want it to end, but professional etiquette demands that you remain on the call. We have the answer – SAS®-generated Conference Call Bingo! Not only is Conference Call Bingo entertaining, but it also keeps you focused on the conversation and enables you to obtain the pertinent information the conference call may offer. This paper and presentation introduce a method of using SAS to create custom Conference Call Bingo cards, moving through brainstorming and collecting entries for Bingo cards, random selection of items, and the production of bingo cards using SAS reporting techniques and the Graph Template Language (GTL). (You are on your own for the chips and additional entries based on your own painful experiences!) The information presented is appropriate for all levels of SAS programming and all industries.

AP-034 : From %Let to %Local; Methods, Use, and Scope of Macro Variables in SAS Programming
Jayanth Iyengar, Data Systems Consultants LLC

Macro variables are one of the powerful capabilities of the SAS system, and utilizing them makes your SAS code more dynamic. There are multiple ways to define and reference macro variables in your SAS code, from %Let and Call Symput to Proc Sql Into:. There are also several kinds of macro variables, distinguished by scope among other attributes, and not every SAS programmer is knowledgeable about these nuances. In this paper, I explore the methods for defining and using macro variables, discuss the nuances of macro variable scope, and survey the kinds of macro variables, from user-defined to automatic.
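A compact sketch of the creation methods and scoping behavior the abstract mentions (all names here are illustrative):

```sas
%let study = ABC-123;                      /* %LET in open code: global */

data _null_;
    call symputx('nsubj', 42);             /* CALL SYMPUTX in a DATA step */
run;

proc sql noprint;
    select count(*) into :nobs trimmed     /* PROC SQL INTO:              */
    from sashelp.class;
quit;

%macro demo;
    %local study;                          /* %LOCAL masks the global     */
    %let study = XYZ-999;
    %put Inside macro: &study;             /* XYZ-999                     */
%mend demo;

%demo;
%put Outside macro: &study;                /* still ABC-123: the global
                                              was never touched           */
```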

AP-041 : Laboratory Results in Relation to Concomitant Medications (CMs)
Scott Horton, United BioSource

In addition to viewing laboratory results relative to which study treatment a patient was receiving, our clinical colleagues are often interested in certain concomitant medications that might affect these results as well. This review can be done by comparing the dates in a concomitant medication listing with those in a laboratory results listing; by its nature, this type of review is very tedious. An approach to easily combine laboratory results with the concomitant medications a patient was on at the same time is discussed. The complexity of combining these two types of data is simplified by the use of a recursive macro that processes the concomitant medication data "one bite at a time" to create a new concomitant medications dataset containing records of all concomitant medications of interest at any given time point for each patient. This resulting concomitant medications dataset allows the two types of data to be easily combined for clinical review in listings or tables.

AP-042 : Survival Statistics with PROC LIFETEST and PROC PHREG: Pitfall-Avoiding Survival Lessons for Programmers
Yun (Julie) Zhuo, PRA Health Sciences

Survival statistics play a critical role in the analysis of efficacy in clinical trials. In SAS, the LIFETEST procedure compares the survivor function between study arms, and the PHREG procedure estimates the effect of study treatments on hazard rates. This paper shares the lessons we have learned from programming survival analyses with SAS for multiple sponsor clients. Topics range from P-values and the handling of warning messages to some life-saving SAS options. The goal is to point programmers to potential pitfalls and show them ways around them. Geared towards programmers, this paper does not intend to explain any statistical underpinnings of the analysis.
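A bare-bones version of the two procedures, assuming the usual ADaM time-to-event names (AVAL, CNSR, TRTP); the options shown are common choices, not necessarily the paper's recommendations:

```sas
/* Kaplan-Meier estimates and log-rank comparison across arms. */
proc lifetest data=adtte plots=survival(atrisk);
    time aval * cnsr(1);          /* CNSR=1 flags censored records */
    strata trtp;
run;

/* Cox proportional hazards model for the treatment effect. */
proc phreg data=adtte;
    class trtp / ref=first;
    model aval * cnsr(1) = trtp / risklimits ties=exact;
run;
```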

AP-048 : Using Stata for handling CDISC compliant datasets and outputs
Tong Zhao, LLX Solutions, LLC
Jin Shi, LLX Solutions, LLC

In cases where SAS software is unavailable, or not financially viable for many researchers, Stata may be a good substitute for generating CDISC compliant data and results for a company's use. In this paper, we explain how the CDISC standards work, how Stata can simplify many of the routine tasks encountered in handling CDISC datasets and TFLs, and the great efficiencies that can result from using Stata.

AP-053 : A SAS Macro to Generate Slim Version of Define for RTOR
Jeff Xia, Merck

The Real-Time Oncology Review (RTOR) pilot program at FDA creates a channel for sponsors to give agency reviewers earlier access to study data, supporting a more efficient review of oncology submissions that could bring safe and effective treatments to patients as early as possible. One of the required components of RTOR is the set of ADaM datasets corresponding to the key efficacy and safety tables/figures for the pivotal study, which might be a subset of the study's complete ADaM datasets (i.e., excluding datasets and variables not involved in the key efficacy and safety analyses). A corresponding slim version of the define documents (define.xml and define.pdf) is also required. This paper presents a SAS macro that generates this slim version of define.

AP-066 : Dynamic macro using the convergence dataset to generate the Odds Ratio (OR), Risk Ratio (RR) and Risk Difference (RD) per SOC/PT using Genmod procedure
Jagadish Katam, Princeps

This paper concerns displays that present the OR, RR and RD per SOC and PT across treatments using the GENMOD procedure. AE displays contain many SOCs and PTs, and when a particular SOC or PT has missing or too few subjects in a treatment group, PROC GENMOD throws an error and does not generate the OR, RR or RD results. In these cases, the macro I have created dynamically handles the situation to produce the results. The macro is based on the convergence criteria: it first runs PROC GENMOD to check convergence for each SOC/PT for the OR, RR and RD values and stores the convergence status in a separate dataset. It then uses this convergence dataset to dynamically execute PROC GENMOD only where the model converges, thereby avoiding such errors. This paper discusses the sequence of steps the macro takes.
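The core idea, capturing PROC GENMOD's convergence status in a data set before deciding whether to report a model, can be sketched roughly as follows (data set and variable names are hypothetical; this is not the author's macro):

```sas
/* Route GENMOD's convergence status table to a data set. */
ods output ConvergenceStatus=convstat;
proc genmod data=ae_counts descending;
    class trtgrp;
    model event = trtgrp / dist=binomial link=log;  /* log link: RR */
run;

/* Status = 0 means the model converged; anything else should
   suppress reporting for this SOC/PT instead of erroring out. */
data _null_;
    set convstat;
    if status ne 0 then
        put 'WARN' 'ING: model did not converge - skip this SOC/PT';
run;
```

A driver macro can loop this check over every SOC/PT before attempting the reporting run.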

AP-067 : Imputation for Missing Dosing Time in NONMEM PopPK Datasets
Shuqi Zhao, Merck & Co., Inc.

For late-stage clinical trials with an oral daily dosing regimen, complete dosing history data is generally not available. Per study design, patients have PK samples drawn at clinic sites, but between scheduled visits patients are expected to take daily doses at home. Both date and time are collected for PK samples, while the accurate time at which most doses were taken is not available. When building NONMEM-ready PopPK datasets, however, both date and time are required on dosing records in order to calculate actual relative time from first dose and actual relative time from last dose. Due to incomplete dosing history data, time imputation comes into play. Imputation methods can differ depending on study design and data collection design, but the general rule is that the relative time of dose to PK sample does not change. This paper provides a step-by-step programming guide on how to impute time for dosing records when actual time information is only available for the dose prior to the PK sample. Topics such as how to unfold dosing records in the EX domain into individual daily records, and commonly seen data issues in relevant datasets, are also discussed.

AP-071 : An efficient method to review annotation and create bookmark for case report form (acrf.pdf) using SAS program, FDF files, excel and Javascript in Acrobat
Chen-Jung Wu, Firma Clinical Research

A common method to annotate a case report form (acrf.pdf) is to use the Acrobat comment tool to draw a box and add text; however, this method is time-consuming, and if the case report form changes due to a protocol amendment, another round of manual annotation may be needed. In addition, the regular SDTM process involves several people preparing the package, including the case report form, so there are always inconsistencies in wording, font size, and font style. Converting the annotations to Excel helps fix typos and update wording quickly, and allows the Excel file to be compared against the metadata in the xpt files and define.xml. Moreover, using Excel to remap annotations to the pages of a new version of the case report form is another efficient method when the form changes due to a protocol amendment.

AP-088 : Using SAS for Forest Plots in AMNOG Meta-Analysis Reporting
Jian-An Lu, Independent SAS Consultant
Jinwei Yuan, Relypsa Inc, A Vifor Pharma Group Company

Making forest plots is an indispensable part of presenting study results in a Benefit Dossier for the AMNOG assessment. It allows one to intuitively and quickly display overall efficacy and safety outcomes in a comprehensive manner. This is particularly important given the scope of the assessment, involving a huge volume of statistics for multiple endpoints, multiple studies and multiple subgroups at both the individual and meta-analysis levels. This paper demonstrates, step by step, how to utilize PROC SGPLOT combined with SAS Annotate Facility from Graph Template Language (GTL) in SAS® 9.4 to create table-like graphic displays that present all required statistical information in one place as per the IQWiG guidelines. We hope this approach can serve as a primer, helpful for the sophisticated forest plots required in the AMNOG submission.
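A full table-like AMNOG display is beyond a short example, but the core forest-plot layering in PROC SGPLOT can be sketched as follows, assuming a hypothetical input with one row per subgroup holding a hazard ratio and its 95% CI:

```sas
proc sgplot data=forest noautolegend;
   highlow y=subgroup low=lcl high=ucl;            /* CI whiskers    */
   scatter y=subgroup x=hr / markerattrs=(symbol=squarefilled);
   refline 1 / axis=x;                             /* no-effect line */
   xaxis type=log label="Hazard Ratio (95% CI)";
   yaxis discreteorder=data reverse display=(nolabel);
run;
```

The table-like statistics columns the paper describes would then be added with annotation or GTL.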

AP-092 : PROC IMPORT and more. Or: when PROC IMPORT just doesn't do the job.
David Horvath, PhilaSUG

PROC IMPORT comes in handy when quickly trying to load a CSV or similar file. But it does have limitations. Unfortunately, I've run into those limitations and had to work around them. This session will discuss the original CSV specification (early 1980's), how Microsoft Excel violates that specification, how SAS PROC IMPORT does not follow that specification, and the issues that can result. Simple UNIX tools will be described that can be used to ensure that data hilarities do not occur due to CSV issues. Recommendations will be made to get around some of PROC IMPORT's limitations (such as field naming, data type determination, the limit on the number of fields, and separators appearing in the data). CSV, TAB, and DLM types will be discussed.
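Two of the usual workarounds can be sketched as follows (file and variable names are hypothetical): raising GUESSINGROWS so PROC IMPORT scans the whole file before deciding column types, or bypassing PROC IMPORT entirely with an explicit DATA step:

```sas
proc import datafile="raw.csv" out=work.raw dbms=csv replace;
   guessingrows=max;      /* scan every row before typing columns */
run;

/* Full control over names, types, and lengths */
data work.raw2;
   infile "raw.csv" dsd dlm=',' firstobs=2 truncover;
   length subjid $10 value 8 unit $8;
   input subjid $ value unit $;
run;
```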

AP-095 : Lazy Programmers write Self-Modifying code OR Dealing with XML file Ordinals
David Horvath, PhilaSUG

The XML engine within SAS is very powerful but it does convert every object into a SAS dataset with generated keys to implement the parent/child relationships between these objects. Those keys (Ordinals in SAS-speak) are guaranteed to be unique within a specific XML file. However, they restart at 1 with each file. When concatenating the individual tables together, those keys are no longer unique. We received an XML file with over 110 objects, resulting in over 110 SAS datasets our internal customer wanted concatenated across multiple days. Rather than copying and pasting the code 110+ times, knowing that I would make mistakes along the way and that the objects would also change over time, I created SAS code to generate the SAS code that handles the XML. I consider myself a Lazy Programmer. As the classic "Real Programmers..." sheet tells us, Real Programmers are Lazy. This session reviews XML (briefly), SAS XML Mapper, the SAS XML Engine, techniques for handling the Ordinals over multiple days, and finally discusses a technique for using SAS code to generate SAS code.
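The code-generating-code idea can be sketched with CALL EXECUTE; a minimal example assuming the XML engine's datasets live in a hypothetical XMLIN library and are accumulated into an ALL library:

```sas
/* Generate one PROC APPEND per XML-derived dataset instead of
   copying and pasting 110+ blocks by hand. */
proc sql noprint;
   create table xmltabs as
   select memname from dictionary.tables
   where libname = 'XMLIN';
quit;

data _null_;
   set xmltabs;
   call execute(cats('proc append base=all.', memname,
                     ' data=xmlin.', memname, ' force; run;'));
run;
```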

AP-099 : Unix: A Best Friend of SAS
Rakesh Kanojia, Mumbai University

SAS® is a wonderful programming tool, but its power grows when it is combined with Unix commands. Simple Unix commands make it easy to automate SAS programs. Imagine we have to run a model every month and the model's output has the same name as the previous month's: the output would be overwritten, which we don't want. In such a scenario, a Unix command comes in very handy to create a new folder and rename the output. Unix is also useful when we need to pick up the most recently created file. Many more things can be done, from taking backups to moving and renaming files. That is why we call Unix a best friend of SAS®.
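The folder-per-run and latest-file tricks can be sketched as follows (paths are hypothetical, and the XCMD system option must be enabled):

```sas
/* Create a date-stamped output folder from within SAS */
%let rundir = /project/out/%sysfunc(today(), yymmddn8.);
x "mkdir -p &rundir";

/* Grab the name of the most recently created input file */
filename newest pipe "ls -t /project/in/*.csv | head -1";
data _null_;
   infile newest;
   input;
   call symputx('latest', _infile_);
run;
```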

AP-101 : Re-pagination of ODS RTF Outputs to Automate Page Breaks and Minimize Splits Across Pages
Jeremy Gratt, Modular Informatics LLC
Aditya Tella, Seagen

One of the more tedious tasks of clinical study report table creation involves planning out page breaks for RTF outputs. To improve the review of table outputs, there are groups of rows we would like to keep together to avoid splitting across pages. For presentation of individual parameters our goal is to group statistics or categories within a parameter on the same page; for by-visit displays we group rows by visit or by visit/parameter; and for multi-level categorical displays such as AEs and conmeds we try to minimize the splits within hierarchical tiers. Avoiding these inconvenient splits requires manual work on the part of programmers to generate organized outputs that aid communication. An automated approach to page breaks would relieve programmers of this manual task and improve the quality of outputs. We present background, implementation experience, and programming details for a SAS pagination macro that employs an algorithm to automate page breaks while minimizing in-group splits across pages. The macro reads a standard ODS RTF file as input and generates a re-paginated RTF along with a QC data set. The SAS macro components include an RTF parser to read the input RTF file, an algorithm to calculate expected word-wrapping, a page break algorithm to output table rows, and a verification step to compare and confirm the output RTF content matches the QC data set.

AP-103 : Developing SAS Macro to Find Subject's Latest Available Date from Entire Database
Raghava Pamulapati, Merck
Sandeep Meesala, Merck

In data analysis and reporting, programmers often need to find a subject's latest available date. To do so, every possible date variable in the entire study database must be checked. In general, to find the latest date among multiple date variables, a programmer either sorts the data by the date variable and takes the latest date with FIRST.VARIABLE or LAST.VARIABLE processing, or uses a GROUP BY clause with the MAX function in PROC SQL. The challenge is that these steps must be written and executed for each date variable in every dataset in the study database. This is time-consuming, execution may be slow depending on the SAS procedures or DATA steps used, and the code is hard to debug. This paper focuses on a step-by-step process to develop a macro that finds a subject's latest available date using SAS® variable-level metadata, with efficient techniques such as PROC SQL and CALL EXECUTE statements to reduce both the length of the code and SAS® execution time. The macro outputs a dataset with the subject ID, the latest date, and the dataset and variable names containing that date. It also gives users the flexibility to handle scenarios such as datasets with multiple date variables, selecting only the required datasets and/or date variables, and selecting the date from a subset of records.
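The metadata-driven core of such a macro can be sketched as follows; this is a simplified illustration with hypothetical library and variable names, not the authors' full macro:

```sas
/* Find candidate date variables from variable-level metadata */
proc sql noprint;
   create table datevars as
   select memname, name
   from dictionary.columns
   where libname = 'SDTM' and type = 'num'
     and upcase(format) like '%DATE%';   /* crude date-variable test */
quit;

/* Generate one max-date query per variable with CALL EXECUTE */
data _null_;
   set datevars;
   call execute(cats(
      'proc sql; create table d', _n_, ' as ',
      'select usubjid, max(', name, ') as adt format=date9., "',
      memname, '" as dsn from sdtm.', memname,
      ' group by usubjid; quit;'));
run;
```

The per-dataset results would then be stacked and reduced to one maximum per subject.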

AP-106 : Bookdown and Blogdown: Deep dive of the R packages and components needed to create documentation and websites.
Benjamin Straub, GlaxoSmithKline

This is a companion paper to Bookdown and Blogdown: Using R packages to document and communicate new processes to Clinical Programming from the Strategic Implementation section. This paper will focus on the programming aspects of implementing and building documentation and websites from the R packages bookdown and blogdown. Readers will be given a deep-dive of the different pieces of code, processes and tools needed to create the respective outputs from the packages. It is hoped that readers have some knowledge of programming in RStudio as well as using version-control software such as GitHub, but efforts will be made to make the content accessible to all levels. The first half of the paper will have a thorough walk through of the code for simple projects that can create simple documentation or websites. The second half of the paper will see this discussion scaled up to more complicated outputs such as blog posts, embedded Shiny apps and videos. Please note that the projects discussed are for a Clinical Programming audience but could be repurposed for other situations. A public GitHub repository has been established for the simple and advanced projects that uses both packages with example code and data sets. A reader with access to GitHub and RStudio could publish documentation or a website by following the steps found in this paper.

AP-117 : Under the Hood: The Mechanics of Various SQL Query Optimization Techniques
Kirk Paul Lafler, sasNerd

The SAS® software and SQL procedure provide powerful features and options for users to gain a better understanding of what's taking place during query processing. This presentation explores the fully supported SAS® MSGLEVEL=I system option and PROC SQL _METHOD option to display valuable informational messages on the SAS® Log about the SQL optimizer's execution plan as it relates to processing SQL queries; along with an assortment of query optimization techniques.
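The two options named above can be demonstrated in a few lines; the join itself is a hypothetical example:

```sas
options msglevel=i;           /* extra informational notes on the log */

proc sql _method _tree;       /* print the optimizer's execution plan */
   select a.usubjid, b.aedecod
   from adsl as a, adae as b
   where a.usubjid = b.usubjid;   /* e.g. sqxjhsh = hash join */
quit;
```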

AP-120 : Programming Time Varying Concomitant Medication Covariates in NONMEM PopPK Dataset
Dan Xiao, Merck
Jing Su, Merck
Shuqi Zhao, Merck
Xiang Liu, Incyte Corporation

Concomitant medication (ConMed) covariates are very important time-dependent variables for population PK (PopPK) modeling. Since concomitant medication may change the absorption, distribution, metabolism, and excretion of the investigated medicine, it is necessary to construct the concomitant medication covariates and show ConMed usage before or on the dosing day. In this paper, we introduce three different strategies to program time-varying ConMed covariates into a NONMEM PopPK dataset. The SAS code runs under SAS 9.3 or 9.4. The paper is intended for an audience with some NONMEM PopPK knowledge.

AP-131 : An R Shiny Application: Validated Analysis Results Using R and SAS
Abu Zafar Mohammad Sami, mainanalytics GmbH

The production of a validated statistical manuscript in an accurate and timely way is a very challenging task when analyzing a clinical trial. Due to its sustainability and reliability, SAS has been the programming tool of choice for many years. However, R is gaining importance in this arena because of the flexible usage of packages. The scope of this application tool is to interactively produce validated industry-standard TLGs for various statistical methods such as regression and survival analysis, two-way tests, and Hy's law. The user can interactively choose parameters to produce the desired outputs simultaneously through R and SAS scripts and thus cross-check the results from the same input data. In this way, a platform-independent and accurate result is achievable in a quick manner. Moreover, the possibility of executing SAS code from the R platform enhances data traceability.

AP-132 : 100% Menu Driven SAS Windows Batch Job Runner and Log Analyzer
Sathish Saravanan, Quartesian Research Private Limited
Kameswari Pindiprolu, Quartesian Research Private Limited

SAS is widely used software in the clinical research industry for clinical data management, analysis, and reporting. Every clinical study requires numerous SAS programs to generate datasets (D), tables (T), listings (L), and graphs (G). Reruns are inevitable as programming specifications change for the DTLG of ongoing studies, and for each rerun it is mandatory to ensure the log of every submitted program is free of the SAS-defined potential messages. Various methods for bulk rerunning programs and analyzing logs have been discussed, but none of them is menu driven, and companies invest a great deal of money to develop and maintain such menu-driven tools. The purpose of this paper is to provide a Microsoft Excel VBA based tool that is 100% menu driven and performs bulk running of SAS programs along with log analysis, a process summary, and easy access to a particular log file among many. It also creates a SAS program file of all the selected programs for later use. The proposed approach needs only four inputs from the user: 1. the program location; 2. the location where log files are to be saved; 3. the location where LST output files are to be saved; and 4. the selection of programs to run.

AP-151 : Using PROC FCMP to Create Custom Functions in SAS
Keith Shusterman, Reata
Mario Widel, Reata Pharmaceuticals Inc.

PROC FCMP is a powerful but somewhat underused tool that can be leveraged for creating custom functions. While macros are of vital importance to SAS programming, there are many situations where creating a custom function can be more desirable than using a macro. In particular, custom functions created through PROC FCMP can be used in other SAS procedures such as PROC SQL, PROC PRINT, or PROC FREQ. As an alternative approach to SAS macros, PROC FCMP can be used to create custom functions that process character and numeric date and datetime values. In this presentation, we will share two custom functions created using PROC FCMP. The first creates ISO 8601 character values from raw date inputs, and the second creates numeric SAS dates and datetimes from ISO 8601 inputs. We will also show how these newly created custom functions can be called in other SAS procedures.
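The ISO 8601 direction can be sketched with PROC FCMP; the function name and the dataset it is applied to are illustrative, not the authors' exact code:

```sas
proc fcmp outlib=work.funcs.dates;
   function iso2date(iso $);              /* ISO string -> SAS date */
      return (input(iso, e8601da10.));
   endsub;
run;

options cmplib=work.funcs;                /* make the function visible */

proc print data=adsl;                     /* callable inside procedures */
   where iso2date(rfstdtc) > '01JAN2020'd;
run;
```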

AP-156 : "Packages" in R for Clinical Data Analysis, Let's Demystify
Nagalakshmi Kudipudi, Quartesian Research Private Limited

R is a high-level, interpreted, open-source programming language known for statistical analysis and graphical reporting. R offers a plethora of 'packages' for data import, wrangling, visualisation, and analysis. Packages are extensions to the R language and the fundamental units of reproducible R code. They include reusable R functions, documentation that describes how to use them, and sample data, in a standardised format that users can install, typically from a centralised software repository such as CRAN (the Comprehensive R Archive Network) or GitHub. GitHub, Inc. provides access control and collaboration features such as bug tracking, feature requests, task management, continuous integration, and wikis for every project; its more advanced professional and enterprise services are commercial, while free GitHub accounts are commonly used to host open-source projects. A group of packages called the Tidyverse, which can be considered a "dialect of the R language", is increasingly popular in the R ecosystem. Packages such as data.table, dplyr, tidyr, and knitr contain functions that can efficiently manipulate complex data sets and create TLFs. The scope of this paper is to present RStudio programs that demonstrate the relevant 'packages' in R for generating summary tables of clinical data. Healthcare is seeing exponential growth in the volume of data collected in ever more elaborate clinical trials, and to meet these demands, clinical data scientists are increasingly choosing open-source solutions to leverage the active open-source communities of experienced developers and statisticians for clinical data analysis.

AP-174 : Overlay Graphs: A Powerful Tool to Visualize and Simplify Complex Data
Manasa Gangula, Cytel Inc.

SAS® provides an extensive set of graphs for different needs, and overlaying these graphs can be powerful in visualizing complex data. By overlaying graphs, we can display the relationship among multiple parameters in a single snapshot so the reviewer will get an explicit idea about the data without any further processing. In this paper we will look at different ways to create an overlay of a vertical bar chart with a scatter plot and a bubble plot using sample data. We will be using Graph Template Language (GTL) and SAS/GRAPH procedures to achieve the result. We will also briefly discuss the limitations of the SGPLOT procedure in this scenario.
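One such SGPLOT limitation is that VBAR cannot be overlaid with basic plots; VBARBASIC (available from SAS 9.4M3) can, as in this sketch with hypothetical variables:

```sas
proc sgplot data=summary;
   vbarbasic visit / response=meanval barwidth=0.5;  /* bars: means    */
   scatter x=visit y=subjval /                       /* dots: subjects */
      jitter markerattrs=(symbol=circlefilled color=gray);
   yaxis label="Value";
run;
```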

AP-184 : Thwarting Nasties
David Franklin

When starting any program, some form of planning is involved, then the actual writing of the code, and finally the checks to see that it is producing what was asked for. This paper takes a brief and lighthearted look at each of these stages, providing a few tips for avoiding some of the many pitfalls, and gives a few pieces of SAS code that are useful in the development of your program, hopefully avoiding that dreaded phrase, "I have to write it from scratch!".

AP-195 : How to ingratiate yourself with the "old-timers"
Kirsty Lauderdale, Clinical Solutions Group
Kjersten Offenbecker, GSK

Ever doubted that your program code would stand up to review by some of the great* programmers? Ever wondered whether your logic makes sense and your programming style would pass the "white-glove" test? This paper will help guide you through some of the steps that long-time programmers use to ensure their code is readable, transferable, and robust. Start your journey to becoming a well-known name in the industry by following some of these handy tips and tricks, and instantly elevate your programming code to previously unattained levels. * Definition of great may be self-proclaimed.

AP-199 : Should I Wear Pants in the Portuguese Expanse? Automating Business Rules and Decision Rules Through Reusable Decision Table Data Structures that Leverage SAS Arrays
Troy Hughes, Datmesis Analytics
Louise Hadden, Abt Associates Inc.

Decision tables operationalize one or more contingencies and the respective actions that should be taken when contingencies are true. Decision tables capture conditional logic in dynamic control tables rather than hardcoded programs, facilitating maintenance and modification of the business rules and decision rules they contain, without the need to modify the underlying code that interprets and operationalizes the decision tables. This text introduces a flexible, data-driven SAS® macro that ingests decision tables, maintained as comma-separated values (CSV) files, into SAS to dynamically write conditional logic statements that can subsequently be applied to SAS data sets. This metaprogramming technique relies on SAS temporary arrays that can accommodate limitless contingency groups and contingencies of any content. To illustrate the extreme adaptability and reusability of the software solution, several decision tables are demonstrated, including those that separately answer the questions "Should I wear pants?" and "Where should I travel in the Portuguese expanse?" The DECISION_TABLE SAS macro is included and is adapted from the author's text: SAS® Data-Driven Development: From Abstract Design to Dynamic Functionality (Hughes, 2019).
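A stripped-down version of the idea (not the DECISION_TABLE macro itself, which uses temporary arrays) can be sketched by generating IF/THEN logic from a CSV decision table; the file, column, and dataset names are hypothetical:

```sas
data rules;                            /* one row per contingency */
   infile "pants_rules.csv" dsd firstobs=2 truncover;
   length condition action $200;
   input condition $ action $;
run;

data _null_;                           /* write the DATA step for us */
   set rules end=last;
   if _n_ = 1 then call execute('data decided; set weather;');
   call execute(catx(' ', 'if', condition, 'then', strip(action)) || ';');
   if last then call execute('run;');
run;
```

Editing the CSV changes the business rules without touching the generating code.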

AP-204 : R Reproducibility Best Practices for Pharma
Phil Bowsher, RStudio Inc.
Sean Lopp, RStudio

RStudio will be presenting an overview of best practices for R reproducibility for the R user community at PharmaSUG. This talk will share resources that will help guide R users and admins in reproducibility methods for R projects in clinical environments. This is a great opportunity to learn about approaches for best-supporting reproducibility for R in regulated environments. No prior knowledge of R/RStudio is needed. This short talk will provide an introduction to managing R and packages for reproducing the code environment in a clinical workspace. This talk will break down various strategies to improve the reproducibility of your data science projects.

AP-207 : GCIG Criteria, Programming Makes It Easy
Liming Xie, BeiGene
Wei Wei, BeiGene

Ovarian cancer is a heterogeneous and genomically unstable cancer, and as technologies advance, there is great potential to incorporate biomarkers into ovarian cancer diagnosis, prognosis, and therapy choices. Carcinoma antigen-125 (CA-125) is the most commonly used biomarker in ovarian cancer and has been examined in many pivotal clinical trials for drug approval in ovarian cancer. The Gynecological Cancer Intergroup (GCIG) proposes criteria for CA-125 response and progression and specifies the situations in which the criteria can be used. In addition to the Response Evaluation Criteria in Solid Tumors (RECIST), GCIG CA-125 response and progression criteria have become more and more popular in clinical trials of ovarian cancer and usually serve as one of the secondary endpoints. However, GCIG criteria are more complicated than RECIST criteria and not programming-friendly, especially when combined with RECIST Version 1.1. Thus, we would like to share our experience and the tools we developed for implementing the GCIG CA-125 response and progression criteria, and propose a reliable solution from a programming point of view.

Applications Development

AD-002 : Macro Development from Design to Implementation Applied to Occurrence Data Frequency Analysis
Chengxin Li, Regeneron Pharmaceuticals Inc.
Michael Pannucci, Regeneron Pharmaceuticals Inc.
Toshio Kimura, Regeneron Pharmaceuticals Inc.

This paper demonstrates macro development from design to implementation detailing the macro design concept including dataset requirements, parameterization, statistical analysis, reporting and utility functions. While this paper uses occurrence data analyses as the motivating example, the concepts are generally applicable to macro development for any statistical analysis. The paper contains two main sections: 1) General macro design concepts and 2) organization and flow of macro implementation modules with specific considerations for occurrence dataset structure (OCCDS) analysis components. The macro design concept includes key decisions regarding the scope of the macro. One design concept is to leverage the analysis-ready principle thereby requiring analysis-ready datasets such as ADaM OCCDS. As such, the macro will not include data building steps. Another macro design decision is parameterization. This macro design aligns input parameters with analysis results metadata (ARM); therefore, not only will the macro output ARM as a description of the analysis, it will use ARM as the specification and input that drives the analysis. The organization and flow of the macro implementation modules will then be discussed along with explanation of core methods utilized in the macro including initiation, checks, analysis, formatting, reporting and clean up. OCCDS frequency analysis specific considerations will be explained as it applies to the various design considerations and implementation modules. By following the depicted methods, similar applications for different statistical analyses can be developed.

AD-036 : Annotating CRFs More Efficiently
Noory Kim, SDC
Bhagyashree Shivakumar, SDC

Annotating an SDTM CRF can be tedious and time-consuming when done entirely by hand. How can we simplify this process and carry over changes from one draft of the CRF to the next? This paper discusses how to (1) generate annotation metadata and (2) transfer the annotations from Excel to PDF and back again, making updates along the way. This approach allows us to position CRF annotations by (1) using drag-and-drop movements within a PDF editor to place annotations rapidly where they are needed and/or (2) assigning precise numeric coordinates within Excel to place annotations precisely and line them up neatly. We can also update other attributes such as font size. This paper assumes the reader has access to (1) a PDF editor that can import/export XFDF files and (2) SAS XML Mapper; it does not assume knowledge of programming languages other than SAS.

AD-054 : How to Translate RTF Documents
Jundong Ma, Dizal Pharma
Zhiping Yan, Dizal Pharma

Sometimes we need to translate RTF documents from a source language to a target language. There are two options for this task: one is to translate all the datasets and programs and rerun the programs to regenerate the RTF documents; the other is to translate the RTF documents directly. The second way is more efficient and less error-prone, and sometimes it is the only way, such as when a vendor provides RTF documents with no datasets or programs available. This paper introduces a SAS® macro tool that uses a user-defined dictionary to replace all source-language text with target-language text. We have only used it to translate RTF documents from English to Chinese, but it can also be used for translation between other languages.

AD-060 : A Diagnostic Report for SAS Submission Programs
Yang Gao, Merck & Co., Inc.
Christine Teng, Merck & Co., Inc.

To submit a data package to a health authority for product approval, clinical programmers must ensure analysis programs meet regulatory compliance requirements and good business practices. Each company has its own standard operating procedure (SOP) and checking tools that assist in the development of regulatory-compliant SAS programs. The checking tools are used to ensure these SAS programs are ready for internal and external audits. This paper presents a SAS utility macro with multiple features that provides detailed information for each SAS program, and may supplement existing checking tools. The key features presented include: (1) checking correspondence between SAS programs and the associated log, lst, and submission-ready program files as well as analysis outputs; (2) listing each program's last modified date (system date), version date, and last revision date in the production area to identify inconsistent dates; and (3) listing the output counts (SAS log, lst, etc.) for each SAS program. The outputs from this utility macro are summarized in an Excel workbook using SAS ODS. This utility macro facilitates the process of identifying programming compliance issues and substantially reduces the resources and time needed to review potential issues in submission programs during the analysis and reporting lifecycle. The Excel diagnostic workbook presents the compliance status of the SAS programs in an organized format.

AD-064 : Harness analytical strengths: SAS, R, Python Web Services
Sandeep Juneja, SAS Institute Inc
Ben Bocchicchio, SAS Institute

The most common languages used for analytics are SAS, R, and Python. Each language has its strengths for carrying out a variety of analytical tasks, and each has its own syntax. Because of the different syntaxes, it is very difficult to find individuals with knowledge of all three languages. It would be very powerful if all three languages could be used from the programming language and platform of your choice, and web services can facilitate this capability. A variety of powerful tasks can be implemented as web services and called from various applications in the programming language of choice. This paper will introduce how REST web service endpoints can call any of the three languages (SAS / R / Python) for an analytical task. The REST web services are deployed on a Linux virtual machine using Python Flask and can be consumed from any platform using a programming language of choice.
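From the SAS side, consuming such an endpoint can be sketched with PROC HTTP and the JSON libname engine (the URL and payload are hypothetical):

```sas
filename resp temp;

proc http
   url="http://analytics-host:5000/score"   /* Flask endpoint */
   method="POST"
   in='{"model":"km","arm":"A"}'
   ct="application/json"
   out=resp;
run;

libname resp json fileref=resp;    /* read the JSON reply */
proc print data=resp.root; run;
```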

AD-079 : Analysis and Reporting in Regulated Clinical Trial Environment using R
Peikun Wu, Merck
Uday Preetham Palukuru, MERCK
Sarad Nepal, Merck
Yiwen Luo, Merck
Yilong Zhang, Merck & Co. Inc.

Analysis and reporting (A&R) of clinical data is an essential part of clinical trials. The use of R is popular at various stages of drug development. In particular, there is growing interest in analyzing clinical trial data using R and its associated software environment under a regulatory environment. To ensure the analysis and reporting are regulatory compliant, a formal A&R process using R needs to be defined within an organization. In this paper, we propose an A&R workflow for data analysis and reporting based on the R package folder structure. The proposed workflow covers the development and validation of Tables, Listings, and Figures (TLFs) using R within a regulatory environment, and is illustrated using a pilot study from the Clinical Data Interchange Standards Consortium (CDISC). We hope this proposed A&R workflow can help build industry guidance on utilizing R for clinical trial data analysis and reporting within an organization.

AD-100 : A metadata driven approach to mock TFL generation
Anne Petersen, Novo Nordisk A/S
Cristina Boschini, Novo Nordisk A/S
Tobias Kröpelin, Novo Nordisk A/S

At Novo Nordisk we developed a tool to create specifications for Tables, Figures and Listings (mock TFLs) utilising the same tools and metadata as when the actual TFLs are produced. The mock TFL generator consists of a repository of TFL templates in .txt or .png format (called unique shells) as well as output metadata, including instructions on content and layout for TFL programming. The mock TFLs are created by combining the unique shells with output metadata, such as titles, footnotes and programming instructions through a SAS script. The final mock TFL in .docx format is generated by the same tool used to generate the actual trial TFL. The main advantage of this tool is the single point of definition for the mock TFL and the actual trial TFL. This assures that the programmer only needs to maintain a single repository containing output and programming metadata. Additionally, the simple approach eases alignment across trials and constitutes a good starting point for the set-up of new trials. Finally, this ensures an easier review process, thereby increasing output quality and saving time during trial conduct.

AD-109 : Solution when Global Pandemic resurrects paper-based PROs creating collection headaches for Data Managers
Terek Peterson, YPrime
Hayley Jeffery, YPrime
Justin Jaeschke, YPrime

The unforeseen COVID-19 pandemic, country-wide lockdowns, and site closures impacted data collection processes, shifting eCOA data collection to paper assessments. This paper explores the challenge of processing an urgent backlog of paper assessments initially estimated at six months of work. Dynamic worksheets, integrated with each study's SQL database, allowed accurate and efficient data entry. eCOA systems employ parent/child relational database models, allowing dynamic question entry in logical order, and the solution mirrored this approach for paper so as to maintain data integrity and give Data Management a logical, simple tool. Drop-down selection boxes were dynamically updated based on previously selected choices, so the team had the correct options available at each step. Translation options were also implemented, allowing the Data Manager to select the applicable language for each assessment and reducing the risk of inaccurate data entry. Existing data correction processes were leveraged, including a verification step: because all data entered must be verified before being committed, the tool writes the data to a staging table that is then reviewed and validated by another Data Manager to ensure complete accuracy. An unexpected benefit was 24/7 data entry, and this, along with the translation capabilities and the simple process, ultimately led to processing large volumes of paper assessments in a fraction of the time, and cost, initially expected.

AD-112 : An Application of Flexible Operation and Position Page Number for RTF Outputs
WenHsuan Wu, Pharmaceutical Co., Ltd

In clinical trials, Rich Text Format (RTF) is a common format used to create outputs like tables, figures, and listings (TFLs). After preparing these outputs, combining multiple RTF outputs into one while maintaining the original content, titles, footnotes, and page numbers is an arduous challenge for programmers. There are various methods to combine RTF outputs, but there is no specific information or detail on the operation of the page number syntax. Hence, this paper summarizes the application of page numbering syntax in RTF documents, including strengths and weaknesses. To overcome the drawbacks of current page numbering methods, this paper presents a page number positioning (PNP) method that uses bookmarks to position the page number accurately by editing the RTF outputs, keeping the operation flexible across various SAS-generated RTF files. Breaking through the restrictions of other methods, PNP keeps the combined titles and footnotes intact when combining RTF outputs.

AD-133 : Archival Tool: Automation using PowerShell
Sumit Pratap Pradhan, Syneos Health

In clinical trials, patient data is organized into a standard format using SDTM. Statistical analysis is then performed using analysis datasets (ADaM) and reporting (tables, listings, and figures). Several deliveries (e.g. dry run, draft run, etc.) are performed for each clinical study. For each delivery, the Biostat team performs the following tasks at a minimum: 1. Datasets (SDTM, ADaM) and/or TFLs (tables, listings, and figures) are sent to the sponsor. 2. A backup is created to save files related to each delivery, for example specifications, programs, outputs (datasets and reports), log files, etc. 3. Documents are uploaded to the Trial Master File (TMF), which is needed for effective monitoring and supervision (audit). All of the above tasks can be performed manually, but that consumes a lot of unnecessary time and consistency can't be ensured; for example, programmers might choose different sets of files as backup based on their judgment. Automation is the best solution for this scenario. This paper explains the Archival Tool, which has been developed using PowerShell. PowerShell is a cross-platform task automation and configuration management framework, consisting of a command-line shell and scripting language; a GUI to take user input can be defined easily in PowerShell. The Archival Tool has the following features: • The tool has three modules: Delivery, Archive, and TMF • The Delivery module creates a package for each sponsor delivery • The Archive module creates a backup for each delivery • The TMF module creates a collection of zip files and a supporting table of contents (TOC) that can be uploaded directly to the TMF

AD-162 : Generating Synthetic CDISC Clinical Trial Data.
Jose Lacal, NIHPO, Inc.

We present a Python-based platform that generates Synthetic CDISC Clinical Trial Data. The platform programmatically generates realistic synthetic patients ("SynthPatients") that have a complete synthetic health record ("SynthPHR"). SynthPatients live in a series of geographically-accurate synthetic cities ("SynthCity"). SynthPatients are then randomly selected to participate in synthetic Clinical Trials ("SynthTrials"). Platform users can define the parameters of a clinical trial and the platform generates all SDTM domains for the desired number of subjects.

AD-164 : R-Shiny based customized data viewer and search engine
Hrideep Antony, Syneos Health USA
Aman Bahl, Syneos Health

The number one reason the Google search engine is the most popular of all search engines is its ability to deliver millions of results in less than 0.19 seconds. In other words, users can view relevant information almost in real time. However, when it comes to study data, the database viewers currently available do not facilitate a real-time search that produces results interactively, as and when users search for a keyword within the data. This paper introduces an R-Shiny application that not only creates the results dynamically but also facilitates search at the database level as well as at the variable level. The application allows the user to select the variables to be displayed. It also gives the user control over the number of records to be displayed, besides the ability to search the data as shown in figure 1 below. This search application supports multiple source formats such as .csv, .xls, .sas7bdat, etc. This paper covers the step-by-step methodology of how this application is built using R-Shiny and also details the R-Shiny programming functions used to facilitate the real-time search.

AD-168 : Extracting Data Standards Metadata and Controlled Terminology from the CDISC Library using SAS with PROC LUA
Lex Jansen, SAS Institute Inc.

The CDISC Library is the single, trusted, authoritative source of CDISC Data Standards metadata and Controlled Terminology. It uses linked data and a REST API to deliver CDISC Data Standards metadata and Controlled Terminology in a machine-readable format to software applications that automate standards-based processes. This paper shows how metadata can be extracted in SAS through the CDISC Library API using a REST API request with PROC HTTP. PROC LUA will be used in SAS to manage the PROC HTTP requests. PROC LUA will also be used to parse JavaScript Object Notation (JSON) response strings with a JSON library to extract data before storing them in SAS data sets. The last part of the paper will focus on loading the extracted data sets into the Data Standards and Controlled Terminology areas of the Clinical Management module of SAS Life Science Analytics Framework.
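As a hedged illustration of the workflow the abstract describes, the sketch below issues a PROC HTTP GET request against the CDISC Library API and reads the JSON reply with the SAS JSON libname engine (the paper itself manages the requests and parses the JSON with PROC LUA instead). The endpoint path and the `api_key` macro variable are illustrative assumptions, not the paper's actual code; consult the CDISC Library API documentation for real endpoints and authentication.

```sas
/* Hypothetical sketch: request Controlled Terminology package    */
/* metadata from the CDISC Library REST API. Endpoint path and    */
/* the &api_key macro variable are placeholders.                  */
filename resp temp;

proc http
   url="https://library.cdisc.org/api/mdr/ct/packages"  /* illustrative */
   method="GET"
   out=resp;
   headers "Accept"="application/json"
           "api-key"="&api_key.";                       /* assumed credential */
run;

/* One simple way to turn the JSON response into SAS data sets;   */
/* the paper uses PROC LUA with a JSON library for this step.     */
libname ct json fileref=resp;

proc copy inlib=ct outlib=work;
run;
```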

AD-175 : The SASSY System: Making R Easier for SAS Programmers
David Bosak

The sassy system is an R meta-package that makes R easier for programmers whose primary experience is with SAS® software. Its functions provide the ability to create data libraries, format catalogs, data dictionaries, and a traceable log. The system includes a datastep function that recreates the most basic functionality of the SAS® DATA step. The package also includes reporting capabilities reminiscent of those found in SAS®, and can output reports in text, rich-text, and PDF file formats. Combined, these functions allow you to write programs with an overall flow and thought process closer to that of a SAS® program. This paper will provide a brief overview of the system.

AD-176 : Open-Source Development for Traditional Clinical Reporting
Mike Stackhouse, Atorus Research
Nathan Kosiba, Atorus Research

As the pharmaceutical industry continues to embrace open-source solutions, gaps between the open-source ecosystem and the requirements of traditional clinical reporting pipelines continue to present themselves. These gaps are understandable, as regulatory requirements and decades of tradition cannot match up to the speed and freedom of software designed by consensus and open for review. Furthermore, with open-source as a novel concept within clinical reporting, contributors and consumers within this arena are sparse. Therefore, open-source solutions are not currently tailored to the specific needs of the clinical reporting pipeline. The way to close these gaps is not to wait or rely on the open-source contributors outside industry to close them for us, but rather to close them ourselves. This presentation will discuss the ongoing open-source work at Atorus, and how collective open-source development within the pharmaceutical industry can benefit both individual organizations and the industry as a whole.

AD-179 : Macro to Compare Titles and Footnotes in Produced TLF and Corresponding Shells
Igor Goldfarb, Accenture
Ella Zelichonok, Naxion

The goal of this work is to develop a macro that automates an important part of the final review process for TLFs (tables, listings, and figures). Specifically, the proposed macro compares the titles and footnotes residing in the approved shells with the titles and footnotes in the corresponding actual outputs and identifies the differences. The proposed tool can significantly simplify review work for biostatisticians and lead programmers who have to verify that TLFs were generated correctly according to the shells. Final review of the produced TLFs is an important task in the statistical programming process. Comparing and making sure that the actual outputs were created strictly in accordance with the approved shells document (typically an MS Word file) is a tedious process requiring scrupulous work that is subject to human error. The proposed macro (developed in Excel VBA) automates this process. It reads the shell document and creates a SAS®-readable ordered table of contents (TOC, Excel) in a matter of seconds. As a second step, the macro reads the generated outputs under review (a single Word file or a set of them) and identifies their titles and footnotes. In the third stage, the macro compares the two sets of information and marks the differences. Any further updates in the shell document and, in turn, in the created TLFs can be reviewed again simply by rerunning this macro. Generally speaking, this macro can act as a version control tool for titles and footnotes in the created TLFs and their shells.

AD-185 : Generating the Demographics and Adverse Event Tables Using Excel and VBA
David Franklin

Microsoft Excel is not regarded as a package for statistical analysis, for a number of reasons: a spreadsheet is not technically advanced enough to ensure proper data/documentation management and workflows, and its statistical functions are lacking or not up to serious statistical computational needs. Despite this, Excel is still the leading software package used to do statistics. This paper looks at two distinct table types, the Demographics and the Adverse Event tables, and three distinct categories of data, the categorical, continuous, and occurrence categories, and shows that Excel with VBA can indeed be used to produce these tables. While not suited for publication or serious reporting, these tables could be used as a way to look at the data before 'official' reports are available. Along the way, we will also see how an Adverse Event timeline plot may be created.

AD-205 : Productionalizing Shiny Deployments
Phil Bowsher, RStudio Inc.
Sean Lopp, RStudio

More companies have begun migrating Shiny into production environments. In this talk, RStudio will discuss the various ways Shiny can be deployed to these clinical environments. RStudio will be presenting an overview of best practices for productionalizing Shiny deployments for the R user community at PharmaSUG. This talk will share resources that will help guide R admins and users in deploying Shiny with various supporting tools like continuous integration/deployment. This is a great opportunity to learn about approaches for deploying Shiny into production environments. No prior knowledge of R/RStudio is needed. This short talk will provide an introduction to recent developments for using tools like git with Shiny in your R clinical workspace as well as discuss other paths that can aid in managing Shiny deployments.

Artificial Intelligence (Machine Learning)

AI-010 : Integrating Digital Images for Ophthalmology Trials using Machine Learning Techniques for Image Analysis.
Shubhranshu Dutta, HCVSD

Ophthalmology-related clinical trials involve a high level of risk due to issues related to patient safety. With the advancement of imaging techniques, there is an abundance of patient data available for research. For every patient, an ophthalmology visit consists of various data points collected on visual acuity, contrast sensitivity, eye pressure, visual field tests, and retinal scans with the optic nerve. Measurements are gathered and recorded at every visit, and the data is processed and mapped for clinical research studies. While the rest of the vision-related parameters may be directly input by the physician, there is a lot of imaging data that can be standardized and made available to researchers. This paper focuses on acquiring retinal scan data for patients, analyzing the images, detecting abnormalities, and integrating them in clinical trials. Retinal images will be analyzed using machine learning techniques to identify abnormalities, and tools and techniques for image analysis will be discussed. Retinal abnormalities will be identified and change from baseline will be measured. Methods for analyzing 2-D fundus imaging and techniques for optical coherence tomography (OCT) imaging will be reviewed. Approaches to data mapping, standardization, and harmonization of imaging data in clinical trials will be discussed in the context of challenges and benefits. As an extension, technological advancements toward remote patient monitoring using digital imaging technology will be discussed. Thanks to technological advances and increasing digitization, image analysis techniques show tremendous promise in patient care. All we need is to think beyond boundaries and innovate.

AI-012 : Application to Automate Clinical Study Report Generation using AI
Farha Feroze, Symbiance Inc
Ilango Ramanujam, Symbiance Inc

Creating a Clinical Study Report (CSR) is highly manual and time consuming, requiring a second medical writer to perform quality control of the numbers in the tables, the safety narratives, and information from other documents. A significant amount of the information in the CSR comes from other sources such as the Protocol, SAP, safety narratives, in-text tables, etc., and the appendices section is constructed from the CRF, TLFs, etc. Automating CSR creation by utilizing emerging technologies such as machine learning and natural language processing (ML/NLP) will effectively reduce the manual effort. This paper discusses one such machine learning algorithm, implemented in a tool designed to produce a pre-filled CSR with information from the Protocol, SAP, and other sources placed in the respective sections of the ICH E3 guideline, which could save 60%-70% of the medical writers' time, letting them focus on discussion points and interpretation of study results.

AI-049 : How I became a Machine Learning Engineer from a Statistical Programmer
Kevin Lee, Genpact

One of the most popular buzzwords in the technology world today is "Machine Learning (ML)." Most economists and business experts foresee Machine Learning changing every aspect of our lives in the next 10 years through automating and optimizing processes. This is leading many organizations to seek experts who can implement Machine Learning in their businesses. This paper is written for statistical programmers who want to explore a Machine Learning career, add Machine Learning skills to their experience, or enter the Machine Learning field. The paper discusses my personal journey from statistical programmer to Machine Learning Engineer: what motivated me to start a Machine Learning career, how I started it, and what I have learned and done to become a Machine Learning Engineer. In addition, the paper discusses the future of Machine Learning in the pharmaceutical industry, especially in Biometrics departments.

AI-140 : Virtual human organs in clinical trials
Sumeet Subhedar, Covance

Technology has already shown promising outcomes in many fields and was making its way into drug development (data collection and analysis). But the recent pandemic brought the world to its knees, creating an urgent need for effective vaccines and antivirals delivered in a fast and cost-effective way. Today we do have a solution available, but does it mark the beginning of wide usage of a technology: virtual organ/body replication using computer simulation? In simple words, it is replacing actual humans with computer-simulated humans. Developing a tool for the different data collection points that are important for any drug research, for example demographics, will be a challenge, but seems an achievable task in the coming years. The presentation will focus on what can be envisioned if virtual organs or humans take over from actual humans, with examples to explain different scenarios. "Virtual" is one of the prime words recognized globally during the pandemic, used to serve one purpose or another, and the best part is that we as humans have adapted to it so fast that adopting "virtual patients/organs" seems a possibility as well. But there are pros and cons to every new technology we apply, especially one dealing with data about human life. Theoretically it appears like a cakewalk, but is advancing technology the final solution for speedy and cost-effective drug development, or is the traditional human touch also required?

AI-163 : Random Forest Machine Learning Application using R Shiny for Project Timeline Forecasting
Hrideep Antony, Syneos Health USA
Aman Bahl, Syneos Health

Machine learning models can be used to build very powerful tools to facilitate decision-making, especially when the outcome depends on numerous and varying factors. This paper introduces an R-Shiny based application called "Project Success Forecast" that can predict the probability of a project's success in a clinical trial environment using a random forest machine-learning algorithm. The application allows the user to enter real-time information and predict the rate of success of the project undertaking. The rate of success depends on the various input factors that the user can select, such as deliverable type, number of resources, number of days, etc., as shown in figure 1 below. The greater the value of the success prediction, the higher the likelihood of project success. This application helps the team re-assess project resources or timelines even before the actual initiation of the project, leading to smoother project execution. This paper covers the step-by-step methodology of how this machine-learning application is built using R-Shiny and also details the use of the random forest model for the machine learning algorithm.

Data Standards

DS-014 : CDISC ADaM Phases, Periods, and Subperiods: A Case Study
Jennifer Fulton, Westat

Many typical clinical studies, especially Phase 1 or 2, are comprised simply of Screening, Treatment, and Follow-up time periods. But occasionally the Treatment portion of the study needs to be broken down further into analysis periods, and even sub-levels within one or more of the analysis periods. Two examples are crossover studies, and studies that include a break or "wash out" period between doses of study medication. If it is important, for example, to know what the study circumstances were at the time when a particular adverse event started, then the CDISC ADaM permissible Phase, Period, and Subperiod variables should be utilized. This paper will present a case study on a real clinical trial that presented interesting challenges for the correct implementation of the ADaM guidelines for these variables. In this study the route of administration was of interest, rather than comparing study drugs, and the route changed for the same subject from one set of visits to the next. Topics will include: key ADaM variables that come into play, ADSL variables and how they relate to other ADaM Basic Data Structure (BDS) domain variables, and SAS code to conquer some of the difficult or unusual obstacles. The paper assumes a basic knowledge of CDISC ADaM concepts.

DS-033 : Neoadjuvant and Adjuvant Oncology Clinical Trials and Considerations for Designing the Subject Level Analysis Dataset (ADSL)
Madhusudhan Papasani, Merck & Co., Inc.
Swathi Kotla, Merck & Co., Inc
Elisabeth Pyle, Merck & Co., Inc

Current ADaM classes (ADSL, BDS, OCCDS) support most analysis datasets across various study designs. However, implementing ADaM standards in complex study designs such as neoadjuvant and adjuvant oncology clinical trials can be challenging. Because of different treatments in different periods, and possible period-level baseline definitions and treatment-emergent flags, one must carefully design the Subject Level Analysis Dataset (ADSL) and other analysis datasets with a comprehensive understanding of analysis needs and the right interpretation of ADaM standards. In this paper we focus on ADSL design with an emphasis on using the right phase, period, sub-period, and treatment variables for the needs of a neoadjuvant and adjuvant study. Further, the ADSL metadata describes the derivations and data origins for the variables.

DS-037 : Creating Hyperlinks in SDTM Specifications
Carey Smoak, Aerotek

SDTM specification documents (usually Excel files) can be difficult to navigate due to the large number of worksheets within the Excel file. In this paper, I show two methods for creating hyperlinks that allow the user to navigate the Excel file more easily. One method uses PROC FORMAT and the other uses a COMPUTE block in PROC REPORT. I show only two types of hyperlinks, but you can probably think of others that would be useful in navigating SDTM specification documents; I give you the tools to add your own hyperlinks.

DS-080 : When Should I Break ADaM Rules?
Sandra Minjoe, PRA Health Sciences

The ADaM Model document (ADaM v2.1), Implementation Guides (ADaMIG v1.x) and Occurrence Data Structure document (OCCDS v1.0) include many rules and requirements for a general ADaM dataset and for an ADaM dataset of class Basic Data Structure (BDS). We are often able to follow all of these rules, but it is not uncommon to run into issues here and there. This paper describes a few of the most common issues you might find when trying to design an ADaM BDS dataset, and offers recommendations on how to decide when to break the rules.

DS-081 : Adding Rows to a BDS Dataset: When to Use DTYPE in Addition to Metadata
Sandra Minjoe, PRA Health Sciences

In simple ADaM Basic Data Structure (BDS) datasets, each SDTM row is the basis of one ADaM row. Things get more complicated when you need to add rows, such as to impute missing data or to create an entirely new parameter. In these more complicated cases, metadata is the primary way to explain how these derived rows differ from other rows. Sometimes BDS variable DTYPE (Derivation Type) can also be used, in addition to metadata, to describe a derivation within the data itself. This paper describes how different levels of metadata can be used for explaining derived rows, gives some examples of when to use and not use variable DTYPE, suggests what to do when variable DTYPE isn't appropriate, and provides automatable ways to determine if variables DTYPE and PARAMTYP are being used correctly.
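As a hedged sketch of the kind of derived-row addition the abstract discusses, the step below adds a last-observation-carried-forward (LOCF) row wherever the analysis value is missing and marks it with DTYPE, so the derivation is visible inside the data itself. The ADVS input dataset, its variables, and the choice of LOCF are illustrative assumptions, not taken from the paper.

```sas
/* Hypothetical sketch: add a derived LOCF row for records with a */
/* missing analysis value and flag it with DTYPE per ADaMIG.      */
/* Input ADVS is assumed sorted by USUBJID PARAMCD AVISITN.       */
data advs_locf;
   length dtype $20;               /* assumed new BDS variable    */
   set advs;
   by usubjid paramcd avisitn;
   retain _last_aval .;
   if first.paramcd then _last_aval = .;
   output;                         /* keep the observed row       */
   if not missing(aval) then _last_aval = aval;
   /* when AVAL is missing and a prior value exists, emit a       */
   /* derived row carrying that value forward                     */
   if missing(aval) and not missing(_last_aval) then do;
      aval  = _last_aval;
      dtype = "LOCF";
      output;
   end;
   drop _last_aval;
run;
```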

DS-102 : ADaM-like Dataset: how to do big things in a short time
Oksana Mykhailova, Contract Research Organization Quartesian, Ukraine, Kyiv
Andrii Klekov, Quartesian

There are long-term studies to which interim analyses, futility analyses, conference presentations of intermediate results, or several of these apply; let's call these ad hoc requests. With such requests there is no concrete scope of work during the trial, and we must repeat work from time to time. By CDISC's definition, an ADaM dataset is an analysis dataset that follows the fundamental principles described in the ADaM model document. As you know, ADaM datasets go hand in hand with SDTM datasets, but frequently the SDTM datasets aren't finished or don't have a stable structure, while ad hoc requests still need to be completed on time. In this case, we propose generating ADaM-like datasets that will help you reduce time, split tasks, and control the quality of outputs. This paper describes the concept of ADaM-like datasets, their different types and advantages, when they should be used, and approaches to creating them for different types of requests.

DS-130 : Untangling the Knot - Implementing Analysis Results Standards (Using SAS)
Hansjörg Frenzel, PRA Health Sciences

In his 2019 PhUSE paper, Chris Decker suggested moving from traditional TFL programming to generating analysis results data. Analysis results standards, which define how results data are stored as well as the associated metadata, support the separation of results generation from results presentation. The benefits include avoiding creating results multiple times for multiple outputs, better documentation, traceability to the underlying data, reusability, and potential for automation when creating outputs for multiple stakeholders. This paper presents the metadata for a possible analysis results standard and shows how it could be used for the automated creation of tables.

DS-142 : Biomedical Concepts - An Emerging CDISC Data Standard: A Stepwise Approach for Building a Library of Data Definitions
Kirsten Langendorf, S-cubed
Johannes Ulander, S-cubed

In the Quarter Four 2020 CDISC Newsletter it was announced that, as part of the 360 project, "Starting in January 2021, CDISC will kick off the CDISC 360 Implementation focused on Biomedical Concepts and collaborative curation." CDISC is introducing a new standard, Biomedical Concepts (BCs), in an iterative manner with the goal of showing specific and tangible value to the community. To get this work started (in collaboration with CDISC), we developed a mining tool that provides a first draft of the BCs by reading a set of define.xml files. A set of define.xml redacting tools was developed to ensure that those who contribute define.xml files are comfortable sharing this information. The redacted define.xml files were read into the BC mining tool to provide the superset of information needed to create BCs and form a first version of a BC library. In this presentation we will address: What is a Biomedical Concept, and what are the benefits of using BCs for data collection? This includes a demo of how BCs can make CRF/form creation more efficient and standardized in a tool that uses Biomedical Concepts. How were the BCs created using the BC mining tool? Is it feasible to kick-start the BC library using define.xml files? What are the challenges, and how do we proceed from this first step to an approved CDISC standard? What are the tangible values to the community of CDISC users? Can BCs be of use with the systems we have today?

DS-165 : Validation of CDISC specifications using VALSPEC utility macro
Hrideep Antony, Syneos Health USA
Aman Bahl, Syneos Health

The quality of the SDTM/ADaM specifications plays a key role in determining the quality of the define.xml document, a key part of the CRT package submitted to regulatory agencies for approval. One of the greatest hurdles in preparing the define document is that the majority of issues in the specification documents are identified only after Pinnacle 21 validation, which usually occurs close to the study database lock, after the creation of the SDTM and ADaM datasets. This results in the monotonous task of updating the specifications numerous times before the define.xml gets an acceptance nod from Pinnacle 21. The process of identifying issues in the specification document does not have to wait that long: what if there were a utility that alerted the author to issues at an early stage of creating the specifications? Identifying issues early in the specification will not only improve the quality of the overall submission but will also reduce the rework required to update the specification and the data, and further ease the creation of CRT packages without any surprises at a later stage of submission. This paper introduces a SAS utility macro called VALSPEC that validates SDTM and ADaM specification documents. The paper covers the step-by-step functionality of the VALSPEC utility along with the SAS program code behind it.

DS-172 : Rescreened Subjects, Data Collection and Standard Domains Mapping
Vamshi Matta, Covance
Savithri Jajam, Covance
Lavanya Peddibhotla, Covance

Allowing subjects to rescreen after screen failure is advantageous for the subject as well as the sponsor. One of the challenges in SDTM mapping is how to handle rescreened subjects' data and whether to consider screen-failure subjects' historic data. This paper describes a few such challenges and solutions to overcome these problems.

Data Visualization and Reporting

DV-040 : Graphs Made Easy Using SAS Graph Template Language
Manohar Modem, Cytel
Bhavana Bommisetty, Vita Data Sciences

In the clinical domain, graphs have traditionally been created with SAS/GRAPH procedures like PROC GPLOT. To enable better graphs, SAS developed the Graph Template Language (GTL). GTL has significant advantages over traditional SAS/GRAPH procedures, such as more control over the visual attributes of the graph, less annotation, and minimal coding. Statistical Graphics (SG) procedures like PROC SGPLOT and PROC SGPANEL, which require minimal coding, use GTL to create commonly used graphs, but writing graphs directly in GTL has advantages over both the SG procedures and the traditional SAS/GRAPH procedures. GTL code might look a little lengthy and intimidating, but once you understand how to read it, it is very useful for creating not only simple graphs but also complex ones. The purpose of this paper is to help the programmer understand and utilize various elements of GTL to create wonderful graphs.
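A minimal GTL sketch of the pattern the abstract describes: define a STATGRAPH template with PROC TEMPLATE, then render it with PROC SGRENDER. The template name and the SASHELP.CLASS example data are illustrative choices, not the paper's own example.

```sas
/* Minimal GTL example: define a template, then render it.        */
proc template;
   define statgraph scatter_fit;
      begingraph;
         entrytitle "Weight by Height";
         layout overlay / xaxisopts=(label="Height (in)")
                          yaxisopts=(label="Weight (lb)");
            scatterplot x=height y=weight;          /* raw points */
            regressionplot x=height y=weight;       /* linear fit */
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=sashelp.class template=scatter_fit;
run;
```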

DV-061 : Double Waterfall Plot Creation for Comparing Target Lesions Data
Anilkumar Anksapur, Merck & Co., Inc
Raghava Pamulapati, Merck

We usually use a waterfall plot to present each patient's best overall percentage change in, and response to, a treatment based on a parameter such as tumor burden at each evaluation time point in oncology clinical trials. A double waterfall plot accommodates the best overall percentage change and response for two treatments for each patient in a single plot for multi-treatment studies and crossover studies. We can also use double waterfall plots for intra-tumor studies: in an intra-tumoral injection study, each patient has multiple target lesions, with some lesions injected with treatment, and a double waterfall plot presents the best overall change of injected target lesions and non-injected target lesions separately for each patient. We can use SAS® PROC SGPLOT to create double waterfall plots. The horizontal (X) axis represents patients in order of best percentage change, which could be based on the best response to a specific treatment period or to a type of lesion; VBAR draws vertical bars for each patient's two best percentage changes. The vertical (Y) axis could be the percent change from baseline, e.g., percent growth or tumor reduction by radiologic measurement. We will include more details on the PROC SGPLOT syntax and plot options in the paper. Furthermore, this paper will also discuss the steps to derive the required variables and the dataset pre-processing needed to categorize tumor response.
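A hedged sketch of the clustered-bar idea described above, assuming an input dataset (here called BESTPCHG) with one row per patient and lesion group, already sorted by best percentage change; all dataset and variable names, and the reference-line thresholds, are illustrative assumptions rather than the paper's code.

```sas
/* Hypothetical double waterfall plot: two bars per patient       */
/* (e.g., injected vs non-injected lesions), clustered side by    */
/* side, ordered as the data arrive.                              */
proc sgplot data=bestpchg;
   vbar subjid / response=pchg group=lesiongrp groupdisplay=cluster;
   refline 20 -30 / axis=y lineattrs=(pattern=dash); /* example PD/PR cutoffs */
   xaxis discreteorder=data display=(nolabel noticks novalues);
   yaxis label="Best % Change from Baseline";
run;
```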

DV-078 : Graphical Representation of Clinical Data using Heat Map Graph
Pavani Potuganti, Eliassen Group Biometrics and Data Solutions
Sree Nalluri, Eliassen Group Biometrics and Data Solutions

A heat map is a visualization technique that represents data in two-dimensional form using variations in color to show different data values; for example, extreme colors represent extreme values. The goal of a heat map is to provide a colored visual summary of information and give better insight. Heat maps are commonly used to monitor region-wise weather, thermal values, geographical heights, migrations of people, etc. In the clinical trial domain, heat map graphs can be used to represent adverse events by treatment, metabolic abnormalities, infection rates in children vs. young adults or males vs. females, etc. This paper demonstrates the relative efficiency of treatment groups versus placebo with a practical example using SAS® procedures and techniques for enhancing the graph.
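A minimal sketch of a clinical heat map using the HEATMAPPARM statement of PROC SGPLOT (SAS 9.4); the AE_RATES input dataset and its TRT, AETERM, and RATE variables are assumed for illustration only.

```sas
/* Hypothetical heat map: adverse-event incidence by treatment.   */
/* AE_RATES is assumed to hold one row per treatment x AE term    */
/* with the incidence in RATE.                                    */
proc sgplot data=ae_rates;
   heatmapparm x=trt y=aeterm colorresponse=rate /
      colormodel=(white yellow red) outline;   /* low -> high     */
   gradlegend / title="Incidence (%)";
run;
```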

DV-082 : Automated CONSORT Flow Diagram Generation by SAS Programming
Ryan Yu, Regeneron Pharmaceuticals Inc.

Consolidated Standards of Reporting Trials (CONSORT) flow diagrams are often included in the Clinical Study Report to provide a bird's eye view of the flow of patients through the different stages of the trial. In the past, these diagrams were often created manually. In this project, a SAS macro (SAS version 9.4) was developed to automate CONSORT flow diagram generation from analysis data, namely the ADaM ADSL (subject-level analysis dataset). The macro creates and displays the three basic elements of the diagram (box, arrow, and text) with the SAS SGPLOT procedure. The ADaM ADSL dataset is used as the input of the macro, since ADSL includes all the information needed to create the diagram. The disposition table in the Clinical Study Report (CSR) is usually generated from ADSL, so using the same data source ensures consistency between the CSR tables and the CONSORT diagram. Specifically, based on ADSL and/or macro parameters specified by users, the macro calculates or derives the number of boxes needed, the location and size of each box, the direction of the flow from one box to another, and the text to be displayed inside each box. Generally, the CONSORT flow diagram macro generates boxes for screening, screen failure, enrollment (randomization), summary by treatment group, and participation status at different periods of the study.

DV-086 : Creating Patient Profiles Using SAS® Software: Proc Report & SAS ODS
Ballari Sen, Agios Pharmaceuticals,Inc.

Patient profiles provide comprehensive information for a single subject participating in a clinical study. The profile template includes relevant clinical data for a subject that can help in understanding adverse events, concomitant medications, exposure, lab findings, and other significant events and findings, presented as a narrative or a visual report. The reports are constructed to convey information about a single patient in a concise output for data review within a company. They are key to identifying why a subject experienced an adverse event, why the subject took a concomitant medication, or the effects of the investigational treatment's dosage on a subject. These differing requirements and specifications are only possible through a certain amount of customized programming, and SAS provides rich procedures to efficiently create these important reports. This paper will discuss how to produce patient profiles using SAS® software procedures: PROC REPORT with the DATA step. These patient profiles can serve as a reliable and effective data output source for writing a study report, analyzing output, and communicating between data management and clinical departments. This paper also uses the SAS® ODS ability to sort the actual data and create the output as a PDF document with bookmarks embedded in the profile template.

DV-113 : Data Visualization in Real World Data (RWD) and Health Economics Outcomes Research (HEOR) using (Statistical Analysis Software) SAS
Swapna Ambati, Outcomes Research Manager

Real-world data (RWD) analytics in health economics outcomes research (HEOR) generates real-world evidence (RWE) to provide useful information that helps in decision-making in healthcare settings and healthcare-related investment. RWE complements clinical trial findings and may help fill knowledge gaps related to how medication is used in the real world. Data visualization techniques help explore, synthesize, and communicate the results of research studies. In this paper, we will explore data visualization methods using SAS analysis software and SAS Studio specifically for standard HEOR analyses such as patient cohort extraction steps, demographics, comorbidities, cost, utilization, treatment patterns, medication, adherence, incidence, prevalence, and survey-related analysis. We will generate patient disposition charts, flow charts, bar charts, histograms, tree maps, bubble charts, pie charts, line charts, scatter plots, polar charts, Sankey plots, 3D maps, stacked charts, butterfly charts, and so forth to study HEOR results specifically.

DV-114 : Customize and Beautify Your Kaplan-Meier Curves in Simple Steps
John Henderson, Abbott
Lili Li, Abbott

Kaplan-Meier curves are among the most frequently employed graphs to visualize and demonstrate the estimate of survival patterns in clinical trials. SAS® software offers simple tools for generating survival analysis results using PROC LIFETEST. Default graphical outputs from PROC LIFETEST are adequate for assessing simple outcomes; however, presentation of analysis results often requires varied and complex customizations. Specifically, a confidence interval is a very commonly requested addition to aid the interpretation of survival analysis data. Among the different ways of displaying confidence intervals, the confidence interval bracket is one particularly helpful feature that can provide a better representation of the analysis at specific increments of interest. Yet adding confidence interval brackets usually requires advanced and complex coding, because the PROC LIFETEST default output displays the pointwise confidence intervals as background shading rather than as brackets at specified time points of interest. This paper provides a straightforward and flexible approach using tools provided in PROC SGPLOT, together with annotation datasets, that allows sophisticated customization of confidence interval brackets using survival data generated by PROC LIFETEST.
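One way the general approach can be sketched (this is not the authors' code; the dataset and variable names, e.g., ADTTE, AVAL, CNSR, and the bracket coordinates, are illustrative assumptions): capture the survival estimates from PROC LIFETEST with ODS OUTPUT, then re-plot them in PROC SGPLOT with an SG annotation dataset that draws a bracket at a chosen time point:

```sas
ods graphics on;
ods output survivalplot=surv;          /* capture the plotted survival data */
proc lifetest data=adtte plots=survival(cl);
  time aval * cnsr(1);
  strata trtp;
run;

/* SG annotation dataset: a vertical line segment acting as a CI bracket */
data anno;
  length function $10;
  function = 'line';
  x1space = 'datavalue'; y1space = 'datavalue';
  x2space = 'datavalue'; y2space = 'datavalue';
  x1 = 12; y1 = 0.55;                  /* lower CI limit at time 12 (illustrative) */
  x2 = 12; y2 = 0.75;                  /* upper CI limit at time 12 (illustrative) */
  output;
run;

proc sgplot data=surv sganno=anno;
  step x=time y=survival / group=stratum;
  yaxis label="Survival Probability" min=0 max=1;
run;
```

In practice the ANNO dataset would be built programmatically from the confidence limits in the captured LIFETEST output, with additional short horizontal segments to form the bracket ends.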

DV-115 : Making Your SAS Results More Meaningful with Color
Kirk Paul Lafler, sasNerd

Color can help make your SAS® results more meaningful. Instead of producing boring and ineffective results, users can enhance the appearance of their output, documents, reports, tables, charts, statistics, and spreadsheets to highlight and draw attention to important data elements, details, and issues, including using color in headings, subheadings, footers, minimum and maximum values, ranges, outliers, special conditions, and other elements. Color can be added to text, foreground, background, rows, columns, cells, summaries, totals, and traffic-lighting scenarios. Topics include how results, documents, reports, tables, charts, and spreadsheets can be enhanced with color, and how to effectively add color to PDF, RTF, HTML, and Excel spreadsheet results using PROC PRINT, PROC FREQ, PROC REPORT, PROC TABULATE, and PROC SGPLOT with the Output Delivery System (ODS) and styles.
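As one hedged illustration of the kind of traffic lighting described above (the format ranges, colors, and file name here are made up for demonstration), a user-defined format can drive cell background colors in PROC REPORT:

```sas
proc format;                          /* map value ranges to background colors */
  value htfmt low-<55 = 'lightred'
              55-<65  = 'lightyellow'
              65-high = 'lightgreen';
run;

ods html file='class_report.html';
proc report data=sashelp.class nowd;
  columns name age height;
  /* the format name supplied to BACKGROUNDCOLOR= colors each cell by value */
  define height / display style(column)={backgroundcolor=htfmt.};
run;
ods html close;
```

The same format-driven STYLE= technique carries over to PROC PRINT and PROC TABULATE, and to other ODS destinations such as PDF and RTF.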

DV-187 : Dressing Up your SAS/GRAPH and SG Procedural Output with Templates, Attributes and Annotation
Louise Hadden, Abt Associates Inc.

Enhancing output from SAS/GRAPH® has been the subject of many a SAS® paper over the years, including my own and those written with co-authors. The more recent graphic output from PROC SGPLOT and the recently released PROC SGMAP is often "camera-ready" without any user intervention, but occasionally there is a need for additional customization. SAS/GRAPH is a separate SAS product for which a specific license is required, and newer SAS maps (GfK Geomarketing) are available with a SAS/GRAPH license. In the past, along with SAS/GRAPH maps, all mapping procedures associated with SAS/GRAPH were only available to those with a SAS/GRAPH license. As of SAS 9.4 M6, all relevant mapping procedures have been made available in BASE SAS, which is a rich resource for SAS users, and SAS 9.4 M7 provided further enhancements. This paper and presentation will explore new opportunities within BASE SAS for creating remarkable graphic output, and compare and contrast techniques in both worlds: SAS/GRAPH tools such as PROC TEMPLATE, PROC GREPLAY, PROC SGRENDER, and GTL; SAS-provided annotation macros; and the concept of "ATTRS" in SG procedures. Discussion of the evolution of SG procedures and the myriad possibilities offered by PROC GEOCODE's availability in BASE SAS will be included.

DV-188 : Visually Exploring Proximity Analyses Using SAS PROC GEOCODE and SGMAP and Public Use Data Sets
Louise Hadden, Abt Associates Inc.

Numerous international and domestic governments provide free public access to downloadable databases containing health data. Examples presented include information from the Johns Hopkins COVID-19 database, the state of Massachusetts' COVID-19 statistics on communities and nursing homes, and the Centers for Medicare and Medicaid Services' Care Choices Nursing Home Data. This paper and presentation describe the process of downloading data and creating an analytic database which includes geographic data; running SAS®' PROC GEOCODE (part of Base SAS®) using free TIGER street-address-level data to obtain latitude and longitude at a finer level than ZIP code; and finally using PROC SGMAP (part of Base SAS®) with annotation and animation to create a visualization of a proximity analysis over time.

DV-192 : Super Static SG Graphs
Kriss Harris, SAS Specialists Ltd.

Would you like to know the secrets to quickly producing graphs just the way you want them, such as the coloring and annotating? This submission will show you how to create the graphs you want through Kaplan-Meier, forest plot, and swimmer plot examples.

DV-200 : A Sassy Substitute to Represent the Longitudinal Data - The Lasagna Plot
Soujanya Konda, IQVIA Pvt Ltd

Data interpretation becomes complex when the data run to thousands of values, pages, and variables. Generally, such data are represented in a graphical format. Graphs can be tedious to interpret because of the voluminous data, which may include various parameters and/or subjects. Trend analysis is most sought after, to maximize the benefits from a product and minimize background research such as the selection of subjects. Additionally, dynamic representation and sorting of visual data can be used for exploratory data analysis. The lasagna plot makes the job easier and represents the data in a pleasant way. This paper explains the tips and tricks of the lasagna plot.

Hands-On Training

HT-009 : Understanding Administrative Healthcare Datasets using SAS programming tools
Jayanth Iyengar, Data Systems Consultants LLC

Changes in the healthcare industry have highlighted the importance of healthcare data. The volume of healthcare data collected by healthcare institutions, such as providers and insurance companies, is massive and growing exponentially. SAS programmers need to understand the nuances and complexities of healthcare data structures to perform their responsibilities. There are various types and sources of administrative healthcare data, including healthcare claims (Medicare, commercial insurance, and pharmacy), hospital inpatient, and hospital outpatient data. This training seminar will give attendees an overview and detailed explanation of the different types of healthcare data and the SAS programming constructs to work with them. The workshop will engage attendees with a series of SAS exercises involving healthcare datasets.

HT-118 : Essential Programming Techniques Every SAS User Should Learn
Kirk Paul Lafler, sasNerd

SAS software boasts countless functions, algorithms, procedures, options, methods, code constructs, and other features to help users automate and deploy solutions for specific tasks and problems, as well as to access, transform, analyze, and manage data. This hands-on workshop presents essential programming techniques every pragmatic user and programmer should learn. Topics include determining the number of by-group levels that exist within classification variables; data manipulation with the family of CAT functions; merging or joining multiple tables of data; performing table lookup operations with user-defined formats; creating single-value and value-list macro variables with PROC SQL; examining and processing the contents of value-list macro variables; processing repetitive data with arrays; and using metadata to better understand the contents of SAS datasets.

HT-127 : Mitigating Multicollinearity: Using Regulation Techniques to Minimize the Effect of Collinearity
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine

Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables are linearly related, or codependent. The presence of this phenomenon can have a negative impact on an analysis as a whole and can severely limit the conclusions of a research study. In this training, we will explore how to detect multicollinearity and, once detected, which regularization techniques would be the most appropriate to combat it. The nuances and assumptions of L1 (LASSO), L2 (ridge regression), and elastic net regularization will be covered in order to provide adequate background for appropriate analytic implementation. This training is intended for any level of SAS® user and is designed for an audience with a background in theoretical and applied statistics, though the information will be presented in such a way that any level of statistics/mathematical knowledge will be able to understand the content.

HT-197 : Yo Mama is Broke Cause Yo Daddy is Missing: Autonomously and Responsibly Responding to Missing or Invalid SAS Data Sets Through Exception Handling Routines
Troy Hughes, Datmesis Analytics

Exception handling routines describe the processes that can autonomously, proactively, and consistently identify and respond to threats to software reliability, by dynamically shifting process flow and by often notifying stakeholders of perceived threats or failures. Especially where software (including its resultant data products) supports critical infrastructure, has downstream processes, supports dependent users, or must otherwise be robust to failure, comprehensive exception handling can greatly improve software quality and performance. This text introduces Base SAS® defensive programming techniques that identify when data sets are missing, exclusively locked, or inadequately populated. The use of user-defined return codes as well as the &SYSCC (system current condition) automatic macro variable is demonstrated, facilitating the programmatic identification of warnings and runtime errors. This best practice eliminates the necessity for SAS practitioners to routinely and repeatedly check the SAS log to evaluate software runtime or completion status. Finally, this text demonstrates wrapping exception handling routines within modular, reusable code blocks, thus increasing both software quality and functionality.
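A minimal sketch of the kind of defensive check described above (the macro name, dataset name, and return-code values are illustrative assumptions, not taken from the paper):

```sas
%macro check_ds(ds);
  /* Exception handling: verify that &ds exists and is populated      */
  /* before downstream steps run; signal callers through &SYSCC.      */
  %if not %sysfunc(exist(&ds)) %then %do;
    %put ERROR: data set &ds does not exist.;
    %let syscc = 8;                       /* flag an error condition  */
    %return;
  %end;
  %let dsid = %sysfunc(open(&ds));
  %let nobs = %sysfunc(attrn(&dsid, nlobs));
  %let rc   = %sysfunc(close(&dsid));
  %if &nobs = 0 %then %do;
    %put WARNING: data set &ds exists but has zero observations.;
    %let syscc = 4;                       /* flag a warning condition */
  %end;
%mend check_ds;

%check_ds(work.final)
```

Because &SYSCC persists across steps, a calling program can test it once after a series of such checks instead of scanning the log manually.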

HT-206 : R & Python for Drug Development
Phil Bowsher, RStudio Inc.
Sean Lopp, RStudio

RStudio will be presenting an overview of the interoperability between R and Python, the Tidyverse, Shiny, and R Markdown for the R user community. This is a great opportunity to learn and get inspired about new capabilities for creating compelling analyses with applications in drug development. No prior knowledge of R, Python, RStudio, or Shiny is needed. This short course will provide an introduction to flexible and powerful tools for statistical analysis, reproducible research, and interactive visualizations. The hands-on course will include an overview of R and Python, the Tidyverse for clinical data wrangling, how to build Shiny apps and R Markdown documents, and visualizations using HTML Widgets for R. Immunogenicity assessments and other drug development examples will be reviewed and generated for each topic. Shiny is an open source R package that provides an elegant and powerful web framework for building web applications using R. Shiny combines the computational power of R with the interactivity of the modern web, and helps you turn your analyses into interactive web applications without requiring HTML, CSS, or JavaScript knowledge. An introduction to databases will be reviewed, as well as R web APIs.

HT-210 : Machine Learning Programming Workshop - Natural Language Processing (NLP)
Kevin Lee, Genpact

One of the most popular machine learning implementations is Natural Language Processing (NLP). NLP is a machine learning application or service that is able to understand human language; some practical implementations are speech recognition, machine translation, and chatbots. Siri, Alexa, and Google Home are popular applications whose technologies are based on NLP. This hands-on workshop on NLP machine learning programming is intended for statistical programmers and biostatisticians who want to learn how to conduct simple NLP machine learning projects. The workshop will use the most popular machine learning language, Python, and the most popular machine learning platform, Jupyter Notebook/Lab. During the hands-on workshop, programmers will use actual Python code in a Jupyter notebook to run simple NLP machine learning projects. Programmers will also be introduced to popular NLP machine learning packages such as keras, pytorch, nltk, BERT, spacy, and others.

HT-212 : SAS Macro Efficiencies
Charu Shankar, SAS

The SAS macro language is a very versatile and useful tool. It is often used to reduce the amount of regular SAS code and it facilitates passing information from one procedure to another procedure. Furthermore, we can use it to write SAS programs that are "dynamic" and flexible. In this hands-on workshop, users will learn how to create macro variables and how to write macro programs.
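For example, a sketch of the two building blocks the workshop covers (the dataset, variable, and macro names here are hypothetical):

```sas
/* A macro variable created with %LET and resolved with &name */
%let cutoff = 65;

/* A simple parameterized macro program */
%macro tall_students(ds, htvar);
  proc print data=&ds;
    where &htvar > &cutoff;   /* macro variables resolve before the step compiles */
  run;
%mend tall_students;

%tall_students(sashelp.class, height)
```

Passing the dataset and variable names as parameters is what makes the program "dynamic": the same macro can be invoked against different data without editing the PROC step.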

HT-214 : Integrating SAS and Microsoft Excel: Exploring the Many Options Available to You
Vince DelGobbo, SAS

This paper explains different techniques available to you when importing and exporting SAS® and Microsoft Excel data. You learn how to import Excel data into SAS using the IMPORT procedure, the SAS DATA step, SAS® Enterprise Guide®, and other methods. Exporting data and analytical results from SAS to Excel is performed using the EXPORT procedure, the SAS DATA step, SAS Enterprise Guide, the SAS Output Delivery System (ODS), and other tools. The material is appropriate for all skill levels, and the techniques work with various versions of SAS software running on the Windows, UNIX (including Linux), and z/OS operating systems. Some techniques require only Base SAS® and others require the SAS/ACCESS® Interface to PC Files.

Leadership Skills

LS-093 : Why Data Scientists need leadership skills? Story of Cross-Value Chain Data Utilization Project
Yura Suzuki, Shionogi & Co., Ltd.
Yuichi Koretaka, Shionogi & Co., Ltd.
Ryo Kiguchi, Shionogi & Co., Ltd.
Yoshitake Kitanishi, Shionogi & Co., Ltd.

Shionogi Data Science (DS) Office is a new organization established in April 2020. One of our missions is to contribute to planning and promotion, to provide support from the aspects of statistics and data science across the value chain, and to contribute to management decisions based on scientific evidence. Many members of the DS Office have extensive experience in statistical analysis, programming, and data management in clinical drug development. We are playing a new role as data scientists based on the skills we have cultivated up to now, while gaining further strengths and skills. We have more opportunities to interact with colleagues across the value chain, and we are promoting the solution of business issues based on various data. In addition to the ability to carry out data analysis, leadership skills are required: the ability to identify and summarize stakeholders' needs, to draw up an analysis plan to solve the issues, to carry it out in cooperation with colleagues in the DS Office, to show results in an easy-to-understand manner, and to discuss next actions within the team. In this paper, we take up the "Cross-Value Chain Data Utilization Project" as a concrete example and introduce the leadership skills required for mid-level data scientists.

LS-116 : Differentiate Yourself
Kirk Paul Lafler, sasNerd

Today's job, employment, contracting, and consulting marketplace is highly competitive. As a result, SAS® professionals should do everything they can to differentiate and prepare themselves for the global marketplace by acquiring and enhancing their technical and soft skills. Topics include describing how SAS professionals should assess and enhance their existing skills using an assortment of valuable, and "free", SAS-related content; become involved, volunteer, publish, and speak at in-house, local, regional and international SAS user group meetings and conferences; and publish blog posts, videos, articles, and PDF "white" papers to share knowledge and differentiate themselves from the competition.

LS-148 : How to handle decision making in the CRO / sponsor relationship in matters that require expertise
Sarah McLaughlin, Sumitomo Dainippon Pharma Oncology

Programming for clinical trials requires an ever-expanding skill set. Many of us are working on trials with team members from both a sponsor company and one or more CROs. In order for these partnerships to be both efficient and successful it is important to assess what expertise exists within each organization and at what level it can be found. No one individual knows everything needed to program successfully in an environment with complex, interdependent deliverables. Requirements change over time and differ between regulatory authorities. Expert review may be needed, but timelines can be derailed if it is done too late in the process after key decisions have already been made. This presentation details recommendations for assessing this problem, including: (1) create an inventory of deliverables needed on a project; (2) assess how robust the team's expertise is; (3) identify individuals with expertise about specific items in the team, either at the CRO or the sponsor company; (4) identify areas of uncertainty, which may involve developing deliverables for an unfamiliar regulatory agency, a new indication, or under a new version of CDISC standards; and (5) decide if gaps in expertise should be addressed by the CRO, the sponsor, or a joint team. Actions that may need to be taken outside of the existing team include: assessing expertise within the sponsor company; escalating to more experienced staff in either organization; identifying team members to research specific questions; and hiring outside specialists to review certain deliverables.

LS-152 : Virtual Recruitment and Onboarding of Mapping Programmers during Covid 19- Merck's RaM Mapping Experience
Srinivasa Rao Mandava, Merck

Covid-19 has changed the way we live and work: offices locked down, and working from home became the new way of life. It also dramatically changed the business hiring process; remote hiring became a key tool as companies rapidly and innovatively modernized their recruitment processes. Thus virtual recruitment and onboarding, born of necessity in this crisis, has become a go-to method for keeping the hiring ball rolling without disrupting new drug submission processes across the pharmaceutical industry, especially in tier 1 companies like Merck. Vendors of sponsor companies like ours have followed these processes for many years; due to the pandemic lockdown, big pharma is now following the same methodologies with advanced technologies, tools, and connectivity. The RaM Mapping division of Merck Research Laboratories developed a 20-point recruitment, onboarding, and team assimilation process for new mapping programmers in March 2020 to meet the operational need to support its portfolio enhancement. We discuss our established methodology in detail.

LS-193 : Leadership and Programming
Kriss Harris, SAS Specialists Ltd.

The world is looking for leaders, and you can be a leader! As a Programmer or Statistician you have the power to lead and influence the clinical study, and essentially, you have the power to improve the quality of people's lives. As a leader, you need to be able to step forward and put your hand up, although this can be a challenge because as Programmers and Statisticians, a lot of us can be introverted and live in our head more, and shy away from taking the lead. The message here isn't implying that introversion is something that needs to be cured. The message here as quoted by Susan Cain is merely implying that "Introverts need to trust their gut and share their ideas as powerfully as they can. This does not mean aping extroverts; ideas can be shared quietly, they can be communicated in writing, they can be packaged into highly produced lectures, they can be advanced by allies." Whether you are introverted or not, this presentation will help you to lead yourself, lead others, obtain your goals and ultimately help people live better quality lives. I'll be sharing with you my learnings from the Leadership Academy training which I took in 2019, and my other personal development learnings.

LS-194 : Managing Transitions so Your Life is Easier
Kirsty Lauderdale, Clinical Solutions Group

Transitions can be a very troublesome time for everyone involved: managers, programmers, and clients. It seems like a very simple thing to manage; however, people still struggle with programming transitions, leaving a huge knowledge gap when a programmer departs the team. Effectively managing the information flow during a transition makes life much simpler, for both the transitioner and the transitionee. The most common complaint from programmers taking over code is that there are chunks of code doing things we aren't quite sure they should be doing, and sometimes code whose purpose we cannot determine at all. This paper lays out an effective transition plan, which can be applied to single programs or full studies, and used across multiple roles: management, programming, and statistics.

Medical Devices

MD-035 : ADaM Implementation Guide for Medical Devices
Julia Yang, Medtronic plc
Silvia Faini, LivaNova
Karin LaPann, PRA International

This paper presents the soon-to-be published ADaM Implementation Guide for Medical Devices-v1.0 (ADaMIG MD). The guide is intended to address the typical statistical analysis needs for clinical trials using medical devices. Medical devices can be analyzed alone or with regard to subjects, i.e., those using the devices. This drives the need for a new General Observation Class, referred to as the Device Level Analysis Dataset (ADDL). The ADDL plays a role similar to the role of the ADSL, except that the identifier variable is the device identifier instead of the subject identifier. In the case of a device study, ADDL must be included. ADDL allows basic device-level information to be collected and merged with any other dataset containing the device identifier. Medical device analysis also requires two other new classes: Medical Device Basic Data Structure (MDBDS), and the Medical Device Occurrence Data Structure (MDOCCDS). These two classes are introduced to support the analysis when the device identifier is required and the subject identifier is optional. A new SubClass under MDBDS, the Medical Device Time-to-event (MDTTE), is added for device survival analysis. These new classes and SubClass have been registered in Controlled Terminology. Along with the new metadata for the new classes and SubClass, the conformance rules are provided. This paper also presents examples of how to use the Study Data Tabulation Model for Medical Devices (SDTM-MD) to create ADaM datasets for medical device analysis.

MD-038 : Types of Devices in Therapeutic Area User Guides
Carey Smoak, Aerotek

Thirty-seven unique Therapeutic Area User Guides (TAUGs) have been published as of 2020. Thirty of these TAUGs either have examples of device data or mention device data. Among the twenty-three TAUGs that have examples of medical device data, there are twenty-one types of device data. The most frequent types of device data are imaging, diagnostic tests, and lab tests. These examples of device data would be useful in the next Medical Device SDTM Implementation Guide (SDTMIG-MD), particularly for ancillary devices used in pharmaceutical studies. Better examples of device data are needed from medical device companies for the next SDTMIG-MD. It is also suggested that reference documents for imaging, diagnostic tests, and lab tests be developed to provide better consistency in the presentation of device data in the TAUGs.

MD-044 : Multiple Successful Submissions of Medical Device Clinical Trial Data in the US and China using CDISC
Phil Hall, Edwards Lifesciences
Tikiri Karunasundera, Edwards Lifesciences
Subendra Maharjan, Edwards Lifesciences
Vijaya Vegesna, Edwards Lifesciences

It is not yet mandatory for medical device trial data to be submitted using CDISC, but the Center for Devices and Radiological Health (CDRH) accepts clinical trial data in any format, including CDISC. This paper serves as a case study of the successful regulatory submissions of four Edwards Lifesciences trials, three in the US and one in China, which utilized SDTMs and ADaMs. There will be a review of the SDTM domains used for medical device-specific data and a general discussion of the submission approach.

Quick Tips

QT-007 : How to Write a SAS Open Code Macro to Achieve Faster Updates
Frederick Cieri, Consultant

This paper demonstrates how to write a SAS® macro using mostly open code statements to make a macro easier to debug, run, and understand. A macro is not an open code item, but open code techniques can be used to simplify macro code. Instead of having to run a whole macro to get results, an open code macro can be submitted by running just one data step or procedure at a time which saves time in running code and making updates. Examples are given to convert nested macros with '%if %then' and '%do' loops code to macros using primarily open code statements. The paper is for intermediate to advanced users of SAS® macro coding.

QT-021 : Running Parts of a SAS Program while Preserving the Entire Program
Stephen Sloan, Accenture

The Challenge: We have long SAS® programs that accomplish a number of different objectives. We often want to run only parts of the programs while preserving the entire programs for documentation or future use. Some of the reasons for selectively running parts of a program are: (1) part of it has run already and the program timed out or encountered an unexpected error; (2) it takes a long time to run, so we don't want to re-run the parts that ran successfully; (3) we don't want to recreate data sets that were already created, which can take a considerable amount of time and resources and can also occupy additional space while the data sets are being created; (4) we only need some of the results from the program currently, but we want to preserve the entire program; and (5) we want to test new scenarios that only require subsets of the program.

QT-026 : From SAS Dataset to Structured Document
Hengwei Liu, Daiichi Sankyo Inc

A structured document is an electronic document whose contents are organized into labeled blocks according to a schema. This is done through a mark-up language such as HTML or XML. If the information to be displayed in the structured document is saved in a SAS dataset, the document can be generated through programming. In this paper we discuss how to use SAS and some other programming languages to do that.
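A minimal DATA step sketch of the idea (the output file name and XML tags here are illustrative, not from the paper): wrap each observation in mark-up with PUT statements:

```sas
/* Generate a simple XML document from a SAS dataset */
data _null_;
  set sashelp.class end=last;
  file 'class.xml';
  length line $200;
  if _n_ = 1 then put '<students>';            /* opening tag once */
  line = cats('<student name="', name, '" age="', age, '"/>');
  put '  ' line;                               /* one element per row */
  if last then put '</students>';              /* closing tag once */
run;
```

The CATS function strips the blanks that padded character and numeric values would otherwise leave inside the attribute strings.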

QT-027 : What Kind of WHICH Do You CHOOSE to be?
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.

A typical task for a SAS® practitioner is the creation of a new variable that is based on the value of another variable or string. This task is frequently accomplished by the use of IF-THEN-ELSE statements. However, manually typing a series of IF-THEN-ELSE statements can be time-consuming and tedious, as well as prone to typos or cut and paste errors. Serendipitously, SAS has provided us with an easier way to assign values to a new variable. The WHICH and CHOOSE functions provide a convenient and efficient method for data-driven variable creation.
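A small sketch of the pattern (the dataset and variable names here are hypothetical): WHICHC returns the position of a value within a list of arguments, and CHOOSEC picks the item at that position from a second list:

```sas
data derived;
  set raw;
  /* position of SEX among the listed codes: 1 for 'M', 2 for 'F', 0 if neither */
  pos = whichc(sex, 'M', 'F');
  /* decode to a label; guard against 0 (no match) */
  if pos > 0 then sexlabel = choosec(pos, 'Male', 'Female');
run;
```

The two calls replace a chain of IF-THEN-ELSE statements, and adding a new category means appending one value to each argument list. WHICHN and CHOOSEN are the numeric counterparts.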

QT-065 : Unnoticed individual and mean line plot issue while using the SGPLOT procedure and the solution to avoid
Jagadish Katam, Princeps

This paper discusses an issue with overlaying individual and mean time series in a single graph using PROC SGPLOT, and its solution. When the SGPLOT procedure is used to develop such an overlaid plot, the individual and mean-value lines may not terminate at the end of the X-axis, so they appear as unending lines circling back. The paper includes images of the plots through which the issue is detailed. In addition, it covers the changes a dataset needs before being passed into the SGPLOT procedure, along with other required steps. This is an easily overlooked issue in the SGPLOT procedure, and the solution will help peers pay closer attention when working on individual/mean line plots.

QT-069 : Writing Out Your SAS Program Without Manually Writing it Out- Let Macros Do the Busy Work
Mindy Wang, Independent Consultant

The problem arose when we were asked to write out data from SAS datasets to ASCII files. The format is in an Excel spreadsheet, but there are hundreds of variables, and it is tedious to write the program entirely by hand. By using macro variables, there is an easier way to have SAS write out the text strings for you rather than keying in the details manually. It takes only a few seconds to run, and the program can easily be applied to other datasets, as long as we have the formats in Excel or in SAS metadata. With the same logic, you can also modify the code for other repetitive tasks. For SAS users who are thinking about using macros but are somewhat intimidated by them, this is the presentation to get you in the door. Once you learn the concept, the possibilities are endless: you will have less busy work and fewer headaches.

QT-075 : Shiny decisions - An R-shiny application for decision frameworks
Lasse Nielsen, Novo Nordisk

Taking a drug from development to market is a costly and time-consuming activity for all pharmaceutical companies. Pursuing a drug that will never reach the market can therefore cost billions before it is cancelled. At Novo Nordisk we have developed an R Shiny tool built upon the framework of Frewer et al. This framework is based upon the efficacy seen in the literature and/or in prior clinical trials. A lower limit of clinical relevance is established, along with a target value of clinical relevance one would like to reach. Then, given the standard deviation of a clinical study, the probability of a stop and of a go decision can be calculated using a selected distribution. The tool is a very user-friendly Shiny application that takes this framework and produces both figures and tables. It also takes multiple inputs to tailor to the user's needs and updates in real time. Furthermore, it has several unique features, such as comparing two different scenarios, adding multiple tabs when appropriate, and hiding or exchanging inputs based on the distribution selected. This application showcases how to build a Shiny application for use in a pharmaceutical company and demonstrates some well-implemented features. It will serve as an easily understandable, quantitative tool to support decision making and to communicate the reasoning behind a decision.

QT-096 : NOBS for Noobs
David Horvath, PhilaSUG

This mini-session will be a short discussion of the NOBS (number of observations) option on the SET statement, including one "gotcha" that I've run into with WHERE clauses: NOBS is set before WHERE processing. If you need to know the number of observations after the WHERE clause is applied, another DATA step is needed.
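A minimal illustration of the gotcha, using the SASHELP.CLASS sample data (19 students, 9 of them female):

```sas
data subset;
   /* NOBS= is populated at compile time with the member's total          */
   /* observation count, before any WHERE filtering, so TOTAL here is 19  */
   /* on every iteration, not the 9 rows that pass the WHERE clause.      */
   set sashelp.class nobs=total;
   where sex = 'F';
   put total=;
run;
```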

QT-098 : Wildcarding in Where Clauses
David Horvath, PhilaSUG

Wildcarding is allowed within WHERE-clause LIKE expressions by using special characters. But if you want to search for those specific characters themselves, you have to escape them; if you forget to escape the character, you'll get unexpected matches. Topics include: • _ matches exactly one character • % matches any number of characters • an escape character (here, /) indicates that the next character is a literal character, not a special one • proc sql; select x from y where x like '%a/_b' escape '/'; • set sorted_ddl (obs=max where=(name like '%/_ORD%' escape '/'));

QT-111 : A Unique Way to Identify the Errors in SDTM Annotations Using SAS
Xingxing Wu, Eli Lilly and Company
Bala Dhungana, Eli Lilly and Company

The annotated Case Report Form (aCRF) is a vital file in SDTM development and FDA submission. Therefore, it is very important to maintain the correctness of the SDTM annotations in the aCRF. However, a single study usually has several thousand SDTM annotations, and an integrated database (IDB) can have over ten thousand. This makes it very difficult and time-consuming to check all the annotations manually. In addition, some errors in the SDTM annotations are not obvious and are easy to overlook when checking manually. To overcome these issues, this paper proposes a unique way to check for errors in SDTM annotations using SAS. The approach uses SAS to extract all the SDTM annotations from the aCRF and then compares them with pre-defined rules; annotations that do not follow these rules are flagged as potential issues. The proposed approach can greatly improve the quality of the aCRF in an efficient way.

QT-129 : Easing the Process of Creating CRF Annotations
Samadi Karunasundera, University of California, Berkeley
Jun Wang, Edwards Lifesciences

The SDTM domains have to be annotated on the Case Report Form (CRF). This paper addresses two ways of performing this task efficiently. The first method involves manually consolidating distributed work on distinct CRFs into a single collaborative PDF. The second method replicates the annotations programmatically by creating a dictionary of annotated standard forms, which the program uses to upload and match the annotations to the correct CRFs.

QT-134 : Using Regex to Parse Attribute-Value Pairs in a Macro Variable
Rowland Hale, Syneos Health

In some circumstances it may be deemed preferable to pass multiple values to a macro not via multiple parameters but via a list of attribute-value pairs in a single parameter. Parsing more complex parameter values such as these may seem daunting at first, but regular expressions help us make light work of the task! This paper explains how to use the PRXPARSE(), PRXMATCH() and the lesser known PRXPOSN() regular expression functions in SAS® to extract, in robust fashion, the values we need from such a list. The paper is aimed at those who wish to expand on a basic knowledge of regular expressions, and although the functions are applied and explained within a specific practical context, the knowledge gained will have much potential for wider use in your daily programming work.
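As a hedged sketch of what such parsing can look like (the parameter string, pattern, and variable names are illustrative, not taken from the paper), capture groups plus PRXPOSN pull out each attribute and value in turn:

```sas
data _null_;
   /* Hypothetical single-parameter value: comma-separated attribute=value pairs */
   parm = "drug=ABC123, dose=50, unit=mg";
   /* Group 1 captures the attribute name, group 2 the value (up to a comma) */
   re = prxparse('/(\w+)\s*=\s*([^,]+)/');
   start = 1;
   stop  = length(parm);
   call prxnext(re, start, stop, parm, pos, len);
   do while (pos > 0);
      attr  = prxposn(re, 1, parm);   /* text matched by capture group 1 */
      value = prxposn(re, 2, parm);   /* text matched by capture group 2 */
      put attr= value=;
      call prxnext(re, start, stop, parm, pos, len);
   end;
run;
```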

QT-139 : Common Dating in R: With an example of partial date imputation
Teckla Akinyi, GSK

R is increasingly being adopted in the pharmaceutical industry as part of submission packages, which include ADaM data sets, so it is imperative that clinical programmers upskill in handling dates within R. Because R is open-source software, information on this topic sprawls across the internet, covering several ways to manipulate dates. This paper consolidates that information and focuses on date handling using base R and the lubridate package. The focus is on functions useful to the clinical programmer for common transformations during the creation of ADaM datasets. This includes accessor functions used to extract parts of datetime strings and arithmetic functions used to calculate intervals and durations, culminating in example code for the date imputation common to the ADaM datasets ADAE and ADCM. A basic understanding of packages in R and of the tidyverse, particularly the magrittr pipe and the mutate function, is necessary to follow along.

QT-147 : Overcoming the Challenge of SAS Numeric Date Comparisons
Shikha Sreshtha, DY Patil University in Mumbai

The output from PROC COMPARE is tedious to work with, especially when handling SAS numeric dates in huge datasets. SAS dates are essentially numbers (the number of days from 01JAN1960 to the input date), and with a FORMAT statement the date variables are merely displayed in one of the standard date formats. When comparing numeric dates in the standard clinical format E8601DA10. using PROC COMPARE, all we see are frustrating little stars in the list file. To the rescue comes the huge array of options that the COMPARE procedure provides. When faced with record-level SAS date differences, most programmers turn to temporary data subsetting and patches of code to determine the key variables and locate where the issue lies: all this effort to identify one tiny needle in a haystack and inform the 'base' programmer of the intended variable value. With the help of a handy user-defined SAS macro, we can do a side-by-side comparison of the SAS numeric dates in an output dataset and shift the 'compare' programmer's focus to a comprehensive review. The output dataset from the macro is concise and displays the actual numeric dates, which makes interpretation easy. The macro is designed to highlight the key variables for which record-level SAS date differences exist and to present the side-by-side comparison in a dataset format rather than a list file format.

QT-159 : Manage SAS Log in Clinical Programming
Yuping Wu, PRA Health Science

This paper introduces a method to manage SAS logs in clinical programming. It includes two steps of log checking, in interactive and batch mode, each conducted by a macro. The first macro is placed in the program and is used to produce the log, summarize and list log issues in the current log window, and save the log file in a designated folder. It is mainly used by programmers to detect log issues while developing or running individual programs. The second macro is run in batch mode and is used to scan the stored log files of selected processes (SDTM, ADaM, TFL, or all of them) and produce a status report that highlights the programs with log issues. This macro provides a broad picture of log status for all programs in the study. With this method, programmers can easily detect and fix log issues locally and identify programs with log issues in the summary report.

QT-161 : "Who wrote this!?" Challenges of replacing Black Box software with more dynamic supportable processes utilizing Open-Source Languages
Frank Menius, YPrime
Monali Khanna, YPrime
Terek Peterson, YPrime
Dennis Sweitzer, YPrime

Good news: the software program that was written and put into production years ago still works. The challenge is dealing with the "black box," where input is entered and output is extracted correctly, but no one truly understands the processes and transformations that take place inside the box. For validated applications, a company may be able to operate like this for a long time, but what if a bug is found or the needed output format changes? These applications then must be enhanced or replaced. This paper is about the experience gathered in the effort to replace a SAS macro program used in data transfers with a more dynamic and supportable program in R. We will focus both on the technical challenges of converting a solution from one language to another and on the inherent challenges of change management. We will share steps and lessons learned that can expedite a similar process.

QT-180 : The Mean, But Not as You Know It!
David Franklin

The 'Mean', as most SAS programmers know it, is the Arithmetic Mean. However, there are situations where it may be necessary to calculate different 'means'. This paper looks at different methods that are widely used, from a programmer's perspective: starting with the humble Arithmetic Mean, proceeding to the other Pythagorean Means, the Geometric Mean and the Harmonic Mean, and ending with a quick look at the Interquartile Mean and its related Truncated Mean. Along the way there will be examples of data and code demonstrating how each mean is calculated, with output.
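For instance (a sketch, not the paper's code), the geometric mean is exp(mean(log x)) and the harmonic mean is n / sum(1/x), and both can be computed alongside the arithmetic mean in a single PROC SQL step:

```sas
data raw;
   input x @@;
   datalines;
2 4 8
;

proc sql;
   /* For 2, 4, 8: arithmetic = 4.667, geometric = 4, harmonic = 3.429 */
   select mean(x)               as arithmetic_mean,
          exp(mean(log(x)))     as geometric_mean,
          count(x) / sum(1/x)   as harmonic_mean
   from raw;
quit;
```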

QT-183 : PROC TABULATE and the Percentage
David Franklin

PROC TABULATE is a very powerful procedure which can produce statistics and frequency counts very efficiently, but it also has the capability of calculating percentages at many levels for a category. This paper looks at the automatic percentage calculations that are provided, and then delves into how you can specify the denominator for your own custom percentage.
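As one illustrative example (not necessarily the paper's), the PCTN statistic accepts a denominator definition in angle brackets; here each cell is a percentage of its SEX column total rather than of the whole table:

```sas
proc tabulate data=sashelp.class;
   class sex age;
   /* PCTN<age> makes the denominator the total N across AGE within each  */
   /* SEX column, i.e. column percentages.                                */
   table age, sex*(n pctn<age>);
run;
```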

QT-186 : Where's Waldo? An SDTM Crawler
Derek Morgan, Clinical Solutions Group

This is a quick utility that will find all records for a given USUBJID across the entire SDTM database, and can be modified to display all common SDTM data. Originally created to detect false subject visits from a defective clinical data entry system, as written, it lists all --DTC variables from an SDTM data set, as well as VISIT and VISITNUM, if present.

QT-189 : Management of Metadata and Documentation When Your Data Base Structure is Fluid: What to Do if Your Data Dictionary has Varying Number of Variables
Louise Hadden, Abt Associates Inc.

A data dictionary for a file based on Electronic Medical Records (EMR) contains variables which represent an unknown number of COVID-19 tests for an unknown number of infants - there is no way to know in advance how many iterations of the COVID test variable will exist in the actual data file from medical entities. In addition, variables in this file may exist for three different groups (pregnant women, postpartum women, and infants), with PR, PP and IN prefixes, respectively. This presentation demonstrates how to process such variables in a data dictionary to drive label (and value label) description creation for iterated (and other) labels using SAS functions, as well as other utilities.

Real World Evidence and Big Data

RW-063 : Generating FDA-ready Submission Datasets directly from EHRs
Jozef Aerts, XML4Pharma

Use of Real-World Data (RWD) in clinical research (including as a substitute for placebo data) is growing rapidly, especially due to the rise of the HL7-FHIR standard. The FDA, however, requires study data to be submitted as tables using the SDTM standard and in the outdated SAS Transport 5 format. In this presentation we will demonstrate how data from an Electronic Health Record (EHR) system using HL7-FHIR and its API are transformed fully automatically into FDA-submission-ready SDTM datasets. Done the classic way (usually using SAS to reorganize the data), this is a process that typically takes weeks; with our prototype software, it is a matter of a few minutes. We will also explain how the new HL7-FHIR "Clinical Research" resources will further boost the use of EHRs in clinical research. This will be further enabled by the results of the current "FHIR-to-CDISC" project, which provides mappings and FHIRPath expressions (explained in the presentation) for a large number of standardized CDASH clinical research forms. The presentation will also elaborate on the future of FDA submissions: "rolling data", modern formats such as JSON and XML, and the use of RESTful web services and APIs.

RW-126 : A Programmer's Experience Researching Real-World Evidence (RWE) COVID-19 Data
Ginger Barlow, UBC

In early 2020, when the COVID pandemic hit the US, researchers started gathering and analyzing data to help understand who was vulnerable to the infection, make major policy decisions to reduce the spread of the disease, and start the race for an effective vaccine. At UBC, a team of researchers used Electronic Health Record (EHR) data from Cerner Real World Data to analyze records of 14,371 inpatients with a COVID-19 diagnosis. The objective of the study was to evaluate the number, patient characteristics, and outcomes of patients hospitalized with COVID-19, comparing cases identified by laboratory confirmation with clinically diagnosed cases without positive lab tests. Our statistical programming team was called on to analyze the data, but this type of research posed many challenges, including how different the data in the RWE database was compared with interventional or observational trial data, classifying patients into 4 cohorts based on how their diagnosis was made (possibly across multiple encounters), the lack of universally accurate laboratory testing, and heterogeneity in ICD-10 codes. We defined 4 mutually exclusive COVID cohorts and then compared comorbidities, medications, complications, and length of stay. The programming challenges were interesting, and the conclusions of the study highlighted how important case definitions are when utilizing EHR and claims-based datasets for research.

RW-136 : IOT or IOMT?
Lekitha Jk, Covance Clinical Development Pvt. Ltd.

We have all heard about the impending era of the Internet of Things (IoT), where smart devices will allow our homes, our cars, our businesses, and countless other aspects of daily living to be managed with a click on a mobile device. IoT has the capacity to change and simplify clinical trial processes, making them more cost-effective and efficient. Stakeholders are looking to reduce the time taken to research new treatments and diagnostic methods while improving the ways clinical trials are conducted. Implementing Electronic Clinical Outcome Assessments (eCOA) can improve data collection, reduce costs, and increase patient participation and retention rates. Adopting digital health on an IoT platform can create limitless opportunity for smart technology sensors and medical devices. The initial response from clinical researchers, through patient experience, has been positive regarding the collection of biometric data on trial subjects using devices that collect data more efficiently. These devices can help gather data for analysis by connecting to the internet and can improve transmission to other machines wirelessly. But how much has been said about the coming world of the Internet of Medical Things (IoMT)? A world where patient, pharmacy, doctor, hospital, insurance company, pharma company, and government are all connected, sharing data and information in real time and enabling virtual coordination of care regardless of the physical location of the players? Only a fantasy, you say? Perhaps it will arrive sooner than we think. Is pharma prepared to be part of a virtually managed health care system? What are the steps necessary to get ready and integrate with this future scenario?

RW-160 : Patient Registries - New Gold Standard for Real World Data
Neha Srivastava, Covance
Lavanya Peddibhotla, Covance
Sivasankar Konda, Covance

Patient registries have been evolving in the last few years as our quest for Real-World Data (RWD) intensifies. The current global movement toward innovative and patient-centered healthcare is enabling patient registries to emerge as a valuable tool, not just for collecting Real-World Data but also for drawing meaningful inferences and insights from the collected data. In the last few years, the Food and Drug Administration (FDA), the European Medicines Agency (EMA), and other regulatory bodies have initiated several frameworks highlighting the importance of patient registries. In September 2020, the EMA issued a draft guideline on considerations for patient registries; this paper highlights some key points from that guidance. Patient registries are highly recommended for determining the effectiveness of an investigational product by following various subgroups of patients over extended periods of time. They also help in assessing the safety of, or harm caused by, an approved drug. Registries also prove worthy in measuring quality of care and in highlighting areas of improvement in various facets of real-world clinical practice. Registries can be designed precisely around the research question and can serve as a significant medium for observing real-world clinical practice. Patient registries are indeed set to become a new gold standard for real-world data research.

RW-198 : Better to Be Mocked Than Half-Cocked: Data Mocking Methods to Support Functional and Performance Testing of SAS Software
Troy Hughes, Datmesis Analytics

Data mocking refers to the practice of manufacturing data that can be used in software functional and performance testing, including both load testing and stress testing. Mocked data are not production or "real" data, in that they do not abstract some real-world construct, but are considered to be sufficiently similar (to production data) to demonstrate how software would function and perform in a production environment. Data mocking is commonly employed during software development and testing phases and is especially useful where production data may be sensitive or where it may be infeasible to import production data into a non-production environment (such as where production data contain sensitive PII or PHI, company secrets, or federal government super-secrets). This text introduces the MOCKDATA SAS® macro, which creates mock data sets and/or text files for which SAS practitioners can vary (through parameterization) the number of observations, number of unique observations, randomization of observation order, number of character variables, length of character variables, number of numeric variables, highest numeric value, percentage of variables that have data, and whether character and/or numeric index variables (which cannot be missing) exist. An example implements MOCKDATA to compare the input/output (I/O) processing performance of SAS data sets and flat files, demonstrating the clear performance advantages of processing SAS data sets in lieu of text files.

RW-202 : NHANES Dietary Supplement component: a parallel programming project
Jayanth Iyengar, Data Systems Consultants LLC

The National Health and Nutrition Examination Survey (NHANES) contains many sections and components which report on and assess the nation's health status. A team of IT specialists and computer systems analysts handles data processing, quality control, and quality assurance for the survey. The most complex section of NHANES is dietary supplements, from which five publicly released data sets are derived. Because of its complexity, the Dietary Supplements section is assigned to two SAS programmers who are responsible for completing the project independently. This paper reviews the process for producing the Dietary Supplements section of NHANES, a parallel programming project conducted by the National Center for Health Statistics, a center of the Centers for Disease Control (CDC).

RW-208 : Life After Drug Approval... What Programmers Need to Know About REMS
Cara Lacson, United BioSource Corporation
Carol Matthews, UBC

While most of the spotlight in the drug development process focuses on clinical trials and the effort to get drugs approved, FDA is turning to Risk Evaluation and Mitigation Strategy (REMS) as a way to approve drugs they may have safety concerns about while closely monitoring those potential safety issues once the drug is approved. A REMS often involves drugs that have a high risk for specific adverse events, and the FDA requires manufacturers to put a program in place to mitigate those risks. REMS are a growing sector of the market, and will only continue to grow in the future. Therefore, it is increasingly valuable to know the "ins and outs" of how to approach programming with REMS data, as it is very different from clinical trial programming. This paper explores the basic concepts of a REMS from a programming perspective: from a high level explanation of what a REMS is to the main differences programmers will see between clinical trials and REMS. We will discuss what types of data are typically included in REMS, what data issues to expect, general table structures and reported statistics, and how to effectively report data from an ongoing/changing database throughout the life of a REMS.

RW-209 : Standardizing Laboratory Data From Diverse RWD to Enable Meaningful Assessments of Drug Safety and Effectiveness
Irene Cosmatos, UBC
Michael Bulgrien, UBC

Laboratory results available in electronic medical record (EMR) data are becoming increasingly critical for the assessment of drug safety and effectiveness. However, unlike diagnoses and medications where vocabularies for data capture are well established, the recording of laboratory results in EMR databases is often not standardized, and differences occur across and even within EMR systems. This heterogeneity can create significant challenges for epidemiologic analyses requiring cohort and/or outcome definitions based on lab criteria. This project standardized diverse laboratory data from US and non-US EMR databases into a common structure to enable analyses to be compared across data sources that do not use a standardized coding system such as Logical Observation Identifiers Names and Codes (LOINC). UBC's database analysts and clinicians developed an approach to transform laboratory results from diverse RWD into a cohesive and accurate dataset based on standardized units and test names, while minimizing loss of data. Close clinical scrutiny and unit conversions were required to enable a common data structure. The effort focused on 3 liver function tests: ALT, AST and Total Bilirubin, using 3 data sources: 2 US and 1 EU. UBC will discuss our analytic and clinical challenges, such as differences between the EU and US in naming conventions of laboratory tests, and their resolutions. Results of this data standardization effort will be demonstrated, highlighting key issues that impact defining a study cohort or outcome based on quantitative lab criteria. This presentation will be relevant to any user interested in standardizing vocabularies used in RWD.

Statistics and Analytics

SA-032 : Back to Basics: Running an Analysis from Data to Refinement in SAS
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine

Data Science is the new Space Race, launching us into a world of immeasurable possibility, but with only a few people to help us navigate it. As we dig deeper, discover more, and risk more, we can be simultaneously led to both great insight and loss. If we do not know what we are doing, Data Science can be a very dangerous thing. It is important for us all to learn at least a little bit about the possibilities and risks of this field of study, so we can navigate it together. This paper was written to give individuals new to SAS or Analytics a gentle nudge in the direction of the possibilities available through Data Science and SAS. It is designed to help you navigate through the process of data exploration by using publicly available COVID 19 data. We have all seen how fragile this data reporting can be, and this paper uses this fragility to help explain the dangers of an inappropriately implemented analytic process. Together, we will briefly touch on current best practices and common errors that occur at the different steps of an analysis (choosing data, exploring data, building and running a model, checking and refining model performance) while simultaneously reviewing common SAS procedures used in each of these steps (Data Step, Univariate Procedures, Multivariate Procedures, Power & Model Fit Procedures). At the end of this paper, the author provides several citations and recommended readings to help interested analysts further their education in Data Science implementation.

SA-043 : Visualization of Sparse PK Concentration Sampling Data, Step by Step (Improvement by Improvement)
Ilya Krivelevich, Eisai Inc.
Simon Lin, Eisai

Pharmacokinetic (PK) data is collected for a variety of purposes in clinical trials. PK data is critical in understanding a drug's safety and determining its dosing frequency, and pharmacokinetic analysis is a major part of clinical trials aimed at obtaining information on drug disposition in humans. Graphics are always a powerful tool in reporting, and concentration-versus-time graphs are standard for reporting PK data; data visualization tools are widely used in PK analysis, so meaningful graphics play a very important role. There are many presentations and examples that deal with displaying results of rich sample collections; however, there is a significant shortage of similar presentations covering the visualization of results from the widely used sparse PK data collection. This article may be interesting from a methodology standpoint: it presents examples and suggestions for the step-by-step development of figures for sparse PK concentration data using powerful SAS ODS tools (PROC SGPLOT).

SA-047 : Build a model: Introducing a methodology to develop multiple regression models using R in oncology trial analyses
Girish Kankipati, Seagen Inc
Jai Deep Mittapalli, Seagen Inc

Regression analysis is a widely used modeling tool in statistics for estimating the relationship between two variables: a predictor variable whose value is observed through experiments and a response variable whose value is derived from the predictor. The general mathematical equation for a linear regression is Y = aX + b, where Y is the response or dependent variable, X is the predictor or independent variable, and a and b are constants called coefficients. If two or more predictor variables have a linear relationship with the dependent variable, the regression is called a multiple linear regression. In R, the lm function is widely used to build such regression models. This paper discusses a step-by-step process to build a multiple regression model in R using an example oncology trial data set: a) get the regression equation on each predictor using the lm function; b) check whether multicollinearity exists via pairwise Pearson correlation coefficients; c) select the model using the regsubsets function in the leaps package; d) check whether any data transformation is required, and identify any outliers and influential points; e) perform residual diagnostics to see whether the predictor variables meet all model assumptions, such as normality, homoscedasticity, and linearity. In our sample data set, prostate-specific antigen (PSA) level is the response variable (Y) and prostate cancer volume, prostate weight, and others are the predictor variables (X). Modeling the relationships among these variables in R will be explained thoroughly with the help of box and other plots using R Shiny.

SA-062 : SAS Proc Mixed: A Statistical Programmer's Best Friend in QoL Analyses
Janaki Devi Manthena, Seagen Inc.
Varsha Korrapati, Seagen Inc.
Chiyu Zhang, Seagen Inc

SAS PROC MIXED is a powerful procedure that can be used to efficiently and comprehensively analyze longitudinal data, such as many patient-reported outcome (PRO) measurements over time, especially when missing data are prevalent. This paper illustrates the statements and options commonly used in this procedure for such analyses. We will present a statistical programmer's perspective on how to calculate the Least Squares (LS) Mean, its Standard Error, the difference in LS Means between treatment arms, and the corresponding 95% confidence interval at each time point using this procedure. This will be demonstrated with examples of PROC MIXED focusing on both linear mixed models and pattern mixture models, applied to imputed and original QLQ-C30 questionnaire data, respectively.
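A minimal sketch of such a model (the dataset and variable names are illustrative ADaM-style names, not the paper's): a mixed model for repeated measures on change from baseline, with LS means and treatment differences at each visit.

```sas
proc mixed data=adqlqc;
   class usubjid trtp avisit;
   /* Change from baseline modeled on treatment, visit, their interaction, */
   /* and baseline score; Kenward-Roger degrees of freedom                 */
   model chg = trtp avisit trtp*avisit base / ddfm=kr;
   /* Unstructured covariance across visits within subject                 */
   repeated avisit / subject=usubjid type=un;
   /* LS means per arm and visit, with pairwise differences and 95% CIs    */
   lsmeans trtp*avisit / diff cl alpha=0.05;
run;
```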

SA-138 : Willing to Play with William's Design: An Exemplification of Randomization Schedule Generation using SAS
Abhyuday Chanda, Quartesian Research Private Limited

The crossover design is one of the most used designs in clinical trials. With 'n' treatments in an 'n'-period crossover design, an imbalance in first-order carryover effects may be observed. To maintain balance, one needs to make sure the number of times a treatment (Drug A) precedes another treatment (Drug B) is the same as the number of times it succeeds that treatment. A Williams design, which is a (generalized) Latin Square Design (LSD) balanced for first-order carryover effects, is applicable in these cases. If the number of treatments to be tested is even, the design is a single Latin square (n x n); otherwise it consists of two Latin squares (2n x n). This paper will demonstrate to readers the step-by-step process of generating randomization schedules for both even (6x6) and odd (3x3) numbers of treatments with a Williams design using SAS®. The brief steps are: 1) generate a Latin square; 2) create its mirror image and augment it to the main Latin square; 3) interlace the columns of the main and mirror Latin squares; 4) create the final sequences based on the number of treatments; and 5) use those sequences to generate the final randomization schedule.

SA-141 : BAYESIAN Approach on COVID-19 test results
Vidya Srinivas, Covance Clinical Development Pvt Ltd
Ankita Sharma, Covance Clinical Development Pvt Ltd

Clinical trials are very expensive, and their outcomes are crucial to the target population and other stakeholders, so there is considerable pressure to optimize them. One route to optimization is making better use of all available information, and Bayesian statistics provides this opportunity. Two statistical methodologies are applicable to the design and analysis of clinical trials: frequentist and Bayesian. Bayesian statistics starts with a prior belief (expressed as a prior distribution), which is then updated with new evidence to yield a posterior belief (also a probability distribution). It provides a mathematical method for calculating the likelihood of a future event given knowledge from prior events; these methods directly address the question of how new evidence should change what we currently believe. Posterior probabilities are updated via Bayes' theorem on the basis of accumulated data. In the current situation with PCR (polymerase chain reaction) testing for COVID-19, false negative tests are particularly concerning, potentially leading to an inappropriate sense of security regarding infectivity. To accurately interpret test results, one needs to know the positive and negative predictive values of a test in the setting where it is applied, which depend on its sensitivity and specificity along with prevalence (pre-test probability). Bayes' theorem can be applied to the interpretation of negative PCR results in patients with suspected COVID-19 infection. Through this presentation we will show how the Bayesian approach can enhance clinical trial design and its usefulness in the near future.
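
The predictive-value calculation described here is a direct application of Bayes' theorem. A minimal Python sketch (the sensitivity, specificity, and prevalence values in the example are purely illustrative, not from the paper):

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """P(no infection | negative test) via Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)       # uninfected, test negative
    false_neg = (1 - sensitivity) * prevalence      # infected, test negative
    return true_neg / (true_neg + false_neg)

# Hypothetical PCR characteristics and pre-test probability
npv = negative_predictive_value(sensitivity=0.70,
                                specificity=0.95,
                                prevalence=0.20)
# 1 - npv is the residual probability of infection despite a negative result
```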

SA-145 : Preserving the Privacy of Participants' Data by Data Anonymization
Chaithanya Velupam, Bangalore
Kaushik Sundaram, Covance

As the global landscape of clinical data sharing grows more prominent by the day, it is challenging to store, mine, and analyze heterogeneous data across multiple data sources and regions. Proactive sharing of clinical trial data has been a key strategic aim for the last few years. Data must be shared in such a way as to ensure the protection of participants' privacy. While this is the foremost priority in any data sharing exercise, the changing technology landscape increasingly constrains the ability to share data in sufficient depth and detail. We explore how anonymization of internationally sourced clinical trial data may be achieved while maintaining the scientific utility of the data. We focus on anonymization, which plays an important role in the re-use of clinical data and the sharing of research data. We present a flexible solution for anonymizing distributed data in the semi-honest model by incorporating specific anonymization methods, so that the important insights of the research data still prevail. Based on this case study, we provide recommendations that address some of the central questions of anonymization and consider the strengths and weaknesses of the anonymization process.

SA-153 : Statistical Considerations for Decentralised Trials
Ritu Karwal, Accenture Solutions

The COVID-19 pandemic struck almost every aspect of ordinary life for billions of people and millions of businesses, and pharmaceutical companies were no exception. Because of the massive travel restrictions imposed in many geographic regions, many ongoing clinical trials were delayed or postponed. Decentralized trials (DCTs) are a new trend: patients do not have to travel to sites to participate in a clinical trial but can take part from the convenience of their home. Recent advances in electronics, large-scale digitalization, and secure communications make this possible, and with this approach, compliance with the clinical trial can be improved. Decentralized trials rely mainly on digital collection of data, in turn enabling better engagement from patients and access to the data in real time. Not all trials can be run as DCTs (e.g., PK studies where blood samples need to be collected), but those that can provide better access to the data. The present paper considers known complications for DCTs from a statistical point of view. Specifically, the authors concentrate on analysis of changes in clinical trial endpoints made as a result of adaptation to decentralized conduct. It is suggested that some of the approaches developed for conducting clinical research as DCTs can be adopted and used even after the pandemic ends.

SA-154 : Leveraging Generalized Additive Models in Assessing the Disease Progression through Digital Wearable Devices on Patients with Parkinson's Disease with Dementia
Sakshi Kaushik, Quartesian Research Private Limited
Sakthivel Sivam, Quartesian Research Private Limited

With recent technology breakthroughs, it is high time to switch from a traditional path to a digitalized one in health care. A significant spike has been observed in the use of digital wearable devices in clinical trials, owing to their many benefits: new and objective longitudinal data, low overall costs, data consistency across multiple sites, and improvement in the effectiveness of trials. The actigraph is among the most widely used digital wearable devices; it continuously collects epoch-level (ranging from 15 to 60 seconds) data on physical activity and quality of sleep. It is statistically and computationally challenging to analyse the vast amount of data (e.g., minute-by-minute data collected over 14 weeks) an actigraph produces, and analysing composite data obtained by combining (e.g., averaging) epoch-by-epoch data into a single value may lead to loss of information. Thus, advanced statistical models such as the Generalized Additive Model (GAM), which are known to be feasible for modelling large, longitudinal datasets, are more appropriate for analysing actigraphy data. The objective of this paper is to explore the usefulness and applicability of GAMs for analysing epoch-by-epoch physical activity data, as measured by activity counts obtained from an actigraph, to assess and compare disease progression across treatment groups in patients with Parkinson's Disease with Dementia (PDD). Key words: Actigraph, Activity, Digital Wearable Devices, Epoch, Generalized Additive Model (GAM), Parkinson's Disease with Dementia (PDD)

SA-155 : Evaluating anthropometric growth endpoints with Z-Scores and Percentiles
Snehal Sanas, Covance Clinical Development Pvt Ltd
Pradeep Umapathi, Covance Clinical Development Pvt Ltd

Disease conditions in children can affect a child's growth and development, so an effective treatment should demonstrate improvement not only in the disease condition but also in growth pattern. Unlike in adults, assessment of drug safety and efficacy in children requires specific consideration of the treatment's impact on growth. Measuring overall child development is a complex task, as it involves evaluating anthropometric measures such as weight, height, and body posture at different stages of development. Clinical trials in pediatric populations need composite efficacy endpoints to account for impact on growth. WHO and CDC have established worldwide anthropometric reference standards against which a child's growth can be evaluated and conclusions drawn about whether the child is growing correctly or has growth issues. This presentation aims to explain the rationale behind the data collected by WHO and CDC; what Z-scores and percentiles are; how to calculate them; how to incorporate this reference data into programs for statistical analysis; and how to interpret the results when evaluating growth. These standards have been used in clinical trials to demonstrate growth-related drug efficacy across therapeutic areas ranging from rare congenital enzyme deficiencies to pediatric cardiology and infections. The presentation describes a case of implementing Z-scores and percentiles in a clinical trial using a SAS macro, and also touches on how this approach is useful not only in clinical trials but also in the general assessment of child growth by pediatricians and health care providers.
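
The WHO and CDC references mentioned here publish L, M, and S parameters by age and sex, and the LMS method converts a measurement into a Z-score and percentile. A minimal Python sketch (the L/M/S values in the example are hypothetical, not taken from the published tables):

```python
import math

def lms_zscore(x, L, M, S):
    """WHO/CDC LMS method: Z = ((x/M)^L - 1) / (L*S),
    with the log-form limit when L == 0."""
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)

def percentile(z):
    """Standard normal CDF expressed as a percentile."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical L/M/S triple: a measurement equal to the median M
# gives Z = 0, i.e. the 50th percentile
z = lms_zscore(16.0, L=-1.6, M=16.0, S=0.08)
```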

SA-211 : How simulation will impact the future of Healthcare & Life Sciences.
Lois Wright, SAS Institute Inc
Allison Sealy, SAS Institute Inc
Bahar Biller, SAS Institute Inc
Shawn Tedman, SAS Institute Inc
Pritesh Desai, SAS Institute Inc

Simulation is a way to model a real-life or hypothetical process so that it can be studied to understand how the system works. By changing the parameters in a simulation, predictions may be made about the behaviour of the system. Current simulation models in Healthcare and Life Sciences lack the details needed to glean valuable insights. This paper will review the current state of simulation in the Healthcare and Life Sciences space and provide relevant examples of recent projects completed in partnership with the SAS Operations Research Center of Excellence. We close with a roadmap of future trends and developments and a discussion of how machine learning will play a role and actually improve patient outcomes.

Strategic Implementation

SI-019 : Advanced Project Management beyond Microsoft Project, Using PROC CPM, PROC GANTT, and Advanced Graphics
Stephen Sloan, Accenture
Lindsey Puryear, SAS Institute

The Challenge: Instead of managing a single project, we had to craft a solution that would manage hundreds of higher- and lower-priority projects, taking place in different locations and different parts of a large organization, all competing for common pools of resources. Our Solution: Develop a Project Optimizer tool using the CPM procedure to schedule the projects and using the GANTT procedure to display the resulting schedule. The Project Optimizer harnesses the power of the delay analysis feature of PROC CPM and its coordination with PROC GANTT to resolve resource conflicts, improve throughput, clearly illustrate results and improvements, and more efficiently take advantage of available people and equipment.
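
At its core, PROC CPM's scheduling pass is the classical critical path method. A minimal Python sketch of the forward pass (with a hypothetical three-activity network, not the authors' Project Optimizer), to make the scheduling idea concrete:

```python
def forward_pass(durations, predecessors):
    """Critical path method forward pass: earliest finish time per activity.
    durations: {activity: duration}; predecessors: {activity: [upstream, ...]}."""
    earliest_finish = {}

    def finish(act):
        if act not in earliest_finish:
            preds = predecessors.get(act, [])
            # An activity can start only after all its predecessors finish
            start = max((finish(p) for p in preds), default=0)
            earliest_finish[act] = start + durations[act]
        return earliest_finish[act]

    for act in durations:
        finish(act)
    return earliest_finish

# Hypothetical three-task project: C waits on both A and B
ef = forward_pass({"A": 3, "B": 5, "C": 2}, {"C": ["A", "B"]})
```

A backward pass over the same network would yield latest start times and hence slack, which is what the delay analysis feature of PROC CPM exploits when resolving resource conflicts.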

SI-028 : A TransCelerate Initiative - how can you modernize your statistical environment?
Vincent Amoruccio, Pfizer
Min Lee, Transcelerate
Daniel Woodie, Merck

A lack of evolution within the pharmaceutical industry's analytic capabilities has given rise to inefficiency and a failure to leverage modern statistical analysis techniques within the clinical development space. Moreover, a lack of regulatory perspective on this matter has become a barrier in implementing and leveraging newer analytical software capabilities. A TransCelerate initiative has developed a foundational framework that increases confidence in use of modernized statistical analysis and efficient technology across the industry. This has created an opportunity to provide clarity and instill confidence in the use of modern software technologies, including those that generate outputs that support regulatory submissions. This framework is based upon a methodology that establishes the principles of accuracy, traceability, and reproducibility for a modern analytical software environment to demonstrate that any output it generates is reliable. A set of best practices and guiding principles is also provided to support the framework's implementation across industry sponsors and other stakeholders. Along with a developed framework, engagement and consultation with Regulatory Agencies can be conducted to gain clarity on the framework's robustness and ability to provide confidence in using modern software technologies within clinical development. To demonstrate the value proposition of this modernized statistical framework, an illustration of how it can be used for a global regulatory submission will be presented.

SI-046 : PROC FUTURE PROOF v1.1-- Linked Data
Amy Gillespie, Merck & Co., Inc.
Susan Kramlik, Merck & Co., Inc.
Suhas Sanjee, Merck & Co., Inc.

As critical contributors to regulatory submissions, manuscripts, and statistical analyses, clinical trial programmers author programming code and leverage programming standards to produce deliverables in a validated, efficient, and reproducible manner. With the function having remained relatively constant for more than 20 years, there are potential opportunities, and a need, to transform the clinical trial programming role for continued success. Last year we published a paper that evaluated recent advances in technology and the clinical trial programming skillset to identify opportunities for improved programming efficiencies and compliance with regulatory requirements while ultimately optimizing the programming function. Use cases leveraging natural language processing and linked data were explored to determine the potential value these digital technologies could add within clinical trial programming processes. This paper shares an overview of the steps and challenges in building a linked data proof of concept, along with a readout. The proof of concept links analysis results end to end, from clinical study reports back to source data.

SI-068 : Lesson learned from a successful FDA Oncology Drug Advisory Committee (ODAC) meeting
Weiwei Guo, Merck & Co., Inc.
Chintan Pandya, Merck & Co., Inc.
Christine Teng, Merck & Co., Inc.

In many instances, before the Food and Drug Administration (FDA) makes a regulatory decision regarding potential approval of a new drug, the clinical data are reviewed and debated by an advisory committee (ACM), which then provides an independent opinion to the FDA with its recommendation for approval or rejection. The Oncologic Drugs Advisory Committee (ODAC) is the committee responsible for reviewing clinical safety and efficacy data in the oncology therapeutic area; its members provide their opinion to the FDA on oncology drug applications. To be well prepared for an ODAC meeting, the sponsor needs to prepare materials addressing the potential concerns that could be raised during the meeting. In addition, simulated meetings before the actual ODAC meeting may generate more questions that require additional analyses. The ODAC preparation process is very challenging for all functional groups involved. This paper will share our lessons learned from a programming perspective.

SI-074 : A Strategy to Develop Specification for R Functions in Regulated Clinical Trial Environments
Aiming Yang, Merck & Co., Inc.
Yalin Zhu, Merck & Co., Inc.
Yilong Zhang, Merck & Co., Inc.

A well-written programming requirements specification is an essential document for a regulatory-compliant software development life cycle (SDLC). It is a technical document outlining the requirements and usage of a specific computer program. With the growing interest in using R for regulated clinical trial data analysis, there is a need to document specifications of the R functions within an R package. Specification generation has direct implications for source code and file management. In this paper, we propose a strategy for using Roxygen to develop the specification of an R function, with examples. We will demonstrate the specification generation process and present real examples based on a recently developed R package (r2rtf, available on CRAN). We hope the proposed specification documentation process can help the pharmaceutical industry standardize the SDLC for internal R package development.

SI-076 : The Search for a Statistical Computing Platform
Susan Kramlik, Merck & Co., Inc.
Eunice Ndungu, Merck & Co., Inc.
Hemu Shere, Merck & Co., Inc.

The clinical trial statistical analysis and reporting computing platform should facilitate efficient, high-quality, and compliant development and delivery of statistical analysis and reporting outputs. While commercial products have been developed with this aim, the solution that currently seems to best meet our company's needs is a system built within the company. Key aspects of the environment, such as hardware, operating system, audit trail, traceability, access control, and security, all play a part in enabling these facets and are considerations when building a new clinical trial computing platform or enhancing an existing one. Additionally, a modern computing platform must be scalable to meet changing regulatory requirements and varying data standards and structures, and be agnostic to evolving technology. The ideal next-generation platform has eluded the pharmaceutical industry. In this paper, we discuss Merck's journey in this arena.

SI-083 : Agile Project Management in Analysis and Reporting of Late Stage Clinical Trial
Sarad Nepal, Merck
Uday Preetham Palukuru, Merck
Peikun Wu, Merck
Madhusudhan Ginnaram, Merck
Ruchitbhai Patel, Merck
Abhilash Chimbirithy, Merck & Co.
Changhong Shi, Merck
Yilong Zhang, Merck & Co. Inc.

Analysis and reporting in late-stage clinical trials requires careful planning, especially when multiple projects are ongoing at the same time. Current practice typically uses a waterfall project management structure that assumes one phase must be completed before another can start. When a clinical trial or clinical program nears database lock, the time demands on statisticians and programmers increase due to both planned and unplanned activities. Clinical statistical reports (CSRs) that must be delivered in a short period after database lock add to the challenges and the time put toward the project. Agile management tools such as Confluence and Jira can be used to streamline the project plan in an agile way. In this paper, we present details of how agile project management tools were utilized to enhance team collaboration and ensure timely project delivery. We demonstrate this idea using a recently completed clinical program involving three parallel studies with overlapping database lock dates. The analysis and reporting team, comprising several programmers and statisticians, collaborated and worked efficiently to deliver five CSRs and multiple submission packages. These deliverables were successfully submitted to multiple regulatory agencies without any technical issues reported.

SI-084 : A Process to Validate Internal Developed R Package under Regulatory Environment
Madhusudhan Ginnaram, Merck
Amin Shirazi, Merck
Simiao Ye, Merck & Co., Inc.
Yalin Zhu, Merck
Yilong Zhang, Merck & Co. Inc.

The use of R in regulated environments is on the rise, and R is playing a role in the submission of clinical trial data to regulatory agencies. Validation is an important part of R package development and an essential step in a regulated environment. In this paper, we propose a detailed procedure for testing R packages in a regulated environment, especially for Analysis and Reporting (A&R) deliverables in clinical trials. In general, R package testing involves three major stages: planning, implementation, and reporting. Our proposal follows principles that can enhance the traceability and robustness of R package development within an organization. The proposed procedures help an organization develop an R package following the risk-based approach to assessing R packages within a validated infrastructure suggested by the R Validation Hub. We use a recently developed R package, r2rtf, to illustrate the proposed procedures for R package validation. We hope this proposed validation procedure can provide much-needed guidance on utilizing R, with adequate testing, for clinical data analysis within an organization and eventual submission to regulatory agencies.

SI-108 : Bookdown and Blogdown: Using R packages to document and communicate new processes to Clinical Programming
Benjamin Straub, GlaxoSmithKline

The landscape of clinical reporting is changing fast. Open source languages that used to be off-limits for clinical reporting or seen as only niche hobbies are now being embraced industry wide. Providing documentation and communications surrounding the use of an open source language can be a herculean task for a Clinical Programming department. At GSK, the Clinical Programming (CP) department has decided to embrace the R language as its open source language of choice. This choice has allowed the department to make great use of two packages, bookdown and blogdown, to help document and communicate the use of R for clinical reporting. This paper will discuss GSK-CP's journey of adopting R into its processes, focusing on how bookdown and blogdown have helped accelerate and codify our use of R. Readers will be given a brief background of how R was initially adopted within GSK-CP, issues surrounding R that were identified and solved, processes that were created to generate interest and desire to use R, positive spillover effects in our SAS processes, and where the department sees itself in the coming years. Readers are encouraged to look at the Advanced Programming companion paper for a deep-dive discussion of the technical and programming aspects of the R packages bookdown and blogdown.

SI-166 : Delight the Customer using agile transformation in Clinical Research
Aman Bahl, Syneos Health
Steve Benjamin, Syneos Health
Hrideep Antony, Syneos Health USA

In the new era of drug development, the pharmaceutical industry needs new solutions to meet rapidly changing client and patient needs. The healthcare industry is under increasing pressure to improve quality and cost efficiency and to come up with innovative solutions that keep pace with rapidly evolving markets. Applying Agile methodologies allows clinical organizations to collaborate, focus swiftly, and diligently prioritize the innovations and developmental tasks that matter most, resulting in a shorter development timeframe. Agile methodology was created as a response to the inadequacies of traditional development methods, such as the Waterfall method, that are traditionally used in the drug development process. Agile teams are usually small, self-governing, cross-functional groups dedicated to solving complex problems rapidly. In an agile process, project progress is examined frequently, and if a chosen course of action isn't delivering results, the direction of the project execution path is swiftly changed. The COVID-19 pandemic has affected clinical trials around the world, and the traditional drug development process is being met with unforeseen challenges. Given all the challenges the industry is facing, agile frameworks might provide new value in supporting teams along their journey from groundbreaking discoveries to successful drugs in the market. In this paper, we discuss in detail how clinical organizations can adopt an agile methodology and mindset, the benefits of adopting agile, and some key learnings along with agile success stories.

SI-173 : The Tortoise and the Hare - Lessons Learned in Developing ADaM Training
Alyssa Wittle, Covance, Inc.

A challenge: increase the number of outstanding ADaM spec writers and programmers. The reasons this is an essential part of our business are obvious: more outstanding ADaM programmers and spec writers mean fewer QC issues, more time and money saved, fewer Pinnacle 21 findings, less time in review cycles, and more. Plus, outstanding programmers tend to make outstanding spec writers and eventually become subject matter experts who provide invaluable education and information to the rest of a team. There are a few routes to take in determining the most effective approach. About a year into the redesign of our ADaM Mentoring Program, a few strategies have been applied and shown success. This paper will discuss the options considered, the strategy for picking candidates for training, and the key training aspects that have shown improvement and drawn positive feedback from every trainee who has gone through the program. Spoiler alert: slow and steady wins the race!

SI-196 : Chasing Master Data Interoperability: Facilitating Master Data Management (MDM) Through CSV Control Tables that Contain Data Rules that Support SAS and Python Data-Driven Software Design
Troy Hughes, Datmesis Analytics

Control tables are tabular data structures that contain control data, the data that direct software execution and can prescribe dynamic software functionality. Control tables offer a preferred alternative to hardcoded conditional logic statements, which require code customization to modify. Thus, control tables can dramatically improve software maintainability and configurability by empowering developers and, in some cases, nontechnical end users to alter software functionality without modifying code. Moreover, when control tables are maintained within canonical data structures such as comma-separated values (CSV) files, they facilitate master data interoperability by enabling one control table to drive not only SAS software but also non-SAS applications. This text introduces a reusable method that preloads CSV control tables into SAS temporary arrays to facilitate the evaluation of business rules and other data rules within SAS data sets. To demonstrate the interoperability of canonical data structures, including CSV control tables, a functionally equivalent Python program also ingests these control tables. Master data management (MDM) objectives are facilitated because only one instance of the master data (the control table, the single source of truth) is maintained, yet it can drive limitless processes across varied applications and programming languages. Finally, when data rules must be modified, the control data within the control table can be changed once to effect corresponding changes in all derivative uses of those master data.

SI-203 : R Validation: Approaches and Considerations
Phil Bowsher, RStudio Inc.
Sean Lopp, RStudio

RStudio will present an overview of current developments in validation in R for the R user community at PharmaSUG. This talk will review various approaches that have developed in the pharma community when using R within the regulatory environment. It is a great opportunity to learn about best practices when approaching validation in R; no prior knowledge of R/RStudio is needed. This short talk will introduce the current landscape of validation as well as recent developments. RStudio will share insights and advice from the last six years of helping pharma organizations incorporate R into clinical environments. The presentation will highlight many of the current approaches to validation when adding R (and some Python) to a GxP environment.

SI-213 : Manage TFLs Development in LSAF 5.3 using SAS and R code
Stijn Rogiers, SAS
Jean Marc Ferran, Qualiance

TFL development and validation is often a labor-intensive task in the conduct of clinical trials, where collaboration between multiple programmers across the globe is often needed and oversight of progress can be difficult to track. Life Science Analytics Framework (LSAF) 5.3 enables users to manage and execute both SAS and R programs in the same platform, in addition to providing advanced features around versioning, eSignature, and the use of workflows. These features, coupled with an advanced API, make it possible to customize and streamline the definition and production of TFLs across studies and projects. Based on a prototype, this presentation will show how to specify TFLs and follow their development lifecycle and validation status using both SAS and R programs, while keeping control over development activities during the course of a study.

Submission Standards

SS-016 : Design ADaM Specification Template that Simplifies ADaM Programming and Creation of Define XML in CDISC Era
Xiang Wang, Bristol-Myers Squibb
Daniel Huang, Bristol-Myers Squibb

Regulatory agencies such as the FDA require sponsors to implement CDISC standards for any clinical study started after December 17, 2016, so it is essential that companies automate the process from ADaM specifications to Define.xml and ensure they are fully CDISC compliant. To our knowledge, many companies still do heavily manual work in this process. In this paper we exhibit a well-designed ADaM specification template that is user-friendly for ADaM dataset programming, and then use an R tool to convert the specification into a format ready to be imported into Pinnacle 21 Enterprise (P21E) to create the ADaM define.xml. This new approach helps statistical programmers create ADaM datasets more smoothly and generate the Define.xml more efficiently.

SS-055 : BIMO SAS Macros and Programming Tools
Mi Young Kwon, Regeneron, Inc.
Rohit Kamath, Regeneron, Inc.

As part of the regulatory review process, the FDA conducts site inspections to ensure that clinical investigators, sponsors, and Institutional Review Boards (IRBs) comply with regulations. The current submission format for study data in NDA and BLA packages does not facilitate efficient site selection for the FDA because these data are submitted as subject-level data. Therefore, the FDA has requested that pharmaceutical companies submit data describing the characteristics and outcomes of clinical investigations at the site level, which it uses to plan site inspections. This submission is sometimes referred to as a Bioresearch Monitoring Program (BIMO) submission because the data is placed in the BIMO section of Module 5 of the eCTD. FDA BIMO develops guidelines for inspections of clinical investigators, sponsors, and IRBs. Submission teams propose to their FDA review team, during pre-NDA or pre-BLA communications, which studies are in scope for BIMO and which are not, to determine what Summary Level Clinical Site data to provide for the submission. Per the FDA guidance document Specifications for Preparing and Submitting Summary Level Clinical Site Data for CDER's Inspection Planning, the site-level dataset should contain data from all major (e.g., pivotal) studies used to support safety and efficacy in the application, including studies with different treatment indications. This paper will introduce SAS macros and programming tools to automate and standardize BIMO dataset and listing generation per regulatory requirements and standards for electronic submissions.

SS-058 : Shiny App for Generating Data Dependency Flowchart in ADRG
Simiao Ye, Merck & Co., Inc.
Yilong Zhang, Merck & Co. Inc.

The Analysis Data Reviewer's Guide (ADRG), which provides agency reviewers with context for analysis datasets and terminology, is an important part of a standards-compliant analysis data submission for clinical trials. Within the ADRG, the data dependencies section describes dependencies among analysis datasets, where a flowchart (diagram) is recommended to demonstrate the relationships concisely and intuitively. This paper discusses how to utilize Shiny, an open-source R package for building web applications, to automatically generate the data dependency flowchart by reading in define.xml and adapting user-input dataflow. The Shiny app simplifies the manual steps needed to create the analysis dataset flowchart and ensures the flowchart matches define.xml. Implementation of the app, relevant package/function details, and examples are provided in this paper.
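
The flowchart-generation idea can be sketched outside Shiny: collect dataset names from define.xml, then render a user-supplied dependency map as Graphviz DOT text. A Python illustration (the define.xml snippet is a simplified stand-in that omits the real ODM namespace and attributes; the authors' app itself is in R/Shiny):

```python
import xml.etree.ElementTree as ET

def dataset_names(define_xml):
    """Collect dataset names from ItemGroupDef elements in define.xml.
    Matching on local tag names keeps this namespace-agnostic."""
    root = ET.fromstring(define_xml)
    return [el.attrib["Name"] for el in root.iter()
            if el.tag.split("}")[-1] == "ItemGroupDef"]

def to_dot(edges):
    """Render a dependency map {target: [sources]} as Graphviz DOT text."""
    lines = ["digraph adam {"]
    for target, sources in edges.items():
        for src in sources:
            lines.append(f'  "{src}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)

# Simplified, hypothetical define.xml fragment
define_snippet = """<ODM><MetaDataVersion>
  <ItemGroupDef OID="IG.ADSL" Name="ADSL"/>
  <ItemGroupDef OID="IG.ADAE" Name="ADAE"/>
</MetaDataVersion></ODM>"""
names = dataset_names(define_snippet)
dot = to_dot({"ADAE": ["ADSL"]})  # user-input dataflow: ADAE depends on ADSL
```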

SS-072 : Challenges and Solutions for an ISM Development
Siru Tang, MSD
Yanhong Li, MSD

The integrated summary of microbiology (ISM) is a special form of the integrated summary of efficacy (ISE) used in antibacterial drug submissions for regulatory approval. Designing integrated datasets and presenting microbiology results are very challenging for programmers due to the complexity of the data structures and logic. This paper discusses the challenges we met recently while working on an ISM. It elaborates how we transformed external Excel data into internal SDTM-format data, and how we adopted the data standards and pooled multiple studies' datasets together. It provides a method for merging treatment groups for a better display format in the reports. It also shares tips on how we designed the ADMB and ADMS datasets to include microbiological responses, pathogen names, and treatment groups so that we could present them all in one table. Finally, it describes how we composed the submission document, the ADRG, for this ISM.

SS-073 : New Requirements of Clinical Trial Data Submission for China Filing: An Implementation
Yuan Yuan Dong, MSD
Wang Zhang, MSD
Lili Ling, MSD

The National Medical Products Administration (NMPA) released its guideline on the submission of clinical trial data on October 1st, 2020, which sets out the specific requirements for data submission to the agency in China filings. This brought new opportunities and challenges to all sponsors. This paper introduces the new requirements of the China data submission guideline, as well as two types of execution processes that we used in recent pilot filings to implement the guideline. Moreover, the paper summarizes the benefits and issues that we encountered during the preparation of the submission packages.

SS-110 : Assembling Reviewer-friendly eSubmission Packages
Shefalica Chand, Seattle Genetics, Inc.

One thing 2020 has taught us and reinforced is the importance of getting novel treatments out faster to patients. Pharma and biotech companies are constantly racing against time to develop new therapies as fast as possible, but once a drug is submitted for regulatory review, the clock switches hands to FDA for review and decision-making. A fair question, then, is what Biometrics staff preparing submissions could do in advance to help facilitate and expedite this review and decision-making process. FDA regularly shares guidelines, specifications, recommendations, and general considerations for sponsors to submit standardized study data for animal and human trials in the form of documents like the Study Data Technical Conformance Guide, Electronic Common Technical Document (eCTD), FDA Data Standards Catalog, etc. Apart from these guidance documents and industry practices, there are several methods and good practices sponsors can adopt to prepare an optimally accessible eSub package that is friendlier for FDA reviewers. This paper will share some of those practical processes that can help expedite the review by improving package navigation and proactively submitting helpful non-standard components, which can in turn help reduce unnecessary FDA questions and resulting delays during the review. We will share practical examples, best practices, lists of do's and don'ts, useful checklists, etc., to help teams prepare an optimally "reviewer-friendly" eSub package.

SS-137 : Effective Approach for ADaM Submission to FDA and PMDA
Ranvir Singh

Regulatory authorities are continuously updating their submission requirements: USFDA released the Study Data Technical Conformance Guide v4.6 in November 2020, and PMDA revised its Technical Conformance Guide on e-Study Data Submissions in January 2019. Accordingly, CDISC ADaMIG versions continue to evolve. Currently USFDA accepts ADaM datasets per ADaMIG v1.1, whereas PMDA requires v1.0, so it is a challenge to reconcile the differences between the guidelines of these two agencies. If a sponsor needs approval for a drug in both the US and Japan, the study data must comply with both sets of regulatory requirements and meet CDISC standards. Rather than creating separate ADaM datasets for USFDA per v1.1 and for PMDA per v1.0, the suggested approach is to analyze the differences between the two regulators and accommodate the requirements so that one ADaM dataset is created that is optimal for both agencies. This paper provides some examples of ADaM submission package requirements with respect to the standards catalog, controlled terminology, and required documents such as the ARM, ADRG, and define file, and also the handling of Pinnacle 21 issues (Error/Warning/Reject).

SS-171 : A Lead Programmer's Guide to a Successful Submission
Pranav Soanker, Covance
Santosh Shivakavi, SCL IT Technologies

A successful NDA (or other submission) depends upon a robust Biostatistics/Biometrics department, which usually consists of a few statisticians and many statistical programmers, among others. It requires close coordination with all stakeholders (CDM, external vendors, medical writing, CROs, etc.) and a solid understanding of the core competencies of each group to achieve a high-quality, submission-ready package. Programmers usually have a wide and diverse educational background, with the overwhelming majority lacking a statistical degree or clinical trial knowledge, depending upon their experience. Thus, the role of the Lead Programmer is critical as a liaison between programmers and the statistician. In this paper we walk through in detail some of the checks a Lead Programmer should do as part of senior review (for raw data, Protocol/SAP, SDTM and ADaM data, TLFs, the define package, and the ADRG and SDRG) in addition to the CDISC compliance checks and independent validation. This significantly enhances the quality of the submission, instills confidence in the statistician, and minimizes rework prior to a submission.


EP-004 : Migration of SAS Data from Unix server to Linux server
Hengwei Liu, Daiichi Sankyo Inc

When a company decided to migrate SAS data from a Solaris Unix server to a Red Hat Enterprise Linux (RHEL) server, one issue was how to deal with the formats catalog files. Formats catalog files are not platform independent: if they are copied from the Unix server to the Linux server, they will not be of any use on the Linux server. SAS datasets, on the other hand, can be used across different platforms. It was therefore decided to convert the formats catalog files to SAS datasets on the Unix server, migrate those SAS datasets to the Linux server, and convert them back to formats catalog files. In this paper we discuss the migration process. Unix commands and SAS programs were used for this project.
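The round trip the abstract describes can be sketched with PROC FORMAT's CNTLOUT= and CNTLIN= options; the library paths, catalog name, and dataset name below are illustrative, not the ones used in the actual migration:

```sas
/* On the Unix server: export every format in the catalog to a
   platform-independent SAS dataset */
libname fmtlib '/unix/path/to/formats';
proc format library=fmtlib.formats cntlout=work.fmtdata;
run;

/* ...copy FMTDATA (saved as a permanent dataset) to the Linux server... */

/* On the Linux server: rebuild the catalog from the dataset */
libname fmtlib '/linux/path/to/formats';
proc format library=fmtlib.formats cntlin=work.fmtdata;
run;
```

The CNTLOUT= dataset is an ordinary SAS dataset, so it survives the cross-platform copy that the catalog itself cannot.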

EP-006 : Handling Patient/Safety Narrative Header by SAS Programming
Pareshkumar Paghdar, IQVIA India (RDS)
Jayesh Patel, IQVIA India (RDS) Pvt Ltd

A narrative is a first-person 'story' written by a clinician that describes a specific clinical event or situation. A patient safety narrative provides a full and clinically relevant chronological account of the progression of an event experienced during or immediately following a clinical study. Events that often require a narrative are deaths, other serious adverse events, and significant adverse events determined to be of special interest because they are clinically relevant. SAS programming is useful for easily getting the required patient data from the database and tabulating it in the header part of the narrative. The narrative header lays out the data and serves as the basis for the clinician/writer to write the narrative from readily available information. Instead of sifting through a variety of listing sources (demographics, medical history, adverse events, concomitant medications, etc.) for the subject of interest, the information for that one subject can be listed or graphically presented together in one place. Two output strategies are common. One file per subject: producing individual files can make validation and review easier, since they can be done on a case-by-case basis; however, managing the potentially large number of files can be annoying for the medical reviewers. All individual narratives for a given batch in one large file: dumping many profiles into one large file makes file management easy, but review of all the narratives becomes tedious when changes required for one narrative can inadvertently affect many others.

EP-013 : Reduce Review Time: Using VB to Catch Spelling Errors & Blank Pages in TLFs
Corey Evans, LLX Solutions, LLC
Ting Su, LLX Solutions, LLC
Qingshi Zhou, LLX Solutions, LLC

Checking for blank pages and spelling errors is usually time-consuming and monotonous. We developed a macro using VB embedded in SAS that assists with checking for unwanted blank pages generated from SAS ODS procedures. With one click, the macro runs and returns an Excel file with the desired results populated for all the outputs under one folder. The utility can also check for spelling errors and populate unique errors/counts in the Excel file. Additionally, with some minor adjustments, this macro can perform these same tasks for PDF files.

EP-029 : A Macro for Point Estimate and Confidence Interval of the Survival Time Percentiles
Hengwei Liu, Daiichi Sankyo Inc

PROC LIFETEST in SAS® creates a table called Quartiles, which contains the point estimate and confidence interval for the 25th, 50th, and 75th percentiles of the survival time. PROC LIFETEST does not have an option to calculate the point estimate and confidence interval for any other percentile of the survival time. In this paper a macro is set up to calculate the point estimate and confidence interval for arbitrary survival time percentiles. The development was done with SAS version 9.4 M4 with SAS/STAT 14.2 on Linux.
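As a rough illustration of the idea (not the paper's macro), the point estimate of an arbitrary percentile can be read off the product-limit estimates captured with ODS OUTPUT, as the first event time at which the survival estimate drops to or below 1 - p; the confidence interval requires additional work, for example inverting the pointwise confidence limits of the survival curve. All names here are hypothetical:

```sas
/* Sketch: point estimate of the p-th survival time percentile */
%macro survpct(data=, time=, censvar=, censval=1, p=0.30);
  ods output ProductLimitEstimates=_ple;
  proc lifetest data=&data;
    time &time * &censvar(&censval);
  run;
  proc sql;
    select min(&time) as pct_estimate
      label="Estimated %sysevalf(&p*100)th percentile"
    from _ple
    where survival is not missing and survival <= 1 - &p;
  quit;
%mend survpct;
```

Setting p=0.25, 0.50, or 0.75 should reproduce the values in the Quartiles table, which is a convenient sanity check for any such macro.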

EP-039 : Data Visualization Using GANNO, GMAP and GREMOVE to Map Data onto the Human Body
Rui Huang, Regeneron Pharmaceuticals
Toshio Kimura, Regeneron Pharmaceuticals
Jennifer McGinniss, Regeneron Pharmaceuticals

In many therapeutic areas, symptom severity is captured separately for different body regions. Reporting these data through traditional summary tables by body region and symptom is difficult for readers to understand and interpret. In contrast, data visualization tools can communicate complex information in an easily digestible format. This paper demonstrates two visualization methods in SAS to map data onto the human body, where colors and symbols represent severity and symptoms in different body regions. The first approach uses SAS GANNO on a background image of the human body to overlay symbols on impacted body regions via X, Y coordinates, with Unicode symbols indicating symptoms, size representing the proportion of patients, and color representing severity. The second approach handles the human body as geographic regions. This method requires a map dataset of the body regions, which can be generated in SAS JMP through the Custom Map Creator add-on. SAS geographical map functions such as GREMOVE can be used to further customize the map by combining body regions, and GMAP is used to display data onto those regions as colors based on their value (such as symptom severity). Various applications and alternative forms can be used to highlight treatment effect, including splitting the body in half where one side is placebo and the other side is the active treatment, or one side is baseline and the other side is the final visit. Figure examples will be provided to demonstrate how the methods compare.

EP-057 : Generating .xpt files with SAS, R and Python
YuTing Tian, Vertex
Todd Case, Vertex Pharmaceuticals

The primary purpose of this paper is to lay out a process for generating a simplified Transport (.xpt) file with RStudio and Python to meet the electronic study data submission requirements of the Food & Drug Administration (FDA). The second purpose is to compare the .xpt files created by three different languages: R, Python, and SAS. The paper is an expansion of the FDA guideline document "CREATING SIMPLIFIED TS.XPT FILES", published in November 2019. Transport files can be created by SAS as well as by open-source software, including R and Python. According to the FDA guideline document mentioned above, .xpt files may be created by R and Python, which may allow pharmaceutical companies to expand their use of R and Python beyond the data visualization and statistical analysis currently generated with these two languages. Readers can use the process shown in the paper as a template to create .xpt files.
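For the SAS side of the comparison, a v5 transport file is conventionally written with the XPORT engine; the path and dataset name here are illustrative:

```sas
/* Write WORK.TS to a SAS v5 transport (.xpt) file */
libname xptout xport '/outputs/ts.xpt';
data xptout.ts;
  set work.ts;
run;
libname xptout clear;
```

In R, the haven package's write_xpt() function performs the analogous step, which makes byte-level and content-level comparison of the resulting files straightforward.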

EP-070 : First Time Creating a Submission Package? Don't Worry, We Got You Covered!
Lyma Faroz, Seattle Genetics

Creating a clinical trial data package for electronic submission to a regulatory agency is a daunting task. There are many steps that must be executed with precision and efficiency to create a good quality submission package. If you are working on a submission study for the first time, then this paper is for you. Each electronic submission contains 5 modules; however, this paper will focus on steps involved in creating data set components for Module 5. This includes validating SDTM and ADaM data sets using Pinnacle 21 Enterprise software, generating reviewers' guides (cSDRG and ADRG), creating the define.xml for SDTM and ADaM, and much more! We will also look at important points from FDA's Study Data Technical Conformance Guide and sprinkle in various lessons learned from earlier submissions in pulling all this together to create a high-quality submission package.

EP-077 : Introducing and Increasing Compliance of Good Programming Habits
Alan Meier, Cytel

SAS code is often shared within a task, project, and organization. The better written the code is, the easier it is to understand, use, modify, and validate. Improving programmers' coding habits increases the productivity of the entire team. The challenge is getting programmers to learn and use good habits. To achieve improvement, a manager first needs to define what good coding habits are. These include in-program documentation, style requirements, use of modular programming techniques, rules on hardcoding, and naming conventions, to name a few. These habits then need to become part of the organization's expectations, usually through SOPs or work practice documents. Most programmers have their own styles, and getting them to learn new habits can be challenging. Like any change, the manager needs to roll the new habits out to the staff by explaining the benefits, building excitement around them, educating staff, allowing time to practice, and then requiring their usage. Finally, the manager needs to ensure compliance with the new requirements.

EP-085 : Making Customized RTF Output with R Package r2rtf
Huei-Ling Chen, Merck
Heng Zhou, Merck & Co.
Madhusudhan Ginnaram, Merck
Yilong Zhang, Merck & Co. Inc.

The Rich Text Format (RTF) file is a typical output file format for reporting, conventionally created with SAS procedures. However, with R's increasing popularity in the industry, there is a growing need to generate RTF files with R code. Among the many existing R functions and packages for creating RTF files, a newly developed open-source package, r2rtf, stands out. Equipped with easy-to-use features and powerful functions, it allows R users to produce various RTF outputs effortlessly. This paper uses three examples to demonstrate different methods of creating RTF tables with this package. Papers presented at PharmaSUG 2019 and 2020 showed two of these same outputs produced using SAS macros.

EP-087 : Make Your Life Easy! How SpotFire and SAS together can help with better Data Review and Data Quality
Raj Sharma, Fate Therapeutics

Data quality and data review to identify data issues are both very important to ensure the quality, accuracy, and integrity of the data analysis in clinical studies. Often, depending on the size of the company and the tools available, clinical data review can be very cumbersome, time-consuming, and manual. This leads reviewers to miss critical issues, which in turn leads to poor data quality and less accurate results. Spotfire along with SAS, with all their features, enables very efficient and timely data review. Many different functions can also use Spotfire visualization for their own benefit. For example, SAS programmers can easily identify data issues critical for analysis or do a quick informal QC on the data; clinical scientists/data monitors can easily and optimally conduct data review by drilling down into the data using Spotfire features; safety science/pharmacovigilance can easily identify safety issues with subjects; and data managers can get a quick summary of outstanding data issues. This paper discusses an actual use case and how Spotfire was utilized to achieve the purpose of quality and timely data review, and also how different functions, including statistical programming, can benefit from using Spotfire for data visualization and data review.

EP-105 : Restriction of Macro Variable Length - Dynamic Approach to Overcome
Kalyani Telu, Uniformed Services University of the Health Sciences

Macro variables are one of the most powerful tools available in SAS, with a maximum length of 65,534 characters. Programmers sometimes face situations that require storing data larger than this allotted limit, which is difficult to work around. This paper discusses scenarios where the user needs to store large data and dynamic approaches to circumvent the maximum limitation.
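A hypothetical sketch of one such dynamic approach (not necessarily the paper's): instead of one oversized macro variable, accumulate the value into a series of numbered macro variables. Here a comma-separated subject list is built from an illustrative ADSL dataset; note that the DATA step character variable limit of 32,767 caps each chunk well below the 65,534-character macro limit anyway:

```sas
%let maxlen = 32000;
data _null_;
  set adsl end=eof;                      /* ADSL/USUBJID are illustrative */
  length buf $32767;
  retain buf '' n 1;
  if lengthn(catx(',', buf, usubjid)) > &maxlen then do;
    call symputx(cats('ids', n), buf);   /* flush &IDS1, &IDS2, ... */
    n + 1;
    buf = usubjid;
  end;
  else buf = catx(',', buf, usubjid);
  if eof then do;
    call symputx(cats('ids', n), buf);   /* flush the final piece */
    call symputx('nids', n);             /* how many pieces exist */
  end;
run;
```

Downstream macro code can then loop from 1 to &NIDS and process &&IDS&I one piece at a time.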

EP-122 : Interactive Clinical Dashboards using R Studio
Syam Prasad Chandrala, Allogene
Chaitanya Chowdagam, Ephiicacy

Data review listings are the standard way of reviewing data by a Study Management Team (SMT) involving cross-functional groups such as clinical science, clinical operations, data management, and biostatistics. These reports are produced as PDF, Excel, or RTF files, or sometimes a combination of these formats. Preferences differ: some user groups avoid Excel because the files are editable, while data managers and clinical operations prefer Excel because it makes data issues easy to identify. The statistical programmer needs to spend a lot of time generating these reports according to the preference of each user group, which adds burden under tight timelines. The flexdashboard package in R is a powerful tool for creating these reports in the form of listings and figures and publishing them to a webpage using RStudio Connect. The DT and Plotly packages can be used with flexdashboard to make these reports interactive and customizable. In addition, other functionality can be added so that the report can be copied to the clipboard or generated in various file formats such as PDF or Excel. This paper will demonstrate how to create and publish interactive reports in the form of a clinical dashboard using various R packages.

EP-123 : Intelligent Axis Scaling in SAS plots: %IAS
Sidian Liu, Genentech Inc.
Toshio Kimura, Regeneron Pharmaceuticals

When creating graphs to visualize statistical analyses, the intuitiveness of the axis boundary and interval values can considerably impact the ease of interpretation. The default SAS plotting algorithm focuses primarily on structure, where the figure is centered and covers a substantial proportion of the canvas. However, the SAS default approach may compromise interpretability. Figures were generated from simulated data to compare the SAS defaults and the proposed algorithm. In some cases, the actual data values fell outside the axis tick marks printed by SAS: for example, when the actual values ranged from 0.7 to 4.1, the tick marks printed by the SAS default algorithm were 1 to 4, whereas a more preferable range would have been 0 to 5. We propose an alternative intelligent axis scaling algorithm, %IAS, implemented as a SAS macro, that addresses this concern by focusing on the interpretability of the y-axis. The algorithm has 3 main properties: 1) boundary and interval values are multiples of a power of 10; 2) the number of intervals is between 5 and 10; and 3) the figure is still centered and covers a substantial proportion of the canvas. The algorithm places a high emphasis on selecting easily interpretable critical values, such as 0 and multiples of a power of 10, as the min/max boundary values, and more logical intervals as reference tick marks. The values selected by this alternative algorithm, as opposed to the SAS defaults, will facilitate the interpretation of the graphs being generated.

EP-124 : No Black Boxes: An Utterly Transparent Batch Randomization System in R
Dennis Sweitzer, YPrime

Randomized trials are a key part of clinical research, but randomization is often implemented as somewhat of an afterthought. To accelerate the delivery of high-quality randomization schedules, we implemented a batch stratified permuted block randomization system that includes: a flexible spreadsheet-based syntax to structure the randomization as a series of 'superblocks' within each stratum, cohort, or combination, easing the use of best practices such as variable block sizes and relaxed or mixed permuted block methods; complete computational transparency, with an audit trail from parameters and random numbers through block selection and unit assignments, implemented in open-source R code; simulations by default for testing, validation, or rerandomization analyses, using pseudo-random number streams generated by the Wichmann-Hill 2006 algorithm for repeatable and independent parallel random number sequences; and reporting functions to measure the quality of generated schedules on metrics such as potential selection bias, treatment imbalance, and risk of under-enrolling strata. The resulting system allows rapid implementation, testing, validation, and documentation.

EP-125 : Intermediate Dataset for Oncology Efficacy Endpoint ADaM Data - Here Come Some Details
Yaling Teng, Amgen

Despite wide acceptance of using intermediate datasets for endpoint derivation by the industry, the approach has not been implemented sufficiently to fully realize the expected benefits, which include, among others, facilitating complex derivations and improving traceability. This paper reviews the example intermediate datasets presented in the two TAUGs and presents the author's design of intermediate datasets for oncology efficacy endpoint datasets. This design moves certain derivations that are usually done in the endpoint ADaM datasets (e.g., ADTTE) into the intermediate datasets. The derived information in the intermediate dataset is then used to derive multiple related endpoints in the endpoint ADaM dataset. For example, 'Progression-Free Survival (PFS)' and 'Duration of Response' share the same derivation of the disease progression event/censor status and date. This design can therefore realize the aforementioned benefits of intermediate datasets. In addition, it reduces the repetitive derivation algorithms for related endpoints in the endpoint ADaM data by referencing the intermediate data. Another benefit of this design is that it addresses the need for multiple censoring rules for PFS to support primary and sensitivity analyses: for each defined censoring rule, the main derivation is performed once in the intermediate dataset and then used for a series of endpoints. A data flow chart of oncology efficacy data, metadata, and example datasets are provided to elaborate the design. A SAS macro idea is also proposed for the related data processing.

EP-143 : Customizing define.xml files
Steffen Müller, mainanalytics GmbH

A metadata description following the Define-XML standard is a key component of the electronic data submission package that is sent to the health authorities in drug approval processes. Since those authorities, e.g. FDA and PMDA, have differing requirements, it is useful to have a process in place that is able to convert a submission package from one standard to the other in a mostly automated way. An update of the contained controlled terminology or other components could also be a reason to update an existing submission package. On our poster we will show how such a conversion could be done using standard tools such as SAS, Excel, and Pinnacle 21 Community, focusing on the update of the contained define.xml file.

EP-146 : e-Submissions key difference of FDA and PMDA
Giri Prasad, Covance Pvt Ltd
Hareeshkumar Gurrala, Covance Pvt Ltd

Recent drug development is done in a more globally integrated way, and a similar clinical data package should be used for regulatory submissions. Preparing regulatory submissions in the pharmaceutical industry can be a very stressful experience due to the large amount of work and the need to research and understand regulatory and e-submission requirements. The FDA (Food and Drug Administration) and PMDA (Pharmaceuticals and Medical Devices Agency) play a pivotal role and are responsible for securing public health by assuring that marketed and new medical products are safe and effective and that potential new therapies are assessed properly. FDA and PMDA started accepting electronic study data (e-study data) submissions in 2016. The e-submission requirements of FDA are broadly similar to PMDA's; however, there are key differences in the requirements of the two health authorities. Sponsors need to understand the differences precisely and establish efficient processes to meet the requirements of each health authority (HA). This poster illustrates e-submissions and focuses on the key differences between FDA and PMDA requirements.

EP-157 : Fast, Efficient, and Automated Real-time Analytics for Clinical Trials
Aditya Gadiko, MMS Holdings
Christopher Hurley, MMS Holdings
Anish Iyer, MMS Holdings

Opportunities are boundless for real-time visual analytics to provide rapid, actionable insights from essential clinical trial information. Dashboards based on standard CDISC data domains empower trial stakeholders to quickly react to the tens of thousands or even hundreds of thousands of data points collected throughout the lifecycle of a clinical trial. Operational data can be displayed in dashboards to significantly improve the efficiency of the trials. Data visualization can be very helpful when looking for hidden trends in the data and assessing risks from any aspect of the trial. Dashboards can be used to augment the establishment of efficacy, tolerability, and safety profiles of investigational therapies. Application Programming Interface (API) functionality with near-real-time visibility into the EDC domains provide frequent data updates when time is of the essence. Standardized data models like SDTM and ADaM provide an excellent source for more result-oriented dashboards where endpoint summarization and advanced analytics are in demand by the study team. This paper will present visual analytics within the context of a typical study. It will show the end-to-end process starting from automated recurring EDC data feed, transformation, carrying through to the end of the process of visualizing, summarizing, and gaining valuable insights into trial safety, efficiency, and more.

EP-158 : NONMEM: An Intricate Dataset Programming defined in a simpler way !!
Ravichandra Hugar, Covance
Sandeep Lakkol, Covance

Pharmacokinetics is the study of what the body does to a drug, whereas pharmacodynamics is often summarized as the study of what a drug does to the body, which helps explain the relationship between dose and response, i.e. the drug's effects. PK/PD analysis guides critical drug development decisions such as optimizing the dose, frequency, and duration of exposure. Selecting the tools for making such decisions is equally important, and fortunately PK/PD analysis software has evolved greatly in recent years. The effect of the drug on the target patient population is predicted using NONMEM, a powerful nonlinear mixed effects modeling tool for population pharmacokinetic/pharmacodynamic (PK/PD) modeling that is becoming more and more common in clinical trials. Here the SAS programmer's role is to create a standard dataset structure compatible with the NONMEM software model. This is often the nightmare part of the process because it involves combining multiple domains of datasets together with the time-dependent and time-independent covariate variables needed for the analysis, including imputation rules for covariates, to arrive at the desired NONMEM data structure comprising dosing and concentration records. This paper provides a simple way for the programming group to clearly understand the characteristics and programming methods for achieving the required model structure. It also addresses the problems and challenges that may arise, ultimately minimizing the time taken to create the final NONMEM dataset.

EP-167 : Forward Planning: How to get the most of your eCOA data
Frank Menius, YPrime
Monali Khanna, YPrime

The use of electronic clinical outcome assessments (eCOA) offers numerous advantages for clinical trials, such as increased patient involvement, better understanding of the patient experience, undisrupted data collection, and higher patient compliance. Clinical outcome assessments are generally assessed using standard validated questionnaires, where the responses to items are collected and stored in a database. It is important to understand how collected questionnaire data will be utilized for the final statistical analyses. Once it is decided that clinical outcome assessments are to be used in a clinical trial, early planning as to how captured data is stored, transferred, and used for reporting purposes becomes important. Engaging in discussions early can help achieve efficient data capture and expedite database lock and regulatory approvals. This paper will discuss how streamlining processes and early engagement of stakeholders from across the study life cycle create better oversight and faster delivery of conformant data. The paper will strive to outline key suggestions for interdepartmental team meetings and discussions. It will focus on topics such as protocol review; database design; understanding how eCOA data is stored vs. transferred/integrated; roles and responsibilities in creating a data transfer specification; specific training for sites to minimize missing and out-of-range data; identifying potential sources of missing data and its handling; and maximizing compliance and preserving timelines through data monitoring reports to sponsors and sites. Efficiencies can be gained across the eCOA data life cycle through early engagement on these topics.

EP-169 : Split Character Variables into Meaningful Text
Savithri Jajam, Covance
Lavanya Peddibhotla, Covance

All clinical trial data must meet CDISC standards in order to be submitted, and one of the primary criteria is a maximum length of 200 characters for any character variable. This becomes a challenge when a value is longer than 200 characters and must be split across multiple variables while keeping the text meaningful. Several approaches can be used to achieve this; a few methods are discussed in this paper with examples.
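One common approach (a sketch with illustrative names, not necessarily the paper's code) walks through the long value 200 characters at a time and backs up to the last blank so that no word is cut in half:

```sas
data split;
  set source;                          /* LONGVAR may exceed 200 chars */
  length var var1-var5 $200 piece $200;
  array v{6} var var1-var5;
  start = 1;
  do i = 1 to dim(v) while (start <= lengthn(longvar));
    piece = substrn(longvar, start, 200);
    /* if more text follows, cut at the last blank inside the piece */
    if start + 199 < lengthn(longvar) then cut = find(piece, ' ', -200);
    else cut = lengthn(piece);
    if cut <= 0 then cut = 200;        /* no blank found: hard split */
    v{i} = substrn(longvar, start, cut);
    start + cut;
  end;
  drop i start cut piece;
run;
```

Leading blanks carried into a chunk can be removed with STRIP() if desired, and the array can be sized for the worst-case source length.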

EP-178 : Containers you can Count On: A Framework for Qualifying Community Container Images
Eli Miller, Atorus Research

Several industry groups have explored using Docker and other container runtimes to supplement analysis and regulatory submissions. These projects are still on the cutting edge relative to where most of the industry is; however, there is a gap in how these images are qualified. Because container images are immutable, they are prime targets for an initiative to qualify new versions of libraries: they behave the same on any runtime regardless of the host that runs the container. This paper will explore an initiative to qualify existing images published by rocker and RStudio, as well as a process for qualifying any R-based image. The initiative will first focus on qualifying well-known R images but will expand to Python, Julia, and other available open-source languages as resources become available.

EP-182 : Mashing Two Datasets Together
David Franklin

The MERGE statement is the most common way to merge one-to-one or one-to-many data. It works very well most of the time, but there are other methods that are useful, and sometimes more efficient, that should be in every SAS® programmer's toolbox. This paper touches on four such methods: a quick look at PROC SQL and some of the options that help, hash tables and some of the considerations for using them, PROC FORMAT, and the KEY= option in the SET statement.
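As a flavor of the hash approach (dataset and variable names here are invented for illustration), a one-to-many lookup can be done without sorting either input:

```sas
data joined;
  /* define the lookup columns in the PDV without reading any data */
  if 0 then set work.dm(keep=usubjid age sex);
  if _n_ = 1 then do;
    declare hash h(dataset: 'work.dm');   /* load lookup table once */
    h.defineKey('usubjid');
    h.defineData('age', 'sex');
    h.defineDone();
  end;
  set work.ae;                            /* many AE rows per subject */
  if h.find() ne 0 then call missing(age, sex);
run;
```

Unlike MERGE, this requires no prior PROC SORT, which is where much of the efficiency gain comes from on large, unsorted inputs.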

EP-190 : Looking for the Missing(ness) Piece
Louise Hadden, Abt Associates Inc.

Reporting on missing and/or non-response data is of paramount importance when working with longitudinal surveillance, laboratory and medical record data. Reshaping the data over time to produce such statistics is a tried and true technique, but for a quick initial look at data files for problem areas, there's an easier way. This quick tip will speed up your data cleaning reconnaissance and help you find your missing(ness) piece. Additional tips on making true missingness easy to identify are included.

EP-191 : Only Get What You Need - To Simplify Analysis Data Validation Report from PROC COMPARE Output
Wenjun He, The Emmes Company, LLC

Independent programming is the gold-standard validation method for generating accurate analysis datasets in biotechnology and pharmaceutical companies. Timely and efficient validation of analysis datasets derived from dynamic source data is demanding for clinical programmers in phase I trials. SAS® software provides PROC COMPARE as a useful tool to identify differences between two datasets. To efficiently monitor the validation status of the whole set of study-specific analysis datasets, which are constantly updated to match the dynamic source data, this paper introduces techniques to speed up identifying the discrepancies between each pair of production and validation datasets in various SAS® libraries and to simplify the generation of a straightforward validation report by extracting and reorganizing the output from PROC COMPARE.
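A minimal sketch of one building block such a report can rest on (library names illustrative): run PROC COMPARE quietly and capture the &SYSINFO return code, whose bits encode the kinds of differences found, so a driver program can tabulate the status of every production/validation pair without parsing listing output:

```sas
proc compare base=prod.adsl compare=val.adsl noprint;
run;
%let rc = &sysinfo;   /* bit-coded result; capture before the next step resets it */
%put NOTE: ADSL production vs. validation: SYSINFO=&rc (0 = no differences);
```

Repeating this inside a macro loop over all dataset pairs, and writing each captured code to a summary dataset, yields a compact one-row-per-dataset validation report.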